Citation
Investigating Real-time Reference Resolution in Situated Dialogue for Complex Problem Solving

Material Information

Title:
Investigating Real-time Reference Resolution in Situated Dialogue for Complex Problem Solving
Creator:
Li, Xiaolong
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
Language:
english
Physical Description:
1 online resource (121 p.)

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Science
Computer and Information Science and Engineering
Committee Chair:
BOYER,KRISTY
Committee Co-Chair:
GILBERT,JUAN EUGENE
Committee Members:
DORR,BONNIE
WULFF,STEFANIE

Subjects

Subjects / Keywords:
dialogue system -- POS -- reference resolution
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Computer Science thesis, Ph.D.

Notes

Abstract:
A situated dialogue is embedded in a situated environment, where domain-specific task completion is usually a central activity. In a situated dialogue, it is essential to correctly identify the objects that speakers refer to in the environment. This task is referred to as reference resolution. Reference resolution is a challenging problem in situated dialogue, and partly for this reason, most state-of-the-art situated dialogue systems operate within highly constrained domains. This dissertation presents an implementation of a tutorial dialogue system for the domain of Java programming with real-time reference resolution. The implemented dialogue system identifies and interprets referring expressions in user utterances in real time, and the identified referents are used to improve the performance of natural language understanding. This dissertation also examines the impact of different reference resolution approaches on the performance of the implemented tutorial dialogue system. The real-time reference resolution approach implemented in this project has three phases. First, we apply an innovative approach that we developed for more accurate part-of-speech tagging in domain-specific dialogue; this approach does not require an annotated corpus for the target domain. Next, we use a Conditional Random Field (CRF) to label the semantic structure of the referring expressions. Finally, the learned semantics are used together with contextual information to perform reference resolution in situated dialogue. Offline evaluation of the CRF-based reference resolution approach on an existing tutorial dialogue corpus for computer programming showed an accuracy of 61.6%, a substantial improvement over the 51.3% accuracy of an approach based on a manually defined lexicon (Li and Boyer 2016). To evaluate the performance of the two reference resolution approaches, both were implemented in a tutorial dialogue system for Java programming. 
A human subjects study was conducted to assess the performance of the tutorial dialogue systems with different reference resolution approaches. In the study, 41 participants were randomly assigned to one of the two tutorial dialogue systems. Participants completed a post-survey used to evaluate system usability and user engagement. The reference resolution performed by the dialogue systems was automatically logged into a database for manual evaluation. Analysis of the collected data revealed no significant difference in user satisfaction or user engagement between the dialogue systems with different reference resolution approaches; possible reasons are discussed in Chapter 9. This dissertation is one of the few works that attempt to implement a natural language dialogue system for a domain as complex as Java programming. It is also the only known work that compares different reference resolution approaches in a tutorial dialogue system. In the dialogue system research community, there is increasing recognition that natural language dialogue systems need to work in more complex domains, and real-time reference resolution in situated dialogue is one of the important challenges in achieving that goal. This dissertation takes a step toward real-time reference resolution for a dialogue system operating in a complex domain. ( en )
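
The first phase summarized in this abstract depends on POS tags because referring expressions must be pulled out of user utterances before they can be interpreted. The sketch below is a minimal illustration under stated assumptions, not the dissertation's method. It assumes Penn Treebank tags are already available from some tagger. It also replaces the trained tagger and chunker with a hand-written noun phrase pattern. The names `extract_noun_phrases`, `DETERMINERS`, and `PREMODIFIERS` are hypothetical.

```python
from typing import List, Tuple

# A tagged utterance: (token, Penn Treebank POS tag) pairs from some POS tagger.
Tagged = List[Tuple[str, str]]

# Hypothetical chunking pattern: [determiner] (number | adjective)* noun+
DETERMINERS = {"DT", "PRP$"}   # "the", "a", "my", "your", ...
PREMODIFIERS = {"CD", "JJ"}    # "2", "dimensional", "private", ...


def extract_noun_phrases(tagged: Tagged) -> List[str]:
    """Return maximal noun phrases that match the simple pattern above."""
    phrases: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        if tagged[i][1] in DETERMINERS:
            i += 1
        while i < n and tagged[i][1] in PREMODIFIERS:
            i += 1
        head_start = i
        while i < n and tagged[i][1].startswith("NN"):  # NN, NNS, NNP, NNPS
            i += 1
        if i > head_start:
            # At least one noun head: tokens start..i form a candidate referring expression.
            phrases.append(" ".join(token for token, _ in tagged[start:i]))
        else:
            # No noun head here; retry from the next token.
            i = start + 1
    return phrases


utterance = [("Should", "MD"), ("I", "PRP"), ("use", "VB"), ("the", "DT"),
             ("2", "CD"), ("dimensional", "JJ"), ("array", "NN"), ("?", ".")]
print(extract_noun_phrases(utterance))  # ['the 2 dimensional array']
```

The pattern relies entirely on the tags. A domain noun such as a method name that is mistagged as a verb would therefore be dropped from the output, which shows why tagging accuracy on domain nouns matters for the later phases.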
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2018.
Local:
Adviser: BOYER,KRISTY.
Local:
Co-adviser: GILBERT,JUAN EUGENE.
Statement of Responsibility:
by Xiaolong Li.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
LD1780 2018 ( lcc )

Full Text

INVESTIGATING REAL-TIME REFERENCE RESOLUTION IN SITUATED DIALOGUE FOR COMPLEX PROBLEM SOLVING

By

XIAOLONG LI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2018

© 2018 Xiaolong Li

I dedicate this dissertation to my father, Fuhai Li. I wish he could see this.

ACKNOWLEDGMENTS

I would like to express my sincere appreciation to my advisor, Dr. Kristy Boyer, for her continuous guidance, support, and friendship throughout my Ph.D. study. I would also like to thank my LearnDialogue colleagues for their generous help and support. In particular, I would like to thank Fernando Rodríguez, Jennifer Tsan, and Lydia Pezzullo for their help with document editing, Joseph Wiggins for data annotation, and Mickey Vellukunnel, Mehmet Celepkolu, and Timothy Brown for organizing studies. The friendly and supportive LearnDialogue culture made my Ph.D. study much easier. I also want to thank my family, especially my wife Runqing Wang, for their unconditional support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
2 RELATED WORK
  2.1 Coreference Resolution
  2.2 Reference Resolution in Situated Dialogue
  2.3 Summary
3 CORPUS
  3.1 Data Collection
  3.2 Annotation
4 ONLINE REFERRING EXPRESSION EXTRACTION
  4.1 Part-of-speech Tagging for Domain-specific Language
    4.1.1 Approach
    4.1.2 Experiments and Results
  4.2 Noun Phrase Chunking in Tutorial Dialogue
  4.3 Discussion
5 SEMANTIC INTERPRETATION OF REFERRING EXPRESSIONS
  5.1 Semantic Interpretation as Sequence Labeling
    5.1.1 Noun Phrases in Domain Language
    5.1.2 Description Vector
    5.1.3 Joint Segmentation and Labeling
    5.1.4 Features
  5.2 Experiments and Results
6 REFERENCE RESOLUTION FOR SITUATED DIALOGUE SYSTEM
  6.1 Reference Resolution in a Situated Environment
  6.2 Referring Expression Semantic Interpretation
  6.3 Generating a List of Candidate Referents
  6.4 Ranking-based Classification
  6.5 Experiments and Result
    6.5.1 Semantic Parsing
    6.5.2 Candidate Referent Generation
    6.5.3 Identifying Most Likely Referent
7 TUTORIAL DIALOGUE SYSTEM FOR JAVA PROGRAMMING WITH SUPERVISED REFERENCE RESOLUTION
  7.1 User Interface
  7.2 System Functionalities
  7.3 Architecture of the Dialogue Agent
  7.4 Natural Language Understanding Module
    7.4.1 Reference Resolution
    7.4.2 Dialogue Act Classification
    7.4.3 Topic Classification
  7.5 Dialogue Manager
  7.6 Knowledge Base
  7.7 System Utterance Generation
8 EVALUATION OF THE DIALOGUE SYSTEM
  8.1 Proposed Hypotheses
  8.2 User Study
    8.2.1 Participants
    8.2.2 Java Programming Task for the Study
    8.2.3 Procedure
    8.2.4 Data Collection
  8.3 System Usability Evaluation
  8.4 User Engagement Evaluation
  8.5 Online Reference Resolution Evaluation in Tutorial Dialogue Systems
9 DISCUSSION
  9.1 Null Results
  9.2 Data-driven Approach in Building Dialogue Systems
  9.3 Understanding Users' Java Program - A Challenge in Building Dialogue Systems For Java Programming
10 CONCLUSION
  10.1 Hypothesis Revisited
  10.2 Limitations
  10.3 Future Work

APPENDIX
A PRE-SURVEY
B POST-SURVEY

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1 An excerpt dialogue between a user and the dialogue system.
3-1 Semantic labels of referring expressions.
4-1 Results of baseline tagger (CRF trained on source-domain corpus), Stanford tagger, and our approach (CRF trained on generated target-domain corpus).
4-2 Noun phrase chunking result.
4-3 The features used for noun phrase chunking.
5-1 Semantic labeling accuracy.
6-1 Algorithm to select candidates using learned semantics
6-2 Features used for segmentation and labeling.
6-3 Reference resolution results.
6-4 Reference resolution results with gold semantic labels.
7-1 Dialogue act set.
7-2 Topics recognized by the topic classifier.
7-3 Sample system response utterances.
8-1 An excerpt dialogue between a user and the Virtual TA.
8-2 An example user action saved in the database.
8-3 An example reference resolution event saved in the database.
8-4 A false positive example of referring expression identification.
8-5 A false negative example of referring expression identification.
9-1 A comparison between human-computer dialogues and human-human dialogues.
A-1 A complete pre-survey results for students used System Li
A-2 A complete pre-survey results for students used System Comparison
B-1 A complete post-survey results for users used System Li
B-2 A complete post-survey results for users used System Comparison

LIST OF FIGURES

1-1 Excerpt of tutorial dialogue illustrating reference resolution. Referring expressions are shown in bold.
1-2 Pipeline of online reference resolution in a situated dialogue.
2-1 Relationship between accessibility and referring expression forms.
2-2 Coreference relation example diagram.
2-3 Bayesian network for reference resolution.
2-4 Identifying the most likely referent using word-as-classifier approach.
3-1 The interface of Ripple - a tutorial dialogue system for Java programming. It includes two windows: a window (on the left) to display student's Java code and a window (on the right) for textual messages between student and tutor.
4-1 Steps for referring expression extraction.
4-2 Example of target sentence generation.
5-1 A parse of the outer for loop from Stanford Parser.
5-2 Segmentation and semantic linking of NP "a 2 dimensional array".
5-3 Dependency structure of "a 2 dimensional array".
6-1 Semantic interpretation of referring expressions.
7-1 Architecture of the tutorial dialogue system.
7-2 User interface of the dialogue system.
7-3 Architecture of the dialogue system.
7-4 User intention identification example.
7-5 Structure of the programming task.
8-1 A short instruction with the task description.
8-2 A short instruction with the task description.
8-3 System usability score interpretation.
8-4 Reference resolution process in the dialogue system.
A-1 Pre-survey.
A-2 Pre-survey.
A-3 Pre-survey.
B-1 Post-survey.
B-2 Post-survey.
B-3 Post-survey.
B-4 Post-survey.
B-5 Post-survey.
B-6 Post-survey.
B-7 Post-survey.
B-8 Post-survey.
B-9 Post-survey.

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

INVESTIGATING REAL-TIME REFERENCE RESOLUTION IN SITUATED
DIALOGUE FOR COMPLEX PROBLEM SOLVING

By

Xiaolong Li

August 2018

Chair: Kristy Elizabeth Boyer
Major: Computer Science

CHAPTER 1
INTRODUCTION

Dialogue systems must move toward understanding users' language within situated environments to assist users with increasingly complex tasks. Situated dialogue is usually embedded in an environment where domain-specific task completion is a central activity. One of the essential requirements of situated dialogue systems is to identify the objects that users refer to during a conversation (Iida et al. 2010; Liu et al. 2014; Liu and Chai 2015; Chai et al. 2004). Identifying a speaker's referents is, itself, a crucial part of utterance interpretation. Identifying the correct referent for an utterance also helps other aspects of language understanding, for example, by constraining the likely current intention (Gorniak and Roy 2007).

Reference resolution in situated dialogue is challenging because of the ambiguity inherent in dialogue utterances and the complexity of the environment. Imagine a dialogue system that assists a novice student in solving a programming problem. To understand a question or statement the student poses, such as "Should I use the 2 dimensional array?", the system must link the referring expression "the 2 dimensional array" to an object¹ in the environment.

¹ The word "object" has a technical meaning within the domain of object-oriented programming, which is the domain of the corpus utilized in this work. However, we follow the standard usage of "object" in situated dialogue (Iida et al. 2010), which for programming is any portion of code in the environment.

This process is illustrated in Figure 1-1, which shows an excerpt from a corpus of tutorial dialogue situated in an introductory computer programming task in the Java programming language. The arrows link referring expressions in the situated dialogue to their referents in the environment. To identify the referent of each referring expression, it is essential to capture the semantic structure of the referring expression; for example, "the 2 dimensional array" contains two attributes, "2 dimensional" and
"array". At the same time, the dialogue history and the history of user task actions (such as editing the code) play a key role. To disambiguate the referent of "my array", temporal information is needed: in this case, the referent is a variable named "arra", which is an array that the student has just created.

Figure 1-1. Excerpt of tutorial dialogue illustrating reference resolution (panels: "Dialogue and task history" and "Environment"). Referring expressions are shown in bold.²

² Typos and syntactic errors are shown as they appear in the original corpus.

To tackle the problem of reference resolution in this type of situated dialogue, we present a pipeline approach that combines a domain-specific part-of-speech (POS) tagger, semantics from a conditional-random-field-based semantic parser, and salience features from dialogue history and task history. This approach includes three main steps. First, we extract referring expressions from user utterances. Second, we interpret the semantics of referring expressions using a conditional random field (CRF) model. The
outputs of this step are the object attributes expressed by the referring expressions. Finally, the learned semantic information and contextual information from the situated dialogue are used to identify the mentioned objects. This process is illustrated in Figure 1-2. We evaluate this approach on the JavaTutor corpus, a corpus of textual tutorial dialogue collected within an online environment for computer programming.

To enable a task-oriented dialogue system to perform reference resolution in real time, we need to recognize referring expressions in user utterances on the fly, which in turn requires accurate POS tags for those utterances. This dissertation therefore also presents an innovative POS tagging approach for situated dialogue. In a corpus of textual dialogue for Java programming, the proposed approach showed a large improvement over the Stanford tagger. Compared to a tagger trained on the same source data (which includes dialogue) but with no domain adaptation, overall accuracy improved from 87.14% to 92.76%. For nouns, which are a prevalent and challenging open word class in domain language, the new approach improves the F1-score dramatically, from 0.701 to 0.903. Accordingly, the F1-score of noun phrase chunking improved from 0.81 to 0.86.

Prior work on reference resolution has leveraged dialogue history and task history information to improve the accuracy of reference resolution (Iida et al. 2010, 2011; Funakoshi et al. 2012). However, these prior approaches have employed relatively simple semantic information from the referring expressions, such as a manually created lexicon, or have operated within an environment with a limited set of pre-defined objects. As this dissertation demonstrates, these prior approaches do not perform well in situated dialogues for complex problem solving, in which the user creates, modifies, and removes objects from the environment in unpredictable ways. We combine the semantics learned by a CRF-based approach with salience information about objects in the situated environment to map referring expressions to their referents. The results show that our approach achieves a substantial improvement over two existing state-of-the-art approaches,
with the existing approaches achieving at most 51.3% accuracy and the new approach achieving 61.6% accuracy.

Figure 1-2. Pipeline of online reference resolution in a situated dialogue (stages: Referring Expression Extraction, Semantic Interpretation of Referring Expressions, Identifying Referents; in the example shown, the user utterance "from the actionPerformed method." yields the referring expression "the actionPerformed method", whose referent is line 71, "Public void actionPerformed(){...}").

In this dissertation, we present a data-driven tutorial dialogue system for Java programming. In this dialogue system, we implement the reference resolution pipeline presented above to identify the user's referent in real time. The tutorial dialogue system has four main modules: a natural language understanding (NLU) module, a dialogue manager (DM) module, a knowledge base (KB) module, and a natural language generation (NLG) module. The NLU module performs reference resolution, dialogue act classification, and topic classification for an input user utterance. The DM tracks the current programming progress and user intention. We also authored a set of rules that the DM uses to generate system dialogue acts in response to user dialogue acts. The KB module maintains knowledge about the programming problem and the Java language. For the NLG module, we authored a set of system utterances for each system dialogue act. An excerpt of a dialogue between a user and the dialogue system is shown in Table 1-1.

Table 1-1. An excerpt dialogue between a user and the dialogue system.

Virtual TA: Hi, I'm your virtual TA. I will be helping you on this Java project.
User: does extratDigits method input anything?
Virtual TA: This Java program converts a zip code for US mail into a visual barcode. When you run the program, it asks the user to enter a zip code. The program will then convert it into a barcode and display it. But it doesn't work yet! To make it work, your task is to complete the extractDigits method. It needs to convert an integer zip code into an array of individual digits, and return that array. Hint: Making the int into a String first is a convenient way to get its individual digits.
Virtual TA: The extractDigits method is located in the PostalFrame class, which should have been opened for you. You are welcome to view the other code, but you only need to work on the extractDigits method.
User: what do I need to do in the extractDigits method?
Virtual TA: You will extract the five digits of a zip code. Now we need to get each of the 5 digits of that int. Using a string and then getting the individual characters from it is a good way to do that.
User: where is teh zip code
Virtual TA: When the program is run, the user types in a zipCode. The code to get the zipCode from the user is already complete. The zip code the user entered is stored as a private variable, int zipCode on line 34 in PostalFrame.
User: how to parse a string into an array
Virtual TA: You can use the syntax intVariable+"" to convert an integer to a String variable.
...

To evaluate the tutorial dialogue systems with different reference resolution approaches, we conducted a user study with undergraduate students from the COP 3502 "Programming Fundamentals I" class at the University of Florida. We implemented two versions of the tutorial dialogue system, each with a different reference resolution approach. System Li implemented reference resolution with semantics learned by a CRF-based approach. The baseline system, System Comparison, implemented a reference resolution approach with a manually authored domain lexicon. In the evaluation, we investigated the impact of the different reference resolution approaches on the tutorial dialogue system. Specifically, we examined their impact on user satisfaction, using the System Usability Scale (SUS) instrument (Bangor et al. 2008), and on user engagement, using the User Engagement Scale (UES) instrument (O'Brien et al. 2018). System Li had an average SUS score of 66.7, and System Comparison had an average SUS score of 68.8. There was no significant difference between these two scores (p-value = 0.361). System Li had a UES score of 11.8, and System Comparison had a UES score of 12.3. This difference was not significant either (p-value = 0.236).
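
The SUS means above are averages of per-participant scores on the 0 to 100 SUS scale. The sketch below applies the standard SUS scoring rule to one questionnaire and then averages the results per system. This is an illustration under that assumption; it is not the scoring script used in the study, and the function names are hypothetical. The UES uses a different instrument and is not covered here.

```python
from typing import List, Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one System Usability Scale questionnaire on the 0-100 scale.

    `responses` holds the ten Likert answers (1 = strongly disagree,
    5 = strongly agree) in questionnaire order.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS expects ten answers on a 1-5 scale")
    raw = 0
    for item, r in enumerate(responses, start=1):
        if item % 2 == 1:
            raw += r - 1  # odd-numbered items are positively worded
        else:
            raw += 5 - r  # even-numbered items are negatively worded
    return raw * 2.5      # rescale the 0-40 raw sum to 0-100


def mean_sus(questionnaires: List[Sequence[int]]) -> float:
    """Average SUS score over all participants who used one system."""
    scores = [sus_score(q) for q in questionnaires]
    return sum(scores) / len(scores)


print(sus_score([5, 1] * 5))  # 100.0
print(sus_score([3] * 10))    # 50.0
```

Under this rule, a score of 50 means a neutral answer on every item. It does not mean a 50th-percentile system, so judging whether means such as 66.7 and 68.8 indicate acceptable usability requires interpretation benchmarks such as those of Bangor et al. (2008).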

We also examined the online accuracy of the two reference resolution approaches. System Li and System Comparison had accuracies of 21.6% and 19.6%, respectively. After further analyzing the collected data, we found that the low accuracy was caused by the referring expression selection approach. After manually annotating the referring expressions in the collected data, we found that the accuracies of these two models were 63.3% and 44.9%, respectively.

This dissertation makes the following contributions: 1) implementation of a tutorial dialogue system for Java programming; and 2) evaluation of real-time reference resolution approaches in the tutorial dialogue system through a human subjects study. We believe these contributions will help the dialogue system research community better understand reference resolution in situated dialogue systems.

The remainder of the dissertation is structured as follows. Chapter 2 reviews related work on situated language understanding and reference resolution in situated dialogue, summarizing the features and approaches used in prior work. Chapter 3 introduces the corpus of situated dialogue for Java programming, which is used in this dissertation for model training and empirical evaluation. Chapter 4 describes the process of online referring expression identification, which extracts referring expressions from user utterances in real time while the dialogue system is running. Chapter 5 presents the semantic interpretation of referring expressions using a CRF-based model. Chapter 6 describes the approach for reference resolution with semantics learned from referring expressions and contextual information from the task-oriented dialogue. We describe the implementation of the tutorial dialogue system for Java programming in Chapter 7 and present a user study of the tutorial dialogue system in Chapter 8. Chapter 9 discusses observations made while building the tutorial dialogue system and conducting the user study. Chapter 10 concludes the dissertation by summarizing the presented work and contributions.

CHAPTER 2
RELATED WORK

This chapter reviews previous research on reference resolution within different types of situated environments. We start with coreference resolution in text, which is closely related to reference resolution in situated language and has been a well-established research area for decades. Then, we categorize, discuss, and compare previous work on reference resolution in situated language.

2.1 Coreference Resolution

Coreference resolution discovers antecedents for anaphors in discourse. An anaphor is a linguistic expression whose interpretation depends on another linguistic expression in the context. An antecedent is a linguistic expression that precedes an anaphor and can be used to interpret it. For example, in the sentence "When you see John, give him this card.", "John" is the antecedent of "him", and "him" is an anaphor. A coreference relation consists of an antecedent and an anaphor that refer to the same entity. There may be multiple noun phrases referring to the same entity. Coreference resolution differs from reference resolution in a situated environment; however, the two share some similarities, which will be discussed in Section 2.2. Reference resolution has been inspired by the theories and approaches developed for coreference resolution, such as centering theory and the ranking-based classification approach (Denis and Baldridge 2008).

Theories for Coreference Resolution

Ariel presented a theory that described the relationship between the accessibility of entities and referring behaviors (Ariel 1988). She argued that "natural language primarily provides speakers with means to code the ACCESSIBILITY of the referent to the addressee." The accessibility of entities, which indicates how accessible an entity is to the conversation participants, is "tied to context types in a definitely non-arbitrary way." According to Ariel, there are three types of contexts that are highly related to reference resolution: community mutual knowledge, physical co-present mutual knowledge,
and linguistic co-present mutual knowledge. Community mutual knowledge is shared by speakers and addressees because they belong to the same community. Physical co-present mutual knowledge is perceived by the conversation participants in their shared physical environment. Linguistic co-present mutual knowledge is conveyed by previous utterances, i.e., the dialogue history. Together, these three kinds of knowledge determine the accessibility of possible referents at a given moment. Intuitively, these three context types provide metrics to measure the salience of entities involved in a conversation. Ariel also argued that the accessibility of entities determines the form of their referring expressions: entities with lower accessibility need more lexical information to be identified, and vice versa. More detailed relationships between accessibility and the form of referring expressions are shown in Figure 2-1.

Figure 2-1. Relationship between accessibility and referring expression forms.

Grosz et al. presented centering theory, a framework for modeling the local coherence of discourse (Grosz et al. 1995). Centers were defined as entities in an utterance that serve as links to other utterances in the discourse that contain the same entities. Each utterance in the discourse was assigned a set of forward-looking
centers and one backward-looking center. The centering framework provided a rule-based approach to describing a speaker's attentional state by monitoring how the centers change. The authors also argued that attentional states are closely related to the choice of referring expressions. Sidner also pointed out the close relationship between discourse structure and reference resolution (Sidner 1986).

Both accessibility theory and centering theory emphasize the importance of salience information in coreference resolution. We will show that this salience information is also essential for reference resolution in situated environments.

Models for Coreference Resolution

Early work on coreference resolution used rule-based approaches (Lappin and Leass 1994). More recent work usually formulates coreference resolution as a classification problem, a formulation that most work on reference resolution also adopts. The difference is that the candidates in coreference resolution are other referring expressions, whereas the candidates in reference resolution are objects from the situated environment.

The most straightforward approach is to consider referring expressions in pairs, $\langle re_i, re_j \rangle$. The binary output of a classification function $f(re_i, re_j)$ indicates whether $re_i$ and $re_j$ have the same referent. Some previous work used decision trees as classification functions, given the simplicity and categorical nature of the features (McCarthy and Lehnert 1995; Soon et al. 2001). Ponzetto and Strube used a maximum entropy model as their classification function (Ponzetto and Strube 2006).

Ranking-based model: In a piece of text, there could be multiple candidate antecedents for a referring expression. Pairwise matching models consider one candidate at a time and use only the True/False decision of a binary classifier. However, a binary classifier usually outputs a probability, and this probability, the confidence of a positive decision, is discarded in pairwise models. To exploit this confidence value, Yang et al. presented a twin-candidate approach that considers two candidate antecedents at once instead of a single candidate (Yang et al. 2003). In this approach, each data sample contained one anaphor and two
PAGE 22

candidate antecedents, only one of which was the real antecedent. The model considered features among these three referring expressions to make a final decision, so that the two candidates were compared directly. This twin-candidate model outperformed the single-candidate approach. Using a similar idea, Denis and Baldridge presented a ranking-based model, which created multiple antecedent candidates c_i for each anaphor re (Denis and Baldridge 2008). A binary classifier f(re, c_i) ∈ [0, 1] was then used to compute the compatibility p_i between re and each c_i. These outputs p_i were ranked to select the best candidate from the candidate list as re's real antecedent. Culotta et al. organized candidates into clusters and identified all the antecedents for a referring expression at the same time (Culotta et al. 2007).

Specialized models: Denis and Baldridge argued that different types of referring expressions (pronouns, definite noun phrases, and demonstrative noun phrases) are used differently (Denis and Baldridge 2008). Thus, they trained a separate model for each type of referring expression, which proved to be more accurate for coreference resolution.

2.2 Reference Resolution in Situated Dialogue

Reference resolution in situated language shares similarities with coreference resolution. Both benefit from semantic interpretation of referring expressions, and both are usually formulated as classification problems. However, coreference resolution identifies a coreference relation between referring expressions within a discourse, whereas reference resolution in situated language identifies the referents of referring expressions in their situated environment. For example, in Figure 2-2, referring expressions such as "he", "his" and "Clinton" that appear later in a piece of text all refer to the same entity as "Bill Clinton", which appears earlier in the same text. In a situated dialogue, as shown in Figure 1-1, both referring expressions "my array" and "the array" refer to arra, an array that the student had just created.
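To make the contrast between pairwise matching and ranking-based models concrete, the Python sketch below scores every candidate with a compatibility function and keeps the argmax, instead of thresholding each candidate independently. The token-overlap (Jaccard) scorer, the 0.5 threshold, and the function names are illustrative assumptions standing in for a trained classifier f(re, c_i); they are not taken from Denis and Baldridge (2008).

```python
from typing import Callable, List, Sequence, Tuple


def token_jaccard(a: str, b: str) -> float:
    """Toy compatibility score in [0, 1]: token overlap (stand-in for a trained classifier)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pairwise_decisions(expr: str, candidates: Sequence[str],
                       f: Callable[[str, str], float],
                       threshold: float = 0.5) -> List[bool]:
    """Pairwise matching: an independent True/False decision per candidate."""
    return [f(expr, c) >= threshold for c in candidates]


def rank_candidates(expr: str, candidates: Sequence[str],
                    f: Callable[[str, str], float]) -> Tuple[str, List[float]]:
    """Ranking: compute p_i = f(expr, c_i) for every candidate and keep the best one."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    scores = [f(expr, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


candidates = ["my array", "the for loop", "zipCode"]
print(pairwise_decisions("the array", candidates, token_jaccard))  # [False, False, False]
print(rank_candidates("the array", candidates, token_jaccard))
```

With these toy scores (1/3, 1/4, 0), the pairwise rule rejects every candidate, while ranking still selects "my array": the confidence values that the pairwise decision throws away are exactly what the ranking-based model exploits.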

Figure 2-2. Coreference relation example diagram.

The state of the situated environment also plays an essential role in solving this problem. This section summarizes the approaches used in existing work on reference resolution in situated language.

Similar to coreference resolution, reference resolution is usually represented as a classification problem. Given a referring expression re and a candidate referent e, a classification function f(re, e) is used to predict the probability that e is re's referent in the current context, which includes the linguistic context and the world state. Each candidate referent e is an entity in the situated environment, such as "a blue mug on the table".

Features for Reference Resolution. In previous work, there are three primary types of features: syntactic features, semantic features, and salience features. Unlike coreference resolution, reference resolution in situated language involves fewer syntactic features. Coreference resolution searches for relations between referring expressions, in which the syntactic relationship between these referring expressions plays an important role. For reference resolution in situated dialogue, the referents are in the situated environment, not in the dialogue. The syntactic type of a referring expression, such as demonstrative pronoun or definite pronoun, is the most commonly used syntactic feature (Chai et al. 2004; Iida et al. 2010). Demonstrative pronouns are pronouns pointing to specific things, such as "this" and "that". Definite pronouns, such
as "him" and "it", are pronouns referring to specific things, which are different from indefinite pronouns, such as "someone" and "anything".

Semantic features: As discussed above, situated environments, including the objects in the environment, are usually represented as symbols in situated language understanding tasks. One of the most important sources of information for identifying the referents of a referring expression is the semantic compatibility between the expression and each candidate. Chai et al. considered semantic types while creating graphs that represented the relationships between entities (Chai et al. 2004). Similar to coreference resolution, attributes of entities, such as their shape and size, were also used for reference resolution in situated language (Iida et al. 2010, 2011).

Salience features: Salience features capture how noticeable and important an entity is at a given moment. They contain information about what makes a specific entity more prominent, such as a mention of the entity in the recent discourse history or a movement of or operation on the entity in the recent action history.

Chai et al. aligned deictic gestures (pointing at and circling objects in the scene) with referring expressions found within utterances, using the temporal co-occurrence between them (Chai et al. 2004). Iida et al. studied reference resolution in situated dialogues for a collaborative game (Iida et al. 2010, 2011). They used dialogue history and operation history as features to exploit the salience of entities. These features were coded by time intervals, such as "whether object o_i was operated on in the past 10 seconds." Eye gaze features have also been used as salience features in some research to improve the accuracy of reference resolution (Iida et al. 2011; Kennington and Schlangen 2015).

Different from the semantic features used in previous work, we propose a CRF-based semantic labeling approach. This approach automatically labels attributes of objects in referring expressions.

Approaches. Most existing work formulated reference resolution as a supervised classification problem. Iida et al. used output from SVM classifiers as measurements for
compatibility between a referring expression and the candidate referents (Iida et al. 2010, 2011). They also trained specialized models, a pronoun model and a non-pronoun model, for different types of referring expressions. Funakoshi et al. presented a Bayesian network to model the generative process from referents to referring expressions (Funakoshi et al. 2012). The structure of the Bayesian network is shown in Figure 2-3.

Figure 2-3. Bayesian network for reference resolution.

In this Bayesian network, W, C, X, and D represent words, concepts (attributes), referents, and a referent domain (a set of referents), respectively. This model also shows how to resolve a reference to a set of referents.

Most previous work employed semantic features, which in some cases were extracted using a manually defined lexicon (Chai et al. 2004; Liu et al. 2012) and in other cases were learned automatically (Matuszek et al. 2014; Schlangen et al. 2016).

Weakly supervised approaches: Some work attempted to build reference resolution models with less supervision. These approaches need fewer manual annotations, especially for lexical semantics, than fully supervised approaches. Supervised approaches usually use a lexicon to label the semantics of referring expressions (Iida et al. 2010). Thus, the training data for fully supervised approaches contain <re, e> pairs together with the lexical semantics of the referring expressions. Weakly supervised approaches do not need lexical semantics as input; instead, their inputs are just the <re, e> pairs. Weakly supervised approaches learn the alignments between natural language tokens in re and attributes of e automatically, using the co-occurrences of re and e in the training data. In previous work (Kennington and Schlangen 2015; Schlangen et al. 2016), the semantics of natural language tokens were learned using a word-as-classifier approach. The input of
this approach was a set of <re, e> pairs. Each referent e in the dataset was a physical object in a scene. The goal of this word-as-classifier approach was to learn the alignment between natural language tokens in re and visual features of e. For each natural language token w, a logistic regression classifier was learned from all of the co-occurrences of e and w in the training data. Object e was represented as an n-dimensional vector of visual features. Classifiers were trained for each token w in the training data. Given a new referring expression re (a sequence of tokens) and a scene with a set of objects e_i, the classifiers for the tokens in re were applied to each object e_i in the scene to find the best match in terms of compatibility between re and e_i. This process is illustrated in Figure 2-4. In this figure, x_i is the feature vector of the i-th object in the scene. There is an output, σ(w^T x_i + b), for each object in the scene. The top level represents normalization over all of the outputs from the logistic classifier. With this word-as-classifier approach, the alignments between natural language tokens and visual features of objects were learned automatically, without explicit manual annotation.

Figure 2-4. Identifying the most likely referent using the word-as-classifier approach.

2.3 Summary

This chapter summarizes previous approaches to reference resolution in situated language. The literature review shows that most previous work performed reference resolution in a limited setting: either a specific setting containing a fixed set of objects used to evaluate the approach (Kennington and Schlangen 2015), or a domain with a very limited number of objects (Iida et al. 2010). None of these approaches investigates
real-time reference resolution in a situated dialogue system. Unlike previous work, this dissertation reports a real-time reference resolution approach. In addition, we present an implementation of a tutorial dialogue system for Java programming to evaluate the approach in a real-time setting.
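The word-as-classifier approach reviewed in Section 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Kennington and Schlangen (2015): the objects are hypothetical two-dimensional feature vectors, each word's logistic regression is trained by plain gradient descent with the referent as the positive example and the other objects in the same scene as negatives, and the per-word outputs σ(w^T x_i + b) are combined by multiplication before normalizing over the scene.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_word_classifiers(data, dim, epochs=200, lr=0.5):
    """Train one logistic regression per word (assumed training scheme).

    data: list of (tokens, scene, target); scene is a list of feature vectors and
    target indexes the referent. For every word in the expression, the referent is
    a positive example and the other objects in the scene are negatives.
    """
    examples = defaultdict(list)
    for tokens, scene, target in data:
        for word in tokens:
            for i, x in enumerate(scene):
                examples[word].append((x, 1.0 if i == target else 0.0))
    classifiers = {}
    for word, pairs in examples.items():
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            for x, y in pairs:
                g = sigmoid(dot(w, x) + b) - y  # gradient of log loss w.r.t. the logit
                w = [wi - lr * g * xi for wi, xi in zip(w, x)]
                b -= lr * g
        classifiers[word] = (w, b)
    return classifiers


def resolve(tokens, scene, classifiers):
    """Multiply sigma(w^T x_i + b) over known words per object, then normalize over the scene."""
    if not scene:
        return []
    scores = []
    for x in scene:
        s = 1.0
        for word in tokens:
            if word in classifiers:  # unseen words are ignored in this sketch
                w, b = classifiers[word]
                s *= sigmoid(dot(w, x) + b)
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else [1.0 / len(scene)] * len(scene)


# Hypothetical objects described by two visual features: [redness, size].
TRAINING_DATA = [
    (["red"], [[1.0, 0.0], [0.0, 1.0]], 0),
    (["big"], [[1.0, 0.0], [0.0, 1.0]], 1),
    (["red"], [[0.0, 1.0], [1.0, 1.0]], 1),
    (["big"], [[1.0, 0.0], [1.0, 1.0]], 1),
]
classifiers = train_word_classifiers(TRAINING_DATA, dim=2)
print(resolve(["red", "big"], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], classifiers))
```

No word-to-attribute lexicon is supplied anywhere: the link between "red" and the first feature is learned only from which objects co-occur with the word, which is the point of the weakly supervised setting.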

CHAPTER 3
CORPUS

This dissertation investigates the reference resolution problem in a tutorial dialogue system. Given the data-driven nature of the reference resolution and dialogue understanding techniques used in this research, we employ a corpus of tutorial dialogues from a previous study.

3.1 Data Collection

The corpus was collected within a tutorial dialogue study in which human tutors and students interacted through a tutorial dialogue interface, Ripple, that supported remote textual communication (Boyer et al. 2011). The tutorial dialogue interface (Figure 3-1) consists of two windows that display interactive components: the student's Java code, the compilation or execution output associated with the code, and the textual dialogue messages between the student and tutor. All of the information in these two windows was synchronized between the student's screen and the tutor's screen in real time. The entire corpus contains 45 Java programming tutoring sessions from student-tutor pairs, with a total of 4857 utterances, an average of 108 utterances per session. Each of these sessions lasted approximately one hour. The problem the students solved during these tutorial dialogues involved creating, traversing, and modifying parallel arrays, a challenging task because the students were novices enrolled in an introductory computer programming class.

The dialogues within this domain are characterized by situated features that pertain to the programming task. A portion of the user utterances refer to general Java knowledge, and in these cases a semantic interpretation can be accomplished by mapping to a domain-specific ontology (Dzikovska et al. 2007). In contrast, many utterances refer to concrete entities within the dynamically changing, user-created programming artifact. Identifying these entities correctly is crucial for generating specific tutorial dialogue moves.

Besides the tutorial dialogue, we also used publicly available corpora for POS tagging. We performed POS tagging in order to identify referring expressions from user
utterances.

Figure 3-1. The interface of Ripple, a tutorial dialogue system for Java programming. It includes two windows: a window (on the left) that displays the student's Java code and a window (on the right) for textual messages between student and tutor.

Our target domain is online synchronous textual task-oriented dialogue about Java programming. To train a domain-specific POS tagger, we leveraged two different labeled corpora from source domains. First, we used the CoNLL2000 corpus for phrase chunking (Tjong and Sang 2000), a labeled Wall Street Journal corpus with 10,948 sentences. We also used the NPS chat corpus (Forsyth and Martell 2007), a set of annotated online conversational texts with 10,567 utterances. The target corpus is a set of textual Java programming tutorial dialogues (Li and Boyer 2015) that contains 4,857 utterances (51,721 tokens) in total. The Java programming corpus is task-oriented, containing not only utterances but also the accompanying Java programs that the interlocutors were creating and discussing. As described below, we used a subset of these Java programs to extract noun phrases to generate the new labeled
training corpus. We also compared this approach to using Java snippets from The Java Tutorial website, to test the benefit of using unrelated Java code.[1]

3.2 Annotation

All of the utterances in the 45 tutorial sessions were manually annotated for the referring expressions that have referents in the parallel Java program. For each referring expression, we annotated its segmentation and a semantic label for each segment, so that each semantic segment represents one attribute in the Java programming domain. These labeled referring expressions will be used to train statistical models that automatically annotate referring expressions, providing semantic information for reference resolution.

Noun phrases from the tutorial dialogues were first manually extracted and annotated. There were 364 grounded noun phrases extracted manually from the six tutorial dialogue sessions used in the current work. Each of these extracted noun phrases has one or more corresponding entities in the programming artifact. Since each word in a noun phrase is linked to an element in the description vector, the indices in this vector were used as the label for each word. Annotation of all 346 noun phrases was performed by one annotator, and 20% of the noun phrases (70 noun phrases) were doubly annotated by an independent second annotator. The percent agreement was 85.3% and the Kappa was 0.765.

We also annotated the semantic labels for each referring expression. A noun phrase is defined as a phrase which has a noun (or indefinite pronoun) as its head word, or which performs the same grammatical function as such a phrase (Crystal 1997). The syntactic structure of a noun phrase consists of a head and dependents, which could include determiners, adjectives, prepositional phrases, or even a clause. For example, the noun phrase "a 2 dimensional array" occurs within the Java programming corpus. Its head is "array" and its dependents are "a" as the determiner and "2 dimensional" as an adjective phrase.

[1] https://docs.oracle.com/javase/tutorial/
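The agreement statistics above are percent agreement and a kappa value, presumably Cohen's kappa, the usual statistic for two annotators: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution. The sketch below computes both from two annotators' label sequences; the four-item label lists are invented for illustration and are not the dissertation's annotations.

```python
from collections import Counter


def agreement_and_kappa(labels_a, labels_b):
    """Return (percent agreement, Cohen's kappa) for two annotators' label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both annotators independently pick the same label.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:  # both annotators used one identical label throughout
        return p_o, 1.0
    return p_o, (p_o - p_e) / (1 - p_e)


annotator_1 = ["NAME", "NAME", "CATEGORY", "OTHER"]       # hypothetical labels
annotator_2 = ["NAME", "CATEGORY", "CATEGORY", "OTHER"]   # hypothetical labels
print(agreement_and_kappa(annotator_1, annotator_2))
```

Here the annotators agree on 3 of 4 items (p_o = 0.75), chance agreement is p_e = 5/16, and κ = 7/11 ≈ 0.636; kappa is lower than raw agreement because it discounts agreement that the label frequencies alone would produce.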

Each of these semantic segments involves an attribute of its real referent in the situated environment (the parallel Java program in this case). We manually annotate these semantic segments in referring expressions. The semantic tags we used are listed in Table 3-1.

Table 3-1. Semantic labels of referring expressions.
Attribute          Meaning (in Java programming)            Example
CATEGORY           Category of an entity                    Method, Variable, etc.
NAME               Variable name; often user-created        extractDigit
VAR_TYPE           Type of variable                         int, String, etc.
NUMBER             Number of entities                       2
IN_CLASS           The class that contains this entity      postalFrame
IN_METHOD          The method that contains this entity     actionPerformed
DIR_PARENT         Direct parent entity                     For Statement, Method
LINE_NUMBER        Line number                              67
SUPER_CLASS        Superclass of this entity                JFrame
MODIFIER           Access modifier                          public, private, etc.
ARRAY_TYPE         Type of array                            int, char, etc.
ARRAY_DIMENSION    Dimension of array                       2, 1
OBJ_CLASS          The class an object instantiates         PostalBarCode
RETURN_TYPE        Return type                              String, int, etc.
OTHER              Other attributes                         the, extra, etc.
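Because each annotated token carries one of the labels in Table 3-1, a referring expression can be read as a sequence of semantic segments by merging runs of tokens that share a label. The Python sketch below shows that bookkeeping step with two hypothetical helpers; the labeling of "the 2 dimensional array" is an invented example, not an annotation drawn from the corpus, and the sketch assumes that adjacent segments never share a label (two such segments would be merged).

```python
from itertools import groupby


def semantic_segments(tokens, labels):
    """Merge runs of consecutive tokens that share a label into (label, text) segments."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [(label, " ".join(tok for tok, _ in run))
            for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1])]


def attribute_values(segments):
    """Keep the attribute-bearing segments, dropping those labeled OTHER."""
    return {label: text for label, text in segments if label != "OTHER"}


tokens = ["the", "2", "dimensional", "array"]
labels = ["OTHER", "ARRAY_DIMENSION", "ARRAY_DIMENSION", "CATEGORY"]  # hypothetical labeling
segments = semantic_segments(tokens, labels)
print(segments)
print(attribute_values(segments))
```

The resulting attribute dictionary is the kind of structured description that can then be matched against the attributes of candidate entities in the Java program.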

CHAPTER 4
ONLINE REFERRING EXPRESSION EXTRACTION

One of the essential steps in implementing reference resolution in a tutorial dialogue system is to identify referring expressions, which are noun phrases, in user utterances in real time. This is a challenging task in a tutorial dialogue system for Java programming. The language used in such dialogue is usually informal, and utterances may contain many domain-specific components, such as Java program segments. To accurately identify noun phrases in these utterances, we need an accurate part-of-speech (POS) tagger. POS tagging is a very important step for noun phrase chunking, the approach used to tag noun phrases in a given sentence. Since referring expressions are noun phrases in an utterance, we first need to identify all of the noun phrases in the utterance. Not all noun phrases have referents in the situated environment; we are only interested in noun phrases that refer to objects in the environment, in this case the Java code. Consequently, a classification step is needed to identify, among these noun phrases, the referring expressions of interest.

This chapter includes two sections. Section 4.1 reports on an unsupervised approach I developed for part-of-speech tagging in situated language. Section 4.2 reports noun phrase chunking for utterances in tutorial dialogue. To date, I have developed and evaluated these techniques on corpora. However, as will be described in Chapter 7, I deploy these approaches within a real-time tutorial dialogue system. The process of referring expression extraction is shown in Figure 4-1.

4.1 Part-of-speech Tagging for Domain-specific Language

In this section, I report a novel but simple domain-adaptation approach that I developed to improve part-of-speech tagging in task-oriented dialogue. This approach automatically generates an annotated domain-specific training corpus without any manual annotation. On a corpus of textual dialogue for Java programming, experiments showed a large improvement over the Stanford tagger. Compared to a tagger trained on the same source data (which includes dialogue) but with no domain adaptation, overall accuracy
improved from 87.14% to 92.76%. For nouns, the most essential word class for referring expression identification, the new approach yields a dramatic improvement, from an F1-score of 0.701 to 0.903.

Figure 4-1. Steps for referring expression extraction. (Example input: "but why do that when I could just use the string zip from the actionPerformed method", with POS tags CC WRB VBP DT WRB PRP MD RB VB DT NN NN IN DT NN NN.)

Accurate part-of-speech (POS) tagging is essential for many natural language processing tasks, including natural language understanding in dialogue systems. Most POS taggers are trained on large newswire corpora that support good performance on open-domain language. However, these taggers suffer performance degradation when applied to domain-specific language (Jiang and Zhai 2007), which is common in task-oriented dialogue. This degradation is due partly to unknown tokens, but also to unfamiliar uses of known tokens. For example, in a Java programming tutorial dialogue, we see utterances such as "what I might could do is write if statements to see what range sum%10 is in," or "...so String a = new String(zipCode); would work." Dialogue systems must be able to parse this kind of user utterance to react properly. There is much room for improvement in domain-specific POS tagging: on the Java programming dialogue corpus used in this work, the Stanford tagger achieved 85.57% accuracy, compared to its 97.32% accuracy on the type of language on which it was trained (Manning 2011).

Previous work on domain adaptation for POS tagging has included adding annotated target domain data (Jiang and Zhai 2007; Daume 2009) and using dictionaries to mine patterns from domain languages (Hovy et al. 2015; Li et al. 2012). We present a different perspective on POS tagging that does not require any manual labeling. We argue that generating a grammatical sentence in a new domain is easier than parsing a given sentence from that domain, assuming that some domain language can easily be extracted from other sources. The domain language is not annotated per se, but because of the context in which it occurs, its POS tags can be inferred. We then generate a new set of sentences for our target domain-specific language with known POS tags, and we build a tagger using the generated corpus as training data.

The approach was tested on 5 sessions of Java tutoring data collected using Ripple (described in the previous chapter). The other 40 sessions were used to generate training data, as discussed in detail later in this chapter. Our simple yet effective method improves upon the Stanford tagger's performance on domain-specific language for Java programming, achieving 92.76% accuracy compared to Stanford's 85.57%, and it does so without manually tagging any new domain-specific language. The new approach achieved a recall of 91.9% for nouns (NN), which account for 17% of all tokens, compared with 58.2% for a baseline tagger trained on the same source corpus without domain adaptation and 71.6% for the Stanford tagger. The accuracy for some other POS tags, such as adjectives (JJ) and past tense verbs (VBD), also improved with the reported approach, as did overall precision and recall across all POS tags.

4.1.1 Approach

The reported approach is based on the observation that open-domain POS tagging errors in domain-specific language often occur in noun phrases. For example, "if statement" is a noun phrase in the domain of Java programming, but taggers trained on newswire recognize "if" as a subordinating conjunction instead of a noun. They also cannot recognize examples such as the previously mentioned chunk of code "String a =
new String(zipCode);" as noun phrases. It would be challenging to induce a grammar from an unlabeled corpus that contains a large proportion of tokens serving a new grammatical role. Moreover, it is difficult to tag these phrases using preprocessing, since the code-like phrases used in natural language tend to be informal and follow the syntactic rules of neither the programming language nor the natural language in which they are embedded. Our approach addresses this problem by generating grammatical (though not semantically meaningful) sentences, substituting domain-specific noun phrases in place of noun phrases in previously annotated source language.

To create a POS tagger for the target language, we used an annotated source corpus (CoNLL2000 (Tjong and Sang 2000)) and a set of domain-specific noun phrases generated from a corpus of Java programs. We leverage the many similarities between this domain-specific language and more open-domain language such as newswire: for example, most other parts of domain-specific sentences, such as "what I might could do is write..." and "so ... would work", still follow English grammar. Based on this simple idea, we generate a corpus for the target domain that is automatically annotated in the process of generation. The approach substitutes domain-specific chunks into labeled sentences from the source corpus, replacing part of an existing noun phrase, to generate a target training corpus. Finally, a POS tagger is trained on this corpus to perform POS tagging for the target domain.

Domain-specific Noun Phrase Generation. To generate a set of labeled sentences as training data for POS tagging, the reported approach first requires a set of domain-specific noun phrases. For the domain of Java programming, we extracted noun phrases from source code that had been created during dialogues in our original in-domain corpus. (Later in this section we refer to those dialogues as the extraction set. These dialogues were not the ones used to test the POS tagger.) We began by tokenizing each line of code from the Java programs. Then, we extracted unigrams, bigrams, and trigrams from the tokenized Java code and treated
these as domain-specific noun phrases. Each token was tagged as a noun, except that digits were tagged as numbers. The result is a set of domain-specific phrases with known POS tags for each token.

Labeled Target Data Generation. Given a grammatical sentence s_source from a source language, if s_source contains a noun n_source, we can create another grammatical sentence s_target by replacing n_source with a domain-specific noun n_target. Recall that a noun phrase is "a phrase which has a noun (or indefinite pronoun) as its head word, or which performs the same grammatical function as such a phrase" (Crystal 1997). For a given sentence from a source corpus that has been tagged with POS labels (such as CoNLL2000), we first check whether it contains a noun phrase. We then replace the head of a noun phrase in s_source with a domain-specific noun phrase. An example is shown in Figure 4-2, which shows that the determiner and adjective modifier of the noun phrase are not replaced. The generated s_target does not make sense semantically, but it is grammatical and it is labeled with POS tags. We generate a sentence s_target for every domain-specific noun phrase produced by the technique described above. In this way, we create an annotated training set for the target domain.

Training POS Taggers. We trained conditional random field (CRF) POS taggers on the source corpus and on the generated target domain training corpus, respectively (Lafferty et al. 2001). We then tested the models on the target domain testing corpus, which consists of original dialogues (not generated dialogues).

4.1.2 Experiments and Results

First, the target corpus was split into two sets: the extraction set with 40 dialogue sessions, and the testing set with 5 dialogue sessions. (Each dialogue session represents approximately one hour of textual dialogue and collaborative construction of Java code.) The testing set contains 687 sentences and 6581 tokens. We trained one POS tagger on the source data and another on the automatically generated target data. Both taggers were tested on the original (not generated) dialogues from
the testing set. We also compared our trained POS taggers with results from the Stanford tagger (v3.7.0) (Toutanova et al. 2003).

Figure 4-2. Example of target sentence generation. Source sentence: Confidence/NN in/IN the/DT pond/NN is/VBZ widely/RB expected/VBN to/TO take/VB [NP another/DT sharp/JJ dive/NN]. Generated target sentence: Confidence/NN in/IN the/DT pond/NN is/VBZ widely/RB expected/VBN to/TO take/VB [NP another/DT sharp/JJ String/NN a/NN =/NN new/NN String/NN (/NN].

First, we trained the Baseline POS tagger on all the labeled sentences from the CoNLL2000 corpus and the NPS chat corpus. We expected this tagger not to perform well because, although it included dialogues, it did not include any domain-specific language for the target domain.

Next, using our approach, we trained a tagger for the target domain by leveraging the generated sentences. For each extracted domain-specific noun phrase, we randomly selected a sentence from CoNLL2000[1] into which the domain-specific noun phrase was plugged, generating a labeled target sentence. We generated 96,011 target sentences in this step. A POS tagger was then trained using these generated target sentences along with all of the sentences from the NPS chat corpus. The Baseline CRF tagger, the Stanford tagger, and the Li Approach tagger were all tested on the dialogues in the testing set.

[1] We chose CoNLL2000 because it has IOB tags, which make the substitution simple.
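The two generation steps described above (extracting tagged n-grams from code, then substituting one into a chunked source sentence) can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the dissertation's code: the regular-expression tokenizer, the helper names, and the choice of a random noun phrase when a sentence has several are all assumptions, and the head of a noun phrase is approximated as its last token.

```python
import random
import re

# Assumed tokenizer: runs of word characters, or any single non-space symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def domain_phrases(code_lines, max_n=3):
    """Extract unigrams..trigrams of code tokens as POS-tagged domain 'noun phrases'."""
    grams = set()
    for line in code_lines:
        toks = TOKEN_RE.findall(line)
        for n in range(1, max_n + 1):
            for i in range(len(toks) - n + 1):
                grams.add(tuple(toks[i:i + n]))
    # Every token is tagged NN, except digit-only tokens, which are tagged CD.
    return [[(t, "CD" if t.isdigit() else "NN") for t in g] for g in sorted(grams)]


def np_spans(chunk_tags):
    """Return (start, end) index pairs of NP chunks from IOB chunk tags (end exclusive)."""
    spans, start = [], None
    for i, tag in enumerate(list(chunk_tags) + ["O"]):
        if start is not None and tag != "I-NP":
            spans.append((start, i))
            start = None
        if tag == "B-NP":
            start = i
    return spans


def substitute(sentence, phrase, rng):
    """Replace the head (last token) of one randomly chosen NP with a tagged domain phrase.

    sentence: list of (word, pos, chunk) triples, as in CoNLL-2000.
    Returns a list of (word, pos) pairs, or None if the sentence contains no NP.
    """
    spans = np_spans([chunk for _, _, chunk in sentence])
    if not spans:
        return None
    _, end = rng.choice(spans)
    head = end - 1
    tagged = [(w, p) for w, p, _ in sentence]
    return tagged[:head] + list(phrase) + tagged[head + 1:]


sentence = [("take", "VB", "B-VP"), ("another", "DT", "B-NP"),
            ("sharp", "JJ", "I-NP"), ("dive", "NN", "I-NP")]
phrase = [(t, "NN") for t in TOKEN_RE.findall("String a = new String(")]
print(substitute(sentence, phrase, random.Random(0)))
```

Applied to the tail of the Figure 4-2 sentence, the sketch keeps "another/DT sharp/JJ" and replaces "dive" with the six NN-tagged code tokens, mirroring the figure.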

Table 4-1. Results of the baseline tagger (CRF trained on the source-domain corpus), the Stanford tagger, and our approach (CRF trained on the generated target-domain corpus).
Tagger / metric                          total  NN     IN     RB     VBZ    JJ     NNS    VBG    VBD
Number of tokens                         6571   1129   511    426    217    205    110    99     56
Baseline precision                       0.906  0.882  0.926  0.985  0.980  0.680  0.724  0.790  0.711
Baseline recall                          0.871  0.582  0.979  0.897  0.889  0.902  0.955  0.990  0.964
Baseline F1                              0.879  0.701  0.952  0.939  0.937  0.776  0.824  0.879  0.818
Stanford precision                       0.900  0.932  0.817  0.697  0.968  0.668  0.794  0.980  0.786
Stanford recall                          0.856  0.716  0.941  0.887  0.977  0.844  0.982  0.970  0.786
Stanford F1                              0.859  0.810  0.875  0.781  0.972  0.746  0.878  0.975  0.786
Li approach, parallel code, precision    0.930  0.887  0.926  0.980  0.981  0.854  0.911  0.933  0.730
Li approach, parallel code, recall       0.928  0.919  0.982  0.918  0.954  0.859  0.836  0.980  0.964
Li approach, parallel code, F1           0.927  0.903  0.954  0.948  0.967  0.856  0.872  0.956  0.831
Li approach, general code, precision     0.920  0.885  0.928  0.967  0.985  0.744  0.872  0.952  0.743
Li approach, general code, recall        0.914  0.869  0.980  0.890  0.912  0.878  0.927  0.990  0.982
Li approach, general code, F1            0.915  0.877  0.953  0.927  0.947  0.805  0.899  0.970  0.846
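The per-tag F1 values in Table 4-1 are consistent with the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python check below recomputes a few of them from the rounded precision and recall entries; the selection of cells is arbitrary, and small last-digit differences in other cells are to be expected because the table's inputs are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 4-1.
checks = {
    "Baseline NN": (0.882, 0.582),
    "Stanford NN": (0.932, 0.716),
    "Li approach (parallel code) NN": (0.887, 0.919),
    "Li approach (parallel code) JJ": (0.854, 0.859),
}
for name, (p, r) in checks.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

This prints 0.701, 0.810, 0.903, and 0.856, matching the table. Because the harmonic mean is dominated by the smaller of the two values, the Baseline tagger's low NN recall (0.582) caps its NN F1 despite a precision of 0.882.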

The accuracies of the Baseline Tagger, Stanford Tagger, and Li Tagger were 87.14%, 85.57%, and 92.76%, respectively. The Baseline Tagger performed better than the Stanford Tagger, likely because its training set was partly conversational data (the NPS chat corpus). Table 4-1 shows the combined precision, recall, and F1-score for the testing set, and the same measurements for some of the most frequently occurring POS tags. The overall precision, recall, and F1-score were all improved by our approach. The F1-score increased from 0.879 (Baseline) to 0.927 (Li Approach), and both are higher than that of the Stanford tagger (0.859). The open-domain tagger trained with the NPS corpus achieved 0.834 accuracy.

For nouns (NN) in particular, which constitute the largest proportion of tokens (17%), our approach performed particularly well. Nouns in domain-specific language are hard to identify: the Baseline tagger achieved a recall on NN of only 0.582, and the Stanford Tagger performed worse on NN than on any other frequently occurring tag in the set, at 0.716. Our approach achieved a recall on NN of 0.919. Besides NN tokens, our approach also achieved much higher performance on adjectives (JJ), with an F1-score of 0.856 compared to 0.776 for the Baseline and 0.746 for Stanford.

The Java code we used to generate the domain-specific training corpus was parallel with the dialogues, which is not always available. To examine whether this approach could use unrelated Java code, we collected 1968 lines of Java code from Oracle's The Java™ Tutorials. With the same approach, we generated domain-specific training data and tested on the same test set. This model achieved 0.913 accuracy, slightly lower than the model trained with parallel code, but still much higher than the models without domain adaptation.

4.2 Noun Phrase Chunking in Tutorial Dialogue

Noun phrase chunking is a type of syntactic analysis that labels all noun phrases in a sentence (Tjong and Sang 2000). With the POS tags generated using the approach presented above, we performed noun phrase chunking of tutorial dialogue utterances using a linear chain conditional random field (CRF) (Lafferty et al. 2001). In a tutorial dialogue system, this process finds all noun phrases in user utterances. These noun
phrases are potential referring expressions that refer to objects in the shared programming environment. We followed the approach of prior work to perform noun phrase chunking (Sha and Pereira 2003). This approach is tested on an existing corpus and will be deployed in the dialogue system in Chapter 7. We use a BIO tagging schema, which annotates each word in an input sentence. Each word is assigned a tag: B indicates "beginning of a phrase chunk", I indicates "in a phrase chunk", and O means "out of a phrase chunk". For example, in the annotated sentence "but/O why/O do/B-VP that/B-NP when/O I/B-NP could/O just/O use/B-VP the/B-NP string/I-NP zip/I-NP from/O the/B-NP actionPerformed/I-NP method/I-NP", B-NP indicates the beginning of a noun phrase, I-NP means that the corresponding word is inside a noun phrase, and O means that the corresponding word is not in any phrase chunk. So, "the/B-NP string/I-NP zip/I-NP" forms a complete noun phrase according to the annotation. Given this tagging schema, we trained a conditional random field tagger to tag all of the noun phrases in a given input sentence.

A linear chain conditional random field (CRF) is a discriminative graphical model for sequential data tagging. In this noun phrase chunking application, we used it to assign BIO tags to each token in an input word sequence W = w_0, w_1, ..., w_n. Given a word sequence W, the probability of a specific tag sequence A = a_0, a_1, ..., a_n is calculated as:

$$p(A \mid W) = \frac{1}{Z(W)} \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{m} \lambda_j f_j(i, W, a_i, a_{i-1}) \right)$$

where f_1, ..., f_m are feature functions, λ_j is the learned weight of f_j, and Z(W) normalizes the distribution by summing the exponentiated score over all possible tag sequences for W. The tag sequence with the highest probability is selected as the optimal annotation:

$$A^{*} = \arg\max_{A} p(A \mid W)$$

For training data, we used data from the CoNLL-2000 shared task (Tjong and Sang 2000). This corpus contains part of the Wall Street Journal corpus with BIO annotations of phrases, 211,727 tokens in total.
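Reading noun phrases off a BIO-tagged utterance is a simple left-to-right scan. The Python sketch below uses two hypothetical helpers, parse_tagged and noun_phrases, to recover the chunks from an annotated utterance in word/TAG form, with "the actionPerformed method" tagged as a single NP. A stray I-NP with no preceding B-NP is treated as starting a new phrase, which is one common convention rather than something the text specifies.

```python
def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


def noun_phrases(tagged):
    """Collect the word strings of all NP chunks in a BIO-tagged sequence."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag == "B-NP" or (tag == "I-NP" and not current):
            if current:  # a new B-NP closes the previous phrase
                phrases.append(" ".join(current))
            current = [word]
        elif tag == "I-NP":
            current.append(word)
        elif current:  # any non-NP tag closes the open phrase
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


utterance = ("but/O why/O do/B-VP that/B-NP when/O I/B-NP could/O just/O use/B-VP "
             "the/B-NP string/I-NP zip/I-NP from/O the/B-NP actionPerformed/I-NP method/I-NP")
print(noun_phrases(parse_tagged(utterance)))
# ['that', 'I', 'the string zip', 'the actionPerformed method']
```

In the pipeline described in this chapter, each recovered phrase is only a candidate referring expression; a later step still has to decide whether it refers to something in the Java code.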

This CRF-based approach employed the lexical features and POS tags of the words in a sentence as features. Brill's transformation-based learning approach was one of the most influential POS tagging approaches (Brill 1995), and some of our features are similar to the rules used in Brill's work. A complete list of features can be found in Table 4-3.

Table 4-2. Noun phrase chunking results.
Model      Tag                precision  recall  F1     # of instances
Baseline   B-NP               0.75       0.91    0.82   2352
Baseline   I-NP               0.87       0.75    0.80   1913
Baseline   B-NP, I-NP comb.   0.80       0.84    0.82   4265
Proposed   B-NP               0.79       0.91    0.85   2352
Proposed   I-NP               0.84       0.94    0.89   1913
Proposed   B-NP, I-NP comb.   0.81       0.92    0.86   4265

The noun phrase chunking results are shown in Table 4-2. The domain adaptation approach increased the F1-score of noun phrase chunking from 0.82 to 0.86, and it improved the recall from 0.84 to 0.92.

Table 4-3. The features used for noun phrase chunking.
the word in lowercase
the last three letters of the word
the last two letters of the word
if the word is in uppercase
if the word is title case
if the word is a number
the word's POS tag
the last two letters of the word's POS tag
the previous word in lowercase
if the previous word is in uppercase
if the previous word is title case
if the previous word is a number
the previous word's POS tag
the following word in lowercase
if the following word is in uppercase
if the following word is title case
if the following word is a number
the following word's POS tag
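Table 4-3 translates naturally into a per-token feature dictionary, which is the input form used by CRF toolkits such as sklearn-crfsuite. The sketch below is one hypothetical encoding of those eighteen features (the key names are invented); it approximates "is a number" with str.isdigit, which would miss tokens such as 3.5, and it simply omits the neighbor features at sentence boundaries.

```python
def token_features(words, tags, i):
    """Features for token i, following the rows of Table 4-3 (key names are hypothetical)."""
    word, tag = words[i], tags[i]
    features = {
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.suffix2": word[-2:],
        "word.isupper": word.isupper(),
        "word.istitle": word.istitle(),
        "word.isnumber": word.isdigit(),  # simplification: "3.5" is not matched
        "pos": tag,
        "pos.suffix2": tag[-2:],
    }
    # Previous and following words; omitted at sentence boundaries.
    for offset, prefix in ((-1, "prev"), (1, "next")):
        j = i + offset
        if 0 <= j < len(words):
            features.update({
                f"{prefix}.lower": words[j].lower(),
                f"{prefix}.isupper": words[j].isupper(),
                f"{prefix}.istitle": words[j].istitle(),
                f"{prefix}.isnumber": words[j].isdigit(),
                f"{prefix}.pos": tags[j],
            })
    return features


def sentence_features(words, tags):
    """One feature dictionary per token, in sentence order."""
    return [token_features(words, tags, i) for i in range(len(words))]


words = ["use", "the", "String", "zip"]
tags = ["VB", "DT", "NN", "NN"]
print(token_features(words, tags, 2))
```

Because the POS tag of each word and its neighbors is itself a feature, improvements in POS tagging feed directly into the chunker, which is consistent with the chunking gains reported in Table 4-2.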

4.3 Discussion

Qualitative examination shows the ways in which the proposed approach improved over prior approaches. The example sentence used in the introduction was tagged as: "... so/IN,IN String/NN,NNP a/NN,DT =/NN,JJ new/NN,JJ String/NN,NNP (/NN,-LRB- zipCode/NN,NN )/NN,-RRB- ;/NN,: would/MD,MD". In each word/tag pair, the first tag was assigned by the proposed approach and the second by the baseline tagger.

The proposed approach also performed very well at detecting changes of usage for domain-specific tokens, such as "the if statement" and "the for loop." The proposed approach correctly tagged "if" and "for" in these cases as NN, while in phrases such as "if I use..." and "...for this method...," they were correctly tagged as IN. Neither the Baseline nor the Stanford Tagger could do this. To illustrate, consider an excerpt from the test set: "that/DT,DT line/NN,NN you/PRP,PRP just/RB,RB typed/VBD,VBD can/MD,MD be/VB,VB put/VBN,VBN in/IN,IN the/DT,DT (/NN,-LRB- )/NN,-RRB- of/IN,IN the/DT,DT for/NN,IN loop/NN,NN".

In earlier work on domain adaptation for POS tagging, researchers have used semi-supervised approaches, which employ a small annotated corpus of the target language and a large annotated source language corpus to train a POS tagger for the target language (Jiang and Zhai 2007; Daume 2009; Finkel and Manning 2009; Garrette and Baldridge 2013; Plank et al. 2014). There has also been some work using unsupervised approaches to perform domain adaptation, such as structural correspondence learning (Blitzer 2006) and word clusters learned from an unlabeled target dataset (Owoputi et al. 2013). Crowd-sourcing has also been leveraged to implement domain adaptation for POS tagging (Hovy et al. 2015; Li et al. 2012). The approach reported in this chapter generates labeled training data for the target language automatically and thus dramatically simplifies the problem.

This chapter has reported a simple but effective domain adaptation approach for POS tagging. Both quantitative and qualitative evaluation on a corpus of informal textual dialogues for Java programming demonstrated the effectiveness of the approach
comparedtoaBaselineapproachandtheStanfordtagger.Theperformanceofthe reportedapproachwasparticularlyevidentonchallengingnounphrasesinthetarget language.Experimentsshowedthatevenwhenusingdomaintokensunrelatedtothe targettestingcorpus,thereportedapproachdramaticallyimprovedPOStaggingontarget language.Thisisanessentialstepforaccuratereferringexpressionextraction. 43


CHAPTER 5
SEMANTIC INTERPRETATION OF REFERRING EXPRESSIONS

This chapter presents a novel approach that I created to perform semantic interpretation of referring expressions within a situated environment. Recall that a situated dialogue is embedded in an environment, where the dialogue usually focuses on a domain-specific task within this environment. Referring expressions are noun phrases used to refer to entities in the situated environment. In the context of tutorial dialogue for Java programming, as shown in Figure 1-1 at the beginning of the introduction, noun phrases like "the 2 dimensional array" and "the for loop" all refer to some entity in the parallel Java program. These noun phrases are referring expressions in the situated dialogue for Java programming.

The approach presented in this chapter performs joint segmentation and labeling of the noun phrases to link them to attributes of entities within the environment. It is a new way to provide semantic information for reference resolution in a situated environment. Evaluation results on a corpus of tutorial dialogue for Java programming demonstrate that a Conditional Random Field (CRF) model performs well, achieving an accuracy of 89.3% for linking semantic segments to the correct entity attributes. This work is a step toward enabling dialogue systems to perform accurate reference resolution.

Previous approaches for semantic interpretation include domain-specific grammars (Lemon et al. 2001) and open-domain parsers together with a domain-specific lexicon (Rose 2000). However, existing techniques are not sufficient to support increasingly complex task-oriented dialogues due to several challenges. For example, domain-specific grammars become intractable when applied to more ill-formed domains, and open-domain parsers may not perform well across domains (McClosky et al. 2010).

To address these challenges, this chapter presents a step toward reference resolution in situated dialogues for complex problem-solving, in which the number of potential entities (e.g., a Java variable or a piece of code) is unbounded. The present work focuses on the semantic interpretation of noun phrases, which tend to bear significant semantic information for each utterance. Although noun phrases are typically short in their number of tokens, their complexity and semantics vary in important ways. For example, in the domain of computer programming, two similar noun phrases such as "the 2 dimensional array" and "the 3 dimensional array" refer to two different entities within the problem-solving artifact. Inferring the semantic structure of the noun phrases is necessary to differentiate these two references within a dialogue, to ground them in the task, and to respond to them appropriately. Coreference resolution focuses on discovering the coreference relationship between pairs of noun phrases in a piece of natural language text (Culotta et al. 2007; Lappin and Leass 1994), which is similar to the ultimate goal of reference resolution in complex problem solving. However, unlike coreference resolution, reference resolution links natural language expressions to entities in a real-world environment. Compared with natural language expressions, real-world entities contain richer information that can be utilized in the task of reference resolution. In addition, the situated character of the dialogues generated in complex problem solving introduces more uncertainty into the meaning of the noun phrases used to refer to an entity than in a piece of self-contained natural language text; for example, a speaker may say "that variable" while highlighting a variable in the Java code. Fully understanding "that variable" requires contextual information from the environment in which this noun phrase is generated.
The current approach leverages the structure of noun phrases, mapping their segments to the attributes of the entities to which they should be semantically linked. To overcome the limitation of needing to fully enumerate the entities in the environment, we represent the entities as automatically extracted vectors of attributes. We then perform joint segmentation and labeling of the noun phrases in user utterances to map them to these entity vectors. In this way, the semantics of noun phrases can be grounded by linking segments of noun phrases to attributes of entities in the environment. The results show that a Conditional Random Field performs well for this task, achieving 89.3% accuracy. Moreover, even in the absence of lexical features (using only dependency parse features and parts of speech), the model achieves 71.3% accuracy, indicating that it may be tolerant to unseen words. The flexibility of this approach is due in part to the fact that it does not rely on a syntactic parser's ability to accurately segment within noun phrases, but rather includes parse features as just one type of feature among several made available to the model. Finally, in contrast to bag-of-words methods such as latent semantic analysis, the reported approach models the structure of noun phrases to facilitate specific grounding within an artifact.

5.1 Semantic Interpretation as Sequence Labeling

To interpret the dialogue utterances as described above, our approach focuses first upon noun phrases, which contain rich semantic information. This section introduces the approach, based on Conditional Random Fields, to jointly segment the noun phrases and link those segments to entities within the domain.

5.1.1 Noun Phrases in Domain Language

A noun phrase is defined as "a phrase which has a noun (or indefinite pronoun) as its head word, or which performs the same grammatical function as such a phrase" (Crystal 1997). The syntactic structure of a noun phrase consists of a head and its dependents, which could include determiners, adjectives, prepositional phrases, or even a clause. For example, the noun phrase "a 2 dimensional array" occurs within the Java programming corpus. Its head is "array" and its dependents are "a" as the determiner and "2 dimensional" as an adjective phrase. In this simple case the syntactic boundaries also indicate semantic segments, as these dependents indicate one or more attributes of the head. If this relationship always held, understanding semantic structure would reduce to a labeling task that only requires assigning a semantic tag to each syntactic segment of the noun phrase. But it does not always hold, in part because a syntactic parser trained on an open-domain corpus will not necessarily perform well on domain language (McClosky et al. 2010). For example, in the noun phrase "the outer for loop," which also occurs in the Java programming corpus, the head of the noun phrase is "for loop," but the syntactic parse of this noun phrase generated by the Stanford parser understandably (but incorrectly) identifies this head as part of a prepositional phrase (Figure 5-1).

Figure 5-1. A parse of "the outer for loop" from the Stanford Parser: (NP (NP (DT the) (JJ outer)) (PP (IN for) (NP (NN loop)))).

To address this challenge, this chapter describes a joint segmentation and semantic labeling approach that does not require accurate syntactic parsing within noun phrases. In this approach the head and the dependents of each noun phrase are each referred to as a segment, with exactly one segment per dependent, and one or more words per segment. Identifying these segments correctly is essential to the correct assignment of semantic tags. Pipeline methods for semantic segmentation rely on stable performance of an open-domain parser, but as described above, this assumption does not hold reliably for some domain language. We therefore utilize joint segmentation and labeling, and apply a Conditional Random Field approach (Lafferty et al. 2001), a natural choice for the sequential data segmentation and labeling problem.


5.1.2 Description Vector

The goal is to ground each noun phrase to an entity within the problem-solving artifact, which constitutes the "world" in this domain. To do this, we link each semantic segment in a noun phrase to an attribute of an entity in the world. Because the world can contain any of an unbounded set of user-created entities, the representation cannot rely upon exhaustively enumerating the entities. To represent an entity in the domain, we define a description vector $V$ which defines the attribute types for entities in the domain. An entity $O$ in the domain is then represented uniquely by an instance of $V$: each element $V_i$ holds the value of the corresponding attribute of $O$, as illustrated in Table 3-1. This definition of the description vector relies upon the structure of the domain by factorizing the attributes of entities. With this representation, interpreting a noun phrase involves linking each segment of the noun phrase to a cell in the description vector. Formally, we represent a noun phrase as a series of segments:

$$NP = \langle s_1, s_2, \ldots, s_r \rangle$$

where $s_i$ is the $i$-th segment in this noun phrase. A noun phrase is also a sequence of words:

$$NP = \langle w_1, w_2, \ldots, w_n \rangle$$

where each $w_j$ is the $j$-th word in the noun phrase. Therefore each segment is a series of consecutive words:

$$s_i = \langle w_j, w_{j+1}, \ldots, w_{j+l-1} \rangle$$

where $w_j$ is the first word of segment $i$ and $l$ is the length of semantic segment $i$. Given a noun phrase, the segmentation problem is thus to maximize the following conditional probability:

$$p(\langle s_1, \ldots, s_r \rangle \mid \langle w_1, \ldots, w_n \rangle)$$

Complementary to the segmentation problem is the semantic linking problem, which is to link $s_i$ to an attribute $a_i$, the label of the $i$-th attribute in the entity description vector. That is, we wish to maximize the probability of the attribute label sequence given the segments of the noun phrase:

$$p(\langle a_1, \ldots, a_r \rangle \mid \langle s_1, \ldots, s_r \rangle)$$

To solve both problems at once, we assign an attribute label to every word and take consecutive words with the same attribute label as the same semantic segment. Writing $a_i$ from here on for the label of word $w_i$, the noun phrase segmentation and semantic linking problem is then:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}} \; p(\langle a_1, \ldots, a_n \rangle \mid \langle w_1, \ldots, w_n \rangle)$$

In the tag sequence $\langle a_1, \ldots, a_n \rangle$, if $a_i$ and $a_{i+1}$ are the same, then $w_i$ and $w_{i+1}$ are assigned to the same semantic segment with tag $a_i$. The process of segmentation and semantic linking is illustrated in Figure 5-2.

Figure 5-2. Segmentation and semantic linking of the NP "a 2 dimensional array": the words w1 to w4 ("a", "2", "dimensional", "array") are labeled NUM, ARR_DIM, ARR_DIM, and CATEG.; consecutive identical labels form the segments s1 = "a" (NUM), s2 = "2 dimensional" (ARR_DIM), and s3 = "array" (CATEG.).

5.1.3 Joint Segmentation and Labeling

To perform this joint segmentation and labeling, we utilize a Conditional Random Field (CRF), which is a classic approach for sequence segmentation and labeling (Lafferty et al. 2001). Given the linear nature of our data, we employ a linear-chain CRF. Specifically, given a sequence of words $\mathbf{w}$, the probability of a label sequence $\mathbf{a}$ is defined as

$$p(\mathbf{a} \mid \mathbf{w}) = \frac{1}{Z(\mathbf{w})} \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{m} \lambda_j f_j(i, \mathbf{w}, a_i, a_{i-1}) \right)$$

where $f_j(i, \mathbf{w}, a_i, a_{i-1})$ is a feature function, $m$ is the number of feature functions, and $a_0$ denotes the start state. The weights $\lambda_j$ of the feature functions are learned during training. The normalization function $Z(\mathbf{w})$ sums the exponentiated weighted feature functions over all possible label sequences $\mathbf{a}'$:

$$Z(\mathbf{w}) = \sum_{\mathbf{a}'} \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{m} \lambda_j f_j(i, \mathbf{w}, a'_i, a'_{i-1}) \right)$$

The optimal weights $\lambda^*$ are those that maximize the log-likelihood of the training set, where $K$ is the number of noun phrases in the corpus:

$$\lambda^* = \arg\max_{\lambda} \sum_{k=1}^{K} \log p(\mathbf{a}^{(k)} \mid \mathbf{w}^{(k)})$$

Once the weights are learned, the label sequence for a new noun phrase is the one that maximizes $p(\mathbf{a} \mid \mathbf{w})$.

5.1.4 Features

Next, we introduce the features used to train the CRF. Each feature function $f_j(i, \mathbf{w}, a_i, a_{i-1})$ was defined as a binary function that fires when a particular feature value observed at position $i$ of $\mathbf{w}$ co-occurs with a particular pair of labels $(a_{i-1}, a_i)$; its value is therefore fully determined by the combination $(i, \mathbf{w}, a_i, a_{i-1})$. We use both lexical and syntactic features: the words themselves, word lemmas, parts of speech, and dependency relationships from the syntactic parse. The word itself, the lemmatized word, and parts of speech have all been shown useful within segmentation and labeling tasks, so they are made available here (Xue and Palmer 2004). Each of these features is represented as categorical data. For example, a word is represented as its index in a list of all of the words that appeared in the corpus.

The dependency structure of natural language has also been shown to be important in semantic interpretation (Poon and Domingos 2009). This chapter employs a dependency feature vector extracted from dependency parses. The head word of each noun phrase is the root of the dependency tree, and each dependent is a sub-tree directly under the head.
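The linear-chain CRF of Section 5.1.3 can be checked on a toy scale by computing $Z(\mathbf{w})$ through brute-force enumeration. The sketch below assumes a three-label subset of the attribute tags from Figure 5-2 and five hand-written binary feature functions with hand-set weights; none of these are the dissertation's learned model, and real CRF toolkits replace the enumeration with the forward and Viterbi algorithms.

```python
import itertools
import math

# Hypothetical label subset, taken from the Figure 5-2 example.
LABELS = ("NUM", "ARR_DIM", "CATEG")


# Hypothetical binary feature functions f_j(i, w, a_i, a_{i-1}).
def f_digit_dim(i, w, a_i, a_prev):
    return 1.0 if w[i].isdigit() and a_i == "ARR_DIM" else 0.0

def f_dimensional_dim(i, w, a_i, a_prev):
    return 1.0 if w[i] == "dimensional" and a_i == "ARR_DIM" else 0.0

def f_array_categ(i, w, a_i, a_prev):
    return 1.0 if w[i] == "array" and a_i == "CATEG" else 0.0

def f_article_num(i, w, a_i, a_prev):
    return 1.0 if w[i] == "a" and a_i == "NUM" else 0.0

def f_dim_continues(i, w, a_i, a_prev):
    # Transition feature: rewards extending an ARR_DIM segment.
    return 1.0 if a_prev == "ARR_DIM" and a_i == "ARR_DIM" else 0.0

FEATURES = [f_digit_dim, f_dimensional_dim, f_array_categ, f_article_num, f_dim_continues]
WEIGHTS = [2.0, 2.0, 2.0, 2.0, 1.0]  # lambda_j, hand-set rather than learned


def score(w, a):
    """sum_i sum_j lambda_j * f_j(i, w, a_i, a_{i-1}), with a START state before a_1."""
    total = 0.0
    for i in range(len(w)):
        a_prev = a[i - 1] if i > 0 else "START"
        for lam, f in zip(WEIGHTS, FEATURES):
            total += lam * f(i, w, a[i], a_prev)
    return total


def log_z(w):
    """log Z(w) by enumerating every label sequence (exponential; toy sizes only)."""
    scores = [score(w, a) for a in itertools.product(LABELS, repeat=len(w))]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


def prob(w, a):
    """p(a | w) = exp(score(w, a)) / Z(w)."""
    return math.exp(score(w, a) - log_z(w))


def decode(w):
    """argmax_a p(a | w); Z(w) does not depend on a, so the raw score suffices."""
    return max(itertools.product(LABELS, repeat=len(w)), key=lambda a: score(w, a))


words = ["a", "2", "dimensional", "array"]
print(decode(words))  # ('NUM', 'ARR_DIM', 'ARR_DIM', 'CATEG')
```

Merging consecutive identical labels in the decoded sequence yields the three segments of Figure 5-2, which is exactly how the joint formulation recovers segmentation from per-word labels.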


Figure 5-3. Dependency structure of "a 2 dimensional array": the head "array" governs the determiner "a" (det) and the adjective phrase "2 dimensional" (amod).

We design the dependency feature as a sequence of dependency labels as follows. Given a dependency tree, the words in each semantic segment of the noun phrase are assigned a tag according to the relationship between that segment and the head, as given by the dependency type in the dependency tree. For example, the dependency tree of "a 2 dimensional array" is shown in Figure 5-3. Under this scheme, "a" is tagged det, and both "2" and "dimensional" are tagged amod, because their segment attaches to the head through an amod relation. In this way, the dependency information from an open-domain parser is encoded as a feature for the semantic labeling model.

5.2 Experiments and Results

The goal of the experiments is to determine how well the trained CRF can segment noun phrases and link these segments to the correct attributes of entities in the world. This section presents experiments using CRFs trained and tested on the Java programming tutorial dialogue corpus. As described below, the results were evaluated by comparison with manually labeled data. Noun phrases from the tutorial dialogues were first manually extracted and annotated with their slots in the description vector described in Section 5.1.2. There were 346 grounded noun phrases extracted manually from the six tutorial dialogue sessions used in the current work. Each of these noun phrases has one or more corresponding entities in the programming artifact. Since each word in a noun phrase is linked to an element in the description vector, the index in this vector was used as the label for each word. Annotation of all 346 noun phrases was performed by one annotator, and 20% of the noun phrases (70 noun phrases) were doubly annotated by an independent second annotator. The percent agreement was 85.3% and the Kappa was 0.765. To extract features, lemmatization and syntactic parsing were performed with the Stanford CoreNLP toolkit (Manning et al. 2014). Then, a CRF was trained to predict the label for each word in a new noun phrase. The training was performed with the crfChain toolbox (Schmidt and Swersky 2008).

We use ten-fold cross-validation to evaluate the performance of the CRF on this problem. Results with different feature combinations are shown in Table 5-1. Manually labeled data were taken as ground truth for computing accuracy, which is defined as the percentage of segments correctly labeled. Recall that consecutive words with the same label in a noun phrase are treated as a segment. Therefore, if a segment $s_{CRF}$ identified by the CRF has the same boundaries and the same label as a segment $s_{Human}$ in the noun phrase containing $s_{CRF}$, then $s_{CRF}$ is counted as correct; otherwise, it is counted as incorrect. The accuracy is then calculated as the number of correct segments identified by the CRF divided by the number of segments annotated manually. As can be seen in Table 5-1, all of the models perform substantially better than a minimal chance baseline of 43%, which would result from taking each word as a segment and assigning it the most frequent attribute label. The results demonstrate important characteristics of the segmentation and labeling model. First, unlike most previous semantic interpretation work, our semantic interpretation of noun phrases does not rely on accurate syntactic parses within noun phrases. Rather, we use a dependency parse from an open-domain parser as only one of several types of features provided to the model. These dependency features improved the model in most feature combinations (Table 5-1). The feature combination of words, lemmas, and dependency parses achieved the best accuracy, which is 4.8 percentage points higher than the model that only used word features. This difference is statistically significant (Wilcoxon rank-sum test; n = 10; p = 0.02).

Table 5-1. Semantic labeling accuracy (Dep = dependency parse features).

Features                     Accuracy
word                         84.5%
word + lemma                 85.5%
word + Dep                   87.2%
lemma + Dep                  89.1%
word + lemma + Dep           89.3%
word + lemma + POS           86.9%
word + lemma + POS + Dep     88.7%
POS + Dep                    71.3%

Notably, the combination of part-of-speech features and dependency parse features still performed at 71.3% accuracy, indicating that, to some extent, the method may be tolerant to unseen words.
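The segment-level accuracy defined in Section 5.2 can be sketched in Python as follows. It assumes each noun phrase is given as a list of per-word labels for both the CRF output and the manual annotation; a predicted segment counts only if a human segment has the same start, end, and label, and the denominator is the number of manually annotated segments.

```python
def to_segments(labels):
    """Merge runs of identical consecutive labels into (start, end, label), end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


def segment_accuracy(predicted, gold):
    """predicted, gold: parallel lists of per-word label sequences, one per noun phrase."""
    correct = 0
    total = 0
    for pred_labels, gold_labels in zip(predicted, gold):
        gold_segments = set(to_segments(gold_labels))
        total += len(gold_segments)
        # A CRF segment is correct only with identical boundaries and label.
        correct += sum(1 for seg in to_segments(pred_labels) if seg in gold_segments)
    return correct / total if total else 0.0


gold = [["NUM", "ARR_DIM", "ARR_DIM", "CATEG"]]
pred = [["NUM", "ARR_DIM", "CATEG", "CATEG"]]
print(segment_accuracy(pred, gold))  # only the NUM segment matches: 1 of 3
```

Note that the second predicted segment ("2" as ARR_DIM) has the right label but the wrong right boundary, so it earns no credit; this is what makes the metric stricter than per-word accuracy.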


CHAPTER 6
REFERENCE RESOLUTION FOR SITUATED DIALOGUE SYSTEM

Reference resolution in situated dialogues within a complex environment is often fraught with high ambiguity. In Chapter 5, we presented our approach to interpreting the semantic structure of referring expressions extracted from user utterances. Given the extracted referring expressions, we need to identify their referents in the situated environment, which is the problem of reference resolution. In this chapter, I report a novel approach that I developed to address these challenges by combining the learned semantic structure of referring expressions with dialogue and task history in a ranking-based model. I evaluated the new technique on a corpus of human-human tutorial dialogues for computer programming. The experimental results show a substantial performance improvement over two recent state-of-the-art approaches. The reported approach makes a stride toward automated dialogue in complex problem-solving environments, and will be used in the tutorial dialogue system described in Chapter 7.

6.1 Reference Resolution in a Situated Environment

This section describes a new approach to reference resolution in situated dialogue. It links each referring expression from the dialogue to its most likely referent object in the environment. Our approach involves three main steps.

First, referring expressions from the situated dialogue are segmented and labeled according to their semantic structure. Using a semantic segmentation and labeling approach I have previously developed (Li and Boyer 2015), a conditional random field (CRF) is used for this joint segmentation and labeling task, and the values of the labeled attributes are then extracted (Section 6.2). The result of this step is the learned semantics, which are the attributes of objects expressed within each referring expression. Then, these learned semantics are utilized within the novel approach reported in this chapter. As Section 6.3 describes, dialogue and task history are used to filter the objects in the environment to build a candidate list of referents, and then, as Section 6.4 describes, a ranking-based classification approach is used to select the best-matching referent.

For situated dialogue we define $E_t$ as the state of the environment at time $t$. $E_t$ consists of all objects present in the environment. Importantly, the objects in the environment vary along with the dialogue: at each moment, new objects could be created ($|E_t| > |E_{t-1}|$), and existing objects could be removed ($|E_t| < |E_{t-1}|$) as the user performs task actions.

$$E_t = \{ o_i \mid o_i \text{ is an object in the environment at time } t \}$$

We assume that all of the objects $o_i$ are observable in the environment. For example, in situated dialogues about programming, we can find all of the objects and extract their attributes using a source code parser. Reference resolution is then defined as finding the best-matching $o_i$ in $E_t$ for a referring expression $RE$.

6.2 Referring Expression Semantic Interpretation

In situated dialogues, a referring expression may contain rich semantic information about the referent, especially when the context of the situated dialogue is complex. Approaches such as domain-specific lexicons are limited in their ability to address this complexity, so we utilize a linear-chain CRF to parse the semantic structure of the referring expression, as presented in Chapter 5. This more automated approach can also potentially avoid the manual labor required in creating and maintaining a lexicon.

In this approach, every object within the environment must be represented according to its attributes. We treat the set of all possible attributes of objects as a vector, and for each object $o_i$ in the environment, we instantiate and populate an attribute vector $Att\_Vec_i$. For example, the attribute vector for a two-dimensional array in a computer program could be [CATEGORY = array, DIMENSION = 2, LINE = 30, NAME = table, ...]. We ultimately represent $E_t = \{o_i\}$ as the set of all attribute vectors $Att\_Vec_i$, and for a referring expression we aim to identify $Att\_Vec_j$, the actual referent.


Since a referring expression describes its referent either implicitly or explicitly, the attributes expressed in it should match the attributes of its referent. We segment referring expressions and label the semantics of each segment using the CRF; the result is a set of segments, each of which represents some attribute of the referent. This process is illustrated in Figure 6-1(a). After segmenting and labeling attributes in the referring expressions, the attribute "values" are extracted from each semantic segment using regular expressions (Figure 6-1(b)); for example, the value "2" is extracted from "2 dimensional" to fill in the "ARRAY_DIM" element of an empty $Att\_Vec$. The result is an attribute vector that represents the referring expression.

Figure 6-1. Semantic interpretation of referring expressions.

6.3 Generating a List of Candidate Referents

Once the referring expression is represented as an object attribute vector as described above, we wish to link that vector to the closest-matching object in the environment. Each object is represented by its own attribute vector, and there may be a large number of objects in $E_t$. Given a referring expression $R_k$, we would like to trim this list to keep only those objects that are likely to be the referent of $R_k$.
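The value-extraction step of Figure 6-1(b) can be sketched in Python as below. Several details are assumptions for illustration only: the attribute typing covers just a few of the domain's attributes, spelled-out numbers use a tiny hypothetical table, and categorical values are assumed to be folded into the CRF tag itself (for example "CATEGORY=array"), following the description in Section 6.5.1 that categorical values are added to the CRF tag set. Segments whose label carries no extractable value, such as NUM here, are simply skipped.

```python
import re

# Hypothetical attribute typing; the real domain defines 14 attributes.
NUMERIC_ATTRS = {"ARRAY_DIM", "LINE_NUMBER"}
STRING_ATTRS = {"NAME", "VAR_TYPE"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}  # assumed spelled-out forms


def extract_value(label, text):
    """Return (attribute, value) for one CRF-labeled segment, or None."""
    # Assumed tag convention: categorical values are part of the label, e.g. "CATEGORY=array".
    if "=" in label:
        attribute, value = label.split("=", 1)
        return attribute, value
    if label in NUMERIC_ATTRS:
        match = re.search(r"\d+", text)
        if match:
            return label, int(match.group())
        for token in text.lower().split():
            if token in NUMBER_WORDS:
                return label, NUMBER_WORDS[token]
        return None
    if label in STRING_ATTRS:
        # String attributes take the whole surface string of the segment.
        return label, text.strip()
    return None


def attribute_vector(segments):
    """Fill an initially empty Att_Vec from (segment text, label) pairs."""
    vec = {}
    for text, label in segments:
        pair = extract_value(label, text)
        if pair is not None:
            vec[pair[0]] = pair[1]
    return vec


print(attribute_vector([("a", "NUM"), ("2 dimensional", "ARRAY_DIM"), ("array", "CATEGORY=array")]))
# {'ARRAY_DIM': 2, 'CATEGORY': 'array'}
```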


There are two desired criteria for generating the list of candidate referents. First, the actual referent must be in the candidate list. At the same time, the candidate list should be as short as possible. We can pare down the set of all objects in $E_t$ by considering the focus of attention in the dialogue. Early approaches performed reference resolution by estimating each dialogue participant's focus of attention (Lappin and Leass 1994; Grosz et al. 1995). According to Ariel's accessibility theory (Ariel 1988), people tend to use more precise descriptions, such as proper names, in referring expressions for referents in long-term memory, and less precise descriptions, such as pronouns, for referents in short-term memory. A precise description carries more semantic information, while a vaguer description such as a pronoun carries less. Thus, these two sources of information, semantics and focus of attention, work together in identifying a referent.

Our approach employs this idea in candidate referent selection by tracking the focus of attention of the dialogue participants from the beginning of the dialogue through dialogue history and task history, as has been done in prior work that we use for comparison in our experiments (Iida et al. 2010). We also use the learned semantics of the referring expression (represented as the referring expression's attribute vector) as filtering conditions to select candidates.

The candidate generation process consists of three steps.

1. Candidate generation from dialogue history $DH$. Here $DH = \langle O_d, T_d \rangle$, where $O_d = \langle o_1, o_2, \ldots, o_n \rangle$ is the sequence of objects that were mentioned since the beginning of the dialogue, and $T_d = \langle t_1, t_2, \ldots, t_n \rangle$ is the sequence of timestamps at which the corresponding objects were mentioned. All of the objects in $E_t$ that were ever mentioned in the dialogue history, $\{ o_i \mid o_i \in DH \wedge o_i \in E_t \}$, are added into the candidate list.

2. Candidate generation from task history $TH$. Similarly, $TH$ records the objects manipulated by the user together with the times of manipulation; all of the objects in $E_t$ that were ever manipulated by the user, $\{ o_i \mid o_i \in TH \wedge o_i \in E_t \}$, are added into the candidate list.


Table 6-1. Algorithm to select candidates using learned semantics.

Given a referring expression R_k whose attribute vector Att_Vec_k has been extracted:
for each element att_i of Att_Vec_k
    if att_i is not null
        for each o in E_t
            if att_i == o.att_i
                add o into candidate list C_k

3. Candidate generation using learned semantics, which are the referent's attributes. Given a set of attributes extracted from a referring expression, all objects in $E_t$ that share at least one of these attribute values are added into the candidate list. The attributes are considered separately so that a single incorrectly extracted attribute cannot rule out the correct referent. Table 6-1 shows the algorithm used in this step.

6.4 Ranking-based Classification

With the list of candidate referents in hand, we employ a ranking-based classification model to identify the most likely referent. Ranking-based models have been shown to perform well for reference resolution problems in prior work (Denis and Baldridge 2008; Iida et al. 2010). For a given referring expression $R_k$ and its candidate referent list $C_k = \{o_1, o_2, \ldots, o_{N_k}\}$, in which each $o_i$ is an object identified as a candidate referent, we compute the probability of each candidate $o_i$ being the true referent of $R_k$: $p(R_k, o_i) = f(R_k, o_i)$, where $f$ is the classification function. (Note that our approach is classifier-agnostic; as we describe in Section 6.5.3, we experimented with several different models.) The candidates are then ranked by $p(R_k, o_i)$, and the object with the highest probability is taken as the referent of $R_k$.

6.5 Experiments and Results

To evaluate the new approach, we performed a set of experiments that compare our approach with two state-of-the-art approaches. We use the corpus described in Chapter 3.
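The three candidate-generation steps of Section 6.3 (including the Table 6-1 filter) and the ranking step of Section 6.4 can be sketched together in Python. In this sketch $E_t$ is a dictionary from hypothetical object ids to attribute dictionaries, and the scoring function is a toy stand-in (a count of matching attributes) for the trained classifier's $p(R_k, o_i)$; the object ids, attribute values, and scorer are illustrative assumptions, not the dissertation's data or model.

```python
# Hypothetical snapshot of E_t: object id -> attribute dictionary.
ENV = {
    "table": {"CATEGORY": "array", "ARRAY_DIM": 2, "NAME": "table", "LINE_NUMBER": 30},
    "grid": {"CATEGORY": "array", "ARRAY_DIM": 3, "NAME": "grid"},
    "i": {"CATEGORY": "variable", "NAME": "i", "VAR_TYPE": "int"},
    "main": {"CATEGORY": "method", "NAME": "main"},
}


def from_history(env, mentioned, manipulated):
    """Steps 1 and 2: objects still in E_t that were ever mentioned (DH) or manipulated (TH)."""
    seen = set(mentioned) | set(manipulated)
    return {oid for oid in env if oid in seen}


def from_semantics(env, re_vec):
    """Step 3 (Table 6-1): any object sharing at least one non-null attribute value."""
    candidates = set()
    for att, value in re_vec.items():
        if value is None:
            continue
        for oid, obj in env.items():
            if obj.get(att) == value:
                candidates.add(oid)
    return candidates


def generate_candidates(env, re_vec, mentioned, manipulated):
    return from_history(env, mentioned, manipulated) | from_semantics(env, re_vec)


def overlap_score(re_vec, obj):
    """Toy stand-in for the classifier's p(R_k, o_i): number of matching attributes."""
    return sum(1 for att, v in re_vec.items() if v is not None and obj.get(att) == v)


def resolve(env, re_vec, candidates):
    """Section 6.4: rank candidates and return the top one (None if there are none)."""
    return max(sorted(candidates), key=lambda oid: overlap_score(re_vec, env[oid]), default=None)


re_vec = {"CATEGORY": "array", "ARRAY_DIM": 2, "NAME": None}
# "deleted" was manipulated earlier but is no longer in E_t, so it is not a candidate.
cands = generate_candidates(ENV, re_vec, mentioned=["i"], manipulated=["main", "deleted"])
print(sorted(cands), resolve(ENV, re_vec, cands))  # ['grid', 'i', 'main', 'table'] table
```

Because the filter in step 3 takes a union over attributes rather than an intersection, "grid" stays in the list even though its dimension does not match; the ranking step, not the filter, is what separates it from "table".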


6.5.1 Semantic Parsing

The referring expressions were extracted from the tutorial dialogues, and their semantic segments and labels were manually annotated. A linear-chain CRF was trained on that data and used to perform referring expression segmentation and labeling (Li and Boyer 2015). The current work reports the first use of that learned semantics approach for reference resolution.

Next, we proceeded to extract the attribute values, a step that our previous work did not address. For the example shown in Figure 6-1(b), from the learned semantic structure we may know that "2 dimensional" refers to the dimension of the array, the attribute "ARRAY_DIM". (In the current domain there are 14 attributes that comprise the generic attribute vector $V$, such as ARRAY_DIM, NUM, and CATEGORY.) To actually extract the attribute values, we use regular expressions that capture our three types of attribute values: categorical, numeric, and string. For example, the value type of "CATEGORY" is categorical, such as "method" or "variable", and its values are taken from a closed set; "NAME" has string values; and the value of "LINE_NUMBER" is numeric. For categorical attributes, we add the categorical attribute values into the semantic tag set of the CRF used for segmentation, so that the values of categorical attributes are generated directly by the CRF. For attributes with text string values, we take the whole surface string of the semantic segment as the attribute value. The accuracy of the entire semantic parsing pipeline is 93.2% using 10-fold cross-validation, where accuracy is defined as the percentage of manually labeled attribute values that were successfully extracted from referring expressions.

6.5.2 Candidate Referent Generation

We applied the approach described in Section 6.3 to each session to generate a list of candidate referents for each referring expression. In a program, there could be more than one appearance of the same object. We treat all of the appearances of the same object as one object, since they all refer to the same artifact in the program. The average number of generated candidates for each referring expression was 44.8. The percentage of referring expressions whose actual referents were in the generated candidate list, or "hit rate," was 90.5%, based on manual tagging. This indicates that candidate referent list generation performs well.

A referring expression could be a pronoun, such as "it" or "that," which does not contain attribute information. Previous reference resolution research showed that training separate models for different kinds of referring expressions could improve performance (Denis and Baldridge 2008). We follow this idea and split the data set into two groups: referring expressions containing attributes, REF_ATT (270 referring expressions), and referring expressions that do not contain attributes, REF_NON (76 referring expressions).

The candidate generation approach performed better for the referring expressions without attributes (hit rate 94.7%) than for referring expressions with attributes (hit rate 89.3%). Since the candidate list for referring expressions without attributes relies solely on dialogue and task history, 94.7% of those referents had been mentioned in the dialogue or manipulated by the user previously. For referring expressions with attribute information, the generation of the candidate list also used learned semantic information, and only 70.0% of those referents had been mentioned in the dialogue or manipulated by the user before.

6.5.3 Identifying the Most Likely Referent

We applied the approach described in Section 6.4 to perform reference resolution on the corpus of tutorial dialogue. The data from the six manually labeled Java tutoring sessions were split into a training set and a test set using leave-one-dialogue-out cross-validation, which yields six folds. In each fold, the annotated referring expressions from one of the tutoring sessions were taken as the test set, and the data from the other five sessions formed the training set. We tested logistic regression, decision tree, naive Bayes, and neural networks as classifiers to compute $p(R_k, o_i)$ for each (referring expression, candidate) pair in the ranking-based model. The features provided to each classifier are shown in Table 6-2.

Table 6-2. Features used for the ranking-based classification.

Learned Semantic Features (SF)
SF1: whether RE has a CATEGORY attribute
SF2: whether RE.CATEGORY == o.CATEGORY
SF3: whether RE has a NAME attribute
SF4: whether RE.NAME == o.NAME
SF5: RE.NAME # o.NAME
SF6: whether RE.VAR_TYPE exists
SF7: whether RE.VAR_TYPE == o.VAR_TYPE
SF8: whether RE.LINE_NUMBER exists
SF9: whether RE.LINE_NUMBER == o.LINE_NUMBER
SF10: whether RE.ARRAY_DIMENSION exists
SF11: whether RE.ARRAY_DIMENSION == o.ARRAY_DIMENSION
SF12: CATEGORY of o

Dialogue History (DH) Features
DH1: whether o is the most recently mentioned object
DH2: whether o was mentioned in the last 30 seconds
DH3: whether o was mentioned between 30 and 60 seconds ago
DH4: whether o was mentioned between 60 and 180 seconds ago
DH5: whether o was mentioned between 180 and 300 seconds ago
DH6: whether o was mentioned between 300 and 600 seconds ago
DH7: whether o was mentioned more than 600 seconds ago
DH8: whether o has never been mentioned since the beginning
DH9: string matching between o and RE

Task History (TH) Features
TH1: whether o is the most recently manipulated object
TH2: whether o was manipulated in the last 30 seconds
TH3: whether o was manipulated between 30 and 60 seconds ago
TH4: whether o was manipulated between 60 and 180 seconds ago
TH5: whether o was manipulated between 180 and 300 seconds ago
TH6: whether o was manipulated between 300 and 600 seconds ago
TH7: whether o was manipulated more than 600 seconds ago
TH8: whether o has never been manipulated since the beginning
TH9: whether o is in the current working window

To evaluate the performance of the new approach, we compare against two other recent approaches. First, we compare against a ranking-based model that uses dialogue history and task history features (Iida et al. 2010). This model uses semantics from a domain-specific lexicon instead of a semantic parser. (Iida et al.'s work was extended by Funakoshi et al. (2012), but that work relies upon a handcrafted probability distribution of referents to concepts, which is not feasible in our domain since it has no fixed set of possible referents.) Therefore, we compare against their 2010 approach, implementing it in a way that creates the strongest possible baseline: we built a lexicon directly from our manually labeled semantic segments. First, we split all of the semantic segments into groups by their tags. Then, for each group of segments, any token that appeared twice or more was added into the lexicon. Although the data needed to do this would not be available in a real application of the technique, it ensures that the lexicon for the baseline condition has good coverage and creates a high baseline for our new approach to compare against. Additionally, for fairness of comparison, for each semantic feature used in our model, we extracted the same feature using the lexicon. There were three kinds of attribute values in the domain: categorical, string, and numeric (as described in Section 6.5.1). We extracted categorical attribute values using the appearance of tokens in the lexicon. We used regular expressions to determine whether a referring expression contains the name of a candidate referent, and also to extract attribute values such as line numbers from referring expressions. We also provided the Iida baseline model (Iida et al. 2010) with a feature indicating string matching between referring expressions and candidate referents, since this feature was captured in our model as an attribute.

We also compared our approach (which we call the Li approach here) against a very recent technique that leveraged a word-as-classifier approach to learn semantic compatibility between referring expressions and candidate referents (Kennington and Schlangen 2015). To create this comparison model, we used a word-as-classifier model instead of a CRF to learn the semantics of referring expressions. This weakly supervised approach relies on co-occurrence between words and objects' attributes. We then used the resulting semantic compatibility in a ranking-based model to select the most likely referent.
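Features 1 to 8 of the dialogue-history and task-history groups in Table 6-2 are recency indicators over time windows. The following Python sketch computes them for one candidate, assuming the history is a list of (timestamp in seconds, object id) events; treating each window as half-open, [lo, hi), is an assumption, since the table does not state which endpoints are inclusive. DH9 (string matching) and TH9 (working window) need other inputs and are not covered.

```python
import math

# Windows in seconds for features 2 to 7 (DH2 to DH7, TH2 to TH7).
# Assumption: half-open [lo, hi) windows; the table leaves endpoints unspecified.
WINDOWS = [(0, 30), (30, 60), (60, 180), (180, 300), (300, 600), (600, math.inf)]


def recency_features(obj_id, events, now, prefix="DH"):
    """Binary features 1 to 8 for candidate obj_id.

    events: (timestamp, object_id) pairs; mentions for DH, manipulations for TH.
    """
    past = sorted(e for e in events if e[0] <= now)
    # Feature 1: is obj_id the most recent object in the history?
    feats = {prefix + "1": bool(past) and past[-1][1] == obj_id}
    ages = [now - t for t, oid in past if oid == obj_id]
    # Features 2 to 7: was obj_id seen in each recency window?
    for k, (lo, hi) in enumerate(WINDOWS, start=2):
        feats[prefix + str(k)] = any(lo <= age < hi for age in ages)
    # Feature 8: never seen since the beginning.
    feats[prefix + "8"] = not ages
    return feats


mentions = [(10, "table"), (100, "i"), (130, "table")]
print(recency_features("table", mentions, now=150))
```

The same function serves both groups by changing the prefix and the event list, which mirrors the parallel structure of the DH and TH features in the table.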


The three conditions for our experiment are as follows.

- Iida Baseline Condition: features including dialogue history, task history, and semantics from a handcrafted lexicon (Iida et al. 2010).
- Kennington Baseline Condition: features including dialogue history, task history, and learned semantics from a word-as-classifier model (Kennington and Schlangen 2015).
- Li approach: features including dialogue history, task history, and learned semantics from a CRF.

Within each of these experimental conditions, we varied the classifier used to compute $p(R_k, o_i)$. We tested four classifiers: logistic regression (LR), decision tree (DT), naive Bayes (NB), and neural network (NN). The neural network has one hidden layer. We experimented with between 50 and 120 perceptrons, and the best-performing number was 100.

To measure the performance of the reference resolution approaches, we analyzed accuracy, defined as the percentage of referring expressions that were successfully linked to their referents. We chose accuracy as our metric following standard practice (Iida et al. 2010; Kennington and Schlangen 2015), because it provides an overall measure of the number of $(R_k, o_i)$ pairs that were correctly identified. In the rare cases where one referring expression referred to multiple referents, the algorithm's output was counted as correct if it selected any of those referents.

The results are shown in Table 6-3. We focus on comparing the results on referring expressions that contain attribute information, shown in the table as REF_ATT. REF_ATT accounts for 78% of all of the cases (270 out of 346). Among the three approaches, our approach (the Li approach) outperformed both prior approaches. The Iida 2010 approach achieved a maximum of 55.2% accuracy. Our approach achieved 68.5% accuracy using a neural network classifier, and this difference is statistically significant based on a Wilcoxon signed-rank test (n = 6; p = 0.046). Our approach outperformed the Kennington 2015 approach even more substantially, as its best
performance was 46.3% accuracy (p = 0.028). Intuitively, our model performs better than the Iida approach because it models the semantics of referring expressions more accurately. Semantic labeling finds an optimal segmentation for a referring expression, whereas a lexicon approach extracts each piece of attribute information from the referring expression separately. Note that our approach and the Iida 2010 approach achieved the same performance on REF_NON referring expressions. These referring expressions do not contain attribute information, so the two approaches used the same set of features for them.

Interestingly, the model that uses a word-as-classifier approach to learn the semantic compatibility between referring expressions and referents' attributes performs the worst. We believe that this poor performance stems mainly from the way it performs semantic composition. It cannot learn the structure of referring expressions. For example, it cannot learn that "2 dimensional" is a segment in which "dimensional" represents the type of the attribute and "2" is the value of the attribute. The word-as-classifier model cannot handle this kind of complex semantic composition.

We also calculated the combined accuracy over REF_ATT and REF_NON using a neural network model. The proposed approach had an accuracy of 61.6%, and the baseline approach using a lexicon had an accuracy of 51.3%.

The results reported above relied on learned semantics. We also ran experiments using manually labeled, gold-standard semantics of referring expressions. The results in Table 6-4 show that ranking-based models can achieve a considerably better result (73.6%) when given more accurate semantic information. Agreement between two human annotators was 85.3%, so the model performs very well. The semantics of whole utterances in situated dialogue also play a very important role in identifying a given referring expression's referent.
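The following short Python sketch makes the ranking step, the accuracy metric, and the combined accuracy concrete. It assumes two hypothetical data structures: a dictionary of candidate compatibility scores for each referring expression, and a set of gold referents for each expression. Using a set lets the multiple-referent rule apply directly. The combined accuracy is an average of the two subset accuracies, weighted by subset size. The 76 REF_NON cases are derived as 346 - 270. Plugging in the neural network rows of Table 6-3 reproduces the reported 61.6% and 51.3%.

```python
def select_referent(candidate_scores):
    """Ranking step: return the candidate with the highest compatibility score."""
    return max(candidate_scores, key=candidate_scores.get)


def accuracy(predictions, gold_referents):
    """Fraction of referring expressions linked to a correct referent.

    gold_referents[k] is the set of acceptable referents for expression k, so an
    expression with several referents counts as correct if any one is selected.
    """
    correct = sum(pred in gold for pred, gold in zip(predictions, gold_referents))
    return correct / len(gold_referents)


def combined_accuracy(acc_att, n_att, acc_non, n_non):
    """Accuracy over REF_ATT and REF_NON together, weighting each subset by its size."""
    return (acc_att * n_att + acc_non * n_non) / (n_att + n_non)


# Hypothetical scores for two referring expressions.
predictions = [
    select_referent({"zipCode": 0.92, "digits": 0.31}),
    select_referent({"for_statement": 0.40, "digits": 0.75}),
]
print(predictions)                                                   # ['zipCode', 'digits']
print(accuracy(predictions, [{"zipCode"}, {"digits", "zipcode"}]))  # 1.0

# Neural network rows of Table 6-3: 270 REF_ATT and 346 - 270 = 76 REF_NON cases.
print(round(combined_accuracy(0.685, 270, 0.373, 76), 3))  # Li approach: 0.616
print(round(combined_accuracy(0.552, 270, 0.373, 76), 3))  # Iida 2010: 0.513
```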

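Table 6-3. Reference resolution results.
Experimental condition | $f(R_k, o_i)$ classifier | Accuracy, REF_ATT | Accuracy, REF_NON
Iida 2010 | LR | 0.500 | 0.440
Iida 2010 | DT | 0.537 | 0.453
Iida 2010 | NB | 0.466 | 0.413
Iida 2010 | NN | 0.552 | 0.373
Kennington 2015 | LR | 0.4627 | 0.3867
Kennington 2015 | DT | 0.3769 | 0.3333
Kennington 2015 | NB | 0.3209 | 0.4000
Kennington 2015 | NN | 0.4216 | 0.4000
Li approach | LR | 0.631 | 0.440
Li approach | DT | 0.631 | 0.453
Li approach | NB | 0.493 | 0.413
Li approach | NN | 0.685 | 0.373

Table 6-4. Reference resolution results with gold semantic labels.
Model | Accuracy, REF_ATT | Accuracy, REF_NON
LR + SEM_gold | 0.684 | 0.429
DT + SEM_gold | 0.643 | 0.429
NB + SEM_gold | 0.511 | 0.377
NN + SEM_gold | 0.736 | 0.325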

CHAPTER 7
TUTORIAL DIALOGUE SYSTEM FOR JAVA PROGRAMMING WITH SUPERVISED REFERENCE RESOLUTION

This chapter presents an end-to-end tutorial dialogue system for Java programming that implements real-time reference resolution. As discussed in the literature review in Chapter 2, most existing task-oriented dialogue systems are designed to interact with users in highly constrained domains (Wen et al. 2016; Strik et al. 1997). Some of these systems do not need reference resolution at all, because the domain is simple (Wen et al. 2016). Others perform reference resolution with very simple approaches, such as keyword matching and a domain-specific lexicon (Vanlehn et al. 2002). Unlike the constrained domains in which previous dialogue systems operate, this dissertation focuses on the domain of Java programming tutoring. In this domain, tutorial dialogues frequently mention objects in the Java program in question, and the dialogues are characterized by situated features that pertain to the programming task. Some user utterances refer to general Java knowledge. In these cases, the system can interpret a user's request semantically by mapping it to a domain-specific ontology (e.g., Dzikovska et al. 2007). In contrast, many utterances refer to concrete entities within the user-created programming artifact, which changes dynamically. Identifying these entities correctly is crucial for understanding a user's utterance in the specific programming context, and then for generating specific tutorial dialogue moves.

This chapter presents a natural language tutorial dialogue system for Java programming that implements real-time reference resolution for natural language understanding. This dialogue system tracks user intention and the world state to provide a task-related context for understanding user utterances and generating system dialogue acts. Here, user intention means the current subproblem that the user is focusing on, such as "creating an integer array to store the 5 digits of a zip code". World state means the steps completed toward the solution of a programming problem. The tutorial dialogue system software has three parts: a user interface module (UI), a database module, and an agent module. The
architecture of the whole system is illustrated in Figure 7-1. The UI module is an Eclipse plugin, which provides an integrated development environment for Java programming. The database module logs the data generated when a user interacts with the tutorial dialogue system. The agent module implements all of the machine learning functionality of the dialogue system. The UI module and the agent module follow a client-server architecture and communicate through sockets. This architecture lets us implement the UI and the agent in different programming languages, each chosen to best serve the requirements of its module. The UI module captures user utterances as well as the user's programming actions, and sends them to the agent module. The agent module processes these user inputs and generates appropriate system utterances. All of the data generated in this process are logged into the database.

Figure 7-1. Architecture of the tutorial dialogue system.

To evaluate how different reference resolution approaches affect the performance of the dialogue system, I implemented two different reference resolution approaches. One of the reference resolution modules used learned semantics from a CRF-based approach, which is my novel reference resolution approach as described in Chapter 6. The other
reference resolution module is used for comparison. It uses a recent state-of-the-art approach that relies upon a manually created domain-specific lexicon. Both approaches use contextual information for reference resolution, including user behavior history and dialogue history. Recall that "user behavior history" in this tutorial dialogue system means the editing actions conducted by the user, and "dialogue history" means the objects that were mentioned previously in the tutorial dialogue. By comparing the system's performance with the two different reference resolution models, we can assess the impact of an improved reference resolution approach within a real-time dialogue system.

Section 7.1 describes the functionality and implementation of the user interface module. Section 7.2 defines the boundaries of the dialogue system's capabilities, i.e., what functions this system is able to perform. Section 7.3 introduces the architecture of the dialogue system. Section 7.4 describes the approaches used to implement user utterance understanding in this system. Section 7.5 describes the implementation of the dialogue manager module. Section 7.6 presents the domain knowledge encoded in this dialogue system. Section 7.7 describes how system utterances are generated.

7.1 User Interface

The user interface is illustrated in Figure 7-2. This user interface is embedded in Eclipse, a widely used integrated development environment (IDE) for Java programming. The user interface has two panes, a log on/off pane and a dialogue pane. The log on/off pane displays the user's log on/off status. Users log in to the dialogue system with their Google accounts, and this user information is used to distinguish different tutorial sessions. The dialogue pane displays the tutorial dialogue between a user and the dialogue system.

When a user logs in to the dialogue system in the log on/off pane, the system greets the user and starts a tutoring session for Java programming. The user can talk to the dialogue system in the dialogue pane using text messages. In addition, the UI module implements a set of listeners in Eclipse to capture the user's programming actions, including
source code editing, source code selection, file opening, file closing, and file creation. All of the user utterances and programming actions are sent to the agent module as inputs to the tutorial dialogue system. These data are also logged into a local database for further analysis.

Figure 7-2. User interface of the dialogue system.

7.2 System Functionalities

Today's state-of-the-art task-oriented dialogue systems are still far from conducting natural language dialogue with a human user the way a human speaker can. These systems are limited in their ability to handle a conversation across various topics and granularities. Thus, task-oriented dialogue systems usually operate only in a specific domain, such as employee information queries in a company (Corbin et al. 2015) or restaurant information requests (Wen et al. 2016).

Before building a task-oriented dialogue system, we need to clearly define the system's functional boundaries. We need to define the topics on which the system
will be able to hold a reasonable conversation with the user, and how the system should handle off-topic user utterances. This lets us give users reasonable expectations of the system's functionality.

My system can hold a reasonable conversation with a human user and help the user complete a Java programming problem. I group its key functionalities into the following types:

- Properly start and end a conversation with a human user. To conduct a conversation, the dialogue system greets the user to draw the user's attention and prepare to start. When the session is over, the system closes the conversation.
- Understand and properly respond to a user utterance about program progress. The knowledge base of this dialogue system includes knowledge about the programming problem, which is modeled as a tree structure, as shown in Figure 7-5. To complete a task, the user needs to complete the set of subtasks that are the children of the current task in the tree. When the user is confused about the current task, the system uses this structure to help the user break it down into smaller subtasks that are easier to work with.
- Understand and properly respond to a user utterance about basic Java concepts. The system understands user utterances about basic Java knowledge in the programming context, such as how to create an array, and provides a proper response.
- Detect the user's off-topic utterances and respond to them. During an interaction, a user's utterance may be off topic. However, the system is only designed to hold a natural language conversation about a specific Java programming problem. The system attempts to detect such user utterances and responds with the goal of refocusing on the programming problem.
- Monitor the user's programming actions and generate proper system utterances. This dialogue system is mixed-initiative, which means that either the user or the dialogue system can start a conversation. The system is designed to detect moments when users may need hints from the system.
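The following small Python data structure sketches the tree-structured view of the programming problem behind the program-progress functionality above. Some names are hypothetical: only `PostalFrame`, `extractDigits()`, and `createAndInitStringZip` appear in Figure 7-5, and `createDigitsArray` and `fillDigitsArray` are made up. The sketch also assumes that an inner task is complete exactly when all of its children are complete. That reading comes from the description above, not from the system's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskNode:
    """One (sub)task in the solution tree; leaves are marked done directly."""
    name: str
    children: List["TaskNode"] = field(default_factory=list)
    done: bool = False

    def is_complete(self) -> bool:
        # Assumption: an inner task is complete once all of its child subtasks are.
        if self.children:
            return all(child.is_complete() for child in self.children)
        return self.done

    def pending_subtasks(self) -> List[str]:
        # Smaller subtasks the system could suggest when the user is stuck here.
        return [child.name for child in self.children if not child.is_complete()]


# Hypothetical fragment of the tree for the PostalFrame task.
extract = TaskNode("extractDigits()", [
    TaskNode("createAndInitStringZip", done=True),
    TaskNode("createDigitsArray"),
    TaskNode("fillDigitsArray"),
])
root = TaskNode("PostalFrame", [extract])
print(extract.pending_subtasks())  # ['createDigitsArray', 'fillDigitsArray']
print(root.is_complete())          # False
```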
7.3 Architecture of the Dialogue Agent

Following a typical dialogue system architecture, the dialogue system (the agent module) has four main modules, as shown in Figure 7-3. The natural language understanding (NLU) module performs reference resolution, topic classification, and dialogue act

[Figure 7-3, image not reproduced. The figure traces the user utterance "Is my for loop correct?" through the agent: the referring expression "my for loop" is resolved to the for statement for (int i=0; i<=5; i++), and the system replies, "You will specify the end condition of the for loop, which tells the loop to stop. In this case, you can set the index i to <= 4."]

[…] 7-3. The inputs to the NLU module are textual user utterances, the current progress of the programming task, and the current user intention. The output of the NLU module includes the user's referents in the current utterance, the dialogue act of the current user utterance, and the semantics of the user utterance. This section describes the implementation of the submodules of the natural language understanding module.

7.4.1 Reference Resolution

As discussed in Chapter 2, the objects a user perceives in a situated environment have perceived affordances that suggest likely user actions. For example, a key suggests the action of "opening a door". In a Java programming problem, the user's perceived referent can also suggest possible actions. For example, when a user mentions a two-dimensional array, the most likely associated action may be "ask how to create a two-dimensional array". I use a data-driven approach to discover the relationships between the mentioned objects and the suggested actions. Reference resolution is also essential for understanding a user utterance in context. For example, suppose the Java programming problem asks the user to create an integer array called "zipCode", and the user says "I don't know how to create zipCode." We need to find the referent of "zipCode" in the Java code and infer that the user is asking "how to create an integer array". We can then form a query to the knowledge base to request an answer.

Two different reference resolution approaches were implemented in this dialogue system for the purpose of comparison. Version 1 implements reference resolution using our approach, the learned semantics described in Chapter 6. For comparison, I created a baseline reference resolution module that uses the same approach as version 1, with one difference. It represents the semantics of referring expressions with a manually defined lexicon instead of semantics learned by a CRF-based approach.

In Chapter 4, we presented our approach to referring expression extraction, which extracts all noun phrases in a user utterance. Not all of these noun phrases refer to objects in the parallel Java program, so we identify referring expressions among these
noun phrases. In this tutorial dialogue system, we first apply a set of rules to filter the extracted noun phrases. Recall that our reference resolution approaches calculate a compatibility probability for each pair of a referring expression and a candidate:

$$\text{compatibility probability} = f(\text{referring expression}, \text{candidate}_i)$$

where $\text{candidate}_i$ is the $i$th candidate referent in the candidate list for the referring expression in question. The candidate with the highest compatibility probability is chosen as the referent. We also use the compatibility probability generated by the reference resolution module to decide whether a noun phrase refers to an object in the Java program. Any noun phrase with a compatibility of 0.90 or higher with any of its candidates, $f(\text{noun phrase}, \text{candidate}_i) \ge 0.90$, was taken as a referring expression.

7.4.2 Dialogue Act Classification

Dialogue acts are specialized speech acts that model the illocutionary force of utterances (Austin 1962). An illocutionary act indicates the speaker's intention rather than the utterance's surface meaning. For example, suppose a customer in a restaurant asks a waiter, "Do you have salt?" On the surface, the utterance is a question about whether the waiter has salt. The illocutionary act of this utterance, however, conveys the customer's request for some salt.

For dialogue act classification, I use a maximum entropy model with three types of features: word unigrams, bigrams, and trigrams from each user utterance. I use the annotation schema proposed by Can (Can 2016); the tag set is shown in Table 7-1. The model is trained on 4857 utterances from the Ripple corpus (Boyer et al. 2010) that are labeled with dialogue act tags. The classification accuracy of the trained model was 73.6%.

7.4.3 Topic Classification

Dialogue acts represent categories of utterance-level intentions, which are abstract. In some cases, knowing the dialogue act of a user utterance is enough for the system to generate a reasonable response, such as a greeting dialogue act from
a user. In other cases, such as responding to a user's question, the dialogue system needs to query the knowledge base. To recognize the topics of user utterances, we trained a topic classifier. The topics it recognizes are listed in Table 7-2. The topic classifier was also implemented as a maximum entropy model, with word unigrams through trigrams of user utterances as features. We manually selected 492 utterances from the Ripple corpus, tagged them with topic labels, and used them as the training set for the topic classifier. The accuracy of the classifier was 63.7%.

Table 7-1. Dialogue act set.
Dialogue act tag | Explanation | Sample utterance
Question (Q) | A general question about the task | what would be the best way to do that?
Evaluation Question (EQ) | An evaluation question about the task | isn't that also declared in the same place?
Statement (S) | A statement of a fact | I was trying to figure out the best way to do that
Grounding (G) | Acknowledgement of a previous utterance | fair enough
Extra-Domain (EX) | Any utterance that is not related to the task | I'm not very good at Java yet
Positive Feedback (PF) | Positive assessment of knowledge or task | yea it's a string
Negative Feedback (NF) | Negative assessment of knowledge or task | i really don't see the point much of this loop really
Lukewarm Feedback (LF) | Assessment having both positive and negative aspects | kind of
Greeting (GRE) | Greetings | hello

7.5 Dialogue Manager

The dialogue manager takes as inputs the user's dialogue acts, the user's actions, and the recognized topics of user utterances. It selects a system response according to these inputs and recognizes user intention. For example, when a user says "Hi", the dialogue act classifier labels this utterance as GRE, a greeting dialogue act. The dialogue manager then generates a system dialogue act GRE, which is passed to the utterance generation module to
instantiate it as a system utterance such as "hi" or "hello". In another example, a user may say, "How do I create an array to hold the zip code digits?" A set of rules was authored for the dialogue manager to generate a system response.

Table 7-2. Topics recognized by the topic classifier.
Topic | Explanation | Sample utterance
GET_SUBSTRING | the way to get a substring from a string | okay so should it be zipString.substring(i,i+1)?
GET_ZIP_DIGITS | the way to extract a single digit from a zip code | how do I extract individual digits
CONVERT_ZIPCODE_TO_STR | convert the variable zipCode to a string | Can't manually turn an integer into a string?
CREATE_FOR_LOOP | the way to create a for statement | what are the three things we need for a loop?
USE_A_LOOP | necessity of using a for statement | would a for loop be best?
STORE_ZIP_DIGITS | the way to store digits | with an array?
PROGRESS | about the progress | how do i start the extractDigits method?
STRING_2_INT | convert a string to an integer | Integer.parseString()?
CREATE_DIGITS_ARRAY | the syntax to create an array for the digits | which is 5, correct? or does it depend?
DECLARE_ARRAY | the syntax to declare an array | How do we declare an array?
INPUT_ZIPCODE | get the input zip code | so i need something telling it to get zipcode?
CHAR_2_INT | convert a character to an integer | can I parse a character to an int
HOW_TO_RUN | the way to run the program | how to run it?
AM_I_RIGHT | request to check the user's code | does that make sense?
OOD | out-of-domain topics | Meh, this [key is stuck.

User intention indicates the subtask that the user is working on, and it gives the system essential contextual information about the dialogue. As discussed in the related work section, user intention can dramatically constrain the possible interpretations of user utterances. In this dialogue system, the current user intention is used to divide the Java programming problem into sub-domains. The whole Java programming problem is a domain for the tutorial dialogue system, and each subtask of the programming problem forms a sub-domain. In this way, the dialogue system can be seen as a combination of smaller dialogue systems. For each subtask, we focus on a sub-domain that is much smaller than the domain of the whole programming problem.

To identify user intention in the domain of Java programming, we need to understand the user's Java source code. A programming task can often be solved in multiple ways. If we imagine each step in the solution as a node in a graph, there are multiple paths to follow.

The first step in understanding the user's program is syntax parsing, so that we know which types of variables were declared, which variables were assigned, and so on. This information helps us identify which step the user is working on. For example, creating an integer array at the beginning of the "extractDigits()" method indicates that the user is creating an array to hold the 5 digits of a zip code.

We created a rule-based algorithm that interprets the user's Java program by mapping each line of the program onto a step in the solution. The intention identifier uses 96 rules. An example of user intention identification is shown in Figure 7-4.

Figure 7-4. User intention identification example. (The figure shows the extractDigits() method of the PostalFrame class, with each line annotated with its identified solution step; image not reproduced.)
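The following sketch expresses the rule-based mapping from source lines to solution steps as an ordered list of regular-expression rules, where the first rule that matches a line determines its step. The five rules and their step labels are hypothetical stand-ins for the system's 96 rules. They also match raw lines rather than a syntax parse, which simplifies the approach described above. The function `current_intention` is likewise hypothetical: it assumes that the step of the most recently edited line is the user's current intention. The example lines follow the extractDigits() code shown in Figure 7-4.

```python
import re

# Hypothetical rules: (pattern over one line of Java source, solution step).
INTENTION_RULES = [
    (re.compile(r'String\s+\w+\s*=\s*zipCode\s*\+\s*""'), "CREATE_STRING_ZIP"),
    (re.compile(r"int\s*\[\]\s*\w+\s*=\s*new\s+int\s*\[\s*5\s*\]"), "CREATE_DIGITS_ARRAY"),
    (re.compile(r"\bfor\s*\("), "CREATE_FOR_LOOP"),
    (re.compile(r"\w+\s*\[\s*\w+\s*\]\s*=.*\.charAt\("), "STORE_ZIP_DIGITS"),
    (re.compile(r"\breturn\s+\w+\s*;"), "RETURN_DIGITS"),
]


def identify_steps(source):
    """Map each line of Java source to the first solution step whose rule matches."""
    steps = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern, step in INTENTION_RULES:
            if pattern.search(line):
                steps.append((line_no, step))
                break  # first matching rule wins
    return steps


def current_intention(source, edited_line):
    """Hypothetical: the step of the most recently edited line is the current intention."""
    return dict(identify_steps(source)).get(edited_line)


java = """String zipcode = zipCode + "";
int[] digits = new int[5];
for (int i = 0; i < 5; i++) {
    digits[i] = zipcode.charAt(i) - '0';
}
return digits;"""
print(identify_steps(java))
# [(1, 'CREATE_STRING_ZIP'), (2, 'CREATE_DIGITS_ARRAY'), (3, 'CREATE_FOR_LOOP'),
#  (4, 'STORE_ZIP_DIGITS'), (6, 'RETURN_DIGITS')]
```

Rule order matters in this design, because a line is assigned to the first step whose pattern matches. More specific patterns should therefore come before general ones.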

Figure 7-5. Structure of the programming task. (Nodes shown include PostalFrame, extractDigits(), calcAndDrawCDigits(), drawZipCode(), and createAndInitStringZip; image not reproduced.)

7.6 Knowledge Base

To support a reasonable dialogue, the knowledge base contains three types of knowledge: the subtask structure of the programming problem, knowledge about Java language features, and the knowledge needed to solve the programming problem.

The solution to the programming problem is defined as a tree structure, as shown in Figure 7-5. The root of the tree is the whole programming task, and each node in the tree is a subtask. In this tutorial dialogue system, the whole programming task is to complete a method in a Java class called PostalFrame, which translates a five-digit zip code into a bar code. Completing the method "extractDigits()" is itself a subtask with several smaller subtasks. With this tree structure, we can understand the user's progress and provide hints when the user has questions about the current subtask. As described in Section 7.5, we use a rule-based algorithm to map each line of a user's program to a node in this tree structure. This gives us very important contextual information for the dialogue.

7.7 System Utterance Generation

A set of 99 system utterances was authored for the dialogue manager to select from. Table 7-3 shows sample system responses to user questions on different topics. For each topic, we created multiple system responses with different levels of detail. When the system detects that a user is asking a similar question again, it gives a new response with a more detailed explanation.
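The following hypothetical Python sketch shows how the escalation from brief to more detailed responses could work as a per-topic counter. It makes two assumptions. First, "a similar question" is taken to mean a question with the same topic label from the topic classifier. Second, once the most detailed response has been used, the selector keeps repeating it. The first CREATE_FOR_LOOP response is taken from Table 7-3; the second is a placeholder.

```python
from collections import defaultdict


class ResponseSelector:
    """Pick a progressively more detailed response each time a topic recurs."""

    def __init__(self, responses):
        # responses: topic -> list of responses ordered from least to most detailed
        self.responses = responses
        self.times_asked = defaultdict(int)

    def respond(self, topic):
        options = self.responses[topic]
        # Assumption: after the most detailed response, keep repeating it.
        level = min(self.times_asked[topic], len(options) - 1)
        self.times_asked[topic] += 1
        return options[level]


selector = ResponseSelector({
    "CREATE_FOR_LOOP": [
        "A for loop takes the form: for(start condition; finish condition; increment statement)",
        "(placeholder) A more detailed explanation with a worked example.",
    ],
})
for _ in range(3):
    print(selector.respond("CREATE_FOR_LOOP"))
```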

Table 7-3. Sample system response utterances.
Topic of question | System response
GET_SUBSTRING | 'To get a substring of a string variable, the syntax is stringVariable.substring(start, end+1)'
GET_ZIP_DIGITS | 'There are several ways to break an int apart into its individual digits...
CONVERT_ZIPCODE_TO_STR | 'You can use the syntax intVariable + "" to convert an integer to a String variable.'
CREATE_FOR_LOOP | 'A for loop takes the form: for(start condition; finish condition; increment statement)'
USE_A_LOOP | 'We can start with a for loop. It should loop through the zip code to get out each individual digit.'
STORE_ZIP_DIGITS | 'You need an int array to hold the 5 digits of a zip code.'
STRING_2_INT | 'To cast a string of a number into an integer, you can use the parseInt method: ...
CREATE_DIGITS_ARRAY | 'The syntax to create an array is type[] arrayName.'
DECLARE_ARRAY | 'For example, you can do int[] digits = ...'
INPUT_ZIPCODE | 'When the program is run, the user types in a zipCode ...'
CHAR_2_INT | 'To convert a char of a digit into an integer, you can do char_digit - '0'.'
HOW_TO_RUN | 'To run a program in Eclipse, you can right click...
OOD | 'I can help you with many aspects of this project, but I might not...

CHAPTER 8
EVALUATION OF THE DIALOGUE SYSTEM

This chapter describes a human user study that evaluates the novel reference resolution approach in the implemented tutorial dialogue system and compares it with a baseline approach. The tutorial dialogue system whose reference resolution is based on learned semantics is denoted System_Li; this is the treatment condition. The baseline tutorial dialogue system, whose reference resolution uses a manually created lexicon, is denoted System_Comparison; this is the comparison condition. As mentioned in Chapter 6, the baseline model was adopted from Iida et al. (2010).

The goals of the user study are twofold. First, we evaluate the user satisfaction and user engagement of the two dialogue systems by analyzing study participants' post-survey data. Second, we investigate the accuracy of the two reference resolution approaches in System_Li and System_Comparison. To do this, we manually examined the natural language input that users provided and rated whether the system properly identified the referent(s) in that input.

In this chapter, the first section introduces the user study procedure and briefly describes the collected data. In the second section, a hypothesis test compares the user satisfaction and user engagement of the two dialogue systems. Finally, the third section compares the reference resolution performance of the two dialogue systems.

8.1 Proposed Hypotheses

This dissertation focuses on three hypotheses.

Hypothesis 1. System_Li will outperform System_Comparison on accuracy of reference resolution.
In Chapter 6, we compared two reference resolution approaches with offline evaluation, in which the referring expressions were tagged manually. In an online dialogue system, the system automatically extracts referring expressions while conversing with a human user. The accuracy of referring expression extraction therefore also plays a key role in the reference resolution pipeline. We would like to examine whether the reference resolution approach with learned semantics still achieves higher accuracy in such an online dialogue system, where referring expressions are noisy.

Hypothesis 2. System_Li will offer higher user satisfaction than System_Comparison.
The goal of this tutorial dialogue system is to tutor college students in Java programming. I would like to know how satisfied the human subjects are while using the proposed dialogue system, and to compare user satisfaction between the two conditions. I expect the Li approach to achieve higher reference resolution accuracy than the comparison approach in a real-time dialogue system. This will probably improve the treatment condition's understanding of user utterances, so the system will generate more reasonable responses. My hypothesis is therefore that the treatment condition yields higher user satisfaction than the comparison condition. I will measure students' satisfaction through their self-reported satisfaction in the post-survey.

Hypothesis 3. System_Li will provide higher user engagement than System_Comparison.
User engagement is another widely used metric for evaluating a dialogue system (Sidner et al. 2005). It measures how frequently the human user talks to the dialogue system. We would like to examine whether users engage more with System_Li than with System_Comparison. As discussed under Hypothesis 2, the treatment condition will probably generate more reasonable system responses, which will likely increase user engagement.

8.2 User Study

This section introduces the procedure of the user study and the collected data.

8.2.1 Participants

Student participants were recruited from an undergraduate introductory Java programming class, COP3502 "Programming Fundamentals I", at the University of Florida in the Spring 2018 semester. Students participated in this study voluntarily and were compensated with a small amount of course credit. This study had 43 participants in total, two of whom took part in a pilot study. During the pilot study, we talked to the participants to gather feedback for improving the system before the subsequent study sessions. Data were collected for all 43 sessions. However, only the data from the remaining 41 sessions were analyzed to address the research questions, because our communication with the pilot participants may have influenced their sessions.
8.2.2 Java Programming Task for the Study

The study adopted a Java programming task that was previously used in another research study on dialogue act modeling in task-oriented tutorial dialogue (Boyer 2010).
The programming task was designed for undergraduate students in an introductory Java programming course. It exercised the "for" statement, arrays, and String concepts. We provided a partially implemented Java program that took a 5-digit zip code as input and converted it into a postal bar code. When a user ran this program, it opened a graphical user interface (GUI) that prompted for a zip code. When the input was entered, the GUI converted it into a bar code and displayed it. To do so, the program separated the five digits of the input integer and converted each digit into a bar code. The only missing method in the provided program was "extractDigits()". This method took an integer zip code as input and returned an integer array containing the five separated digits. A task description was provided to each participant at the beginning of each study session; it can be found in Figures 8-1 and 8-2.

8.2.3 Procedure

For recruitment, we gave a short speech in the COP3502 class introducing the research study, and collected student volunteers' contact information through a Google form. We then collected the volunteers' availability with a Doodle poll and assigned student participants to study sessions accordingly.

The two dialogue systems were installed on 12 numbered laptops owned by the LearnDialogue group. System_Li was installed on the odd-numbered laptops, and System_Comparison was installed on the even-numbered laptops. In each study session, we prepared a similar number of laptops from each group, so that System_Li and System_Comparison were used by similar numbers of participants (the two numbers could not be equal when a session had an odd number of participants).

On arrival, students were seated randomly. They were given consent forms and short instructions for the study. The instructions introduced the goal of the study and included the task description mentioned in the previous subsection. We used the
written instructions to maintain consistency across all of the study sessions. After reading the consent form and the instructions, participants were asked to complete a pre-survey about their attitudes toward programming.

The goal of this study is to evaluate an intelligent tutoring system (ITS). This ITS provides conversational assistance to students during Java programming. It is important to note that this is an experimental system, which is one of the few research projects that attempt to implement a dialogue system for a complex domain like Java programming. The system may fail to answer some of your questions. It is important to keep in mind that the goal of this study is to evaluate the dialogue system. This system is designed to assist you through Java programming problems. You can ask questions about the programming problem, such as: "Is it correct?", "What should I do next?", "Where should I start from?". You can also ask questions such as "Is my for loop correct?" or "How to declare an array?"

While you are interacting with the conversational agent, you will be working on the following problem.

Postal Bar Codes

The Problem: For faster sorting of letters, the United States Postal Service encourages companies that send large volumes of mail to use a bar code denoting the ZIP code. Using the skeleton GUI program provided for you, you will complete this lab with code to actually generate the bar code for a given zip code.

More About Bar Codes: In postal bar codes, there is a full height frame bar on each end (and these are drawn automatically by the program provided for you; you don't have to write code to draw these). Each of the five encoded digits is represented by five bars. The five encoded digits are followed by a correction digit.

Figure 8-1. A short instruction with the task description.

The Correction Digit The correction digit is computed as follows: Add up all digits, and choose the correct digit to make the sum a multiple of 1 0. For example, the ZIP code 95014 has sum of digits 19, so the correction digit is 1 to make the sum equal to 2 0. What's Already Written? You can see what parts of this program are already written by running the file Main java When you do, you should see output like the image below, with a blank zip code slot You can enter a zip code, and you should see that no bar code is generated (except the first and last full bars which are required for all bar codes) What's Your Task? Your job is to extract this five digit zip code from user's. The PostalFrame class is the one which handles this task The only method which you must complete is : extractDigits() For extractDigits(), you will need to create a variable in the method which stores the zip code as separate digits Some Helpful Information If you can't remember how to do something with the software, please refer to the reference sheet on your desk Figure8-2. Ashortinstructionwiththetaskdescription. Afterthepre-survey,theparticipantshad40minutestoworkontheprogramming taskwiththeassistanceofthetutorialdialoguesystemstheywereassignedto. Whenparticipantsnishedtheprogrammingtaskor40minuteshadpassed,they weregivenapost-surveytoevaluatethesystemusabilityanduserengagement. 8.2.4DataCollection Duringtheuserstudy,wecollectedusers'pre-surveyandpost-surveyresults.The pre-surveyfocusedonstudents'attitudetowardprogramming,includingwhetherthey viewedprogrammingasanimportantskill,aswellasself-reportedprogrammingskill 83

evaluation. The survey can be found in Appendix A. The post-survey included two parts. The first part was the widely used System Usability Scale (SUS) survey Bangor et al. (2008), which assesses the usability of a system. The SUS survey contained 10 questions that reflected users' evaluation of a system's usability. The score of each completed SUS survey ranged from 0 to 100, and a higher score indicated better usability. Bangor et al. calculated the mean SUS score of nearly 3500 surveys in their past 273 studies, which suggested that a system with a SUS score above 70 had better-than-average usability Bangor et al. (2009). The second part was the User Engagement Scale (UES) survey, which contained 30 questions. The complete post-survey can be found in Appendix B.

Besides survey data, I also collected the textual dialogues between participants and the dialogue systems. An excerpt dialogue is shown in Table 8-1. In the 41 sessions, 2641 utterances were collected, 1292 of which were user utterances. There were 64.4 utterances on average in each session, 31.5 of which were user utterances. In these 41 sessions, the maximum number of utterances was 154, and the minimum was 17. The maximum number of user utterances was 76, and the minimum was 8. Ultimately, 22 student participants used System Li, and 19 participants used System Comparison.

Users' programming actions were logged into a local database. These actions include login, logout, typing, and selecting. The actions were saved in the database as JSON strings. The format of a user action is illustrated in Table 8-2. For example, for a typing action, the timestamp of the action, the added text, and the line where the action happened were logged into the database for further analysis.

The reference resolution actions performed by the dialogue system were also saved into the database. Each record contained a referring expression, its semantic segments, the candidate list, and the compatibility probability of each candidate.
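
The following sketch illustrates how such logged records could be read back for analysis. It decodes one action in the shape of Table 8-2 using Python's standard json module. The key names come from Table 8-2. Two things are assumptions for illustration: that the stored string is exactly this JSON object, and the helper names summarize_action and filter_by_type.

```python
import json
from typing import List

# Example record shaped after Table 8-2. The key names are copied from the
# table; the exact JSON schema used by the logger is an assumption.
RAW_ACTION = json.dumps({
    "type": "TYPING",
    "timestamp": 1523261514283,
    "addedText": "int",
    "lineNum": 79,
    "offset": 1918,
    "currentCode": "import javax.swing.*;\nimport ...",
    "referents": [],
    "isRRResult": False,
    "prob": 0.0,
    "affectedLine": "\tint[] digits = int",
})


def summarize_action(raw: str) -> dict:
    """Decode one logged action and keep the fields used in later analysis."""
    action = json.loads(raw)
    return {
        "type": action["type"],
        "timestamp": action["timestamp"],
        "line": action.get("lineNum"),
        "added_text": action.get("addedText", ""),
        # Typing actions carry isRRResult = False; reference resolution
        # events (Table 8-3) carry isRRResult = true.
        "is_rr_event": bool(action.get("isRRResult", False)),
    }


def filter_by_type(raw_records: List[str], action_type: str) -> List[dict]:
    """Keep only the decoded records whose "type" equals action_type."""
    return [r for r in map(summarize_action, raw_records) if r["type"] == action_type]


if __name__ == "__main__":
    print(summarize_action(RAW_ACTION))
```

Decoding each record into a flat summary keeps the analysis code independent of the large currentCode field, which stores a snapshot of the whole source file.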

Table 8-1. An excerpt dialogue between a user and the VirtualTA.
VirtualTA: Hi, I'm your virtual TA. I will be helping you on this Java project.
User: does extratDigits method input anything?
VirtualTA: This Java program converts a zip code for US mail into a visual bar code. When you run the program, it asks the user to enter a zip code. The program will then convert it into a bar code and display it. But it doesn't work yet! To make it work, your task is to complete the extractDigits method. It needs to convert an integer zip code into an array of individual digits, and return that array. Hint: Making the int into a String first is a convenient way to get its individual digits.
VirtualTA: The extractDigits method is located in the PostalFrame class, which should have been opened for you. You are welcome to view the other code, but you only need to work on the extractDigits method.
User: what do I need to do in the extractDigits method?
VirtualTA: You will extract the five digits of a zip code. Now we need to get each of the 5 digits of that int. Using a string and then getting the individual characters from it is a good way to do that.
User: where is teh zip code
VirtualTA: When the program is run, the user types in a zipCode. The code to get the zipCode from the user is already complete. The zip code the user entered is stored as a private variable, int zipCode on line 34 in PostalFrame.
User: how to parse a string into an array
VirtualTA: You can use the syntax intVariable + "" to convert an integer to a String variable.
...

Table 8-2. An example user action saved in the database.
type: TYPING
timestamp: 1523261514283
addedText: int
lineNum: 79
offset: 1918
currentCode: "import javax.swing.*;\nimport ..."
referents: []
isRRResult: False
prob: 0.0
affectedLine: "\tint[] digits = int"

In total, 18775 records were collected throughout the 41 study sessions, of which 1486 were reference resolution events. An example reference resolution event is shown in Table 8-3. As shown in Table 8-3, "noun phrase" is the referring expression; the "candidates" field lists all of the candidates generated from the parallel Java program; the "probs" field lists
the compatibility probability between the referring expression and each of the candidates; "referent" is the system-selected referent; and "prob" is the compatibility probability between the referring expression and the selected referent.

Table 8-3. An example reference resolution event saved in the database.
{
noun phrase: charat method
candidates: [
{u'category': u'METHOD', u'line number': 81, u'name': u'charAT', ...}
{u'category': u'METHOD', u'line number': 40, u'name': u'PostalFrame', ...}
{u'category': u'METHOD', u'line number': 41, u'name': u'setSize', ...}
...]
probs: [0.9741117181861638, 0.00036208246341969553, 0.00036208246341969553, ...]
referent: {u'category': u'METHOD', u'line number': 81, u'name': u'charAT', ...}
prob: 0.974111718186
isRRResult: true
timestamp: 1523546180624
}

8.3 System Usability Evaluation

To evaluate the usability of the implemented dialogue systems with two different reference resolution approaches, student participants in the research study were asked to complete a post-survey. The post-survey contained an instrument widely used to assess system usability Bangor et al. (2008). The two groups' responses can be found in Tables B-1 and B-2 in Appendix B. The two systems had very close mean System Usability Scale (SUS) scores. The average SUS score of the 22 student participants who used System Li was 66.67, and the average SUS score of the 19 participants who used System Comparison was 68.77.

To interpret SUS scores, Bangor et al. argued that a system with a SUS score over 70 was acceptable, as shown in Figure 8-3. According to this interpretation, the systems implemented in this project fall in the marginal range of acceptability, but are very close to acceptable.
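
For reference, the standard SUS scoring rule turns ten responses into the 0-100 score discussed above. Appendix B shows that this study collected responses on a 0-10 scale. The conversion applied before computing the reported scores is not described in this section. The sketch below therefore assumes the standard 1-5 procedure, with hypothetical helper names. It relies on the standard alternation of positively worded (odd) and negatively worded (even) items, which matches the item order in Appendix B.

```python
from statistics import mean
from typing import List


def sus_score(responses: List[int]) -> float:
    """Standard SUS score for ten items answered on a 1-5 scale.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The summed contributions (0-40) are multiplied by 2.5 to give 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses in the range 1-5")
    total = sum((r - 1) if item % 2 == 1 else (5 - r)
                for item, r in enumerate(responses, start=1))
    return total * 2.5


def above_acceptability(scores: List[float], threshold: float = 70.0) -> bool:
    """True if a group's mean SUS score exceeds the acceptability threshold of 70."""
    return mean(scores) > threshold
```

With this rule, all-neutral answers give 50. Both group means reported above (66.67 and 68.77) fall just short of the threshold of 70.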

Figure 8-3. System usability score interpretation.

A further t-test showed no significant difference (p-value = 0.361) between the SUS scores of the two groups.

8.4 User Engagement Evaluation

Next, we examined our hypothesis about the two systems' user engagement. Besides the SUS, the post-survey also measured user engagement using the User Engagement Scale (UES) instrument Brien et al. (2018). This instrument included 30 questions. Participants who used System Li had an average UES score of 11.80, and students who used System Comparison had an average UES score of 12.27. A complete table of responses from the two groups can be found in Tables B-1 and B-2 in Appendix B. A t-test showed no significant difference between the two groups' UES scores (p-value = 0.236).

The number of user utterances also reflected users' engagement with the dialogue system. System Li had 30.8 user utterances per session on average, and System Comparison had 32.4. The difference between them was not significant (p-value = 0.382).

8.5 Online Reference Resolution Evaluation in Tutorial Dialogue Systems

In Chapter 6, we compared two reference resolution approaches with offline evaluation, in which we manually tagged the referring expressions and their referents in the parallel Java source code. In an online dialogue system, by contrast, the system automatically extracted referring expressions, generated candidates, and extracted features while having a conversation with a human user. Without human intervention, errors in one step could
propagate to later steps in the reference resolution pipeline. We wanted to examine whether the reference resolution approach with learned semantics still had a higher accuracy in such an online dialogue system.

In the proposal, we hypothesized that System Li would have higher reference resolution performance. To evaluate the two systems' reference resolution accuracy, we analyzed the logged reference resolution actions performed by the dialogue systems.

As mentioned in Section 8.2.4, all of the reference resolution events performed by the dialogue systems were logged in a local database. Each reference resolution event contained several fields:
- the referring expression;
- the candidate list;
- the compatibility probability for each candidate;
- the referent selected from the candidate list.

An excerpt reference resolution event is illustrated in Table 8-3.

The reference resolution events were manually evaluated to calculate accuracies for the two systems. As discussed earlier, the system selected referring expressions from the noun phrases in a user utterance. The process of reference resolution is illustrated in Figure 8-4. For a user utterance, the dialogue system first found all the noun phrases in the utterance. It then filtered the extracted noun phrases using a set of rules: noun phrases that could never be a referring expression for objects in the Java code, such as "you" and "me", were filtered out. Next, the system attempted to find a "referent" in the parallel Java source for each remaining noun phrase, as if they were all referring expressions. Finally, the system used the compatibility probability (shown as f in the figure) between each remaining noun phrase and its "referent" to decide which noun phrases were real referring expressions. The threshold for the compatibility probability was set empirically to 90%.

The researcher went through all of the reference resolution events that identified a referent with a compatibility probability of 90% or higher. There were 417 such reference resolution events, which was 28.1% of all the logged reference resolution events. System Li had 320 reference resolution events in this class, and System Comparison had 97.
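
The process described above can be sketched as follows, assuming noun phrase extraction has already happened. The steps are rule-based filtering, best-candidate matching, and thresholding. This is a minimal sketch, not the system's implementation. The stop list, the Candidate type, and the compatibility function are hypothetical stand-ins; the deployed systems computed compatibility with a trained classifier. Only the 0.90 threshold comes from the text. The toy scores in the usage example echo the probabilities shown in Figure 8-4.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stop list standing in for the rule-based filter. The text
# names "you" and "me" as examples; the full rule set is not given.
NON_REFERRING = {"i", "you", "me", "we", "us"}

# Compatibility threshold reported in the text (set empirically).
THRESHOLD = 0.90


@dataclass
class Candidate:
    name: str      # e.g. "actionPerformed"
    category: str  # e.g. "METHOD", "ARRAY"


def resolve_utterance(
    noun_phrases: List[str],
    candidates: List[Candidate],
    compatibility: Callable[[str, Candidate], float],
) -> List[Tuple[str, Candidate, float]]:
    """Return (noun phrase, referent, probability) for accepted referring expressions."""
    accepted = []
    for phrase in noun_phrases:
        # Rule-based filtering: drop phrases that can never refer to code.
        if phrase.lower() in NON_REFERRING or not candidates:
            continue
        # Treat the phrase as a referring expression and pick the most
        # compatible candidate from the parallel Java source.
        prob, referent = max(
            ((compatibility(phrase, c), c) for c in candidates),
            key=lambda pair: pair[0],
        )
        # Keep the phrase only if the best compatibility clears the threshold.
        if prob >= THRESHOLD:
            accepted.append((phrase, referent, prob))
    return accepted


if __name__ == "__main__":
    # Toy scores echoing Figure 8-4; a real system would use the trained classifier.
    scores = {("the actionPerformed method", "actionPerformed"): 0.97,
              ("an array", "table"): 0.58}
    cands = [Candidate("actionPerformed", "METHOD"), Candidate("table", "ARRAY")]
    f = lambda phrase, c: scores.get((phrase, c.name), 0.0)
    nps = ["I", "I", "the actionPerformed method", "an array"]
    for phrase, ref, p in resolve_utterance(nps, cands, f):
        print(f"{phrase!r} -> {ref.name} ({ref.category}), p = {p}")
```

Because the same probability both selects the referent and decides whether the phrase is a referring expression at all, a single score carries two decisions. The analysis later in this section identifies that coupling as the main weakness.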

[Figure 8-4 diagram. The example utterance is "I think I should start from the actionPerformed method by creating an array." The noun phrases "I", "I", "the actionPerformed method", and "an array" are extracted and matched against the parallel Java code (line 71: public void actionPerformed(){...}). "the actionPerformed method" is matched to the METHOD actionPerformed with Prob = 0.97, and "an array" is matched to the ARRAY table with Prob = 0.58.]
Figure 8-4. Reference resolution process in the dialogue system.

For each reference resolution event within this class, the identified referent was manually examined within the relevant programming context to determine whether the result was correct. System Li's reference resolution accuracy for this set of referring expressions was 21.6%, and System Comparison's was 19.6%.

Both systems had a much lower reference resolution accuracy on the selected referring expressions than their offline versions, which achieved 61.6% and 51.2%, respectively. The logged reference resolution results were closely examined to find the reasons, which may shed some light on building online reference resolution approaches in the future. To obtain a more accurate picture of the reference resolution performance of the dialogue system, I collected all of the reference resolution events, regardless of the compatibility probability. There were 1486 reference resolution events logged in
the 41 study sessions. Because of the way the system selected referring expressions, most of these logged events were performed by the system on noun phrases in order to identify referring expressions. I manually tagged 169 referring expressions in System Li's data and 158 referring expressions in System Comparison's data, and I also manually identified their referents in the Java code. On these 169 manually tagged referring expressions, System Li had a reference resolution accuracy of 63.3%. System Comparison had an accuracy of 44.9% on its 158 referring expressions. These accuracies were in line with the performance of the two reference resolution approaches in the offline setting.

It appears that the main reason for the poor performance of the online reference resolution approaches was inaccurate referring expression extraction. When extracting referring expressions from all of the recognized noun phrases in a user utterance, we combined a rule-based approach with the output of the classifier we used to calculate compatibility probabilities between referring expressions and their candidates. The intuition behind using this classifier was that when a noun phrase is compatible with an entity in the Java code, it is likely to be a referring expression. However, this combined approach did not work as expected in practice. If referring expressions cannot be accurately identified in user utterances, reference resolution accuracy suffers directly.

To further illustrate the reasons for the poor referring expression identification, we provide two examples of incorrect referring expression identification in Table 8-4 and Table 8-5. In Table 8-4, the noun phrase "a string" was not a referring expression, since it did not refer specifically to anything in the Java code. However, the user had just created a string in the Java code, called "scode". The noun phrase "a string" contained the attribute "VAR TYPE" with the value "string". Recall that the reference resolution approach takes semantic features, dialogue history features, and behavior history features as inputs. Since "scode" had just been created, the behavior history features suggested that "scode" had a high
probability of being the referent. Also, "scode" was a string variable, so "scode" had a high compatibility probability of 0.939 with the noun phrase "a string". This caused a false positive. Similarly, in the false negative example shown in Table 8-5, the noun phrase "the for loop" was a referring expression, and it referred to a for statement in the user's Java program. The referent itself was resolved correctly. However, the for statement had not been recently edited or mentioned, so the dialogue history and behavior history features yielded a low compatibility probability of 0.791, which is below the threshold.

From these negative examples, we found that compatibility alone is insufficient for identifying referring expressions. The lexical features of referring expressions and their enclosing utterances also play a key role in referring expression identification. These features should be considered when building a referring expression identification classifier.

Table 8-4. A false positive example of referring expression identification.
Utterance: "what to do next if I have a string of the zip code"
Noun Phrase: "a string"
Referent: {category: VARIABLE, line number: 76, name: scode, ...}
Probability: 0.939

Table 8-5. A false negative example of referring expression identification.
Utterance: "Is the for loop correct?"
Noun Phrase: "the for loop"
Referent: {category: STATEMENT FOR, line number: 78, name: for, ...}
Probability: 0.791
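
To make the recommendation concrete, the sketch below builds one feature vector per noun phrase. It combines the compatibility probability with a few lexical cues from the noun phrase and its utterance, and is applied here to the two examples in Tables 8-4 and 8-5. The specific cues (determiner definiteness, phrase length, question form) and the function name are hypothetical choices for illustration. The dissertation does not specify a feature set, and no classifier is trained here.

```python
from typing import Dict, Union

# Hypothetical lexical cues. The text recommends lexical features of the
# noun phrase and its utterance but does not specify which ones.
DEFINITE = {"the", "this", "that", "these", "those", "my"}
INDEFINITE = {"a", "an", "some", "any"}

Feature = Union[bool, float, int]


def identification_features(noun_phrase: str, utterance: str,
                            compatibility: float) -> Dict[str, Feature]:
    """Build one feature vector for a referring expression identification classifier."""
    tokens = noun_phrase.lower().split()
    first = tokens[0] if tokens else ""
    return {
        # Contextual signal already used by the deployed systems.
        "compatibility": compatibility,
        # Lexical signals: an indefinite phrase such as "a string" often
        # introduces a new entity rather than pointing to existing code.
        "definite": first in DEFINITE,
        "indefinite": first in INDEFINITE,
        "np_length": len(tokens),
        "utterance_is_question": utterance.strip().endswith("?"),
    }


if __name__ == "__main__":
    print(identification_features(
        "a string", "what to do next if I have a string of the zip code", 0.939))
    print(identification_features(
        "the for loop", "Is the for loop correct?", 0.791))
```

A classifier trained on such vectors could learn, for instance, that a definite phrase in a question is a likely referring expression even when its compatibility falls below 0.90.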

CHAPTER 9
DISCUSSION

This chapter discusses some of our observations in building the dialogue systems and conducting the user study.

9.1 Null Results

The previous chapter described the research study to evaluate the two implemented dialogue systems. We did not find significant results for the hypotheses on user satisfaction and user engagement. One reason could be the low accuracy of the online reference resolution approach, which was caused by the referring expression identification functionality.

Another reason could lie in the difference between human-computer dialogues and human-human dialogues. We compared the human-human dialogues in the Ripple corpus with the human-computer dialogues collected in this project by manually annotating the number of utterances and the number of referring events in each session. As shown in Table 9-1, the average number of utterances in one session of the Ripple corpus was 130.2, whereas in the human-computer dialogues we collected it was much lower, 64.4. In the Ripple corpus, each session lasted about 50-55 minutes, while in the study conducted in this project, each session lasted about 40 minutes. Even allowing for the shorter sessions, the two kinds of dialogues differ greatly in utterance frequency (roughly 2.4-2.6 versus 1.6 utterances per minute). In addition, the human-human dialogues had 0.44 referring events per utterance on average, while the human-computer dialogues had only 0.12. These numbers suggest different communication patterns in human-human and human-computer dialogues. This difference may indicate that reference resolution plays a different role in human-computer dialogue than in human-human dialogue. Further research is needed to explain this phenomenon.

Also, as argued at the beginning of this dissertation, reference resolution plays a key role in natural language dialogue understanding. However, in a natural language dialogue system for a complex domain like Java programming, there are many other modules that
influence the performance of the dialogue system, such as the dialogue act tagger, the utterance topic classifier, and the user intention recognizer. Reference resolution takes effect together with these modules as an integrated system, so improving a single module may not necessarily improve the performance of the whole system.

Table 9-1. A comparison between human-computer dialogues and human-human dialogues.
Dialogue type | Average # utterances per session | # referring expressions / # utterances
Human-computer | 64.4 | 0.12
Human-human | 130.2 | 0.44

9.2 Data-driven Approach in Building Dialogue Systems

The dialogue systems implemented in this project used data-driven approaches for most of the essential functionalities, such as dialogue act classification, utterance topic classification, POS tagging, noun phrase chunking, and reference resolution. Some of these models are less closely tied to the domain of the dialogue system. For example, we can train a noun phrase chunking model for the dialogue system using training data from the Wall Street Journal corpus, since the English grammar used in a tutorial dialogue for Java programming is very similar to that in newswire text. However, some of the models are more domain-specific, which means they need to be trained using domain-specific data.

The Ripple corpus described in Chapter 3 is available, so we can use its human-human dialogues as training data to build dialogue act classification and topic classification models for the dialogue systems in this project. The dialogue systems in this project support a programming task that is almost the same as the one in the Ripple corpus, so we can take advantage of this similarity. We looked into the Ripple corpus to discover the topics most frequently mentioned by the students and the tutors while they were approaching the programming task. We then built topic classifiers for these topics to help the system better understand user utterances. However, this data-driven approach suffers from data sparsity problems. For example, one of the important steps in the programming
task is converting a character digit into an integer. When students extract a character digit from a zip code, they need to convert the character digit into an integer and add the integer to an array. However, we found only 8 utterances in the Ripple corpus that are related to converting a character to an integer in Java. It is very hard for the topic classification model to learn an accurate classifier for this specific topic from such a small set of training utterances. In addition, some topics are unique to our dialogue systems, and the Ripple corpus contains no training data for them. The COP 3502 class at the University of Florida uses an integrated development environment called IntelliJ¹ to teach Java programming, while the user interface of our dialogue systems is based on Eclipse. Therefore, students may ask the dialogue system how to run their program. In this project, we manually created training utterances for these topics to alleviate the data sparsity problem, but this cannot eliminate it entirely.

9.3 Understanding Users' Java Programs - A Challenge in Building Dialogue Systems for Java Programming

One of the challenges in building a tutorial dialogue system for Java programming lies in understanding the user's Java program. Before answering a user's question that is related to her Java program, the dialogue system needs to understand the context of the question. The user's current program is arguably the most important contextual information in this case. However, automatic interpretation of a user's Java program is a very challenging task. There are two levels of interpretation of a user's Java program: syntactic interpretation and semantic interpretation. The dialogue system's ability to interpret the user's Java program directly limits its ability to respond to the user's questions about her program. We discuss this limitation in more detail later in this section.

¹ https://www.jetbrains.com/idea/

The goal of syntactic interpretation is to determine whether a user's Java program is syntactically correct. In addition, it identifies items such as variable declarations, variable assignments, and method calls. Correctly identifying these operations in a user's Java source code is essential for interpreting the program semantically, i.e., understanding which step toward the solution the user is working on. For example, when the user declares an integer array at the beginning of the "extractDigits()" method, the user probably intends to create an array to hold the 5 digits of the input zip code.

We implemented syntactic parsing of Java code using an abstract syntax tree (AST) parser. When the program is syntactically correct, the parser can generate a parse of the user's Java source code without problems. However, when the user needs help from the dialogue system, the program is more likely than not to be syntactically incorrect. For example, the student may ask a question before finishing typing a line of source code. In this case, the AST parser fails to parse the lines of Java source code that contain syntax errors (such as an incomplete line of Java code). To address this problem, we created a rule-based parser to interpret the user's Java program. This rule-based parser contains a set of patterns, and we match the user's program against these patterns to identify the user's progress toward the solution. However, the dialogue system can respond only to those conditions in the user's program that this rule-based parser can identify. If the source code parser cannot "perceive" a problem in the user's program, the dialogue system cannot reasonably comment on it. This "granularity" of the system's perception directly determines the "granularity" of the dialogues that the system can conduct. Mulkar-Mehta et al. defined the "granularity" of a natural language discourse as "the level of detail of description of an event or object" Mulkar-mehta et al. (2011).
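
A rule-based matcher of this kind can be sketched with regular expressions applied line by line. Working line by line means an unfinished statement does not prevent the other lines from being recognized. The pattern set, labels, and function name below are hypothetical illustrations for the extractDigits() task; zipCode is the field named in Table 8-1. They are not the rules used in the deployed system.

```python
import re
from typing import List

# Hypothetical progress patterns for the extractDigits() task. The actual
# rule set used by the dialogue system is not listed in the dissertation.
PROGRESS_PATTERNS = [
    ("declared_int_array", re.compile(r"\bint\s*\[\s*\]\s*\w+\s*=")),
    ("converted_zip_to_string",
     re.compile(r'\bString\s+\w+\s*=\s*(?:zipCode\s*\+\s*""|String\.valueOf\(\s*zipCode\s*\))')),
    ("called_charAt", re.compile(r"\.charAt\s*\(")),
    ("returns_value", re.compile(r"\breturn\s+\w+\s*;")),
]


def detect_progress(source: str) -> List[str]:
    """Report which progress steps appear in the source, checking line by line.

    Unlike an AST parser, this does not need the whole file to compile, so it
    still works while the student is halfway through typing a statement.
    """
    lines = source.splitlines()
    return [label for label, pattern in PROGRESS_PATTERNS
            if any(pattern.search(line) for line in lines)]


if __name__ == "__main__":
    partial = """
    public int[] extractDigits() {
        int[] digits = new int[5];
        String s = zipCode + "";
        char c = s.charAt(
    """
    print(detect_progress(partial))
```

On the unfinished method above, the matcher still reports the array declaration, the int-to-String conversion, and the charAt call. The sketch also shows the granularity limit described above: any step without a pattern is invisible to the system.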

CHAPTER 10
CONCLUSION

This dissertation has reported on the development of a tutorial dialogue system using an innovative reference resolution approach that I developed Li and Boyer (2016). In Chapter 6, we empirically evaluated this reference resolution approach with an existing human-human dialogue corpus for Java programming. I then implemented a tutorial dialogue system for Java programming and deployed the reference resolution approach in it to evaluate the approach in real time with human subjects.

10.1 Hypothesis Revisited

This dissertation focuses on evaluating my novel reference resolution approach in a tutorial dialogue system. We are interested in how well my approach performs in a real-time dialogue system compared to a comparison condition, and in its impact on user satisfaction and user engagement when interacting with the system. To serve this goal, we implemented two tutorial dialogue systems with different reference resolution approaches, System Li and System Comparison. We had three hypotheses:

Hypothesis I: A more accurate offline reference resolution approach is also more accurate in a real-time dialogue system.

Hypothesis II: More accurate reference resolution leads to higher user satisfaction in a tutorial dialogue system.

Hypothesis III: More accurate reference resolution leads to higher user engagement in a tutorial dialogue system.

The first hypothesis was confirmed, but we did not find evidence for the second and third hypotheses. The performance of a dialogue system is determined by the performance of multiple different modules, so improving reference resolution accuracy in the implemented tutorial dialogue system may not directly increase overall system performance. Identifying the "bottleneck" module of the tutorial dialogue system will be an interesting research question.

Summary. This document has presented our work on automatic referring expression extraction, semantic labeling of referring expressions, and a reference resolution approach combining learned semantics and contextual features of the dialogue. The presented reference resolution approach was evaluated using an existing corpus of human-human tutorial dialogue for Java programming. Then, I presented the implementation of a tutorial dialogue system for Java programming: I first defined the functionalities the system requires and then described its architecture and the implementation of its modules. To evaluate the impact of our novel reference resolution approach within the implemented tutorial dialogue system, I implemented two versions of the system with different reference resolution approaches. I then conducted a user study with 41 undergraduate student participants. We did not find a significant difference in user satisfaction (p = 0.361) or user engagement (p = 0.236) between the two systems.

Contributions. This project makes two main contributions to the natural language dialogue system research community. First, the tutorial dialogue system implemented in this project is one of the first to support a complex domain like Java programming, in which the entities and the environment change dynamically because of the user's actions. Second, this work is the first to investigate real-time reference resolution approaches in such a complex situated dialogue system. We examine both the performance of the reference resolution module and the impact of different reference resolution approaches on the performance of the dialogue system.

To push dialogue systems toward assisting people in increasingly complex tasks, we need to address some challenging problems, including reference resolution. This dissertation investigates this challenge within a task-oriented dialogue system in a complex domain. This work is a step toward practical dialogue systems that support users in more complex domains.

10.2 Limitations

This research project has several limitations.

Firstly, the scale of the user study is limited, which may be one of the reasons that we obtained a null result for the hypotheses on dialogue system performance. Given limited time, we recruited 43 undergraduate students from the COP 3502 class at the University of Florida. More data could lead to more confident results.

Secondly, according to the participants' feedback, the system's performance is limited by its ability to accurately understand users' utterances. The participants sometimes needed to rephrase their questions multiple times before the system could understand them. More training data could help the system learn a more accurate topic classifier, which could result in better natural language understanding.

Thirdly, the system's performance was also limited by the Java program parser. With a more accurate Java source parser, the system could identify more fine-grained errors in users' programs and, in turn, give more accurate feedback.

10.3 Future Work

This dissertation research investigated the performance of reference resolution approaches in a real-time tutorial dialogue system for Java programming. Both of the evaluated approaches are still far from perfectly identifying users' referents. According to the result analysis, better reference resolution performance requires a more accurate referring expression identification approach. Another promising research direction is to investigate more features from the dialogue and the situated environment to inform the reference resolution module. For example, the verbs in the same utterance could be a good feature. In addition, there are also coreference relationships in situated dialogue, and it will be interesting to consider reference resolution and coreference resolution at the same time. The tutorial dialogue system can be viewed as a starting point for a series of better-performing dialogue systems, which could be developed by refining some of the modules in the existing system. The tutorial dialogue system could also benefit from input from the instructors of the introductory Java class. They should have a better understanding of the system users' Java knowledge, which could help the system better adapt to users' needs. Also,
the data-driven system's performance was limited by the lack of training data. Having the system learn from its interactions with users will be an interesting research question.

APPENDIX A
PRE-SURVEY

Name
UFID
Please indicate how much you agree or disagree with the following statements. (Response options: Strongly disagree, Disagree, Neutral, Agree, Strongly Agree.)
Generally, I have felt secure about attempting computer programming problems.
I am sure I could do advanced work in computer science.
I am sure that I can learn programming.
I think I could handle more difficult programming problems.
I can get good grades in computer science.
I have a lot of self-confidence when it comes to programming.
I'll need programming for my future work.
Figure A-1. Pre-survey.

Please indicate how much you agree or disagree with the following statements. (Response options: Strongly disagree, Disagree, Neutral, Agree, Strongly Agree.)
I study programming because I know how useful it is.
Knowing programming will help me earn a living.
Computer science is a worthwhile and necessary subject.
I'll need a firm mastery of programming for my future work.
I will use programming in many ways throughout my life.
I like writing computer programs.
Programming is enjoyable and stimulating to me.
When a programming problem arises that I can't immediately solve, I stick with it until I have the solution.
Once I start trying to work on a program, I find it hard to stop.
When a question is left unanswered in computer science class, I continue to think about it afterward.
I am challenged by programming problems I can't understand immediately.
Figure A-2. Pre-survey.

Figure A-3. Pre-survey.

Table A-1. Complete pre-survey results for students who used System Li.
ID Q1-Q6 Q7-Q12 Q13-Q18
2 424443 545455 444433
3 334432 233313 224224
4 113132 243433 332224
5 545455 444443 555444
6 334333 444443 444434
7 445445 555555 554544
8 224322 555155 443434
9 334333 554454 554444
10 545444 555554 444344
11 114112 333423 114215
12 214422 242422 434224
13 324323 223311 343433
14 234231 443343 332444
15 324322 344433 443344
16 445443 555555 444554
17 224221 555543 344334

Table A-2. Complete pre-survey results for students who used System Comparison.
ID Q1-Q6 Q7-Q12 Q13-Q18
18 334442 444444 444534
19 444344 444444 444444
20 444444 444444 334224
21 234241 555555 444525
22 334334 555555 345345
23 545555 554455 555555
24 334243 444434 443223
25 345443 555544 554444
26 335242 555555 444444
27 434443 342434 443434
28 234332 455555 444445
29 324443 454444 444454
30 224233 555555 554344
31 224232 244443 323222
32 333333 444444 333434
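
The tables above store each participant's answers as three six-digit blocks. The sketch below decodes and summarizes one row under an assumption the appendix does not state: that each digit is one item's response coded 1-5, presumably 1 = Strongly disagree through 5 = Strongly agree. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def parse_block(block: str) -> List[int]:
    """Split a block such as '424443' into one integer response per item."""
    if not block or not all("1" <= ch <= "5" for ch in block):
        raise ValueError(f"expected digits 1-5, got {block!r}")
    return [int(ch) for ch in block]


def block_means(row: str) -> Tuple[int, List[float]]:
    """Return the participant ID and the mean response of each block in a row."""
    pid, *blocks = row.split()
    return int(pid), [float(mean(parse_block(b))) for b in blocks]


if __name__ == "__main__":
    # First row of Table A-1 (participant 2).
    print(block_means("2 424443 545455 444433"))
```

Validating that every digit falls in 1-5 guards against misreading a row, since a value such as 10 would break the one-digit-per-item assumption.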

APPENDIX B
POST-SURVEY

Name
UFID
I think that I would like to use this system frequently.
I found the system unnecessarily complex.
I thought the system was easy to use.
I think that I would need the support of a technical person to be able to use this system.
[Each item is rated on a 0-10 scale anchored by Strongly Disagree and Strongly Agree; the first item's upper anchor reads "Extremely likely".]
Figure B-1. Post-survey.

I found the various functions in this system were well integrated.
I thought there was too much inconsistency in this system.
I would imagine that most people would learn to use this system very quickly.
I found the system very cumbersome to use.
I felt very confident using the system.
I needed to learn a lot of things before I could get going with this system.
[Each item is rated on a 0-10 scale from Strongly Disagree to Strongly Agree.]
Figure B-2. Post-survey.

[0-10 rating scale from Strongly Disagree to Strongly Agree; no item text survives on this page.]
Figure B-3. Post-survey.

This tutoring system is attractive.
This tutoring system was aesthetically appealing.
I liked the graphics and images used in this tutoring system.
This tutoring system appealed to my visual senses.
The screen layout of this tutoring system was visually pleasing.
[Each item is rated on a 0-10 scale from Strongly Disagree to Strongly Agree.]
Figure B-4. Post-survey.

Learning with this tutoring system was worthwhile.
I consider my experience a success.
Doing this task did not work out the way I planned.
My experience was rewarding.
I would recommend this tutoring system to my friends and family.
I lost myself in this task.
[Each item is rated on a 0-10 scale from Strongly Disagree to Strongly Agree.]
Figure B-5. Post-survey.

I was so involved in this task that I lost track of time.
I blocked out things around me while I was working with this tutoring system.
When I was doing this work, I lost track of the world around me.
The time I spent on this task just slipped away.
I was absorbed in the task.
During this experience, I let myself go.
[Each item is rated on a 0-10 scale from Strongly Disagree to Strongly Agree.]
Figure B-6. Post-survey.

I was really drawn finding the solutions.
I felt involved in this task.
This experience was fun.
I continued to use this tutoring system out of curiosity.
This tutoring system incited my curiosity.
[Each item is rated on a 0-10 scale from Strongly Disagree to Strongly Agree.]
Figure B-7. Post-survey.

I felt interested in this tutoring system.
I felt frustrated while using this tutoring system.
I found this tutoring system confusing to use.
I felt annoyed while using this tutoring system.
I felt discouraged while using this tutoring system.
Using this tutoring system was mentally taxing.
Response scale for each item: 0 (Strongly Disagree) to 10 (Strongly Agree).
Figure B-8. Post-survey.

This experience was demanding.
I felt in control of the experience.
I could not do something I needed to do with this tutoring system.
Response scale for each item: 0 (Strongly Disagree) to 10 (Strongly Agree).
Figure B-9. Post-survey.

ID Q1-Q20 Q21-Q41
1 7182808072 5575575688 7655888057 66750222275
2 3232222522 5555555555 2252522252 88888978559
3 529064102101 999999998 4444558086 56721111283
4 9181839192 2283323779 75455910299 10101022222285
5 3243446745 6677777886 6655653644 55576744558
6 4363543645 4433466654 7776745654 66673534266
7 10180939161 2276384475 9856777358 46661552256
8 2527363715 2256456678 1354523351 776777534710
9 801007810560 10101010101010101010 5433108100108 87750700568
10 5162666341 1333363666 5655656155 46662411177
11 6266555353 4565664755 7766776556 66653643365
12 5273647564 4424445775 4565562956 46682841177
13 3277387327 5787788888 3511714633 101010832335510
14 7575388252 7677555777 62222681088 66693775669
15 10584769245 59788889910 101010101099178 10101032222287
16 100907393102 5372464478 10776710101910 86662622276
17 6376468260 3877878787 7778776577 88755323368
18 866467664 5666664666 5555666664 66666665566
19 8184537224 7775665998 6653875766 867747536610
20 4353427261 5655555646 6667765655 56674741869
21 728137186 3362362767 7867877267 77832652778
22 8472858473 3465666656 7677766666 88854433264
Table B-1. Complete post-survey results for users who used System Li.

ID Q1-Q20 Q21-Q41
23 3285466632 5677775875 7777755574 677103884247
24 8110085100100 1487575797 10533997179 10101010300098
25 6181467281 1331145787 8434577275 10101012000080
26 3181576542 66106666999 5888855253 10101062622478
27 4764467461 2065676666 4553767164 76663522286
28 7310187100106 2222287888 8946888578 86621211282
29 3290238583 2132232454 7777736663 66652632276
30 8092728171 91088810108107 835389911010 107961601770
31 77321810550 01010510107101010 5555533855 5101030553553
32 55565957810 0050080101010 5781010101010105 777357050102
33 7473768282 4363365555 8453576667 77754564474
34 901008110082 9999997101010 6666710100810 78900000080
35 7662568567 899979510106 8877863937 87775672255
36 6372647375 6544453776 4554664765 45663332257
37 8373958577 3777771787 888888100108 77722211182
38 6442345534 7766766666 6557763736 6667855364
39 3190429141 4445444444 4324453834 644444444410
40 10394939482 4333333345 9867798388 88853221046
41 7375737356 7555555555 5444565656 77774755446
Table B-2. Complete post-survey results for users who used System Comparison.

REFERENCES
Ariel, Mira. "Referring and Accessibility." Journal of Linguistics 24 (1988).1: 65-87.
Austin, J. L. "How To Do Things With Words." (1962).
Bangor, Aaron, Kortum, Philip T, and Miller, James T. "An Empirical Evaluation of the System Usability Scale." International Journal of Human-Computer Interaction 24 (2008).6: 574-594.
Bangor, Aaron, Kortum, Philip, and Miller, James. "Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale." Journal of Usability Studies 4 (2009).3: 114-123.
Blitzer, John. "Domain Adaptation with Structural Correspondence Learning." Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006). July. 2006, 120-128.
Boyer, Kristy Elizabeth. Structural and Dialogue Act Modeling in Task-Oriented Tutorial Dialogue. Ph.D. thesis, 2010.
Boyer, Kristy Elizabeth, Ha, Eun Young, Phillips, Robert, Wallis, Michael D., Vouk, Mladen A., and Lester, James C. "Dialogue Act Modeling in a Complex Task-Oriented Domain." Proceedings of the 11th Annual SIGDIAL Meeting on Discourse and Dialogue. 2010, 297-305.
Boyer, Kristy Elizabeth, Phillips, Robert, Ingram, Amy, Ha, Eun Young, Wallis, Michael D, Vouk, Mladen A, and Lester, James C. "Investigating the Relationship Between Dialogue Structure and Tutoring Effectiveness: A Hidden Markov Modeling Approach." International Journal of Artificial Intelligence in Education (IJAIED) 21 (2011).1: 65-81.
O'Brien, Heather L., Cairns, Paul, and Hall, Mark. "A Practical Approach to Measuring User Engagement with the Refined User Engagement Scale (UES) and New UES Short Form." International Journal of Human-Computer Studies 112 (2018). December 2017: 28-39.
Brill, Eric. "Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging." Computational Linguistics 21 (1995).4: 543-565.
Can, Aysu Ezen. Unsupervised Dialogue Act Modeling for Tutorial Dialogue Systems. Ph.D. thesis, 2016.
Chai, Joyce, Hong, Pengyu, and Zhou, Michelle. "A Probabilistic Approach to Reference Resolution in Multimodal User Interfaces." Proceedings of the 9th International Conference on Intelligent User Interfaces - IUI '04 (2004): 70-77.
Corbin, Carina, Morbini, Fabrizio, and Traum, David. "Creating a Virtual Neighbor." Natural Language Dialog Systems and Intelligent Assistants (2015): 203-208.
Crystal, David. A Dictionary of Linguistics and Phonetics (4th ed.). Oxford University Press, 1997.
Culotta, Aron, Wick, Michael, and McCallum, Andrew. "First-Order Probabilistic Models for Coreference Resolution." Proceedings of the 2007 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2007, 81-88.
Daume, Hal. "Frustratingly Easy Domain Adaptation." arXiv preprint arXiv:0907.1815 (2009).
Denis, Pascal and Baldridge, Jason. "Specialized Models and Reranking for Coreference Resolution." Proceedings of the Conference on Empirical Methods in Natural Language Processing (2008). October: 660-669.
Dzikovska, Myroslava O, Callaway, Charles B, Farrow, Elaine, Marques-Pita, Manuel, Matheson, Colin, and Moore, Johanna D. "Adaptive Tutorial Dialogue Systems Using Deep NLP Techniques." NAACL HLT Demonstrations. April. 2007, 5-6.
Finkel, Jenny Rose and Manning, Christopher D. "Hierarchical Bayesian Domain Adaptation." The North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) 2009 Conference. June. 2009, 602-610.
Forsyth, Eric N and Martell, Craig H. "Lexical and Discourse Analysis of Online Chat Dialog." Semantic Computing, 2007. ICSC 2007. 2007, 19-26.
Funakoshi, Kotaro, Nakano, Mikio, Tokunaga, Takenobu, and Iida, Ryu. "A Unified Probabilistic Approach to Referring Expressions." Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue (2012). July: 237-246.
Garrette, Dan and Baldridge, Jason. "Learning a Part-of-Speech Tagger from Two Hours of Annotation." Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2013). June. 2013, 138-147.
Gorniak, Peter and Roy, Deb. "Situated Language Understanding as Filtering Perceived Affordances." Cognitive Science 31 (2007).2: 197-231.
Grosz, BJ, Weinstein, S, and Joshi, AK. "Centering: A Framework for Modeling the Local Coherence of Discourse." Computational Linguistics 21 (1995).2: 203-225.
Hovy, Dirk, Plank, Barbara, and Søgaard, Anders. "Mining for Unambiguous Instances to Adapt Part-of-Speech Taggers to New Domains." Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2015). 2015, 1256-1261.
Iida, Ryu, Kobayashi, Shumpei, and Tokunaga, Takenobu. "Incorporating Extra-linguistic Information into Reference Resolution in Collaborative Task Dialogue." Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (2010): 1259-1267.
Iida, Ryu, Yasuhara, Masaaki, and Tokunaga, Takenobu. "Multi-modal Reference Resolution in Situated Dialogue by Integrating Linguistic and Extra-Linguistic Clues." Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP 2011) (2011). 2003: 84-92.
Jiang, Jing and Zhai, Chengxiang. "Instance Weighting for Domain Adaptation in NLP." The 45th Annual Meeting of the Association of Computational Linguistics. 2007, 264-271.
Kennington, Casey and Schlangen, David. "Simple Learning and Compositional Application of Perceptually Grounded Word Meanings for Incremental Reference Resolution." Proceedings of the Conference for the Association for Computational Linguistics (ACL) (2015): 292-301.
Lafferty, John, McCallum, Andrew, and Pereira, Fernando CN. "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data." Proceedings of the International Conference on Machine Learning. 2001, 282-289.
Lappin, Shalom and Leass, Herbert J. "An Algorithm for Pronominal Anaphora Resolution." Computational Linguistics 20 (1994): 535-561.
Lemon, Oliver, Bracy, Anne, Gruenstein, Alexander, and Peters, Stanley. "The WITAS Multi-Modal Dialogue System I." Proceedings of INTERSPEECH. 2001, 1559-1562.
Li, Shen, Graca, Joao V, and Taskar, Ben. "Wiki-ly Supervised Part-of-Speech Tagging." The 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 2012, 1389-1398.
Li, Xiaolong and Boyer, Kristy Elizabeth. "Semantic Grounding in Dialogue for Complex Problem Solving." Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2015). 2015, 841-850.
Li, Xiaolong and Boyer, Kristy Elizabeth. "Reference Resolution in Situated Dialogue with Learned Semantics." The 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2016, 329-338.
Liu, Changsong and Chai, Joyce Y. "Learning to Mediate Perceptual Differences in Situated Human-Robot Dialogue." Proceedings of the Twenty-Ninth AAAI Conference (AAAI 15). 2015, 2288-2294.
Liu, Changsong, She, Lanbo, Fang, Rui, and Chai, Joyce Y. "Probabilistic Labeling for Efficient Referential Grounding Based On Collaborative Discourse." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL) (2014): 13-18.
Liu, Changsong, Fang, Rui, and Chai, Joyce Yue. "Towards Mediating Shared Perceptual Basis in Situated Dialogue." Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue (2012). July: 140-149.
Manning, Christopher D. "Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?" International Conference on Intelligent Text Processing and Computational Linguistics. 2011, 171-189.
Manning, Christopher D, Bauer, John, Finkel, Jenny, and Bethard, Steven J. "The Stanford CoreNLP Natural Language Processing Toolkit." The 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (2014): 55-60.
Matuszek, Cynthia, Bo, Liefeng, Zettlemoyer, Luke S, and Fox, Dieter. "Learning from Unscripted Deictic Gesture and Language for Human-Robot Interactions." Proceedings of AAAI 2014 (2014): 2556-2563.
McCarthy, Joseph F and Lehnert, Wendy G. "Using Decision Trees for Coreference Resolution." Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (1995).
McClosky, David, Charniak, Eugene, and Johnson, Mark. "Automatic Domain Adaptation for Parsing." Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL). 2010, 28-36.
Mulkar-Mehta, Rutu, Hobbs, Jerry, and Hovy, Eduard. "Granularity in Natural Language Discourse." Proceedings of the Ninth International Conference on Computational Semantics. Section 3. 2011, 360-364.
Owoputi, Olutobi, O'Connor, Brendan, Dyer, Chris, Gimpel, Kevin, Schneider, Nathan, and Smith, Noah A. "Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters." Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2013). June. 2013, 380-390.
Plank, Barbara, Hovy, Dirk, McDonald, Ryan, and Søgaard, Anders. "Adapting Taggers to Twitter with Not-so-distant Supervision." COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. 2014, 1783-1792.
Ponzetto, Simone Paolo and Strube, Michael. "Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution." Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (2006).2: 192-199.
Poon, Hoifung and Domingos, Pedro. "Unsupervised Semantic Parsing." Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP). August. 2009, 1-10.
Rose, Carolyn P. "A Framework for Robust Semantic Interpretation." Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference (NAACL). 2000, 311-318.
Schlangen, David, Zarriess, Sina, and Kennington, Casey. "Resolving References to Objects in Photographs using the Words-As-Classifiers Model." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016) (2016): 1213-1223.
Schmidt, Mark and Swersky, Kevin. "http://www.cs.ubc.ca/~schmidtm/Software/crfChain.html." 2008.
Sha, Fei and Pereira, Fernando. "Shallow Parsing with Conditional Random Fields." The 2003 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL 2003). June. 2003, 134-141.
Sidner, Candace L. "Attention, Intentions, and the Structure of Discourse." 12 (1986).
Sidner, Candace L, Lee, Christopher, Lesh, Neal, and Rich, Charles. "Explorations in Engagement for Humans and Robots." Artificial Intelligence 166 (2005).1-2: 140-164.
Soon, WM, Ng, HT, and Lim, DCY. "A Machine Learning Approach to Coreference Resolution of Noun Phrases." Computational Linguistics (2001).
Strik, Helmer, Russel, Albert, Cucchiarini, Catia, Boves, Lou, Oostdijk, N, and Cucchiarini, C. "A Spoken Dialogue System For Public Transport Information." International Journal of Speech Technology 2 (1997): 119-129.
Tjong Kim Sang, Erik F. and Buchholz, Sabine. "Introduction to the CoNLL-2000 Shared Task: Chunking." The 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning. 2000, 127-132.
Toutanova, Kristina, Klein, Dan, and Manning, Christopher D. "Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network." Human Language Technologies: The 2003 Annual Conference of the North American Chapter of the Association for Computational Linguistics. 2003, 252-259.
VanLehn, Kurt, Jordan, Pamela W, Rosé, Carolyn P, Bhembe, Dumisizwe, Böttner, Michael, Gaydos, Andy, Makatchev, Maxim, Pappuswamy, Umarani, Ringenberg, Michael, Roque, Antonio, Siler, Stephanie, and Srivastava, Ramesh. "The Architecture of Why2-Atlas: A Coach for Qualitative Physics Essay Writing." (2002): 158-167.
Wen, Tsung-Hsien, Vandyke, David, Mrksic, Nikola, Gasic, Milica, Rojas-Barahona, Lina M, Su, Pei-Hao, Ultes, Stefan, and Young, Steve. "A Network-based End-to-End Trainable Task-oriented Dialogue System." 2016.
Xue, Nianwen and Palmer, Martha. "Calibrating Features for Semantic Role Labeling." Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2004, 88-94.
Yang, Xiaofeng, Zhou, Guodong, Su, Jian, and Tan, Chew Lim. "Coreference Resolution Using Competition Learning Approach." Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (2003): 176-183.

BIOGRAPHICAL SKETCH
Xiaolong Li received his Ph.D. from the University of Florida in August 2018. Before that, he received his bachelor's and master's degrees in computer engineering and technology in 2008 and 2012 from Northwestern Polytechnical University and Zhejiang University in China, respectively. He started his Ph.D. program in computer science in 2012 at North Carolina State University and then transferred to the University of Florida with the LearnDialogue research group in 2015.