Jaqi Language Archive : XML Documentation ( Version 1.1)
Permanent Link: http://ufdc.ufl.edu/UF00103162/00001
 Material Information
Title: Jaqi Language Archive : XML Documentation ( Version 1.1)
Physical Description: data set
Language: English
Creator: Beck, Howard
Hardman, MJ
Legg, Sue
Publisher: University of Florida
Place of Publication: Gainesville, FL
Publication Date: 2011
 Subjects
Subjects / Keywords: Data set
 Notes
Abstract: Data set created from an NSF funded project to document endangered languages.
General Note: Initial explanatory PDF available in 2010 with initial data set. Updated PDF loaded on June 14, 2012 with new data sets added. Earlier version suppressed to prevent confusion; earlier versions always retained in permanent preservation repository and available by request: ufdc@uflib.ufl.edu.
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier:
System ID: UF00103162:00001



Jaqi Language Archive
XML Documentation V1.0
University of Florida
February, 2010

This document describes the XML files used to archive the Jaqi language database. The Jaqi language database or archive covers three languages: Aymara, Jaqaru, and Kawki. Archival resources for these languages include XML representations of all linguistic data structures that were created for these languages, together with audio (mp3) and image (jpg, gif) files. The data were originally stored in a database and were all exported into the XML files described in this document for archival purposes. It is not necessary to use the original database software to utilize this archive. The data are archived in such a way that they can be imported into any database by someone familiar with XML technology.

The data in this archive are arranged in a tree structure. In the terminology used below, each language in the Jaqi archive is considered to be a “project”. Each project consists of chapters, and each chapter contains modules. A chapter reflects a logical grouping of information. For example, the Aymara database was originally developed for teaching the Aymara language, and consists of chapters corresponding to original lesson plans. The Jaqaru and Kawki chapters contain logical groupings of field notebooks. The modules within a chapter could contain dialogs and exercises (Aymara) or individual dialogs recorded in field notebooks (Jaqaru and Kawki). A module can contain individual phrases, or it can contain dialogs, another subgrouping consisting of individual phrases.

Each chapter in each language is contained within a single XML file named after that chapter, and stored in a directory named after the language. Associated audio and image subdirectories contain media for that chapter. These media are referenced within the XML file in the individual data structures where they are used.
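The directory layout described above can be explored with a short script. The following is a minimal sketch, not part of the archive: the language directory names, the chapter file names, and the subdirectory names "audio" and "image" are assumptions made for illustration, and the demo tree is created in a temporary directory so the sketch runs without the real data.

```python
from pathlib import Path
import tempfile

# Assumption: media files are recognised by the extensions named in the text.
MEDIA_SUFFIXES = {".mp3", ".jpg", ".gif"}


def list_chapters(archive_root):
    """Map each language directory to the sorted chapter names found in it."""
    chapters = {}
    for lang_dir in sorted(p for p in Path(archive_root).iterdir() if p.is_dir()):
        # One XML file per chapter, named after the chapter.
        chapters[lang_dir.name] = sorted(f.stem for f in lang_dir.glob("*.xml"))
    return chapters


def list_media(language_dir):
    """Return sorted media file names (mp3, jpg, gif) below a language directory."""
    return sorted(
        f.name
        for f in Path(language_dir).rglob("*")
        if f.is_file() and f.suffix.lower() in MEDIA_SUFFIXES
    )


def build_demo_archive(root):
    """Create a tiny made-up archive (hypothetical names) for demonstration."""
    for lang, chapter in [("Aymara", "Lesson1"), ("Jaqaru", "Notebook1")]:
        lang_dir = Path(root) / lang
        (lang_dir / "audio").mkdir(parents=True)
        (lang_dir / "image").mkdir()
        (lang_dir / f"{chapter}.xml").write_text("<chapter/>", encoding="utf-8")
        (lang_dir / "audio" / "p1.mp3").write_bytes(b"")
        (lang_dir / "image" / "p1.jpg").write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_archive(tmp)
    demo_chapters = list_chapters(tmp)
    demo_media = list_media(Path(tmp) / "Aymara")
```

Against the real archive, `list_chapters` would be pointed at the directory that holds the language directories; the actual names should be taken from the archive itself.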
A single XML Schema file (langed.xsd) defines the schema for all the XML files, and all the XML files have been validated successfully against this schema.

Each phrase in the database consists of words, and each word can consist of two or more allomorphs. [Comment: someday phrases in the sense of smaller grammatical structures?] Each allomorph is associated with a base morpheme. There are data structures for each of these encoded in the XML. The same morpheme, allomorph, word, and phrase can appear more than once in the database, and they are stored in the database as individual objects identified by a unique object identifier (OID). In the database, when one of these data structures is used in more than one place, there is a reference to the object by OID, rather than a duplicate of the object. These OIDs also appear in the XML.

Information for each phrase, word, allomorph, or morpheme is repeated in the XML at each location where it is used. This is rather redundant, but it was done for completeness. If a phrase is used in more than one location, there is some context-dependent information, namely the sound recording, origin volume, origin page, and speaker, that is recorded separately for each location. This information appears in the XML in the context where the particular phrase appears. That is, even if the same phrase is used more than once (thus the phrase OID appears in more than one place), this context-specific information is included at the particular occurrence, whereas all other, non-context-specific information about the phrase is repeated and is redundant.

Chapter
Each chapter is referenced by an element in the project instance. Each chapter consists of:
Attributes
• title
Elements
• label
• objectives
• outline
• cultural notes
• resources
• modules (zero or more modules in this chapter)

Module
A module can be a phrase module or a dialog module. Each module contains:
Attributes
• title
• OID
• type (phrase or dialog)
Elements
• label
• introduction
• instructions
• notes
• phrases or dialogs (depends on module type)

Phrases
For modules containing phrases.
Elements
• phrase (zero or more phrases in this module)

Dialog
For modules containing dialogs. Each dialog contains phrases.
Attributes
• title
Elements
• phrase (zero or more phrases in this dialog)

Phrase
Each phrase data structure contains:
Attributes
• OID (object identifier)
• term (term used to refer to phrase in this language)
• original phrase (should be same as term)
• annotated phrase (phrase with annotation marks)
• audio (file name of audio recording of this phrase; mp3)
• image (file name of image associated with this phrase; jpg or gif)
• origin volume (notebook volume where phrase is located)
• origin page (page in notebook volume where phrase is located or started)
• speaker (identifies the person who spoke the phrase)
• que string (used to prompt students if phrase is a question, simple string) [Comment: Aymara only; should we indicate this now/here? Or hope that some year there will be such exercises for the others? Or say Aymara only and remove it if that someday comes?]
• que class oid (used to prompt student if phrase is a question, refers to a class) [Comment: ditto]
• que instance oid (used to prompt student if phrase is a question, refers to an instance) [Comment: ditto]
• question (can be “question”, “answer”, or “answer2”; identifies phrase as a “question”, “answer”, or “second answer”) [Comment: ditto]
• constraints (string notation placing constraints on relationships among entities in the phrase)
Elements
• gloss (English “eng” or Spanish “spa”)
• notes (rich text providing notes about this phrase)
• words (sequence of Word appearing in this phrase)

Word
Each word can contain:
Attributes
• OID (object identifier)
• term (term used to refer to word in this language)
• original word (should be same as term)
• image (file name of image associated with this word)
Elements
• gloss (English “eng” or Spanish “spa”)
• allomorphs (word is decomposed into a sequence of zero or more allomorphs)

Allomorph
Each allomorph can contain:
Attributes
• OID (object identifier)
• term (term used to refer to allomorph in this language)
• original allomorph (should be same as term)
• base morpheme OID (object identifier of base morpheme of this allomorph)
• base morpheme term (term for base morpheme in this language)
• image (file name of image associated with this allomorph)
• type (“root” or “suffix”)
• pos (part of speech, see below)
• mcrule (morphological conditioning rule, see below)
Note that image, type, pos, and mcrule are associated with and are unique to the base morpheme, but repeated in each allomorph description that uses the base morpheme.
Elements
• gloss (English “eng” or Spanish “spa”)
• suballomorphs (allomorph is decomposed into a sequence of zero or more allomorphs)

Root POS
ambivalent verb shape verb language verb particle quantity particle greeting particle negative particle noun locational personal pronoun interrogative pronoun deictic pronoun toponym people name time number syntactic linker proroot unknown

Suffix POS
nominal possessive nominal locational nominal directional nominal [J/K] verbal derivation [Aymara only] verbal stem derivation [Aymara only] verbal person derivation [J/K] motion modifier derivation verbal inflectional [J/K] principal clause inflectional subordinate clause inflectional [J/K] temporal subordinate inflectional [J/K] nominal subordinate inflectional [J/K] invariables subordinate inflectional thematic sentence [Aymara only] independent unknown
[Comment: These work well, I think, but often they don't seem to come back up after saving; that never happened with Aymara.]

Morphological Conditioning Rules
[Comment: These are Aymara only and sufficient therefor. They work somewhat for J/K; they need a lot of working on; I keep hoping there would be someone who would do a thesis using the db. Meanwhile, there is a lot of n; I need to think up at least a couple more, like some symbol that would be: preceding V unless followed by another suffix, or preceding C unless followed by another suffix, and so on. Suggestions welcome.]
• cc
• cv
• vc
• vv
• c
• v
• vSS
• nSS
• n
• null
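The tree of data structures listed above (chapter, module, phrase, word, allomorph) can be walked with a standard XML parser. The following is a minimal sketch under stated assumptions: every element and attribute name in it (`chapter`, `module`, `phrase`, `word`, `allomorph`, `oid`, `term`, `type`, `title`) is a hypothetical placeholder, as are the sample terms. The real names are defined by langed.xsd and should be substituted before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names, attribute names and terms are placeholders,
# not the names actually defined in langed.xsd.
SAMPLE_CHAPTER = """
<chapter title="Demo chapter">
  <module title="Demo module" oid="m1" type="phrase">
    <phrase oid="p1" term="phrase-one">
      <word oid="w1" term="word-one">
        <allomorph oid="a1" term="root-a" type="root"/>
        <allomorph oid="a2" term="suffix-b" type="suffix"/>
      </word>
      <word oid="w2" term="word-two">
        <allomorph oid="a3" term="root-c" type="root"/>
      </word>
    </phrase>
  </module>
</chapter>
"""


def outline(chapter_xml):
    """Return an indented text outline: modules, their phrases, words, allomorphs."""
    root = ET.fromstring(chapter_xml.strip())
    lines = []
    for module in root.iter("module"):
        lines.append(f"module {module.get('title')} ({module.get('type')})")
        for phrase in module.iter("phrase"):
            lines.append(f"  phrase {phrase.get('term')}")
            for word in phrase.iter("word"):
                # Each word is decomposed into a sequence of allomorphs.
                parts = [a.get("term") for a in word.iter("allomorph")]
                lines.append(f"    word {word.get('term')}: {' + '.join(parts)}")
    return lines


demo_outline = outline(SAMPLE_CHAPTER)
```

The same traversal applies to dialog modules if the dialog element is added as an intermediate level between module and phrase.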







Jaqi Language Archive
XML Documentation V1.1
University of Florida
April, 2011

This document describes the XML files used to archive the Jaqi language database. The Jaqi language database and archive covers three languages: Aymara, Jaqaru, and Kawki. Archival resources for these languages include XML representations of all linguistic data structures that were created for these languages, together with audio (mp3) and image (jpg, gif) files. The data were originally stored in a database and were all exported into the XML files described in this document for archival purposes. It is not necessary to use the original database software to utilize this archive. The data are archived in such a way that they can be imported into any database by someone familiar with XML technology.

The data in this archive are arranged in a tree structure. In the terminology used below, each language in the Jaqi archive is considered to be a “project”. Each project consists of chapters, and each chapter contains modules. A chapter reflects a logical grouping of information. For example, the Aymara database was originally developed for teaching the Aymara language, and consists of chapters corresponding to original lesson plans. The Jaqaru and Kawki chapters contain logical groupings of field notebooks. The modules within a chapter could contain dialogs and exercises (Aymara) or individual dialogs recorded in field notebooks (Jaqaru and Kawki). A module can contain individual phrases, or it can contain dialogs, another subgrouping consisting of individual phrases.

Each chapter in each language is contained within a single XML file named after that chapter and stored in a directory named after the language. Associated audio and image subdirectories contain media for that chapter. These media are referenced within the XML file in the individual data structures where they are used. A single XML Schema file (langed.xsd) defines the schema for all the XML files, and all the XML files have been validated successfully against this schema.
Each phrase in the database consists of words, and each word can consist of two or more allomorphs. Each allomorph is associated with a base morpheme. There are data structures for each of these encoded in the XML. The same morpheme, allomorph, word, and phrase can appear more than once in the database, and they are stored in the database as individual objects identified by a unique object identifier (OID). In the database, when one of these data structures is used in more than one place, there is a reference to the object by OID, rather than a duplicate of the object. These OIDs also appear in the XML.

Information for each phrase, word, allomorph, or morpheme is repeated in the XML at each location where it is used. This is rather redundant, but it was done for completeness. If a phrase is used in more than one location, there is some context-dependent information, namely the sound recording, origin volume, origin page, and speaker, that is recorded separately for each location. This information appears in the XML in the context where the particular phrase appears. That is, even if the same phrase is used more than once (thus the phrase OID appears in more than one place), this context-specific information is included at the particular occurrence, whereas all other, non-context-specific information about the phrase is repeated and is redundant.
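When importing the XML into another database, the redundancy described above can be collapsed again by grouping occurrences on their OID. The following is a minimal sketch of that idea, not the procedure used to build the archive: the element name `phrase` and the attribute names `oid`, `term`, `audio`, `originVolume`, `originPage` and `speaker` are hypothetical placeholders for whatever langed.xsd actually defines, and the sample values are made up.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample: the same phrase (oid "p1") occurs twice with
# different context-specific information.
SAMPLE_OCCURRENCES = """
<chapter>
  <phrase oid="p1" term="phrase-one" audio="a01.mp3" originVolume="1" originPage="3" speaker="S1"/>
  <phrase oid="p2" term="phrase-two" audio="a02.mp3" originVolume="1" originPage="4" speaker="S2"/>
  <phrase oid="p1" term="phrase-one" audio="a07.mp3" originVolume="2" originPage="10" speaker="S2"/>
</chapter>
"""

# Assumed attribute names for the context-dependent fields named in the text.
CONTEXT_ATTRS = ("audio", "originVolume", "originPage", "speaker")


def index_phrases(xml_text):
    """Split phrase data into shared fields (one per OID) and per-occurrence context."""
    shared = {}
    contexts = defaultdict(list)
    for phrase in ET.fromstring(xml_text.strip()).iter("phrase"):
        oid = phrase.get("oid")
        # Shared information is repeated at every occurrence; keep the first copy.
        shared.setdefault(
            oid,
            {k: v for k, v in phrase.attrib.items()
             if k != "oid" and k not in CONTEXT_ATTRS},
        )
        # Context-specific information is kept once per occurrence.
        contexts[oid].append({k: phrase.get(k) for k in CONTEXT_ATTRS})
    return shared, dict(contexts)


shared_info, occurrence_info = index_phrases(SAMPLE_OCCURRENCES)
```

Keeping the first copy of the shared fields assumes the repeated copies are identical, as the text states; an importer may want to verify that before discarding the repeats.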

Chapter
Each chapter is referenced by an element in the project instance. Each chapter consists of:
Attributes
• title
Elements
• label
• objectives
• outline
• cultural notes
• resources
• modules (zero or more modules in this chapter)

Module
A module can be a phrase module or a dialog module. Each module contains:
Attributes
• title
• OID
• type (phrase or dialog)
Elements
• label
• introduction
• instructions
• notes
• phrases or dialogs (depends on module type)

Phrases
For modules containing phrases.
Elements
• phrase (zero or more phrases in this module)

Dialog
For modules containing dialogs. Each dialog contains phrases.
Attributes
• title
Elements
• phrase (zero or more phrases in this dialog)

Phrase
Each phrase data structure contains:
Attributes
• OID (object identifier)
• term (term used to refer to phrase in this language)
• original phrase (should be same as term)
• annotated phrase (phrase with annotation marks)
• audio (file name of audio recording of this phrase; mp3)
• image (file name of image associated with this phrase; jpg or gif)
• origin volume (notebook volume where phrase is located)
• origin page (page in notebook volume where phrase is located or started)
• speaker (identifies the person who spoke the phrase)
• que string (used to prompt students if phrase is a question, simple string)
• que class oid (used to prompt student if phrase is a question, refers to a class)
• que instance oid (used to prompt student if phrase is a question, refers to an instance)
• constraints (string notation placing constraints on relationships among entities in the phrase)
Elements
• notes (rich text providing notes about this phrase)
• words (sequence of Word appearing in this phrase)

Word
Each word can contain:
Attributes
• OID (object identifier)
• term (term used to refer to word in this language)
• original word (should be same as term)
• image (file name of image associated with this word)
Elements
• allomorphs (word is decomposed into a sequence of zero or more allomorphs)

Allomorph
Each allomorph can contain:
Attributes
• OID (object identifier)
• term (term used to refer to allomorph in this language)
• original allomorph (should be same as term)
• base morpheme OID (object identifier of base morpheme of this allomorph)
• base morpheme term (term for base morpheme in this language)
• image (file name of image associated with this allomorph)
• type (“root” or “suffix”)
• pos (part of speech, see below)
• mcrule (morphological conditioning rule, see below)
Note that image, type, pos, and mcrule are associated with and are unique to the base morpheme, but repeated in each allomorph description that uses the base morpheme.
Elements
• suballomorphs (allomorph is decomposed into a sequence of zero or more allomorphs)

Root POS
ambivalent verb shape verb language verb particle quantity particle greeting particle negative particle noun locational personal pronoun interrogative pronoun deictic pronoun toponym people name time number syntactic linker proroot unknown

Suffix POS
nominal possessive nominal locational nominal directional nominal verbal derivation verbal stem derivation verbal person derivation motion modifier derivation verbal inflectional principal clause inflectional subordinate clause inflectional temporal subordinate inflectional nominal subordinate inflectional invariables subordinate inflectional thematic sentence independent unknown

Morphological Conditioning Rules
• cc
• cv: The suffix requires a preceding C when final in the word. When not the final suffix, it requires a preceding vowel. Example: suffix
• vc: The suffix requires a preceding V when final in the word. When not the final suffix, it requires a preceding C. Example: suffix
• vv
• c
• v
• vSS
• nSS
• n
• null
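The two rules that are spelled out above (cv and vc) can be expressed as a small lookup. The following is a rough sketch only: it covers just those two rules, it checks the surface form of the preceding material rather than modelling how that form comes about, the vowel inventory `VOWELS` is an assumption made for illustration, and the stems in the usage lines are made-up strings, not data from the archive.

```python
# Assumption: a simplified vowel inventory, used only to classify the last
# character of the preceding material as a vowel (V) or a consonant (C).
VOWELS = set("aiu")


def required_preceding(rule, is_final):
    """Return 'C' or 'V': what the suffix requires immediately before it."""
    if rule == "cv":
        # cv: preceding C when word-final, preceding V otherwise.
        return "C" if is_final else "V"
    if rule == "vc":
        # vc: preceding V when word-final, preceding C otherwise.
        return "V" if is_final else "C"
    raise ValueError(f"rule not covered by this sketch: {rule!r}")


def stem_satisfies(stem, rule, is_final):
    """Check whether the last character of stem matches the rule's requirement."""
    if not stem:
        raise ValueError("stem must not be empty")
    kind = "V" if stem[-1].lower() in VOWELS else "C"
    return kind == required_preceding(rule, is_final)


# Made-up stems, for illustration only.
cv_final_after_consonant = stem_satisfies("tak", "cv", is_final=True)
cv_final_after_vowel = stem_satisfies("taka", "cv", is_final=True)
cv_medial_after_vowel = stem_satisfies("taka", "cv", is_final=False)
vc_final_after_vowel = stem_satisfies("taka", "vc", is_final=True)
vc_medial_after_consonant = stem_satisfies("tak", "vc", is_final=False)
```

The remaining rule codes (cc, vv, c, v, vSS, nSS, n, null) are not described in this document, so the sketch deliberately rejects them rather than guessing their meaning.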