Title: The Parker elephant data sheets: A library mini-grant project proposal
Creator: Reboussin, Daniel
Norton, Hannah
Publisher: George A. Smathers Libraries, University of Florida
Place of Publication: Gainesville, FL
Publication Date: 2013
Abstract: We will 1) provide convenient open access to a unique, historic and scientifically significant data set; and 2) demonstrate and prepare the libraries for future projects that preserve and curate data collections for open access. Data from a recently digitized collection of field data sheets collected in the 1960s will be transcribed into machine-readable formats (spreadsheet, comma delimited file, etc.) to facilitate easy online discoverability, examination, and analysis by students, researchers, and practitioners, for example in conservation, biology, zoos, and veterinary medicine. We’ll employ the free, web-based REDCap (Research Electronic Data Capture) application to facilitate data entry and control quality.
General Note: Submitted May 1, 2013. Awarded June 21, 2013.
Rights Management:
This item is licensed with the Creative Commons Attribution Share Alike License. This license lets others remix, tweak, and build upon this work even for commercial reasons, as long as they credit the author and license their new creations under the identical terms.
I. Project narrative

Project description

The primary goal of the Elephant Data Sheets project (part of the Ian Parker Collection of East African Wildlife Conservation) is to make a unique and significant data set conveniently accessible and directly usable for analysis by, for example, biology students, wildlife conservation researchers, and veterinary practitioners. These scientifically valuable primary resources are already available as open access images online. Their value for teaching and research will be increased enormously, however, when they are transcribed to a machine-readable data file. This project will signal the availability of these materials, encourage their use in academics and among practitioners, and provide a level of access that is certain to make the data set a popular resource.

Print sources can be converted successfully to machine-readable format with Optical Character Recognition (OCR) software. Manuscript records, in contrast, are not candidates for reliable OCR analysis. Historical scientific data projects often opt for manual transcription to assure accurate results from handwriting. Some employ innovative crowd-sourcing software to leverage project resources against large data sets (e.g., see Brumfield 2013). The Elephant Data Sheets are too numerous and complex for traditional manual transcription, but too small a project to be appropriate for the software and management overhead of crowd-sourcing. The team instead will organize and employ student transcribers with access to REDCap: freely available, intuitive online software that will simplify and validate data input, assist in managing project workflow, maintain data integrity, and output results to University of Florida Digital Collections (UFDC) (see Harris et al. 2009; Lyon 2010; Lyon et al. 2011). This approach allows transcriptions to be completed and reviewed online.
The approach is supported by Mark Sullivan and Laurie Taylor, both library information technology and digital collections experts; team member Norton has experience using the software in a previous transcription project (Lyon 2010); and technical support is available at no charge from the UF REDCap support team. Once the data transcriptions are complete and certified as accurate, they will be output in appropriate formats and ingested into UFDC collections for long-term preservation and convenient, online, open access.

Significance

The University of Florida (UF) provides an exceptional educational environment that includes programs in African Studies, economic development, and wildlife conservation. These broad-based graduate programs offer an exceptional opportunity for students and faculty to apply their knowledge of biology, cultures, environment, languages, and politics to the community management of wildlife conservation in Africa. Now a firmly established principle worldwide, community-based conservation was first applied in 1960 by the Galana Game Management Scheme in Kenya (Parker 1964) and expanded by the Communal Areas Management Programme for Indigenous Resources (CAMPFIRE) in Zimbabwe during the 1980s (Brosius et al. 2005:116). Recent donations of scarce, rare, and unique materials relating to these historic waypoints in wildlife conservation practice strengthen the applied work in exciting ways. The African Studies Collections at the George A. Smathers Libraries are in the final stage of processing the Ian Parker Collection of East African Wildlife Conservation, to be open for research this summer. Renowned as a key figure in establishing the international ban on trade in ivory, Parker implemented the Galana plan as a game warden in colonial Kenya. His papers document the history of East African wildlife conservation during the 20th century.
Similarly, for CAMPFIRE and its precedents in Southern Africa, the Graham and Brian Child African Wildlife and Range Management Collection offers a broad scope of related materials in environmental conservation. [...] proposal (see Appendix). Along with the Records of the East African Professional Hunters Association (EAPHA), the Parker and Child collections represent what may be the largest set of research resources available anywhere on the history of African wildlife conservation.

[...] Elephant Data Sheets online) are accessible only to [...] this set of field data compiled during culling (herd thinning) operations intended to mitigate elephant overpopulation at environmentally stressed sites from 1965 to 1969. Culls organized at national parks in Uganda, Kenya, and Tanzania relied on skilled hunters to kill family groups humanely within a minute, on a scale that current law prohibits (Sheldrick 2012:172; Martin 2012:144; Daston and Mitman 2006:180, 194). Taking advantage of this unique opportunity, researchers collected body and organ measurements, age estimates, reproductive status, and disease observations post mortem and recorded them on 3,175 data sheets (Laws et al. 1975:129-131, 348-351) now housed in the manuscript collections.

The Elephant Data Sheets are unique in several ways: the large number of individuals is unlikely to be reproduced; the sampling represents natural (albeit environmentally stressed) family groups rather than trophies or weak individuals; and until now the records have been unavailable to the public. They have been curated for preservation and physical access as manuscripts (accessible via finding aid) as well as digitized for online open access in UFDC. The data sheet images are currently available online, funded by the [...] The attached endorsement letters assert the inherent importance of these data from a scientific perspective and indicate that they will attract interest and recognition across diverse disciplines.
Providing convenient, state-of-the-art, fully open access to these data will create new educational and research opportunities in support of academic programs such as wildlife conservation, mammalian biology, and zoo and wildlife veterinary medicine in at least four major STEM units on the UF campus: the Colleges of Agricultural and Life Sciences, Liberal Arts and Sciences, and Veterinary Medicine, along with the Florida Museum of Natural History. Direct, convenient, online access to this newly available, unique, empirical data source will encourage analyses that were not considered in the original publication (Laws et al. 1975) and create an opportunity for reanalysis of the published data.

Academic libraries are keen to support digital data set curation services (see Bailey 2013), as these represent a newly recognized and rapidly growing area of need for universities and researchers that libraries are well suited to support (though few are fully prepared to do so, given current funding and technical resources available). The data sheets project will encourage the development of data curation supports both during and after the grant period, provoking institutional learning to plan and provide appropriate support for the curation and preservation of more and larger data sets in libraries. As an example, these data and their associated image files in UFDC offer a real-world test bed for the development of a database module in SOBEK/CM (Sullivan 2013), with potential features such as guided queries, display of results in clusters that may be examined together, and the mapping of linked case numbers by location as recorded on the sheets. While such developments are outside the scope of this project, team member Norton is a member of the Libraries' Data Management/Curation Task Force and can serve as an important conduit for information and lessons about our team's experience to task force colleagues.
The Elephant Data Sheets project will further prove the value of the REDCap data entry and management application for the Libraries as a flexible and effective model for future innovative data curation projects. Each time the Libraries significantly activate data curation principles and resources, we build institutional capacities and demonstrate interest and capabilities in this new field of activity. Doing so with a collaborative data project such as this one further establishes these capacities across the institution.


Finally, the field data sheet images and machine-readable numeric data will be curated together as digital objects, with the online finding aids for related manuscript originals. Presenting these within a broad scholarly context that includes background and supplementary materials (e.g., scattered additional sheets of summary tabular data, brief analyses, field notes, etc.), as well as with appropriate metadata and links to published contextual materials, will support intellectual access through the application of Search Engine Optimization (SEO) techniques (see Reboussin 2012). The Parker, Child, and EAPHA manuscripts will benefit by association as the elephant data set becomes more widely known and used online, making these related collections much more easily discoverable by general online searches. Project team members anticipate continuing related digitization and SEO work during the post-grant period for the purpose of scholarly dissemination of results and networking, including online marketing to improve the discoverability of this data set and related manuscript collections.

Similar projects

Without either prior knowledge of the specific work or the benefit of SEO techniques applied by project creators, one can easily overlook small similar projects due to the limits of online discoverability. The project team considers the closest analogy to our effort (with regard to its modest scale and intended outcomes) to be the transcription of historic scientific data collected for Franz Boas's early 20th century anti-racist Immigrant Study (Gravlee et al. 2003). A search of the UF web site reveals Gravlee's transcription of 100-year-old handwritten notebooks recording anthropometrics (e.g., head circumferences), reanalyzed based on the availability of new statistical methods.
Larger-scale projects include the crowd-sourced transcription of herbarium type specimen data described by Thiers (2005) and the Old Weather Project, also using crowd-sourcing technology to transcribe 19th century arctic weather observations (NOAA 2012). The Digital Bleek and Lloyd (Skotnes 2007) transcribes early linguistic work on South African languages, while a popular project renders 19th century menus available for study online (NYPL Labs 2011). The long-term task of transcribing Darwin's professional correspondence (Burkhardt et al. 1985) is on another scale altogether. These diverse efforts fit well within scholarly practice, adding value to library resources by improving access and making them more conveniently useful to end users.

Resources required, plan of activities, sustainability

The project team requests OPS funds to hire student transcribers who will create high-quality, accurate data files as confirmed by a certification process. The data sheet images are reproduced from hand-filled typewritten forms completed under difficult field conditions, so the text can be challenging to read. The project team requires a data quality control (QC) assistant who can monitor, review, and (potentially) correct student transcribers' data entries. The QC assistant also will be hired and paid from OPS funds. These two funds are our only anticipated project expenditures.

Project activities will begin with the preparation of data input templates in REDCap (see: Activity timeline below). The PI first will create a list of recorded variables compiled from a survey of the data sheets (variables do not appear consistently on all of the sheets). The PI will note units of measurement and full ranges of possible values.
Similarly, the PI will create a list of terms found in free text observation notes (e.g., common and unfamiliar place names, veterinary and disease terms, abbreviations). Team members will train concurrently in the use of REDCap software (with online videos and tutorials provided by the UF support office). Training includes preparing test templates in REDCap for review by the team. Following up, a test using a sample set of data sheets will establish the average time transcribers need to complete a template, and the team will then create a final template in REDCap with the goal of effectively guiding and controlling transcription for better accuracy. The next phase will focus on data quality control and create criteria for rating (certifying) transcriber output.

In the Fall Semester, the PI will recruit student transcribers and begin training them. The PI will rate samples of work submitted by volunteers using the criteria developed. Students will be hired based on the quality of their initial submissions and upon library employment guidelines. At the same time, the PI will recruit and train a QC assistant and review transcription rating decisions with the assistant.

In the final phase of the project, the team will finalize rating decisions made by the QC assistant, based on our criteria and judgments. Following a final review of the input data (with notes on further work needed, if necessary), the PI will load the transcribed data to UFDC. Follow-up (post-grant period) will include the creation of a landing page in UFDC for the Parker collection according to SEO principles, and finally the team will disseminate our experiences and results in professional presentations and publications. Transcription clean-up may continue beyond the grant period, guided by the assigned quality ratings.

Activity timeline

Date        Personnel  Time     Activity
7/13        DR         12 hrs.  Compile list of recorded variables
7/13        DR         12 hrs.  Create list of vocabulary terms found in free text notes
7/13        Team       6 hrs.   Team members train in use of REDCap software
7/13        Team       8 hrs.   Prepare test template(s) in REDCap for review by project team
8/13        HN         6 hrs.   Create final template in REDCap
8/13        HN         6 hrs.   Test avg. time to complete template from sample data sheets
8/13        Team       2 hrs.   Plan data QC; finalize judging criteria for incentive awards
9/13        DR         2 hrs.   Recruit student transcribers
9-11/13     DR         24 hrs.  Manage and provide training to support student transcribers
10/13       DR         2 hrs.   Recruit QC assistant
10/13       DR         6 hrs.   Train QC assistant
12/13-4/14  Team       4 hrs.   Hire eligible student transcribers based on rated output
4/14        HN         8 hrs.   Review data transcription rating decisions with QC reviewer
5/14        Team       2 hrs.   Finalize QC ratings and certification
5/14        DR         1 hr.    Manage payroll for student transcribers
5/14        DR         1 hr.    Manage payroll for QC assistant(s)
6/14        DR         2 hrs.   Load data to UFDC, prepare report to GMC
Post-grant  Team       -        SEO, marketing, dissemination

Student transcribers and QC assistants will provide their own (or use existing UF) incidental supplies, space, and computing equipment for online access to UFDC images, and their own GatorLink accounts for access to their REDCap accounts and templates online. No equipment or supplies will be purchased with mini-grant funds. The data file output to UFDC will have the standard impacts of a small, low-bandwidth file and requires no special attention. Supporting its use will require a continuing public service commitment, but the team has excellent documentation and scholarly context in the related manuscript collections. Reboussin will provide public service support as part of his normal activities.

Permissions and measures of success

Non-exclusive permission to distribute the data online has been secured. See attached letter of endorsement from Ian Parker, owner of the elephant data sheets and donor of the manuscript collection that included them. We will complete the bulk of the cases, striving for accurate transcriptions. Due to the challenging legibility of some parts of many data sheets, we expect to have a number of imperfect transcriptions that may not be corrected with confidence during the grant period. One of the benefits of using the REDCap software is that we will have a tested system in place that we can use to complete and refine the toughest cases. We will also provide a list of links back to the original sheet images for records where the data are undetermined by the end of project.
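The plan to note units and full ranges of possible values for each variable, and to list records whose data remain undetermined, could be sketched roughly as follows. This is only an illustration in Python, not REDCap's own validation mechanism: the variable names, units, ranges, and the `sheet_id` field are all hypothetical placeholders, and the real list would come from the PI's survey of the data sheets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    """One recorded variable with its unit and plausible range."""
    name: str
    unit: str
    minimum: float
    maximum: float


# Hypothetical entries; the real variables, units, and ranges are not given
# in the proposal and would come from the survey of the data sheets.
SPECS = [
    VariableSpec("shoulder_height", "cm", 80.0, 400.0),
    VariableSpec("heart_weight", "kg", 1.0, 40.0),
]


def check_record(record, specs=SPECS):
    """Return a list of (field, reason) problems for one transcribed sheet."""
    problems = []
    for spec in specs:
        raw = record.get(spec.name, "")
        if raw is None or str(raw).strip() == "":
            # Blank fields are not errors: variables do not appear
            # consistently on all of the sheets.
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((spec.name, "not numeric"))
            continue
        if not spec.minimum <= value <= spec.maximum:
            problems.append((spec.name, "out of range"))
    return problems


def records_needing_review(records):
    """Return the identifiers of sheets with at least one flagged field."""
    return [r["sheet_id"] for r in records if check_record(r)]
```

A list produced this way could be paired with the corresponding sheet image links so that reviewers can return to the originals for the flagged cases.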


II. Budget narrative

The project team requests the allocation of a $5,000 budget to hire student transcribers and a quality control (QC) assistant with OPS funds. Student transcribers who demonstrate, following training, that they can create accurate, error-free data records (a key value for the project) will be hired if eligible for library employment. All students who wish to voluntarily participate in the project and whose contributions meet the criteria for quality transcriptions will be allowed to participate in the project. A brief volunteer phase for those students who wish to be hired will be used to train, coordinate the application process, and screen for quality. We anticipate hiring a maximum of 32 student transcribers to work at $10.00 per hour for up to ten hours each at a total cost of $3,200. Student employees with the assigned goal of completing 100 records will be expected to do so in 10 hours, assuming an average of 6 minutes to complete each record (many of which, as one can see online, include only six or eight completed fields). The estimates will be modified as training and testing progress. If we hire fewer students, the project team will assign some to work for more hours (and expect them to complete more records at this rate unless their assignment is to tackle more complex and difficult records). Students will record their own hours, and online activities (such as login time, REDCap activities, and transcription output) will be monitored as confirmation before work hours are approved.

The project team requests the remaining $1,800 of the budget to fund one or more QC assistant hires. A single QC assistant would be paid $10.00 per hour for up to 180 hours (the equivalent of four and a half 40-hour work weeks, spread over much of the grant period as determined by student transcriber output). If multiple QC assistants are hired, available hours will be divided as determined by the PI.
The QC assistants will review, evaluate, and certify quality record output in REDCap. This may be implemented as a triage system to flag records that need attention, for example, identifying text interpretation and handwriting legibility problems versus numerical transcription errors. The volunteer phase of training and screening, or perhaps working as a student transcriber, is expected to provide invaluable information for the project team to use in selecting the best available QC assistants. Based on our estimates, which may need to be modified once the work has begun, a QC assistant assigned to review 3,175 transcriptions within the budgeted time of 180 hours would have to proceed at the average rate of 17.6 records per hour. On average, the student employee would have about 3 minutes 24 seconds to compare the numerical and text data in each transcribed record against the digitized images online at UFDC, and then to certify each transcription (e.g., by checking a box on the form).

PI Reboussin's contributed time will average 2 hours per week (.05 FTE), with team member Norton averaging 1 hour per week (.025 FTE) over the grant period. For the PI, this amounts to 52 weeks x 2 hours (less leave time at 20%), or 84 hours contributed during the grant period. Total hours to be contributed by all project team members will be 126 hours. Individual time to be contributed for specific tasks is summarized in the Activity timeline above and counted toward the contributions of each member.
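The estimates in this budget narrative can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the narrative and the Activity timeline. It assumes that the timeline entries printed as "0 6 hrs." and similar mean 6 hours, and that hours listed for "Team" count toward both Reboussin (DR) and Norton (HN); both readings are interpretations, not statements from the proposal.

```python
# Figures taken from the budget narrative.
TOTAL_SHEETS = 3175
HOURLY_RATE = 10.00          # dollars per hour, transcribers and QC alike
MAX_TRANSCRIBERS = 32
HOURS_PER_TRANSCRIBER = 10
MINUTES_PER_RECORD = 6
QC_HOURS = 180

# Student transcribers: cost and how many records the hours can cover.
transcriber_cost = MAX_TRANSCRIBERS * HOURS_PER_TRANSCRIBER * HOURLY_RATE
records_per_transcriber = HOURS_PER_TRANSCRIBER * 60 // MINUTES_PER_RECORD
transcription_capacity = MAX_TRANSCRIBERS * records_per_transcriber

# QC assistant: cost and the review pace needed to cover every sheet.
qc_cost = QC_HOURS * HOURLY_RATE
qc_records_per_hour = TOTAL_SHEETS / QC_HOURS
qc_seconds_per_record = QC_HOURS * 3600 / TOTAL_SHEETS
qc_minutes, qc_seconds = divmod(round(qc_seconds_per_record), 60)

# Contributed hours, read from the Activity timeline (assumed readings).
DR_HOURS = [12, 12, 2, 24, 2, 6, 1, 1, 2]
HN_HOURS = [6, 6, 8]
TEAM_HOURS = [6, 8, 2, 4, 2]   # assumed to count toward each team member
pi_hours = sum(DR_HOURS) + sum(TEAM_HOURS)
norton_hours = sum(HN_HOURS) + sum(TEAM_HOURS)

print(f"Transcribers: ${transcriber_cost:,.2f} for up to "
      f"{transcription_capacity} records")
print(f"QC: ${qc_cost:,.2f}, {qc_records_per_hour:.1f} records/hour, "
      f"{qc_minutes} min {qc_seconds} s per record")
print(f"Contributed hours: PI {pi_hours}, Norton {norton_hours}, "
      f"total {pi_hours + norton_hours}")
```

Under these readings the transcription hours cover slightly more than the 3,175 sheets, and the timeline hours agree with the 84 and 126 hour totals stated in the narrative.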


Mini Grant Budget Form 2013-2014

1. Salaries and Wages (no fringe benefits required)

Name of Person              Salary times % of effort  Grant Funds  Cost Share  Total
Dan Reboussin               76,149 x .05 FTE          $0.00        $3,807.00   $3,807.00
Hannah Norton               57,585 x .025 FTE         $0.00        $1,440.00   $1,440.00
OPS (student transcribers)                            $3,200.00    $0.00       $3,200.00
OPS (QC assistants)                                   $1,800.00    $0.00       $1,800.00
SUBTOTAL                                              $5,000.00    $5,247.00   $10,247.00

2. Equipment

Item                        Quantity times Cost       Grant Funds  Cost Share  Total
SUBTOTAL                                              $0.00        $0.00       $0.00

3. Supplies

Item                        Quantity times Cost       Grant Funds  Cost Share  Total
SUBTOTAL                                              $0.00        $0.00       $0.00

4. Travel

From/To                     # of people/# of days     Grant Funds  Cost Share  Total
SUBTOTAL                                              $0.00        $0.00       $0.00

5. Other (Vendor costs, etc. Provide detail in Budget Narrative section.)

Item                        Quantity times cost       Grant Funds  Cost Share  Total
SUBTOTAL                                              $0.00        $0.00       $0.00

                                                      Grant Funds  Cost Share  Total
Total Direct Costs (add subtotals of items 1-5)       $5,000.00    $5,247.00   $10,247.00


References cited

Bailey, Jr., Charles W. 2013. Research Data Curation Bibliography. Version 2: Jan. 14. Available online: http://digital-scholarship.org/rdcb/ Houston, TX: Digital Scholarship.

Brosius, J. Peter, Anna Lowenhaupt Tsing, and Charles Zerner (eds.). 2005. Communities and conservation: Histories and politics of community-based natural resource management. Walnut Creek, Calif.: AltaMira Press.

Brumfield [...] 2013. Itinera Nova [...] Collaborative Manuscript Transcription [April 29 blog entry on crowd-sourced transcription software]. Available online: http://manuscripttranscription.blogspot.com/2013/04/itinera-nova-in-worlds-of-crowdsourcing.html

Burkhardt, F., Sydney Smith, David Kohn, and William Montgomery. 1985. The correspondence of Charles Darwin. New York: Cambridge University Press. [Data files available online with 4-year embargo at: http://www.darwinproject.ac.uk]

Daston, Lorraine and Gregg Mitman. 2006. Thinking with Animals: New Perspectives on Anthropomorphism. New York: Columbia University Press. pp. 180, 194.

Gravlee, Clarence C., H. Russell Bernard, and William R. Leonard. 2003. Heredity, environment, and cranial form: a reanalysis of Boas's immigrant data. American Anthropologist 105(1):125-138. [Data files available online: http://www.gravlee.org/research/boas/data/]

Harris, Paul A., Robert Taylor, Robert Thielke, Jonathon Payne, Nathaniel Gonzalez, and Jose G. Conde. 2009. Research electronic data capture (REDCap) - A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics 42(2):377-381.

Laws, Richard M., I. S. C. Parker, and Ronald C. B. Johnstone. 1975. Elephants and their habitats: The ecology of elephants in North Bunyoro, Uganda. Oxford, UK: Clarendon Press.

Lyon [...] 2010. [...] [Libraries' mini-grant proposal]. Available online: http://ufdc.ufl.edu/UF00103191

Lyon et al. 2011. [...] Evidence Based Library and Information Practice conference, Sheffield, U.K., June 28.

Martin, Glen. 2012. Game Changer: Animal Rights and the Fate of Africa's Wildlife. Berkeley, Calif.: University of California Press. [Ebook available online: http://uf.catalog.fcla.edu/forward.jsp?ig=uf.jsp&type=link&link=http%3A%2F%2Flib.myilibrary.com%2Fdetail.asp%3FID%3D352087]

NOAA. 2012. [...] scientists to reconstruct historical climate. National Oceanic and Atmospheric Administration. October 24 press release available online: http://www.noaanews.noaa.gov/stories2012/20121022_oldweatherprojectlaunch.html

NYPL Labs. 2011. [...] [crowd-sourced historic menu transcriptions]. New York: New York Public Libraries. Available online: http://menus.nypl.org/

Parker [...] 1964. [...] Journal of Epizootic Disease of East Africa 12:21-31.

Reboussin, Daniel. 2012. [...] Presentation to SCOLMA 50th Anniversary Conference: Dis/connects: African Studies in the Digital Age. Oxford, UK: SCOLMA. Available online: http://ufdc.ufl.edu/AA00011385

Skotnes [...] 2007. [...] OH: Jacana Media and Ohio University Press. [See: http://lloydbleekcollection.cs.uct.ac.za/]

Sullivan, Mark. 2013. Personal communication (telephone consultation, Apr. 8). Head of Digital Development and Web Unit, Information Technology, University of Florida Libraries.

Thiers, Barbara. 2005. Best Practices Guide. New York: The New York Botanical Garden Virtual Herbarium. Available online: http://sciweb.nybg.org/science2/hcol/mtsc/NYBG_Best_Practices.doc


Appendices

Letters