Citation
dLOC as Data: A Thematic Approach to Caribbean Newspapers

Material Information

Title:
dLOC as Data: A Thematic Approach to Caribbean Newspapers
Creator:
Collins, Perry
St. Hubert, Hadassah
Rogers, Jamie
Asencio, Miguel
Bakker, Rebecca
Castro, Molly
Guan, Boyuan
Krefft, Jill
Dinsmore, Chelsea
Perry, Laura
Taylor, Laurie
Publication Date:
Language:
English
Physical Description:
Grant proposal

Notes

Abstract:
Digital Library of the Caribbean (dLOC) intends to enhance access to its existing Caribbean newspaper collections by making texts available for bulk download to its users. This will facilitate modes of scholarship that depend on access to image and textual data at scale and will enable a new level of access to titles not included in newspaper data resources such as Chronicling America. To meet the needs of the dLOC community for teaching and research, we will demonstrate the potential of newspaper data by creating a pilot thematic toolkit focused on hurricanes and tropical cyclones. The toolkit will provide multilingual datasets focused on these disasters from several countries and islands in the Caribbean, such as the Bahamas, Belize, Cuba, the Dominican Republic, Grenada, Haiti, Jamaica, and Martinique.
Acquisition:
Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by Perry Collins.
General Note:
Application to the Collections as Data: Part to Whole (https://collectionsasdata.github.io/part2whole/) program, funded by the Andrew W. Mellon Foundation and administered by the University of Nevada-Las Vegas.

Record Information

Source Institution:
University of Florida Institutional Repository
Holding Location:
University of Florida
Rights Management:
This item is licensed with the Creative Commons Attribution Non-Commercial License. This license lets others remix, tweak, and build upon this work non-commercially, and although their new works must also acknowledge the author and be non-commercial, they don’t have to license their derivative works on the same terms.


Full Text


(1) dLOC as Data: A Thematic Approach to Caribbean Newspapers

(2) List of team members, titles, and roles on the project

Project Leads
Miguel Asencio, Executive Director, Digital Library of the Caribbean (dLOC), Florida International University; Senior administrative lead
Jamie Rogers, Assistant Director of Digital Collections, Florida International University; Senior administrative lead
Perry Collins, Scholarly Communications Librarian, University of Florida; Project lead
Hadassah St. Hubert, CLIR Postdoctoral Fellow in Data Curation for Latin American and Caribbean Studies, Florida International University; Scholarly lead

Partners
Florida International University Libraries (FIU)
The following participants will focus on OCR evaluation, geolocation, named entity extraction, text analysis and dataset preparation, and developing a preservation solution.
Rebecca Bakker, Digital Collections Librarian
Molly Castro, Digital Humanities Librarian
Boyuan Guan, Lead Developer, GIS Center
Jill Krefft, Institutional Repository Coordinator

University of Florida Libraries (UF)
The following participants will focus on preparing newspaper data for distribution and analysis.
Chelsea Dinsmore, Director, Digital Support Services
Laura Perry, Digital Production Manager, Digital Support Services
Laurie Taylor, Chair, Digital Partnerships & Strategies
Caribbean Data Curation Graduate Intern (see Appendix B for position description)

Advisory Committee
The following participants will advise on corpus development and dissemination, data documentation, and outreach activities such as local community training events and edit-a-thons.
Julio Capo Jr., Associate Professor of History, Florida International University
Fletcher Durant, Head of Conservation and Preservation, University of Florida
Alex Gil, Digital Humanities Librarian, Columbia University
Melissa Jerome, Project Coordinator for the Florida & Puerto Rico Digital Newspaper Project, University of Florida
Amalia Levi, Archivist and Cultural Heritage Professional, HeritEdge Connection in Barbados
Preeya Mohan, Fellow, Sir Arthur Lewis Institute of Social and Economic Studies, University of the West Indies, St. Augustine
Leah Rosenberg, Professor of English, University of Florida


(3) Investigator Bios

Miguel Asencio is the Director of the Digital Library of the Caribbean (dLOC). He has trained teams of technicians and staff on digitization projects and ensured that quality control and environmental standards for production and facilities were met. Asencio is a digitization specialist well versed in preservation and archiving standards in the United States (FADGI) and in Europe (Metamorfoze). His advanced degrees, both completed and in progress, have enabled him to create instructional material for use online. His interest in Curriculum and Instruction: Learning Technologies led him to develop and implement K-12 and postsecondary outreach programs, which include developing thematic collections designed to increase the use of the digital library for teaching and research. He has led numerous digitization, preservation, and archiving trainings in the United States and abroad. As Director of dLOC, Asencio has worked closely with partners in the United States and across the Caribbean to foster productive and beneficial collaborative relationships. In this role he assesses partner needs and finds solutions to issues that are often unique to organizations operating in the Caribbean and Latin America.

Jamie Rogers is the Assistant Director of Digital Collections at Florida International University (FIU). In this capacity, she leads digital production, digital scholarship, data management strategies, and preservation for internally and externally funded digital initiatives in collaboration with the FIU community as well as local partners, including municipalities, cultural institutions, government agencies, and scientific organizations. She has curated and managed over 100 digitized special collections and institutional repository collections, which are accessed an average of 9 million times per year. Since 2009, she has served as PI and Co-PI for thirteen successful grant and local partner initiatives, including projects sponsored by the Institute of Museum and Library Services and the Society of American Archivists, amounting to over $1.5 million in funding. She holds an M.S. in Management of Information Systems from Florida International University.

Perry Collins is the Scholarly Communications Librarian at the University of Florida in Gainesville, where she manages initiatives promoting open access in scholarship and education, copyright literacy, ethical approaches to digital scholarship, and capacity building for born-digital library publishing. Before joining UF in 2018, Collins held a similar position at the Ball State University Libraries in Muncie, Indiana, and worked for six years as a program officer in the Office of Digital Humanities at the National Endowment for the Humanities (NEH). While at the NEH, Collins played a major role in administering the grant review process and shaping funding programs at the intersection of technology and the humanities. She also co-managed the NEH-Mellon Humanities Open Book Program, an effort to digitize out-of-print scholarly monographs and disseminate them under open licenses. Collins holds an M.L.I.S. from the University of Illinois at Urbana-Champaign and an M.A. in American Studies from the University of Kansas.

Hadassah St. Hubert, Ph.D., is currently the CLIR Postdoctoral Fellow in Data Curation for Latin American and Caribbean Studies with the Digital Library of the Caribbean (dLOC) at Florida International University. She received a Ph.D. in History from the University of Miami, and her dissertation, Visions of a Modern Nation: Haiti ..., focuses on Haiti's participation ... for Haiti: An Island Luminous, a trilingual digital humanities site dedicated entirely to Haitian history and Haitian studies. An Island Luminous pairs books, manuscripts, newspapers, and photos digitized by libraries and archives in Haiti and the United States with commentary by more than 100 authors at 75 universities around the world. As a Postdoctoral Fellow with dLOC, she leads ... Sauvegarde du Patrimoine National (ISPAN). In this cooperative project, she has provided training and expert technical assistance to ISPAN in its digitization efforts. In addition, she has secured over $500,000 in funding for dLOC partners for various digitization projects.

(4) Summary of Project

Digital Library of the Caribbean (dLOC) intends to enhance access to its existing Caribbean newspaper collections by making texts available for bulk download to its users. This will facilitate modes of scholarship that depend on access to image and textual data at scale and will enable a new level of access to titles not included in newspaper data resources such as Chronicling America. To meet the needs of the dLOC community for teaching and research, we will demonstrate the potential of newspaper data by creating a pilot thematic toolkit focused on hurricanes and tropical cyclones. The toolkit will provide multilingual datasets focused on these disasters from several countries and islands in the Caribbean, such as the Bahamas, Belize, Cuba, the Dominican Republic, Grenada, Haiti, Jamaica, and Martinique. The datasets drawn from these newspapers cover different periods of time and can provide scholars with insights into Caribbean culture and society as well as the role of resiliency within disasters.
(5) Project Rationale and Statement of Significance

Rationale & User Communities

The Digital Library of the Caribbean (dLOC) is a multi-institutional, international digital library that has worked on data curation and digitization projects with archives and libraries across the Caribbean while providing access to scholars and students around the world. Scholars, practitioners, and students engage with dLOC not only as an access point for digital objects but also as a shared node that supports public scholarship and pedagogy. As dLOC collections have grown to include almost 4 million pages and over 75 institutional partners, there is an immediate need to facilitate computational analysis to enable new modes of storytelling and collaboration. dLOC is administered by Florida International University (FIU) in partnership with the University of the Virgin Islands (UVI), and its online technical infrastructure is currently provided by the University of Florida (UF). The Caribbean Newspapers subcollection makes up about 25 percent of the total number of pages in dLOC, with titles published in 21 countries in nine distinct languages, dating from 1783 to 2019. dLOC has the largest digital collection of Caribbean newspapers available online in a single platform. UF already contributes to Chronicling America and has received NEH funding to make newspapers from Puerto Rico and the Virgin Islands available. In addition, several Endangered Archives Programme grants to dLOC partners have made dozens of newspapers from various countries in the Caribbean available to users.

We propose an initiative that will complement these projects and focus on Caribbean newspapers as a broad data source that lends itself to a thematic approach. The project will undertake three related goals:

1. Enhance access to existing, previously digitized Caribbean newspaper collections by making titles available for bulk download and by better documenting available methods for collecting batch files. The project will focus on curating Caribbean newspaper data as a crucial source of political, social, and economic history across the region, with opportunities to home in on the lived experiences of individuals and communities over time. (See ...) The Caribbean Newspapers site already acts as a robust counterpart to resources such as Chronicling America and Europeana Newspapers. However, it does not offer the same level of access for those pursuing computational analysis of newspaper page images or textual data. These files are currently available externally for dLOC newspapers only via web scraping at the page level; this project will refine a workflow for extracting, packaging, and documenting assets to enable simpler access for users with a range of technical ... access data repository, the collections data will be cross-searchable, discoverable, and harvested to multiple access points, including DataCite and Google/Google Data, supporting a broad audience.
2. Develop a pilot thematic toolkit that showcases the potential for computational analysis of newspapers as a lens onto the history of hurricanes and tropical cyclones and their impact on the region.
This toolkit will include a relevant subset of the underlying textual data; one or more structured datasets derived from the original data; and a descriptive document or finding aid with information about data assets and examples of how they might be used. In this case, we plan to extract portions of text referencing hurricanes and tropical storms. Building upon this foundational dataset, we will experiment with both named entity recognition and manual techniques to develop a linked data model that establishes relationships between identifiable storms and specific people and locations. While we envision a series of thematic toolkits to be developed in the future, during the grant period we will focus specifically on coverage of hurricanes in Caribbean newspapers.
3. Finally, we will emphasize local capacity building and community engagement throughout the project and beyond the grant period through faculty and teacher trainings at the applicant institutions as well as at partner nodes across the dLOC member community that are creating, contributing, and reusing materials but may lack adequate support. This will include development of a long-term outreach strategy around these particular data resources and the impact of opening up dLOC data to new kinds of analysis. We will emphasize training opportunities both for our core project team and for others in the dLOC community by supporting local community events.

... in mind. A relatively small community of dLOC users has both the subject expertise and the technical or programming knowledge to process available data and undertake large-scale text or image analysis. These users are more comfortable starting with less structured data from a variety of sources to conduct computationally intensive analyses, and they can more easily identify external data sources to augment and enhance dLOC collections (e.g., historical hurricane data alongside historical newspaper reports). This community is most likely to seek access to page images or text files at scale as a starting point for research.

... collecting institutions are eager for a lower-barrier model that would package and interpret structured data as a starting point for scholarly or classroom use. These users are comfortable experimenting with plug-and-play software (e.g., TimelineJS, Palladio, Neatline) and have a working knowledge of how to identify appropriate datasets for such tools. The needs of this community drive our decision to develop thematic toolkits that offer curated, derived datasets that can be used with open source software without any specialized technical ... interpretive framework and training modules will promote understanding of how the dataset was produced as well as potential gaps or pitfalls in the data.

Existing Resources & Needs

Data sources: ... million digitized and born-digital newspaper pages offers rich coverage of topics across geographic, linguistic, and temporal boundaries.
Because many newspapers in the collection were digitized ten or more years ago, OCR quality varies significantly, though ... FineReader for OCR processing. The project will rely on language experts on the advisory committee to document the quality of current OCR for papers in French, Spanish, English, and Dutch (including some that contain more than one language), as well as potentially Papiamento and Haitian Kreyòl. This project will help prioritize which newspaper titles are most in need of reprocessing. Our work with this multilingual corpus and our documentation of OCR quality will be useful for other institutions with collections from non-English-speaking nations.

Repository infrastructure: We also intend to use this project to provide pathways and infrastructure to facilitate more collaborative work among librarians, faculty members, IT staff, and digital scholars at FIU and UF. Both institutions host digital collections on the SobekCM platform, an open source repository software solution developed at UF. SobekCM offers strong functionality for viewing and downloading newspapers and other materials at the item, issue, or page level; however, it does not currently provide tools for users to download collection- or title-level groups of files simultaneously or to easily access OCR text files. This project will help us document potential ... space for this project. This will allow for branding, contextualization of the datasets, metadata, and mass downloads of the data, creating a more streamlined and easier experience for researchers looking for this specific data, who may not want to sift through other datasets to find what they need.

Expertise and training: dLOC follows a model of shared governance, decentralized digitization, and distributed collection development, thus giving Caribbean institutions, and those who know the collections most intimately, an important role in the decision-making and production process. Recent professional development opportunities such as the 2019 NEH ... potential collaborations and experimentation with data visualization and publication, particularly in pedagogical contexts. However, dLOC partners and affiliated researchers often work in their own disciplinary or institutional silos as they seek ways to reuse collections, and few dLOC partners have sufficient capacity in data curation or analysis at scale. Project funding will allow not only the release of newspaper data but, more importantly, focused collaboration with expert advisors and the launch of a community engagement effort that brings people together for events and trainings. Project funding would also allow for the participation of Boyuan Guan, the Lead Developer for the GIS Center, who will collaborate on named entity extraction, geocoding, and data modeling as described further in the Draft Implementation Model component.
He specializes in the development of web-based databases and GIS applications, programming, use of GIS software in civil engineering project management, digital repository systems, and metadata engines. This initiative will provide new opportunities for collaboration ... background in transportation engineering and computer science may lead to unanticipated insights. ... capacity for text analysis and geocoding.

Significance & Research Value

Archival materials about hurricanes and tropical storms have become increasingly important for scholars of the Caribbean. People in the Caribbean have been coping with hurricanes and tropical storms for thousands of years. Hurricane data has provided insight into the cultural, economic, social, and environmental histories of the Caribbean. With limited resources, researchers have documented struggles over disaster capitalism, labor, land, and climate. This data would contribute further evidence about the reported strength of storms and about the resiliency of Caribbean people and society. Through this project, scholars may investigate questions such as: How might we recover stories of individual people and places impacted by hurricanes across the region over time? How has regional newspaper coverage of hurricanes compared with government reports over time? How might we track the locations of lesser-known events where little data exists?

Methodologically, the project ... a 2018 report on OCR challenges as we seek to establish relationships between coverage of hurricanes, where they took place, and who was affected. One research question to be addressed with both technical and scholarly experts on the team will be the best way to identify and structure multilingual references to multiple entities associated with a single storm system, sometimes across a large geographic region. Documentation of this undertaking will be of interest to any researchers seeking to analyze disasters, social and cultural movements, or other events across national and linguistic boundaries.

(6) Project Plan

Ethical Considerations

The ethical development and reuse of digital collections through a shared governance model is ... day-to-day work. In this proposal, we specifically address two areas where access to dLOC data would promote more equitable approaches to research and teaching:

1. There is a clear disparity regarding access to Caribbean newspapers; while digitized page images and, in most cases, searchable text are available, this project would begin to facilitate the kinds of research currently feasible with sources such as Chronicling America. This might include tracing social and reprinting networks, better understanding coverage of events with a regional impact, etc.
2. Our thematic focus on hurricanes and tropical cyclones will also address several ethical issues.
While there is an ethical imperative to make more data about these disasters available to support research into Caribbean history, climate change, and other fields, we also acknowledge the ways in which reporting and research on disasters can omit individual and community identities or treat hurricane data as an extractable resource. Inspired by the Colored Conventions Project Principles, we will similarly seek to name specific people and places as a way to affirm their value and experience. Our collaboration with experts across disciplines will help ensure this data is contextualized thoughtfully.

The project will also focus on modeling better practices in fostering professional growth for all team members and acknowledging all contributions. This includes an emphasis on ensuring a positive graduate student internship experience based on UF ... builds in supports for a living wage, resume development, and professional networking. At minimum, we will credit everyone who contributes over the course of the grant period and beyond on the project website, with a brief description of their role; we may also seek to attribute more granular contributions to project data or event organization as feasible.

Draft Use Model

Leverage existing networks: dLOC is a long-standing organization that reaches a range of distinct communities across disciplines and geographies. Members of the project team routinely attend conferences or engage with virtually connected networks of dLOC users, offering a sustainable and consistent set of opportunities to promote access to data over time and to continuously assess ... outreach opportunities during the grant period, but outreach will continue long-term.

Promoting community engagement & interpretation: Building subcommunities engaged with newspaper research and with hurricane or disaster studies is a crucial component in enhancing data and encouraging its use. During the latter half of the grant period, we will pilot a model to provide small stipends to institutions willing to organize events focused on experimenting with newspaper data. While we anticipate many of these will prioritize scholars and teachers in higher education as a primary audience, they may also include students, GLAM professionals, and participants in local history or civic engagement initiatives. Depending on local interests, these may feature a hands-on training session on using visualization software; an edit-a-thon to help document relationships between identifiable hurricanes, people, and places; or an opportunity to experiment with bulk data from our project alongside newspaper data from other regions. This series of events will seed efforts ... understanding the impact of and responses to hurricanes at local and regional levels.
To facilitate long-term sustainability and to strengthen components of this work, during and immediately following the grant period we will undertake assessment in the form of event evaluations and virtual town halls to get feedback on how community engagement efforts meet or do not meet specific needs. For instance, what additional documentation would make the data more useful to students or newcomers to the digital humanities? What barriers remain to conducting live events with regard to technical infrastructure or bandwidth? Ideally, we will be able to work with ... future with very little or no funding. Some of these may depend specifically on hurricane-focused data, while others may be adaptable to Caribbean newspaper data more broadly. Some may require external support such as software training from a dLOC community member, while others may be entirely self-guided. We are confident that we will be able to engage users in an ongoing conversation to determine what will be genuinely useful; for instance, recent virtual conversations ... long NEH institute have attracted consistent participation and concrete suggestions for building capacity in the field.


... Miami-Dade Public Schools and participation at the Miami Book Fair, provides additional long-term possibilities for broadening the audience and framing components of the project from a public humanities perspective. For instance, once we have a better understanding of the data, we may be able to identify and produce simple data stories to augment existing online lesson plans available through the Florida & Puerto Rico Digital Newspaper Project.

Encouraging adaptation of use model: Other users may include digital scholars, librarians, and others interested in our approach to user community engagement and in developing thematic data toolkits in areas outside Caribbean Studies. As laid out in the Documentation Overview below, the project will develop training materials and workflows for others seeking to replicate this model, with release of user personas and any materials developed for or by local community events. We are particularly interested in reaching other organizations or communities that approach digital collection building and analysis from a networked, shared governance perspective similar to ... event specifically targeted toward state or regional aggregations (including but not limited to DPLA hubs) and other community-driven collections (e.g., South Asian American Digital Archive, Advanced Research Consortium nodes).

Other model projects: We will produce a summary of other projects with outreach strategies that prioritize strong documentation and community engagement, particularly in the context of newspaper data and named entity extraction. These include the Linked Jazz project and the related Semantic Lab at the Pratt Institute, which seeks out community collaboration in enhancing relationship data, and the Viral Networks project based at Virginia Tech, which has built upon an earlier newspaper analysis effort focused on epidemiology and sustained engagement for nearly a decade through symposia, hands-on workshops, and a recent publication.
Positions and duties: The project co-leads and advisory committee will play a crucial role in seeking feedback and in facilitating professional development and outreach to current dLOC constituencies and to other potential user communities (e.g., digital humanists with an interest in newspaper research; public scholars developing narratives around hurricane and disaster research). Technical experts in metadata, text analysis, and data visualization will also support use by making sure datasets are well described, discoverable, and citable, and by creating training materials directed at novice users. The project co-leads will also focus on ensuring that the use model is adaptable by other institutions by documenting and disseminating lessons learned in international collaboration, project marketing, and data documentation.

Positions & Duties Summary

Use-Focused Responsibility | Primary Individual/Group
Online community (web, social media, Google group) | Project leads with support from all team members
Conference outreach | Project leads
Data documentation/thematic finding aid | Project leads; FIU technical team and advisory committee
Develop or adapt training guides for analyzing newspaper data and for adapting new thematic toolkits | Project leads; FIU technical team
Local community training events and edit-a-thons | Project leads; advisory committee; community partners

Sustaining the use model: While routine outreach at conferences should help ensure continued awareness and use of plug-and-play datasets and some experimentation with newspaper data, it will be more challenging to launch and sustain development of future thematic data toolkits and other interpretive research projects. To address this concern, one goal of the project will be to collaborate with the advisory committee to document a network of scholarly and technical experts (in hurricane and disaster studies, in textual and image data analysis, and in humanities data curation) whom future users may be able to call upon. The dLOC network also provides members with support for grant development, a service that could help foster future investment in projects leveraging newspaper data.

Draft Implementation Model

Phase 1: Newspaper data processing & dissemination

The first step will focus on finalizing a list of newspaper titles for inclusion in the project. We will aim to make approximately 200,000 page images and their OCR text available for bulk download based on the following criteria:

Contribution to breadth across geographic, linguistic, and temporal coverage.
Availability of acceptable OCR text. This may privilege newspapers that have been digitized ... Cuban newspaper titles. Born-digital titles may also be considered. For titles in need of OCR reprocessing, we will seek out publications that contain a small ... issues within the publication. We will reprocess OCR only where the results are most likely to be acceptable (e.g., high contrast, consistent layout, minimal deterioration).
Titles must not already be available in Chronicling America, Europeana Newspapers, or other sources where bulk download or computational analysis is currently feasible.
Permission must have been previously granted by the copyright holder for noncommercial use, or titles must be in the public domain.
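To illustrate how the OCR and eligibility criteria above might be screened before expert review, the following Python sketch scores a sample of a title's OCR text by the share of its words found in a language wordlist, then applies the rights and duplication criteria. This dictionary-hit heuristic is an illustrative assumption, not the project's adopted method (the plan relies on language experts on the advisory committee), and the `TitleRecord` fields, the tiny wordlists, and the 0.5 threshold are hypothetical placeholders. Breadth of geographic, linguistic, and temporal coverage is left to curatorial judgment and is not scored here.

```python
import re
from dataclasses import dataclass

# Placeholder wordlists (hypothetical); a real screen would load full
# dictionaries for French, Spanish, English, Dutch, and other corpus languages.
WORDLISTS = {
    "en": {"the", "and", "of", "hurricane", "storm", "island", "wind"},
    "es": {"el", "la", "de", "y", "huracán", "tormenta", "isla"},
}

# Runs of letters only (Unicode-aware), so digits and punctuation split tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def dictionary_hit_rate(text: str, lang: str) -> float:
    """Share of alphabetic tokens that appear in the wordlist for `lang`."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    if not tokens:
        return 0.0
    words = WORDLISTS[lang]
    return sum(t in words for t in tokens) / len(tokens)


@dataclass
class TitleRecord:
    """Hypothetical candidate-title record; not dLOC's metadata schema."""
    title: str
    lang: str
    sample_ocr: str        # OCR text from a few sampled pages
    in_bulk_source: bool   # already in Chronicling America, Europeana, etc.
    rights_cleared: bool   # public domain or noncommercial permission granted


def eligible(rec: TitleRecord, min_hit_rate: float = 0.5) -> bool:
    """Apply the rights, duplication, and (heuristic) OCR-quality criteria."""
    return (
        rec.rights_cleared
        and not rec.in_bulk_source
        and dictionary_hit_rate(rec.sample_ocr, rec.lang) >= min_hit_rate
    )


if __name__ == "__main__":
    clean = TitleRecord("Sample A", "en", "The hurricane struck the island", False, True)
    noisy = TitleRecord("Sample B", "en", "Tke hnrr1cane stnuck tbe islnd", False, True)
    print(eligible(clean), eligible(noisy))  # True False
```

A title passing this screen would still go to the advisory committee for spot-checking; the score would only help order the review queue and flag likely candidates for reprocessing.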


To make the data available for download, we will complete the following steps:

1. Where necessary, undertake OCR reprocessing and quality assurance and ingest updated text files into dLOC. Work with the advisory committee to spot-check records across text in multiple languages. (UF technical team; advisory committee)
2. To facilitate UI download access to batch data: Copy available metadata and file directories to the public FIU data repository. Directories are currently structured in pairtree format, also used by resources such as HathiTrust. Data packages will contain page images in JPEG 2000 and PDF formats; uncorrected OCR text with word bounding boxes; and structural metadata. TIFF files will not be disseminated as part of data packages. (UF & FIU technical teams)
3. To facilitate access to bulk text-only downloads: For each publication, extract and package OCR text files to provide bulk download access at the title level outside the nested directory structure. Each download will include a file manifest with page identifiers and corresponding issue dates. These files will be made available through both the dLOC and FIU data repository interfaces. Each dataset will be assigned its own DOI, which allows us to link bidirectionally from the newspaper collection repository to the associated data. (UF & FIU technical teams)
4. In addition to the metadata that will accompany each publication's data package, the Dataverse repository will also contain information about the corpus of materials included in this project as well as pertinent documentation about the preparation and structure of the materials for bulk download.

Phase 2: Derive thematic datasets

To highlight the value of a collections as data approach and to open up dLOC newspaper data to a larger audience, we will develop a pilot thematic toolkit that provides structured data focused on newspaper coverage of hurricanes and tropical cyclones.
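Phase 1, steps 2 and 3, could be sketched in Python roughly as follows, assuming alphanumeric identifiers and a hypothetical `Page` record holding a page identifier, an ISO 8601 issue date, and OCR text. `pairtree_path` shows the basic pairtree idea of splitting an identifier into two-character directory names (the full pairtree specification also escapes special characters, which this sketch omits), and `package_title` builds a flat, title-level zip of text files with a `manifest.csv` listing page identifiers and issue dates. File and column names are illustrative, not a committed package format.

```python
import csv
import io
import zipfile
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Page:
    """Hypothetical page record; field names are illustrative only."""
    page_id: str
    issue_date: str  # ISO 8601 date, e.g. "1928-09-14"
    ocr_text: str


def pairtree_path(identifier: str) -> PurePosixPath:
    """Split an alphanumeric identifier into two-character directory names.

    Simplified: the full pairtree spec also escapes special characters.
    """
    if not identifier.isalnum():
        raise ValueError("this sketch only handles alphanumeric identifiers")
    parts = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return PurePosixPath(*parts)


def package_title(pages: list[Page]) -> bytes:
    """Return a flat zip of one title's OCR text files plus manifest.csv."""
    manifest = io.StringIO()
    writer = csv.writer(manifest, lineterminator="\n")
    writer.writerow(["page_id", "issue_date", "filename"])
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Order pages chronologically, then by identifier within an issue date.
        for page in sorted(pages, key=lambda p: (p.issue_date, p.page_id)):
            filename = f"{page.page_id}.txt"
            zf.writestr(filename, page.ocr_text)  # str data is stored as UTF-8
            writer.writerow([page.page_id, page.issue_date, filename])
        zf.writestr("manifest.csv", manifest.getvalue())
    return buf.getvalue()
```

For a sample identifier such as `UF00098620`, `pairtree_path` yields `UF/00/09/86/20`. Keeping the text-only package flat, with the manifest carrying dates, lets users who only need OCR skip the nested directory tree entirely.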
As much as possible, we will undertake this phase with a focus on developing well-documented, replicable workflows that can be applied to other themes (see Documentation Overview below). This will require the following steps:

1. Collaboratively develop a controlled vocabulary comprising a target keyword list and its equivalents in Spanish, Dutch, and French, as well as other languages where feasible. Also include proper names assigned to particular storms (e.g., Andrew; San Lorenzo). (Scholarly lead; advisory committee)
2. Create a corpus made up of OCR text from Phase 1 as well as Chronicling America data for at least one Puerto Rican newspaper, both to ensure geographic coverage and to test interoperability between data sources, a major goal of this proposal. (FIU technical team)
3. Create a simple topic classifier model to automatically generate new tags within the dataset in order to expand the list originally defined by the scholarly lead and advisory committee. Define a named entities list. Develop a text extraction model to automatically generate additional keywords based on this list. Compile all references to targeted keywords and export each section of text containing a relevant term to tab-delimited files and to CSV with
identifier and corresponding page URI. (FIU technical team; multilingual advisory committee members)
4. Experiment with additional named entity extraction tools (e.g., Stanford NER, NLTK) as well as manual methods to geolocate hurricane references and to identify specific individuals or organizations co-located with references to hurricanes, referring to existing name authorities wherever possible. Develop a data model for storing and disseminating entity and relationship information in appropriate formats (CSV, JSON-LD, etc.). (Project leads; FIU technical team; multilingual advisory committee members)
5. Archive the derived thematic datasets as a subsection of the data repository containing the content for bulk download. These thematic datasets will include detailed documentation of the processes for identifying target keywords; results and recommendations for further work to improve OCR quality and data interoperability; and the process for identifying entity and relationship information.

This phase is crucial to making the data actionable for our user community. It will present intellectual and technical challenges in name disambiguation, translation, and data modeling; however, it will also present opportunities to better articulate relevant research questions and to collaborate to build capacity across the dLOC network for developing documentation, templates, and training opportunities during and after the grant period.

Phase 3: Develop thematic toolkit and documentation

As a final step in making data accessible, we will prepare a thematic data toolkit that will include a snapshot of the hurricane dataset(s); a descriptive document or finding aid with information about toolkit data assets; and an inventory of other relevant digital objects in dLOC (e.g., photographs, government reports) and/or external datasets (e.g., NOAA hurricane data).
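The keyword-compilation work in Phase 2 (steps 1 and 3) could start from something like the following sketch. It assumes a controlled vocabulary mapping each concept to multilingual variants (the `VOCABULARY` terms shown are illustrative placeholders, not the project's list), folds case and diacritics because legacy OCR often drops accents, and exports each hit with its page identifier, a page URI, and surrounding context as CSV or tab-delimited text. It does not implement the topic classifier or extraction models, which the proposal leaves open.

```python
import csv
import io
import re
import unicodedata
from typing import Dict, List

# Illustrative vocabulary only: keys are concepts, values are language variants.
VOCABULARY: Dict[str, List[str]] = {
    "hurricane": ["hurricane", "huracán", "ouragan", "orkaan"],
    "tropical storm": ["tropical storm", "tormenta tropical", "tempête tropicale"],
}

FIELDS = ["page_id", "page_uri", "concept", "matched_text", "context"]


def fold(text: str) -> str:
    """Lowercase and strip diacritics so 'Huracán' and 'huracan' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold_with_offsets(text: str):
    """Fold character by character, recording each folded char's index in the original."""
    chars, offsets = [], []
    for i, ch in enumerate(text):
        for folded in fold(ch):
            chars.append(folded)
            offsets.append(i)
    return "".join(chars), offsets


def compile_patterns(vocabulary: Dict[str, List[str]]):
    """One whole-word regex per concept; words in multiword terms may be split by any whitespace."""
    patterns = {}
    for concept, variants in vocabulary.items():
        alternatives = [r"\s+".join(re.escape(w) for w in fold(v).split()) for v in variants]
        patterns[concept] = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")
    return patterns


def find_mentions(page_id: str, page_uri: str, text: str, patterns, width: int = 60) -> List[dict]:
    """Return one row per keyword hit, quoting the original (unfolded) text."""
    folded, offsets = fold_with_offsets(text)
    rows = []
    for concept, pattern in patterns.items():
        for m in pattern.finditer(folded):
            start, end = offsets[m.start()], offsets[m.end() - 1] + 1
            rows.append({
                "page_id": page_id,
                "page_uri": page_uri,
                "concept": concept,
                "matched_text": text[start:end],
                "context": " ".join(text[max(0, start - width):end + width].split()),
            })
    return rows


def rows_to_text(rows: List[dict], delimiter: str = ",") -> str:
    """Serialize rows as CSV, or as tab-delimited text when delimiter is a tab."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Matching on folded text but quoting the original characters keeps the exported snippets faithful to the OCR, which matters when the same data later feeds OCR-quality review.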
Additionally, the toolkit will include four to five short tutorials demonstrating specific ways to analyze or visualize the data, and appropriate tools for exploring it, based on documented processes and results from the earlier project phases. The tutorials will be directed toward novice or intermediate users and will include step-by-step walkthroughs for tasks such as associating entities across two or more datasets; preparing and interpreting data when using tools such as Palladio; or even testing our target keyword list with software such as AntConc as one step toward understanding how the thematic data was created. Additionally, drawing on projects like DataBasic, we will ensure that our tutorials help users better understand how to formulate questions from the data and how to identify the stories that the data can tell. These guides will be crucial to contextualizing the data for students in the digital humanities or other fields as they begin to understand both the potential and the limitations of structured data. Wherever possible, we will adapt guides from other sources such as Programming Historian and will make these available in English, Spanish, and French. Depending on timing, these tutorials may
offer a starting point for local user community events, but they should also be informed by those events as specific needs and suggestions emerge.

Encouraging replication of implementation model: Phase 1 of the project will be the most adaptable to other institutions and collections, as it relies on expertise and technical infrastructure common to a range of cultural institutions. For instance, UF is already planning to incorporate workflows refined through this project into future digitization initiatives across collections. Our focus on newspapers is likely to be of wide interest, especially to institutions that, for a variety of reasons, are not included in Chronicling America or that are grappling with legacy OCR. In addition, the use of established open-access data repository software with well-established standards will provide a low barrier for others to replicate and implement a similar model of access, sharing, discoverability, and storage. While our pilot thematic toolkit and derived datasets are more specialized and may not be fully scalable, we plan to create a guide that documents our general methods for creating such a toolkit, with recommendations for others regardless of subject matter. This guide will outline processes such as (1) identifying potential source data; (2) reviewing and remediating source data; (3) data storage/archiving; (4) identifying potential research questions and applications of data; (5) creating derivative datasets; and (6) methodologies and tools for analysis.

Other model projects: For data preparation and dissemination, we will look most closely to the approach Chronicling America has taken in making both textual OCR and page data available, and we will include information in our documentation for those who wish to develop corpora containing papers from both data sources. This will include consulting internal documentation models for the Florida and Puerto Rican Newspaper Project.
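For tutorials that test the target keyword list the way a concordancer such as AntConc does, the core idea is a keyword-in-context (KWIC) view. The following minimal sketch is an illustration of that idea, not a reimplementation of AntConc: it tokenizes on Unicode word characters, matches single-word keywords case-insensitively, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

TOKEN = re.compile(r"\w+")  # str patterns are Unicode-aware by default


def kwic(text: str, keyword: str, window: int = 4) -> List[Tuple[str, str, str]]:
    """Return (left context, matched token, right context) for each occurrence of keyword."""
    tokens = TOKEN.findall(text)
    target = keyword.casefold()
    hits = []
    for i, token in enumerate(tokens):
        if token.casefold() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def format_kwic(hits: List[Tuple[str, str, str]], width: int = 30) -> str:
    """Align hits on the keyword column, as a concordancer's KWIC view does."""
    return "\n".join(
        f"{left[-width:]:>{width}}  [{token}]  {right[:width]}" for left, token, right in hits
    )
```

Reading hits aligned on the keyword makes false positives and OCR-split words easy to spot, which is the point of testing the keyword list before trusting the derived dataset.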
Europeana Newspapers NER provides a rich example as we consider feasible methods for identifying events, people, and locations within our own corpus. In curating derived datasets and providing interpretive context, we will seek to emulate approaches such as that described in Katie Rawson and Trevor Mu[…], which offers a general framework for making data curation decisions explicit and for balancing data normalization with the preservation of data diversity and even inconsistency. This approach is crucial as we seek to provide datasets that are meaningful and actionable without obscuring the inherent complexity of newspaper coverage over time, language, and space.

Positions and duties: Implementation will rely heavily on team members with technical expertise in digital production, OCR workflows, repository infrastructure, metadata, and text analysis. While our project brings together this expertise from across institutions, the staffing model is readily adaptable for single institutions with sufficient expertise across these areas. Training on both a local and broader community level will be key to ensuring the sustainability of our approach. Depending on interest and expertise, the graduate intern will have an opportunity to contribute throughout the data curation

Positions & Duties Summary

Implementation-Focused Responsibility | Primary Individual/Group
Project management and coordination | Collins (project-wide and UF); Rogers (FIU)
Identifying and preparing data for dissemination, including OCR quality control | UF/FIU digital collections and metadata experts, with support from multilingual members of advisory committee
Packaging, formatting, and describing data | UF/FIU digital collections, repository, and metadata experts
Developing corpus of Caribbean newspaper hurricane coverage | Castro; Guan; project leads; graduate intern; support from multilingual members of advisory committee
Experimentation with semi-automated and manual methods for extracting named entities associated with hurricane coverage, including data modeling | Guan; Castro; Bakker; project leads; graduate intern; support from multilingual members of advisory committee
Create thematic toolkit, including derived datasets, documentation, training modules, and references to other relevant data | Scholarly lead; project lead; project intern
[…] documentation to build local capacity for text analysis and data modeling | Guan; Castro; Bakker
Create training materials for those seeking to adapt implementation model for other collections | Project leads; technical partners

Data Overview: To summarize, data to be disseminated will include the following:

Data Type | Format(s) | Source(s) | Access Point(s)
Caribbean Newspapers page images | JPEG2000/PDF | dLOC/UF | FIU Dataverse; page-level access in dLOC
Caribbean Newspapers OCR text | ALTO/XML; TXT | dLOC/UF | FIU Dataverse; potential for redundant access in dLOC
Structural and descriptive metadata | METS XML | dLOC/UF | FIU Dataverse
Dataset including references to hurricanes and related personal or geographic entities | CSV; JSON-LD | Derived from Caribbean Newspaper data and other relevant titles currently in Chronicling America | dLOC/FIU Dataverse/project website
Scripts | Python | FIU Technical Team | Open source (GitHub)

Documentation Overview: The following summarizes documentation to be generated over the course of the grant period, as described above. The Project Lead (Collins) and Project Intern will be responsible for assigning documentation tasks and for collecting and depositing outputs. Materials will be made available through dLOC, GitHub, Dataverse, and the project website as appropriate. Materials will be shared under a Creative Commons Attribution-NonCommercial 4.0 International License, except in cases where local community event organizers choose a different license for their outputs.

Item(s):
- Project website with overview, monthly updates, and resources
- Statement of ethical principles for collaboration and data reuse
- Environmental scan to contextualize use and implementation models
- Advisory board meeting minutes
- Evaluation of OCR quality, including results and a template for review at distinct points (initial title selection, recognition of target keywords, named entity recognition)
- Workflows and scripts for migrating data from the UF to the FIU repository and for extracting plain-text files for deposit in dLOC
- Data citation metadata, discipline-specific metadata, and file-level documentation
- Workflows for creating a dataset from Caribbean Newspapers and Chronicling America (Puerto Rico) titles, including any interoperability obstacles
- Derived data documentation (including target keyword list and methodology; workflows and scripts for extracting and structuring hurricane-related data; key obstacles and troubleshooting; data model)
- Project team training materials focused on text analysis and geocoding
- Community event materials (including call for participation; marketing resources; slides/recordings; and assessment instruments)
- Conference presentation slides/recordings
- Thematic data documentation (including target keyword list; plain-language summary of […] and other relevant datasets; basic tutorials)
- Virtual outreach/webinars to promote use and implementation models
- Project report outlining a summary of methods, accomplishments, and outputs (including use/implementation models), as well as lessons learned and future goals

Areas of likely broad interest will include OCR evaluation; ethics of collections as data in a shared governance environment; and our hurricane-focused thematic approach.

Stewardship & Sustainability: After refining workflows and points of data discovery as described in Phase 1, UF and FIU are committed to adopting these processes long term for future digitization initiatives as well as existing collections. UF has already begun reprocessing legacy OCR, and this project will help […] team and by testing OCR quality in research contexts beyond basic text search. For Phases 2 and 3, we will consider sustainability from two perspectives: the hurricane-focused thematic toolkit, and the broader data analysis and toolkit models. For the former, project team members will commit at minimum to ensuring that toolkit assets (including derived datasets and documentation) are available through dLOC and the FIU data repository for long[…] centralized computing framework, a cloud computing infrastructure with 22 servers, over 220 TB of storage space, and sufficient redundancy. Files will be backed up weekly, with versioning and a disaster recovery (DR) setup located in Tallahassee, Florida.
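The OCR quality evaluation listed in the documentation could report character error rate (CER), a common measure though not one the proposal names: the edit distance between OCR output and a hand-corrected transcription, divided by the length of that transcription. The following minimal sketch assumes spot-check samples arrive as hypothetical (OCR text, corrected text) pairs, for example from advisory committee reviewers.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr: str, reference: str) -> float:
    """CER = edit distance / number of characters in the corrected reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ocr, reference) / len(reference)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool edits and reference characters over all spot-checked (ocr, reference) pairs."""
    pairs = list(pairs)
    edits = sum(levenshtein(ocr, ref) for ocr, ref in pairs)
    chars = sum(len(ref) for _, ref in pairs)
    if chars == 0:
        raise ValueError("no reference characters to score")
    return edits / chars
```

Pooling over all samples, rather than averaging per-page rates, keeps very short pages from dominating the figure; reporting CER separately per language would fit the multilingual review described in Phase 1.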
As resources allow and as more dLOC newspaper data is made publicly available, we will continue to grow these datasets to include additional references to hurricanes and to impacted people and locations. To enable the project team and other researchers to replicate the thematic toolkit model, we will also maintain workflow documentation, data modeling guidance, relevant training materials, and project code for long-term access and preservation in dLOC and GitHub. Finally, we will write and disseminate a project white paper and other publications as appropriate to document project goals, lessons learned, and use cases.
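As one example of the data modeling guidance mentioned above, a JSON-LD record for a newspaper page and the entities it mentions might be built as follows. This sketch assumes the schema.org vocabulary (`CreativeWork`, `Periodical`, `mentions`, `sameAs`); the project's actual data model is still to be developed, the helper names are hypothetical, and any URIs passed in are placeholders.

```python
import json
from typing import List, Optional


def entity(kind: str, name: str, authority_uri: Optional[str] = None) -> dict:
    """A schema.org node such as "Place", "Person", or "Event"; sameAs holds an authority URI if known."""
    node = {"@type": kind, "name": name}
    if authority_uri:
        node["sameAs"] = authority_uri
    return node


def page_record(page_uri: str, issue_date: str, title: str, mentions: List[dict]) -> dict:
    """Describe one newspaper page, the title it belongs to, and the entities it mentions."""
    return {
        "@context": "https://schema.org",
        "@id": page_uri,
        "@type": "CreativeWork",
        "isPartOf": {"@type": "Periodical", "name": title},
        "datePublished": issue_date,
        "mentions": mentions,
    }


def to_jsonld(record: dict) -> str:
    """Serialize with non-ASCII characters kept readable (e.g. accented place names)."""
    return json.dumps(record, ensure_ascii=False, indent=2)
```

Leaving `sameAs` out when no authority match exists, instead of guessing one, keeps unreconciled names visible for later manual work.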

(7) Timeline of completion

Activities & Responsible Team Member(s)

Quarter 1 (Jan.-Mar. 2020)
- Review and finalize titles for inclusion (Project leads, partners, advisory committee)
- Undertake OCR as necessary and implement quality assurance (UF technical partners; multilingual members of advisory committee)
- Hire graduate student intern (Project lead)
- Advisory committee virtual meeting

Quarter 2 (Apr.-June 2020)
- Prepare bulk download packages and disseminate via FIU Dataverse and dLOC (UF and FIU technical partners)
- Launch project website with documentation for data access (Project leads)
- Develop target keyword list for hurricane thematic text analysis (Scholarly lead; advisory committee)
- Advisory committee virtual meeting

Quarter 3 (July-Sept. 2020)
- Refine FIU Dataverse access and determine appropriate points of discovery via other platforms (FIU technical partners)
- Keyword analysis and data extraction (FIU technical team; multilingual members of advisory committee)
- Collaborate to implement text analysis and to develop thematic data model (Project leads, technical partners, graduate intern)
- Use topic classifier model to extract geographic or personal named entities where feasible and associate them with existing authorities (VIAF, OpenStreetMap) (Project leads, technical partners, graduate intern)
- Release call for local community events (Project lead, graduate intern)
- Disseminate information about project and initial findings on project website and in conference presentations (Project lead, scholarly lead, graduate intern)

Quarter 4 (Oct.-Dec. 2020)
- Continue data processing and correction of major errors (Project lead, FIU technical team, graduate intern)
- Finalize thematic data toolkit with derived datasets, documentation, and use case examples (Project lead, scholarly lead, graduate intern)
- Announce local community events and work with organizers to identify specific data or training needs (Project lead, scholarly lead, graduate intern)
- Conference outreach (Project leads, graduate intern)
- Complete and release documentation for technical workflows and associated code (Project lead, FIU technical team)
- Advisory committee virtual meeting

Quarter 5 (Jan.-Mar. 2021)
- Provide facilitation and follow-up support to local community event organizers (Project leads, FIU technical team)
- Virtual training opportunities (Project leads; advisory committee)
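Associating extracted place names with existing authorities (Quarter 3) usually begins by normalizing name variants before any lookup. The following sketch reconciles names against a small local gazetteer after folding case, diacritics, and hyphens. The gazetteer entries and `place:` identifiers are hypothetical stand-ins for OpenStreetMap or VIAF records, and names without a match are returned for manual review rather than guessed.

```python
import unicodedata
from typing import Dict, Iterable, List, Tuple

# Hypothetical gazetteer: identifiers are placeholders for authority records.
GAZETTEER: Dict[str, List[str]] = {
    "place:haiti": ["Haiti", "Haïti", "Hayti"],
    "place:port-au-prince": ["Port-au-Prince", "Port au Prince"],
}


def normalize(name: str) -> str:
    """Casefold, strip diacritics, and treat hyphens as spaces."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.replace("-", " ").split())


def build_index(gazetteer: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each normalized variant to its entry id (a later entry wins on collisions)."""
    return {normalize(v): entry_id for entry_id, variants in gazetteer.items() for v in variants}


def reconcile(names: Iterable[str], index: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split extracted names into authority matches and names left for manual review."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for name in names:
        entry_id = index.get(normalize(name))
        if entry_id is None:
            unmatched.append(name)
        else:
            matched[name] = entry_id
    return matched, unmatched
```

Exact matching on normalized forms is deliberately conservative; fuzzy matching would catch more OCR errors but would also need the multilingual review the proposal describes.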

Appendix A: List of Newspaper Titles

The titles below are candidates for inclusion in the bulk data downloads. This list excludes titles from Puerto Rico and the Virgin Islands that are already accessible, or will soon be accessible, through Chronicling America.

- Abaconian (Bahamas, 1993-present) https://dloc.com/UF00093713/00001/allvolumes
- Carteles (Cuba, 1919-1927) https://dloc.com/AA00065193/00001
- Le Civilisateur (Haiti, 1870-1873) https://dloc.com/AA00062914/00001/allvolumes
- Haïti Illustrée (Haiti, 1890-1892) https://dloc.com/AA00062728/00001/allvolumes
- Listín Diario (Dominican Republic, 1909-1930) https://www.dloc.com/AA00021654/00006/allvolumes
- Outlook (Belize, 1945-1946) https://dloc.com/AA00064484/00001/allvolumes
- Le Progressiste (Martinique; over 1,400 issues from 1958-2002 and 2006-2009) https://www.dloc.com/l/AA00053606/00002/allvolumes
- Abeng (Jamaica, 1969) https://www.dloc.com/UF00100338/00001/allvolumes?search=jamaica
- The Grenada Newsletter (1974-1994) https://dloc.com/AA00000053/00002/allvolumes
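To script retrieval of these candidate titles, the BibID and VID can be read out of each dLOC URL and, if the stored directories follow pairtree conventions as described in Phase 1, mapped to a relative path. This sketch assumes a hypothetical layout of two-character BibID segments followed by the VID; that layout should be checked against the actual dLOC/SobekCM file system, and the sketch omits pairtree's character-escaping rules because these identifiers are alphanumeric.

```python
import re
from typing import NamedTuple
from urllib.parse import urlparse

# dLOC item URLs contain a BibID (two letters + eight digits) followed by a five-digit VID.
ITEM = re.compile(r"/([A-Z]{2}\d{8})/(\d{5})(?:/|$)")


class ItemId(NamedTuple):
    bibid: str
    vid: str


def parse_dloc_url(url: str) -> ItemId:
    """Extract BibID and VID from the URL path (query strings are ignored)."""
    match = ITEM.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"no BibID/VID found in {url!r}")
    return ItemId(*match.groups())


def pairtree_path(identifier: str) -> str:
    """Split an identifier into two-character segments (pairtree 'shorties')."""
    if not identifier.isalnum():
        raise ValueError("this sketch skips pairtree character escaping")
    return "/".join(identifier[i:i + 2] for i in range(0, len(identifier), 2))


def item_directory(url: str) -> str:
    """Hypothetical on-disk location: pairtree(BibID)/VID."""
    item = parse_dloc_url(url)
    return f"{pairtree_path(item.bibid)}/{item.vid}"
```

Parsing only the URL path makes variants such as the `/l/` prefix or a `?search=` query harmless.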

Appendix B: Caribbean Data Curation Graduate Intern Position Description

Term(s): Summer/Fall 2020
Compensation: $15/hr for up to 480 hrs

Position overview: Reporting to the Project Lead/Scholarly Communications Librarian at the University of Florida, the Caribbean Data Curation Graduate Intern will play a meaningful role in developing the grant-funded initiative dLOC as Data: A Thematic Approach to Caribbean Newspapers. In partnership with the Digital Library of the Caribbean (dLOC), this project seeks to make Caribbean newspaper data, including text and page images, available for new kinds of scholarly and educational exploration. Additionally, the project will result in a toolkit focusing on ways newspaper data can offer insights into the history and impact of hurricanes and tropical storms in the Caribbean. Based on interest and experience, the intern will collaborate with team members at UF and Florida International University to help prepare and disseminate data, to engage with dLOC community partners, and to create training and demonstration materials. The project team is committed to acknowledging all contributions.

Summary of duties: The intern will work alongside team members to create a thematic toolkit focused on newspaper coverage of hurricanes in the Caribbean. This may include participating in gathering and enhancing data; providing documentation to help others understand how to use the data; and identifying other datasets or primary sources relevant to this topic. The intern will also play a key role in outreach, including creating blog and social media content and responding to inquiries from community partners. Duties will be finalized at the beginning of the internship […]. Interns are required to participate in a CV writing workshop and to give a public presentation on their work. The intern will also be invited to participate in a one-day project meeting and training session to be hosted at FIU in Miami.

Required qualifications:
1. Enrollment in a relevant advanced degree program at the University of Florida. Many fields may be considered relevant; candidates should explain in the letter of interest why their academic background supports the position duties.
2. Interest in the digital humanities and a willingness to experiment with new technologies.
3. Experience giving presentations and teaching others in formal or informal settings.
4. Strong written communication skills and experience writing for public audiences.
5. Enthusiasm for collaborating with international partners.
6. Experience editing basic websites and familiarity with Microsoft Excel or other spreadsheet applications.

Preferred qualifications: Note that the following are preferred, not required, and strong candidates need not meet every qualification.
1. Some knowledge of text analysis and data visualization concepts and software, and experience preparing data for analysis.
2. Reading knowledge of French and/or Spanish.
3. Experience giving presentations or trainings in an online setting.

!"#$%&'()*#"' +,-./.0/(/(123/1(4/(125 !"#$%$$&'('(%'$)$*(*'(%'+ ,"-./$012-3$&'('(%4)*(*'(%'+ /-.6$&$,%"'.7.8$9"' !"#$%&'%()*+&,)-!./)012+'.3 45)6)7859::: 7;9<5:/:: !"#$%&' !()'*$%&' =,>?@?A.'&.?B+)*+&,)-=A+@"?$3 4<)6)75C9D$@.#A).$.&%3 7<9DF8/ME !"'&%** !+)$$$%*( G?H?.&%)I$%%+".?$@A)-N&OO+'3 45)6)7D895:: 7<9;<5/:: !&"#%$& !$)-'*%$& G?H?.&%)01>&@?.?+A)-I&A.'$3 4D)6)7D59::: 7F9EM:/:: !&'*%$& !$)+-*%$& J+P$A?.$'()I$$',?@&.$')-Q'+RR.3 4;)6)75D9:<5 7F98<:/C5 !('&%#" !$)'$&%-+ =,B?A$')-I&P$3 4F)6)7E59::: 75::/:: !'%'' !&''%'' 1-.:,%;9".<";"=%#' !"#$%&'%()*+&,)-!./)012+'.3 +(%'#. 7F95DC/D5 !$,*%++ !#)"$+%," =,>?@?A.'&.?B+)*+&,)-=A+@"?$3 +(%'#. 7&@?.?+A)-I&A.'$3 +(%'#. 78D*;'?&#$;#(@*&&$A*,$#*,.:""' =,B?A$')#$@$'&'?1>)-K?%3 75::/:: !&''%'' =,B?A$')#$@$'&'?1>)-*+B?3 75::/:: !&''%'' =,B?A$')#$@$'&'?1>)-S$#&@3 75::/:: !&''%'' B-.C,$D"& I$@R+'+@"+)F)6)G?H?.&%)*?2'&'()T+,+'&.?$@)-G*T3 T'$>)K'&@.)6)7<9:FF/FC/)TLU)I$A.) !#&'+)V$.&%W)7F9:M>1@?.()+B+@.)A.?P+@,A 7;::)&B&?%&2%+).$)1P).$)F:) ?@A.?.1.?$@A)?@.+'+A.+,)?@)#$A.?@H) %$"&%)+,?.6&6.#$@AX#&"O&.#$@A 7;9:::/:: !+)''' G&.&)!.$'&H+)I$A.) TLU)I$A.)A#&'+)6)R$').#+)YZ[K<::) &@,)ZGT)R?%+A).$)%?@O).+\.),&.&)2&"O) .$).#+),*]I)'+P$A?.$'( !$)$'' $)$'' >*&&"@#%*;'.$'.!$#$.

M-.6?A$N$,H' !++)2+%$^ !#()$'-%'' -#$56786926/$:8./;.< &')''' =#$>192$9?-";.< !-)-'& 012-3$@"1A6B2$:8./;.< &-)-'&

!"#$%&'()*#"' +"$,./.0/(/(123/1(4/(125 +"$,.1.0/(/(1/34(4/(121/5 C*#$& /-.6$&$,%"'.7.8$9"' Z'$_+".)%+&,)-I$%%?@A3 54 7;9888/:: 7M<;/:: 7D95EM/:: I&'?22+&@)G&.&)I1'&.?$@)K'&,1&.+)L@.+'@ 7F5\DE:)#'A 7C9<::/:: 7C9<::/:: G?H?.&%)!1PP$'.)!+'B?"+A)-G?@A>$'+3 <4)-"$A.)A#&'+3 7F9E5D/:: 7F9E5D/:: G?H?.&%)Z'$,1".?$@)-Z+''(3 <4)-"$A.)A#&'+3 7F9F;M/:: 7F9F;M/:: G?H?.&%)Z&'.@+'A#?PA)-V&(%$'3 <4)-"$A.)A#&'+3 7<9FD:/:: 7<9FD:/:: 1-.:,%;9".<";"=%#' Z'$_+".)%+&,)-I$%%?@A3 <8/E:4 7ME$'+3 <8/E4)-"$A.)A#&'+3 !(-,%'' 7DMC/:: G?H?.&%)Z'$,1".?$@)-Z+''(3 ;5/C4)-"$A.)A#&'+3 7D:C/:: 7D:C/:: G?H?.&%)Z&'.@+'A#?PA)-V&(%$'3 <8/E4)-"$A.)A#&'+3 75C5/:: 75C5/:: 4-.>*;'?&#$;#.:""' B-.C,$D"& V'&B+%).$)^$'OA#$P)-TLUXS?&>?3 <6,&().'&B+%) -&""$>>$,&.?$@A9) P+',?+>3)R$')P'$_+".)%+&,) &@,)H'&,1&.+)?@.+'@ 7CEF/:: 7CEF/:: -#$56786926/$:8./;.< !#()$'-%'' !#()$'-%'' =#$>192$9?-";.< !*)*#$%'' !*)*#$%'' 012-3$98=-C-"/$:8./;.< !$')"$#%''

Digital Library of the Caribbean | FIU Libraries
11200 SW 8th Street, Miami, Florida 33199

[…] of the Digital Library of the Caribbean (dLOC) and Florida International University Libraries, I want to express support for the “dLOC as Data: A Thematic Approach to Caribbean Newspapers” grant application. Please acknowledge my commitment as Executive Director of the Digital Library of the Caribbean (dLOC) to collaborate with Senior Administrative Lead Jamie Rogers, and Project Leads Perry Collins and Hadassah St. Hubert. The project leads will have access to any resources they need during the project. They will be able to participate in our leadership meetings to update our members about the impact of their work. Dr. St. Hubert, as ex officio member of the Scholarly Advisory Board, will have our support to disseminate findings to our various audiences. In addition, our office will support preparations for all virtual and in-person meetings as well as community engagement events.

The “dLOC as Data: A Thematic Approach to Caribbean Newspapers” project is important to the work we do because it helps us assess and begin to address some of the biggest challenges we face in providing accessible data to scholars and researchers. This project ultimately increases the accessibility of primary and secondary open-access research and education materials. The importance of this project to our project leads is further amplified by the responsibilities and roles they play on the frontlines of scholarly research and education endeavors.

dLOC is hosted through the SobekCM platform, an open-source repository software solution developed at the University of Florida and implemented by both UF and FIU for most of our digital collections. Providing increased functionality to the platform has been part of our ongoing goals. Having access to OCR text and evaluating multilingual OCR quality will greatly increase knowledge about and from the Caribbean. Hurricanes and tropical storms have shaped our historical context in South Florida and the Caribbean. As the largest and busiest online Caribbean content library, we will make sure to leverage our scholarly board, our collaborative work within and across institutions, and the dLOC site's reach (www.dloc.com) to engage many about the availability of these datasets and to share the lessons we learn from implementing multilingual OCR.

To reiterate my commitment to Caribbean Studies, Open Access education, and research materials, I fully support and endorse the “dLOC as Data: A Thematic Approach to Caribbean Newspapers” project. Please feel free to contact me if you have questions or require additional information.

Sincerely,

Miguel Asencio
Director

Jamie Rogers
Florida International University
11200 SW 8th Street
Miami, FL 33199
305-348-6932
rogersj@fiu.edu

October 18, 2019

To: Collections as Data: Part to Whole
From: Jamie Rogers, Florida International University
RE: dLOC as Data: A Thematic Approach to Caribbean Newspapers

[…] pleased to submit this letter of support for the initiative dLOC as Data: A Thematic Approach to Caribbean Newspapers. This proposed project aims to provide ready access to a large corpus of Caribbean newspaper textual data, as well as a prototype thematic toolkit focused on the impact of and response to hurricanes and tropical cyclones across the Caribbean. The outcomes of this project have the potential to be far-reaching: not only will they serve the community of students and scholars who study the history of the Caribbean and the impacts of natural disasters, but they may also provide insights into future humanitarian efforts and disaster recovery. Serving as co-administrative lead for the project, I will direct the efforts of […] in their execution of quality control of data, as well as the archiving and preparation of the data for bulk download using Dataverse. The FIU technical team will also perform textual analysis, data extraction, and the development of curated thematic derivative datasets to be included in the toolkit, which will be archived, preserved, and made available to students and scholars. I enthusiastically support this initiative, as it uniquely addresses pressing concerns as climate change increasingly impacts our universities, local communities, and Caribbean neighbors. It is also a tremendous opportunity for professional development within the FIU Digital Collections Center, expanding the skill sets of our library team, who in turn support our students and faculty with expanded research capacities. This project will also serve as a pilot for future collections as data endeavors across both institutions and within dLOC.

If our proposal is accepted, we will ensure our team has the necessary support and time allotted to accomplish the goals of this initiative.

Sincerely,

Jamie Rogers
Assistant Director, Digital Collections Center
Florida International University

George A. Smathers Libraries
Digital Partnerships & Strategies
PO Box 117022
Gainesville, FL 32611-7022
perrycollins@ufl.edu
An Equal Opportunity Institution

October 20, 2019

Dear Review Committee:

I am writing to express my commitment to participate as project lead for dLOC as Data: A Thematic Approach to Caribbean Newspapers, and to further describe my role in managing this important effort.

[…] position focuses on developing infrastructure for emerging forms of publication and digital […] host and promote awareness of foundational platforms, data, and training opportunities for those who are eager to engage with collections held by the Libraries and our partners in new ways. This project will complement our broader digital scholarship program and demonstrate the potential for initiatives that move beyond digitization.

My grant-supported role as co-lead will allow me dedicated time to focus on project management over the course of the grant period. As outlined in the proposal's implementation model and timeline to completion, I will participate in most steps of the project, with a more active role in liaising with the digital production team at UF and in supervising the graduate intern. All team members will document contributions via a project management platform such as Trello, and I will act as a primary source of communication to the advisory committee to ensure they have opportunities to participate in meaningful, well-defined ways. As project lead, I will oversee adherence to ethical practices as described in the narrative, ensuring all contributions are acknowledged.

I will also play a leadership role, along with the scholarly lead and graduate intern, as a community liaison in disseminating information about the project to dLOC and other stakeholders and in seeking input from new and current partners. This will include maintenance of a public website and discussion group and coordination of local events to experiment with newspaper data. Because of my broad knowledge of the field as a former program officer in the NEH Office of Digital Humanities, I am also well positioned to reach out to networks beyond Caribbean studies to […] and to seek feedback on these models from other experts in the field.

[…] expressed a willingness to refine current workflows and potentially make changes to our UF-built, open-source repository platform in order to enable a collections as data approach. This would help meet demand for access to dLOC textual and image data as well as data from collections such as the […] computational analysis. This potential for long-term sustainability and development will have a […] technical infrastructure in order to provide access to their own collections.

Sincerely,

Perry Collins
Scholarly Communications Librarian
University of Florida
perrycollins@ufl.edu


xml version 1.0 encoding UTF-8 standalone no
fcla fda yes
!-- dLOC as Data: A Thematic Approach to Caribbean Newspapers ( Mixed Material ) --
METS:mets OBJID IR00011056_00001
xmlns:METS http:www.loc.govMETS
xmlns:xlink http:www.w3.org1999xlink
xmlns:xsi http:www.w3.org2001XMLSchema-instance
xmlns:daitss http:www.fcla.edudlsmddaitss
xmlns:mods http:www.loc.govmodsv3
xmlns:sobekcm http:digital.uflib.ufl.edumetadatasobekcm
xmlns:lom http:digital.uflib.ufl.edumetadatasobekcm_lom
xsi:schemaLocation
http:www.loc.govstandardsmetsmets.xsd
http:www.fcla.edudlsmddaitssdaitss.xsd
http:www.loc.govmodsv3mods-3-4.xsd
http:digital.uflib.ufl.edumetadatasobekcmsobekcm.xsd
METS:metsHdr CREATEDATE 2020-08-13T22:18:34Z ID LASTMODDATE 2020-01-02T11:06:07Z RECORDSTATUS COMPLETE
METS:agent ROLE CREATOR TYPE ORGANIZATION
METS:name UF,University of Florida Institutional Repository
METS:note Created using CompleteTemplate 'IR' and project 'UFIR'.
OTHERTYPE SOFTWARE OTHER
Go UFDC - FDA Preparation Tool
INDIVIDUAL
UFAD\renner
METS:dmdSec DMD1
METS:mdWrap MDTYPE MODS MIMETYPE textxml LABEL Metadata
METS:xmlData
mods:mods
mods:abstract Digital Library of the Caribbean (dLOC) intends to enhance access to its existing Caribbean newspaper collections by making texts available for bulk download to its users. This will facilitate modes of scholarship that depend on access to image and textual data at scale and will enable a new level of access to titles not included in newspaper data resources such as Chronicling America. To meet the needs of the dLOC community for teaching and research, we will demonstrate the potential of newspaper data by creating a pilot thematic tool kit focused on hurricanes and tropical cyclones. The toolkit will provide multilingual datasets focused on these disasters from several countries and islands in the Caribbean, such as the Bahamas, Belize, Cuba, the Dominican Republic, Grenada, Haiti, Jamaica, and Martinique.
mods:accessCondition type restrictions on use displayLabel Rights [cc by-nc] This item is licensed with the Creative Commons Attribution Non-Commercial License. This license lets others remix, tweak, and build upon this work non-commercially, and although their new works must also acknowledge the author and be non-commercial, they don’t have to license their derivative works on the same terms.
mods:language
mods:languageTerm text English
code authority iso639-2b eng
mods:location
mods:physicalLocation University of Florida
UF
mods:name
mods:namePart Collins, Perry
mods:affiliation University of Florida
St. Hubert, Hadassah
Rogers, Jamie
Asencio, Miguel
Bakker, Rebecca
Castro, Molly
Guan, Boyuan
Krefft, Jill
Dinsmore, Chelsea
Perry, Laura
Taylor, Laurie
mods:note acquisition Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by Perry Collins.
Application to the Collections as Data: Part to Whole (https://collectionsasdata.github.io/part2whole/) program, funded by the Andrew W. Mellon Foundation and administered by the University of Nevada-Las Vegas.
mods:originInfo
mods:dateIssued 2019-12-13
mods:recordInfo
mods:recordIdentifier source sobekcm IR00011056_00001
mods:recordContentSource University of Florida Institutional Repository
mods:relatedItem original
mods:physicalDescription
mods:extent Grant proposal
mods:titleInfo
mods:title dLOC as Data: A Thematic Approach to Caribbean Newspapers
mods:typeOfResource mixed material
DMD2
OTHERMDTYPE SOBEKCM SobekCM Custom
sobekcm:procParam
sobekcm:Aggregation ALL
DLOC1
UFIR
IUF
UFIRGRANTS
sobekcm:MainThumbnail Revised dLOC as Data Proposalthmthm.jpg
sobekcm:Wordmark UFIR
sobekcm:bibDesc
sobekcm:BibID IR00011056
sobekcm:VID 00001
sobekcm:Source
sobekcm:statement UF University of Florida Institutional Repository
sobekcm:SortDate 737405
METS:amdSec
METS:digiprovMD DIGIPROV1
DAITSS Archiving Information
daitss:daitss
daitss:AGREEMENT_INFO ACCOUNT PROJECT UFDC
METS:techMD TECH1
File Technical Details
sobekcm:FileInfo
METS:fileSec
METS:fileGrp USE reference
METS:file GROUPID G1 PDF1 application/pdf CHECKSUM 7194c113825166f6cd816a413167d149 CHECKSUMTYPE MD5 SIZE 1253587
METS:FLocat LOCTYPE OTHERLOCTYPE SYSTEM xlink:href Revised%20dLOC%20as%20Data%20Proposal.pdf
THUMB1 image/jpeg-thumbnails d8c93567df10575c9edf30762ada3a91 7863
Revised%20dLOC%20as%20Data%20Proposalthm.jpg
G6 THUMB6 3ccd31b90ceabe00eb3abaa70ba65ebc 7847
Revised%20dLOC%20as%20Data%20Proposalthmthm.jpg
G7 THUMB7 696044bc500e81194cfe45fe875edf83 7849
Revised%20dLOC%20as%20Data%20Proposalthmthmthm.jpg
G2 TXT2 text/plain a7a29d0afc0d77cc5812bb859f04e74b 64340
Revised%20dLOC%20as%20Data%20Proposal_pdf.txt
G3 TXT3 9e1dcbc52e73116128c44b4defc3b0ba 5043
IR00011056_00001_xml.txt
G4 TXT4 50b78ae198d0d2e881c35c7a7fd865a2 1086
agreement.txt
G5 XML5 722daf367f87f55143983e974ab3e222 6182
marc.xml
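The file section above records an MD5 checksum (CHECKSUMTYPE MD5) and a byte size for each file. Below is a minimal sketch of checking a downloaded copy against those two values. It assumes the file has already been saved locally. The helper names `md5_of` and `verify_file` and the local filename are hypothetical and are not part of the repository or of SobekCM.

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_md5: str, expected_size: int) -> bool:
    """Hypothetical helper: compare a local file with the CHECKSUM and SIZE values
    recorded for it in METS:fileSec. The size is checked first because it is cheap."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of(path) == expected_md5.lower()


if __name__ == "__main__":
    # Values copied from the METS:file entry for the proposal PDF (PDF1).
    # The local filename is an assumption; adjust it to wherever the file was saved.
    local_path = "Revised dLOC as Data Proposal.pdf"
    if os.path.exists(local_path):
        ok = verify_file(local_path, "7194c113825166f6cd816a413167d149", 1253587)
        print("verified" if ok else "mismatch")
    else:
        print(f"{local_path} not found")
```

The same call works for the other entries, such as TXT2 (`a7a29d0afc0d77cc5812bb859f04e74b`, 64340 bytes). MD5 is used here only because that is the checksum type the record declares. It detects accidental corruption, not deliberate tampering.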
METS:structMap STRUCT2 other
METS:div DMDID ADMID ORDER 0 main
ODIV1 1 Main
FILES1 Revised Data Proposal Page
METS:fptr FILEID
FILES2 2
FILES3 3
FILES4 4
FILES5 5
FILES6 6
FILES7 7