The University of Florida's Andrew W. Mellon Foundation Proposal

Purpose: To review and potentially revise the direction of our Andrew W. Mellon Foundation proposal, shifting emphasis from the "Caribbean Newspaper Imaging Project" to the "Library of the Caribbean", and to work toward a planning grant. Conversations with Marilyn Deegan (Oxford University's Mellon Foundation funded Forced Migration Online) regarding the Andrew W. Mellon Foundation technology program underpin the proposed shift.

Summary Agenda
1. Andrew W. Mellon Foundation technology program
2. The Library of the Caribbean : topical and technical conceptualization
3. Active partnerships
4. Passive partnerships
5. Economic model
6. Proposed next steps

Detailed Agenda & Proposed Re-Direction:

1. Andrew W. Mellon Foundation technology program
  1.1. Elaboration of conversations with Marilyn Deegan (Forced Migration Online)
  1.2. Proposed outcome: Erich will contact Don Waters (Andrew W. Mellon Foundation)

2. The Library of the Caribbean : topical and technical conceptualization
  2.1. Topical Conceptualization
    2.1.1. Definition of the "Caribbean"
    2.1.2. NEH and Southeastern Humanities Center (UVA) (limits of)
  2.2. The place of the Caribbean Newspaper Imaging Project in the Library of the Caribbean. (Discussion of bibliographic formats)
  2.3. Technical Conceptualization
    2.3.1. Linking systems (XPAT™ & ActivePaper™)
    2.3.2. (Linking decentralized libraries : a harvesting project for IMLS funding)
      Digital Libraries containing content not expected to be contributed directly into the Library of the Caribbean:
      * (Southeastern Humanities Center)
      * University of Miami
      * University of Texas
      * World Bank Group (Latin America & the Caribbean)
    2.3.3. (Traditional Preservation & Preservation Microfilming)

Mellon Project

3. Active partnerships
  3.1. Extend the PALMM model outward into the Caribbean
    3.1.1. Review of the model: centralized technology with distributed content providers
      * Supporting letters (potential): Paul Conway (UNC-CH); Liz Bishoff (Colorado Digital Library), etc.
    3.1.2. Coordination by the University of Florida under the auspices of PALMM
      * Florida Center for Library Automation (technology partner)
    3.1.3. Review of extant PALMM collections
      * Caribbean Newspaper Imaging Project
      * Florida Heritage Collection
      * French Revolution Française (Haitian component)
      * Eric Eustace Williams
      * Linking Florida's Natural Heritage
      * Reclaiming the Everglades
      * U.S. V.I. History and Culture
    3.1.4. Potential partners and their roles
      Libraries that might contribute content; charter members of the Library of the Caribbean:
      * (Archivo Nacional de Cuba)
      * Caribbean Journal of Science (http://www.caribjsci.org/)
      * Journal of Caribbean Archaeology (http://www.flmnh.ufl.edu/JCA/)
      * Pontificia Universidad Católica Madre y Maestra (Rep. Dominicana)
      * Florida International University
      * University of the Virgin Islands
      * University of the West Indies (Cave Hill, Barbados)
      * University of the West Indies (Mona, Jamaica)
      * University of the West Indies (St. Augustine, Trinidad)
  3.2. Internal partners (University of Florida) for content and secondary education and curriculum grants, potentially including:
    * Caribbean Agricultural Research and Development Institute (CARDI)
    * Caribbean Students Association
    * Center for African Studies (re: Diaspora studies)
    * Center for Latin American Studies
    * College of Natural Resources and Environment
    * Ethnoecology Society
    * Florida Institute Of Paleoenvironmental Research (FLIPER)
    * Florida Museum of Natural History
    * Institute for Food and Agricultural Science (IFAS)
    * Hispanic Student Association
    * Land Use and Environmental Change Institute (LUECI)
    * Office of Academic Technology
    * Preservation Institute: Caribbean (College of Architecture)
    * Tropical Research and Education Center

4. Passive partnerships
  4.1. Rationale for passive partnerships (digital library programs indirectly related)
  4.2. Identify potential partners, e.g.,
    * Forced Migration Online (Oxford University, funded by the Andrew W. Mellon Foundation)
    * Underground Railroad Freedom Center (Cincinnati, funded by Federated Dept. Stores)

5. Economic model
  * De-emphasize the economic (sales) model for the moment
  * Revisit the economic model as part of a planning grant

6. Proposed next steps
  6.1. Outline "Library of the Caribbean" content and growth plan
  6.2. Draft and float partnership documents
  6.3. Convene a meeting of partners (with funding from DSR and IFAS) and confirm roles of the partners
    * From Florida International University: Vicki Silvera (Special Collections) and Catherine Marsicek (Latin American & Caribbean Collection)
    * From Pontificia Universidad Católica Madre y Maestra, República Dominicana: Dulce María Núñez de Taveras (University Librarian)
    * From UVI: Jennifer Jackson (University Librarian and acting Provost) and/or Judith Rogers (UVI's USVI History & Culture Project)
    * From UWI, Trinidad: Margaret Rouse Jones (Campus Librarian)
    * From UWI, Barbados: Elizabeth Watson (Campus Librarian; ACURIL president)
  6.4. Draft project narrative
  6.5. Establish equipment and services needs and prepare budget.

CARIBBEAN NEWSPAPER IMAGING PROJECT
Phase III

The University of Florida proposes a third phase of its Andrew W. Mellon Foundation funded Caribbean Newspaper Imaging Project, originally granted as a scholarly technology project by Richard Ekman, the Foundation's former director. Phase III will establish the collection as a viable, Internet accessible service based on an economic model similar to that of JSTOR, using Olive Software. The projected funding request for this two-year project is $500,000 from the Foundation, with a minimum of approximately $50,000 contributed as overhead to the project by the University and its partners.

COLLECTIONS

Phase III expands the number and run of available titles, making the sum economically viable as a collection. Previously imaged titles, the Diario de la Marina and Le Nouvelliste, will be repurposed for use with Olive Software, and the available run of these titles will be expanded. New titles such as Le Matin (Port-au-Prince, Haiti), Trinidad Guardian (Port of Spain, Trinidad), and Nassau Guardian and Nassau Tribune (Nassau, Bahamas) will be added to extend the collection's geographic representation. Priority for conversion will be given to periods of significance in the life of the country of origin or in coverage of regional issues such as independence and economic development. Other titles will be added over time, as the economic model and other funding generate the fiscal resources to expand.

TECHNOLOGY

Phases I and II investigated the economics of digitization, indexing, and optical character recognition (OCR), as well as of distribution of a product on CD-ROM. These phases developed efficient, low-cost digitization methods but reported high indexing costs, and OCR with then-available technologies was an inadequate and inaccurate substitute for indexing. By the end of Phase II, Internet distribution was in greater demand than the CD-ROM product sought when Phase I was originally outlined. Following Phase II, the University of Florida migrated the product to an Internet base and began repurposing images for that medium.

Phase III utilizes Olive Software. Olive Software, marketed in North America by OCLC, is recognized as the most advanced image display, text recognition, and search system for newspapers. Its highly accurate text recognition processes obviate the need to index, and its search system is capable of differentiating types of content, e.g., titles, by-lines, and article content. The University of Florida is one of OCLC's Digitization and Preservation Cooperative partners.
University staff also sit on the Cooperative's Historic Newspaper Preservation and Digitization Group. The Group's research agenda was drafted by the University's representative, Erich Kesse, who will serve as one of the co-principal investigators for this Andrew W. Mellon Foundation funded project.

Equal Opportunity / Affirmative Action Institution

PARTNERS

The University of Florida continues to serve as the lead institution for this project, but the project will now be open to partnership with other institutions and content providers. OCLC's Historic Newspaper Digitization Service joins the University to maintain the Olive server, and it facilitates an OAI-compliant archive of master image and text files. Files will be licensed to OCLC, which will broker them as part of a broader collection now under construction. Long-term capital and operating costs are built into the continuing budgets of the University.

Partners for content include the other public universities of the state of Florida. Partnership for content has been discussed with the University of the Virgin Islands, University of the West Indies, the publisher of the Trinidad Guardian, and others. Partners for content agree not only to provide content and metadata enhancement, but also to support scholarly research over time through the creation of education modules that explore particular topics and teach users how to conduct research using newspapers. Among the potential partners for research, the Poynter Institute (St. Petersburg, Florida) is nationally recognized.

None of the partners has yet signed on; all await this phase of the University of Florida's Caribbean Newspaper Imaging Project. The University of the Virgin Islands is already a content partner for other digital collections and has unofficially stated its intent to extend content with newspaper clippings and whole newspaper issues.
The publisher of the Trinidad Guardian has already granted permission for the use of clippings associated with the University of Florida's Eric Eustace Williams digital collection and has expressed interest in joining a partnership for digitization of issues remaining in the public domain.

ECONOMIC MODEL

The economic model in place at the end of Phase III will borrow heavily from the JSTOR experience and OCLC's NetLibrary. The resulting Caribbean newspaper collection will be marketed by OCLC. The exact economic model is still being worked out. Revenues raised from subscription or pay-per-view services will be turned back into preservation, maintenance, and continued development of the collection and its delivery technology. Portions of these revenues will pay royalties to publishers and contributing institutions on a percentage-use basis, and contributing institutions may use their own resources free of charge or at prorated subscription fees.

COSTS

Costs estimated to date, on the high side:

Equipment
  * Olive Server License ................................ $45,000
  * Server .............................................. $13,000
  * Microfilm Scanner (high speed) ...................... $40,000
  * Computer workstation for scanner .................... $6,500
  * Computer workstation for image processing ........... $4,500
  Total Equipment ....................................... $109,000

Travel & Materials Transfer
  * Partner travel for organizational meetings .......... $xxx,xxx
  * Materials Transfer for digitization/filming ......... $xxx,xxx
  Total Travel & Materials Transfer ..................... $xxx,xxx

Services
  * Copyright Negotiation ............................... $0
    This budget request funds only pre-1923 issues, all in the public domain under both U.S. and international copyright legislation. We'll want to formulate a copyright policy (I recommend adoption of the PALMM policy) and formulate cost projection schemes for the following:
    a) Labor cost
    b) Programmer: tracking system
    c) Legal fees
    d) Copyright fees
  * Preservation Microfilming (titles not yet preserved) ... $0
    This budget does not allow for any preservation microfilming. We should perhaps plan to cost-share archiving the preservation master microfilms for the term of the award period. We'll want to provide a statement as to why we continue to regard preservation microfilming for newspapers as a necessity. We may also want to formulate preservation microfilming costs for ingest of titles into the digital collection only. Formulae would be offered to partners bringing into CNIP new titles/issues that have not yet been preservation microfilmed.
    a) Labor cost
    b) Programmer: tracking system
    c) Legal fees
    d) Copyright fees
    e) Deposit and archival storage fees
  * Microfilm conversion Services ....................... $xxx,xxx
    See separately prepared cost sheet.
  * OCLC Olive Software Services ........................ $xxx,xxx
    See separately prepared cost sheet.
  * Intelligent Geographic Tagging ...................... $
    Having prepared the cost sheet, I believe that the project can no longer, or cannot presently, include this task. We should encourage faculty to use the digital collection, to write grants for projects using the collection, etc.
    N.B. This item comes out of discussion with Su Chen Ching (CISE): programming that would recognize geographic names, using look-up systems interfaced with GNIS, and tag them as geographic places to facilitate place-name search and differentiation. Not currently part of Olive.
    Cost might be recovered by selling back to Olive.
  * Storage (Electronic Archive) ........................ $xxx,xxx
  * Sales Software ...................................... $xxx,xxx
  * Staff time
    * Named individuals (salary @ 0.X FTE) .............. $xxx,xxx
    * Benefits and other overhead ....................... $xxx,xxx
  Total Services ........................................ $xxx,xxx

Caribbean Newspaper Imaging Project Phase III
Costs Associated with Newspapers

Assumptions:

Reel Contents
  * 100 foot reels containing approximately 660 frames
  * Each frame contains 2 pages. There are approximately 1,320 pages per reel.

Vended Costs
  * $0.32 per page-image raster conversion fee, or ...... per reel @ $422.40
  * $0.45 per page Olive processing, or ................. per reel @ $594
  * Total all vended solution is $0.80 per page, or ..... per reel @ $1,056
    With a potentially lowered image conversion cost: ... per reel @ $765.60
    Potentially as low as $0.58 per page
  * In terms of total hours, this appears to be a two year vended project.

In-House Image Conversion
  * High-speed microfilm scanner ........................ $40,000
  * Computer workstation (scanner) ...................... $6,500
  * Computer workstation (processing) ................... $4,500
  * Total Hardware ...................................... $51,000
  * CD raw media and jewel cases ........................ $1,700
  * Labor: image conversion
    @ $7.25/hr x 1 hr/reel = ............................ $7.25/reel
  * Labor: post-conversion image processing (semi-automated) and redundant image archiving both to CD and, via FTP, to FCLA's off-line file store at NERDC
    @ $6.50/hr x 0.25 hr/reel = ......................... $1.63/reel
  * Total Labor per reel = .............................. $8.88/reel
  * Total partial vended cost per reel, excluding hardware, is ... $602.88
  * Unit costs range from
    In-house imaging/Vended Olive processing ............ $0.49 - $0.52
    All vended service .................................. $0.64 - $0.70
    Based on alternative funding formulas (below)
  * In terms of total hours, this appears to be a one and one-half year vended project.

Other Costs
  * Please be reminded that this sheet looks at newspaper analog to digital conversion only. The total ball-park figure for this project appears to hover either just under or just over $500,000, not including cost share and overhead. We have to decide whether we want to break the $500,000 ceiling and, if so, what the perceptions and risks of doing so are.
  * Other costs not accounted for here include:
    * Olive server software ............................. $45,000
    * Olive server hardware ............................. $13,000
    * Delivery media (PDFs + Text w/Newspaper XML mark-up) ... not yet costed
    * Archival storage media (digital master TIFF files) ..... not yet costed
    * Staff time and other overhead ..................... not yet fully costed

1) El Diario de la Marina (Havana, Cuba)
  * 719 reels = 949,080 pages
  * 1845 to 1882
  * 1899 to 1961 (previously imaged)
  * 246 from before 1923 = 324,720 pages
  * 198,282 pages in previous project images

                                   All vended   In-house imaging/Vended Olive
  Imaging Conversion                103910.40                         2184.48
  Olive Processing (new images)     146124.00                       146124.00
  Olive Processing (old images)      89226.90                        89226.90
  Total                             339261.30                       237535.38

2) Le Nouvelliste (Port-au-Prince, Haiti)
  * 1899 to 1913 and 1925 to 1985
  * 1899 to 1979 (previously imaged)
  * 145 reels = 191,400 pages
  * 26 before 1923 = 30,360 pages (all image conversion completed previously)

                                   All vended   In-house imaging/Vended Olive
  Imaging Conversion                     0                               0
  Olive Processing (old images)      13662.00                        13662.00
  Total                              13662.00                        13662.00

3) Le Matin (Port-au-Prince, Haiti)
  * No contact info online.
  * 1907 to 1981
  * 100 reels = 132,000 pages
  * 16 before 1923 = 21,120 pages

                                   All vended   In-house imaging/Vended Olive
  Imaging Conversion                  6758.40                          142.08
  Olive Processing                    9504.00                         9504.00
  Total                              16262.40                         9646.08

4) Trinidad Guardian (Port of Spain, Trinidad)
  * Contact info: http://www.guardian.co.tt/contactus.html. We have had previous contact with this publisher, and we have a personal contact outside the library with this publisher.
  * 1917 to 1999
  * 611 reels = 806,520 pages
  * 24 before 1923 = 31,680 pages

                                   All vended   In-house imaging/Vended Olive
  Imaging Conversion                 10137.60                          213.12
  Olive Processing                   14256.00                        14256.00
  Total                              24393.60                        14469.12

5) Port of Spain Gazette (Port of Spain, Trinidad)
  * No contact info online.
  * 1828 to 1956
  * 243 reels = 320,760 pages
  * 110 before 1923 = 145,200 pages

                                   All vended   In-house imaging/Vended Olive
  Imaging Conversion                 46464.00                          976.80
  Olive Processing                   65340.00                        65340.00
  Total                             111804.00                        66316.80

6) Nassau Guardian (Nassau, Bahamas)
  * Contact info: http://www.thenassauguardian.com/gendocs display.php?id=1
  * 1849 to 1997
  * 337 reels = 444,840 pages
  * 32 before 1923 = 42,240 pages

                                   All vended   In-house imaging/Vended Olive
  Imaging Conversion                 13516.80                          284.16
  Olive Processing                   19008.00                        19008.00
  Total                              32524.80                        19292.16

7) Nassau Tribune (Nassau, Bahamas)
  * Contact info (from microfilm): Ms. Eileen Dupuch Carron, Publisher, PO Box N3207, Nassau, Bahamas. Telephone: (242) 322-1986.
  * 1911 to 1916 and 1921 to 2000
  * 275 reels = 363,000 pages
  * 7 before 1923 = 9,240 pages

                                   All vended   In-house imaging/Vended Olive
  Imaging Conversion                  2956.80                           62.16
  Olive Processing                    4158.00                         4158.00
  Total                               7114.80                         4220.16

8) El Caribe (Santo Domingo, Dominican Republic)
  * Contact info: http://www.elcaribe.com.do/caribe digital/quienes somos.htm
  * 1948 to 1999
  * 450 reels = 594,000 pages
  * Seems to have stopped in 1999, after death of publisher. Relaunched in 2000.

9) Amigoe di Curacao (Curacao, Netherlands Antilles)
  * Contact info: http://www.amigoe.com/english/about us.html
  * 1961-1998
  * 130 reels = 171,600 pages

TOTALS
                                                   All vended   In-house imaging/Vended Olive
  High-speed Mfm Scanner & computer workstations         0                           51000.00
  Raw media & jewel cases                                0                            1700.00
  Diario                                            339261.30                       237535.38
  Nouvelliste                                        13662.00                        13662.00
  Matin                                              16262.40                         9646.08
  Trinidad Guardian                                  24393.60                        14469.12
  Port of Spain Gazette                             111804.00                        66316.80
  Nassau Guardian                                    32524.80                        19292.16
  Nassau Tribune                                      7114.80                         4220.16
  Total                                             543099.90                       417841.70

Alternate formula 1:
                                                   All vended   In-house imaging/Vended Olive
  High-speed Mfm Scanner & computer workstations         0                           51000.00
  Raw media & jewel cases                                0                            1700.00
  Diario                                            339261.30                       237535.38
  Nouvelliste                                        13662.00                        13662.00
  Matin                                              16262.40                         9646.08
  Port of Spain Gazette                             111804.00                        66316.80
  Nassau Guardian                                    32524.80                        19292.16
  Total                                             511591.50                       399152.42

Alternate formula 2.
                                                   All vended   In-house imaging/Vended Olive
  High-speed Mfm Scanner & computer workstations         0                           51000.00
  Raw media & jewel cases                                0                            1700.00
  Diario                                            339261.30                       237535.38
  Nouvelliste                                        13662.00                        13662.00
  Port of Spain Gazette                             111804.00                        66316.80
  Nassau Guardian                                    32524.80                        19292.16
  Total                                             495329.10                       389506.34

RESEARCH TOPICS
  * Web-Interface Performance
  * DTD Extensibility
  * Imaging
  * Distillation
  * Other topics?

2002 September -- ejk/UF

CONTEXT
Image Only Pilots
  * Australian Periodical Publications, 1840-1845
  * National Library of New Zealand. Papers Past
Image & Indexing/Tagging Pilot
  * University of Florida. Caribbean Newspaper Imaging Project
  * University of Florida. Florida Newspaper Project
Image & OCR Pilots
  * Lambrakis Press Archives
  * ProQuest. Historical Newspapers™
  * TIDEN Project : a Nordic Digital Newspaper Library
Olive Software Pilot
  * The British Library

WEB-INTERFACE PERFORMANCE
Primary Purpose:
  * Characterize the bias of individuals conducting study
Products:
  * How to use ActivePaper™ to Your Advantage
  * Integration with CONTENTdm, XPAT 5.0, other
  * Alternate deliverable images
  * Centralized service - Distributed content - Variable platforms

DTD EXTENSIBILITY
Primary Purpose:
  * Assess the XML against established newspaper uses
Products:
  * How to use ActivePaper™ to Your Advantage
  * Document the XML as a public DTD
  * Establish a maintenance authority
  * Provide for extension of the DTD
  * Automation for extended tagging
  * How to construct a style sheet
  * Integration with CONTENTdm, XPAT 5.0, other
  * Define issues per the Economic Model

IMAGING
  * Directory Structure and File Naming
  * Archival Formats
  * Optimized Imaging

IMAGING: Directory Structure and File Naming
Primary Purpose:
  * Recommended practices
Products:
  * Methods for dealing with anomalies
  * Automated name capture during imaging

IMAGING: Archival Formats
Primary Purpose:
  * Description of file formats & their characteristics for archive, distillation, and distribution
Products:
  * Preservation metadata
  * Anticipate migration
  * Schedule & fee structure for inspection & migration
  * Strategy for format migrations & emulation

IMAGING: Optimized Imaging
Primary Purpose:
  * Best practices for microfilming and digitizing (quantitative assessments)
    * Film reduction ratio
    * Evenness of illumination on film
    * Film background density
    * Quality Index & DPI/PPI
    * Skew
    * Color-space & Bit-depth
    * Image density/black & white points
    * Despeckling and Sharpening
    * Image restoration methods

IMAGING: Optimized Imaging
Environments:
  * Operating System
  * Scanning Hardware
  * Lighting and Light Filtration
  * Post-processing
  * Other?
Other Products:
  * Control target for OCR assessment
  * Revision: RLG Preservation Microfilming Guidelines

DISTILLATION
  * Document Zoning
  * Optical Character Recognition

DISTILLATION: Document Zoning
Primary Purpose:
  * Confirm assumptions re: document zoning
    * OCR has difficulty processing large letters
    * Smaller zones yield more accurate text
Products:
  * Establish reference to the ...
    * PDF (fully scaled)
    * TIFF
    * Other derivative file formats (fully scaled)

DISTILLATION: OCR
Primary Purpose:
  * Provide quantitative OCR accuracy information
Areas of Investigation:
  * Distillation Source Images
  * Language and Fonts
  * Column & Line Density
  * Relative Density/Contrast
  * Text Curvature and Other Defects

DISTILLATION: OCR
Distillation Source Images
Primary Purpose:
  * Predict accuracy contingent upon source document (printing technologies & filming standards)
Test-Set Characterization:
  * Source type (newspaper or microfilm)
  * Production date (technologies & standards used)
Additional Products:
  * Best practices
  * Accuracy : Cost - Matrix

DISTILLATION: OCR
Language and Fonts
Primary Purpose:
  * Demonstrate ability to distill languages, character sets & fonts
Test-Set Characterization:
  * Language & character set groups
  * Font face & font size groups
  * Regional variant spellings
Additional Products:
  * Olive Software Speaks Your Language
  * How Olive Software Learns Your Lingo
  * Stylized text recognition & distillation guide

DISTILLATION: OCR
Column & Line Density
Primary Purpose:
  * Demonstrate ability to distill compact text
Test-Set Characterization:
  * Pre-1900 newspapers
  * Advertisement pages
  * Pages predominantly 8 pt. type or less
  * Pages with less than 1 mm space between lines
  * Pages with characters spaced at or below � mm

DISTILLATION: OCR
Relative Density/Contrast
Primary Purpose:
  * Investigate low and uneven contrast materials
Test-Set Characterization:
  * Low contrast pages
  * Pages with low contrast zones
  * Printing, Filming, & Age/Storage Defects
Additional Products:
  * Best practices
  * Accuracy : Cost - Matrix
  * Don't forget to buy the Life Insurance

DISTILLATION: OCR
Text Curvature and Other Defects
Primary Purpose:
  * Benchmark current capability to distill curved text & other defects of printing or filming
Test-Set Characterization:
  * Curved text zones
  * Broken character zones
  * Broken line zones
  * Garbage elements (stains, etc.)
Additional Products:
  * (Additional automatic image correction processes)
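
The DISTILLATION: OCR topics call for quantitative OCR accuracy information but do not name a metric. One common choice is character accuracy: one minus the edit distance between the OCR output and a hand-keyed reference transcription, divided by the reference length. The sketch below is a minimal illustration under that assumption, not a method taken from these slides; the function names and the sample strings are hypothetical.

```python
def edit_distance(reference: str, ocr_text: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and
    substitutions needed to turn ocr_text into reference."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j characters of ocr_text.
    previous = list(range(len(ocr_text) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, ocr_char in enumerate(ocr_text, start=1):
            substitution = previous[j - 1] + (ref_char != ocr_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Share of reference characters recovered correctly, floored at 0."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    errors = edit_distance(reference, ocr_text)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical test-set item: a masthead with one misread character.
    reference = "Port of Spain Gazette"
    ocr_output = "Port of Spaln Gazette"
    print(f"character accuracy: {character_accuracy(reference, ocr_output):.3f}")
```

Scores computed this way could be grouped by the test-set characteristics listed above (source type, production date, font size, contrast) to populate an accuracy-versus-cost comparison.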