Permanent Link: http://ufdc.ufl.edu/UF00094075/00003
 Material Information
Title: CNIP 3: Grant Proposal Documents (Not finalized/funded)
Physical Description: Book
Language: English
Creator: George A. Smathers Libraries, University of Florida
Publisher: George A. Smathers Libraries, University of Florida
Place of Publication: Gainesville, Fla.
 Notes
Abstract: The Caribbean Newspaper Imaging Project was a series of demonstration projects funded by both the Andrew W. Mellon Foundation and the University of Florida Libraries. These projects occurred as two distinct phases. Phase One: Imaging and Indexing Model. Feasibility studies for imaging and indexing. The imaging study examined the efficacy of digitizing microfilms produced in advance of current preservation microfilming standards. It also examined the use of off-the-shelf microfilm-projection scanning, as well as associated costs, benefits and drawbacks. The indexing study examined indexing procedures, application of controlled terminology, and the costs associated with multi-lingual term assignments by human readers. Phase Two: OCR Gateway to Indexing. A feasibility study on the application of Optical Character Recognition (OCR). In its current state, the Project is undergoing technological renovation, that is, migration from CD-ROM to Internet delivery. At the same time, the Project is developing plans for additional content.
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UF00094075:00003

Full Text






The University of Florida's
Andrew W. Mellon Foundation Proposal

Purpose:
To review and potentially revise the direction of our Andrew W. Mellon
Foundation proposal, shifting emphasis from the "Caribbean Newspaper
Imaging Project" to the "Library of the Caribbean".
To work toward a planning grant.
Conversations with Marilyn Deegan (Oxford University's Mellon Foundation funded Forced
Migration Online) per Andrew W. Mellon Foundation technology program underpin the
proposed shift.


Summary Agenda
1. Andrew W. Mellon Foundation technology program
2. The Library of the Caribbean : topical and technical conceptualization
3. Active partnerships
4. Passive partnerships
5. Economic model
6. Proposed next steps


Detailed Agenda & Proposed Re-Direction:
1. Andrew W. Mellon Foundation technology program
1.1. Elaboration of conversations with Marilyn Deegan (Forced Migration Online)
1.2. Proposed outcome: Erich will contact Don Waters (Andrew W. Mellon
Foundation)
2. The Library of the Caribbean : topical and technical conceptualization
2.1. Topical Conceptualization
2.1.1. Definition of the "Caribbean"
2.1.2. NEH and Southeastern Humanities Center (UVA) (limits of)
2.2. The place of the Caribbean Newspaper Imaging Project in the Library of the
Caribbean. (Discussion of bibliographic formats)
2.3. Technical Conceptualization
2.3.1. Linking systems (XPAT™ & ActivePaper™)
2.3.2. (Linking decentralized libraries : a harvesting project for IMLS funding)
Digital Libraries containing content not expected to be contributed
directly into the Library of the Caribbean:
* (Southeastern Humanities Center)
* University of Miami
* University of Texas
* World Bank Group (Latin America & the Caribbean)
2.3.3. (Traditional Preservation & Preservation Microfilming)


Mellon Project










3. Active partnerships
3.1. Extend the PALMM model outward into the Caribbean
3.1.1. Review of the model: centralized technology with distributed content
providers
* Supporting letters (potential): Paul Conway (UNC-CH); Liz Bishoff
(Colorado Digital Library), etc.
3.1.2. Coordination by the University of Florida under the auspices of PALMM
* Florida Center for Library Automation (technology partner)
3.1.3. Review of extant PALMM collections
* Caribbean Newspaper Imaging Project
* Florida Heritage Collection
* French Revolution Française (Haitian component)
* Eric Eustace Williams
* Linking Florida's Natural Heritage
* Reclaiming the Everglades
* U.S. V.I. History and Culture
3.1.4. Potential partners and their roles
Libraries that might contribute content; charter members of the Library of
the Caribbean:
* (Archivo Nacional de Cuba)
* Caribbean Journal of Science (http://www.caribisci.org/)
* Journal of Caribbean Archaeology (http://www.flmnh.ufl.edu/JCA/)
* Pontificia Universidad Católica Madre y Maestra (Rep. Dominicana)
* Florida International University
* University of the Virgin Islands
* University of the West Indies (Cave Hill, Barbados)
* University of the West Indies (Mona, Jamaica)
* University of the West Indies (St. Augustine, Trinidad)
3.2. Internal partners (University of Florida)
for content and secondary education and curriculum grants, potentially including:
* Caribbean Agricultural Research and Development Institute (CARDI)
* Caribbean Students Association
* Center for African Studies (re: Diaspora studies)
* Center for Latin American Studies
* College of Natural Resources and Environment
* Ethnoecology Society
* Florida Institute Of Paleoenvironmental Research (FLIPER)
* Florida Museum of Natural History
* Institute for Food and Agricultural Science (IFAS)
* Hispanic Student Association
* Land Use and Environmental Change Institute (LUECI)
* Office of Academic Technology
* Preservation Institute: Caribbean (College of Architecture)
* Tropical Research and Education Center
4. Passive partnerships
4.1. Rationale for passive partnerships (digital library programs indirectly related)
4.2. Identify potential partners, e.g.,
* Forced Migration Online (Oxford University, funded by the Andrew W. Mellon
Foundation)
* Underground Railroad Freedom Center (Cincinnati, funded by Federated
Dept. Stores)


5. Economic model
* De-emphasize the economic (sales) model for the moment
* Revisit the economic model as part of a planning grant
6. Proposed next steps
6.1. Outline "Library of the Caribbean" content and growth plan
6.2. Draft and float partnership documents
6.3. Convene a meeting of partners (with funding from DSR and IFAS) and confirm
roles of the partners
* From Florida International University (Vicki Silvera (Special Collections), and
Catherine Marsicek (Latin American & Caribbean Collection))
* From Pontificia Universidad Católica Madre y Maestra, República Dominicana:
Dulce María Núñez de Taveras (University Librarian)
* From UVI: Jennifer Jackson (University Librarian and acting Provost) and/or
Judith Rogers (UVI's USVI History & Culture Project),
* From UWI, Trinidad: Margaret Rouse Jones (Campus Librarian)
* From UWI, Barbados: Elizabeth Watson (Campus Librarian; ACURIL
president)
6.4. Draft project narrative
6.5. Establish equipment and services needs and prepare budget.


CARIBBEAN NEWSPAPER IMAGING PROJECT
Phase III




The University of Florida proposes a third phase of its Andrew W. Mellon Foundation
funded Caribbean Newspaper Imaging Project, originally granted as a scholarly
technology project by Richard Ekman, the Foundation's former director. Phase III
will establish the collection as a viable, Internet accessible service based on an
economic model similar to that of JSTOR, using Olive Software. The projected
funding request for this two-year project is $500,000 from the Foundation with a
minimum of approximately $50,000 contributed as overhead to the project by the
University and its partners.

COLLECTIONS
Phase III expands the number and run of available titles, making the sum
economically viable as a collection. Previously imaged titles, the Diario de la Marina
and Le Nouvelliste, will be repurposed for use with Olive Software, and the
available run of these titles will be expanded. New titles such as Le Matin
(Port-au-Prince, Haiti), Trinidad Guardian (Port of Spain, Trinidad), and Nassau
Guardian and Nassau Tribune (Nassau, Bahamas) will be added to extend the collection's
geographic representation. Priority for conversion will be given to periods of
significance in the life of the country of origin or in coverage of regional issues such
as independence and economic development. Other titles will be added over time,
as the economic model and other funding generate the fiscal resources to expand.

TECHNOLOGY
Phases I and II investigated the economics of digitization, indexing, and optical
character recognition (OCR), as well as distribution of a product on CD-ROM.
These phases developed efficient, low-cost digitization methods but reported high
indexing costs, and OCR with then-available technologies was an inadequate and
inaccurate substitute for indexing. By the end of the Phase II project, Internet
distribution was in greater demand than the CD-ROM product sought when Phase I was
originally outlined. Following Phase II, the University of Florida migrated the product
to an Internet base and began repurposing images for that medium.

Phase III utilizes Olive Software. Olive Software, marketed in North America by
OCLC, is recognized as the most advanced image display, text recognition, and
search system for newspapers. Its highly accurate text recognition processes
obviate the need to index. And, its search system is capable of differentiating types
of content, e.g., titles, by-lines, and article content.

The University of Florida is one of OCLC's Digitization and Preservation Cooperative
partners. And, University staff sit on the Cooperative's Historic Newspaper
Preservation and Digitization Group. The Group's research agenda was drafted by
the University's representative, Erich Kesse, who will serve as one of the
co-principal investigators for this Andrew W. Mellon Foundation funded project.

Equal Opportunity / Affirmative Action Institution

PARTNERS
The University of Florida continues to serve as the lead institution for this project, but
the project will now be open to partnership with other institutions and content
providers. OCLC's Historic Newspaper Digitization Service joins the University to
maintain the Olive server, and it facilitates an OAI-compliant archive of master
image and text files. Files will be licensed to OCLC, which will broker them as part
of a broader collection now under construction. Long-term capital and operating
costs are built into the continuing budgets of the University.

Partners for content include the other public universities of the state of Florida.
Partnership for content has been discussed with the University of the Virgin Islands,
University of the West Indies, the publisher of the Trinidad Guardian, and others.
Partners for content agree, not only to provide content and metadata enhancement,
but to support over time scholarly research through the creation of education
modules that explore particular topics and teach users how to conduct research
using newspapers. Among the potential partners for research, the Poynter Institute
(St. Petersburg, Florida) is nationally recognized.

None of the partners, as yet, have signed on; all await this Phase of the University of
Florida's Caribbean Newspaper Imaging Project. The University of the Virgin
Islands is already a content partner for other digital collections and has unofficially
stated intent to extend content with newspaper clippings and whole newspaper
issues. The publisher of the Trinidad Guardian has already granted permission for
the use of clippings associated with the University of Florida's Eric Eustace Williams
digital collection and has expressed interest in joining a partnership for digitization of
issues remaining in the public domain.

ECONOMIC MODEL
The economic model in place at the end of Phase III will borrow heavily from the
JSTOR experience and OCLC's NetLibrary. The resulting Caribbean newspaper
collection will be marketed by OCLC. The exact economic model is still being
worked out.

Revenues raised from subscription or pay-per-view services will be turned back into
preservation and maintenance and continued development of the collection and its
delivery technology. Portions of these revenues will pay royalties to publishers and
contributing institutions on a percentage-use basis. And, contributing institutions
may use their own resources free of charge or at prorated subscription fees.
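
As a rough illustration of what a percentage-use royalty split could look like, the sketch below divides a royalty pool among contributors in proportion to recorded uses. The function name, the pool amount, and the contributor labels are hypothetical; the proposal states only that royalties would be paid on a percentage-use basis and that the exact model is still being worked out.

```python
def split_royalties(pool_cents: int, uses: dict[str, int]) -> dict[str, int]:
    """Divide a royalty pool (in cents) in proportion to each contributor's uses.

    Amounts are rounded down to whole cents, so a few cents may remain
    unallocated; how to treat that remainder is left open here.
    """
    total_uses = sum(uses.values())
    if total_uses == 0:
        # No recorded use in the period: nothing to distribute.
        return {name: 0 for name in uses}
    return {name: pool_cents * count // total_uses for name, count in uses.items()}


# Hypothetical example: a $100.00 pool and two contributors.
shares = split_royalties(10_000, {"Publisher A": 3, "Institution B": 1})
for name, cents in shares.items():
    print(f"{name}: ${cents / 100:.2f}")
```

Working in integer cents avoids floating-point rounding in the payout figures.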





COSTS

Costs estimated to date, on the high side:

Equipment
* Olive Server License ....................................... $45,000
* Server ..................................................... $13,000
* Microfilm Scanner (high speed) ............................. $40,000
* Computer workstation for scanner ............................ $6,500
* Computer workstation for image processing ................... $4,500
Total Equipment ............................................. $109,000

Travel & Materials Transfer
* Partner travel for organizational meetings ............. $ xxx,xxx
* Materials Transfer for digitization/filming................ $ xxx,xxx
Total Travel & Materials Transfer .............................. $ xxx,xxx

Services
* Copyright Negotiation ........................................... $0
This budget request funds only pre-1923 issues, all in the
public domain both under U.S. and international copyright
legislation.
We'll want to formulate a copyright policy - I recommend
adoption of the PALMM policy - and formulate cost projection
schemes for the following:
a) Labor cost
b) Programmer: tracking system
c) Legal fees
d) Copyright fees
* Preservation Microfilming (titles not yet preserved) ...........$ 0
This budget does not allow for any preservation microfilming.
We might plan to cost-share archiving the preservation
master microfilms for the term of the award period.
We'll want to provide a statement as to why we continue to
regard preservation microfilming for newspapers a necessity.
And, we may want to formulate preservation microfilming
costs for ingest of titles into the digital collection only.
Formulae would be offered to partners bringing into CNIP new
and as yet not preservation microfilmed titles/issues.
a) Labor cost
b) Programmer: tracking system
c) Legal fees
d) Copyright fees
e) Deposit and archival storage fees
* Microfilm conversion Services ............................... $ xxx,xxx
See separately prepared cost sheet.
* OCLC Olive Software Services.............................. $ xxx,xxx
See separately prepared cost sheet.
* Intelligent Geographic Tagging .................................. $
Having prepared the cost sheet, I believe that the project can no longer, or cannot
presently, include this task. We should encourage faculty to use the digital
collection, to write grants for projects using the collection, etc.
N.B. This task comes out of discussion with Su Chen Ching (CISE):
programming that would recognize geographic names, involving
look-up systems interfaced with GNIS, and that tags names as
geographic places to facilitate place name search and
differentiation.
Not currently part of Olive. Cost might be recovered by selling back
to Olive.
* Storage (Electronic Archive) ............................. $ xxx,xxx
* Sales Software ........................................... $ xxx,xxx
* Staff time
* Named individuals (salary @ 0.X FTE) ..................... $ xxx,xxx
* Benefits and other overhead .............................. $ xxx,xxx
Total Services ............................................. $ xxx,xxx
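
The geographic tagging task noted under Services (recognizing place names through a look-up system and tagging them) can be pictured with a very small sketch. Everything specific here is an assumption for illustration: the in-memory GAZETTEER dictionary stands in for a real GNIS-backed look-up, and the <place> tag with its country attribute is a hypothetical mark-up choice, not part of Olive or of any schema named in this proposal.

```python
import re

# Hypothetical stand-in for a gazetteer look-up (a real system would query GNIS
# or a similar names database rather than a hard-coded dictionary).
GAZETTEER = {
    "Port of Spain": "Trinidad",
    "Port-au-Prince": "Haiti",
    "Nassau": "Bahamas",
    "Havana": "Cuba",
}

# Try longer names first so multi-word names are matched as a whole.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def tag_places(text: str) -> str:
    """Wrap each known place name in a (hypothetical) <place> element."""
    def _wrap(match: re.Match) -> str:
        name = match.group(1)
        return f'<place country="{GAZETTEER[name]}">{name}</place>'
    return _PATTERN.sub(_wrap, text)


print(tag_places("The steamer left Nassau for Port of Spain."))
```

A plain dictionary match like this cannot tell a place from a person or ship of the same name, which is presumably where the "intelligent" part of the task would come in.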





































Caribbean Newspaper Imaging Project
Phase III
Costs Associated with Newspapers

Assumptions:

Reel Contents
* 100 foot reels containing approximately 660 frames
* Each frame contains 2 pages. There are approximately 1,320 pages per reel.

Vended Costs
* $0.32 per page-image raster conversion fee or ........ per reel @ $422.40
* $0.45 per page Olive processing or ................... per reel @ $594
* Total all vended solution is $0.80 per page or ....... per reel @ $1,056
With a potentially lowered image conversion cost: ...... per reel @ $765.60
Potentially as low as $0.58 per page
* In terms of total hours, this appears to be a two year vended project.
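
The per-reel figures above follow from the reel assumptions (660 frames, 2 pages per frame). A minimal sketch of that arithmetic is shown below; the function and constant names are illustrative only. Note that the two itemized rates sum to $0.77 per page, while the sheet's all-vended total uses $0.80; the sketch reproduces the figures as given and does not try to reconcile them.

```python
from decimal import Decimal

FRAMES_PER_REEL = 660
PAGES_PER_FRAME = 2
PAGES_PER_REEL = FRAMES_PER_REEL * PAGES_PER_FRAME  # 1,320 pages


def per_reel(rate_per_page: str) -> Decimal:
    """Convert a per-page rate (in dollars) to a per-reel cost."""
    return (Decimal(rate_per_page) * PAGES_PER_REEL).quantize(Decimal("0.01"))


# Per-page rates as stated on the cost sheet.
rates = {
    "Raster conversion": "0.32",
    "Olive processing": "0.45",
    "All vended (as stated)": "0.80",
    "All vended (lowered conversion cost)": "0.58",
}

for label, rate in rates.items():
    print(f"{label}: ${per_reel(rate)} per reel")
```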

In-House Image Conversion
* High-speed microfilm scanner ............................. @ $40,000
* Computer workstation (scanner) ........................... @ $6,500
* Computer workstation (processing) ........................ @ $4,500
* Total Hardware ........................................... @ $51,000
* CD raw media and jewel cases ............................... $1,700
* Labor: ..................... @ $7.25/hr x 1 hr/reel = $7.25/reel
Image conversion
* Labor: .................. @ $6.50/hr x 0.25 hr/reel = $1.63/reel
Post-conversion image processing (semi-automated) and
redundant image archiving both to CD and, via FTP, to FCLA's off-line file store at NERDC.
* Total Labor per reel = ..................................... $8.88/reel
* Total partial vended cost excluding hardware is ............ $602.88
* Unit costs range from
In-house imaging/Vended Olive processing ............... $0.49 - $0.52
All vended service ..................................... $0.64 - $0.70
Based on alternative funding formulas (below)
* In terms of total hours, this appears to be a one and one-half year vended project.
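
The in-house figures can be checked the same way. The sketch below recomputes labor per reel, the partial vended cost per reel (in-house labor plus vended Olive processing), and the hardware total from the numbers listed above. Variable names are illustrative, and half-up rounding to the cent is an assumption chosen because it reproduces the sheet's $1.63.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
PAGES_PER_REEL = 1320

# Labor, as listed on the sheet.
conversion_labor = Decimal("7.25") * Decimal("1")          # $7.25/hr x 1 hr/reel
processing_labor = (Decimal("6.50") * Decimal("0.25")).quantize(
    CENT, rounding=ROUND_HALF_UP                           # 1.625 rounds to 1.63
)
labor_per_reel = conversion_labor + processing_labor

# Vended Olive processing at $0.45 per page.
olive_per_reel = Decimal("0.45") * PAGES_PER_REEL

partial_vended_per_reel = labor_per_reel + olive_per_reel

# Scanner plus two workstations.
hardware_total = Decimal("40000") + Decimal("6500") + Decimal("4500")

print(f"Labor per reel:          ${labor_per_reel}")
print(f"Partial vended per reel: ${partial_vended_per_reel}")
print(f"Hardware total:          ${hardware_total}")
```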

Other Costs
* Please be reminded that this sheet looks at newspaper analog to digital conversion only.
The total ball-park figure for this project appears to hover either just under or just over
$500,000 not including cost share and overhead.
We have to decide if we want to break the $500,000 ceiling. And, as we do so, what the
perceptions and risks of doing so are.
* Other costs not accounted here include:
* Olive server software ................................................ @ $45,000
* Olive server hardware ................................................ @ $13,000
* Delivery media (PDFs + Text w/Newspaper XML mark-up) ................. not yet costed
* Archival storage media (digital master TIFF files) ................... not yet costed
* Staff time and other overhead ........................................ not yet fully costed








1) El Diario de la Marina (Havana, Cuba)
* 719 reels = 949,080 pages
* 1845 to 1882
* 1899 to 1961 (previously imaged)
* 246 from before 1923 = 324,720 pages
* 198,282 pages in previous project images

                                   All vended   In-house imaging
Imaging Conversion                  103910.40            2184.48
Olive Processing (new images)       146124.00          146124.00
Olive Processing (old images)        89226.90           89226.90
Total                               339261.30          237535.38

2) Le Nouvelliste (Port-au-Prince, Haiti)
* 1899 to 1913 and 1925 to 1985,
* 1899 to 1979 (previously imaged)
* 145 reels = 191,400 pages
* 26 before 1923 = 30,360 pages (all image conversion completed previously)

                                   All vended   In-house imaging
Imaging Conversion                       0.00               0.00
Olive Processing (old images)        13662.00           13662.00
Total                                13662.00           13662.00

3) Le Matin (Port-au-Prince, Haiti)
* No contact info online.
* 1907 to 1981
* 100 reels = 132,000 pages
* 16 before 1923 = 21,120 pages

                                   All vended   In-house imaging
Imaging Conversion                    6758.40             142.08
Olive Processing                      9504.00            9504.00
Total                                16262.40            9646.08

4) Trinidad Guardian (Port of Spain, Trinidad)
* Contact info: http://www.guardian.co.tt/contactus.html. We have previous contact with
this publisher, as well as a personal contact outside the library to this publisher
* 1917 to 1999
* 611 reels = 806,520 pages
* 24 before 1923 = 31,680 pages

                                   All vended   In-house imaging
Imaging Conversion                   10137.60             213.12
Olive Processing                     14256.00           14256.00
Total                                24393.60           14469.12

5) Port of Spain Gazette (Port of Spain, Trinidad)
* No contact info. online.
* 1828 to 1956
* 243 reels = 320,760 pages
* 110 before 1923 = 145,200 pages

                                   All vended   In-house imaging
Imaging Conversion                   46464.00             976.80
Olive Processing                     65340.00           65340.00
Total                               111804.00           66316.80








6) Nassau Guardian (Nassau, Bahamas)
* Contact info: http://www.thenassauguardian.com/gendocs_display.php?id=1
* 1849 to 1997
* 337 reels = 444,840 pages
* 32 before 1923 = 42,240 pages

                                   All vended   In-house imaging
Imaging Conversion                   13516.80             284.16
Olive Processing                     19008.00           19008.00
Total                                32524.80           19292.16

7) Nassau Tribune (Nassau, Bahamas)
* Contact info. (from microfilm): Ms. Eileen Dupuch Carron, Publisher, PO Box N3207,
Nassau, Bahamas. Telephone: (242) 322-1986.
* 1911 to 1916 and 1921 to 2000
* 275 reels = 363,000 pages
* 7 before 1923 = 9,240 pages

                                   All vended   In-house imaging
Imaging Conversion                    2956.80              62.16
Olive Processing                      4158.00            4158.00
Total                                 7114.80            4220.16

8) El Caribe (Santo Domingo, Dominican Republic)
* Contact info: http://www.elcaribe.com.do/caribe_digital/quienes_somos.htm
* 1948 to 1999
* 450 reels = 594,000 pages
* Seems to have stopped in 1999, after death of publisher. Relaunched in 2000.

9) Amigoe di Curacao (Curacao, Netherlands Antilles)
* Contact info: http://www.amigoe.com/english/about_us.html
* 1961-1998
* 130 reels = 171,600 pages
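
The per-title cost figures above appear to follow one pattern: the first figure in each pair prices image conversion at the vended $0.32 per page, the second prices it at the in-house labor rate of $8.88 per reel, and both add Olive processing at $0.45 per page for new and previously imaged pages alike. The sketch below reproduces that pattern under those assumptions; the function name and argument names are hypothetical, and the page and reel counts are taken as stated in the title entries.

```python
from decimal import Decimal

IMAGING_PER_PAGE = Decimal("0.32")   # vended raster conversion
OLIVE_PER_PAGE = Decimal("0.45")     # vended Olive processing
LABOR_PER_REEL = Decimal("8.88")     # in-house conversion labor


def title_costs(new_reels: int, new_pages: int, old_pages: int = 0):
    """Return (all-vended cost, in-house-imaging cost) for one title.

    new_reels / new_pages: pre-1923 material still to be imaged.
    old_pages: pages already imaged that need only Olive processing.
    """
    olive = OLIVE_PER_PAGE * (new_pages + old_pages)
    all_vended = IMAGING_PER_PAGE * new_pages + olive
    in_house = LABOR_PER_REEL * new_reels + olive
    return all_vended, in_house


# Counts as stated in the title entries.
print("Diario:", title_costs(246, 324_720, old_pages=198_282))
print("Le Nouvelliste:", title_costs(0, 0, old_pages=30_360))
print("Le Matin:", title_costs(16, 21_120))
print("Port of Spain Gazette:", title_costs(110, 145_200))
```

Hardware and raw media are not included here; the sheet carries them as separate lines in the totals.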









TOTALS

ALL
                                   All vended   In-house imaging
High-speed Mfm Scanner
& computer workstations                     0           51000.00
Raw media & jewel cases                     0            1700.00
Diario                              339261.30          237535.38
Nouvelliste                          13662.00           13662.00
Matin                                16262.40            9646.08
Trinidad Guardian                    24393.60           14469.12
Port of Spain Gazette               111804.00           66316.80
Nassau Guardian                      32524.80           19292.16
Nassau Tribune                        7114.80            4220.16
Total                               543099.90          417841.70

Alternate formula 1:
                                   All vended   In-house imaging
High-speed Mfm Scanner
& computer workstations                     0           51000.00
Raw media & jewel cases                     0            1700.00
Diario                              339261.30          237535.38
Nouvelliste                          13662.00           13662.00
Matin                                16262.40            9646.08
Port of Spain Gazette               111804.00           66316.80
Nassau Guardian                      32524.80           19292.16
Total                               511591.50          399152.42

Alternate formula 2:
                                   All vended   In-house imaging
High-speed Mfm Scanner
& computer workstations                     0           51000.00
Raw media & jewel cases                     0            1700.00
Diario                              339261.30          237535.38
Nouvelliste                          13662.00           13662.00
Port of Spain Gazette               111804.00           66316.80
Nassau Guardian                      32524.80           19292.16
Total                               495329.10          389506.34
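
For anyone re-checking the three funding formulas, the sketch below sums the per-title rows for each formula. The dictionary layout and function name are illustrative. Summed this way, the in-house column matches the printed totals, while the all-vended column comes out 1,923.00 higher than each printed total (545,022.90, 513,514.50, and 497,252.10); the source of that difference is not evident from the sheet, so the printed totals should be confirmed before reuse.

```python
from decimal import Decimal as D

HARDWARE = D("51000.00")   # scanner and workstations (in-house column only)
RAW_MEDIA = D("1700.00")   # CD media and jewel cases (in-house column only)

# Per-title costs as (all vended, in-house imaging).
TITLES = {
    "Diario": (D("339261.30"), D("237535.38")),
    "Nouvelliste": (D("13662.00"), D("13662.00")),
    "Matin": (D("16262.40"), D("9646.08")),
    "Trinidad Guardian": (D("24393.60"), D("14469.12")),
    "Port of Spain Gazette": (D("111804.00"), D("66316.80")),
    "Nassau Guardian": (D("32524.80"), D("19292.16")),
    "Nassau Tribune": (D("7114.80"), D("4220.16")),
}

FORMULAS = {
    "All titles": list(TITLES),
    "Alternate formula 1": ["Diario", "Nouvelliste", "Matin",
                            "Port of Spain Gazette", "Nassau Guardian"],
    "Alternate formula 2": ["Diario", "Nouvelliste",
                            "Port of Spain Gazette", "Nassau Guardian"],
}


def formula_totals(names):
    """Sum both cost columns for the listed titles."""
    vended = sum((TITLES[n][0] for n in names), D("0"))
    in_house = HARDWARE + RAW_MEDIA + sum((TITLES[n][1] for n in names), D("0"))
    return vended, in_house


for label, names in FORMULAS.items():
    vended, in_house = formula_totals(names)
    print(f"{label}: all vended {vended}, in-house imaging {in_house}")
```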







RESEARCH TOPICS


* Web-Interface Performance

* DTD Extensibility

* Imaging

* Distillation


* Other topics?


2002 September -- ejk/UF








CONTEXT


Image Only Pilots
* Australian Periodical Publications, 1840-1845
* National Library of New Zealand. Papers Past
Image & Indexing/Tagging Pilot
* University of Florida. Caribbean Newspaper Imaging Project
* University of Florida. Florida Newspaper Project
Image & OCR Pilots
* Lambrakis Press Archives
* ProQuest. Historical Newspapers™
* TIDEN Project : a Nordic Digital Newspaper Library
Olive Software Pilot
* The British Library










WEB-INTERFACE PERFORMANCE



Primary Purpose:
* Characterize the bias of individuals conducting study


Products:
* How to use ActivePaper™ to Your Advantage
* Integration with CONTENTdm, XPAT 5.0, other
* Alternate deliverable images
* Centralized service - Distributed content - Variable platforms










DTD EXTENSIBILITY



Primary Purpose:
* Assess the XML against established newspaper uses


Products:
* How to use ActivePaper™ to Your Advantage
* Document the XML as a public DTD
* Establish a maintenance authority
* Provide for extension of the DTD
* Automation for extended tagging
* How to construct a style sheet
* Integration with CONTENTdm, XPAT 5.0, other
* Define issues per the Economic Model










IMAGING



* Directory Structure and File Naming
* Archival Formats
* Optimized Imaging










IMAGING: Directory Structure and File Naming



Primary Purpose:
* Recommended practices


Products:
* Methods for dealing with anomalies
* Automated name capture during imaging










IMAGING: Archival Formats


Primary Purpose:
* Description of file formats & their characteristics
for archive, distillation, and distribution


Products:
* Preservation metadata
* Anticipate migration
* Schedule & fee structure for inspection
& migration
* Strategy for format migrations & emulation










IMAGING: Optimized Imaging


Primary Purpose:
* Best practices for microfilming and digitizing
(quantitative assessments)
* Film reduction ratio
* Evenness of illumination on film
* Film background density
* Quality Index & DPI/PPI
* Skew
* Color-space & Bit-depth
* Image density/black & white points
* Despeckling and Sharpening
* Image restoration methods










IMAGING: Optimized Imaging


Environments:
* Operating System
* Scanning Hardware
* Lighting and Light Filtration
* Post-processing

* Other?


Other Products:
* Control target for OCR assessment
* Revision: RLG Preservation Microfilming Guidelines










DISTILLATION


* Document Zoning
* Optical Character Recognition









DISTILLATION: Document Zoning


Primary Purpose:


* Confirm assumptions re: document zoning
* OCR has difficulty processing large letters
* Smaller zones yield more accurate text


Products:
* Establish reference to the ...
* PDF (fully scaled)
* TIFF
* Other derivative file formats (fully scaled)










DISTILLATION: OCR


Primary Purpose:


* Provide quantitative OCR accuracy information


Areas of Investigation:
* Distillation Source Images
* Language and Fonts
* Column & Line Density
* Relative Density/Contrast
* Text Curvature and Other Defects
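
One common way to put a number on OCR accuracy is character accuracy measured against a hand-corrected reference transcription, using edit distance. The sketch below is only an illustration of that idea: the choice of metric and the function names are assumptions, since the slides do not say how accuracy would be measured.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_text: str) -> float:
    """Share of reference characters recovered, floored at 0.0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_text)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical example: one misread character in an eight-character word.
print(character_accuracy("Guardian", "Guardlan"))
```

Scores like this could be tabulated per test-set group (source type, language, type size, contrast) to compare conditions.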










DISTILLATION: OCR
Distillation Source Images


Primary Purpose:
* Predict accuracy contingent upon source document
(printing technologies & filming standards)

Test-Set Characterization:
* Source type (newspaper or microfilm)
* Production date (technologies & standards used)

Additional Products:
* Best practices
* Accuracy : Cost - Matrix










DISTILLATION: OCR
Language and Fonts


Primary Purpose:
* Demonstrate ability to distill languages, character sets
& fonts

Test-Set Characterization:
* Language & character set groups
* Font face & font size groups
* Regional variant spellings

Additional Products:
* Olive Software Speaks Your Language
* How Olive Software Learns Your Lingo
* Stylized text recognition & distillation guide










DISTILLATION: OCR
Column & Line Density


Primary Purpose:
* Demonstrate ability to distill compact text

Test-Set Characterization:
* Pre-1900 newspapers
* Advertisement pages
* Pages predominantly 8 pt. type or less
* Pages with less than 1 mm space between lines
* Pages with characters spaced at or below � mm










DISTILLATION: OCR
Relative Density/Contrast


Primary Purpose:
* Investigate low and uneven contrast materials

Test-Set Characterization:
* Low contrast pages
* Pages with low contrast zones
* Printing, Filming, & Age/Storage Defects

Additional Products:
* Best practices
* Accuracy : Cost - Matrix
* Don't forget to buy the Life Insurance










DISTILLATION: OCR
Text Curvature and Other Defects


Primary Purpose:
* Benchmark current capability to distill curved text
& other defects of printing or filming

Test-Set Characterization:
* Curved text zones
* Broken character zones
* Broken line zones
* Garbage elements (stains, etc.)

Additional Products:
* (Additional automatic image correction processes)





