Saving our scholarship : retrospective dissertation scanning project at George A. Smathers Library


Material Information

Title:
Saving our scholarship : retrospective dissertation scanning project at George A. Smathers Library
Physical Description:
Poster
Creator:
Parker, Robert J.
Shorey, Christy
Publisher:
George A. Smathers Libraries, University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Notes

General Note:
Poster presented at American Library Association Annual Conference & Exhibition, June 23-28, 2011 in New Orleans, LA.

Record Information

Source Institution:
University of Florida Institutional Repository
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution and holding location.
System ID:
AA00002872:00001




Full Text


Saving Our Scholarship:
Retrospective Dissertation Scanning Project at George A. Smathers Library

Presented by Robert J. Parker and Christy Shorey


Brief Summary


We are retrospectively digitizing PhD dissertations from print copies. We digitize master's
theses only upon patron or author request.
We started scanning internally, in our Digital Library Center, but moved to Internet Archive due
to the scope of the project (~8,000 titles), time frame, and funding opportunity.
We started with two pilot projects:
1. The Electrical Engineering department donated 300+ unbound dissertations, which we
used as a pilot for the overall project to test workflows, etc.
2. We subsequently conducted a pilot project with Internet Archive. Administration gave us
$7,000 to test their capabilities and output before committing to a larger contract.
We built workflows conceptually, working with other departments, to figure out policies and
procedures. From there we built our tracking database, using title information from the catalog and
alumni contact information contributed by the Alumni Association. Once the titles and contact
information were matched, we performed time trials, and refined the workflow.
We enlisted a full-time staff member to build a tracking database, while a summer intern tested
the procedures. Once the project went to full swing, the full-time staff member supervised student
assistants and contributed time to the project. We started with one student assistant, and grew to
three student assistants (for a total of 58 hours/week) at the project's largest point.
We seek permission from all our authors. We attempt to contact them via e-mail, snail mail, and
postcards. The permission form is available on our website, and we had an article in a UF alumni
publication advertising our project. Our initial focus to gain permission is on the "low-hanging fruit";
issues of authors with no contact information will be addressed later in the project.
In November 2010 we added dissertations housed in our Health Science Center library (772
titles) to the project. The next group we will consider is the 3,179 titles that were previously
digitized (by UMI) from microfilm.
We have encountered some bumps along the way, but the project is continuing forward.



How Much Does It Cost?


In House (Digital Library Center)

Pre-scanning Steps:     $ 14.93
Scanning:               $ 12.14
Post-scanning Steps:    $  2.96
Total Per Volume:       $ 30.02


Digital Library Center:
Unique number assigned in DLC database
High speed, sheet feed scanners, 300 DPI bitonal
PDF version generated automatically
Quality control (de-skew, brighten, TOC, etc.)
OCR (Prime Recognition)


Scanning with Vendor (Internet Archive)

Pre-scanning Steps:     $ 10.94
Scanning:               $ 19.81
Post-scanning Steps:    $  2.44
Total Per Volume:       $ 33.18


* Vendor figures rely mostly on student staff for pre- and post-scanning
procedures; this was a shift from full-time staff for some steps when
scanning was done in-house.
* Average dissertation: 185 pages; Average shipment to IA: 50 books
* "Scanning" with vendor includes shipping costs and a rate of 10¢/page
and $2/foldout
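
As a rough check on the per-volume figures, the sketch below sums the three steps for each route and compares the result with the printed total. It also estimates the page charge for an average 185-page dissertation. Reading the vendor rate as 10 cents per page is an assumption (the currency unit did not survive in the text), and the function names are illustrative only.

```python
from decimal import Decimal

# Per-volume figures as printed on the poster (US dollars).
COSTS = {
    "In House (Digital Library Center)": {
        "pre": Decimal("14.93"),
        "scan": Decimal("12.14"),
        "post": Decimal("2.96"),
        "stated_total": Decimal("30.02"),
    },
    "Scanning with Vendor (Internet Archive)": {
        "pre": Decimal("10.94"),
        "scan": Decimal("19.81"),
        "post": Decimal("2.44"),
        "stated_total": Decimal("33.18"),
    },
}


def summed_total(row):
    """Add the pre-scanning, scanning, and post-scanning figures."""
    return row["pre"] + row["scan"] + row["post"]


def page_charge(pages=185, rate=Decimal("0.10")):
    """Vendor page charge for one volume; the 10-cent rate is an assumption."""
    return pages * rate


for name, row in COSTS.items():
    total = summed_total(row)
    gap = total - row["stated_total"]
    print(f"{name}: steps sum to ${total}, poster states ${row['stated_total']} (gap ${gap})")

print(f"Estimated page charge for 185 pages: ${page_charge()}")
```

In both columns the steps sum to one cent more than the printed total, which would be consistent with the printed step figures having been rounded individually. Under the assumed rate, the page charge accounts for most of the vendor "Scanning" figure, with the remainder presumably covering shipping and foldouts.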


Internet Archive:
Unique identifier assigned by Internet Archive
High speed, sheet feed scanners
DPI varies by size of print volume (300 minimum)
Bitonal / gray scale / color
OCR: ABBYY FineReader 8.0
Provides a variety of versions of each dissertation
Conducts quality control on-site; UF staff also completes quality
control for completeness



Tracking Database

When we looked at all the steps an individual dissertation would go through in our projected
workflow (starting with our attempts to contact the authors to gather permission, through the
digitization process, and finishing with a final thank you sent to the authors for their participation and
to provide them a URL), we determined we needed a way to accurately track where a title was in
the process. In order to accommodate this, and to consolidate relevant information on each
dissertation, we built a tracking database in Microsoft Access.


One-time costs for building tracking database: $ 6,560.64
Annual cost to maintain tracking database: $ 250


Statistics

April 2008 to June 1, 2011:

Permission Forms sent: 6,793
Permission Forms received: 3,949
  58.13% return rate

Dissertations Scanned
Scanned in-house: 303
  9.4% of total scanned
Scanned via Internet Archive: 2,917
  90.59% of total scanned
Total Scanned: 3,220
  81.54% of total Permission Forms received
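
The percentages in the statistics can be reproduced from the counts reported on the poster. The short sketch below does that arithmetic; the variable and function names are illustrative, not part of the project's tracking database.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts reported on the poster for April 2008 to June 1, 2011.
forms_sent = 6793
forms_received = 3949
scanned_in_house = 303
scanned_by_vendor = 2917
total_scanned = scanned_in_house + scanned_by_vendor

print("Return rate:", percent(forms_received, forms_sent))
print("Scanned in-house:", percent(scanned_in_house, total_scanned))
print("Scanned via Internet Archive:", percent(scanned_by_vendor, total_scanned))
print("Scanned, of forms received:", percent(total_scanned, forms_received))
```

The in-house share works out to 9.41%, which the poster prints as 9.4%.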




Stakeholders
Primary
  Preservation - does primary work: tracking, preparing, shipping, etc.
  Digital Library Center - imports copies from Internet Archive, posts to IR
  Internet Archive - external vendor who digitizes works, posts on site
  Alumni Association - provides contact information for PhD graduates
Secondary
  Storage - pulls copies from off-site facility, sends to Preservation
  Selectors - make decision to return or dispose of print originals
  Cataloging - handles metadata, created Provider Neutral E-Monograph record template
Tertiary
  UF Legal counsel - contacted regarding copyright and fair use
  Fiscal Office - processes invoices and purchase orders for payment
  Systems - created macros to update records and create packing lists
  Acquisitions - orders replacements of missing issues from ProQuest
  Development Office - sends letters to participants to fund endowment


Scan Quality

[Images: In House (Digital Library Center); Scanning with Vendor (Internet Archive)]


Full Text

PAGE 1

Saving Our Scholarship: Retrospective Dissertation Scanning Project at George A. Smathers Library

Presented by Robert J. Parker and Christy Shorey


Brief Summary

We are retrospectively digitizing PhD dissertations from print copies. We digitize master's theses only upon patron or author request.

We started scanning internally, in our Digital Library Center, but moved to Internet Archive due to the scope of the project (~8,000 titles), time frame, and funding opportunity.

We started with two pilot projects:
1. The Electrical Engineering department donated 300+ unbound dissertations, which we used as a pilot for the overall project to test workflows, etc.
2. We subsequently conducted a pilot project with Internet Archive. Administration gave us $7,000 to test their capabilities and output before committing to a larger contract.

We built workflows conceptually, working with other departments, to figure out policies and procedures. From there we built our tracking database, using title information from the catalog and alumni contact information contributed by the Alumni Association. Once the titles and contact information were matched, we performed time trials, and refined the workflow.

We enlisted a full-time staff member to build a tracking database, while a summer intern tested the procedures. Once the project went to full swing, the full-time staff member supervised student assistants and contributed time to the project. We started with one student assistant, and grew to three student assistants (for a total of 58 hours/week) at the project's largest point.

We seek permission from all our authors. We attempt to contact them via e-mail, snail mail, and postcards. The permission form is available on our website, and we had an article in a UF alumni publication advertising our project. Our initial focus to gain permission is on the "low-hanging fruit"; issues of authors with no contact information will be addressed later in the project.

In November 2010 we added dissertations housed in our Health Science Center library (772 titles) to the project. The next group we will consider is the 3,179 titles that were previously digitized (by UMI) from microfilm.

We have encountered some bumps along the way, but the project is continuing forward.


Statistics

April 2008 to June 1, 2011:

Permission Forms sent: 6,793
Permission Forms received: 3,949
  58.13% return rate

Dissertations Scanned
Scanned in-house: 303
  9.4% of total scanned
Scanned via Internet Archive: 2,917
  90.59% of total scanned
Total Scanned: 3,220
  81.54% of total Permission Forms received


Stakeholders

Primary
  Preservation - does primary work: tracking, preparing, shipping, etc.
  Digital Library Center - imports copies from Internet Archive, posts to IR
  Internet Archive - external vendor who digitizes works, posts on site
  Alumni Association - provides contact information for PhD graduates
Secondary
  Storage - pulls copies from off-site facility, sends to Preservation
  Selectors - make decision to return or dispose of print originals
  Cataloging - handles metadata, created Provider Neutral E-Monograph record template
Tertiary
  UF Legal counsel - contacted regarding copyright and fair use
  Fiscal Office - processes invoices and purchase orders for payment
  Systems - created macros to update records and create packing lists
  Acquisitions - orders replacements of missing issues from ProQuest
  Development Office - sends letters to participants to fund endowment


Scan Quality

Digital Library Center:
Unique number assigned in DLC database
High speed, sheet feed scanners, 300 DPI bitonal
PDF version generated automatically
Quality control (de-skew, brighten, TOC, etc.)
OCR (Prime Recognition)

Internet Archive:
Unique identifier assigned by Internet Archive
High speed, sheet feed scanners
DPI varies by size of print volume (300 minimum)
Bitonal / gray scale / color
OCR: ABBYY FineReader 8.0
Provides a variety of versions of each dissertation
Conducts quality control on-site; UF staff also completes quality control for completeness


Tracking Database

When we looked at all the steps an individual dissertation would go through in our projected workflow (starting with our attempts to contact the authors to gather permission, through the digitization process, and finishing with a final thank you sent to the authors for their participation and to provide them a URL), we determined we needed a way to accurately track where a title was in the process. In order to accommodate this, and to consolidate relevant information on each dissertation, we built a tracking database in Microsoft Access.

The database is versatile, and allows us to get information ranging from a quick overview of statistics to detailed information on a single title. With a few simple selections, using queries and the mail merge feature in Microsoft Word, the database also allows for batch communication in the form of personalized e-mails or printed letters.

Built over an eight-week period at the start of the program, the database has evolved, and additional fields have been added as new information is gathered.


Permission Forms

We decided to seek permission from all our dissertation authors. We compared a list of dissertations from the catalog to a list of PhD graduates provided by the Alumni Association and matched on 3 points: First Name, Last Name, Year of Graduation/Publication.

We sent permission forms by e-mail first, where an address was available. We sent forms by snail mail to domestic addresses for those with no e-mail address, or who did not reply to the e-mail. Where the author was deceased, we accepted signature / permission from next of kin and executors of estate. Once we have a signature, the dissertation moves to the next step in the process.

There are a number of authors for whom we do not have contact information. These will be addressed in the next wave. We are looking at alternate means to find contact information, public domain considerations (published before 1978) and fair use considerations.


Preservation

Our policy is to keep one physical copy in University Archive (non-circulating collection). If the print copy used for digitization was a second copy, it is withdrawn and discarded. If it was the only copy, it is either rebound, placed in a box, or the cover is secured with zip ties, then the item is added to the archive collection.

The catalog displays entries for the print copy, the IR copy, and a link to the digital copy hosted at the Internet Archive site. A Provider Neutral E-Monograph Record is created for each title in OCLC, and imported to our catalog.


Metadata

For Digital Library Center: Staff work with our Cataloging department to ensure the catalog records are correct and complete. Staff then create an electronic record based on this record, and add the digital item to the catalog and OCLC.

For Internet Archive: Metadata was originally sent to Internet Archive via the Z39.50 protocol. The vendor pulled information from our MARC BIB record based on a crosswalk conversion with Dublin Core (http://www.loc.gov/standards/marcxml). We transitioned to sending metadata via a spreadsheet created by a macro that pulls information from fields in our MARC BIB record. This allows us to:
* Remove initial articles and capitalize the first word
* Render accented characters and symbols correctly
* Provide URL to catalog record in lieu of description
* Include copyright permission message

Catalog Records: Staff update the internal catalog record to reflect changes in holdings. They create a Provider Neutral E-Monograph record in OCLC for the digital version, then import this record to our catalog and add a holding record for the electronic item.


How Much Does It Cost?

In House (Digital Library Center)

Pre-scanning Steps:     $ 14.93
Scanning:               $ 12.14
Post-scanning Steps:    $  2.96
Total Per Volume:       $ 30.02

Scanning with Vendor (Internet Archive)

Pre-scanning Steps:     $ 10.94
Scanning:               $ 19.81
Post-scanning Steps:    $  2.44
Total Per Volume:       $ 33.18

* Vendor figures rely mostly on student staff for pre- and post-scanning procedures; this was a shift from full-time staff for some steps when scanning was done in-house.
* Average dissertation: 185 pages; Average shipment to IA: 50 books
* "Scanning" with vendor includes shipping costs and a rate of 10¢/page and $2/foldout

One-time costs for building tracking database: $ 6,560.64
Annual cost to maintain tracking database: $ 250
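
The three-point match described under Permission Forms (first name, last name, year of graduation/publication) can be sketched as a simple keyed lookup. This is only an illustration under stated assumptions: the project did this work with its catalog list and Microsoft Access, not with this code, and the record types, field names, and sample people below are all made up.

```python
from typing import NamedTuple


class Dissertation(NamedTuple):  # hypothetical catalog row
    first: str
    last: str
    year: int
    title: str


class Alumnus(NamedTuple):  # hypothetical Alumni Association row
    first: str
    last: str
    year: int
    email: str


def match_key(first, last, year):
    """Build the three-point key: first name, last name, year."""
    return (first.strip().casefold(), last.strip().casefold(), int(year))


def match_authors(dissertations, alumni):
    """Pair each dissertation with an alumni record when exactly one key matches."""
    by_key = {}
    for person in alumni:
        by_key.setdefault(match_key(*person[:3]), []).append(person)

    matched, unmatched = [], []
    for diss in dissertations:
        hits = by_key.get(match_key(*diss[:3]), [])
        if len(hits) == 1:
            matched.append((diss, hits[0]))
        else:
            # No hit, or an ambiguous hit: leave for manual follow-up.
            unmatched.append(diss)
    return matched, unmatched


# Made-up sample data, for illustration only.
catalog = [
    Dissertation("Ada", "Example", 1985, "A Study of Placeholders"),
    Dissertation("Ben", "Sample", 1992, "Notes on Test Data"),
]
alumni_list = [Alumnus(" ada ", "EXAMPLE", 1985, "ada@example.org")]

matched, unmatched = match_authors(catalog, alumni_list)
print(len(matched), "matched;", len(unmatched), "left for the next wave")
```

Treating ambiguous matches as unmatched is a design choice of this sketch: a permission form sent to the wrong person with the same name and year would be worse than a delayed one.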

