Archived Documentation for the UF Digital Library Center (approx. 2012): Average Times for Digitization Activities; Aver...


Material Information

Title:
Archived Documentation for the UF Digital Library Center (approx. 2012): Average Times for Digitization Activities; Average File Sizes; and Project Planning Resources
Physical Description:
Documentation
Language:
English
Creator:
Taylor, Laurie N.
Publisher:
George A. Smathers Libraries, University of Florida
Place of Publication:
Gainesville, FL
Publication Date:

Notes

Abstract:
Documentation from the Digital Library Center at the University of Florida from approx. 2012 and archived here for research and reference purposes, including the cost estimate draft spreadsheet and other materials with historical information. Cost template spreadsheet for estimating costs. Includes costs for cataloging, conservation, digitization locally, and curation and ingest for vendor and externally digitized.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:

The author dedicated the work to the Commons by waiving all of his or her rights to the work worldwide under copyright law and all related or neighboring legal rights he or she had in the work, to the extent allowable by law.
System ID:
AA00016038:00002

Full Text

PAGE 1

Archived Documentation for the UF Digital Library Center (from approx. 2012): Average Times for Digitization Activities; Average File Sizes; and Project Planning Resources

Digitization Activities and Average Times

Below is a list of the component activities in digitization offered by the DLC, with estimates of average times per component. All digitization complies with national standards. See the average file sizes and project planning pages for more resources for planning projects.

Estimating pages: "A cubic foot of records comprises about 2,000 pages." ( http://www.archives.gov/foia/ufos.html ). The average archive box is 5 inches.

Calculating Costs

In consultation with the DLC, use this spreadsheet to calculate costs.
Labor: unless otherwise specified, labor is calculated for the salary and benefits of a Library Associate 2. (Current fringe rates are linked here.)
Overhead: when applicable, added automatically on the workbook as shown on the "Totals_All" sheet (and can be removed as applicable).
Server Costs: server costs are calculated per annual web and archival costs.

Abstracted, simplified chart, assuming other supports in place, based on the example projects detailed below:

Bound books: 7 hours
Disbound books: 2 hours
Archival/photos: 11 pages / hour
Large format: 2.5 hours
Born digital: 50 pages / hour
Print newspapers: 40 pages / hour
Vended digitization, newspapers on microfilm, NDNP compliant: 210 pages / hour

PAGE 2

Vended digitization, newspapers on microfilm, non-compliant: 29 pages / hour
Oral history files, 30 minutes; born digital; with PDF transcript: 1 set (audio and PDF) / hour

Bound book: assumes an average of 200 pages; however, cost is based on volume; time ranges from 4 to 11 hours, assuming an average of 7 hours.
Disbound book: cost is based on item size; dissertations, theses; anything of at least 200 pages that can go through the high-speed scanner.
Archival/photographs: all print photographs that are not oversized; aerials, regular photographs; manuscripts and archival materials where the physical collections have already been processed.
Large format: times are similar for A/V items.
Print newspapers: for broadsheet newspapers that must be cut.
Born digital: includes ingest of vendor materials; harvest and processing of UF serials; FTP receipt/harvest and processing of newspapers; partner CD/DVDs.
Other formats: other formats may require specialized staff skills and should be estimated based on actual materials. For instance, materials in the round require skilled staff time for set-up, capture, and post-processing (a minimum of 8 hours for hat-sized and smaller objects), with additional time required for travel and set-up for imaging conducted off site and for larger objects.
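The simplified chart above can be turned into a quick estimator. The following is a minimal sketch only: the dictionary keys and the function name estimate_hours are hypothetical labels introduced for this example, the rates are copied from the chart, and any real estimate should still be made in consultation with the DLC.

```python
# Rates are copied from the simplified chart; the key names are hypothetical labels.
# Items timed in hours per unit (large format is per page, following the unit
# given in the Material Type table).
HOURS_PER_ITEM = {
    "bound_book": 7.0,
    "disbound_book": 2.0,
    "large_format_page": 2.5,
}

# Items timed as units completed per hour.
ITEMS_PER_HOUR = {
    "archival_photo_page": 11,
    "born_digital_page": 50,
    "print_newspaper_page": 40,
    "microfilm_ndnp_page": 210,
    "microfilm_noncompliant_page": 29,
    "oral_history_set": 1,
}


def estimate_hours(material: str, quantity: int) -> float:
    """Return estimated staff hours for `quantity` units of `material`."""
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    if material in HOURS_PER_ITEM:
        return quantity * HOURS_PER_ITEM[material]
    if material in ITEMS_PER_HOUR:
        return quantity / ITEMS_PER_HOUR[material]
    raise KeyError(f"no chart rate for material type: {material}")


if __name__ == "__main__":
    print(estimate_hours("bound_book", 10))            # 70.0
    print(estimate_hours("print_newspaper_page", 400))  # 10.0
```

The two dictionaries keep the "hours per item" and "items per hour" rows separate so that each rate is used in the direction the chart states it.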
Material Type by Equipment and File Size

Material Type | Unit | Equipment | File Size in TB | Total Hours
Bound books | 1 book | CopiBook | 0.00579357147 (20.25 MB/page as average of RGB/BW; 300 pages) | 7
Disbound books | 1 book | HighSpeed | 0.00579357147 | 2
Archival/photos | 1 page | Flatbed | 0.0000193119049072265625 | 0.09
Large format | 1 page | large format | 0.0002384185791015625 | 2.5
Print newspapers | 1 page | CopiBook | 0.00006103515625 | 0.025
Vended digitization, newspapers on microfilm, NDNP compliant | 1 reel (1,000 pages) | workstation only | 0.00006103515625 | 5
Vended digitization, newspapers on microfilm, non-compliant | 1 reel (1,000 pages) | workstation only | 0.00006103515625 | 34.5

Example Projects:

Bound Books: Baldwin Library of Historical Children's Literature (NEH Grant, Phase III)
Overview | Details
Catalog records created by Cataloging
DLC handled digitization (imaging, image processing, QC with structural metadata, OCR, loading, and archiving) for 2,500 books over 2 years, or 1,250 books per year. For each of the two grant years, dedicated staff

PAGE 3

time for cost share in the DLC: 2.15 FTE
Copyright status already known to be public domain
Physical material prep. and post proc. by Preservation
DLC digitization total average time for a 200-page book: 4 1/6 to 11 hours
Total of 2,500 volumes or 500,000 pages over a two-year period
1,250 volumes per year
6,000 pages/week; approximately 30 volumes of 200 pages each
Scanning & initial image processing (deskew, crop):
Kodak DCS 24n megapixel DSLR camera: 3 min/page x 200 pg = 600 min/60 = 10 hrs/volume
Copibook scanner: .60 min/page x 200 pg = 3 1/3 hr/volume
Flatbed scanners: 3 min/page x 200 pg = 600 min/60 = 10 hrs/volume
Pre-processing, QC, and preliminary XML creation (derive JPGs from master TIFF images, create table of contents images to use in XML creation, check for missing and/or unacceptable images, assign page numbers, division names, and chapter titles). From numbers recorded in previous two phases, approximately of the volumes imaged have no errors necessitating rescanning; of the volumes have errors
40 min/volume for imaged volumes with no errors
60 min/volume for imaged volumes with errors
Mark-up (metadata review and revision; text review): 10 min/volume
The full grant proposal is online here

Aerial Photographs: Florida Aerial Grant (LSTA Grant: Phase III)
Overview | Details
Metadata and material prep and post proc.: Map Library
DLC digitization: 1,390 hours for 13,418 images: 9.6 photos/pages per hour
Plus: DLC cost share of .23 for one year for ingest of another 7,473 already digital images, and training and supervising students
Digitize 13,418 historical aerial photographs and 120 paper indexes
Incorporate 7,473 aerial photographs from FDOT
In total, link 21,417 aerial photos to georectified images
OPS Scanning: 1,125 hours (five scanning technicians for 15 hrs/week for 15 weeks each)
OPS Metadata/quality control student: 225 hours (13,418 images @ 60 images/hr)
OPS Digital camera operator: 40 hours

PAGE 4

(120 paper index images @ 3 paper index images/hr)
DLC cost share of .23: for ingest of other 7,473 images, system upgrades, and training and supervising students

Large Format Architectural Drawings/Photographs: Flagler Architectural Drawings (NPS Grant proposed)
Overview | Details
267 architectural drawings/blueprints
OPS time: 654 hours
Average pages per hour, without factoring in cost share time: 0.40 pages per hour
Plus: DLC cost share, years 1 and 2
267 architecture drawings, blueprints, and related material
OPS time: 654 hours
DLC cost share, year one: .10
DLC cost share, year two: .18

Archival and Mixed Materials: Historic Everglades (NHPRC Grant)
Overview | Details
Spreadsheets for metadata by Special Collections
Material prep. and post proc. by Preservation
Digitization by DLC: monthly averages 3,216 pages/260 hours = 12.369 pages per hour
Plus: DLC cost share, .30 FTE for each of the three years
DLC cost share: .30 FTE for each of the three years
99,690 pages (90,400 pages; 9,040 letterbook pages; and 250 photo prints/negatives) in 31 months; average of 3,216 pages per month
Overall average of 3,216 pages per month; actual production per month will vary for the letterbooks and photos, which are more time consuming
OPS: 1.25 FTE for each of the three years; or 60 hours per week for 31 months; 4.33 weeks per month, or 260 hours per month
Monthly: 3,216 pages/260 hours = 12.369 pages per hour
Based on experience with test sets, we're building in a 10% reshoot rate for pages, 15% reshoot for letterbooks, and 15% for photos. Adjusted estimates are: 99,440 pages; 10,396 letterbook pages; 288 photographic materials. This estimate assumes use of CopiBook scanner with white sheet

PAGE 5

backing for letterbooks, and use of flatbed scanners for all photographic materials and other pages. Some individual sheets may withstand a sheet-feed scanner, based on experience with similar collections, but we will not count on it. All page images will be 300 dpi color (24-bit) images. All photographic materials will be 600 dpi grayscale (8-bit) images.

Student labor, no staff costs, archival pages:
$0.25/page scanning + $0.25/page image correction/QC + $0.03/page mounting/archiving + $0.01/page media + $0.02/data logging each file
Subtotal: $0.56/page + $0.06 (10% error correction)
Each page unit = $0.62/page

Student labor, no staff costs, photographic materials:
$0.40/page scanning + $0.25/page image correction/QC + $0.03/page mounting/archiving + $0.01/page media + $0.29/data logging each file
Subtotal: $0.98/item + $0.09 (10% error correction)
Total each photo unit = $1.07/image

Time Requirements by Workflow Component

Digitization Workflow Category | Type of Process for the Workflow Category | Processing Required | Average Time Requirements

Metadata

Catalog record available
DLC evaluates existing record, ingests, and massages records as needed.
Average time: 1-5 minutes per item

Spreadsheet available and accurate
DLC reviews, enhances, imports, and verifies.
Average time: 40 minutes to 2 hours per spreadsheet; average spreadsheet has 200 items
Longer for extensive

PAGE 6

spreadsheets or those with new mappings or categories.
Note: this is only for the import process. The DLC trains others on what information is needed and assists in creating the spreadsheet until the creator is comfortable doing so alone.

Spreadsheet available, but incomplete or inaccurate
Example: a Word file with a table with a single line listing titles, authors, and dates without any consistent separation (no columns, tabs, or commas that can be used to create tabular data).
DLC finds a way to separate the rows into tabular data if possible, or copies and pastes all information into a spreadsheet in the correct format. Then, DLC sends the spreadsheet to the selector with any recommendations for added fields and asks for feedback.
Average time: 1 minute per item to create the spreadsheet item
Additional time required: 40 minutes to 2 hours for the completed spreadsheet

No catalog record, spreadsheet, inventory, finding aid, etc. & Materials can be determined.
Example: a box of only books with no other information.
DLC reviews materials, sorting and creating metadata as possible. DLC offers training for future spreadsheet and metadata creation.
OR
Average time: 10 minutes per item.
Additional time required: 40 minutes to 2 hours for the completed spreadsheet
OR
Average time: one or more 1-hour meetings + Cataloging

PAGE 7

time to catalog materials.
For items needing actual catalog records in a traditional format, DLC sets a meeting with Cataloging, and together they establish a workflow to have the items cataloged in Cataloging and then returned to the DLC for digitization.

No catalog record, spreadsheet, inventory, finding aid, etc. & Materials cannot be determined.
DLC reviews materials, sorting and creating metadata as possible. After sorting and review, DLC staff create a brief spreadsheet. If a Collection Manager is available, DLC staff send the spreadsheet and ask the Collection Manager for feedback. If no Collection Manager is available, DLC staff attempt to work using the newly created spreadsheet.
Average time: varies and can only be determined on a case-by-case basis

Copyright

Permissions cleared
Permissions status clearly documented and provided when physical materials received.
Average time: 0-1 minute to check documentation in files and update if needed.

Officially Published in US pre-1923, Clear Public Domain
Information is available in a published document. No requirements to consult documentation on length of copyright by year or country; no requirements to consult book copyright renewal database.
Average time: 1 minute to read and verify information to confirm status as cleared.

Archival, permissions status communicated after inquiry
Average time: 1-3 minutes to call or email to check and update documentation.

Permissions not cleared, but permissions status and the need for DARK archiving clearly documented and provided when physical materials received; Dark Archiving, if identified as such, requires no additional research.
Average time: 0-1 minute to check documentation in files and update if needed.

PAGE 8

Or Permissions status easy to ascertain

Permissions not cleared, but wanted and permissions status clearly documented and provided when physical materials received; Or Permissions status easy to ascertain
Requesting permissions
Average time: 20 minutes
Average process includes checking all pertinent copyright rules; searching for the copyright holder; sending a permissions request to the copyright holder; updating documentation in files that the permissions request was sent and noting the information found on the copyright holder; and, when applicable, scheduling a follow-up inquiry.
Note: Some materials are significant enough for the allocation of additional resources for pursuing permissions. Those are handled on a case-by-case basis and normally require at least 2 hours. At least 30 minutes of this time is normally in meetings with collection managers, where the necessary background is communicated on how to possibly locate the rights holder and why the particular materials are significant.

Unclear Copyright Status, Holder, etc.
Copyright research, and requesting permissions.
Average time: 10-20 minutes for copyright research
Copyright research consists of searching for information on the materials and copyright holder. If information can't be located quickly, the item is deferred unless it warrants additional resources.
Additional average time required: 20 minutes to request permissions. Only

PAGE 9

required if copyright holder is located.

Material Preparation

Disbinding a book
Also includes any clean-up of physical materials, placing in folders and boxes that are labeled, and placing those on appropriate book trucks or shelves to be reviewed for appropriate imaging technology
Average time for disbinding a book: 8 minutes per book

Cutting newspaper pages (normal newspaper size*)
Includes placing in boxes that are labeled and/or placing those on appropriate book trucks or shelves to be queued for imaging
*Some newspapers (e.g., Iguana; Justice) are 8 1/2 x 11 and are cut using a paper cutter, and then go through the high-speed scanner.
Average time: 20 minutes per inch of newspaper
One month of newspapers from August 2008, with no born-digital titles, is 16 inches. One month of newspapers from October 2009, with 37 newspapers born digital (total of 72 newspaper titles in the Florida newspaper queue), would be under 1/2 of this or under 8 inches.

Preparing archival files
Sorting, separating, unfolding, flattening, removing staples, paperclips, debris, etc.
Average time: varies and can only be determined on a case-by-case basis

Collating, de-duping
Average time: for new and non-organized or inventoried collections, varies and can only be determined on a case-by-case basis
Average time for collating newspapers: 47 seconds for one month of a monthly newspaper; 3:45 for one month of a daily newspaper
Breakdown:
Separating out / checking title: 5 sec/title
Collating for input into tracking: 2 secs for monthly, 30 secs for daily
Inputting into tracking (calling up tracking, inputting, printing tracking sheet, placing on

PAGE 10

shelf, record in xls for physical tracking): 40 secs for monthly and 3:10 for daily
Average time for de-duping: varies, but close to collation time after initial physical material ingest, inventory, and review; duplicates do add an additional time component if they cannot be discarded or returned and must be arranged and kept for an unknown length of time

Imaging: Physical Materials

Books: Disbound, and can go through the high-speed scanner
Average time: 10-15 minutes for 300 normal pages (300 dpi grayscale; time increases if many color pages)
Average time, brittle: 45-60 minutes for 300 pages
*Time level varies if the scanner has to be cleaned. Brittle pages must be scanned at a slower rate to help prevent rips and jamming.

Books: Bound, average book
Average time, scanned on a Copibook: 90-110 minutes for 300 normal pages (no foldouts, tip-ins, or oversized pages)
Average time, if oversized: use times listed for maps and oversized items below
Average time with processing: See the post

PAGE 11

processing for images section for books for a more accurate assessment of the time for scanning and image processing for a single item
Processing time required is directly related to the imaging technology, so it will vary based on the scanning equipment used.

Maps and Oversize Items
One full capture using the large format camera, not multiple captures and splicing (as is required for many oversize materials)
Average time: 15 minutes for a single capture and processing
Average time for multiple captures and splicing: 30 minutes for two captures (includes processing and splicing), 10 minutes for each additional capture (e.g., 3 captures = 40 minutes; 4 captures = 50 minutes)

Photos, Loose
Photos, loose and not oversized, are scanned on the flatbed scanners at 600 dpi
Average time: 1-3 minutes to scan per photo

Photos, Mounted (scrapbook, etc.)
Average time: 45-60 min. for 75 pgs

Photos, Aerials
Average time estimate for scanning and image quality control is based on three successful Florida aerials grants.
Average time: 9.6 photos/pages per hour

Slides, 35mm
Color slides are scanned at 4000 dpi and with the bulk loader, to scan 24 per hour. Time increases for older, non-plastic mounted slides because they tend to jam the slide scanner.
Average time: color slides, 4000 dpi: 24 per 60 min. to scan

PAGE 12

4x5 color transparencies
4x5 color transparencies, 600 dpi
Average time: 3 min. per transparency to scan only

Slides, Glass
Scanning only: 4x5 at 600 dpi, 3 min.; 4x5 at 900 dpi, 3.5 min.
Average time: 3-3.5 minutes each

Archival materials
Average times for archival materials vary widely because of special handling needs and average length. If all of the pages are for the same item and can be handled the same way, the overall time is reduced, and overhead from switching to a new item and labeling it is reduced as well.
Average time, scanned on a Copibook: 90-110 minutes for 300 normal pages (no foldouts, tip-ins, or oversized pages; no need for backing; all pages are for the same item)

Newspapers: Current
Average time per page in color: 30 sec
Average time per page in black and white: 15 sec

Newspapers: Bound
Additional time depends on: the gutter; whether the paper can be captured 1-up or 2-up; turning odd and even pages; whether a glass plate is required to flatten the pages
Average time per page: at least 3x more than for unbound newspapers

Newspapers: Brittle (requiring large format camera)
Average time: at least 3x more than for normal unbound newspapers; can be even higher

Object, Flat
Using DSLR camera
Average time: set-up time can be several hours for a single shot; set-up is the largest time component

Object, Rotation
Average time: set-up time can be several hours for a single item for 126 images; set-up is the largest time component
Using DSLR camera connected to turntable in DLC. Additional time is required for equipment packing, traveling to location, set-up, and

PAGE 13

repacking and returning.

Digital Reformatting / Digital Conversion from Analog
Audio: Record, Cassette tape, Reel-to-reel tape
Video: VHS Record
Average time: Actual digitization time is equivalent to the length of the audio or video file. Thus, 1 hour of audio takes 1 hour to digitize. Set-up time is in addition to this; however, the estimate includes set-up time within the actual time required because of variances from the degree of supervision needed for the digitization process. Digitization may or may not need direct supervision at all times. Whether it needs to be supervised affects how much other work can be done simultaneously. Other work is most often image post-processing.

Digital File Ingest

Imaging Ingest: Retro files
Files on CDs, DVDs, portable hard drives, SAN
Average time: varies
Example: ingesting the 94 issues (for v. 1-18) burned to disk for FLMNH bulletins required over 12 staff hours. Time was required to work with the disks (two had cyclic redundancy errors), normalize the file quality, QC the files and notice that pages were missing, locate the missing pages or rescan, reprocess, and then OCR and load.

Imaging Ingest: Born Digital IR Materials
Variables include server space available, number of items, size of each item, and format of each item (PDF, HTML, AVI, AVI streaming which needs to be ripped or which requires contacting AT for copies)
Average time for 1 volume, new item: 4 min. Time includes entering item into tracking, downloading item, and exporting/converting to TIFF
Additional volume for serial item already in tracking: 1-4

PAGE 14

minutes
Average time for new groups of materials: varies based on number and type of items
Example: HPC documents found, required 3 days
Example: Hard drive from Harn museum for publications and newsletters, required 5 days

Imaging Ingest: Born Digital Newspapers
1-3 minutes per issue covers time to check spreadsheet, add item to tracking with brief data, match new BIBVID to vendor naming structure, and bulk rename, checking data while doing so.
Average time: 1-3 minutes per issue, if the files are from a hard drive, are not tarred or zipped, do not have errors, and have some human-readable title and date identification (in the file name itself, or in a spreadsheet or XML file)
Average time if on CD/DVD: 5-15 minutes per item.*
*Includes time to copy files from CD/DVD to a hard drive. Also includes time to recheck the copy process because CD/DVDs have a much higher likelihood of errors.

Imaging Ingest: Vendor Files
Average time: varies
Examples: Vendor digitization of newspapers on microfilm, when digitized to NDNP standards, has an average of 9 hours per reel or 210 pages per hour.
Variables are: file identification and usable structure for batch renaming; files on drives or decaying

PAGE 15

disks; files tarred and zipped have greater frequency of integrity errors
Vendor digitization of newspapers on microfilm, when not done to NDNP standards, tends to average a minimum of 70 hours per reel or 29 pages per hour.

Post-processing for digital files

Splitting pages
Required for bulk digitization from microfilm scanned with 2 pages per image.
Average time: varies.

Splitting separate items
For digitized microfilm, partner files, and retro ingests not separated into items. Average reel has 5-10 items. Time required depends on the quality of the film and the accuracy/inclusion of description images (i.e., targets that say the item title & reel position).
Average time: 15-30 minutes to split a reel of digitized microfilm into items

Scan, crop, deskew, levels for Baldwin Books
Scanning & initial image processing (deskew, crop):
Kodak DCS 24n megapixel DSLR camera: 3 min/page x 200 pg = 600 min/60 = 10 hrs/volume
Copibook scanner: .60 min/page x 200 pg = 3 1/3 hr/volume
Flatbed scanners: 3 min/page x 200 pg = 600 min/60 = 10 hrs/volume
Average time if DSLR camera: 3 min/page
Average time if scanned with Copibook scanner: .60 min/page
Average time if scanned on a flatbed scanner: 3 min/page
*Please note: in most cases, the times for scanning and image processing are inseparable because the imaging technology used does alter the amount of image processing (deskewing,

PAGE 16

cropping) required.

Crop, deskew, levels, color correction: disbound volumes
Average time for disbound volumes: 60-90 min for 200 pgs

Batching and Copyright blur
Time depends on the amount of material in copyright. Okeechobee News normally requires 1 minute; Miami Times normally requires 30 minutes
Average time: 1-30 minutes

Quality Control Review and Structural Metadata Creation

Brief items
Short research items (under ~40 pages), where a table of contents is very unlikely to be used and wouldn't prove of much benefit, only have pagination and quality review during QC; no table-of-contents-style metadata is added
Average time: 1-3 minutes per item (item is normally under 40 pages) if no errors

IR
Average time: 1-3 minutes per item (item is normally under 40 pages) if no errors

Newspapers
Sections (A, B, C) and page numbers added, final quality review of item
Average time: 1-3 minutes per item (item is normally under 40 pages) if no errors

Books, Complex
Average time estimate for QC alone is based on the Baldwin Phase III grant time requirements.
Average time: 40-60 min/volume (average of 40 for volumes with no errors and 60 for volumes with errors)

Photos, Aerials
Average time estimate for scanning and image quality control is based on two successfully completed grants for Florida aerials.
Average time for scanning and image quality control: 9.6 photos/pages per hour

OCR; Loading; Archiving to FDA and Internally

OCR
Average time: OCR runs constantly against available materials. Average labor time is 15-20 minutes per day for all materials to be processed that day. Time is to check the process, refine any jobs as needed, and correct any errors.

PAGE 17

Archiving to FDA
FTP and loading drives and mailing (forms, error correction, ingest of reports)
Average time: 3 hours to set up external hard drive for file transfer and start file transfer (10 minutes), transfer files (varies based on size of all files being transferred; done on a separate machine and does not interfere with other work), and then drive to drop off the drives and return to work. Average drop-off has been 10 hard drives in one trip. The goal is to have FDA catch up on the backlog and to be able to FTP daily work easily, without the need to use external hard drives.

Archiving internally
Required components of burning DVDs:
1. Labels: printed in batches of 100, 5 minutes to renumber and print: .33 seconds printing time for each label
2. Labeling each DVD: 10 sec
3. Burning: 7-8 minutes per DVD (4.4GB)
4. File sort: 20 seconds per DVD
5. Filing DVD: 10 seconds per DVD
6. Transferring files: moving files from the SAN to a local drive to burn locally and not across the network. Done overnight to reduce time delay; otherwise can take 1-3
Average time required with burning DVDs: 9 minutes for each DVD (4.4GB)*
*Note: DVD time is as though there's no overlap during burning. Normally 3-4 DVDs are being burned simultaneously. Thus, instead of 9 minutes each equaling 36 minutes for 4, the time for 4 is closer to 15 minutes, with the 9-minute base plus additional overhead and checking time.
Average time expected with Tivoli automation: 0; time would be replaced with 100%

PAGE 18

hours depending on drive availability and system time load verification

Load and metadata verification
Average time: 1 hour per day for brief validation using only file names and the m=han page; spot checking under 10% of load items

Post-processing disposition & ongoing changes and corrections

Returning physical materials, updating holdings records to discharge or withdraw item from DLC
Example: all IFAS documents must be completed before they can be returned, and they must be properly ordered for all issues. This means that the DLC must store all completed items, keep them in order, and must only file newly completed items in the correct order. Only once all are done will the holding records be updated in one large batch.
Average time: varies based on requirements for returning items.

Material reclamation
Pulling folders, relabeling boxes
Average time: varies.

Metadata updates, Manual
Involves updating the metadata of one or more items. Single items are done manually, and large projects (including serial hierarchy changes) employ a combination of automated and manual methods.
Average time: 10 minutes/title (manual assignment)

Serial Hierarchy
Prior work required manual updates for each item (10 minutes x 100). With the new tool, DLC staff can update serial hierarchy for batches of items. The tool is being refined for optimal performance.
Average time: 10 minutes for 100 items

As Abby Smith notes in the CLIR report on "Strategies for Building Digitized Collections":

Reliable and meaningful cost data about digitization are rare and not often useful in comparative contexts. Costing out the elements of digitizing means beginning with selection and going to physical preparation, cataloging, physical capture, creation of metadata, mounting and managing files, designing and maintaining the site, providing additional user services, and going through to implementing a long-term preservation strategy.
Virtually every step in digitization involves human intervention and skill, and these costs, unlike those of storage, for example, are unlikely to go down. (Section 4; 2001)
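As a worked check on how the component times in the tables above add up for a single book, the per-volume figures can be summed. This is a minimal sketch under stated assumptions: the function name volume_hours and its parameters are hypothetical, and the inputs are the Baldwin Phase III per-volume averages quoted earlier (3 1/3 hours of Copibook scanning, 40 minutes of QC for a volume with no errors, and 10 minutes of mark-up).

```python
from fractions import Fraction


def volume_hours(scan_hours, qc_minutes, markup_minutes=10):
    """Return total staff hours for one volume as an exact Fraction.

    scan_hours: scanning and initial image processing, in hours.
    qc_minutes: pre-processing/QC time, in minutes.
    markup_minutes: mark-up time, in minutes (10 min/volume in the Baldwin example).
    """
    total_minutes = Fraction(scan_hours) * 60 + qc_minutes + markup_minutes
    return total_minutes / 60


if __name__ == "__main__":
    # Copibook scanning (3 1/3 hr) + QC with no errors (40 min) + mark-up (10 min)
    print(volume_hours(Fraction(10, 3), 40))  # 25/6, i.e. 4 1/6 hours
```

Using Fraction keeps thirds and sixths of an hour exact, so the result matches the 4 1/6-hour lower bound given for a 200-page book.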

PAGE 19

Estimated Times from Other Digital Library Centers

Brown University Library's Estimated Timelines for Production ( online )

Estimated duration will be defined per project. Projects should be in the Digital Technologies production queue at least 4 months before they are requested to be complete. Submissions made without adequate planning time will be reviewed on a case-by-case basis by ITS representatives and approved only if there are resources available.

Scans, images for the web, digital projects
Flatbed scanning (10-14 items an hour)
Camera Room
o Book or Broadside with no problems: 5-6 pages per hour
o Problematic Book (closely bound, fragile, rare, custom support): 3-4 pages per hour
Metadata: 10-15 minutes per item
Quality Control for color, cropping, artifacts, and metadata: total 5 minutes per item (reshoot if necessary)
Create submission package
Project management (coordination, communication): 20% of overall project effort

Website development
Production time is dependent on the scope of the project. Estimate a minimum of 4-6 weeks for production of a web project, an estimate which will fluctuate and may in fact be longer if the overall volume of work is high.

Types of work ( Glossary )
Digital collection: framework to access materials which include digital scans (repository items) plus additional content including history, essays, keyword searching, genre searching
Digital exhibit: smaller groups of materials which showcase selected items; scanned objects are presented with descriptions within a curated, narrative flow
Web site: non-collection driven
Maintenance of collections, exhibits, and web sites: continued development that occurs after the site officially launches. This activity is handled as a different project.
Digitization: finite number of items to be scanned that are not related to a specific collection or exhibit
Print materials: designing promotional materials for Library-related activities only (i.e., brochures, calendars, posters, catalogs, bookmarks, signs)
Application Development: software that improves efficiency for users and improves efficiency of internal processes
Web Services Support: includes Josiah and special systems support
Technology projects: such as ABET
Event coordination: including documentation and other promotional activities (OUL requests)
Metadata creation: creation of MODS records for material; if MARC records already exist in Josiah, those may be converted to MODS; if no MARC records, MODS must be created with SR

PAGE 20

assistance; additional decisions to be made: should there be authority control of headings; should there be subject analysis; abstracts must be provided for materials

Average File Sizes

Online files are approximately 1/3 of the archival file size.

Type | Average Archival File Size | Online, Derivatives (33%)
Photo | 27 MB | 8.91 MB
Archival page, color | 27 MB | 8.91 MB
Archival page, grayscale | 13.5 MB | 4.45 MB
Average archival | 20.25 MB | 6.68 MB
Newspaper page | 64 MB ( Texas Digital Newspaper Program ) | 21.12 MB
Microfilm reel (1,000 frames) | 64,000 MB | 21,334 MB
Map/large format | 250 MB | 82.5 MB
30-minute oral history audio & PDF transcript (born digital; multiple derivatives; assumes 500 MB audio and 15-page PDF at archival grayscale size) | 705 MB (varies) | 233 MB (varies)
30-minute video from DVD file (born digital; multiple derivatives; assumes maximum full DVD size of 4.4 GB) | 3.5 GB (varies; assumes 1 GB original file and alternate formats that add an additional 2.5X the size) | varies widely (can even be 0 if streaming via YouTube or similar and full files available only on request)
Slide, 35mm (scanned at 4000 dpi) | 52 MB | 17.16 MB
Color transparency (scanned at 900 dpi) | 52 MB | 17.16 MB

Resources:
MB to GB to TB converter
Estimating pages: "A cubic foot of records comprises about 2,000 pages." ( http://www.archives.gov/foia/ufos.html ). The average archive box is 5 inches.
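The file-size figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes binary units (1 TB = 1,024 x 1,024 MB), which is the convention that reproduces the TB values in the Material Type by Equipment and File Size table, together with the 33% factor for online derivatives.

```python
# Assumption: binary units, 1 TB = 1024 * 1024 MB.
MB_PER_TB = 1024 * 1024


def mb_to_tb(megabytes: float) -> float:
    """Convert megabytes to terabytes using binary units."""
    return megabytes / MB_PER_TB


def derivative_mb(archival_mb: float, ratio: float = 0.33) -> float:
    """Estimate the online derivative size from the archival file size."""
    return archival_mb * ratio


if __name__ == "__main__":
    # One large-format page at 250 MB
    print(mb_to_tb(250))                      # 0.0002384185791015625
    # One 300-page book at the 20.25 MB/page archival average
    print(round(mb_to_tb(20.25 * 300), 11))   # 0.00579357147
    # Online derivative of a 27 MB color archival page
    print(round(derivative_mb(27), 2))        # 8.91
```

The default ratio of 0.33 follows the column heading; the microfilm reel row appears to use an exact one-third instead, so the ratio is left as a parameter.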

PAGE 21

Project Planning Resources for Library Faculty and Staff

A number of variables impact the actual time it takes to complete a project; on average:
One third of the effort will be project planning, preservation preparation, management, and oversight
One third of the effort will be archival description and indexing
One third of the effort will be the actual digitization ( cite )

New Projects
Projects/Collections
o Proposal template for new digital collections/projects
o See the Smathers Libraries Copyright Policies
One-off and small requests with the digitize-on-demand queue
o Online form to enter and track requests
o Tracking spreadsheet
Draft training for metadata for new projects
Draft timeline for new projects
For only one or two items, contact us to see if the digitize-on-demand process can meet the request
Information for potential partners

Information in this document was originally available at:
http://digital.uflib.ufl.edu/technologies/documentation/average_times.htm
http://digital.uflib.ufl.edu/technologies/documentation/average_filesizes.htm
http://digital.uflib.ufl.edu/technologies/projectplanning/