Features Desired in a Digital Library System (2011)
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00103112/00002
 Material Information
Title: Features Desired in a Digital Library System (2011)
Physical Description: Archival
Language: English
Creator: Digital Initiatives and Services Committee (DISC)
Publisher: UF Libraries
Place of Publication: Gainesville, FL
Publication Date: 6/30/2011
General Note: Created by DISC for Common Digital Library platform 6/30/2011. History: Initial document, Features Desired in a Digital Library System to Replace FCLA'S Textual Collections and Visual Collections, created by the Florida Center for Library Automation (FCLA) and the Digital Library April 6, 2006 to evaluate DL systems, leading to the purchase of DigiTool in 2006. Revised and prepared for committee review and comment by G. Clement (FIU) and L. Taylor (UF), with additional editing by M. Sullivan (UF) and L. Dotson (UCF), April 30, 2009. Reviewed and approved by the State University Libraries’ Digital Initiatives Subcommittee (DISC), September 8, 2010.
 Record Information
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UF00103112:00002




Created by DISC subgroup for Common Digital Library platform 6/30/2011.

History: Initial document, Features Desired in a Digital Library System to Replace FCLA'S
Textual Collections and Visual Collections, created by the Florida Center for Library Automation
(FCLA) and the Digital Library April 6, 2006 to evaluate DL systems, leading to the purchase of
DigiTool in 2006. Revised and prepared for committee review and comment by G. Clement (FIU)
and L. Taylor (UF), with additional editing by M. Sullivan (UF) and L. Dotson (UCF), April 30,
2009. Reviewed and approved by the State University Libraries' Digital Initiatives Subcommittee
(DISC), September 8, 2010.

Working Definitions

Bibliographic item: All the pieces that together form the basis for a single bibliographic
description. Can be a book, map, website etc. Bibliographic items can be simple or compound
objects. Even simple objects (a single photograph) will likely have multiple manifestations.

Simple object: a single file or set of related files with no hierarchical relationship between the
files; associated with a single descriptive metadata record.

Compound or complex object: a set of files with a hierarchical relationship, associated with a
single descriptive metadata record.

Manifestation: version of a given bibliographic item with a specific file format (e.g., PDF, JPEG
images, full-text file, etc.)

Collection: A named grouping of bibliographic items based on some common characteristic,
such as provenance or subject.

Curator: Somebody who can make changes to the content of a collection.

Administrator: Someone who can change parameters affecting a collection or multiple
collections. Administrator has privileges of the Curator by default.


A. Architecture

1. Architecture supports multi-site use.

2. User permissions:
(a) Architecture allows multiple levels of user permissions, which can be configured
based on collections, collection groups, or institutional units, for example.
(b) various levels of administrator and staff user permissions are available for
institution staff to change system settings and content.
(c) simple and secure user (non-administrator) account creation is available for
students and faculty to upload files and add metadata.
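
One way to picture the permission levels in item 2 is a role lookup scoped by collection or institutional unit. The sketch below is only an illustration: the SUBMITTER role, the scope strings, and the GRANTS table are hypothetical, and a real system would keep grants in its own database.

```python
from enum import IntEnum


class Role(IntEnum):
    """Ordered roles; a higher value includes the privileges below it."""
    NONE = 0
    SUBMITTER = 1      # hypothetical role for students and faculty who upload files
    CURATOR = 2        # can change the content of a collection
    ADMINISTRATOR = 3  # has Curator privileges by default


# Hypothetical grants keyed by (user, scope).
GRANTS = {
    ("jdoe", "collection:maps"): Role.CURATOR,
    ("jdoe", "institution:UF"): Role.SUBMITTER,
    ("asmith", "institution:UF"): Role.ADMINISTRATOR,
}


def effective_role(user, scopes):
    """Return the highest role the user holds in any scope that applies."""
    return max((GRANTS.get((user, scope), Role.NONE) for scope in scopes),
               default=Role.NONE)


def can_edit_content(user, scopes):
    """Curators and Administrators may change collection content."""
    return effective_role(user, scopes) >= Role.CURATOR
```

Because the roles are ordered, the rule that an Administrator has Curator privileges by default needs no special case.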

3. Architecture facilitates library staff in setting up collections and assigning or ingesting
items to collections.

4. System does not require users to have a static IP address.

5. There are no conventions that must be followed for naming directories or files, or the
conventions are documented, verified, and easy for library staff to follow or create,
and/or they are followed through an automated process as part of a tool or application.

6. Collections are logically, not physically, defined; they are easily created, deleted and
redefined by library staff. A bibliographic item can easily be added to a collection,
assigned to a new collection, allocated to multiple collections, or removed from a
collection by library staff.
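
A logically defined collection can be thought of as a membership table rather than a directory of files. The following is a minimal sketch under that assumption; the Catalog class and its method names are hypothetical and do not describe any particular DL system.

```python
from collections import defaultdict


class Catalog:
    """Collections are sets of item IDs; the files themselves never move."""

    def __init__(self):
        self._members = defaultdict(set)  # collection name -> item IDs

    def add(self, item_id, collection):
        self._members[collection].add(item_id)

    def remove(self, item_id, collection):
        self._members[collection].discard(item_id)

    def move(self, item_id, old, new):
        self.remove(item_id, old)
        self.add(item_id, new)

    def delete_collection(self, collection):
        # Deleting a collection drops memberships only, not the items.
        self._members.pop(collection, None)

    def collections_of(self, item_id):
        return sorted(name for name, items in self._members.items()
                      if item_id in items)
```

An item allocated to several collections simply appears in several sets, so adding, reassigning, and removing are all cheap metadata operations.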

7. The system can accommodate bidirectional connection between itself and other tools
- that is, if a user is directed to a page within the platform from an outside discovery tool,
the path back to that tool should be clear and automatic.

8. Text can be stored in Unicode and/or UTF-8.

9. Indexes:
(a) Indexes can be updated to include new or changed content without having to
reindex the entire database.
(b) Indexing runs in the background (no downtime for using the system during
indexing).
(c) New items can be indexed in real time so that they are available to the public.

10. Collections can be created, populated, and viewed by authorized users while
remaining invisible to unauthorized users.

11. Customizations can be tested by library staff in a way that is invisible to unauthorized
users and that does not affect the rest of the system. Having a test function within the
system would satisfy this requirement, as would having a separate test instance of the
system.

12. All content from the current PALMM Collections can be imported into the system
with no loss of information or functionality. All content in UFDC Sobek, USF Fedora,
UCF CONTENTdm, non-PALMM DigiTool, and other current SUS systems can be
imported into the system with no loss of information.

13. System support:
(a) The system components are affordable, dependable, and supportable by existing
staff resources. This includes all software required to run the digital library system in
actual operation: database, operating system, digital library software, and support
software required in addition to the digital library software itself.
(b) Open-source tools will be weighted more heavily because they can be tested,
validated, maintained, developed, and budgeted to a more exacting level for more
accurate initial requirements and future projections.

14. System natively supports content in multiple languages.

15. The system supports multilingual interfaces. For example, the interface could be
translated automatically once library staff provide translations, or a set of search
terms could be supported with translations already in place.

16. Usable documentation accompanies the code, including clear and concise
comments and examples.

17. Custom configuration settings are available at the collection level for collection-
specific behavior and appearance with collection settings overriding global settings.
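
The override rule in item 17 can be sketched as a layered lookup in which collection settings are consulted first and global settings act as the fallback. The setting names and values below are hypothetical.

```python
from collections import ChainMap

# Hypothetical global defaults.
GLOBAL_SETTINGS = {
    "results_per_page": 20,
    "default_sort": "relevance",
    "banner": "default.png",
}

# Hypothetical per-collection overrides; only the differences are stored.
COLLECTION_SETTINGS = {
    "newspapers": {"default_sort": "publication_date", "banner": "newspapers.png"},
}


def settings_for(collection):
    """Collection settings are searched first, then the global settings."""
    return ChainMap(COLLECTION_SETTINGS.get(collection, {}), GLOBAL_SETTINGS)
```

A collection with no overrides of its own falls through to the global values for every setting.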

18. Custom pages allow the creation of collection home pages and other landing pages
based on institution, format, topic, etc.

B. Content

1. All of the content from the PALMM and State University Libraries' collections can be
supported in terms of file format, file relationships and structure, including multimedia

2. The system must support at least the following formats:
(a) TIFF images
(b) JPG / JPEG images
(c) JP2 / JPEG 2000 images
(d) Single-page and multi-page PDFs
(e) Text
(f) Audio
(g) Video
(h) Streaming audio / video (URLs to streaming server)
(i) Remote content (URL links to externally stored files and embedded viewers as
(j) Files intended for download rather than display (e.g. data formats, spreadsheets)

3. The system supports the following special genres:
(a) EAD finding aids (with structured display, links to digitized content, XML to HTML
translation and option to also display as PDF)
(b) Serial display with hierarchy (for newspapers, journals, and other serials)
(c) Audio for simple object (music file alone), and for complex/compound objects
(oral history with a transcript that can be displayed while audio is played)
(d) Books/monographs (structured table of contents, page turning and "go to")
(e) Newspapers (NDNP and METS/ALTO formats, search term and full article
segmentation highlighting)
(f) TEI-encoded full-text

4. Must allow integrated multimedia collections - can have text, images, audio, video,
etc. all in the same collection.

5. Must support related objects, defined as groups of objects with some relation to each
other, such that:
o if one is retrieved, all are retrieved
o the relationship among the objects is made clear
o related objects do not have to all be in the same format
o any number of related objects can comprise a group

6. Must support complex objects with METS structural metadata. Must preserve METS
for export.

C. Metadata

1. System has documented, verifiable support for ingest, display, and translation of the
primary descriptive metadata in use (simple and qualified DC, MARC21, MODS and
VRA Core). System is not solely library-centric or MARC-centric - must work for
museums, archives and gallery collections as well.

2. The DL System has a readily available, easy process and tools for library staff to:
(a) input/update metadata
(b) add local fields (including administrative fields not shown to the public)
(c) ingest existing metadata records
(d) edit ingested existing metadata records
(e) export metadata records.

3. Metadata can be created/edited online, or created offline and uploaded.

4. Metadata can be:
(a) in the system before an object is in the system and associated with the object
when the object is loaded
(b) added to the system at the same time as the associated object is loaded
(c) added to the system and associated with an object after the associated object is
loaded

5. Has input forms and edit routines for descriptive metadata in:
(a) simple Dublin Core
(b) qualified Dublin Core
(d) MODS
(e) VRA Core

6. Pre-existing metadata in the above formats can be loaded as XML records or as tab-
delimited or CSV files with associated mappings.
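
Loading a CSV file "with associated mappings" could look roughly like the sketch below, which renames spreadsheet columns to Dublin Core elements and ignores unmapped columns. The column names and the MAPPING table are hypothetical; an actual loader would take its mapping from the system's configuration.

```python
import csv
import io

# Hypothetical mapping from spreadsheet column headers to Dublin Core elements.
MAPPING = {"Title": "dc:title", "Author": "dc:creator", "Year": "dc:date"}


def load_records(csv_text, mapping):
    """Return one dict per row, keyed by mapped element, with list values."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for column, value in row.items():
            element = mapping.get(column)
            if element and value:  # skip unmapped columns and empty cells
                record.setdefault(element, []).append(value.strip())
        records.append(record)
    return records
```

The same function handles tab-delimited files if the reader is created with `delimiter="\t"`.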

7. It is possible for library staff to design their own metadata input/update templates.

8. Simple forms for metadata entry can be provided for untrained users (for IR

9. It is possible to include technical and administrative metadata elements which do not
display to the public.

10. It is possible to enable and maintain a controlled vocabulary (standardized or user
generated) for any given field. A tool or method is available for making desired changes
easily in a manner that meets library staff needs.

11. Bibliographic records from the Aleph library catalog, OCLC records, or any MARC
records from anywhere, can be easily imported into the DL system.

12. The system can expose metadata to search engine crawling/indexing to ensure good
coverage in major search engines.

13. EXIF and IPTC metadata embedded in JPEG and TIFF images can be automatically
extracted. Users may map this metadata to Dublin Core or Qualified Dublin Core fields.

D. Ingest

1. Metadata can be harvested from OAI-PMH accessible collections for inclusion in the
system.

2. The system supports both:
(a) manual upload to ingest
(b) automatic batch upload to ingest

3. If any translation/conversion is needed prior to ingest, a documented process with a
tool/application is available that library staff feel is sufficiently simple and has adequate
support for their needs.

4. Provides immediate verification of ingest success or, in the case of ingest failure,
provides error messages that communicate to staff what needs to be fixed for successful
ingest.

5. Ingest processing is speedy enough to meet library staff needs. (For each DL System
under review, discussions over the value of increased speed should consider the
benefits of that speed in relation to the costs/delays for staffing, software version
upgrades, etc).

6. Thumbnail images can be created at the time of ingest from all image and document
formats supported in the system. Default resolution and size can be over-ridden at

7. Custom thumbnail images created outside of the DL can be:
(a) added to the system at the same time as the associated object is loaded
(b) added to the system and associated with an object after the associated object is
loaded

8. The system can automatically create multiple file formats from TIFF images. The
process should be testable so that library staff can evaluate the process of creating
derivatives and products (multiple manifestations created from the TIFF file) for quality
and any other needs. File formats available for automatic creation from TIFF include at
least:
(a) searchable full text via OCR
(b) JPEG2000 images, with library-defined resolutions (not just a default set that
cannot be changed)

9. The system should provide options for how uploaded TIFFs are handled, for example:

(a) create derivatives and do not store TIFF
(b) store TIFF but do not display to users
(c) store and display TIFF to users.
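
The three TIFF handling options in item 9 map naturally onto a single configuration value. The sketch below is one possible representation; the TiffPolicy name and the keys of the returned plan are hypothetical.

```python
from enum import Enum


class TiffPolicy(Enum):
    DERIVATIVES_ONLY = "derivatives_only"    # option (a)
    STORE_HIDDEN = "store_hidden"            # option (b)
    STORE_AND_DISPLAY = "store_and_display"  # option (c)


def ingest_plan(policy):
    """Translate a policy into the two decisions the ingest step has to make."""
    return {
        "store_tiff": policy is not TiffPolicy.DERIVATIVES_ONLY,
        "display_tiff": policy is TiffPolicy.STORE_AND_DISPLAY,
    }
```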

10. The system can automatically index full text from formats including PDF, Word,
Open Office, HTML, and XML.

11. When a complex object with manifestations exists in the system, it should be
possible to replace a specific file or files without having to re-ingest the entire object.

12. The system can accommodate a single ingest process for universities using
ProQuest ETD Administrator (possibly a SWORD-like process).

13. System offers an IR mode of ingest, that supports the following functions:
(a) non-staff, authorized users can submit content and metadata by a simple process
(b) content and metadata are not added to the system (or are added with provisional
or non-display status) until reviewed
(c) authorized staff are enabled to review and approve, edit or reject metadata and
content
(d) submitters are notified by email, text message, or other electronic communication
about the approval status of the item.
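
The IR mode of ingest in item 13 amounts to a small review workflow. The sketch below models it with three statuses and records a notification for the submitter on each decision; the status names, the Submission class, and the in-memory notification list are all hypothetical stand-ins for whatever the system actually provides.

```python
from dataclasses import dataclass, field

# Allowed status changes: a submission is reviewed exactly once.
TRANSITIONS = {
    "submitted": {"approved", "rejected"},
    "approved": set(),
    "rejected": set(),
}


@dataclass
class Submission:
    submitter_email: str
    title: str
    status: str = "submitted"  # provisional, not displayed to the public
    notifications: list = field(default_factory=list)

    @property
    def publicly_visible(self):
        return self.status == "approved"

    def review(self, decision):
        """Apply a staff decision and queue a message for the submitter."""
        if decision not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {decision}")
        self.status = decision
        self.notifications.append(
            (self.submitter_email, f"'{self.title}' was {decision}")
        )
```

In a real deployment the notification list would be replaced by an email or messaging service.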

E. Search and retrieval

1. System has a Z39.50 server, equivalent JSON interface, or other documented
system-access method.

2. Users have the option to search or to browse. A simple search view (single search) is
always available.

3. For serial publications, the user should be able to search for individual articles by
author and title. The user should also be able to list and browse the tables of contents of
issues, listed in reverse chronological order.

4. The user can choose to search metadata only or both metadata and full text.

5. Both Google-like simple search (all fields, one search box, all terms OCRed) and
advanced search (choice of specific fields, limits, choice of Boolean operators) are
available.

6. Users can search and browse:
(a) within a single collection
(b) across all collections
(c) across groups of collections defined by staff
(d) across ad hoc groups of collections defined by the user

7. Assistance for search and navigation is provided through:
(a) Alternate suggestions when no results found
(b) Faceted browsing
(c) Clickable links within metadata (author, subject, format, etc)
(d) Pre-determined canned searches

8. Hits are displayed in a way that makes sense to the user; it is clear whether an object
is a book, photo, recording, etc.

9. The results returned from a search should be sortable by author, title, publication date
and relevance:
(a) any of these can be set as the default view by the user for that session / account
(b) any of these can be set as the default view by staff for general use
(c) different default views can be set for different collections
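
Item 9 combines a sort function with layered defaults. The sketch below assumes, purely for illustration, that a user's session choice takes precedence over a collection default, which in turn takes precedence over the staff-wide default; the source does not specify this order. The field names in each hit are hypothetical.

```python
SORT_KEYS = ("author", "title", "date", "relevance")


def sort_hits(hits, key):
    """Sort a list of hit dicts; relevance sorts highest first."""
    if key not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {key}")
    return sorted(hits, key=lambda hit: hit[key], reverse=(key == "relevance"))


def default_sort(session_choice=None, collection_default=None,
                 staff_default="relevance"):
    """Assumed precedence: user session, then collection, then staff default."""
    return session_choice or collection_default or staff_default
```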

10. The results returned from a search can be represented visually in document space,
as in AquaBrowser or similar tools.

11. When performing a cross-collection search and retrieving hits from multiple
collections, it is clear to the user which collection each hit comes from.

12. A "new additions" feature is available to display the "n" most recently added items.

F. Display and Use

1. An outline or table of contents display is available for complex structured bibliographic
items. It is possible to expand and contract any heading in the outline hierarchy.

2. When a textual object is retrieved by a full text search:
(a) the number of occurrences of the term in the object is displayed in the list of hits.
(b) When the textual object retrieved by a full text search is displayed, the search
term is highlighted on the page.

3. When multiple manifestations (e.g. image and full text, audio and transcript) are
available, they can be displayed simultaneously on the screen.

4. Branding is obvious and explicit, and as restrained as desired, for both the
collection-owning repository (could be library, museum or agency) and the digitizing
repository (could be library, museum or agency). The branding is in place at the
collection level and item level (all views).

5. Multiple brands (icons) can be associated with and displayed with an object.

6. All collection items display under a collection specific to the collection-owning
repository, as well as in other collections as selected by the collection-owning repository.

7. Users can display, download, print and/or email content (unless these functions are
restricted for a particular computer file, bibliographic item, or collection).

8. Restrictions on access and use can be implemented at the computer file and/or the
bibliographic item level by password and by IP filter. When an object is restricted, the
restriction is clear to the user.
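
The IP filter part of item 8 can be sketched with the standard library's ipaddress module. The allowed network below is a documentation-only address range, and the function name and message text are hypothetical.

```python
import ipaddress

# Hypothetical campus range (192.0.2.0/24 is reserved for documentation).
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def check_access(client_ip):
    """Return (allowed, message) so a restriction can be explained to the user."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in ALLOWED_NETWORKS):
        return True, ""
    return False, "This item is restricted to on-campus users."
```

Returning a message alongside the decision supports the requirement that a restriction be clear to the user.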

9. Objects and records may be restricted under embargo, ideally with automatic release
of the embargo once it expires.
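
Automatic release of an embargo needs no scheduled job if visibility is computed from the embargo date at request time. The sketch below assumes the item becomes public on the embargo date itself; the function and parameter names are hypothetical.

```python
from datetime import date


def is_embargoed(embargo_until, today):
    """True while the embargo is in force; None means no embargo was set."""
    return embargo_until is not None and today < embargo_until
```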

10. There is a portfolio ("my collection") function for end users.

11. The implementation can control display characteristics such as what fields and labels
are used.

12. The end user can control some display characteristics such as the number of hits to
show on a page and how the results are displayed with options such as thumbnail,
citation only, title only, and hierarchical (for newspapers and volume/issue materials).

13. Easy to understand help files and/or tutorials are available to assist users with
search, display, and use functions.

14. Interface should be attractive and easy to use.

15. Easy-to-use training materials are available for all user levels - robust user-
community involvement a plus especially if the user community has effective input into
the design/development process.

16. A "bookmarkable" URL should be displayable for all bibliographic items.

17. Links (URLs) embedded in any field will display as clickable links. The system has a
convention for representing anchor text to display as an actionable link instead of the
raw URL.

18. RSS (Really Simple Syndication) - Users can create feeds that report recently added
items, subjects, authors, etc.

19. Share feature - Users can share an item via email, Facebook, Twitter, or other social
networking sites.

20. Commenting capabilities - Users can write a comment about the digital item.
Moderated comments written about the item can be displayed.

21. Tagging feature - Users can add a tag to describe a digital item. Moderated tags for
the item can be displayed.

22. User can save searches.

23. User can see search history.

G. Export

1. The system can export simple objects as files and associated metadata.

2. The system can export both simple and complex objects as packages with METS

3. Regardless of the format of origin, bibliographic data can be exported in MARCXML
for import into a catalog system.

4. There is an OAI broker capable of exporting all metadata, regardless of the format of
origin, in oai_dc format. Additionally:
(a) Custom OAI sets can be created using a logical search of content.
(b) OAI harvesting can be disabled for certain content (test content, incomplete
collections, etc)
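
Items 4(a) and 4(b) can be sketched as a set defined by a search predicate plus a per-item harvest flag. The item records, the set name, and the base URL below are hypothetical; the query arguments `verb`, `metadataPrefix`, and `set` are the standard OAI-PMH request arguments a harvester would send.

```python
from urllib.parse import urlencode

# Hypothetical repository contents.
ITEMS = [
    {"id": "oai:dl.example.edu:1", "subjects": {"Florida", "maps"}, "harvestable": True},
    {"id": "oai:dl.example.edu:2", "subjects": {"Florida", "maps"}, "harvestable": False},
    {"id": "oai:dl.example.edu:3", "subjects": {"Florida", "newspapers"}, "harvestable": True},
]

# A custom set is a named logical search over the content.
SETS = {
    "florida_maps": lambda item: {"Florida", "maps"} <= item["subjects"],
}


def list_identifiers(set_spec):
    """Items in the set, excluding anything with harvesting disabled."""
    in_set = SETS[set_spec]
    return [item["id"] for item in ITEMS if item["harvestable"] and in_set(item)]


def harvest_url(base_url, set_spec):
    """Build the ListRecords request a harvester would issue for one set."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc",
                       "set": set_spec})
    return f"{base_url}?{query}"
```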

5. Designated content can be exported from DL to FDA automatically, without additional
effort (sending, processing, any manual work) on behalf of library staff.

6. User can export a set of saved items (portfolio) for use by another tool (e.g. Omeka).

H. Management and reporting

1. Ad hoc and canned reports can be run. Documentation is available on existing
automatic reports and samples of reports are available for evaluation by library staff for
their needs.

2. The system automatically logs usage statistics which can be aggregated for any time
period on:
(a) number of searches (by collection, by object contributor, and by date/time)
(b) materials accessed (by title and aggregated into various categories)
(c) users
(d) user sessions
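
Aggregating logged usage "for any time period" reduces to filtering log entries by timestamp and counting. The log layout below (timestamp, collection, event type) is a hypothetical simplification of what a real system would record.

```python
from collections import Counter
from datetime import datetime

# Hypothetical usage log: (timestamp, collection, event type).
LOG = [
    (datetime(2011, 6, 1, 9, 0), "maps", "search"),
    (datetime(2011, 6, 1, 9, 5), "maps", "view"),
    (datetime(2011, 6, 2, 14, 0), "newspapers", "search"),
    (datetime(2011, 7, 1, 8, 0), "maps", "search"),
]


def count_by_collection(log, event, start, end):
    """Count events of one type per collection within [start, end)."""
    return Counter(collection for timestamp, collection, kind in log
                   if kind == event and start <= timestamp < end)
```

Using a half-open interval means consecutive reporting periods never count the same event twice.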

3. Sample usage reports are available for review by library staff.

4. The system provides counts of objects at both the bibliographic and file level:
(a) by collection
(b) by contributor
(c) created since [date]

5. The system can provide a report of the most popular titles in a specified time period:
(a) by collection
(b) by contributor
(c) system wide
(d) title and single volumes for serial items so that the usage is tabulated for both
single issues and for the aggregate of all volumes for the particular serial title
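
The serial requirement in 5(d) asks for usage at two levels at once. A minimal sketch, assuming each recorded view carries a title identifier and an issue identifier (both hypothetical here), is to keep two tallies over the same data:

```python
from collections import Counter

# Hypothetical view events: (serial title ID, issue ID).
VIEWS = [
    ("UF0001", "v1"),
    ("UF0001", "v1"),
    ("UF0001", "v2"),
    ("UF0002", "v1"),
]


def tabulate(views):
    """Return usage per single issue and per serial title as a whole."""
    per_issue = Counter(views)
    per_title = Counter(title for title, _issue in views)
    return per_issue, per_title
```

Calling `per_title.most_common(n)` then yields the n most popular titles for the period covered by the input.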

6. The system keeps a count of the number of times each bibliographic item is
rendered, and can display this with the metadata for the item in the public interface.

7. The system can automatically send monthly(?) reports to authors regarding their
usage statistics.

I. Budget

1. The DL System has clear cost figures for the existing system and enhancements.

2. When evaluating the DL System, cost considerations should include:
o licensing cost
o cost per record
o costs of additional software for the DL System host
o costs of additional software/tools for each of the libraries
o costs of customized programming to accommodate the libraries' needs (staffing
costs, with timelines available for review that detail implementation plans)
o costs of hardware and/or support for hosting the DL System (server space, other
equipment, staff)
o costs related to near-future migration from the software (dependent on defined
development path for any selected software, and planned support)