
Born Digital Archives at UF: A Sabbatical Research Report and Recommendations

University of Florida Institutional Repository
Permanent Link: http://ufdc.ufl.edu/IR00003159/00001

Material Information

Title: Born Digital Archives at UF: A Sabbatical Research Report and Recommendations
Physical Description: Report
Creator: Nemmers, John

Notes

Abstract: This report summarizes research conducted while on sabbatical from August 2012 to December 2012. During that period, I researched methods to appraise, preserve, and provide access to born-digital archival materials. The results of this research should provide the groundwork for the creation of policies and procedures and the development of a digital archives program in the Smathers Libraries and UF. The report also includes a separate document containing specific recommendations about policies, procedures, technology, etc.
Acquisition: Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by John Nemmers.

Record Information

Source Institution: University of Florida Institutional Repository
Holding Location: UF
Rights Management: All rights reserved by the source institution.
System ID: IR00003159:00001

Full Text

Recommendations & Comments
John Nemmers Sabbatical Research

The recommendations and other content below were produced initially by sabbatical research completed by John Nemmers in Fall 2012. The sabbatical report, available at http://ufdc.ufl.edu/l/IR00003159/00001, includes general conclusions/recommendations, but specific recommendations are available in this document. In addition, these recommendations are available on a wiki for discussion and collaborative editing by UF staff.

Table of Contents

- Recommendations
- General Comments Regarding Manuscript Collections
- General Comments Regarding University Archives Records
- Accessioning and Processing Workflow

Recommendations

- Create a Born Digital Task Group
- Seek Education, Training and Networking Opportunities
- Create/revise Legal Agreement Forms for donors and UF units
- Make Donors and UF Units participants in the accessioning process
- Create Media Storage Boxes and house all media together
- Create Guidelines re: Creating Disk Images or Transferring Files
- Conduct a Digital Materials Survey before accepting/acquiring digital accessions
- Create a Digital Media Tracking Spreadsheet
- Track all activities and decisions by Documenting Everything
- Use Accession Records in Archivists Toolkit to document digital holdings
- Assign Unique Identification Barcodes to all physical media
- Create a Consistent Approach to File Management for digital information packages
- To ensure identification and capture label information, Photograph Media
- For some types of storage media, use Write Protection mechanisms
- Request IT to create a Quarantine Workstation
- Request IT to create a Trustworthy Archives Storage location on servers
- Develop a Digital Archives "Toolkit" of various software applications
- Ask IT about the possibility of collecting Obsolete and Near Obsolete Hardware
- Implement procedures for Capturing Digital Content from Media
- Implement procedures for Capturing Online Digital Content
- Implement procedures for Preserving Email
- Inventory and establish control over Extant Born Digital Holdings (follow-up project: Complete Accessioning and Processing Extant Digital Holdings)
- Survey UF Records to identify and plan for transfer to Archives

Many of these recommendations involve the creation of new or revision of existing Policies, Guidelines and Forms.

Recommendation: Create a Task Group

Create a task group to discuss/implement the recommendations in this report and to develop new recommendations. This group can be charged with creating the various Policies, Guidelines and Forms needed in a digital archives program. Initially, this group can consist of members of the Archives & Manuscripts unit (e.g., John Nemmers, Peggy McBride, Dennis Kozak, and Cathy Martyniak), but to be successful it also should include personnel from IT, the Digital Library Center, and possibly Libraries administration. This group also should initiate discussions with UF offices and units. Optimally, members of this group such as Peggy McBride and Dennis Kozak should participate on UF-wide committees/groups and serve as liaisons between the Libraries and the rest of UF.

The task group should assess current and needed resources, such as:

- Personnel. Who will contribute to the digital archives program (include archives, IT, etc.)? How much FTE will the program require? Will personnel need training/development?
- Software, hardware, storage media/space, online services, etc. How much will it cost to purchase/implement/maintain hardware/tools/services?
- Administrative support (Libraries, UF). What are the costs associated with advocacy and promoting the program?
- Funds. What existing funds are available? What internal/external grants are available?

At a minimum, the members of this task group should read/view the following resources:

- [...]naging Born Digital [...] Heritage 12:1 (March 2011). http://rbm.acrl.org/content/12/1/11.full.pdf+html or https://scholarsphere.psu.edu/downloads/c534fn93r
- [...] Digital [...] http://www.oclc.org/content/dam/research/publications/library/2012/2012 06.pdf
- [...] Colleges and Universities." http://www.nyshrab.org/training/erecords/

Recommendation: Seek Education, Training and Networking Opportunities

All archivists should seek education/training/networking opportunities relating to digital archives and digital preservation.

There are numerous reports, articles, webinars/tutorials, project sites, etc., that point to the need for education and training regarding born-digital archival materials. Two recent surveys highlight the importance of education/training:

- Dooley, Jackie M., and Katherine Luce. 2010. Taking Our Pulse: The OCLC Research Survey of Special Collections and Archives. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2010/2010 11.pdf. Survey respondents identified management of born-digital materials as the top area in which education and training are needed.
- Nelson, Naomi, et al. Managing Born Digital Special Collections and Archival Materials. SPEC Kit #329. Association of Research Libraries, 2012. This survey produced similar responses, indicating that born-digital training is a priority among ARL institutions.

Ideally, non-archivist personnel from IT, the DLC, etc., also will receive the same education/training so that everyone shares the same information.

The Society of American Archivists offers a Digital Archives Specialist (http://www2.archivists.org/prof education/das) curriculum and certificate program, which includes several workshops that would be valuable as we develop our digital archives program. It is advisable that at least one archivist complete SAA certification. We must seek administrative funding to attend workshops and meetings. Of course, there are several less expensive alternatives, including the New York State Historical [...] (http://www.nyshrab.org/training/erecords/) and the Digital Preservation Management tutorial hosted by M.I.T. (http://www.dpworkshop.org/).

Recommendation: Revise the deed of gift and transfer agreement forms so that they cover born digital materials.

Alternatively, we could create a separate digital materials agreement/submission form that can be appended to the deed of gift or transfer agreement.

The agreement document should cover: restrictions, copyright, sensitive information, disposal of unwanted data (e.g., duplicates or files infected with viruses), and methods for providing access to researchers (e.g., in the Reading Room only, online, or limited to specific people or IP addresses or via the VPN).

The archivist and the donor/UF unit should agree whether the data will be captured using disk images or file transfer. This should be stipulated in the donor agreement or records transfer agreement. A disk image is a bit-stream copy of the storage medium. The donor/UF unit should be aware that disk images also preserve deleted files that are recoverable. File transfers or logical disk images involve the copying of selected files/directories, so deleted files are not copied and unwanted files can be skipped. See the recommendation re: Creating Disk Images or Transferring Files.

- Tufts has developed submission agreement models: http://sites.tufts.edu/dca/about us/research initiatives/taper tufts accessioning program for electronic records/deliverables/
- Chris Prom has developed a template: http://e records.chrisprom.com/recommendations/develop submissioningest policies/submission agreement form/

Recommendation: Whenever possible, ask donors and UF units to submit metadata along with their digital materials.

Preferably, this metadata should be submitted in Excel, a database or a delimited file format. We can develop a standard metadata form for their use. If they are unable/unwilling to submit metadata, we should ask them to provide as much information as they can. In particular, we should ask them to:

- Explain their file naming and file management practices (ideally, we would play a role in establishing those practices in the first place)
- Identify and flag directories/files that contain or are likely to contain sensitive information
- Provide contextual information for important files or groups of files (e.g., were there shared creators, why were the files created, etc.)
- Identify which files also exist in paper version (i.e., they were printed)

We need to gather information from creators/donors about the types of digital materials they have and the types of software/hardware used to create the digital materials. This information will help dictate how the materials will be accessioned, arranged and described.

Recommendation: Create physical media storage boxes.

All digital media will be stored together in a single location rather than being stored with their analog parent collections. In other words, all digital media (disks, CDs, etc.) from all collections will be stored together in one set of media storage boxes. Media will be removed from analog collections using separation sheets. Original locations will be documented using the Digital Media Tracking Spreadsheet.

Topic for future discussion: Do physical media need to be retained indefinitely? Certainly, it makes sense to retain the media during processing and for a period of time after processing to ensure successful preservation, but can we consider discarding the physical media once the content and contextual information have been preserved?

Recommendation: Create guidelines to help archivists decide whether to create disk images or simply transfer the files.

In most cases creating a disk image is the preferred method for accessioning data. A disk image is a bit-stream or bit-by-bit copy of everything on the storage medium. One drawback to creating disk images is that they include everything on the original medium, including deleted and possibly unwanted data, and any unused space. If there are viruses on the storage media, then those viruses would be included in the disk image as well. The viruses are basically harmless while captured in a disk image, but it is possible that they could be activated if accidentally extracted from the image at a later date.

With file transfers, archivists use a tool like Windows Explorer to simply copy the directories and files from the original medium to a quarantine computer. One drawback to simple copying is that it may result in the loss of metadata associated with the files. One advantage is that the archivist can select only those directories/files that will be retained, and the process will not capture unwanted files or deleted files.

It is advisable to complete virus scans and remove all viruses before accessioning the data either via disk imaging or file transfers. The virus removal process may result in the deletion of infected files, so be aware that information may be lost along with the viruses.

There are occasions where we may have to create a disk image and preserve the content exactly as it is, even if there are viruses, sensitive info, duplicate files, empty space, and deleted/unwanted data. In other words, we would preserve everything, the good and the bad. For example, certain types of media such as read-only CDs require the creation of disk images because virus removal is not possi[...] of the image can be preserved and used to make access derivatives.

Creating disk images is slightly different for each type of storage medium.

Recommendation: When appraising possible donations or transfers, conduct a survey of any digital materials included.

Ideally, this should be done prior to accepting/transferring any digital materials, but the survey can be conducted upon receipt of digital materials. The survey should be completed with input from the donor or UF unit so that we can gather information about content/context and identify potential issues (e.g., obsolete technology, sensitive information, unwanted content, etc.). A survey conducted before accepting any transfers or donations of digital materials allows us to determine if we have the resources necessary to properly store and manage the data.

We should create a standard survey form. There are numerous survey forms available for use as models; a few are listed below:

- AIMS Digital Material Survey: http://www2.lib.virginia.edu/aims/whitepaper/AIMS_final_appF.pdf

- Stanford University Libraries Digital Record Survey: https://docs.google.com/document/d/1eQUD56vGQpZ _C_Y viK_ASRFWojB0gxW Wqo79QSDE/edit?pli=1

Recommendation: Create a digital media tracking spreadsheet.

This spreadsheet will include fields such as accession number, container number, medium ID/barcode number, medium type, manufacturer, maximum medium storage size, folder title if applicable, label information (dates, file formats, etc.), PC/MAC/unknown, etc. The spreadsheet will also include workflow fields such as virus scan completed, virus actions taken, imaging completed, imaging notes, sensitive info notes, and current storage location (i.e., quarantine workstation, archival storage, digital repository storage).

There are models available, including: https://sites.google.com/site/workflowdocumentation/2 accessioning/2 1 establish physical control

Recommendation: Document all decisions and actions.

This one is simple and doesn't require much discussion. All experts agree that it is critical that every decision/action be documented. We can use AT for some of this documentation, but we can also simply create readme text files explaining our decisions/actions and save these files along with the digital objects/metadata.

Recommendation: Use Archivists Toolkit or ArchivesSpace to record accession information about digital holdings.

- Update the accession record for the analog collection to include the total number of physical media (e.g., hard drives, floppy disks, CDs, laptops, etc.).
- Create a separate accession record for all born-digital files acquired in an accession. The new accession record should link to the Digital Media Tracking Spreadsheet and to the resource record for the collection in AT.

Example: if the legacy Smith Collection includes 3 CDs, then we would first update the Smith Collection accession record to indicate the presence of the 3 CDs. Next, we would create a new accession record for those CDs that would include the number of files and total size.

Example: if a newly acquired Jones Collection consists entirely of digital content (i.e., there are no analog materials), there would only be one accession record in AT.

Recommendation: Assign barcodes with unique identification numbers to all physical media.

These barcode numbers will be used to track and manage the digital data. The application of pre-printed barcode labels and the use of barcode scanners will save considerable time during processing. Whenever possible the barcodes will be applied directly to media. All media such as CD/DVDs that are not in cases should be housed in archival sleeves, and the barcode should be placed on the sleeve.

Recommendation: Create a consistent approach to file management for all digital materials ingested into the archives system.

Following the OAIS model, there are two types of packages we're concerned with: the Submission Information Package (SIP) and the Archival Information Package (AIP). The SIP is transferred from the creators/donors and assembled/generated during accessioning. This package includes the digital object plus all associated metadata. The AIP is the version that is stored and preserved. It includes everything in the SIP, plus derivatives/copies created during processing as well as any metadata that may have been enriched during processing.

A typical SIP might include the following folders/subfolders:

- top folder = Accession number/Barcode number
  o subfolder 1 = original files, write-protected
  o subfolder 2 = copies/derivatives to be used during processing
  o subfolder 3 = file inventory including technical/descriptive metadata
  o subfolder 4 = photos of media, if captured from media

After processing, the AIP would include all of the above plus post-processing objects and any new/enriched metadata.

Recommendation: Photograph all storage media.

Insert print-outs of the photos with separation sheets in the original locations in analog collections. Images of media can also be saved in the project directories on networked storage.

Recommendation: For certain types of physical media (e.g., floppy disks) use write-protect tabs to preserve the digital content and prevent accidental writing to the media. Also, we should acquire and use write blocker hardware.

Some types of floppy disks include write protection tabs that can be used to ensure that no new data is written on the media. In addition, our Quarantine Workstation should include a write blocker as part of its hardware configuration (e.g., Tableau Forensic USB Bridge). The write blocker turns a read-write drive into a read-only drive so that no digital content can be changed/added/deleted.

Recommendation: Request that the IT department set up a dedicated workstation for digital archives accessioning/processing.

- This workstation will be quarantined (i.e., it will not be networked) to prevent the spread of viruses, but the workstation has to be up to date with antivirus software.
- This workstation should include a write blocker as part of its hardware configuration. The write blocker turns a read-write drive into a read-only drive so that no digital content can be changed/added/deleted.
- This workstation should include an external drive for regular backups (particularly useful when accessioning numerous files over several days).

Recommendation: Request that the IT department set up a dedicated space on our servers for trustworthy storage of digital archives holdings.

- We must implement some storage solution, even if it is simply using media storage and later getting server storage.
- We need two separate storage solutions: one for archival masters (server or external hard drives) and one for access (UFDC). Note that if we use physical media storage rather than servers, we have to have backup drives.
- Confirm that this storage location will only be accessible by archivists and trusted IT personnel.
- Confirm that this storage location will be backed up regularly, including off-site copies.
- We have to stress to IT and Admin when making this request that storage is a perpetual cost and not a one-time expenditure. Born-digital materials must be preserved forever.

Recommendation: Create a digital archives "Toolkit" of various software applications.

This toolkit should include Archivematica, FTK Imager, and WinDirStat.

- Archivematica is intended as a full-service digital preservation system that handles a variety of tasks including preparing digital objects for ingest, ingesting them into storage, and providing access to the archived material. There really is no single tool that covers all aspects of the e-records lifecycle, but Archivematica comes a lot closer than other tools available currently and it should satisfy most of our needs.
- FTK Imager is used to create disk images (free). The full version of Forensic Toolkit (purchase) can be used to create forensic metadata.
- To search for sensitive information, Identity Finder is a good tool. This program also can be used to [...] recoverable).
- WinDirStat is an excellent tool for appraisal, selection, etc. It audits the directory structure of a drive/device and provides a visual display of all files on that drive/device.
- There are many tools and services available for website archiving, but many digital archivists recommend Archive It because it is very customizable and easy to use. It is probably less expensive than using free software, which takes a lot of labor to set up/manage.
- Quick View Plus, Treesize Pro, and Disk Analyzer can be used to review files without modifying anything. These and other tools such as Fast Duplicate File Finder and Clone Spy are useful for identifying duplicate files.
- If transferring files manually (as opposed to creating disk images), Karen's Directory Printer can be used to document the files/directories by capturing metadata such as checksums, file names, formats, creation dates, file sizes, etc. This metadata is generated as a tab-delimited file that can be imported into a spreadsheet.
- FITS can identify thousands of file formats, extract technical metadata for many of these file formats, and even validate a small number of the file formats.
- Tools such as Aid4Mail, Thunderbird, Mailstore Home and Offline IMAP can be used to capture/view email.

Note that some of the emerging digital archives tools such as Archivematica and Curator's Workbench bundle some of the tools above (or similar tools) in their applications. For example, Archivematica bundles FITS.

Recommendation: Investigate with the IT department the acquisition/preservation of selected obsolete and near obsolete hardware.

[...]p drives, microcard readers, etc., as well as the necessary connectors/drivers that enable these to connect to the Quarantine Workstation.

There is no need to attempt to collect/preserve unusual hardware (e.g., drives capable of reading 8" floppies). Simply maintaining 5.25" and 3.5" drives would address most of our needs. We should ask IT to gather retired drives from UF surplus. Some archivists report that their IT departments are unwilling to maintain obsolete or near obsolete technology, so we may investigate the possibility of using hardware maintained at other repositories or through vendors.

Recommendation: Follow these basic procedures for capturing digital content from media.

Note that some of these steps will be completed manually and some will be completed automatically using Archivematica.

- Apply barcodes to media (record in media tracking spreadsheet).
- On the dedicated digital archives Quarantine Workstation, create a project directory for receiving the digital content. This directory will include a subdirectory containing the actual data copied from the media, as well as any files documenting the capture process.
- Take photos of media and save photos in project directory.
- Ensure write protection by using a write blocker when possible and/or ensure that write protection tabs are used on physical media.
- Insert/connect the physical storage medium to the dedicated digital archives processing computer (e.g., insert a floppy disk into a floppy drive, attach a USB drive to a USB port, connect an external hard drive, etc.). Do not open any files.
- Run virus scan software to identify and remove harmful content.
- Copy data from the physical medium to the data subdirectory on the processing computer by creating a disk image (i.e., a bit-stream copy) of the storage medium or by simply copying the directories and files from the original medium to the data subdirectory on the processing computer. One drawback to simple copying is that it may result in the loss of metadata associated with the files. One drawback to creating disk images is that they include everything on the original medium, [...] advisable to also document all directory information including file names, sizes and dates. For more info see the recommendation re: Creating Disk Images or Transferring Files.
- Generate a checksum on the disk image, or generate checksums for each file if you copied the files individually rather than creating a disk image. A checksum is an effectively unique signature or key computed from a file's contents using an algorithm; identical files produce identical checksums. Commonly used checksums are MD5 and SHA-1.
- Search for and identify sensitive information to restrict or redact. The actual restricting/redaction may be completed later during processing. Be aware that text searches will not discover sensitive information present in images (e.g., scanned documents stored in TIFF format) or PDFs without recognized text.
- Identify deleted data, duplicates and other unwanted files to remove. The removal of these files may be completed later during processing. Search for duplicate files by comparing checksums and other characteristics.
- Create access derivatives and move access copies to access repository. This step is optional at this point; it may be completed later during processing.
- Document all decisions/steps by creating a text file and saving it in the project directory.
- Return the physical media to the digital archives storage location. Note: we may decide to discard the media at some point in the future.
- Create/revise accession records in Archivists Toolkit.
- Copy the complete project directory (which contains the data subdirectory) to archival storage.

- Document access restrictions and determine additional preservation activities needed.

Recommendation: Create and implement procedures for capturing online digital content such as websites.

Recommendation: Create and implement procedures for preserving email.

Here's a good workflow we might adapt: https://docs.google.com/presentation/d/12ZxXQIczy7SDViWL1xj3pz8h5xiQQtMIuMYKg4gcsLk/embed?hl=en&size=s#slide=id.p13

Recommendation: Conduct an inventory and establish physical control over extant born digital archival holdings.

This would be a good mini-grant project for the Task Group to pursue.

- Conduct an inventory of all born-digital materials already part of our archival holdings. Many of these materials can be identified by searching our archival descriptive information for terms such as: electronic, CD, DVD, disk, diskette, floppy, disc, USB, drive, Zip, Flash, computer, cartridge, data, digital, etc.
- Physically count and describe all media using the Digital Media Tracking Spreadsheet (including type/manufacturer of each medium, maximum storage size of each medium, etc.). There is no need to open and view the digital files at this point.
- To estimate the number of bytes needed to store all extant digital holdings, multiply the number of physical items by the maximum storage size for each item. This yields an upper bound on the storage needed. For example, if we have 300 CDs with a maximum storage capacity of 700 MB each, the total storage needed would be 210 GB.
- Make note in the media tracking spreadsheet of information such as dates and file formats if indicated on media labels. This information can be used to determine if digital content will require particular software/operating systems.
- Assign Unique Identification Barcodes to all physical media.
- Physically separate all digital media from analog holdings and transfer to Media Storage Boxes.
- Use Write Protection devices on media when possible.
- Photograph Media and insert print-outs of the photos with separation sheets in the original locations.
- Update existing Accession Records in Archivists Toolkit to indicate the number of digital media found in each collection.
- Either as part of this project or as a follow-up project (possibly with external funding), Complete Accessioning and Processing Extant Digital Holdings.

Recommendation: Undertake a project to complete accessioning and processing of extant born digital holdings.

The project also should produce an access/preservation plan. Note that this can either be completed as part of the proposed Extant Born Digital Holdings project or as a follow-up project (possibly with external funding).

- Using Archivematica, capture descriptive/technical information such as label information, type of medium, creator, dates, and information about the software/hardware used to create/use content.
- Using Archivematica, transfer digital content from physical media to digital storage space (i.e., ingest digital content).
- Formulate a plan for providing access to and ensuring preservation of digital content.

Recommendation: Conduct a survey of UF Records to identify archival digital records and plan for transfer to the University Archives.

We should consider a pilot project involving selected UF records creators. After the program demonstrates successful capacity/services we can broaden the scope university-wide.

Policies, Guidelines and Forms

The policies, guidelines and forms below are required to ensure an effective and efficient digital archives program.

Policies:

- Collection Policies (revise existing/create new)
- Born Digital Materials Policy covering preferred file formats, preservation actions, access, etc. (e.g., see http://e records.chrisprom.com/recommendations/develop submissioningest policies/electronic records deposit policy/ and http://e records.chrisprom.com/recommendations/electronic records program statement template/)

Guidelines/Procedural Documents:

- Accession Workflow
- Transferring Digital Materials (via Electronic Transfer or Media) (e.g., http://e records.chrisprom.com/recommendations/develop submissioningest policies/transfer guidelines/)
- Capturing Digital Records (see recommendation re: Creating Disk Images or Transferring Files) (e.g., these Hull idiot guides are very good: http://www.hullhistorycentre.org.uk/discover/pdf/Idiot%27s%20Guide%203%20 %20FTK%20Imager.pdf)
- Housing, Retention and Disposal of Storage Media
- Access to Restricted Materials
- Preservation and Access Guidelines (e.g., http://e records.chrisprom.com/recommendations/supported formats/)
- Preserving Email
- Capturing Online Content

Forms:

- Digital Media Tracking Spreadsheet (see recommendation re: Digital Media Tracking Spreadsheet)
- Digital Materials Survey Form (see recommendation re: Digital Materials Survey)
- Deed of Gift and UF Records Transfer/Submission Agreement (see recommendation re: Legal Agreement Forms)

- Checklist (can be part of Tracking Spreadsheet): virus check, imaging/transfer of files to quarantine, create preservation metadata, etc.

General Comments Regarding Manuscript Collections

Thoughts/Points for Discussion

- We have to specify in our collecting policy and guidelines which formats and media we are willing to accept, as well as the acceptable methods for transferring digital materials. We also have to specify the formats we are willing to provide to users. One logical option would be to only provide access to normalized derivative files (e.g., a PDF/A instead of the original Wordstar file).
- See recommendation re: a project to inventory and establish control over Extant Born Digital Holdings (follow-up project: Complete Accessioning and Processing Extant Digital Holdings).
- See recommendation re: conducting a Digital Materials Survey before accepting/acquiring digital accessions from donors.

General Comments Regarding University Archives Records

Thoughts/Points for Further Discussion

- A major goal should be to provide guidance re: file naming and file organization practices to UF records creators. We cannot be passive/reactive about e-records.
- We have to specify in our collecting policy and guidelines which formats and media we are willing to accept, as well as the acceptable methods for transferring digital materials. We also have to specify the formats we are willing to provide to users. One logical option would be to only provide access to normalized derivative files (e.g., a PDF/A instead of the original Wordstar file).
- See recommendation re: conducting a Digital Materials Survey before accepting/acquiring digital transfers from UF units.
- See recommendation re: a project to Survey UF Records to identify and plan for transfer to Archives.

Accessioning and Processing Workflow

This is a proposed workflow for appraising, accessioning and processing born-digital materials. This workflow can be expanded considerably, and ideally will exist as a flow chart that includes all [...]

Appraisal

1. Conduct survey
2. Determine resources
3. Work with creator/donor to generate descriptive metadata, identify sensitive information, etc.
4. Complete deed of gift/transfer agreement

Acquisition and Accessioning

1. Acquire the materials (by accepting media from the creator, by copying files from the creator's computer, by transferring files via FTP or online service, or by using web capture applications).
2. Establish physical control (apply barcodes, separate from papers, fill in tracking spreadsheet, capture label info, photograph media, etc.)
3. Virus check on quarantine workstation

PAGE 12

12 4. Create disk image or transfer files 5. Generate checksums 6. Capture technical metadata 7. Gather descriptive metadata (ideally, creator is supplying most of this) 8. Survey file formats and assess preservation needs 9. Check for sensitive information 10. Identify arrangement/order and create a file inventory (includes checksums, date last modified, date created, folder structure) 11. Create copies/derivatives to be used during processing 12. Transfer captured data and metadata to secure networked storage 13. Update/create accession record in AT Processi ng 1. Create descriptive metadata 2. Identify and flag restricted files 3. Identify sensitive information (restrict or redact) 4. Create rights metadata 5. Create access derivatives 6. Update EAD 7. Transfer access derivatives to UFDC or other access point
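Steps 5 and 10 of the Acquisition and Accessioning workflow above (generate checksums; create a file inventory) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and CSV column names are hypothetical, SHA-256 is an assumed choice of checksum algorithm (the workflow does not prescribe one), and the script is meant to be run against a working copy of transferred files, never against original media. "Date created" is omitted because it is not recorded consistently across file systems.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Walk `root` and return one record per file, sorted by relative path."""
    records = []
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(folder, name)
            stat = os.stat(full_path)
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            records.append({
                # The relative path preserves the original folder structure.
                "relative_path": os.path.relpath(full_path, root),
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": stat.st_size,
                "last_modified_utc": modified.isoformat(),
                "sha256": sha256_of(full_path),
            })
    return sorted(records, key=lambda record: record["relative_path"])


def write_inventory(records, csv_path):
    """Write inventory records to a CSV file (hypothetical column layout)."""
    fields = ["relative_path", "extension", "size_bytes",
              "last_modified_utc", "sha256"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

Calling build_inventory on the transfer folder and passing the result to write_inventory yields a spreadsheet-readable list that can be re-generated later and compared against the stored checksums to confirm that nothing has changed.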




Research Report
Fall 2012 Sabbatical

John R. Nemmers
Descriptive & Technical Services Archivist (Associate University Librarian)
University of Florida George A. Smathers Libraries
May 28, 2013

This report summarizes research conducted while on sabbatical from August 2012 to December 2012. During that period, I researched methods to appraise, preserve, and provide access to born-digital archival materials. The results of this research should provide the groundwork for the creation of policies and procedures and the development of a digital archives program in the Smathers Libraries and UF.

The Libraries have been managing born-digital archival materials at UF for many years. Almost all of the extant born-digital materials have been acquired as part of analog archival collections managed by the Archives & Manuscript unit. These archival collections include personal papers of individuals and families, the records of corporations/organizations, and the official records of UF. The born-digital holdings include hundreds of file formats on multiple types of physical media, including floppy disks, hard drives, USB memory devices, magnetic tape, and computers. However, there are almost no established policies and procedures for these born-digital materials, and no plans for preserving and providing access to much of the data that is in use today (e.g., databases, websites, social media, complex data sets, email).

The principal goal of our digital archives program should be to ensure preservation and future readability of born-digital materials through an authoritative and trustworthy process. It's not enough just to ensure preservation of digital objects; we need to ensure their use and access within the proper context (e.g., how they were created/maintained by the creators, what software/hardware were used, what relationships exist between files, etc.).
A major challenge is the technology itself and technology obsolescence, but other challenges include rights management, security, privacy, authority, and resources (staff, funds, equipment).

It would be impossible to establish a fully formed, fully supported, properly funded program right out of the starting gate, so my goal was rather to make a series of recommendations that hopefully will lead to a full program. Some recommendations are near-future: those actions that can be implemented right away, usually with minimal resources. Some will be long-term, primarily because these goals will require the participation and support of administrators, IT experts, and other key people and units across campus.

Literature Review

Prior to the start of my research program, I began conducting a thorough review of the literature. Rather than create a new bibliography, I used the excellent, numerous bibliographies that already exist. I did not rely solely on these extant bibliographies and I did conduct my own searches for resources. Appendix A includes a list of bibliographies used and selected resources that I found to be particularly useful.

Research Methodology

My research concentrated on the tools, workflows, procedures and practices used to appraise, accession and process born-digital archival materials. In my original research proposal I had stated the intention to examine all aspects of digital archives programs (e.g., preservation, arrangement, description and access). However, I quickly determined that it was unwise at this point to try to tackle everything at once. Since proper appraisal and accessioning are the foundation for all other archival activities, I decided to concentrate on these areas principally.
And although I did investigate the basic theories behind these and other digital issues (including digital preservation and digital curation), I was much more interested in learning about the practical steps that archivists take to accession and process their born-digital holdings.

I began with several questions:
- What resources do we need to properly manage born-digital materials?
- Which tools are most useful here at UF?


- What tools can be implemented with minimal resources?
- How should we track and store digital media such as disks? S captured the digital content/metadata?
- What workflows must we develop to appraise and accession born-digital materials found in analog collections? Collections that consist entirely of born-digital materials?
- What policies and procedures can we implement within the Smathers Libraries? Within UF? Statewide?

A major component of my research methodology was the installation, use and evaluation of selected software tools available for digital archives programs. I had previous experience with a few of the digital archives tools available and I was aware of others that I wanted to evaluate, but the literature review revealed numerous applications that had been developed in the past few years. I had no intention of evaluating all of the tools currently available because there are simply too many, and I began the research program with a good idea of which tools might be most useful here at UF given our needs and resources. I did review evaluations of numerous software tools completed by colleagues Practical E Records report.

My research program also included discussions with colleagues around the country who have implemented digital archives programs or who routinely use digital archives tools. With support from the Smathers Libraries I was able to travel to North Carolina in October 2012 to conduct interviews with archivists at Duke University and the University of North Carolina Chapel Hill. I selected those institutions because they have created some of the software tools that are used regularly in digital archives and their personnel are leaders in this field. It was an invaluable learning experience to be able to witness their processing workflows and practices firsthand. My notes from these visits are included as Appendix C.
In addition to the North Carolina visits, I informally consulted colleagues to gather information concerning digital archives activities, tools, and best practices. I spoke with archivists and librarians at the University of Illinois at Urbana Champaign, the University of Miami, North Carolina State University, and M.I.T.

Finally, I participated in webinars and training sessions on:
- Long Term Storage and Digital Curation
- FERPA, HIPAA and Privacy at the University of Florida
- PDF/A (archival file format)
- Digital Preservation Management
- Preserving Electronic Records in Colleges and Universities

Following the sabbatical period, I have continued to seek out training opportunities and I have attended a series of by the Society of American Archivists.

Digital Archives Tools

During the second half of the sabbatical I tested and reviewed selected software applications used in the preservation and management of electronic records. Over the last few years there has been a dramatic increase in the number of tools available for use in digital archives programs. Many of these tools are fairly immature and developing quickly. Not intending to comprehensively test and review the dozens of tools available, I instead selected a small number of tools to assess. I selected tools based on several criteria including functionality, cost, sustainability, adoption rates, standards compliance, user friendliness, coverage in the professional literature, and recommendations made by colleagues. For example, I was far more interested in selecting tools that would not require advanced computing skills, knowing that our primary users would be archivists rather than IT experts.


Similarly, I had no desire to test 4-5 individual tools if those tools were bundled together in a sixth tool that I could review. Based on these criteria, I quickly created a small list of tools that I evaluated during the sabbatical. It should be noted that some of the tools I selected for evaluation were not evaluated individually because they f DROID, the National Library of New Zealand Metadata Extractor, JHOVE, BagIt and multiple checksum generators.

Archivematica
https://www.archivematica.org/wiki/Main_Page

Developed by Artefactual Systems with funding/support from multiple entities including UNESCO, City of Vancouver, Yale. Free; open source. Archivematica is intended as a full-service digital preservation system that handles a variety of tasks including preparing digital objects for ingest, ingesting them into storage, and providing access to the archived material. Primarily uses normalization for digital preservation. Architecture is based on a suite of micro-services (e.g., a bundle of tools) which are managed using a web-based dashboard. Operates on Linux systems, and requires a virtual machine to run in Windows or Mac. The architecture and workflows are based on the OAIS standard, and it also uses METS, PREMIS and DC for metadata. There is a fairly large community of adopters/others discussing Archivematica. Documentation is provided, and appears to be very thorough.

The web-based dashboard is fairly easy to use, but certainly having advanced computer skills is helpful. It was not easy to install, and I had to seek out discussions and tutorial videos before getting the system installed properly. Once installed, though, I was able to use the system with little difficulty.
One of the most attractive aspects of Archivematica is that it bundles a variety of tools needed for digital archives work, including FITS (file identification), BagIt (packaging digital objects and metadata for storage), ClamAV (anti-virus), ICA-AtoM (description and access), MD5 (checksum generator), and several file normalization tools. There really is no single tool that covers all aspects of the e-records lifecycle, but Archivematica comes a lot closer than other tools available currently. This tool should satisfy most needs of any digital archives program and I recommend that we adopt it at UF.

Curator's Workbench
http://www.lib.unc.edu/blogs/cdr/index.php/tag/curators workbench/

Developed by UNC Chapel Hill for the Carolina Digital Repository. Free; open source. Excellent tool for automating accessioning/processing/ingest activities and preparing digital materials for submission to a digital archives repository. Captures files, generates manifests and checksums, normalizes metadata, and can be used to arrange objects/folders. Very useful for batch processing. Produces a submission package ready for ingest into the repository. Operates on Windows, Mac, and Linux systems.

The interface is very intuitive and easy to use. One of the most useful features is the Crosswalk tool, which allows the user to map creator-supplied metadata fields to MODS elements. For example, if a creator produces a spreadsheet containing metadata about the digital files, the Crosswalk tool can be used to identify which fields in the spreadsheet map to which MODS elements. This gives the archivist a lot of control over the final descriptive metadata included in the submission package. All activities undertaken in the Workbench are documented in a METS manifest file which tracks all digital objects, copies, and metadata. I was able to visit UNC Chapel Hill and talk with the archivists from this visit are included in Appendix C).
This tool was created primarily for use by UNC so it was designed with their workflows in mind (e.g., preparing submission packages for Fedora) but is easily adapted by othe to the software). Along with Archivematica this tool seems to be one of the best available at fulfilling multiple functions.

Duke Data Accessioner
http://library.duke.edu/uarchives/about/tools/data accessioner.html

Developed by Duke University (Seth Shaw). Free; open source. Extremely user friendly. Includes a simple interface. Allows users to migrate data from media to new storage (server, quarantine computer, etc.). It generates checksums and documents the transfer process. Allows for the capture of metadata (e.g., information found on labels). Requires very little computer expertise or IT support. Allows for JHOVE and DROID plugins for file identification.

I had used this tool previously when accessioning disks that were acquired as part of an analog collection. The tool is very easy to use and I did not encounter any problems. The tool is useful if a more advanced alternative like Curator's Workbench or Archivematica is not available, but it does have limited functionality. During my visit to Duke, Seth Shaw told me that he no longer uses it in his own workflow. If UF adopts Archivematica then there is no need to use this tool.

FITS/JHOVE
http://hul.harvard.edu/ois/digpres/tools.html

Developed by Harvard University. FITS bundles various tools including JHOVE (another Harvard tool), DROID, Exiftool, the National Library of New Zealand Metadata Extractor, FFIdent, and Windows File Utility. FITS runs these tools to identify and validate file formats and to capture technical metadata for files. The output produced by the multiple tools is normalized and consolidated. FITS identifies errors and conflicting results and produces its results in XML. Operates on Windows and Unix systems (command line).

FITS was designed to be incorporated into larger applications/workflows (e.g., Archivematica), so the lack of a graphic interface is intentional. If UF adopts Archivematica then there is no need to use this tool separately.

FTK Imager
http://www.accessdata.com

Developed by AccessData.
Free. (Note: AccessData also sells the much more powerful FTK, which has a lot more forensics firepower.) Operates in Windows. FTK Imager is used to create disk images of hard drives and media such as USB devices and to transfer and preview the content of these media using a write blocker. FTK Imager can create either 1) a forensic image, which is a complete replication of the structure and contents of a storage device, including deleted files and unused space, or 2) a logical image, which does not include deleted files or unused space. Unless required by a creator/donor, it is probably best to create logical images so that we can simply capture the files/folders as they appear when using the drives/media.

This is an excellent tool for creating disk images, but it does need to be used in combination with a write blocker because AccessData does not guarantee that the software does not write to the drive/device. It is extremely easy to use, although it is intended for a tech-savvy audience so it helps to have some advanced computer skills in order to understand all of the options available. It was easy to install and I encountered no issues during testing. I recommend that we adopt it at UF.


WinDirStat
http://sourceforge.net/projects/windirstat/

Free; open source. Excellent tool for appraisal, selection, etc. Audits the directory structure of a drive/device and provides a visual display of all files on that drive/device. Each file type is assigned a color, so it's easy to see at a glance which file formats are most prevalent, which files might pose a preservation problem, etc.

Documentation is a bit weak, but isn't really needed since the tool is so easy to use. This is a very useful tool and I recommend that we adopt it at UF.

Conclusions and Recommendations

One principal conclusion is that we should implement our digital archives program incrementally, using low-cost tools and methods in early stages and slowly building the hardware, software, expertise and other necessary resources over time. We should concentrate in the early stages on building the support and infrastructure, including policies, guidelines and workflow documents. It costs little to create policies and guidelines needed to manage a successful digital archives program, and we will need these documents as we develop the program and promote it to administrators, creators, donors and other stakeholders.

One encouraging conclusion that I arrived at fairly quickly in my research is that there are several basic activities that we can accomplish now with minimal resources. We can:
- Capture metadata
- Identify and validate file formats
- Complete virus scans
- Create disk images (e.g., using FTK Imager)
- Generate checksums
- Document all decisions and activities
- Take photos of media
- Initiate discussions with campus units
- Seek out training/education opportunities
- Collect hardware/software
- Create policy/procedural documents

Many of these activities can be accomplished by acquiring free software, and I recommend that UF adopt both Archivematica and FTK Imager.

I concluded early in my research that I wanted to focus primarily on the accessioning/ingest of born-digital materials.
One of the reasons why I opted to focus on accessioning was this realization: although we should be applying the same archival processes to born-digital materials as we do to analog materials (i.e., appraisal, acquisition, arrangement, description, preservation, etc.), we have a drastically smaller window of time in which we can establish intellectual and physical control. When we acquire a new accession of analog materials, we can put them on a shelf for years or even decades before returning to them. Digital materials, however, can become inaccessible within only a few years because of technology obsolescence. Most digital materials that are accessioned properly can remain in the processing queue for years without losing accessibility.

Another conclusion I reached fairly quickly is that appraisal and description are going to be far more important than arrangement for the majority of born-digital materials. With our analog holdings we traditionally devote a lot of time to arranging the materials in a logical way so that researchers can more easily find and use resources. Most digital materials can be easily searched (e.g., searching email or Word documents), so researchers will not need to rely on the contextual arrangement to help them locate/access resources.

There are several preservation options we can apply to born-digital materials (e.g., migration, emulation, printing paper copies, etc.), but normalization is probably the best solution for most files. Normalization is the conversion of files to a format that should be stable/persistent such as PDF/A. The PDF/A format is a preservation file format that is most useful for text-based, static documents. It is not as useful for spreadsheets, databases, or dynamic files such as web pages. The PDF/A format can contain readable/searchable/extractable text, but it preserves authenticity by restricting content modifications. It is worth noting that most digital preservation experts agree that it is impossible to preserve original digital materials indefinitely. All that we can attempt to do is to preserve the ability to reproduce those materials.

Finally, I believe that we can take advantage of tools, services and expertise already available in the Libraries and in the state. We have a significant advantage over many other institutions because we have SobekCM, UFDC, and the Florida Dark Archive/DAITSS. Of course, these tools do not constitute an entire digital archives system. For example, UFDC is primarily an access/discovery system and would only be one portion of a much larger digital archives system that also would include people, policies, other technologies, and activities that occur outside of any preservation repository or delivery tool. In other words, we do not have to invent new preservation and access systems; we can build our digital archives program with significant components already in place.

I have created a working list of specific recommendations. I will post these recommendations on a Born Digital @ UF wiki for ongoing revision and discussion with other participants.

Ongoing/Future Activities

In addition to the recommendations above, there are several activities that I plan to undertake in the near future:
- I will publish this report and disseminate my recommendations online.
- I will promote my recommendations and the digital archives program through talks/meetings with others in the Libraries, including archivists, administrators, IT experts, and librarians. I began this process in November 2012 when I presented preliminary results of my sabbatical research and led a discussion at a meeting. It is my hope that promotional efforts will be expanded to include UF administrators and other stakeholders such as records creators.
- I plan to continue assessing various tools and keeping up to date on digital archives developments.
- I will continue to participate in workshops and webinars such as the Digital Archives Specialist curriculum and certificate program offered by the Society of American Archivists.
- I have offered to lead a workshop for members of the Society of Florida Archivists.
- I have had discussions with colleagues at various institutions (e.g., the University of Miami, North Carolina Chapel Hill, MIT, etc.) about participating in a collaborative grant project, and I hope to seek internal and/or external funding for a project to manage our extant born-digital holdings.
- I plan to continue discussions with colleagues at the University of Miami about collaborating to implement state- and region-wide policies and best practices in Florida and the Caribbean. For example, we have discussed creating a processing manual for born-digital materials.
- UF can become a leader in this effort for the state and region. We can provide tools and services much as we already do with SobekCM/UFDC. We should seek to disseminate our information and expertise in this area by offering training and encouraging ongoing discussion.


Appendix A
Resources Used in Literature Review

Rather than create my own bibliography, I used the excellent, numerous bibliographies that already exist. I did not rely solely on these extant bibliographies and did conduct my own searches for resources. This document includes a list of bibliographies used and selected resources that I found to be particularly useful.

AIMS Work Group. 2012. AIMS Born Digital Collections: An Inter Institutional Model for Stewardship. http://www2.lib.virginia.edu/aims/whitepaper/AIMS_final.pdf

http://www.clir.org/pubs/reports/pub149/pub149.pdf

Dow, Elizabeth H. Electronic Records in the Manuscript Repository. Lanham, Md.: Scarecrow Press, 2009.

You've Got to Walk Before You Can Run: First Steps for Managing Born Digital Content Received on Physical Media. An OCL http://www.oclc.org/content/dam/research/publications/library/2012/2012 06.pdf

rd Managing Born Digital Collections in (March 2011). Accessed June 19, 2012. http://rbm.acrl.org/content/12/1/11.full.pdf+html or https://scholarsphere.psu.edu/downloads/c534fn93r

Harvey, D. R. Digital Curation: A How-To-Do-It Manual. New York: Neal Schuman Publishers, 2010. The book also has a website with checklists to help develop plans and procedures: www.neal schuman.com/curation

Lee, Christopher A. Chicago: Society of American Archivists, 2011.

Nelson, Naomi, et al. Managing Born Digital Special Collections and Archival Materials. SPEC Kit #329. Association of Research Libraries, 2012.

New York State Archives. Preserving Electronic Records in Colleges and Universities: Getting Your Program off the Ground. Webinar created by Stephen Goodfellow for the New York State Archives with a grant to the New York State Historical Records Advisory Board. http://www.nyshrab.org/training/erecords/

Pearce Moses, Richard and Susan E. Davis, eds. New Skills for a Digital Era. Colloquium proceedings. Society of American Archivists, 2008. http://www.archivists.org/publications/proceedings/NewSkillsForADigitalEra.pdf

Prom, Chris. Practical E Records blog: http://e records.chrisprom.com/resources/bibliography/

Society of American Archivists. Campus Case Studies. Reports by university archivists on working solutions for born-digital records. http://www2.archivists.org/publications/epubs/Campus Case Studies

Society of American Archivists Manuscript Repositories Section: http://www2.archivists.org/sites/all/files/ReadingsResourcesTools Final_0.docx


TAPER: Tufts Accessioning Program for Electronic Records Final Report. http://sites.tufts.edu/dca/about us/research initiatives/taper tufts accessioning program for electronic records/project documentation/

Tufts University and Yale University. "Fedora and the Preservation of University Records Project." http://dca.lib.tufts.edu/features/nhprc/


Appendix B
Major Issues/Considerations

Preservation Options
- bit stream copying (multiple copies of native file)
- refreshing (copying native file to newer media)
- system preservation (maintain all hardware/software/media; this is a short-term solution)
- emulation (operating old operating systems within new)
- migration (copying files from one system/media to another; may lose formatting)
- normalization
- produce paper/microfilm versions

Ingest considerations
- what can we accept/not accept (file formats, size, etc.)
- what is transfer method
- what is minimum metadata

Storage/Security
- bring files into secure repository with firewall, read-only access and intrusion detection
- who can access/write to what parts of storage?
- document all activities over time: every action taken to collect, organize, categorize, maintain, preserve, retrieve, use or dispose of record must be auditable (must have unalterable audit trails). Audits should reveal nature of action taken, the entity undertaking the action, and the time of action.
- Who has authorization to do all of the above? We need to document, monitor, enforce and update authorizations. For example, IT staff should not have authority to delete/move files without archivist approval.

Authenticity
- records must be authentic (free from tampering), but chain of custody is often impossible
- trustworthiness is assured by maintaining the provenance of the materials, documenting the custodial history and storage environment, what actions have been taken and by whom

Access
- use authorization policies to limit access to specific users
- who and when
- remote or in person
- accessing copies not original
- monitor who has access at all times, and monitor who has actually accessed the files
- Digital library systems usually fail when providing access to aggregated archival materials (Fedora, mind or the need to present archival records in context.

Sensitive or Restricted Data
- Restricted data/info whose use is limited by law, contract or other legally binding agreement (e.g., donor agreement)
- credit card, SSN, bank account, driver license
- medical records
- student records including grades, schedules, ID numbers, financial, class rosters, essays, correspondence, discipline
- classified documents in political collections
- ability to set an object to no public access until a certain date (and then system automatically opens on that date, or alerts archivist to make change)
- should be able to redact restricted content by creating a redacted copy (best practice is usually not to redact the original)

Digital Forensics
- interpretation of date/time stamps
- capture of authentic digital copies
- extract file metadata and record relationships from original file system
- important to get informed consent from donor before forensic processing such as creating disk images (captures "deleted" files, browser history, etc.)
- can use forensic tools before acquisition to identify issues that might need to be covered by donor agreement (e.g., so that you can restrict all emails with certain persons)
- can use forensics analysis tools to get overview of file formats, identify problem items

Normalization
- after accessioning, files are normalized to create access versions (source files are maintained indefinitely for as long as they are viable), e.g., PDF/A

Email
- attachments pose a problem (various file types, possible viruses)
- who owns copyright
- authenticity
- fixity

Websites
- in-house/open source solutions can be costly in terms of staff time devoted to web capture
- many institutions opting for contracts with California Digital Library's Web Archiving Service (WAS), Archive-It, etc.

Social Media
- capturing content on Facebook, Twitter, etc.
- many student groups on campus do most/all of their business on Facebook/others

Databases
- constant migration to newest version
- simple databases can be output to delimited text files
- convert to XML

Hardware/Software Required
- A frequently used approach is to have 3 processes/stations: 1) quarantine station (receiving point, virus check, check for authenticity, checksum generated), 2) preservation station (normalization), 3) storage station (where both original bitstream files and converted versions are stored; also where offsite backups are created)
- Use a write blocker (Tableau Forensic USB Bridge)
- FTKImager to create disk image
- Storage environment: Will physical space be shared or dedicated?
- Collecting all such open source/access tools and archive them too, thereby forming a digital tool shed (e.g., OpenOffice can open old file types)
- Collecting hardware such as 5.25-inch floppy drives, Zip drives, etc. to be retained until hardware malfunction

Miscellaneous Considerations
- Do we need to keep original floppies/zip disks/USBs/etc.? We can take photos of physical media and discard after capture
- donor agreements should cover exactly what digital objects should be preserved. For example, the bits or the files, the entire PST file (including calendar etc.) or just the emails.
- ideally, we should be educating creators about proper records management, but reality is that 1) there are numerous creators already sitting on lots of files, and 2) many won't listen
- the goal should be to minimize the number of times a file is "handled" (i.e., a file is accessioned/processed only once)
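
The Databases note in this appendix says that simple databases can be output to delimited text files. A minimal sketch of that idea follows, assuming for illustration that the database is a SQLite file; the holdings described in this report may use other database software, in which case a different connector would be needed. The function names are hypothetical.

```python
import csv
import sqlite3


def list_tables(connection):
    """Return the names of all tables in a SQLite database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def export_table(connection, table, csv_path):
    """Write one table to a comma-delimited text file; return rows written."""
    # Only export names that really exist, since the table name is
    # interpolated into the SQL statement below.
    if table not in list_tables(connection):
        raise ValueError(f"no such table: {table}")
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    headers = [column[0] for column in cursor.description]
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)  # the first line records the column names
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count
```

A delimited export of this kind keeps the data readable without the original software, but it drops relationships between tables, queries, forms and reports, so it suits only simple databases, as the note says.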


Appendix C
Summary of North Carolina Site Visits and Interviews

With support from the Smathers Libraries I traveled to North Carolina in October 2012 to conduct interviews with archivists at Duke University and the University of North Carolina Chapel Hill.

UNC Chapel Hill visit: Jay Gaidmore, Jackie Dean, Lawrence Giffin

- UNC created an archival e-records focus group to examine existing analog and digital workflows
- Curator's Workbench (CW) can capture and stage files, generate a manifest with fixity info, arrange folders and objects, migrate custom metadata, export submission packages
- Tabular metadata is created prior to ingest into CW (e.g., by creator or curator). Then CW has a Crosswalk tool which is used to map the metadata to the correct MODS fields (very slick)
- Each project in CW is built upon an underlying METS manifest, which tracks all digital objects, their replicas, and their metadata. When the project is ready for submission an export function translates the internal METS into a submission package ready for ingest.
- They use a staging area for processing the files. The staging area can be on server or local computer. Checksums are generated for each staged file. The staged files are replicated to a disk resource, but not copied to tape until ingest.

Workflow for CW:
1. Unprocessed files and tabular metadata go into CW
2. CW generates METS manifest, see step 4
3. CW captures objects/folders, see step 5
4. METS manifest goes into ingest service in Fedora
5. Objects/folders go into IRODS grid, which includes Staging Storage, Archival Storage, and Access Storage (IRODS is a rules-based system)

- Theoretically, using IRODS they could have rules managing migration, virus checks, fixity checks, etc., but right now they haven't done the programming to make this happen.
- Currently the IRODS grid receives files and digests from Fedora and takes action to preserve them based on rules, automatically verifying the digests and replicating files to a remote location.
The grid can also be used to verify the integrity of files over time and trigger a repair action. The grid can also do virus scans, format migrations, data subsetting, and technical metadata extraction routines.

Standard practice is to use the staging area on the local computer. They do have the problem of when to delete the staged files (they are shy about deleting the staged files even after the ingest has gone through).

They use a write blocker (Tableau Forensic USB Bridge) and FTK Imager to create a disk image.

Workflow is very case by case. They don't usually separate media from collections or even from analog materials. The processing archivist will pull media without a separation sheet if it doesn't hurt context. They sometimes flag media and return later. They assign a unique ID number to media. They scan content for viruses and sensitive info.
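
As a rough illustration of the fixity idea in these notes (checksums generated for staged files, then verified again later), the following Python sketch builds a checksum manifest for a staging directory and re-checks it. It is not Curator's Workbench or iRODS code; the function names, the choice of SHA-256, and the dictionary manifest format are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of one file, read in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(staging_dir: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Map each staged file's relative path to its checksum (hypothetical format)."""
    return {
        path.relative_to(staging_dir).as_posix(): file_digest(path, algorithm)
        for path in sorted(staging_dir.rglob("*"))
        if path.is_file()
    }


def verify_manifest(staging_dir: Path, manifest: dict[str, str],
                    algorithm: str = "sha256") -> list[str]:
    """Return relative paths that are missing or whose checksum no longer matches."""
    failures = []
    for relative_path, expected in manifest.items():
        target = staging_dir / relative_path
        if not target.is_file() or file_digest(target, algorithm) != expected:
            failures.append(relative_path)
    return failures
```

Running build_manifest once at staging time and verify_manifest after each copy or replication would give a simple basis for deciding when staged files are safe to delete, which is the question the UNC staff said they struggle with.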

In CW, a repository collection contains all objects, may have a folder hierarchy, and may have restricted access.

A repository object has: MODS for description, PREMIS XML for events/admin stuff, original files, derived files, folder structure data, and a manifest with identifiers and fixity info (METS).

Duke visit: Seth Shaw

Workflow:
1. record accession info in AT
2. create separation sheets if needed
3. assign barcode to media (double if using separation sheets)
4. take photo(s) of media
5. acquire local copy (disk image or authentic copy)
6. scan content for viruses and sensitive info
7. move to dark storage
8. erase local copy

Steps 3-7 are tracked using a special media database in SharePoint.

Doesn't use DataAccessioner anymore.
Uses a write blocker.
Archive-It for website archiving: he loves it; very customizable, easy to use, and less expensive than using free software, which takes a lot of labor to set up and manage.
Thunderbird: uses this for accessing email.
FTK Imager: creates disk image.

Barcodes all media. He then uses the barcode number as the disk image filename when creating it in FTK Imager. Also uses the number as the filename for the digital photo(s) he takes of the media. He has a barcode scanner at his desk, and whenever he needs to enter the number he simply holds the media under the scanner.

In AT, he has a user-defined field in the Accession record, "Electronic media present?" (check box), that the accessioning archivist checks off. Seth can then run a report on which collections have digital media and which ones he has already processed.

FTK Imager does not give fixity for CDs/DVDs, so he uses Jacksum to create a checksum.

Identity Finder: tool for finding SSNs, credit card numbers, etc. ($40/$60). Also uses this program to shred the files (not just delete files).

If they preserve files with sensitive info, he puts them on a part of the server that is accessible only via VPN.

Takes photos of all labels on media and puts the photo image in the same folder as the disk image.
Uses TeraCopy for moving files from local computer to server.
Uses SharePoint to track digital processing.
Uses WinDirStat application for surveying file directories (good for appraisal, selection).
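
To give a sense of the kind of directory survey mentioned above, here is a minimal Python sketch that totals file counts and sizes by extension. It is only an approximation of what a tool like WinDirStat reports, not a replacement for it; the function name and the "(none)" label for files without an extension are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def survey_by_extension(root: Path) -> list[tuple[str, int, int]]:
    """Return (extension, file count, total bytes) rows, largest total first."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for path in root.rglob("*"):
        if path.is_file():
            # Lower-case the extension so ".TXT" and ".txt" are grouped together.
            extension = path.suffix.lower() or "(none)"
            counts[extension] += 1
            sizes[extension] += path.stat().st_size
    rows = [(ext, counts[ext], sizes[ext]) for ext in counts]
    # Sort by total size descending, then by extension for a stable order.
    return sorted(rows, key=lambda row: (-row[2], row[0]))
```

A summary like this can help an archivist see at a glance which formats dominate an accession before deciding what to select or where to focus appraisal.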

KryoFlux: tool for connecting 5.25-inch floppies to USB.
CDs/DVDs produce ISO disk image files.
Floppies produce IMG disk image files.
Drives produce AFF disk image files.
Almost all media are separated from collections and stored in media storage.
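
The barcode-based naming practice and the media-to-image-format pairings in these notes could be captured in a small helper like the Python sketch below. This is not Duke's actual tooling. The function names, the media type keywords, and the photo filename pattern (barcode plus a two-digit sequence number and a .jpg extension) are assumptions made for illustration; only the ISO/IMG/AFF pairings come from the notes.

```python
# Image formats per media type, as recorded in the Duke notes above.
IMAGE_EXTENSION = {"cd": ".iso", "dvd": ".iso", "floppy": ".img", "drive": ".aff"}


def image_filename(barcode: str, media_type: str) -> str:
    """Build a disk image filename from the media barcode and media type."""
    key = media_type.strip().lower()
    if key not in IMAGE_EXTENSION:
        raise ValueError(f"unknown media type: {media_type!r}")
    cleaned = barcode.strip()
    if not cleaned:
        raise ValueError("barcode must not be empty")
    return cleaned + IMAGE_EXTENSION[key]


def photo_filename(barcode: str, sequence: int) -> str:
    """Build a label photo filename (hypothetical pattern: barcode_NN.jpg)."""
    cleaned = barcode.strip()
    if not cleaned:
        raise ValueError("barcode must not be empty")
    return f"{cleaned}_{sequence:02d}.jpg"
```

Deriving every filename from the scanned barcode keeps the disk image, its label photos, and the tracking record tied to one identifier without retyping.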