In the first session of day three of the Society of American Archivists (SAA) Annual Meeting, panelists continued the discussion of collection development for born-digital papers and the management of born-digital assets.

The chair and panelists included:

Chair, Mark A. Matienzo, Yale University
Panelist, Simon Wilson, University of Hull, United Kingdom
Panelist, Peter Chan, Stanford University
Panelist, Gretchen Gueguen, University of Virginia

For insight into the project, check out the team’s blog here.

AIMS Background

Gueguen introduced the project. “The AIMS project, funded by The Andrew W. Mellon Foundation, represents a co-operative strategy among four partner institutions, to energize collection development in the area of born-digital papers, and to empower librarians and archivists in the management of born-digital assets. The four partners in the project led by the University of Virginia are Stanford University, University of Hull and Yale University.” The AIMS framework consists of four parts:

  • Collections development,
  • Accessioning,
  • Arrangement and description, and
  • Discovery and access.

Preservation is not expressed outright because it’s considered a natural part of the framework.

Gueguen defined collection development as the institutional policies and actions that bring in material for end-users, with a special focus on prioritizing and developing relationships with creators.

Elements of collection development include:

  • A non-negotiable prerequisite -- establishing the collection policy.
    • What types of materials will the institution collect?
    • What part of the lives are you collecting?
      • Consider that these individuals will have private, work and public lives, as well as external communities and content.
  • Donor relationships parallel those for analog collections, but born-digital materials and technology need to be identified at an early stage.
    • Review data creation with the owner.
    • The archivist should use a donor survey as a prompt only.
    • This is enriched curation.
  • Enhanced feasibility.
    • Try a test capture.
      • Does the team have the infrastructure in place?
      • If the team hasn’t encountered the records creation software before, digital archivists should triage to diagnose technical concerns.
  • Negotiate all agreements.
    • At this time precedent doesn’t exist. Remember: restricted access in the reading room does not equal unrestricted access online.
    • Copyright agreement is key.
    • The collecting repository will be the “sole” repository of born-digital materials.
    • Understand the capabilities/limits for capturing born-digital materials.
    • Plan the preservation strategy and the technical capabilities.
    • Delivery capability and limits.
    • Ask how and with what will files be restricted or deleted.
    • Establish a creative process and relationship with born-digital materials.
  • Prepare for accessioning.
    • Scope extent, time, appraisal, new methodologies and enriched curation.
    • Test the capture.

Matienzo continued the slide deck, explaining that accessioning occurs when the archival institution takes physical and legal custody of a group of records from a donor and documents the transfer in a register or other representation of the institution’s holdings. Within the AIMS framework, accessioning means processes which establish physical, administrative and intellectual control over transferred records; assessment and documentation of future needs; documentation of actions taken; and the beginning of safe storage and maintenance.

Archivists must be diligent and understand their actions, Matienzo confirmed. If they can’t provide control over the records, the records are not successfully accessioned into the repository. With computer media it’s even more complicated -- the institution may want to reaccession.

Elements of accessioning include:

  • Prerequisites;
  • Transferring records and gaining administrative control;
  • Physical control and stabilization;
  • Intellectual control and documentation to support further processes; and,
  • Maintaining accessioned records.

Archivists are urged to establish partnerships to transfer the knowledge from creator to institution. Archivists should be familiar with the concept of physical control and transfer formats -- especially be aware of the media’s condition (think what would happen to the entire digital collection if you introduce records laced with a virus). Gain usability of the records in the intellectual control phase. Prepare the records to be maintained over time. Finally, secure the storage location.

Case Study: Re-Accessioning at Yale

The case study is a collaborative capacity-building effort across two repositories: Manuscripts and Archives and the Beinecke Rare Book and Manuscript Library. The goal: to accession records from a wide variety of records creators (common types of media: floppy disks, optical media, zip disks, USB flash drives) and address previously received accessions containing electronic records on media. The project is still in the testing phase.

The goals of reaccessioning include identification, documentation and registry; risk mitigation of media deterioration and obsolescence; and basic metadata extraction from the media’s filesystem. Next, Matienzo demonstrated the workflow: starting the accessioning process, retrieving media, assigning IDs to media, write-protecting media, recording and identifying characteristics of media in the media log (a SharePoint list that uniquely identifies each piece of media and records its physical and logical characteristics and the success/failure rates of transfer), creating images, verifying them and recording results.
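The media log described above can be pictured as one record per piece of media. The following is only a minimal sketch in Python, not the Yale team's SharePoint list: the field names, the example identifiers and the `transfer_success_rate` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class MediaLogEntry:
    """One row of a hypothetical media log (field names are assumptions)."""
    media_id: str                          # unique identifier assigned to the media
    media_type: str                        # physical characteristic, e.g. "3.5in floppy"
    write_protected: bool                  # was the media write-protected before imaging?
    filesystem: Optional[str] = None       # logical characteristic, if identified
    image_succeeded: Optional[bool] = None # None means imaging not yet attempted


def transfer_success_rate(entries: Iterable[MediaLogEntry]) -> float:
    """Share of attempted transfers that succeeded (0.0 if none attempted)."""
    attempted = [e for e in entries if e.image_succeeded is not None]
    if not attempted:
        return 0.0
    succeeded = sum(1 for e in attempted if e.image_succeeded)
    return succeeded / len(attempted)


if __name__ == "__main__":
    log = [
        MediaLogEntry("media-0001", "3.5in floppy", True, "FAT12", True),
        MediaLogEntry("media-0002", "CD-R", True, "ISO9660", False),
        MediaLogEntry("media-0003", "zip disk", True),  # not yet imaged
    ]
    print(f"Success rate: {transfer_success_rate(log):.0%}")
```

Keeping "not yet attempted" separate from "failed" means the success rate reflects only media that have actually been through imaging.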

He recommended that archival teams pay particular attention to disk imaging. Using a “forensic” bit-level imaging process ensures data on the media is not manipulated. Use software to acquire the image and employ a hash (checksum) approach to verify it. If imaging is successful, extract the metadata that can be repurposed as descriptive, administrative and technical metadata. Use command-line tools such as The Sleuth Kit or fiwalk. XML is only appropriate for long-term storage. Package and transfer using BagIt.
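The verification step can be illustrated with a checksum comparison. This is a hedged sketch using only Python's standard `hashlib`; it is not the tooling Matienzo demonstrated, and the function names, the choice of SHA-256 and the example file name are assumptions.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256",
                chunk_size: int = 1024 * 1024) -> str:
    """Return the hex digest of a file, read in chunks so large images fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_image(path: str, recorded_digest: str,
                 algorithm: str = "sha256") -> bool:
    """Compare a disk image's current digest with the one recorded at acquisition."""
    return file_digest(path, algorithm) == recorded_digest.strip().lower()


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python verify.py floppy0001.img <recorded digest>
    image_path, recorded = sys.argv[1], sys.argv[2]
    print("verified" if verify_image(image_path, recorded) else "MISMATCH")
```

Recording the digest at acquisition and recomputing it after each transfer shows whether the image has changed since it was made.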

Wilson continued the presentation describing the purpose of arrangement and description: to apply context and intellectual controls and provide a means of discovery. He outlined the SAA definition, which minimizes the amount of handling. Within the AIMS framework, arrangement and description are processes which establish intellectual control of the material -- including implementation of policies and procedures. Once again there were prerequisites. The plans for processing included gathering supporting information; files captured from media; converting files; appraisal strategy; assessing arrangement options plus considering preservation issues. Processing equaled implementing arrangement strategy and adding descriptive data as well as preparing for discovery and access.

Case Study: The Papers of Stephen Gallagher

In 2005, Gallagher gifted hard copy archives. In 2010, he added 14,320 files (13.6 GB) of born-digital materials. The objective of this project is to create an integrated catalogue to accommodate paper, born-digital and future accruals. The approach: current work was given higher priority. Each work was considered a distinct “project.” The hierarchical structure reflected his way of working. The team built archival principles of control that creator, archivist and user could all understand. Organization at the series level was the most logical solution (all related files were placed in the series -- a reasonable return for the effort, according to Wilson).

Some 300 files were created using Final Draft screenwriting software. The team purchased the software to view the files as created so they could identify appropriate formats for long-term preservation. Issues arose immediately. For example, the team didn’t know that the title page was in the file but not displayed. Questions concerning copyright and third-party content, the commercial implications of whether or not access via the repository equaled publication, and the re-purposing of work from one (unsuccessful) project to another were just a few indications of the complexity of this project.

They acknowledged that each collection was unique; therefore, approaches could vary. Should they integrate born-digital material with existing materials and their respective arrangements? Or should they treat collections as one-offs? Are there likely to be subsequent accruals? Collection types differ for personal papers and organizational records. Should the same personnel work on paper and born-digital components? Can we appraise without knowing the contents? Then there is the sheer volume of material -- repositories always face the depositor’s perception that storage is cheap.

Hypatia is an initiative to create a Hydra application (Fedora, Hydra, Solr, Blacklight) that supports the accessioning, arrangement and description, delivery and long-term preservation of born digital archival collections. The key features are identified (drag and drop to create the intellectual arrangement, ability to return to original order of the material, the ability to view some file types to add descriptive metadata, and a high level of granularity); next, Hypatia needs an intuitive graphical interface.

Chan concluded the presentation with commentary on discovery and access, which refers to the systems and workflows that make processed or unprocessed materials, and the metadata that supports them, available to users. The goals of discovery and access:

  • To make material available to user communities;
  • To apply appropriate access restriction; and,
  • To provide access.

Case Study: Stephen Jay Gould Papers

These materials are extensive and were created in both analog and digital formats. Facet browsing was applied to the born-digital materials (that is, assigning SSData labels to files which then became facets). In order to see contents on the Web, Chan combined the files in HTML to see a visual of, for example, a WordPerfect file. Chan admits he is very fortunate to be at Stanford, where there is no difficulty in asking the Computer History Museum if he may borrow a machine to read punch cards or build a 5.25” floppy capture station.

Among the many impacts of collection development, he found there were no restrictions to file formats, computer media (even if they’re older technology like punch cards or floppy disks), or file types (be they computer program, data set, document, or a spreadsheet). The cohort agreed on and sought permission to post contents online. Chan invited the public to annotate.

AccessData FTK was used to search files for restricted information and to annotate files with appropriate descriptive metadata and rights metadata (note the use of fuzzy hashing -- scoring the closeness between two files). Transit Solution was used to transform files to HTML for display. Chan wrote the XSLT transformations. Basic programming was written for ingest, and he created a network diagram for 50k emails.
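The session did not describe how the email network diagram was built. As a rough illustration only, the underlying data for such a diagram can be derived by counting sender-to-recipient pairs; the sketch below uses Python's standard `email` package, and the function name and sample addresses are assumptions.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses
from typing import Iterable, Tuple


def correspondence_edges(raw_messages: Iterable[str]) -> Counter:
    """Count (sender, recipient) pairs across raw RFC 5322 message strings."""
    edges: Counter = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr.lower()
                   for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [addr.lower()
                      for _, addr in getaddresses(
                          msg.get_all("To", []) + msg.get_all("Cc", []))]
        for sender in senders:
            for recipient in recipients:
                if sender and recipient:
                    edges[(sender, recipient)] += 1
    return edges


if __name__ == "__main__":
    sample = [
        "From: a@example.org\nTo: b@example.org\nSubject: one\n\nbody",
        "From: a@example.org\nTo: b@example.org, c@example.org\n\nbody",
    ]
    for (sender, recipient), count in correspondence_edges(sample).most_common():
        print(sender, "->", recipient, count)
```

Each counted pair is a weighted edge that a graph-drawing tool could render as a line between two correspondents.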

As the session concluded, an important question came from the audience during Q&A. “Are there tools to analyze the amount of time it takes to ingest materials?” asked an audience member. “No, and we need them because it gets more complicated,” the team responded. “A lot of the collections at Yale are hybrid -- that led us to overthink processing of the digital component,” Matienzo said. “We were too detailed -- we needed to be strategic and focus on accession.”

To learn more about the 2011 program for the Society of American Archivists Annual Conference, visit SAA’s website here.
