The Society of American Archivists (SAA) offered its “Managing Electronic Records in Archives & Special Collections” workshop in Seattle May 10-11th in support of its Digital Archives Specialist [DAS] certificate.

The class, taught by Tim Pyatt and Seth Shaw, had three workshop goals:

  • Introduce the basic elements of an electronic records program
  • Develop strategies for working with records creators
  • Understand the open source tools available for ingest and management of electronic records

MER 1 and MER 2

Day one, the class opened with an introduction to SAA’s definition of managing an electronic records program:


Mr. Pyatt and Mr. Shaw made the first of many references to open standards and freely available presentations, starting with an overview of digital preservation: Salo’s Needs Pyramid.

The instructors also offered a unique definition of records: “persistent representations of activities, created by participants or observers of those activities or by their authorized proxies.”

It’s Only a Model

They continued with an overview of the data model from the PREMIS (Preservation Metadata: Implementation Strategies) project.

“The goal of PREMIS was to define implementable, core preservation metadata, with guidelines/recommendations for management and use.” Essentially, it’s a checklist on the recordness of an electronic record.

  • Provenance: Who has had custody/ownership of the digital object?
  • Authenticity: Is the digital object what it purports to be?
  • Preservation Activity: What has been done to preserve the digital object?
  • Technical Environment: What is needed to render and use the digital object?
  • Rights Management: What intellectual property rights must be observed?

In May 2005, PREMIS released Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group. This 237-page report includes:

  • PREMIS Data Dictionary 1.0: a comprehensive, practical resource for implementing preservation metadata in digital archiving systems;
  • Accompanying report (providing context, data model, assumptions);
  • Special topics, glossary, usage examples; and,
  • A set of XML schemas developed to support use of the Data Dictionary.

The instructors continued their resource survey with the question, “what does it mean to have a digital repository?” See the Open Archival Information System (OAIS) Reference Model, Figure F-1. Our instructors noted: this model is really about committee work. OAIS addresses how to set up a repository and the features it should have.

They then turned to InterPARES Book 2 Appendix 14: Managing Chain of Preservation, and to the California Digital Library’s UC3 Merritt curation micro-services project. The Merritt project devolves technical infrastructure functions into a set of independent, but interoperable, micro-services that embody curation values and strategies.

Since each of the services is small and self-contained, they are collectively easier to develop, deploy, maintain and enhance. Equally important, since the level of investment in and commitment to any given service is small, each is more easily replaced when it has outlived its usefulness.

“Claims of trustworthiness are easy to make but are thus far difficult to justify or objectively prove,” said Mr. Pyatt and Mr. Shaw. Essentially, greater mutability equates to less trustworthy electronic records.

Our instructors completed the section by describing two projects: DRAMBORA and NESTOR. Developed jointly by the Digital Curation Centre (DCC) and DigitalPreservationEurope (DPE), the Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) represented the main intellectual outcome of a period of pilot repository audits, using the Trustworthy Repositories Audit & Certification (TRAC) checklist, undertaken by the DCC throughout 2006 and 2007.

DRAMBORA presents a methodology for self-assessment, encouraging organizations to establish a comprehensive self-awareness of their objectives, activities and assets before identifying, assessing and managing the risks implicit within their organization.

A Different Kind of SIP: Storage Infrastructure and Planning

Strong storage infrastructures share certain attributes: redundancy, distribution, and security. Self-checking and self-healing are optional but no less important. As the instructors outlined examples of storage locations, the class took an unexpected (but not unwelcome) slant towards digital preservation in a university setting.
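
To make "self-checking/healing" concrete, here is a minimal Python sketch of how redundant copies of one file might be audited and repaired. It was not shown in class. The function names `digest` and `audit_and_heal`, the choice of SHA-256, and the majority-vote rule are all assumptions made for illustration; production storage systems are considerably more careful.

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_and_heal(replicas: list[Path]) -> list[Path]:
    """Compare copies of one file and overwrite minority copies with the majority version.

    Returns the replicas that were repaired. Raises if no strict majority agrees.
    """
    digests = {replica: digest(replica) for replica in replicas}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        # Hypothetical policy: without a strict majority, a person must decide.
        raise RuntimeError("no majority among replicas; manual review needed")
    good = next(r for r, d in digests.items() if d == winner)
    damaged = [r for r, d in digests.items() if d != winner]
    for replica in damaged:
        shutil.copyfile(good, replica)
    return damaged
```

With three copies on three different storage locations, a single damaged copy is detected and replaced; with only two copies that disagree, the sketch refuses to guess which one is right.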

Storage Issues for Universities

  • Enterprise systems data (financial, student records)
  • Employee desktop data (email, files, etc.)
  • Research data sets
  • Faculty pre-prints
  • Electronic Theses and Dissertations (ETDs)
  • Websites (president, press releases, student organizations)
  • Campus publications (bulletins, etc.)

Personal Digital Archiving

  • Faculty research
  • Manuscripts collections
  • Personal blogs
  • Video, audio
  • Photos
  • Social networks
  • The Cloud and beyond

While the company owns all data in the corporate world, faculty at a university are blessed with some personal autonomy over their records. That autonomy, however, can complicate a records and information manager’s work. To work with faculty more closely, the instructors suggested:

  • Advocacy
  • Understand your campus structure
  • Keep the right people informed
  • Don’t assume IT understands
  • Policy vs. custodianship – different age for archivists
  • Don’t be afraid to get in over your head

I Got Your Workflows Right Here

Day two highlighted four digital preservation workflows.

Workflow 1: The Basic Electronic Records Example

Assumptions: this is a media-based workflow. The archivist will rely on file reader availability.

  • Accession & Store
    • Survey materials (number of discs, types, known or estimated volume)
    • Create checksums (a fixity value for each file)
    • Copy the media to another disk
    • Verify checksums
  • Arrangement & Description
    • Considerations: series, depth of description (group, media, and top level folders)
    • Extent: volume, file count, folder count
    • Describe materials (survey contents of files, sampling, DROID format reports)
    • Access restrictions: special hardware/software, local use only
  • Access
    • Researcher use agreement
    • Copy requested material to a reading room computer
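
The accession steps above (create checksums, copy the media, verify checksums) can be sketched in a few lines of Python. This is a minimal illustration, not the instructors' tooling: the function names `sha256_of`, `make_manifest`, and `copy_and_verify` are hypothetical, and SHA-256 is an assumption, since the workshop notes do not name an algorithm for this workflow.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def copy_and_verify(source: Path, destination: Path) -> list[str]:
    """Copy source to a new destination; return relative paths that fail verification."""
    before = make_manifest(source)      # checksums of the original media
    shutil.copytree(source, destination)
    after = make_manifest(destination)  # checksums of the working copy
    return [name for name, checksum in before.items() if after.get(name) != checksum]
```

An empty list from `copy_and_verify` means every copied file matches its original checksum. The manifest from `make_manifest` is also a convenient source for the file count in the extent statement.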

The Digital Record Object Identification (DROID) tool is an automatic file format identification tool. It is the first in a planned series developed by The National Archives under the umbrella of its PRONOM technical registry service.
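
To give a feel for what automatic format identification involves, here is a toy Python sketch that matches a file's leading bytes against a short table of well-known signatures. It is not DROID's implementation and does not use PRONOM data; the `SIGNATURES` table and the `identify` function are hypothetical and cover only a handful of formats.

```python
from pathlib import Path

# A tiny, hand-picked signature table (illustrative only; real registries are far larger).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container (also DOCX, XLSX, ODT)"),
]


def identify(path: Path) -> str:
    """Guess a format from the file's leading bytes; 'unknown' if nothing matches."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"
```

The point of the sketch is that identification relies on the file's contents rather than its extension, which matters when donor media carry missing or misleading file names.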

Workflow 2: Forensics Workflow

Assumptions: use a write blocker & FTK Imager

  • Accession & Store
    • Survey materials (number of discs, types, known or estimated volume)
    • Create disk images
    • Virus scan
    • Search for PII
  • Arrange/Describe
    • Consider: Series, depth of description (group, media, top-level folders)
    • Extent: volume, file count, folder count
    • Describe materials (survey contents of files, sampling)
    • Access restrictions: special software, local use only, sensitive content
  • Access
    • Researcher use agreement
    • Copy requested material to a reading room computer
    • Provide disk image
    • OR Export data
    • OR Create Virtual Machine (e.g. xmount, VirtualBox)
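
The "Search for PII" step can be illustrated with a rough Python sketch that scans files already exported from a disk image. It is not how FTK Imager or any forensic suite works. The `PATTERNS` table and the `scan_for_pii` function are hypothetical, the two patterns (strings shaped like U.S. Social Security numbers and like email addresses) are assumptions, and pattern matching of this kind produces both false positives and misses.

```python
import re
from pathlib import Path

# Byte patterns so that binary files can be scanned without decoding errors.
PATTERNS = {
    "ssn_like": re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),
    "email_like": re.compile(rb"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def scan_for_pii(root: Path) -> dict[str, list[str]]:
    """Return {relative path: [names of patterns that matched]} for files under root."""
    hits: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()  # acceptable for a sketch; large files need chunking
        matched = [name for name, pattern in PATTERNS.items() if pattern.search(data)]
        if matched:
            hits[path.relative_to(root).as_posix()] = matched
    return hits
```

A report like this is best treated as a list of files for an archivist to review before setting access restrictions, not as a final determination.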

For a strong example of successful digital forensics, please see the Born-Digital Program at Stanford University Libraries.

Workflow 3: Archivematica Workflow

Assumptions: You have a virtual appliance or Ubuntu repository package in place.

  • Accession & Store
    • Prepare SIP for ingest (format directory, add MD5 checksum file and DC template)
    • Ingest SIP
    • Appraise SIP for submission
    • Quarantine and virus scan
    • Format normalization: preservation and access versions
  • Arrange/Describe
    • Review SIP: delete unwanted files and folders
    • Add descriptive metadata to dublincore.xml file in the metadata folder
    • Attach records (donor agreements, correspondence, etc.) in the metadata/submissionDocumentation folder
  • Access
    • Upload DIP (access copy) to ICA-AtoM: search and retrieval interface
    • AIP (preservation copy) preserved in repository/storage system of choice
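
The "Prepare SIP for ingest" step (format the directory, add an MD5 checksum file and a DC template) might look roughly like the Python sketch below. The layout is an assumption pieced together from the folder names mentioned in this workflow: an `objects` folder, a `metadata` folder holding `dublincore.xml` and `submissionDocumentation`, and a checksum file named `checksum.md5`. The function `prepare_sip` and the template contents are hypothetical; check the Archivematica documentation for the structure your version expects.

```python
import hashlib
import shutil
from pathlib import Path

# Placeholder Dublin Core template; the element selection is an assumption.
DC_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<dublincore xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title></dc:title>
  <dc:creator></dc:creator>
  <dc:date></dc:date>
  <dc:description></dc:description>
</dublincore>
"""


def prepare_sip(source: Path, sip_root: Path) -> Path:
    """Lay out a SIP directory: objects/, metadata/, a checksum file, and a DC template."""
    objects = sip_root / "objects"
    metadata = sip_root / "metadata"
    shutil.copytree(source, objects)  # also creates sip_root
    (metadata / "submissionDocumentation").mkdir(parents=True)

    lines = []
    for path in sorted(objects.rglob("*")):
        if path.is_file():
            md5 = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{md5}  {path.relative_to(sip_root).as_posix()}")
    (metadata / "checksum.md5").write_text("\n".join(lines) + "\n", encoding="utf-8")
    (metadata / "dublincore.xml").write_text(DC_TEMPLATE, encoding="utf-8")
    return sip_root
```

The checksum lines follow the common `md5sum` convention of a digest, two spaces, and a relative path, so they can be checked with standard tools before ingest.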

Workflow 4: Web Capture Workflow

  • Accession & Store
    • Inventory seeds to capture (“Owning institution”, URL, anticipated capture frequency, usernames and passwords, etc.)
    • Configure & run the capture
  • Arrange/Describe
    • Consider: Series & depth of description
    • Extent: Volume, URL count
    • Describe materials
    • Access restrictions: Special software, local use only (for authenticated materials)
  • Access
    • Copy requested material to a reading room computer
    • AND/OR Zip the capture directory & email or copy to a disk
    • AND/OR Create a virtual machine instance with browser and plug-ins appropriate for the captured site.


This class is a very nice summary of available resources on digital preservation. It’s inclined too far towards the academic setting, of course, but that’s expected.

Thanks to anecdotes from the class, I’m intrigued by certain trends.

  1. Archivists want in on the life cycle at the moment of records creation. Once upon a time, archivists collected records at the end of the information life cycle. Today, it’s (finally!) publicly acknowledged: every records and information management professional wants to capture records at the moment of records creation. Unhealthy competition is brewing between records managers and archivists and I am increasingly concerned. It raises the cultural question: when will the true interdisciplinary project, a joint effort between archivists and records managers to identify the exact moment of records custody, happen?
  2. The line between theory and practice is still pretty thick. Archivists debate a lot of maybes. To their credit, they will debate and arrive at a thoroughly well-thought-out solution. But even the most progressive institutions in the room could anecdotally report success only because they applied for grant money -- not because their institution provided internal support.
  3. Scale. Archivists bemoan lack of resources. They envy Records Management for its attention from IT. The scale of what they want to accomplish is so large they feel overwhelmed. Implementations seem to chip away at small bits. By the way, they’re also experiencing at least a 10 percent failure rate at ingest for floppy disks.

Also, a recommendation: SAA / ARMA / AIIM / EVERYONE: please publish a corresponding Twitter hashtag for the workshop/conference/whatever so non-attendees can enjoy the content, too.

Private worries aside, this class is a treat and the instructors are VERY good. You should consider taking a class with SAA: here’s the Continuing Education calendar.

Editor's Note: To read more of Mimi Dionne's archives-related writing: