The Society of American Archivists (SAA) Annual Meeting continued on its second day with a focus on interdisciplinary teams and long-term preservation planning.

The chairperson and panelist were:

Chairperson Paul Jordan, International Monetary Fund
Hannes Kulovitz, Austrian National Archives and Vienna Institute of Technology

Mr. Kulovitz’s presentation, Lessons Learned in Preservation Planning, introduced the reasons for having a preservation plan, the Planets project’s long-term preservation planning tool Plato, and the components of a preservation plan.

According to Kulovitz, organizations should invest in long-term preservation planning because the quality of preservation actions varies across tools; the properties of digital objects differ; requirements vary across users and usage scenarios; and organizational preferences, costs, risk tolerances and technical constraints are different for every organization and technical environment.

A preservation plan defines a series of preservation actions to be taken by a responsible institution due to an identified risk for a given set of digital objects or records. To ensure digital content remains accessible to and authentic for future users, a plan must be created that takes into account legal and technical constraints such as storage space, infrastructure and delivery, copyright issues, costs, user needs and object characteristics. The preservation plan includes:

  • the preservation policies,
  • legal obligations,
  • organizational and technical constraints,
  • user requirements and preservation goals,
  • a description of the preservation context,
  • the evaluated preservation strategies, and
  • the resulting decision for one strategy.

Provided that the actions and their deployment, as well as the technical environment, allow it, this action plan is an executable workflow.
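
The components listed above can be pictured as a simple record. The following is a minimal sketch only, not the structure Plato uses: the class name, field names, and sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreservationPlan:
    """Hypothetical record holding the components of a preservation plan."""
    policies: List[str]
    legal_obligations: List[str]
    constraints: List[str]            # organizational and technical
    user_requirements: List[str]
    preservation_goals: List[str]
    context: str
    evaluated_strategies: List[str]
    chosen_strategy: str

    def is_consistent(self) -> bool:
        """The decision must be one of the strategies that was evaluated."""
        return self.chosen_strategy in self.evaluated_strategies


# Invented sample values, for illustration only.
plan = PreservationPlan(
    policies=["keep master images accessible"],
    legal_obligations=["copyright clearance"],
    constraints=["limited storage budget"],
    user_requirements=["pages must remain readable"],
    preservation_goals=["lossless content"],
    context="scanned page images",
    evaluated_strategies=["keep original format", "migrate to new format"],
    chosen_strategy="keep original format",
)
print(plan.is_consistent())
```

Keeping the evaluated strategies next to the final decision makes it easy to check that the plan documents how the decision was reached.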

Kulovitz emphasized the need for practical implementations designed by an interdisciplinary team.

Although he listed plans for the digital preservation of video console games, interactive multimedia art, electronic theses and dissertations, and bitstream preservation of digital photographs, in this session he cited three preservation planning case studies involving scanned images. An excellent technical report on the three cases can be found here. Pay particular attention to the requirements trees.

First Case Study

The first case, in partnership with the British Library, covered 2 million images scanned from old newspaper pages and stored in TIFF-5 format at about 40 MB per image (80 TB of storage). The project framework included transfer of data to a new carrier, evaluation of the adequacy of current methods, and high-level requirements. The requirements of the newspaper collection resembled those of other scanning projects you may have heard about: lossless compression only (lossy not allowed), a standardized target format, and reduced storage costs. Although several destination formats were suggested and weighed, the chosen solution was JPEG 2000 with lossless compression, produced using ImageMagick.
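
The storage figure follows directly from the image count and the average image size. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000,000 MB); the function name is illustrative and does not come from the presentation:

```python
def collection_size_tb(image_count: int, mb_per_image: float) -> float:
    """Estimate total storage in terabytes, assuming 1 TB = 1,000,000 MB."""
    return image_count * mb_per_image / 1_000_000


# Figures reported for the newspaper collection: 2 million images at about 40 MB each.
print(collection_size_tb(2_000_000, 40))  # 80.0
```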

Second Case Study

In cooperation with the Bavarian State Library, the task was to preserve a collection of digitized 16th-century prints and to analyze the benefits and drawbacks of migration from TIFF-6 to JPEG 2000 with currently available tools. The collection held 21,000 prints (about 3 million pages). All pages were stored in TIFF-6 and totaled 72 TB. A comprehensive requirements elicitation workshop resulted in an objective tree that addressed multiple questions (ubiquity, OCR possible, stability, creation of PDF possible, standardization, licensing, retain filename, open source, duration, log output, image size identical, additional metadata, etc.). Of special note in this project: JPEG 2000 has challenges with color and costs. The Bavarian State Library pays to retrieve and reingest files, but not to store them, so migration had significant costs attached to it. The Library decided to keep the images in their original format.

Third Case Study

The team worked with the State and University Library Denmark to analyze a collection of scanned pages from a series of yearbooks, transferring the masters from several GIF formats. Analysis and evaluation led to the recommendation to migrate the images to TIFF-6 despite the growth in file size. This was an evaluation exercise only.

The team acknowledged several lessons learned:

  • Preservation projects have important commonalities and important requirements, but each institution is different. The team must understand the organizational context, its governing mandates and legislation, the organizational policy and its user community. To scope the preservation plan, remember the formula bit-stream preservation + process = costs.
  • The relevance of having a policy cannot be overstated. A market survey by Planets in 2009 found that most organizations are aware of the issue of digital preservation, but only about half are actively planning. A policy is more likely to be formulated when organizations need to maintain large volumes or a wide variety of objects. Having a policy may also make it easier to obtain more budget for digital preservation.
  • Define requirements carefully. Requirements are vital for a preservation plan. The planning approach must be flexible. Do not focus on a specific preservation strategy, nor tailor your requirements to a solution in mind: fix the problem, not the solution. Naming requirements is often difficult because the team is interdisciplinary.
  • The planners must understand the characteristics of the collection and its digital objects. Different stratification strategies center around file types, time to migrate, content and size, file size versus relative file size, image height, and image width versus image size, just to name a few important components. Choosing the unit of measurement (megabytes, processing time, duration) is not always easy. The right choice of sample objects is important. Start with the most at-risk objects, but only perform a sample test.
  • The planners must understand what objects they have and how to measure them. Define the collection (a homogeneous set of digital objects). Have sample objects ready with basic metrics. Build a requirements tree to document the needs that the future file format and preservation strategy shall meet. Understand the infrastructure: past, present and future technology plus potential methods and strategies. Assign measurable units, consider objective scales (for example, euros per object) versus subjective scales, and find consistent ways to measure and to determine what has been captured or lost. By default all requirements are equally important, but remember that institutions are often reluctant to recognize that not everything can be preserved.
  • Transforming measured values to a uniform scale shows the degree to which each requirement is met. The requirements are weighted using utility analysis. Each requirement has a uniform scale of 0-1 or 1-5 and a unit (pixels, euros, ordinal values, or yes/no). After careful analysis, if the element equals zero, it drops out as a requirement, but weightings can influence the final ranking.
  • Preservation action shall be applicable to entire collections, but if the preservation plan is for a heterogeneous set of digital objects, then the team should create a set of preservation plans.
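
The measuring, transforming, and weighting steps in the lessons above can be sketched as a small utility analysis. This is a minimal illustration, not Plato's implementation: the requirement names, weights, thresholds, candidate strategies, and measured values are all invented, and the linear transformation is just one possible way to map a measurement onto a 0-1 scale.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Requirement:
    name: str
    weight: float                         # relative importance; weights sum to 1
    to_utility: Callable[[float], float]  # maps a measured value onto a 0-1 scale


def linear_utility(worst: float, best: float) -> Callable[[float], float]:
    """Map a measurement linearly onto 0-1, clamping values outside the range."""
    def transform(value: float) -> float:
        fraction = (value - worst) / (best - worst)
        return min(1.0, max(0.0, fraction))
    return transform


def score(measured: Dict[str, float], requirements: List[Requirement]) -> float:
    """Weighted sum of the transformed values for one candidate strategy."""
    return sum(r.weight * r.to_utility(measured[r.name]) for r in requirements)


# Hypothetical requirements: lower cost is better; losslessness is yes (1) or no (0).
requirements = [
    Requirement("euros_per_object", 0.5, linear_utility(worst=0.10, best=0.0)),
    Requirement("lossless", 0.5, lambda v: 1.0 if v else 0.0),
]

# Hypothetical measurements taken on sample objects for three candidate strategies.
candidates = {
    "strategy_a": {"euros_per_object": 0.10, "lossless": 1},
    "strategy_b": {"euros_per_object": 0.05, "lossless": 1},
    "strategy_c": {"euros_per_object": 0.02, "lossless": 0},
}

ranking = sorted(candidates, key=lambda n: score(candidates[n], requirements), reverse=True)
print(ranking)  # strategy_b ranks first with these invented numbers
```

Changing the weights changes the ranking, which is why the weighting step deserves as much scrutiny as the measurements themselves.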

Final Thoughts

In conclusion, preservation planning tries to find the “optimal” preservation solution. A simple, methodical, repeatable model to specify, document and evaluate requirements is a basis for well-informed and accountable decisions. The approach can and shall be applied iteratively (for example, if it takes a second to migrate a megabyte, but the object loses information, what can the team build back in?). Decision makers need to be clear on when to start a plan and what its scope should be.
