The Society of American Archivists (SAA) 2012 annual meeting, Beyond Borders, began Monday, August 6 in San Diego with strong pre-conference sessions. I attended Digital Forensics for Archivists (DFA), a course focusing on specific tools and services that archivists need to use for their work with digital archives.
This is one of the many courses offered by SAA in its Digital Archives Specialist (DAS) Certificate Program.
I understand if the emerging partnership between law enforcement and the archival enterprise seems unusual. But consider: digital forensics has established principles, technologies and methods for extracting data and associated metadata that closely parallel archival repositories’ best practices.
In other words, this class is not for the faint of heart.
Enter our hero, instructor Dr. Cal Lee, Associate Professor at the University of North Carolina at Chapel Hill. Prior to class, he distributed two illustrative papers:
- Digital Forensics and Born-Digital Content in Cultural Heritage Collections by Matthew G. Kirschenbaum, Richard Ovenden and Gabriela Redwine with research assistance from Rachel Donahue, and
- his own Extending Digital Repository Architectures to Support Disk Image Preservation and Access, collaboratively written with Kam Woods and Simson Garfinkel
which we dutifully read. Obligation became pleasure as the first treatise unfolded; however, at 109 pages it’s a bit of a tome. In ten pages the second article summarizes the first (let’s hear it for brevity!). I recommend them both.
Motivation and Scope
Dr. Lee opened his commentary with thoughts on motivation. “Archivists are often responsible for acquiring or helping others access materials on removable storage media,” he said. “Often information is not packaged or described as one would hope. Information professionals must extract whatever useful information resides on the medium, while avoiding the accidental alteration of data or metadata.”
He defined digital forensics as “the process of identifying, preserving, analyzing and presenting digital evidence in a manner that is legally acceptable.” The practice involves multiple methods of discovering digital data and recovering deleted, encrypted or damaged file information. He presented compelling points as to why archivists should care.
“Two streams of activity show great promise for informing the practices of archivists:
- a handful of innovative projects of collecting institutions exploring the application of digital forensics to acquisition, and
- vendors and academic programs providing digital forensics training.”
Dr. Lee explained that digital objects are sets of instructions for future interaction. “Digital objects are useless if no one can interact with them. Interactions depend on numerous technical components.” He outlined the seven levels of representation:
- Level 7: aggregation of objects. A set of objects that form an aggregation that is meaningful when encountered as an entity.
- Level 6: object or package. An object composed of multiple files, each of which could also be encountered as individual files.
- Level 5: in-application rendering. As rendered and encountered within a specific application.
- Level 4: file thru filesystem. Files encountered as a discrete set of items with associated paths and file names.
- Level 3: file as “raw” bitstream. Bitstream encountered as a continuous series of binary values.
- Level 2: sub-file data structure. Discrete “chunk” of data that is part of a larger file.
- Level 1: bitstream thru I/O equipment. A series of 1s and 0s as accessed from the storage media using input/output hardware and software.
- Level 0: bitstream on physical medium. A set of physical properties of the storage medium that are interpreted as bitstreams at Level 1.
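As a rough illustration of how the same content looks at several of these levels, here is a minimal Python sketch. The file name, the four-byte "HEAD" marker and the sample text are invented for illustration and were not part of the course; the mapping of each step to a level is a loose interpretation of the list above, not a definitive model.

```python
import os
import tempfile

# Hypothetical sample content: a made-up 4-byte marker followed by text.
payload = b"HEAD" + b"Dear archivist, hello."

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "letter.txt")
    with open(path, "wb") as f:
        f.write(payload)

    # Level 4: file through the filesystem, a named item with a path and size.
    info = os.stat(path)
    level4 = {"name": os.path.basename(path), "size": info.st_size}

    # Level 3: file as a raw bitstream, a continuous series of binary values.
    with open(path, "rb") as f:
        raw = f.read()
    level3_bits = "".join(f"{byte:08b}" for byte in raw[:2])

    # Level 2: sub-file data structure, a discrete chunk of the larger file.
    level2_chunk = raw[:4]

    # Level 5: in-application rendering; here "rendering" is simply
    # decoding the remaining bytes as ASCII text.
    level5_text = raw[4:].decode("ascii")

print(level4)
print(level3_bits)
print(level2_chunk)
print(level5_text)
```

Levels 0 and 1 are out of reach for a script like this, since they concern the physical medium and the hardware that reads it. That gap is one reason forensic tools work from disk images rather than from individual files.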
He cited interaction examples of each level. “Archivists fear three complicating factors the most,” said Dr. Lee. “Medium failure/bit rot, obsolescence, and volatility.” He launched into a detailed overview of where and how a computer stores information: the computer memory hierarchy, sectors, clusters, magnetic disk, hard drive structure, caching, configuration/log files and the increasingly popular solid-state drive, as well as areas designed to store temporary data.