With the expected explosion of email alone in the next 6 years, there needs to be a better way to perform discovery than keyword searches. The answer may be in visual analysis.

The annual joint meeting of the Council of State Archivists (CoSA), the National Association of Government Archives and Records Administrators (NAGARA), and the Society of American Archivists (SAA) continued at the Marriott Wardman Park today as an extraordinary panel predicted what electronic discovery methodologies will look like in 2015.

Dr. Vickie Lemieux of Cifer (Centre for the Investigation of Financial Electronic Records) and Donald C. Force, PhD candidate at University of British Columbia, were moderated by Mr. Jason R. Baron, the Director of Litigation, Office of General Counsel of the National Archives and Records Administration (NARA).

Baron, who represents NARA to the Sedona Conference®, opened the session with a single comment on the projected growth of email: by 2017 the number of emails collected from all presidential administrations will grow to a billion. As the records and information management professionals attending the session silently absorbed this painful truth, Force stepped forward to make the baseline case for involving records professionals in e-discovery.

E-Discovery Needs Records Professionals

Force began with an overview of Rule 37(e) of the Federal Rules of Civil Procedure, the Safe Harbor clause. He noted a direct correlation between Rule 37(e) and Rule 26(e). Everything is discoverable, and anything that is not privileged should be produced.

One of the major issues in discovery is that if a company must produce information, it must first preserve it. Preservation becomes an issue specifically in the context of litigation. Yet it’s almost a Catch-22: if electronically stored information is short-lived, how does a company meet its duty to preserve it on short-term storage locations, like flash drives? Courts acknowledge two certainties:

  1. Not every scrap of paper should be preserved. Perfection is not expected; the standard is determined on a case-by-case basis; and,
  2. Failure to preserve could amount to spoliation and lead to sanctions.

Sanctions can be light or severe. Today they are decided on a case-by-case basis, but they usually take the form of additional discovery; cost-shifting (popular with back-up tapes); fines; special jury instructions; preclusion (a judge prevents a party unable to produce documents during discovery from presenting them at trial); and default judgment or dismissal. Force cited several cases for the audience to think about, including:

  • Jeffries v. Chicago Transit Authority (IL, 1985)
  • Doe v. Norwalk Community College (CT, 2007)
  • Phillip M. Adams & Associates, LLC v. Dell, Inc. (UT, 2009)
  • Wilson v. Thorn Energy (NY, 2010)

Safe harbor, he concluded, depends on the boat. Courts look favorably upon a party whose retention and disposition policies are well documented, consistently applied and, when litigation requires it, suspended. Communication with the legal team, IT and administration is essential.

Can Visual Analysis Help?

Dr. Lemieux built upon Force’s legal citations to show the audience the path beyond keyword searches. In an era of petabytes and exabytes of data, she asked, is a technology called visual analysis the answer to e-discovery challenges?

Visual analytics is both the art and the science of analytical reasoning, in which human judgment is combined with data analysis across massive data spaces. It pairs computational firepower with visual representation: it crunches the data and represents it so the viewer can interrogate it. Visual metaphors range from bar charts to galaxy views. Social network analysis lends itself to network diagrams, while content analysis is presented in clusters, bubbles, and galaxy views.

The four cornerstones of visual analysis are:

  1. Data representation and transformation
  2. Analytical reasoning
  3. Visual representation and interactions
  4. Production, presentation and dissemination

Visual analysis can be applied in four ways: concept searching, automatic document grouping, de-duplication, and email coding. Visual analysis is moving into e-discovery, and Lemieux offered a few projects for consideration, including the University of Maryland’s retrospective analysis of email rhythms over time and Maria Esteva’s collections of unstructured Spanish documents that might be discoverable using the concept of the archival bond.
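Of the four applications, de-duplication is the simplest to illustrate. The sketch below is only one assumed approach, not a method presented in the session: it treats two messages as duplicates when their bodies match after case and whitespace are normalized. The function names and the sample messages are hypothetical.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a message body after collapsing whitespace and ignoring case."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(messages):
    """Keep the first message seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for msg in messages:
        key = fingerprint(msg["body"])
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique


# Hypothetical sample data: messages 1 and 2 differ only in case and spacing.
emails = [
    {"id": 1, "body": "Please retain all Q3 invoices."},
    {"id": 2, "body": "please  retain all Q3 invoices.\n"},
    {"id": 3, "body": "Lunch on Friday?"},
]
unique_emails = dedupe(emails)
```

Exact-match hashing of this kind misses near-duplicates, such as a forwarded copy with an added signature, so a review team would likely pair it with a fuzzier comparison.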

The process is straightforward. Bring the data set into the visual analysis application (perhaps PNNL’s Starlight or IN-SPIRE). The objects are pre-parsed as XML data. The operator applies the clustering algorithm, and the reviewers survey the cluster(s) relevant to the case. This is repeated iteratively until the team is satisfied.
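The cluster-then-review loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not the algorithm used by Starlight or any other product: it groups documents greedily by Jaccard word overlap, and the threshold is the knob a team would adjust between passes. All names and sample documents are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercased words in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of words two token sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster of its own.
    """
    clusters = []  # list of (seed token set, member indices)
    for i, doc in enumerate(docs):
        toks = tokens(doc)
        for seed, members in clusters:
            if jaccard(seed, toks) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((toks, [i]))
    return [members for _, members in clusters]


# Hypothetical sample documents.
docs = [
    "Quarterly invoice for pipeline maintenance",
    "Invoice for pipeline maintenance, revised",
    "Team lunch on Friday",
    "Friday team lunch moved to noon",
]

loose = cluster(docs, threshold=0.4)   # first pass: broad groupings
strict = cluster(docs, threshold=0.7)  # later pass: tighter groupings
```

Reviewers would inspect the groups from one pass, decide whether they are too broad or too narrow for the case, and rerun with a different threshold, which mirrors the iteration described above.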

The advantages: with keyword searching, the team has to know what it’s looking for (is there a needle in the haystack, and what does it look like?). With visual analytics, the team can see the entire universe of data at once. But the applications are not user friendly. The team needs both a technology expert and a domain expert to perform the analysis, which actually brings more diversity to the review.

The negative becomes a positive because teams interpret data differently and each e-discovery case is different. Lemieux closed her presentation with a prediction. Despite high expectations, have always lagged behind technological advances.

Once today’s lessons in e-discovery have been learned, the approach will switch. The first pass will be visual analytics and the second pass will be keyword search.

The CoSA, NAGARA, and SAA 2010 Joint Annual Meeting will continue through Sunday, August 15, 2010, at the Marriott Wardman Park in Washington, DC.