Tackling Unstructured and Semi-Structured Data

By Marisa Peacock

Digital Reef Tackles Unstructured Data
Managing unstructured data is a daunting task. Good thing Digital Reef is here to help. Today, the company emerges from stealth mode after two years spent developing its unstructured data management platform and gaining early customer traction under the name Auraria Networks.

Digital Reef plans to manage unstructured and semi-structured data (that is, data not already managed by database management systems) by helping large enterprises deal with key business issues that traditional solutions cannot properly address. Issues such as eDiscovery, data risk mitigation, knowledge reuse and strategic storage initiatives often require a great degree of scalability and performance. The longer data is left unchecked, the more its volume grows, costing companies time and money and increasing their risk.

Digital Reef hopes not only that their new approach will garner interest, but that the current economic climate will help them capitalize on the importance of managing unstructured data as a means of saving money.

Where's the Data?

Designed to rapidly address very large stores of unstructured data, without manual effort or disruption to data center or business activity, Digital Reef's solution provides enterprises a set of analysis and classification tools that allows them to manage critical data that they had little or no control over previously. 

These tools help companies:

  • Locate specific kinds of data, including sensitive data like Social Security and credit card numbers
  • Identify regulated data for compliance
  • Pinpoint relevant documents for pending legal action
  • Find intellectual property that can be reused for competitive advantage
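
To make the first item concrete, here is a minimal sketch of how pattern matching for sensitive data might look. It is not Digital Reef's implementation: the regular expressions, the function names and the use of the Luhn checksum to weed out random digit runs are all assumptions made for illustration, and real detection would need to cover many more formats.

```python
import re

# Illustrative patterns only (assumed, not from the product).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in the string pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the right.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Collect SSN-like strings and Luhn-valid card-like numbers."""
    ssns = SSN_PATTERN.findall(text)
    cards = [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]
    return {"ssn": ssns, "card": cards}
```

The checksum step matters because a bare digit pattern would flag order numbers and phone lists; only strings that also pass the Luhn test are reported as likely card numbers.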

Everything Needs to Be Managed

Because CIOs don't often know what they have or where their data lives, the act of managing unstructured data has become more difficult. Whether it's documents, spreadsheets, emails or presentations created, modified, copied and distributed by employees or customers, the intellectual property contained within is very important. Companies need to manage all legal, personnel and executive documents.

A Single, Searchable Index

What has traditionally been a "siloed process" is now a single solution that can automatically crawl data and build a single, searchable index. The index is built from both the file metadata and the entire contents of each file, and it can be searched across many modalities. Because the platform can handle massive amounts of data (up to 4 terabytes a day), huge files that once would have been costly to process through third-party legal firms can now be easily batched, stored and searched.
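
For readers who want a feel for what indexing "both the file metadata and the entire contents" means, the sketch below builds a toy inverted index in which metadata values and body text share one vocabulary. The class name, field names and the simple AND-only query are hypothetical choices for illustration; the article does not describe how Digital Reef's index is actually structured.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric terms."""
    return TOKEN.findall(text.lower())


class SearchIndex:
    """Toy inverted index covering file metadata and file contents."""

    def __init__(self) -> None:
        # term -> set of file paths containing that term
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, path: str, metadata: dict[str, str], contents: str) -> None:
        # Index metadata values and body text under the same vocabulary.
        for field in list(metadata.values()) + [contents]:
            for term in tokenize(field):
                self.postings[term].add(path)

    def search(self, query: str) -> set[str]:
        """Return paths that contain every query term (Boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results
```

With this structure, a query such as `index.search("alice agreement")` matches a file whose author metadata is "Alice" and whose body mentions "agreement", which is the practical payoff of indexing both sources together.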

A Similarity Engine

A similarity engine, as it is called, is capable of scanning through the entire body of an organization's data and determining which files are similar in content, metadata and structure. Similar documents are grouped together, with an indication of why they were grouped. This allows users to readily find the information that is most interesting to them. The classification system organizes the data organically, without requiring the design of a pre-determined taxonomy and without any system training.
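
The article does not say which algorithm the similarity engine uses, so the following is only a stand-in: a sketch that groups documents by Jaccard overlap of their word sets and records the terms every member shares as the "why" of each group. The function names, the greedy grouping strategy and the 0.5 threshold are all assumptions.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercased word set; a stand-in for richer content features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of terms two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_similar(docs: dict[str, str], threshold: float = 0.5) -> list[dict]:
    """Greedily place each document in the first group whose seed it resembles."""
    groups: list[dict] = []
    for name, text in docs.items():
        doc_terms = terms(text)
        for group in groups:
            if jaccard(doc_terms, group["seed_terms"]) >= threshold:
                group["members"].append(name)
                # Keep only the terms every member shares, as the reason for grouping.
                group["shared_terms"] &= doc_terms
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append(
                {"members": [name], "seed_terms": doc_terms, "shared_terms": set(doc_terms)}
            )
    return groups
```

No taxonomy or training data appears anywhere in the sketch; the groups fall out of the documents themselves, which is the property the article highlights.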

Under Digital Reef's Hood

Digital Reef's unstructured data management platform aims to deliver a powerful and unique approach to "bringing order to content chaos". Here's a summary of all the features it offers.

Digital Reef Architecture

  • Designed to scale up and down
  • Multi-tiered architecture designed for huge scale and enterprise-class security
  • Federated search index has small storage footprint and unique performance optimizations
  • Multi-tenant, role-based security
  • Operates on commodity hardware
  • Requires minimal system administration


Digital Reef’s browser-based UI


Digital Reef Features

  • Search on metadata and entire contents of files using: keyword, phrase, Boolean, fuzzy and proximity search
  • Discover documents of similar content to an example (or group of documents)
  • Identify exact and near duplicates
  • Reconstruct email threads across the enterprise
  • Detect sensitive or personal information via pattern matching
  • Federate search across the entire enterprise
  • Automatically organize documents based on content similarity
  • Copy, move, delete and transform documents based on selected criteria
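
Of the features above, email thread reconstruction is easy to illustrate. The sketch below follows the standard Message-ID and In-Reply-To headers to find each message's thread root; it is an assumed approach for illustration, not a description of how Digital Reef does it, and the function names and dictionary-based message format are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def find_root(message_id: str, parent_of: dict[str, Optional[str]]) -> str:
    """Follow In-Reply-To links upward until no known parent remains."""
    seen: set[str] = set()
    current = message_id
    # Stop at a message with no parent, an unknown parent, or a reply cycle.
    while parent_of.get(current) in parent_of and current not in seen:
        seen.add(current)
        current = parent_of[current]
    return current


def build_threads(messages: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group message IDs by thread root, keeping input order within a thread."""
    parent_of = {m["Message-ID"]: m.get("In-Reply-To") for m in messages}
    threads: dict[str, list[str]] = defaultdict(list)
    for m in messages:
        threads[find_root(m["Message-ID"], parent_of)].append(m["Message-ID"])
    return dict(threads)
```

A reply whose parent is missing from the collection simply becomes the root of its own thread, which is a common situation when mailboxes from across an enterprise are only partially preserved.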


Close-up view of Digital Reef’s interface: 28 million documents and 800,000 email threads in the database

Digital Reef Benefits

  • A single Web-browser UI to find all data assets, wherever they exist in the enterprise
  • Index, analyze, classify and manage massive volumes of data
  • Locate and manage relevant information without being a subject matter expert

Diagnosing Unstructured Data

To help demonstrate the importance of managing unstructured data, Digital Reef also published a white paper of the same name, which outlines the problems of information management, the symptoms that it can manifest, the diagnosis and ultimately the cure.

Suffice it to say, they are confident that Digital Reef is the reliable antidote. Yet, considering that:

...over 85% of all corporate information is in unstructured forms and IDC projects that the growth of this unstructured data (61.7% CAGR) will far outpace that of transactional data

as well as,

Within the next two years companies will spend over $35 billion trying to control the symptoms they commonly refer to as eDiscovery ($4.9B, Forrester Research), compliance ($9.3B, IDC), storage and archiving ($6.6B, Radicati), and knowledge management ($14.4B, including content management, IDC)

and that,

IDC forecasts that the volume of electronically stored information (ESI) will grow by an order of magnitude between 2007 and 2011

If Digital Reef thinks that they can not only undertake the task of restructuring massive amounts of data, but cure companies of their bad habits while saving billions of dollars, who are we to stop them?

You can learn more about managing unstructured data with Digital Reef or read their CEO's blog, Navigating Unstructured Data.