
Digital Records Are More Than the Sum of Their Parts

By Paul Cleverley

Increasing volumes of digital records (aka ‘big data’) have led, through information practices, to very large managed information collections where the whole may be greater than the sum of its parts.

The collection is an artifact (an aggregate object) that may have the potential for emergent properties. Whether the records collection resides in SharePoint, another electronic document management system (EDMS) or the file system is probably irrelevant, because these are mainly just storage containers.

Creating From Raw Ingredients

Combining three techniques (statistical vector space models, knowledge representations such as ontologies, taxonomies and authority lists, and natural language processing rules) creates a recipe with the potential to turn raw ingredients (text) into something more than the sum of its parts.

Through text analytics techniques embedded into search-based applications, this may produce differentiating insights from latent associations and trends between words that are not explicit in any single record, in collections too large for a human to practically read.
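To make the idea of a latent association concrete, here is a minimal sketch in Python. It is not the method used in the research described in this article. It assumes a toy collection of two invented records, a crude tokenizer and a tiny stopword list, all of which are hypothetical. It looks for pairs of terms that never appear together in any single record but are linked through terms they both co-occur with.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "at", "in", "of", "the", "through", "to"}


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude rule)."""
    return re.findall(r"[a-z]+", text.lower())


def latent_pairs(records):
    """Map each pair of terms that never share a record to the terms linking them."""
    cooccurs = defaultdict(set)
    for record in records:
        terms = set(tokenize(record)) - STOPWORDS
        for a, b in combinations(sorted(terms), 2):
            cooccurs[a].add(b)
            cooccurs[b].add(a)

    pairs = {}
    for a, b in combinations(sorted(cooccurs), 2):
        if b in cooccurs[a]:
            continue  # the pair is explicit in at least one record
        shared = cooccurs[a] & cooccurs[b]
        if shared:
            pairs[(a, b)] = sorted(shared)
    return pairs


# Invented toy records, for illustration only.
records = [
    "Seal failure reported at the Alder well",
    "Alder well drilled through faulted chalk",
]

pairs = latent_pairs(records)
print(pairs[("chalk", "seal")])
```

In this toy collection, "chalk" and "seal" never appear in the same record, yet both co-occur with "alder" and "well". The link exists only at the level of the collection, which is the sense in which the whole can say more than any one record.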

Whether this potential is actualized may depend on whether the organization has the means to make that transformation.

Discriminatory Associations

Consider a scenario where an entire collection of digital records is automatically analyzed and presented to the user as a series of algorithmically constructed, search-driven prompts to browse.

These categories are constructed from the text itself rather than from superimposed a priori categories, where we can be blinded by what we already know. This could be significant for human information behavior and search outcomes: the business professional is no longer limited by their own a priori knowledge of keywords, or by the a priori knowledge of the specialists who created the predefined categories and taxonomies, when exploring and discovering new knowledge.
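One way such prompts could be derived from the text is sketched below. This is an illustrative stand-in, not the algorithm evaluated in the studies. It assumes a handful of invented records, a hypothetical stopword list and a simple over-representation score: the share of matching records that contain a term, weighted by the log of how much more common the term is in those records than in the collection as a whole.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "and", "for", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude rule)."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(records):
    """Count in how many records each non-stopword term appears."""
    df = Counter()
    for record in records:
        df.update(set(tokenize(record)) - STOPWORDS)
    return df


def discriminating_terms(records, query, top_n=3):
    """Rank terms over-represented in records matching `query`, relative to the whole."""
    matching = [r for r in records if query in tokenize(r)]
    if not matching:
        return []
    df_all = document_frequency(records)
    df_hit = document_frequency(matching)

    scores = {}
    for term, hits in df_hit.items():
        if term == query:
            continue
        p_hit = hits / len(matching)          # share of matching records
        p_all = df_all[term] / len(records)   # share of the whole collection
        scores[term] = p_hit * math.log(p_hit / p_all)

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:top_n] if score > 0]


# Invented toy records, for illustration only.
records = [
    "Corrosion found in the pipeline near the coast",
    "Pipeline corrosion linked to seawater ingress",
    "Quarterly budget review for the drilling project",
    "Drilling project schedule and budget update",
    "Seawater ingress caused corrosion in the riser",
    "Staff survey results for the project team",
]

print(discriminating_terms(records, "corrosion"))
```

With these toy records, a search for "corrosion" suggests "ingress", "pipeline" and "seawater" as prompts. Nobody chose those categories in advance. They emerge from comparing the matching records with the collection as a whole.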

Similar Associations

Consider another scenario where an entire collection of digital records is automatically analyzed to support analogue hunting. A business professional enters a context into a search system, which returns ranked lists not of records (containers) but of entities (salient answers) found within the text of those containers that are most similar to the context. These entities may include the names of people, places, companies, projects or technologies.

These similarities can be created automatically by using the statistical patterns (fingerprints) of the words that occur around the names of the entities within your records compared to the words entered in your context.
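The fingerprint comparison described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the system used in the research: the entity names form a hypothetical authority list, the records are invented, the context window of five tokens is arbitrary, and cosine similarity over raw word counts stands in for whatever statistical model a real deployment would use.

```python
import math
import re
from collections import Counter

# Hypothetical authority list of project names and a tiny stopword list.
ENTITIES = {"falcon", "osprey", "heron"}
STOPWORDS = {"a", "and", "from", "in", "is", "of", "the", "with"}


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude rule)."""
    return re.findall(r"[a-z]+", text.lower())


def entity_fingerprints(records, entities, window=5):
    """For each entity, count the words found within `window` tokens of its name."""
    prints = {name: Counter() for name in entities}
    for record in records:
        tokens = tokenize(record)
        for i, token in enumerate(tokens):
            if token in entities:
                nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                prints[token].update(
                    w for w in nearby if w not in STOPWORDS and w not in entities
                )
    return prints


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_entities(records, entities, context, window=5):
    """Return entity names ordered by similarity of their fingerprint to the context."""
    query = Counter(w for w in tokenize(context) if w not in STOPWORDS)
    prints = entity_fingerprints(records, entities, window)
    scored = [(cosine(query, fp), name) for name, fp in prints.items()]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]


# Invented toy records, for illustration only.
records = [
    "Falcon targets a fractured carbonate reservoir using horizontal drilling",
    "Fractured carbonate reservoir results from Falcon exceeded forecasts",
    "Osprey is a sandstone reservoir with vertical wells",
    "Heron office relocation budget approved",
]

print(rank_entities(records, ENTITIES, "fractured carbonate reservoir"))
```

With these toy records, the context "fractured carbonate reservoir" ranks Falcon first and Osprey second, and Heron is dropped because its fingerprint shares no words with the context. The answer returned is an entity rather than a record, and its fingerprint is pooled from every record that mentions it.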

The search result suggestions are a product of the entire collection, a form of collective intelligence that is not contained in any single record.

They may surface hitherto unconsidered options for business professionals, supporting decision making under uncertainty.

Learning Opportunities

In these scenarios, the algorithms become a type of epistemology in themselves: a way in which people come to know.

Workplace Research

Studies within two quite different organizations evaluated how algorithms that surface both discriminatory and similar associations might stimulate new knowledge creation. Judging the extent to which something is similar or dissimilar is a comparative technique, and it is the whole information collection that acts as the base ‘reality’ against which something can be compared.

Examples where the ‘whole’ surfaced new insights were documented. When business professionals compared the value of these techniques to their existing search tool functionalities deployed in their organizations, a statistically significant difference was found. The potential to make significant improvements was identified.

Using the Knowledge We Have

Many organizations have deployed sophisticated enterprise search and discovery tools and, in conjunction with robust information organization practices, have made significant improvements in finding information.

However, there is evidence that some publicly available techniques for stimulating new knowledge creation from information collections as a whole have yet to be widely recognized and deployed.

This may present an opportunity for information and knowledge management practitioners to further exploit their information asset. As it is often said, it’s not what you have that’s important, it’s what you do with what you have.

Title image by Barn Images

About the author

Paul Cleverley

Paul Cleverley is a geoscientist and practitioner turned information scientist who works as a researcher with the Department of Information Management at Robert Gordon University in Aberdeen, UK. He also maintains a blog, Enterprise Search & Discovery: Systems Thinking.