Large organizations all share a common goal: they want a consistent, risk-based, cost-effective and easy-to-use approach to categorizing, classifying and handling important content, from creation to disposal. Critical to achieving this goal are a content governance strategy and enabling technologies.

Unfortunately, most enterprises lack an overall content governance strategy and implement the enabling technologies on an ad hoc, inconsistent and siloed basis. Compounding the issue is often a complete lack of understanding and enforcement of "content disposition" rules, resulting in inefficiency, risk and frustration for individuals, teams and organizations.

For content governance to succeed, organizations need to implement a comprehensive and easy-to-use governance plan across all content repositories. Automating as much as possible of information lifecycle management, tagging, classification, and security and policy enforcement makes content governance possible at scale.

Microsoft provides the platform many organizations use to create and manage large volumes of their content. The company has been making strides in content governance, including its April 2017 launch of the Advanced Data Governance (ADG) platform to address common content governance challenges.

This two-part series will provide an overview of content governance best practices and technology features primarily found within Microsoft’s Office 365 workloads, including SharePoint Online, Exchange Online, OneDrive, Teams and Groups. You can strengthen these best practices and extend them to systems outside of Microsoft with third-party solutions found in Microsoft's independent software vendor ecosystem. Additionally, we'll cover the important announcements Microsoft made at its Ignite conference related to content governance and the ADG tools.

Transparent content governance comprises four key components:

  • Content Types
  • Metadata Inheritance
  • Retention Labels
  • Information Lifecycle

In this first part, we'll take a look at the role content types and metadata inheritance play in a strong content governance program and put this into context with related announcements from Microsoft Ignite. Part two will tackle retention labels and the information lifecycle and share an overall view of content governance best practices.

Content Types and Metadata Inheritance

Content types are a foundational element of content governance. When properly defined, content types allow organizations to establish retention and records management rules for each type of document and enable the consistent enforcement of retention policies for documents in multiple repositories. Content types make it possible to manage the metadata and behaviors of a document or item type in a centralized, reusable way. They define the overall retention and metadata inheritance rules, and they are often maintained by the user departments most familiar with the content.

Governance teams — which should include business representatives from the applicable user departments, along with legal and compliance representatives — need to create an inventory of content types for the organization. This inventory should also include retention requirements to support business needs. Building this foundation makes it possible to automate governance rules and ends the usual reliance on users choosing whether or not to follow an organization’s content governance policies.

A Metadata Inheritance Model

Content type-based metadata inheritance works like this: when a unique document is uploaded to a content repository, the metadata or rules associated with either the library or the content type can be inherited into the document’s metadata. So instead of users needing to understand how to populate metadata, the document can be automatically populated with metadata based on rules in the content type when it is created or uploaded. 
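The inheritance step described above can be sketched in a few lines of Python. This is an illustration only, not SharePoint code: the function name, the dictionary representation, the sample file name and the precedence order (values already on the document, then library defaults, then content type defaults) are all assumptions made for the example.

```python
from typing import Dict, Optional


def apply_inherited_metadata(
    document: Dict[str, str],
    content_type_defaults: Dict[str, str],
    library_defaults: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the document metadata with inherited defaults filled in.

    Assumed precedence for this sketch: values already set on the document
    win, then library defaults, then content type defaults.
    """
    merged = dict(content_type_defaults)   # lowest priority
    merged.update(library_defaults or {})  # library overrides content type
    merged.update(document)                # explicit values always win
    return merged


# Hypothetical upload: the user supplies only a file name.
uploaded = {"Name": "pump-spec.docx"}
content_type = {
    "Lifecycle State": "Temporary",
    "Security Classification": "For Official Use Only",
}
library = {"Owning Organization": "Engineering"}

result = apply_inherited_metadata(uploaded, content_type, library)
for key, value in result.items():
    print(f"{key}: {value}")
```

The user who uploaded the file never touched the lifecycle, security or ownership fields, yet the resulting record carries all three.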

The example framework below illustrates how a typical customer uses content types:

Figure 1 - Example Content Type Hierarchy

SharePoint Online recognizes content types as logical objects to which documents can be assigned. These logical objects are associated with a set of attributes or properties. The following examples will help flesh out the relation and interaction between these related object types.

Content Type: Document

A high-level content type is Document. This content type exists by default in SharePoint and contains basic metadata common to all documents in both SharePoint Online and Microsoft Office, with properties like “Name,” “Title” (of the document) and (Last) “Modified Date.” OneDrive and SharePoint Online maintain some fields as system-managed metadata; for example, when a document is modified, its last modified date is updated. The system fills in these properties, and users cannot modify them.

Content Type: Client Document

The next level of content type in the example is the “generic” Client Document. This content type represents the enterprise (or Base Client) document and can contain properties common to all documents, such as “Owning Organization,” “Lifecycle State” and “Security Classification.” Default values can be set for these properties: for example, Lifecycle State can default to “Temporary,” Security Classification can default to “For Official Use Only,” and Office of Record can default to the organization defined in the user’s profile.

Inheritance is one of the key concepts in SharePoint. In this example, the Client Document is a child of a base Document, so it includes all the properties of a base document (“Name,” “Title” (of the document) and (Last) “Modified Date”), as well as the properties described in the previous paragraph. For example, if an organization implements the example generic Client Document, it can set up any final document managed within SharePoint to include attributes like “Office of Record” in addition to base properties like “Modified Date.”
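The parent-child relationship between Document and Client Document can be modeled as a small class hierarchy. The following is a minimal sketch in Python, not SharePoint's object model: the ContentType class and its all_fields method are hypothetical names, and a value of None simply means "no default" in this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a content type with an optional parent."""
    name: str
    parent: Optional["ContentType"] = None
    # Maps a property name to its default value (None means no default).
    fields: Dict[str, Optional[str]] = field(default_factory=dict)

    def all_fields(self) -> Dict[str, Optional[str]]:
        """Collect properties from the root ancestor down to this type."""
        inherited = self.parent.all_fields() if self.parent else {}
        inherited.update(self.fields)  # a child may add or override properties
        return inherited


document = ContentType(
    "Document",
    fields={"Name": None, "Title": None, "Modified Date": None},
)
client_document = ContentType(
    "Client Document",
    parent=document,
    fields={
        "Owning Organization": None,
        "Lifecycle State": "Temporary",
        "Security Classification": "For Official Use Only",
    },
)

print(list(client_document.all_fields()))
```

Client Document declares only three properties of its own but exposes six, because the three base Document properties are resolved through the parent link.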

Organizations can share content types across site collections in their deployment by using a Managed Metadata service to set up content type publishing. Content type publishing helps organizations manage content and metadata consistently across their sites, through centralized creation and updates of content types, which can then be published through an update to multiple subscribing site collections or Web applications.

Figure 2 - Content Hub Service

Content Type: Functional Level

The next level describes content types at the functional level. These content types are children of the generic Client Document, meaning they inherit all of the properties described thus far, but they also include additional information. In this example, Engineering, HR and Finance are organizational content types.

Figure 3 - Example Content Type Hierarchy

The figure above illustrates how content type-based metadata inheritance leverages an enterprise content type, organizational content types and a local content type to associate a retention label with a specific piece of content, such as a Process Design Method or a Plant Operating Manual, when it is created. Local content types are, by definition, local to a site. If they are not in the Content Type Hub Service, there is no way to manage the enterprise labels assigned to them.
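One way to picture how a retention label reaches a local content type is a walk up the hierarchy until a label is found. The sketch below is an assumption about how such a lookup could work, not a description of SharePoint internals; the class name, the function name and the label "Engineering - 10 Years" are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentTypeNode:
    """Hypothetical node in an enterprise > organizational > local hierarchy."""
    name: str
    parent: Optional["ContentTypeNode"] = None
    retention_label: Optional[str] = None


def resolve_retention_label(node: Optional[ContentTypeNode]) -> Optional[str]:
    """Return the label of the nearest ancestor (or the node itself) that has one."""
    current = node
    while current is not None:
        if current.retention_label is not None:
            return current.retention_label
        current = current.parent
    return None


# Hierarchy from the example; the label value is invented for illustration.
client_document = ContentTypeNode("Client Document")
engineering = ContentTypeNode(
    "Engineering", parent=client_document,
    retention_label="Engineering - 10 Years",
)
plant_manual = ContentTypeNode("Plant Operating Manual", parent=engineering)

print(resolve_retention_label(plant_manual))
```

A Plant Operating Manual carries no label of its own here, so it picks up the one defined on its Engineering parent; a type with no labeled ancestor resolves to nothing.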

Microsoft Ignite Content Governance Announcements 

Microsoft made many announcements related to content types and metadata inheritance at Ignite.

  • Microsoft introduced a capability to support records managers as they create file plans using the new Office 365 File Plan API. With the new capability, businesses can manage retention labels that are then applied to all Office 365 workloads. Nishan DeSilva, principal engineering manager lead at Microsoft, spoke of a demonstration by Information Governance Solutions (www.askvirgo.com) of a File Plan tool that enables its customers to define and maintain a file plan in the cloud and publish retention labels to the Advanced Data Governance service in the Security and Compliance Center in Office 365.
  • Microsoft revealed a new Keyword Query Language (KQL) command that enables organizations to assign retention labels based on the assignment of a content type. This will be very useful to organizations looking to assign retention labels on the ingestion of content or based on a business process.
  • Shilpa Ranganathan, a resident machine learning (ML) expert at Microsoft, shared how the Office 365 security and compliance solution tool kit can modernize data sets by leveraging ML analytics to provide key insights and extend coverage to a wider range of data types. She demonstrated important new ML-based autoclassification, which can be trained to automatically and accurately classify an organization's data based on the semantic meaning of that data. This new capability also brings ML to features like ingestion, ROT analysis, classification and supervision to make them smarter. Microsoft is also delivering out-of-the-box autoclassification classifiers to bootstrap this ML capability for organizations.
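The article does not show the KQL command announced at Ignite, so the following is only a sketch of what a content type condition might look like. It assumes the standard KQL property-restriction syntax (property:"value") and the ContentType managed property from SharePoint search; the helper function name is hypothetical.

```python
from typing import Iterable


def content_type_condition(content_type_names: Iterable[str]) -> str:
    """Build a KQL condition matching any of the given content type names.

    Assumes the property:"value" restriction syntax and the ContentType
    managed property; KQL Boolean operators such as OR must be uppercase.
    """
    clauses = []
    for name in content_type_names:
        if '"' in name:
            # Keep the sketch simple: refuse names that would break the quoting.
            raise ValueError(f"unsupported character in content type name: {name!r}")
        clauses.append(f'ContentType:"{name}"')
    if not clauses:
        raise ValueError("at least one content type name is required")
    return " OR ".join(clauses)


query = content_type_condition(["Plant Operating Manual", "Process Design Method"])
print(query)
```

A condition like this could then be supplied wherever a retention label is configured to apply automatically based on a query.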

Ignite also included many general announcements related to content governance and records management.

  • Microsoft is implementing real records management for all of Office 365 that is built for the largest companies. Vivek Bhatt, finance manager at Shell Oil, spoke on stage to share the company's experience here. He indicated the company is all-in with the Microsoft cloud and the use of Advanced Data Governance to manage its content. Shell has successfully migrated most of its legacy ECM content to Office 365 from on-premises and competing ECM platforms and is systematically implementing the capabilities that organizations have long used to define Records Management, including:
    • Events: An event in an ERP or line-of-business system is tied to the beginning of a records management disposition lifecycle.
    • Hierarchical File Plans and Retention Schedules: These can be used to maintain and publish retention labels, which are then applicable to all of the Office 365 workloads. This enables, for example, emails to be managed and disposed of in place in Exchange Online instead of being exported to another system.
    • Dashboards and Intelligent Alerts: The volume of electronic information is too big to be manually managed. Microsoft is implementing the controls to manage this information using filters, alerts, autoclassification, dashboards and machine learning.
  • Content Boundaries enable different labels and policies to be applied and managed based on geography, organization or other regulatory requirements. This enables centralized records management with local exceptions, an important requirement of global organizations.
  • Microsoft Information Protection (MIP) is the evolution of Azure Information Protection, which will apply to content outside of Office 365. The roadmap is to enable labels for shared drives in Azure, but this is not likely to be achieved in the short term.
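The event-based approach mentioned in the list above, where an event in a line-of-business system starts the disposition clock, comes down to simple date arithmetic. The sketch below assumes a retention period expressed in whole years; the function name and the sample dates are hypothetical, and real retention schedules may count periods differently.

```python
from datetime import date


def disposition_date(event_date: date, retention_years: int) -> date:
    """Return the date on which a record becomes eligible for disposition.

    The clock starts at the triggering event (for example, a contract
    closing in an ERP system), not at the document's creation date.
    """
    target_year = event_date.year + retention_years
    try:
        return event_date.replace(year=target_year)
    except ValueError:
        # The event fell on Feb. 29 and the target year is not a leap year.
        return event_date.replace(year=target_year, day=28)


# Hypothetical event: a contract closed on March 15, 2018, with a 7-year retention.
print(disposition_date(date(2018, 3, 15), 7))
```

Until the event is recorded there is no disposition date at all, which is why tying the lifecycle to the source system matters.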

Check back tomorrow, when we'll explore retention labels and the information lifecycle and share some content governance best practices.