2011 is not only said to be the year of the mobile, but also a key time for cloud computing. As we look to leverage cloud environments for cost savings, among other things, we need to understand how different products and solutions support the cloud. This article focuses on enterprise content management and business process management in a cloud environment, with a specific look at some of the players in these markets.
The Knowledge Economy & the Cloud
We all agree that we are working in a knowledge economy and organizations are striving to create an empowered workforce, with the in-house knowledge capital being critical for a company's operations and performance, not to mention innovation and competitive advantage. Organizations are focusing on how to improve business operations, bring down costs of servicing customers and reduce time-to-market for new products and services.
Workflow automation, straight-through processing, paperless environment, service oriented architecture and multi-channel integration and distribution are a few key initiatives that companies pursue to meet their goals and objectives. Social media and networking is a paradigm shift that is changing the way an organization interacts with its ecosystems of customers, suppliers, partners and employees.
While the social paradigm and its adoption are very critical to today’s business landscape and for the next generation customer interaction model, we will not be focusing on this phenomenon in this article. We will focus on the cloud computing paradigm, especially on how BPM and ECM tools and technologies can be better leveraged through a cloud operating model and architecture.
A lot of progress is being made on how the cloud paradigm can be leveraged to create, store, manage and deliver content and information to users, primarily through Enterprise Content Management (ECM) tools that are available in the market (both proprietary and open source). This helps in the digitization of information required to execute business processes and transactions.
Adding Business Process Management (BPM) capabilities on top of content management and digitization will help work move to users and process participants. It will automate work assignment through skills-based routing, make available the relevant content required for each process step, track pending work and monitor process performance and user productivity.
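To make skills-based routing concrete, here is a minimal sketch in Python. It is not taken from any BPM product: the `Participant` class, the `route` function and the sample names are hypothetical, and we assume the simplest possible policy, where a task goes to the qualified participant with the least pending work.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    skills: set                                   # skills this participant holds
    pending: list = field(default_factory=list)   # work items not yet completed


def route(task_id, required_skill, participants):
    """Assign the task to the qualified participant with the least pending work."""
    qualified = [p for p in participants if required_skill in p.skills]
    if not qualified:
        raise LookupError(f"no participant has skill {required_skill!r}")
    assignee = min(qualified, key=lambda p: len(p.pending))
    assignee.pending.append(task_id)
    return assignee


# Hypothetical team and tasks, for illustration only.
team = [
    Participant("asha", {"claims", "underwriting"}),
    Participant("ben", {"claims"}),
]
first = route("task-1", "claims", team)
second = route("task-2", "claims", team)
third = route("task-3", "underwriting", team)
```

The `pending` list doubles as the record of outstanding work per participant, which is the kind of data a BPM tool would draw on for tracking and productivity monitoring.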
ECM & BPM in the Cloud
Solution architectures that combine the capabilities of BPM and ECM are not new, but delivering these solutions on a cloud-based platform is a rapidly evolving practice that is gaining adoption in the marketplace. Of the many BPM and ECM vendors who have been in the market for a long time, quite a few claim that their products are cloud-enabled and cloud-ready. These vendors and the systems integrators who provide services around these products are aggressively promoting the cloud-based offerings to various customers, with varying levels of success. We have been evaluating some of these products: Alfresco, Nuxeo, KnowledgeTree and SpringCM in the ECM space, and Appian (Cloud BPM) and Pega (SmartPaaS) in the BPM space.
This article focuses on some of the important aspects of cloud architecture and attempts to evaluate how these ECM and BPM product vendors are aligned with core cloud architectural principles required to run their software on private and public clouds.
It is expected that readers will have a basic understanding of different ECM/BPM products and cloud computing architecture to appreciate our observations and viewpoints.
Let us look at the cloud architectural features that customers and cloud services providers (we also refer to them as systems integrators (SI)) will be looking for in the cloud version of these ECM and BPM products.
Single Tenancy and Multi-tenancy
SIs will be looking for a platform (PaaS -- Platform as a Service) for the following reasons:
- They can build innovative business solutions (SaaS -- Software as a Service) rapidly and sell them to their customer base, especially small and medium businesses. Ideally, they would like to build once and then be able to deploy to multiple customers (also called tenants in the cloud world) with minimal customization.
- They would prefer to have flexible configuration options to onboard new tenants quickly and efficiently. This ensures that they achieve economies of scale and therefore provide a competitive, pay-as-you-go pricing model.
- These SIs will be interested in the maximum reuse of the application layer/logic across the tenants.
Customers may not be too concerned about the application logic layer (mostly driven by PaaS), as long as it is transparent to them and their data is isolated. Customers will be more interested in ensuring that they pay only for what they really use (both compute and storage), and this is usually driven by the Infrastructure-as-a-Service (IaaS) layer.
In a single instance, multi-tenant environment, IaaS and PaaS layers are shared across multiple tenants and the single instance of the software/application logic can serve multiple customers/tenants. But, in a single instance and single tenant scenario, the IaaS and PaaS layers are tenant specific. As the compute power is not shared, economies of scale are not leveraged much and the tenant ends up paying more per transaction. The cost of operating a cloud instance that serves a single tenant will be more than offering the same instance to multiple tenants.
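The cost argument above can be illustrated with some simple arithmetic. All figures in this sketch are hypothetical: we assume a dedicated instance costs 1,000 per month, a larger shared instance costs 4,000 per month and serves 20 tenants, and every tenant runs 10,000 transactions per month.

```python
def cost_per_transaction(monthly_instance_cost, tenants, transactions_per_tenant):
    """Spread one instance's fixed monthly cost over all the transactions it serves."""
    return monthly_instance_cost / (tenants * transactions_per_tenant)


TRANSACTIONS_PER_TENANT = 10_000  # hypothetical monthly volume per tenant

# Single instance, single tenant: one tenant carries the whole instance cost.
single_tenant = cost_per_transaction(1_000.0, 1, TRANSACTIONS_PER_TENANT)

# Single instance, multi-tenant: a larger (assumed 4x costlier) instance shared by 20 tenants.
multi_tenant = cost_per_transaction(4_000.0, 20, TRANSACTIONS_PER_TENANT)

print(f"single tenant: {single_tenant:.3f} per transaction")
print(f"multi-tenant:  {multi_tenant:.3f} per transaction")
```

Under these assumptions the shared instance works out at one fifth of the per-transaction cost, even though it is four times as expensive to run. The real ratio depends entirely on actual pricing and load.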
The question then arises -- what is the difference between an on-premise/hosted application and a single instance/single tenant cloud application? Well, because of the inherently elastic nature of the IaaS layer, you can scale the IaaS resources (like VMs, CPU cycles, storage, memory, etc.) that are required to run the cloud application up or down on demand (almost instantly). This will not be possible if the application is running on on-premise hardware, since this hardware is a sunk cost and the customer, not the cloud/IaaS provider, is responsible for resource utilization. Metering and billing come with a true cloud provider offering, which will not be the case for an on-premise application or ASP delivery model. So, what is obvious is that vendors need to build their products in such a way that the PaaS/application layer inherently supports multi-tenancy when running on the IaaS layer. This will help maximize reuse and optimize costs.
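A rough sketch of what metering and billing mean in practice is shown below. The resource names, the rates (in cents) and the usage records are all hypothetical and do not reflect any provider's actual pricing. The point is only that each tenant is charged for measured consumption rather than for owned hardware.

```python
from dataclasses import dataclass

# Hypothetical rates in cents, chosen only to keep the arithmetic simple.
RATES_CENTS = {"vm_hour": 10, "storage_gb_month": 5}


@dataclass(frozen=True)
class UsageRecord:
    tenant: str
    resource: str   # must be a key of RATES_CENTS
    quantity: int


def bill_cents(records, tenant):
    """Total charge for one tenant: sum of metered quantity times rate."""
    return sum(
        r.quantity * RATES_CENTS[r.resource]
        for r in records
        if r.tenant == tenant
    )


usage = [
    UsageRecord("acme", "vm_hour", 200),           # normal load
    UsageRecord("acme", "vm_hour", 50),            # extra VMs during a peak
    UsageRecord("acme", "storage_gb_month", 100),
    UsageRecord("globex", "vm_hour", 720),         # one VM running all month
]
acme_total = bill_cents(usage, "acme")
globex_total = bill_cents(usage, "globex")
```

Because the extra 50 VM hours appear only while the peak lasts, the tenant stops paying for them as soon as the resources are released. On-premise hardware cannot offer that.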
Some ECM products like Alfresco and SpringCM, and BPM products like Pega support multi-tenancy, a critical component for multi-company cloud implementations, as it maximizes use of hardware, application platform and common logic. Data isolation is managed by application layer logic for Alfresco, SpringCM and Pega.
- The multi-tenancy (MT) features of Alfresco allow it to be configured to run as a true single-instance, multi-tenant environment. A single instance (installed either on a single server or across a cluster of servers) of Alfresco can host multiple independent tenants. The Alfresco instance is logically partitioned such that it will appear to each tenant as if they are accessing a completely separate instance of Alfresco. Client-facing interfaces (like Alfresco Share, Explorer, etc.) are aware of the tenant’s domain, authentication and user context, and can accordingly store and retrieve tenant-specific data.
- The Pega SmartPaaS cloud offering inherently supports multi-tenant architecture with respect to access, security, and the execution model. Independent business applications (rules and business process) can be built using the PaaS layer and SaaS applications are segregated by access and organizational rules, but can share non-specific sets of rules to provide common functionality.
- SpringCM supports true single instance, multi-tenant architecture in which the cloud service provider can provision and maintain one instance of the platform that can be used by all customers/tenants.
- Appian (a BPM cloud provider) inherently follows the single instance/single tenant model. No application components or process information are shared across tenants for this product. This works well for customers who have concerns about a shared data model and the potential security threats that come with it. Adopting a single instance/single tenant model helps in those cases where customers prefer that the business process/platform layer (PaaS) not be shared with other tenants.
- KnowledgeTree (an ECM cloud provider) used to follow a single instance/single tenant architectural strategy until recently. They currently support multi-tenancy.
- Nuxeo, a document management product, has partial support for multi-tenancy. The Nuxeo (version 5.1) architecture cleanly isolates the repository with domains and workspaces, though it does not isolate user management. At the time of writing, an effort is under way to make this product multi-tenant in all respects.
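Among the approaches listed above, the idea of tenants keeping their own rules while sharing a common set can be sketched generically. This is not Pega's rule resolution mechanism or API. The rule names and tenants are hypothetical, and we simply assume that a tenant-specific rule, when present, takes precedence over the shared one.

```python
from collections import ChainMap

# Rules shared by every tenant on the platform (hypothetical names and values).
SHARED_RULES = {"approval_limit": 1000, "sla_hours": 48}

# Tenant-specific overrides; an empty dict means "use the shared rules as-is".
TENANT_RULES = {
    "acme": {"approval_limit": 5000},
    "globex": {},
}


def rules_for(tenant):
    """Return a read view where tenant rules shadow the shared rules."""
    if tenant not in TENANT_RULES:
        raise KeyError(f"unknown tenant {tenant!r}")
    return ChainMap(TENANT_RULES[tenant], SHARED_RULES)


acme_rules = rules_for("acme")
globex_rules = rules_for("globex")
```

Here `acme_rules["approval_limit"]` resolves to the tenant's own value, while `acme_rules["sla_hours"]` falls through to the shared set. One tenant's overrides are never visible to another.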
Let us take a closer look at how ECM software vendors like Alfresco, SpringCM and Nuxeo (partial) enable their single instance cloud platforms to work in a multi-tenant environment. Whenever a tenant is provisioned in the instance, the platform creates an independent domain (@xyz) so that each tenant is domain-aware and can be identified by its domain name. Domain information will primarily be stored in the schema/database.
Based on the user’s authentication (user1@xyz), a dispatcher sends the tenant-specific, context-aware request to the PaaS layer, where the PaaS common logic handles the request and stores/retrieves tenant-specific data. The whole process may not be as simple as described, though.
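The provisioning and dispatch flow just described can be sketched as follows. This is a simplified illustration, not the implementation used by Alfresco, SpringCM or Nuxeo: the function names are hypothetical, and an in-memory dictionary stands in for the schema/database that would hold domain information and tenant data.

```python
# domain -> that tenant's documents; stands in for the real schema/database
TENANT_DATA = {}


def provision_tenant(domain):
    """Create an independent, initially empty partition for a new tenant."""
    if domain in TENANT_DATA:
        raise ValueError(f"tenant {domain!r} already exists")
    TENANT_DATA[domain] = {}


def tenant_of(login):
    """Derive the tenant domain from a login such as 'user1@xyz'."""
    user, sep, domain = login.rpartition("@")
    if not (user and sep and domain):
        raise ValueError(f"login {login!r} is not of the form user@domain")
    if domain not in TENANT_DATA:
        raise LookupError(f"tenant {domain!r} has not been provisioned")
    return domain


def store(login, name, content):
    """Common logic: write into the caller's tenant partition only."""
    TENANT_DATA[tenant_of(login)][name] = content


def retrieve(login, name):
    """Common logic: read from the caller's tenant partition only."""
    return TENANT_DATA[tenant_of(login)].get(name)


provision_tenant("xyz")
provision_tenant("abc")
store("user1@xyz", "plan.doc", "xyz launch plan")
same_tenant = retrieve("user2@xyz", "plan.doc")
other_tenant = retrieve("user1@abc", "plan.doc")
```

The same `store` and `retrieve` logic serves every tenant. Isolation comes solely from resolving the domain before touching any data, which is why the application layer has to get this step right.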
The following diagram depicts the conceptual architecture of an ECM/BPM multi-tenant cloud enabled solution at very high level:
Cloud Infrastructure (IaaS) support
Alfresco, Nuxeo and KnowledgeTree have their cloud versions (Amazon Machine Image -- AMI) running on Amazon EC2. Customers, end-users, application developers and service providers can sign up for an Amazon EC2 account (including storage space on S3 or EBS) and run their own Alfresco, Nuxeo or KnowledgeTree cloud instances with minimal effort. Amazon provides options for operating systems (Linux, Windows) as well as databases (Open SQL, Oracle, etc.). Customers can also engage cloud infrastructure providers like RightScale to set up a robust production environment in EC2 very quickly and manage/monitor it efficiently.
The SpringCM cloud platform operates on hundreds of virtual machines in its data centers.
Pega’s SmartPaaS cloud offering is housed in a data center providing best-in-class physical and network security with excellent redundancy, scalability and reliability. SmartPaaS Platform-as-a-Service (PaaS) is developed using IBM middleware on Amazon Elastic Compute Cloud with the support of Amazon Web Services (AWS) on-demand technology infrastructure, providing instant scalability and elasticity. Capgemini’s global Cloud Computing Center of Excellence is also involved in this effort.
Appian’s cloud offering (Appian Cloud BPM) comes in Standard and Premium editions, both of which run on its own Amazon EC2 or Rackspace cloud hosting environments. Both editions offer localized hosting in the United States, European Union or Asia-Pacific regions. Appian also provides free 30-day trials of the complete BPM Suite solution, hosted in an Amazon EC2 environment.
How tenant data is stored and isolated is the most critical and sensitive piece of the puzzle that influences an organization’s decision to move to the cloud. Let us explore the possible strategies that can be adopted by any cloud vendor.
Separate database and separate schema for each tenant
In this model, each tenant gets its own database and schema. This looks very attractive from a customer perspective, as it offers better security and enables more customization, but it comes at a higher cost: each tenant needs its own database infrastructure in the IaaS layer, and the cloud application/service provider cannot capitalize on the economies of scale that would be realized if one database schema/instance were to support multiple tenants. Some customers might feel that their sensitive data should not reside alongside other tenants’ data. Appian and KnowledgeTree support this type of data architecture.
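A minimal sketch of this model, using Python's built-in `sqlite3` module, is shown below. It does not represent how Appian or KnowledgeTree implement their data layer: the `TenantDatabases` class and the `documents` table are hypothetical, and in-memory SQLite databases stand in for the separate database infrastructure each tenant would get.

```python
import sqlite3

# Default schema; with one database per tenant, each schema could be customized.
SCHEMA = "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"


class TenantDatabases:
    """Keeps one completely separate database (and schema) per tenant."""

    def __init__(self):
        self._connections = {}

    def provision(self, tenant):
        if tenant in self._connections:
            raise ValueError(f"tenant {tenant!r} already exists")
        conn = sqlite3.connect(":memory:")  # each connection is its own database
        conn.execute(SCHEMA)
        self._connections[tenant] = conn

    def add_document(self, tenant, title):
        conn = self._connections[tenant]
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))

    def titles(self, tenant):
        rows = self._connections[tenant].execute(
            "SELECT title FROM documents ORDER BY id"
        )
        return [row[0] for row in rows]


dbs = TenantDatabases()
dbs.provision("acme")
dbs.provision("globex")
dbs.add_document("acme", "contract.pdf")
acme_titles = dbs.titles("acme")
globex_titles = dbs.titles("globex")
```

No query issued for one tenant can reach another tenant's rows, because the rows live in different databases. The price is one connection, and in a real deployment one set of database resources, per tenant.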