Are You Ready to Manage Semi-Structured Data?
Bill Inmon recently wrote: “For the most part, the worlds of structured data and unstructured data operate as if they were in a vacuum. With a few exceptions, there is no bridge or interface between the two worlds. However if a bridge between the two worlds can be created, it is possible to build entirely new kinds of systems.”
We suggest that bridge involves utilizing “semi-structured” data alongside structured data to quickly and easily enable pervasive business intelligence with a greatly reduced degree of complexity and cost.
There is clear recognition that some data does not fall into the more easily defined categories of structured and unstructured data, but there is no equally clear consensus as to what comprises “semi-structured data.” Many mistakenly assume that semi-structured data is just another term for XML. Our view is that semi-structured data goes well beyond XML to include a far more plentiful and far more common source.
What is Semi-Structured Data?
We define it as business-relevant data that does not follow a fixed schema. It does follow, in its entirety, an overall implicit structure, but it may contain some irregular structures, and the data is either self-describing (for example, through the presence of labels or headings) or readily deducible.
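To make the definition concrete, here is a minimal Python sketch of self-describing data with an irregular structure. The report fragment, the field names and the `read_records` helper are all invented for illustration; they are not taken from any particular product or report.

```python
import re

# Hypothetical report fragment: each record carries its own labels,
# and the records do not all share the same fields (irregular structure).
REPORT = """\
Customer: Acme Corp
Region: West
Balance: 1,250.00

Customer: Globex
Balance: 310.50
Contact: J. Smith
"""

# A line is treated as self-describing when it looks like "Label: value".
LABELED = re.compile(r"^(?P<label>[A-Za-z ]+):\s*(?P<value>.+)$")


def read_records(text):
    """Split the text on blank lines and collect label/value pairs per record."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            match = LABELED.match(line.strip())
            if match:
                label = match.group("label").strip()
                record[label] = match.group("value").strip()
        records.append(record)
    return records


records = read_records(REPORT)
```

No schema is declared anywhere: the labels in the text decide which fields each record has, so the second record simply lacks a `Region` and gains a `Contact`.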
Using this definition, the primary source of semi-structured data is not XML, but rather, by far, existing reports and business documents, published from enterprise information systems, both within the organization and provided to the organization by external sources. Reports enable the presentation of data in human readable format.
In fact, reports and business documents overwhelmingly exceed XML-coded content in the workplace, as evidenced by unabated paper consumption for the past several years.
Why is Semi-Structured Data Important?
The prevalence of reports has several implications, not the least of which is cost: the savings from leveraging existing reports as a live data source are compelling. In fact, Gartner estimates that up to 40 percent of an organization's typical programming budget is spent simply on extracting data and combining it from disparate sources.
With such a massive collection of data, it is clearly in the best interest of the organization to capture the value of this underlying resource. But how?
Virtually every existing data extraction, transformation and loading (ETL) tool and enterprise reporting solution relies heavily on “structured” data sources, such as “raw” data within production databases.
Yet these solutions ignore semi-structured data, particularly data buried within existing reports, such as ERP, HR/payroll and industry-specific information, which are relied upon to fulfill auditing and industry compliance requirements.
Further, legacy systems typically contain a vast amount of accounting, operational and transactional reports, rich in data already containing logic and business rules.
Report Mining Presents an Alternative
We propose an approach that effectively uses semi-structured data as a source of BI by recognizing, parsing and transforming it into customized structured data — with no database programming skills required.
This approach, called Report Mining, capitalizes on both this vast repository of data and the economies of cost and efficiency that can be realized by putting the information to work in the hands of the people across the enterprise who need it most.
Report Mining utilizes the data buried within existing reports and automatically transforms it into live data sources, either alone or in combination with additional reports, spreadsheets, databases, PDF files, HTML pages, etc.
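The recognize, parse and transform steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report layout, the `REGION:` group heading, the column order and the `mine_report` function are hypothetical, and a real Report Mining tool would let a user define such patterns without writing code.

```python
import re
from decimal import Decimal

# Hypothetical printed sales report; the layout and values are invented.
REPORT = """\
SALES BY REGION                       PAGE 1
REGION: EAST
  1001  Widget            12     240.00
  1002  Gadget             3     149.85
REGION: WEST
  1001  Widget             7     140.00
"""

# Recognize: a group heading line and an indented detail line.
GROUP = re.compile(r"^REGION:\s*(\w+)")
DETAIL = re.compile(r"^\s+(\d+)\s+(\S.*?)\s+(\d+)\s+([\d,]+\.\d{2})\s*$")


def mine_report(text):
    """Parse the report and return one structured row per detail line."""
    rows = []
    region = None
    for line in text.splitlines():
        group = GROUP.match(line)
        if group:
            # Remember the heading so it can be attached to later rows.
            region = group.group(1)
            continue
        detail = DETAIL.match(line)
        if detail:
            sku, name, qty, amount = detail.groups()
            rows.append({
                "region": region,
                "sku": sku,
                "product": name,
                "quantity": int(qty),
                "amount": Decimal(amount.replace(",", "")),
            })
        # Any other line (page header, blank line) is ignored.
    return rows


rows = mine_report(REPORT)

# Transform: the rows now behave like a table and can be aggregated.
total_by_region = {}
for row in rows:
    previous = total_by_region.get(row["region"], Decimal("0"))
    total_by_region[row["region"]] = previous + row["amount"]
```

The key design choice is carrying the group heading down onto each detail row, which turns the report's implicit hierarchy into flat records that can be joined with spreadsheets or database tables.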