For as long as I can remember, the definition given for Metadata is “Data about Data”. We have all said this in interviews, heard it from candidates, seen it on presentations, and (almost) always nodded our heads in agreement.
In the transaction processing world, where “data-in” is the paradigm, the definition is precise. The databases store the business data in the relational format and the system tables / catalogs describe the structure of that data – the columns, type, size, etc. This data about the structure of business data is “Metadata”.
In the Business Intelligence world, that definition of metadata is incomplete. A more precise definition of metadata has two components:
Metadata in BI = “Data about Data” + “Information about Information”
The first component “Data about data” is “Technical Metadata” and is similar to the metadata in the OLTP world. Having said that, the technical metadata in BI is arguably more complex, as it not only encompasses the databases but needs to cover the ETL and Reporting tools as well. Each of the tools in the overall BI landscape has its own metadata and this data has to be looked at in a comprehensive fashion to understand data lineage etc.
Even among BI tools, there are different categories – Tools that expose its metadata completely, tools that gives an handle to its metadata thro’ pre-defined APIs and tools that do not allow any access to the metadata. Given the industry direction and the evolution of Common Warehouse Metamodel (CWM) compliance standards, it is only a matter of time before the tool architecture is designed to expose the technical metadata. CWM is a fascinating topic of its own and you can get a feel for it by visiting this website: http://www.omg.org/technology/cwm/
To me, as a BI practitioner, the second piece of the metadata puzzle is more interesting. “Information about information” aspect of metadata is “Business Metadata” and understanding it is crucial to implementing the BI vision in any enterprise.
As an analytical information consumer, there are 2 important requirements:
- Need direction to access the required analytical content
- Where can I get Sales by Product for different locations over the last 2 years?
- Am interested in Customer related Analytics. Where do I access it?
- Once the content is retrieved, need guidance on how to make sense of it
- Report shows Forecasted Sales for next quarter in the chart. How is this value calculated?
- Does the total inventory value displayed in the report include the Raw material inventory or does it exclude it?
Business metadata when properly organized should provide direction to both the points mentioned above.
Metadata management in BI deals with integration of technical and business data in a way that is useful for the organization. The challenge of metadata management becomes even more daunting when one considers both structured and unstructured data. Having said that, it is important for BI practitioners to understand the true nature of BI metadata and provide implementable solutions in their specific organizational context.
In my future posts, I would discuss this fascinating area of Metadata management, with its manifestation as “Technical and Business Metadata” in both structured and unstructured data domains.