Business Intelligence Metadata Repository
Before looking at the key BI metadata architecture scenarios, let's take a quick look at the component they all share: the metadata repository. This component of the BI system should be populated from, and connected to, the other components, such as the ETL tool, the database, the data modeling tool and the reporting and querying tool, to the extent that the facilities of those tools permit. Where possible, metadata should also be used to schedule system management activities. The expected sources of metadata are:
- Data modeling tool
- ETL tool
- RDBMS and MDDB
- Reporting and querying tool
In addition, business metadata can be stored in an appropriate metadata repository. The capabilities of the BI tools should be exploited for this purpose.
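To make the idea of one repository fed by several sources concrete, here is a minimal in-memory sketch. It is an illustration only: the class `MetadataRepository`, the source labels and the sample attributes are hypothetical and do not correspond to any particular BI product.

```python
from dataclasses import dataclass, field

# Hypothetical labels mirroring the four metadata sources listed above.
SOURCES = {"data_modeling", "etl", "database", "reporting"}


@dataclass
class MetadataRepository:
    """Illustrative in-memory store: one entry per (source, object name)."""
    entries: dict = field(default_factory=dict)

    def register(self, source: str, name: str, attributes: dict) -> None:
        """Record what one source tool knows about one object."""
        if source not in SOURCES:
            raise ValueError(f"unknown metadata source: {source}")
        self.entries[(source, name)] = dict(attributes)

    def describe(self, name: str) -> dict:
        """Collect what every source knows about the named object."""
        return {src: attrs for (src, obj), attrs in self.entries.items() if obj == name}


repo = MetadataRepository()
repo.register("data_modeling", "customer", {"entity": "Customer", "owner": "Sales"})
repo.register("etl", "customer", {"job": "load_customer", "schedule": "daily"})
repo.register("database", "customer", {"table": "DIM_CUSTOMER"})
print(repo.describe("customer"))
```

Keying entries by source and object name lets a single lookup show the technical and business view of the same object side by side, which is the kind of cross-component connection the repository is meant to provide.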
The following diagram depicts the position of metadata repository in BI architecture.

There are two distinct strategies for managing metadata in a heterogeneous BI environment:
- Centralized Metadata
- Distributed Metadata
Centralized Metadata
The following diagram shows a generic BI system with centralized metadata management strategy.

The centralized metadata architecture ensures:
- Standardized metadata across different systems
- No replication of metadata across systems, and hence no need to synchronize metadata across the components used
- No need to maintain bi-directional connections between the various tools for metadata exchange
- Minimal effort in system integration
- Optimal hardware resource requirements
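The point about bi-directional connections can be made concrete by counting links. The sketch below assumes the worst case for tool-to-tool exchange, in which every pair of tools exchanges metadata directly; real deployments may need fewer links, so the first figure is an upper bound. The function names are illustrative.

```python
def point_to_point_links(tools: int) -> int:
    """Upper bound on bi-directional links when every pair of tools exchanges metadata."""
    return tools * (tools - 1) // 2


def hub_links(tools: int) -> int:
    """Links needed when every tool talks only to one central repository."""
    return tools


for n in (4, 6, 10):
    print(n, point_to_point_links(n), hub_links(n))
```

Pairwise links grow quadratically with the number of tools, while links to a single repository grow linearly, which is why the integration effort diverges as more tools are added.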
Distributed Metadata
The following diagram gives a representation of a BI system with distributed metadata management strategy.

The main drawback of the distributed metadata architecture lies in its metadata distribution mechanism, which runs against the data warehousing principle of possessing ‘a single version of truth at a centralized location’. Although metadata changes less frequently than data, metadata updates are more complex to handle. An update affects not only the data being described (e.g. deletion or insertion of a data element) but also other objects, because of metadata interrelationships (e.g. referential integrity constraints). In addition, repositories that share metadata with each other must be synchronized: updates to replicated metadata need to be detected and propagated to keep the copies consistent. Finally, updated metadata must be applied (i.e. integrated) within each receiving repository, e.g. to keep its interrelationships with metadata from other repositories consistent.
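The detect-and-propagate step described above can be sketched with simple version counters. This is a minimal illustration under stated assumptions, not how any specific tool synchronizes: the `Repository` class, the `propagate` function and the sample keys are hypothetical, propagation is one-directional, and conflict handling and interrelationship checks are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical tool-local repository: key -> (version, definition)."""
    name: str
    items: dict = field(default_factory=dict)

    def update(self, key: str, definition: dict) -> None:
        """Store a new definition and bump its version number."""
        version = self.items.get(key, (0, None))[0] + 1
        self.items[key] = (version, dict(definition))


def propagate(source: Repository, replicas: list) -> list:
    """Copy entries whose version in the source is newer than in a replica.

    Returns the (replica name, key) pairs that were refreshed.
    """
    refreshed = []
    for replica in replicas:
        for key, (version, definition) in source.items.items():
            if replica.items.get(key, (0, None))[0] < version:
                replica.items[key] = (version, dict(definition))
                refreshed.append((replica.name, key))
    return refreshed


etl = Repository("etl")
reporting = Repository("reporting")
etl.update("user_profile:alice", {"role": "analyst"})
propagate(etl, [reporting])
etl.update("user_profile:alice", {"role": "admin"})
changed = propagate(etl, [reporting])
print(changed)
```

Even this simplified version has to track versions per entry and visit every replica. The integration step, keeping interrelationships consistent inside each receiving repository, would add further work on top.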
Comparison
The aim of metadata management is always to achieve the centralized metadata architecture, but the limitations of the tools and their functionality may make this impossible at present. Hence the distributed metadata architecture is seen in most implementations. The following table compares the two architectures.
No. | Aspects | Centralized Metadata | Distributed Metadata
1. | Number of repositories | Only one centralized repository is needed. | Each tool has its own metadata repository.
2. | Replication of metadata | None. | It is sometimes necessary to replicate metadata across multiple tools, e.g. user profiles.
3. | Tool independence | The BI system architecture depends entirely on the tool chosen, as a single tool takes care of the entire BI system. | Because multiple tools can be involved, a set of tools can be chosen for maximum functionality / facility coverage, but usually at the cost of seamless system integration.
4. | System integration | As only one tool, or a set of tools from a single vendor, is involved, system integration is seamless and compatibility issues between components do not arise. | Integrating tools from various vendors is one of the greatest challenges in this architecture. The number of tool-to-tool connections, the compatibility issues and the mapping overhead are significant. A proof of concept (POC) is usually recommended to confirm compatibility among the tools before implementation begins.
5. | Metadata synchronization | No synchronization is necessary. | Repositories that share metadata with each other must be synchronized. In particular, updates to replicated metadata need to be detected and propagated automatically to keep the metadata consistent.
6. | Metadata exchange | No metadata exchange is necessary. | All tools communicate with each other to exchange metadata, generating numerous bi-directional connections. A few tools may not be able to communicate at all, or may need other tools to provide a channel for communicating with the rest.
7. | Hardware capacity requirements | Hardware requirements are optimal. | Different tools demand different hardware capacities. The accumulated hardware requirement is always larger than what is needed for the centralized metadata architecture.
8. | Example | Informatica suite of products. | An architecture using various products such as Informatica PowerCenter, Business Objects, Hyperion Essbase, MetaCenter, etc.
The above comparison shows why the centralized metadata architecture yields a more cost-effective solution than the distributed architecture.