Just as in a data warehouse, ETL (in other words, the plumbing work) is the biggest effort and challenge in a metadata repository initiative. Metadata ETL covers not only the original sources of metadata, but also other independent metadata repositories that need to be integrated, either physically or virtually.
A few examples of sources for metadata extraction:
Data Modeling Tools
- Logical and Physical Data models
- Logical Entity and attributes definition
- Physical Entity and attribute definition
- Data Standards
- Domain Standards
- Data Dictionary
Data Integration Tools
OLTP applications
- Physical Data Structures (if data modeling tools do not have this data)
- Batch Job schedule
- Data Interface structure (point to point file exchange across the systems)
Enterprise Reporting Tools
- Logical and Physical definition of report
- Publishing schedule of the reports
Extraction, Transformation and Loading of Metadata
Unlike the data warehouse section, we are not going to devote a separate page to each of the above, because most ETL design concepts apply equally to metadata ETL. You should refer to the design concepts related to extraction, transformation and loading before you go ahead. On this page, we cover only some metadata-specific tips and examples.
As in a data warehouse, there will be a staging area where the ETL process is executed, after which the results are loaded into the load area (the metadata warehouse area).
Extraction of Metadata from Metadata sources
TIP- One needs to take even more care than in a data warehouse to extract the data as-is. In the ExecutionMiH.com experience base, metadata transformation is generally more intricate. A mistake in extraction may disturb not just certain data records, but a whole table. Just imagine an application being designed from faulty metadata held in the central metadata repository.
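One way to make "as-is" checkable is to stage each source record verbatim together with a checksum, so that any later alteration in the staging area can be detected before transformation starts. The following is only a minimal sketch: the function names, the record layout and the `modeling_tool` source name are hypothetical and not taken from any particular repository product.

```python
import hashlib
import json


def extract_as_is(source_name, records):
    """Copy source metadata records into staging unchanged, one checksum per record."""
    staged = []
    for record in records:
        # Serialize deterministically so the checksum is reproducible.
        payload = json.dumps(record, sort_keys=True)
        staged.append({
            "source": source_name,
            "payload": payload,  # verbatim copy: no renaming, trimming or casting
            "checksum": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        })
    return staged


def verify_staging(staged):
    """Return the staged rows whose payload no longer matches its checksum."""
    return [
        row for row in staged
        if hashlib.sha256(row["payload"].encode("utf-8")).hexdigest() != row["checksum"]
    ]


# Hypothetical records from a data modeling tool; note the trailing space in
# "CUST_NM ", which is deliberately preserved at extraction time.
source_records = [
    {"entity": "CUST_MSTR", "attribute": "CUST_NM ", "type": "VARCHAR(40)"},
    {"entity": "CUST_MSTR", "attribute": "cust_id", "type": "INTEGER"},
]
staged = extract_as_is("modeling_tool", source_records)
corrupted = verify_staging(staged)  # empty when staging is untouched
```

Cleaning up oddities such as the trailing space is left to the transformation stages, where the change can be controlled and monitored.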
TIP- Unlike in a data warehouse, you may not need incremental loading, whereby you pick up only those data records that have changed in order to reduce the operational overhead. Because the metadata volume is not significant, a full extract is not a big issue. Another reason is that some applications may not have a good tracking or versioning mechanism for their metadata.
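If the source offers no change tracking but you still want to know what changed between runs, one option is to take a full extract every time and compare it with the previous one. This is a sketch under assumptions: the `(entity, attribute)` key and the sample `ORDER` attributes are hypothetical, and real extracts would carry more than a data type per attribute.

```python
def diff_snapshots(previous, current):
    """Compare two full extracts, each a dict keyed by (entity, attribute)."""
    added = sorted(key for key in current if key not in previous)
    removed = sorted(key for key in previous if key not in current)
    changed = sorted(
        key for key in current
        if key in previous and current[key] != previous[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical full extracts from two consecutive runs.
previous = {
    ("ORDER", "ORDER_ID"): "INTEGER",
    ("ORDER", "STATUS"): "CHAR(1)",
}
current = {
    ("ORDER", "ORDER_ID"): "INTEGER",
    ("ORDER", "STATUS"): "VARCHAR(10)",
    ("ORDER", "CHANNEL"): "VARCHAR(20)",
}
changes = diff_snapshots(previous, current)
```

Because the whole extract is small, the comparison is cheap, and it does not depend on the source system keeping versions of its own metadata.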
Metadata Transformation
TIP- As you do metadata transformation (refer to the data warehouse transformation design, as it applies to metadata transformation as well), there will also be a manual element. This manual element primarily comes in the form of entering the contextual and conceptual level information, which typically does not exist in your source systems. One should not underestimate the value of this higher-level information.
TIP- We recommend that you divide the transformation into stages. Unlike in a data warehouse, it is primarily the complexity and not the volume of transformation that is the core issue in metadata ETL (a data warehouse has to struggle with both complexity and volume). The impact of a faulty transformation can be much greater than in a data warehouse. By dividing the transformation into stages, you will be able to control and monitor the transformation and its quality.
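A staged transformation can be sketched as a list of small steps, each followed by a quality check, with the run stopping at the first stage that fails its check. Everything here is illustrative: the stage names, the type mapping and the row-count check are assumptions, not a prescribed design.

```python
def normalize_names(records):
    """Stage 1: trim and upper-case attribute names."""
    return [{**r, "attribute": r["attribute"].strip().upper()} for r in records]


def map_types(records):
    """Stage 2: map source-specific types to repository types (hypothetical mapping)."""
    type_map = {"VARCHAR2": "VARCHAR", "NUMBER": "NUMERIC"}
    return [{**r, "type": type_map.get(r["type"], r["type"])} for r in records]


def same_row_count(before, after):
    """Quality check: a stage must not add or drop records."""
    if len(before) != len(after):
        return [f"row count changed from {len(before)} to {len(after)}"]
    return []


def run_stages(records, stages):
    """Apply stages in order; stop at the first stage whose check reports problems."""
    log = []
    for name, transform, check in stages:
        result = transform(records)
        problems = check(records, result)
        log.append((name, len(result), problems))
        if problems:
            break  # keep the last good output; do not pass faulty metadata on
        records = result
    return records, log


records = [
    {"attribute": " cust_nm ", "type": "VARCHAR2"},
    {"attribute": "cust_id", "type": "NUMBER"},
]
stages = [
    ("normalize_names", normalize_names, same_row_count),
    ("map_types", map_types, same_row_count),
]
final, log = run_stages(records, stages)
```

The log gives one entry per stage, which is what makes the transformation and its quality observable stage by stage.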
Metadata Loading
TIP- Don’t focus as much on volume-handling techniques as you would in a data warehouse. Data volume is not an issue.
Manual vs. Automated Metadata ETL
In an ideal world, we would expect the entire ETL to be automated. Here are some examples of reasons that compel manual intervention:
- One needs to add qualitative information (like the background and purpose of the metadata) at the contextual, conceptual and also implementation level.
TIP- We recommend adding the qualitative information manually only at the end of extraction and transformation, as you will then have cleansed and sanitized metadata to refer to.
TIP- This applies to metadata that does not need to undergo major transformation: instead of entering the qualitative information in the warehouse, you can have it entered in the source system itself. This benefits not only the users of the metadata, but also the people who use that specific source system.
- Metadata Standards- If you have naming standards that are not followed by the source systems, you may face a scenario where the standard names simply cannot be generated automatically. You may have to intervene manually here.
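The split between automated and manual handling of naming standards can be sketched as follows: names whose parts are all covered by the standard are converted automatically, and the rest are queued for a person to resolve. The abbreviation table and the sample names are hypothetical; a real standard would be far larger.

```python
import re

# Hypothetical naming standard: approved abbreviation -> standard word.
ABBREVIATIONS = {
    "CUST": "CUSTOMER",
    "NM": "NAME",
    "ADDR": "ADDRESS",
    "ID": "IDENTIFIER",
}


def standardize(name):
    """Return (standard_name, None), or (None, reason) when manual review is needed."""
    tokens = [t for t in re.split(r"[_\s]+", name.strip().upper()) if t]
    if not tokens:
        return None, "empty name"
    unknown = [
        t for t in tokens
        if t not in ABBREVIATIONS and t not in ABBREVIATIONS.values()
    ]
    if unknown:
        return None, f"unrecognized tokens: {unknown}"
    return "_".join(ABBREVIATIONS.get(t, t) for t in tokens), None


def split_for_review(names):
    """Separate names that can be standardized automatically from those that cannot."""
    automated, manual = {}, {}
    for name in names:
        standard, reason = standardize(name)
        if standard is None:
            manual[name] = reason
        else:
            automated[name] = standard
    return automated, manual


automated, manual = split_for_review(["cust_nm", "CUST_ADDR", "X17_FLG"])
```

Here `cust_nm` and `CUST_ADDR` are converted automatically, while `X17_FLG` ends up in the manual queue with the reason it could not be resolved.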