Building Making It Happen
Establishing Making-it-Happen as ‘Formal & Measurable’ Business Discipline
  Sign-in         Register
    
   Data Warehouse Metadata Data Warehouse Data Quality assurance  

Execution-MiH Encyclopedia  →   Enterprise Intelligence  →  SECTION -  Data-Warehouse/Mart  →  CHAPTER -  DW Design & Architecture  → 

Back-Room Data Warehouse Metadata

Back-Room Metadata spans across the Data Source and BI Technical Metadata areas and hence occupies a large scope. It encompasses the ETL metadata, data model, security profiles and audit/usage details.

This page is an extract from BIDS Metadata Management Solution authored by Kamlesh Mhashilkar- Head, Execution-MiH Services of Tata Consultancy Services

Back-Room Metadata Overview

Back-Room Metadata spans across the Source Systems and BI Technical Metadata areas. This is primarily utilized by the administrators, developers and designers. The mappings between data source and system technical elements form the base of the ETL metadata, which is used for the data load purpose. Also the end user data usage at various levels (i.e. report, table, column level) is needed for performance monitoring and tuning purpose. These metadata components primarily come under the back-room metadata.

ETL Metadata

In ideal scenario, end-to-end data loading process should be metadata driven. Otherwise a lot of manual intervention is needed to run the data load jobs and carry out the changes in the existing components. ETL metadata principally gets divided into two areas.

  • Control Metadata
  • Process Metadata

Control Metadata

The metadata deployed for controlling Extraction, Transformation and Loading processing is named as control metadata. It mainly contains the scheduling and data audit / reconciliation information.

Scheduling metadata drives the triggering of the ETL jobs and the check on data receipt. The streamlining of various dependent processes and parallelism of independent process needs to be configured in this metadata. The data audit and reconciliation involves capturing check sums, row counts of source data and the loaded data. In case of mismatch (outside the reasonable ranges), alerts need to be generated. Record rejection and error logging should also be noted in the metadata.

Control metadata also aids in stating the data load status (i.e. data availability in BI system) to the end users. The history of input data errors and rejection also helps administrators to proactively enhance the system or suggest the source system changes.

The following diagram gives a basic control metadata model, which can be enhanced as per requirement.

Basic Control Metadata Model
The entire ETL processing can be managed, to the extent possible, through a single, metadata-driven process control environment. The control process should capture metadata regarding the progress and statistics of the daily job. It can also facilitate maintenance and development of a cohesive set of extract statistics metadata.

Process Metadata

The metadata that is used for processing purpose is called the Process Metadata. The complete ETL data transformation information is placed in this metadata section.

The following are the various processing activities, which are recorded in the Process Metadata.

No.

Activity

Description

1.

Transformation

Data type conversion: This involves lower-level transformations converting one data type or format to another. E.g. converting date, numeric, and character representations from one database to another.
Calculation and derivation: These transformations need to apply the business rules identified during the requirements process, which would involve functions including string manipulation, date and time arithmetic, conditional statements, and basic mathematical functions.
Aggregation: These involve summarization of low level data to the required granularity.
Special Transformations: e.g. row to column conversions and column to row conversions

2.

Cleansing and Augmentation

Cleaning up the source data by applying transformations (e.g. conversion, derivation and translation), removing duplicates and merging data from different source systems.
De-duping / match-merging is one of the important activities in the cleaning process, which is valuable for Customer centric applications. It aims at defining the business entity uniquely by removing its duplicate entries or by merging the multiple entries. e.g. duplicate customer records can be removed by applying an intelligent algorithm which can define a unique customer with similar sounding name and address. One can also apply augmentation and enrichment techniques.

3.

Validation

Referential integrity (/lookups) and constraint checks can be either implemented at the database level or during transformation process. These rules should be specified as a part of metadata.

The ETL development could be in the form of specification through an ETL tool or development of custom built programs or mix of both these methods.  The custom built scripts should use the metadata rather than hard coding of validations so that maintenance is possible through metadata. Also the Audit Trail (e.g. ETL process log) generated should display the metadata references in order to improve the readability.

Data Models in backroom metadata

The data models are the backbone of the BI system metadata. These models can encompass

  • DW schema (Technical Metadata) and the corresponding Business Metadata (For Data Warehouse schema modeling, refer Data Warehouse dimensional modeling)
  • Source system data structures and metadata (Data Source Metadata)
  • Mapping between source entities and DW entities (ETL Process Metadata)

NOTE: The availability of data source metadata and ETL process metadata with the data model depends on the functionality offered by the data modeling tool e.g. ERWin. Otherwise data models encompass only the DW schema and corresponding business metadata.

The DW schema and corresponding Business Metadata can be fed to the RDBMS. E.g. tools like ERWin, Designer2000 allow forward engineering to generate database structures and corresponding comments in the database. This business metadata may be imported into the metadata repository of front-end tool using a few custom scripts. This can populate the front-end layer with the business metadata which is an integral part of Front-Room Metadata. The following table shows a basic data structure for storing the metadata related to a data field in DW schema.

No.

Fields

1.

Column_Name

2.

Data_Type

3.

Length

4.

Precision

5.

Default_Value

6.

Low_Range

7.

High_Range

8.

Units_of_Measure

9.

Level

11.

Business_Name

12.

Business_Description

13.

Table_Name

14.

Subject_Area

Also ETL processes (tool based or custom built) can be manually defined with the mapping information fed in the data modeling tools. The data source metadata can be imported into the ETL repository, which can be utilized during the ETL development.

Back-Room Metadata- Security Profiles

BI system administrators need to set and monitor the system security at various levels to ensure access restriction. The user profiles and their security policies are maintained at various regions, namely Back-End and Front-End Security Regions.

Tool administrators, developers and designers are part of the back-end security. They have privileges to modify the system and the data. Their area of operation is should be confined to the Level 100 security zone. It is recommended that they should not operate from the nodes other that this security zone. This information is maintained in metadata of the database or metadata repositories of various tools.

The end-users, who access the data, come under the Front-End Security Region. The administrators create these users and define their data access policies. ACLs are generated for this purpose and implemented in individual front-end tools of the system. These ACLs are stored in the metadata repositories of the respective tools.

The following are the different levels / categories of users along with their region of control.

Security Region

Role

Region of Control

Back-End Security

Administrator

Access to Operating System, system tools

Back-End Security

Developer / Designer

Access to system tools (development environment)

Back-End Security

Supervisor

Access to system tools (production environment)

Front-End Security

Executive

Access to data from multiple departments 

Front-End Security

Manager

Access to data from respective department

Front-End Security

Analyst

Access to data from respective division / business function in the Department

Front-End Security

Operator

Access to operational data (mostly from ODS).

Audit Trail

BI system usage metadata indicating who is accessing which components of the DW, which reports are accessed at what frequency, what are the queries / ad-hoc reports requested, processing time for each report / query etc. should be recorded in the metadata repository.

The following table gives a generic audit trail table design.

No.

Field

Description

1.

User_ID (PK)

User identification

2.

Action (PK)

Action specification e.g. Login, Logout, Open Report, Create Report, Save Report, Refresh Report, Publish Report to Corporate, Send Report to another User, Error etc.

3.

Timestamp (PK)

Time when the action was initiated

4.

Execution_Time

Average execution time for the action. This can be NULL if the action was aborted.

5.

Module

Details of the module for which the action is taken e.g. Report Name, Table Name, Subject Area, Department, Error Code etc.

6.

User_IP

The machine identification (IP Address) from which the user is accessing the system

Note- BIDS Solutions encompass the proprietary solutions from TCS covering Business Intelligence and Data Warehousing landscape.

 

   Data Warehouse Metadata Data Warehouse Data Quality assurance  
 
All Topics in: "DW Design & Architecture" Chapter
 Data Warehouse Design and Architecture Overview →  Data Warehouse Source Systems →  Data Warehouse ETL Extraction →  Data Warehouse ETL Transformation →  Data Warehouse ETL Loading →  Data Warehouse Metadata →  Back-Room Data Warehouse Metadata →  Data Warehouse Data Quality assurance →  Data Warehouse job control and audit →  Data Warehouse sharing and browsing →  Data Warehouse Infrastructure → 
 
Relevant Links to this page
TOPIC - Metadata Management definition - What is metadata? → Master Data Management vs. Business intelligence vs. Metadata → Metadata Architecture Design → Metadata standards → 

Was this page helpful?
If you like it ? share it !
Digg
Digg
Reddit
Reddit
Del.icio.us
Delicious
Google
Google
Live
Live
Facebook
Facebook
Slashdot
Slashdot
Netscape
Netscape
Technorati
Technorati
Stumbleupon
Stumbleupon
Spurl
Spurl
Furl
Furl
Blogmarks
Blogmarks
Yahoo
Yahoo
Plugim
Plugim
Squidoo
Squidoo
BlinkBits
BlinkBits
 
CONTENT ZONE
Data-Warehouse/Mart

Featured Pages
Derived Dimension Attributes Table
There is no perfect ETL
Enterprise Intelligence' Evaluation- Data Pipelines
Data Warehouse Testing Categories

Make 'Executable' Strategy
Maximize Results
Maximize People
Manage Execution

Featured Pages
Data Warehouse Testing is Different
Minimize aggregates if using OLAP
Data Warehouse ETL Transformation
Null foreign Keys in fact tables