Sales Management Customer Relationship Human Resources Business Performance BI & Data Quality IT Tools & Vendors

  Register
Establishing 'Making it Happen' as a 'Formal & Predictable' Discipline
   Data Warehouse Source Systems Data Warehouse ETL Transformation  

ENCYCLOPEDIA→   Enterprise Intelligence  →   -  Data-Warehouse/Mart  →   -  DW Design & Architecture  → 

Data Warehouse ETL Extraction

Detailing the extraction of data from Source systems to staging database including the logic, sequence, timings and checks.

Data Warehouse Extraction Design- Concept

NOTE- While this page is part of Data Warehouse section, the concept and process can be equally applied in the areas of Master Data Management and Operational Data Store. Both MDM and ODS involve the use of Extraction, Transformation and Loading. The extent and intensity of transformation varies depending upon the purpose.

Reliable Source:

Ensuring that the data is pulled out from the correct and reliable 'master' source. For example-The customer ID and address may be available in three different systems, but it should pull it out of a source where it is most authentic and complete. The main reference for this purpose is Source Systems Mapping.

Data Consistency

Extraction programs pick data from tens (if not scores) of different sources. Making sure that all the data across these systems represent a single business world and not the different business world in different time warps. This subject also has been covered in Data Quality topic. Not all the systems have the date and time stamps in all the transactions (for saving overhead). Therefore sometimes the Extraction/Transformation services have to do this noble task. This is possible by:

  • The Extraction is done after all the interface and batch runs are complete. The consistency rules are typically same as that of production reports.
  • The time stamping to be done for all the extracted data, where dimensions are interlocked with the time dimensions (most of them will be)

Timely Availability

Ensuring that Data is made available for further processing at the right time and in the right form. For example – If the Transformation process (post Extraction) starts at 5 A.M. in the morning, the Extraction process should be able to pull out data from all the source systems before that, synchronized with their end of day batch runs etc.

Though looking simple, but this can be one of pretty cataclysmic challenges facing the monolithic systems. These large systems may belong to different technology groups and owned by different business organizations. One may find that extract routine windows keep on getting smaller with increasing production processing and enlarged production processing batches. You may think that offline databases as sources could be a solution, but the production reports generation is the first priority and challenge remains the same.

Completeness of Extraction for ETL

Ensuring that Data Extraction is well audited – The Extraction process should be able to run the quality checks to confirm that all the data has been extracted from all sources before giving a go-ahead to the Transformation activity. For example – These checks are done typically on the basis of last business transaction date OR the activity time stamp. For large Extraction routines, there are separate Extraction audit tables.

Quality of Extraction for ETL

Ensuring that Extraction process has been of quality- the process should ensure that there is a complete synch between the extracted data what was in production databases and the staging area. This can be done is many ways like running count and aggregation matches on critical fields across the staging and sources systems. You can refer interface controls for some of the checks you can place.

Preservation of Data

Ensure that Extraction Process preserve the data: The Data-Warehouse typically needs to maintain historical snapshots, which in source systems many a times are over written OR archive and also purged. There are no standard rules, but staging area typically should 'append' the extracted information, till the transformed data history is preserved either in staging OR in the final 'presentation/loaded' area.

In synch with business rules/System Logic

The common saying of 'there can be hundred answers to a question' applies perhaps most of data-warehousing project. An Extraction service can have many sources to choose from for the same set of data.

Extract programs typically take out the source data in the absolute raw form (the way it resides in the source systems) without any Transformation and also data quality checks. The key action between the source database and the extracted data is the filtration and selective Extraction. This is done mainly to reduce the Extraction time and also to avoid any unforeseen quality OR availability issues. You would not like to have an Extraction program stop because the agent code existing in the sales database is not available in the master table.

Data Warehouse Data Extraction- Process

Technical Specifications Data Extraction for ETL

You have identified the source system and which system will be the 'master' supplier of a dimension/fact. The next step is to make a list of the table and the fields you want to pick-up from each source. The dimensions and the fact table grains do not drive the Extraction routines. They are more driven by the source system. The source system mapping will give you the logical input to start writing the technical specifications of the Extraction routines. The technical specification routines (may of them are now being automated by state of the art 'Extraction tools':

The Sources systems for data extraction

The Table and field names, which have to be extracted, reference to dimensions and facts of the dimension models.

Data Extraction routing details containing:

  • The Extraction routine number
  • The Extraction routine details on the 'Extraction key' attributes, filters.
  • Any Transformations during Extraction.
  • The time of Extraction routines
  • The pre-Extraction routine checks
  • During Extraction routine checks
  • Post-Extraction routine checks

   Data Warehouse Source Systems Data Warehouse ETL Transformation  
Relevant Links to this page
Practice Tools → Source system mapping matrix → Practice Tools → Data Source Checklist → Practice Tools → Source to Target Table → 
All Topics in: "DW Design & Architecture" Chapter
 Data Warehouse Functional Technical Design and Architecture  →  Data Warehouse DW Data Source Systems →  Data Warehouse DW ETL Extraction →  Data Warehouse DW ETL Transformation →  Data Warehouse DW ETL Loading →  Data Warehouse Metadata →  Back-Room Data Warehouse Metadata Source Business security audit ETL →  Data Warehouse Data Quality assurance →  Data Warehouse DW job control and audit →  Data Warehouse Access and Browsing | Query Services Management →  Data Warehouse Infrastructure → 
More on DW Design & Architecture
DW Design and Architecture Overview
Data Warehouse Source Systems
Data Warehouse ETL Transformation
Data Warehouse ETL Loading
Data Warehouse Metadata
Back-Room Data Warehouse Metadata
Data Warehouse Data Quality assurance
Data Warehouse job control and audit
Data Warehouse sharing and browsing
Data Warehouse Infrastructure
BUY BI & Data Management Vendors & Tools Evaluation Kit
BUY largest on-line Data-Quality Management Kit
Additional Channels
Principles & Rules
Free Templates
Key Performance Indicators

Most Popular Zones with list of pages crossing 25000 hits  →→→ 
Maximising Sales Performance
Sales facility Infrastructure
Sales Leads Generation through Events
Sales Campaign Infrastructure
Sales Leads Generation through Point of Sale
telemarketing Sales Lead Generation
  Customer Relationship Management
Customer Segmentation approach
Customer Knowledge and Organizational Knowledge
Customer Value and Profitability Data Management
Customer Value and Profitability Tips and Actions
Customer Segmentation Data Management
  Human Resources & Leadership
Feedback does not mean only negative feedback
Empower Front-line Employees
Act with Decisiveness
Setting Strategic Intent and Alignment
Fitting leadership dimension in employee performance
Business Performance & Planning
Internal Info Assessment Report
Dashboard Health Checklist
Scorecard Health Checklist
Strategy Blueprint Information Gathering
Performance Review should have no surprises
  Business Intelligence & Data Quality
DMA Data flow Analysis
Data Group Master
Events in multiple currencies
Data Warehouse Requirements Assessment
BI Cost-Reduction- Consolidate Marts
  IT Vendors & Tools Management
Data Quality Tools Integration
OLAP Server Reliability
Connectivity and Computing Support
Report Delivery Management
Data Cleansing and Augmentation