Sales Management Customer Relationship Human Resources Business Performance BI & Data Quality IT Tools & Vendors

Sign-in   Register
Establishing 'Making it Happen' as a 'Formal & Predictable' Discipline
   Data Warehouse ETL Transformation Data Warehouse Metadata  

ENCYCLOPEDIA→   Enterprise Intelligence  →   -  Data-Warehouse/Mart  →   -  DW Design & Architecture  → 

Data Warehouse ETL Loading

Loading the data-sets into the data warehouse repository, to ensure that loading happens using minimum system resources and fastest possible time.

Data Loading starts after the data sets are ready in the Data Staging in the presentation server. This is considered to be a straightforward process. The key consideration in the Loading process is to achieve the speed of Loading. This is achieved by using various methods:

Turn off Logging during loading

If the logging is on, the system will create the transaction log with every write action it will perform on the database. By turning off logging, one can save on this overhead.

Drop and recreate Indexes

Like logging, indexing also has an associated overhead with most of the write operations in database. One can drop the index and recreate the same after Loading is done. This will make the indexing a batch process instead of an activity done with every new record added into the system. However, this trick is relevant, if you are adding a high proportion (say 10%..) to the existing tables. If you are adding only a small %age say 2% of the size of the existing tables, it will take more time to re-create the index.

Pre-Sort the File as per the primary key index for Data Warehouse loading

This is a good practice as it speeds the up the indexing process tremendously. This is among the most recommended techniques. If you also need to create indexes on other fields (which are not primary key), you can drop and re-create them later.

Take a balanced view on the number of indexes.

This is a concept, which is as much applicable for source systems as it is for the Data warehouse. Too many indexes will increase the Loading time, and too few will increase the user access time.

Use the append (new OR updated values, tables) instead of Full Refresh

This is a conventional wisdom that adding new records will take less time compared to rebuilding the table from scratch. However, if there are too many new records to be added, and volume is high, it is faster to drop and re-create.

Limit the frequency of updation

There are opportunities (though not many.) when you may update certain data marts, OR sections of Data Warehouse by less than Daily frequency. This will save a lot of time, if Data Warehouse is used for analytics, modeling OR score carding, which does not necessarily need daily updation.

Parallelize table load and index maintenance

Dropping indices and bulk Loading in parallel can significantly improve Loading time. All the known data warehouse platforms provide this capability.

Stream Data ETL

Certain ETL tools will allow you to extract, transform, and load in one process. That is, it is not necessary to create intermediate files. However, this is a new concept and works only for some very fire-wall segregated staging processes, which are complete on their own. Moreover, this will work when the transformation is simplistic.

Take a balanced view on integrity checks

If your quality assurance routines over the days and months are giving clean chit to the Loading process, one can reduce these checks as they place an extra load. Given your level of comfort, this can be done post-facto.

 

   Data Warehouse ETL Transformation Data Warehouse Metadata  
 
 

Was this page helpful?
 
 
More on DW Design & Architecture
DW Design and Architecture Overview
Data Warehouse Source Systems
Data Warehouse ETL Extraction
Data Warehouse ETL Transformation
Data Warehouse Metadata
Back-Room Data Warehouse Metadata
Data Warehouse Data Quality assurance
Data Warehouse job control and audit
Data Warehouse sharing and browsing
Data Warehouse Infrastructure
BUY BI & Data Management Vendors & Tools Evaluation Kit
Read more...
BUY largest on-line Data-Quality Management Kit
Read more...
Additional Channels
Principles & Rules
Free Templates
Glossary
Key Performance Indicators



Most Popular Zones with list of pages crossing 25000 hits  →→→ 
Maximising Sales Performance
Sales Objectives Clarity
Sales force Training and Development
Sales Leads Generation through advertising
Sales Synergies
Sales Compensation components
Read more...
  Customer Relationship Management
Customer Segmentation approach
Supply Chain for Customer Service and Support
Drivers for Customer Satisfaction & Retention
What is Customer Segmentation?
Customer Segmentation Actions
Read more...
  Human Resources & Leadership
Setting Strategic Intent and Alignment
Developing Leaders- Few Leadership Traits
Lead Change
Empower Front-line Employees
Strategic Business Plan
Read more...
 
 
Business Performance & Planning
Financial Business Plan
strategy blueprint Rationalize Align and Publish
Stakeholder test for Scorecard
Strategy Blueprint Information Gathering
Never design performance systems for specific KPI
Read more...
  Business Intelligence & Data Quality
Business Intelligence organization roles
BI Cost-Reduction- Contracts
Handling Sparse Dimensional tables
Knowledge Discovery in Databases Process
Fact tables to record history
Read more...
  IT Vendors & Tools Management
Multi Cube OLAP Architecture
Load, Log and Cache Management for Reports
Metadata Repository sharing
Vendor Commercial Evaluation- pre Implementation
Vendor Delivery Support Model
Read more...