Building Making It Happen
Establishing Making-it-Happen as ‘Formal & Measurable’ Business Discipline
  Sign-in         Register
    
   Data Warehouse ETL Transformation Data Warehouse Metadata  

Execution-MiH Encyclopedia  →   Enterprise Intelligence  →  SECTION -  Data-Warehouse/Mart  →  CHAPTER -  DW Design & Architecture  → 

Data Warehouse ETL Loading

Loading the data-sets into the data warehouse repository, to ensure that loading happens using minimum system resources and fastest possible time.

Data Loading starts after the data sets are ready in the Data Staging in the presentation server. This is considered to be a straightforward process. The key consideration in the Loading process is to achieve the speed of Loading. This is achieved by using various methods:

Turn off Logging during loading

If the logging is on, the system will create the transaction log with every write action it will perform on the database. By turning off logging, one can save on this overhead.

Drop and recreate Indexes

Like logging, indexing also has an associated overhead with most of the write operations in database. One can drop the index and recreate the same after Loading is done. This will make the indexing a batch process instead of an activity done with every new record added into the system. However, this trick is relevant, if you are adding a high proportion (say 10%..) to the existing tables. If you are adding only a small %age say 2% of the size of the existing tables, it will take more time to re-create the index.

Pre-Sort the File as per the primary key index for Data Warehouse loading

This is a good practice as it speeds the up the indexing process tremendously. This is among the most recommended techniques. If you also need to create indexes on other fields (which are not primary key), you can drop and re-create them later.

Take a balanced view on the number of indexes.

This is a concept, which is as much applicable for source systems as it is for the Data warehouse. Too many indexes will increase the Loading time, and too few will increase the user access time.

Use the append (new OR updated values, tables) instead of Full Refresh

This is a conventional wisdom that adding new records will take less time compared to rebuilding the table from scratch. However, if there are too many new records to be added, and volume is high, it is faster to drop and re-create.

Limit the frequency of updation

There are opportunities (though not many.) when you may update certain data marts, OR sections of Data Warehouse by less than Daily frequency. This will save a lot of time, if Data Warehouse is used for analytics, modeling OR score carding, which does not necessarily need daily updation.

Parallelize table load and index maintenance

Dropping indices and bulk Loading in parallel can significantly improve Loading time. All the known data warehouse platforms provide this capability.

Stream Data ETL

Certain ETL tools will allow you to extract, transform, and load in one process. That is, it is not necessary to create intermediate files. However, this is a new concept and works only for some very fire-wall segregated staging processes, which are complete on their own. Moreover, this will work when the transformation is simplistic.

Take a balanced view on integrity checks

If your quality assurance routines over the days and months are giving clean chit to the Loading process, one can reduce these checks as they place an extra load. Given your level of comfort, this can be done post-facto.

 

   Data Warehouse ETL Transformation Data Warehouse Metadata  
 
All Topics in: "DW Design & Architecture" Chapter
 Data Warehouse Design and Architecture Overview →  Data Warehouse Source Systems →  Data Warehouse ETL Extraction →  Data Warehouse ETL Transformation →  Data Warehouse ETL Loading →  Data Warehouse Metadata →  Back-Room Data Warehouse Metadata →  Data Warehouse Data Quality assurance →  Data Warehouse job control and audit →  Data Warehouse sharing and browsing →  Data Warehouse Infrastructure → 
 

Was this page helpful?
If you like it ? share it !
Digg
Digg
Reddit
Reddit
Del.icio.us
Delicious
Google
Google
Live
Live
Facebook
Facebook
Slashdot
Slashdot
Netscape
Netscape
Technorati
Technorati
Stumbleupon
Stumbleupon
Spurl
Spurl
Furl
Furl
Blogmarks
Blogmarks
Yahoo
Yahoo
Plugim
Plugim
Squidoo
Squidoo
BlinkBits
BlinkBits
 
CONTENT ZONE
Data-Warehouse/Mart

Featured Pages
Minimize aggregates if using OLAP
Metadata Architecture Scenarios
Data Quality Program DMA
Data Warehouse Infrastructure Considerations

Make 'Executable' Strategy
Maximize Results
Maximize People
Manage Execution

Featured Pages
BI Performance Management- Setting the Context
Parallel Dimensional Hierarchy
Sponsor for a Data Quality Program
Data Mart + Dimensions +facts