Sales Management Customer Relationship Human Resources Business Performance BI & Data Quality IT Tools & Vendors

Sign-in   Register
Establishing 'Making it Happen' as a 'Formal & Predictable' Discipline
Principles and Rules Listing Page

Checksum Approach for identifying the changed records from source systems

With many records changing every day, it's a challenge for an integration system to identify the changed records out of millions of records existing in the source system.
 
This page of 'Principles and Rules' is linked to:  Data Warehousing, Data Analysis/OLAP, BI platform Tools Evaluation, BI business intelligence end-to-end view,

Most of the source system transactional tables store the new activities as additional records (append). However the master tables (Customer s, products, suppliers..) etc are mostly "updated" with changes like activation status, addresses, marital status (refer slowly changing dimensions ), over-writing the previous/historical information. For a large company with millions of such changes happening every day, an ETL has a big challenge to identify the records which have changed. There are two ways to handle it:

  • Refer the transaction log of the source system:

    Even mediocre systems have transaction logging capability in their database systems, which keep a track of all the updates done to the tables. The ETL system can scan the log and find out the master (or for that matter any) table which has changed. It goes to those records and picks them for processing.

    However here can be the issues with this approach when the database administrators turn-off transaction logging due to performance overhead reasons during peak periods, or due simple lack of skills.

  • Use the check-sum approach:

    There are many check-sum algorithms, which are used to compute a unique number using the contents of a record. The next day when the ETL picks up the master tables, it can calculate the check-sum of all the records and compare it with previous day check-sum for the corresponding records. If there is a mis-match, it means that something in the record has changed. This will allow ETL to quickly filter out the changed records and then do field by field comparison of this short-listed set of records.


Quick Feedback- Was this information helpful ?
Relevant Links to this page
Practice Tools → Source system mapping matrix → Practice Tools → Data Source Checklist → Principles & Rules → Dimensional model has to be aligned to the Entity-Relationship → Principles & Rules → Always Use Conformed Dimensions → Principles & Rules → You may not be a able to have a perfect ETL → Practice Techniques → Handling Sparse Dimensional tables → Principles & Rules → Do not separate the parent and child line item data → Practice Techniques → Managing time-stamps across multiple time-zones → Practice Techniques → Recording events in multiple currencies → Practice Techniques → Handle different units of measure in the same fact table → Principles & Rules → Handling of Null foreign Keys in fact tables → Principles & Rules → Dimension Attributes as NULL → Principles & Rules → Don't rely too much on Meta Data Tools to enforce Business Intelligence → Principles & Rules → Don't wait for universal models for Data Marting → Principles & Rules → Add extra buffer for ETL phase → Principles & Rules → Homework before interviews is must (Business Requirements Phase in Data Warehouse) → Principles & Rules → Excel is the competition, which should be challenged → Principles & Rules → Avoid Pure MOLAP → Practice Techniques → Field Tips Series- Streamlining & Cost-Reduction in Business Intelligence- Consolidate Data-Marts → Practice Techniques → Field Tips Series- Streamlining & Cost-Reduction in Business Intelligence- Licensing & Maintenance Contracts → Practice Techniques → Field Tips Series- Streamlining & Cost-Reduction in Business Intelligence- Governance & Standards → Practice Techniques → Field Tips Series- Streamlining & reducing cost of Business Intelligence- Evaluate Open Source → Principles & Rules → Master Data Management- Making a Right Start → Practice Techniques → How to integrate stand-alone BI environments- Gradual Approach → Principles & Rules → Business owned applications are a reality- Manage it → Principles & Rules → New Data Standards- What about existing data and applications? → Principles & Rules → Handle Each Time-stamp in the Fact Table as a separate dimension → Principles & Rules → Keep Aggregates and Details data in different Fact tables → Principles & Rules → Some considerations for Infrastructure in Data Warehouse → Principles & Rules → For Core BI platform go for a single, established and robust player → Principles & Rules → Don't be guided only by the business requirements for your Business Intelligence → Practice Techniques → Using Synonyms and Views → 
 
Back
 
Relevant links to this page
Source system mapping matrix
Data Source Checklist
Dimensional model has to be aligned to the Entity-Relationship
Always Use Conformed Dimensions
You may not be a able to have a perfect ETL
Handling Sparse Dimensional tables
Do not separate the parent and child line item data
Managing time-stamps across multiple time-zones
Recording events in multiple currencies
Handle different units of measure in the same fact table
Handling of Null foreign Keys in fact tables
Dimension Attributes as NULL
Don't rely too much on Meta Data Tools to enforce Business Intelligence
Don't wait for universal models for Data Marting
Add extra buffer for ETL phase
Homework before interviews is must (Business Requirements Phase in Data Warehouse)
Excel is the competition, which should be challenged
Avoid Pure MOLAP
Field Tips Series- Streamlining & Cost-Reduction in Business Intelligence- Consolidate Data-Marts
Field Tips Series- Streamlining & Cost-Reduction in Business Intelligence- Licensing & Maintenance Contracts
Field Tips Series- Streamlining & Cost-Reduction in Business Intelligence- Governance & Standards
Field Tips Series- Streamlining & reducing cost of Business Intelligence- Evaluate Open Source
Master Data Management- Making a Right Start
How to integrate stand-alone BI environments- Gradual Approach
Business owned applications are a reality- Manage it
New Data Standards- What about existing data and applications?
Handle Each Time-stamp in the Fact Table as a separate dimension
Keep Aggregates and Details data in different Fact tables
Some considerations for Infrastructure in Data Warehouse
For Core BI platform go for a single, established and robust player
Don't be guided only by the business requirements for your Business Intelligence
Using Synonyms and Views
Additional Channels
Principles & Rules
Free Templates
Glossary
Key Performance Indicators

Most Popular Zones with list of pages crossing 25000 hits  →→→ 
Maximising Sales Performance
Sales force density
Sales Revenue Management
Sales Objectives Clarity
Sales Leads Management System
Sales Campaign Infrastructure
Read more...
  Customer Relationship Management
Customer Value and Profitability-Overview
What is Customer Segmentation?
Exit barriers for Customer Retention
Customer Service and Support Overview
Customer Segmentation Parameters
Read more...
  Human Resources & Leadership
Business and Financial Acumen
Customer Focus
Roles and Level based Competency Segregation
Maximize the output first and then the potential
Feedback does not mean only negative feedback
Read more...
 
 
Business Performance & Planning
SWOT Analysis in Strategic blueprint Planning
Strategy Map to Strategic theme
SWOT Assessment Report
Individual goal Sheet
Stakeholder test for Scorecard
Read more...
  Business Intelligence & Data Quality
Normalization in Dimensional modeling
Dimensional Modeling vs. Relational Modeling
Master Data Management
Beware of Data Federation
What is Data Warehouse?
Read more...
  IT Vendors & Tools Management
End User Reporting Features
Data Quality Tools Integration
OLAP Security
Collaboration and Administration Support
Vendor Delivery Evaluation Training
Read more...