Sales Management Customer Relationship Human Resources Business Performance BI & Data Quality IT Tools & Vendors

Sign-in   Register
Establishing 'Making it Happen' as a 'Formal & Predictable' Discipline
   Foundation & Conformed Dimensions and Facts in Data Warehouse Dimensional Model  

ENCYCLOPEDIA→   Enterprise Intelligence  →   -  Data-Warehouse/Mart  →   -  DW Dimensional Modeling Concepts  → 

Slowly Changing Dimensions SCD in Dimensional Modeling

Dimensional model has to address some complex situations liked slowly changing dimensions.


Slowly Changing Dimensions

Entities change over time. Customer demographics, product characteristics, classification rules, status of customers etc. lead to changes in the attributes of dimensions. In a transaction system, many a times the change is overwritten and track of change is lost.

For example a source system may have only the latest customer PIN Code, as it is needed to send the marketing and billing statements. However, a data warehouse needs to maintain all the previous PIN Codes as well, because we need to track on how many customers move to new locations over what frequency.

A key benefit for Data Warehouse is to provide historical information, which is typically over-written (and thus lost)in the transaction systems. How to handle slowly changing dimensions in a Dimensional Model is a key determinant to that benefit.

There are three ways to handle the same:

Slowly Changing Dimension method 1 (In short SCD 1)

The way most of the source systems will handle it- Overwrite the attribute value. For example if a customer’s marital status has moved from 'Unmarried' to 'Married', we over-write 'unmarried' to 'Married'. Similarly, if an insurance policy status has moved from 'Lapsed' to 'Re-instated' the new status is over written on the old status. This is obviously done, when we are not analyzing the historical information.

Slowly Changing Dimension Method 2 (in short SCD 2)

This is the true-blue technique to deliver precise historical analysis. This is used, when there is more than one change in the attributes of an entity, and we need to track the date of change of the attribute.

In this method, a new record is added whereby the new record is given a separate identifier as the primary key. We cannot use the production key as the primary key here as it has not changed (Customer ID has remained the same, while the value of its attribute 'marital status' has changed). This new identifier is called the surrogate key.

Apart from adding a new record and providing a new primary (surrogate) key, the validity period for this new record is also added.

For example- You have a dimensional table with customer_ID '110002' with marital status as 'single'. Overtime, customer gets married and also moved to a new location. The customer dimension record will be:

Surrogate Key Customer ID Date Valid Marital Status Date of Birth City
1100021 110002 Sept 23, 2004 Single Jan8, 1982 Palo Alto
1100022 110002 Oct 25, 2005 Married Jan8, 1982 Palo Alto
1100023 110002 Nov 23, 2005 Married Jan8, 1982 San Francisco

Slowly changing dimension method 3 (SCD 3)

This is a mid-way between method 1 and method 2. Here we don’t add an additional record, but add a new field 'old attribute value'. However, this has limitations. This method has to know from the beginning on what attributes will change. This is because a new field/attribute has to be added in the design for every attribute, which can change. Secondly, attribute can change maximum once in the lifetime of the entity OR at least the lifetime of the data warehouse.

Surrogate Key Customer ID Marital Status Date of Birth City Marital Status Old City Old
1100021 110002 Married Jan8, 1982 San Francisco Single Palo Alto

NOTE – The term of 'Slowly changing dimension' is used because of it being a universally acknowledged term. However, the same methods will apply to fast changing dimensions as well.

Surrogate Keys as Primary keys of dimension tables

There is a best practice in dimensional model design to not to use the production primary key as the primary key for the dimension table. This goes against conventional logic, but has a reason.

Data Warehouse has a core need for maintaining historical information and how an entity has moved and changed shape through the passage of time. Typically Source Systems need for this kind of information is quite less. In case of historical tracking in source systems, these systems can have the luxury of using multiple-field primary key (including the key identifier of the entity plus date stamp). For example if an insurance policy is lapsed and after two months it becomes reinstated, one can use the primary key as a combination of Policy number+ date/time +the status in the 'policy history table'. However, Data Warehouse doesn’t recommend the luxury of using multiple field primary key in dimension table.

Therefore, the concept 'surrogate Key' comes into play where the primary key is not the production key, but a key generated by the system. The production key is also used as an attribute within the same dimension table.

The situations/reasons on when a surrogate key is used:

  • 'Slowly changing dimensions'
  • When the primary key itself is repeated.
  • When there is a multiple field primary key. Dimension model typically does not use multiple field primary key to link to the fact table.

Therefore it is always recommended to use surrogate keys. it is difficult to find the organizations, which will not face the situations as highlighted above. If there are, they could as well manage their needs using excel and pivot tables.

 

   Foundation & Conformed Dimensions and Facts in Data Warehouse Dimensional Model  
 
 

Was this page helpful?
 
 
More on DW Dimensional Model Concepts
DW Dimensional Model Components Concept
Schemas- Star, Snow-Flake and Constellation
Dimensional Modeling vs. Relational Modeling
Foundation & Conformed Dimensions and Facts
BUY BI & Data Management Vendors & Tools Evaluation Kit
Read more...
BUY largest on-line Data-Quality Management Kit
Read more...
Additional Channels
Principles & Rules
Free Templates
Glossary
Key Performance Indicators



Most Popular Zones with list of pages crossing 25000 hits  →→→ 
Maximising Sales Performance
Sales Channel Retention, Support and Engagement
Sales Campaign Management
Sales Channel Mix Profitability
Data Management in Sales Campaign
Sales Material logistics and Distribution
Read more...
  Customer Relationship Management
What is Customer Segmentation?
Customer Segmentation Data Management
Supply Chain for Customer Service and Support
Customer Segmentation approach
Customer Service and Support Overview
Read more...
  Human Resources & Leadership
What is Leadership?
Lead Change
Strategic Business Plan
Competencies Definitions
Be straight and blunt, till you team gets used to it
Read more...
 
 
Business Performance & Planning
Stakeholder test for Scorecard
Business Objectives Drill Down
Shifting the mind-set to leading Indicators- KPIs
A KPI should be simple -but it depends
SWOT Assessment Report
Read more...
  Business Intelligence & Data Quality
Sponsor for a Data Quality Program
Data Quality Gaps Root Cause Analysis
Customer Data Variations
What is MDM-CDI?
BI & Data Warehouse- End User Tools
Read more...
  IT Vendors & Tools Management
Load, Log and Cache Management for Reports
Metadata Tool administration Security
Data Quality Tools Integration
Metadata Repository sharing
Vendor Management strength Evaluation
Read more...