Sunday, 12 June 2016

Data Warehouse Slowly Changing Dimensions (SCD)

Slowly changing dimensions or SCD’s are dimensions that changes slowly over time, rather than regular bases.  In data warehouse environment, we need to keep track of the change in dimension values that can be used to report historical data at any given point of time.
There are many approaches are being used to deal  with slowly changing dimensions , can be referred as SCD’s henceforth.

Read More on SCD here: Slowly Changing Dimensions (SCD)
  • Type 0: Always retains original
  • Type 1 : Keeps latest data, old data is overwritten
  • Type 2 : Keeps the history of old data by adding new row
  • Type 3 : Adds new attribute to store changed value
  • Type 4 : Uses separate history table
  • Type 6 : combination of type 1, 2 and 3

Type 0: Always retains original

The type 0 is passive method.  There is no special action is performed on dimensional changes. Values remain as they were at the time the dimension table record was first inserted, others may be overwritten.

Type 1 : Keeps latest data, old data is overwritten

This method overwrites the old data, hence doesn’t keep historical data.
Consider the patient table

SK
Pat_ID
Name
Cellphone
100
1
ABC
1234567890

In the above table, pat_ID is the natural key and SK is surrogate key column. Technically, SK column is not required as natural key uniquely identifies the records.  Tables will be joined by using this key column and this column is unique key.
If the patient changes his/her cellphone number, old data is overwritten.

SK
Pat_ID
Name
Cellphone
100
1
ABC
9876543210

The main disadvantages of the type 1 method is there is no history data in data warehouse. However, it can be easily maintained and reduces the data warehouse size.

Type 2 : Keeps the history of old data by adding new row

This method keeps track of historical data by adding multiple rows of given key columns in dimension table. Some SCD type 2 implementations uses effective from and to date with flag indicating latest record. Learn about SCD Type 2 design  SCD Typ2 Design and Development

SK
   Pat_ID
Name
Cellphone
EFF_FR_DT
EFF_TO_DT
flag
100
1
ABC
1234567890
2016-06-01
2016-06-10 
101
1
ABC
9876543210
2016-06-10  
NULL 

Flag = 1 and EFF_TO_DT = NULL indicates current or latest tuple versions. In some implementation, data modeller uses future date (9999-12-31) as effective to date.  In case if you want to track the historical records, you have to select all records which are non-null and flag = 0 values.  
Adding new records in changes to the dimensional model in type 2 could be very expensive database operation so it is not recommended to use it in dimensions where a new attribute/columns could be added to table in the future.

Type 3 : Adds new attribute to store changed value

This method of SCD keeps the limited history of the changed data in form of new column. The history preservation is limited to the  number of columns designed for storing historical data. The original table structure in type 1 and 2 is same , but type 3 adds new column to store current data.
For example, in the following example cellphone number changes and additional column is added to hold previous  and current cellphone numbers.
SK
Pat_ID
Name
Previous_Cellphone
Current_Cellphone
100
1
ABC
1234567890
9876543210

The only disadvantages is this method keeps limited history about changed data. In above table, new column current_cellphone and previous_cellphones are added to hold new data and changed data respectively.

Type 4 : Uses separate history table

Type 4 usually uses the separate table to hold history data, where as original table keeps the current data , and other table is created to store the changed data.  Surrogate key is added to both tables to and is used in fact table to identify original and history data.


Patient table
SK
   Pat_ID
Name
Cellphone
101
1
ABC
9876543210

Patient History table
SK
   Pat_ID
Name
Cellphone
CRT_DT
100
1
ABC
1234567890
 2016-06-01
101
1
ABC
9876543210
 2016-06-10

Type 6 : combination of type 1, 2 and 3

The Type 6 method combines the approaches of types 1, 2 and 3. In this type, below are the changes to dimension table:
Current_record – keeping the current data
Historical_record – keeping historical or changed data
Effective_from_date – keeping start date of record, null for new records
Effective_to_date –keeping end of expired record
Curr_flag – indicates current records
This type capture attribute change, new row is added as  in type 2. The old information is overwritten with the new data as in type 1. And history preserved  in a historical_record as in type 3. 
Below is type 6 table of the following example

   Pat_ID
Name
Current_Cellphone
Historical_Cellphone
EFF_FR_DT
EFF_TO_DT
flag
1
ABC
9876543210
2345678901
 2015-01-01
2016-01-20 
 0
1
ABC
9876543210
1234567890
 2016-01-21 
 Null
 1


No comments:

Post a Comment