Slowly changing dimensions or SCD’s
are dimensions that changes slowly over time, rather than regular bases. In data warehouse environment, we need to keep
track of the change in dimension values that can be used to report historical data
at any given point of time.
There are many approaches are
being used to deal with slowly changing
dimensions , can be referred as SCD’s henceforth.
Read More on SCD here: Slowly Changing Dimensions (SCD)
Read More on SCD here: Slowly Changing Dimensions (SCD)
- Type 0: Always retains original
- Type 1 : Keeps latest data, old data is overwritten
- Type 2 : Keeps the history of old data by adding new row
- Type 3 : Adds new attribute to store changed value
- Type 4 : Uses separate history table
- Type 6 : combination of type 1, 2 and 3
Type 0: Always retains original
The type 0 is passive method. There is no special action is performed on
dimensional changes. Values remain as they were at the time the dimension table
record was first inserted, others may be overwritten.
Type 1 : Keeps latest data, old data is overwritten
This method overwrites the old
data, hence doesn’t keep historical data.
Consider the patient table
SK
|
Pat_ID
|
Name
|
Cellphone
|
100
|
1
|
ABC
|
1234567890
|
In the above table, pat_ID is the
natural key and SK is surrogate key column. Technically, SK column is not
required as natural key uniquely identifies the records. Tables will be joined by using this key column
and this column is unique key.
If the patient changes his/her
cellphone number, old data is overwritten.
SK
|
Pat_ID
|
Name
|
Cellphone
|
100
|
1
|
ABC
|
9876543210
|
The main disadvantages of the
type 1 method is there is no history data in data warehouse. However, it can be
easily maintained and reduces the data warehouse size.
Type 2 : Keeps the history of old data by adding new row
This method keeps track of
historical data by adding multiple rows of given key columns in dimension table.
Some SCD type 2 implementations uses effective from and to date with flag
indicating latest record. Learn about SCD Type 2 design SCD Typ2 Design and Development
SK
|
Pat_ID
|
Name
|
Cellphone
|
EFF_FR_DT
|
EFF_TO_DT
|
flag
|
100
|
1
|
ABC
|
1234567890
|
2016-06-01
|
2016-06-10
|
0
|
101
|
1
|
ABC
|
9876543210
|
2016-06-10
|
NULL
|
1
|
Flag = 1 and EFF_TO_DT = NULL indicates
current or latest tuple versions. In some implementation, data modeller uses
future date (9999-12-31) as effective to date. In case if you want to track the historical
records, you have to select all records which are non-null and flag = 0 values.
Adding new records in changes to
the dimensional model in type 2 could be very expensive database operation so
it is not recommended to use it in dimensions where a new attribute/columns
could be added to table in the future.
Type 3 : Adds new attribute to store changed value
This method of SCD keeps the limited
history of the changed data in form of new column. The history preservation is
limited to the number of columns
designed for storing historical data. The original table structure in type 1
and 2 is same , but type 3 adds new column to store current data.
For example, in the following
example cellphone number changes and additional column is added to hold previous
and current cellphone numbers.
SK
|
Pat_ID
|
Name
|
Previous_Cellphone
|
Current_Cellphone
|
100
|
1
|
ABC
|
1234567890
|
9876543210
|
The only disadvantages is this
method keeps limited history about changed data. In above table, new column
current_cellphone and previous_cellphones are added to hold new data and
changed data respectively.
Type 4 : Uses separate history table
Type 4 usually uses the separate table
to hold history data, where as original table keeps the current data , and
other table is created to store the changed data. Surrogate key is added to both tables to and
is used in fact table to identify original and history data.
Patient table
SK
|
Pat_ID
|
Name
|
Cellphone
|
101
|
1
|
ABC
|
9876543210
|
Patient History table
SK
|
Pat_ID
|
Name
|
Cellphone
|
CRT_DT
|
100
|
1
|
ABC
|
1234567890
|
2016-06-01
|
101
|
1
|
ABC
|
9876543210
|
2016-06-10
|
Type 6 : combination of type 1, 2 and 3
The Type 6 method combines the approaches of types 1, 2
and 3. In this type, below are the changes to dimension table:
Current_record – keeping the
current data
Historical_record – keeping historical
or changed data
Effective_from_date – keeping start
date of record, null for new records
Effective_to_date –keeping end of
expired record
Curr_flag – indicates current
records
This type capture attribute
change, new row is added as in type 2. The old information is overwritten
with the new data as in type 1. And history preserved in a historical_record as in type 3.
Below is type 6 table of the
following example
Pat_ID
|
Name
|
Current_Cellphone
|
Historical_Cellphone
|
EFF_FR_DT
|
EFF_TO_DT
|
flag
|
1
|
ABC
|
9876543210
|
2345678901
|
2015-01-01
|
2016-01-20
|
0
|
1
|
ABC
|
9876543210
|
1234567890
|
2016-01-21
|
Null
|
1
|
No comments:
Post a Comment