Dimensions in data
warehousing contain relatively static data about entities
such as customers, stores, locations etc. Slowly changing dimensions commonly
known as SCD, usually captures the data that changes slowly but unpredictably,
rather than regular bases. Slowly
changing dimension is most popular method used in dimensional modelling to
preserve historical data.
For example,
lets take the example of patient details. The fact table may contains the
information about patient expense details. The fact and dimensions are always
linked by means of foreign keys. One of the dimension may contain the
information about patient (say, patient dimension ) . The patient information
will not change on day to day bases. It’ll change once in a while (slowly changes).
On In depth Implementation read here: Design and Implement Slowly Changing Dimension Type 2 (SCD Type2) in SQL
On In depth Implementation read here: Design and Implement Slowly Changing Dimension Type 2 (SCD Type2) in SQL
If you want to
implement the SCD type 2 in SQL (without ETL tools), it’s gonna take bit longer
route but you’ll end up with best feeling in world of implementing SCD Type 2.
Okay!. Let’s get
started with building SCD Type 2 on patient dimension table.
Let us consider
the patient details table which has attributes like SK or PAT_ID, Phone number,
name etc. Here we need to capture the updated phone number or name and store that
historical data in same table.
In data warehouse environment, usually tables are
divided into three tables viz. LOAD (just a data dump) ,
ITERMEDIATE (transformed data) and TARGET (replica of TARGET) tables, will be
referred as LOAD, INT and TGT henceforth. In this SCD type 2 implementation , we will be
using all these three tables.
Below are the
content of LOAD and TARGET table at the time building SCD Type 2.
LOAD : PAT_DTLS_LOAD - Hold new and updated data. Consider we received data as below and we have to load this data into patient dimension which has SCD Type2 implemented on it.
SK
|
PHONE_NO
|
NAME
|
123
|
987654321
|
Jhoney
|
234
|
876543210
|
Stuart
|
567
|
345678901
|
Max Stuart
|
456
|
765432101
|
Jeff
|
TARGET: PAT_DTLS_TGT- Hold actual patient data – This is actual initial load i.e. there are
no patient who updated their data yet.
SK
|
PHONE_NO
|
NAME
|
EFF_FR_DT
|
EFF_TO_DT
|
FLAG
|
345
|
345678901
|
Jhoney Stuart
|
2016-06-01
|
NULL
|
1
|
234
|
234567890
|
Stuart
|
2016-06-01
|
NULL
|
1
|
123
|
123456789
|
Jhoney
|
2016-06-01
|
NULL
|
1
|
567
|
345678901
|
Max Stuart
|
2016-06-01
|
NULL
|
1
|
INT: PAT_DTLS_LOAD_INT- Table to hold intermediate results
Now we are all
set with tables and data. Let’s get started with detailed steps...
Step 1: Load expired records from TGT to INT table
Copy all expired records INT table . This step just copying all records with non-null EFF_TO_DT and flag
= 0 records. Since there are no expired records already in database, hence
there won’t be any records in intermediate table.
Query: Below query can be used to perform this task
INSERT INTO PAT_DTLS_LOAD_INT
SELECT * FROM PAT_DTLS_TGT
WHERE FLAG = 0 and EFF_TO_DATE is not null;
Result: In first run there will not be any expired records
SK
|
PHONE_NO
|
NAME
|
EFF_FR_DT
|
EFF_TO_DT
|
FLAG
|
Step 2: Get all records which are going to expire
These are the
records that are updated and are
recieved in LOAD table. Since these records are expired, You also need set the
EFF_TO_DT to the ‘current_date-1’ that means record was active till yesterday
and flag as ‘0’. The updated records will be active from the moment we load them so EFF_FR_DT will be current date
Query: Below query is very helpful to perform this task
INSERT into PAT_DTLS_LOAD_INT
SELECT TGT.SK ,
TGT.PHONE_NO,
TGT.NAME,
EFF_FR_DT,
now() - 1 as EFF_TO_DT,
'0' as flag
FROM PAT_DTLS_TGT TGT
WHERE TGT.flag = 1
AND EXISTS (SELECT 1 FROM
PAT_DTLS_LOAD SRC
WHERE TGT.sk = src.sk
and (
TGT.PHONE_NO <>
src.PHONE_NO
or TGT.NAME <> src.NAME )
);
Result: Results shows expired records. Updated version of
these records are available in LOAD table
SK
|
PHONE_NO
|
NAME
|
EFF_FR_DT
|
EFF_TO_DT
|
FLAG
|
123
|
123456789
|
Jhoney
|
2016-06-01
|
2016-06-12
|
0
|
234
|
234567890
|
Stuart
|
2016-06-01
|
2016-06-12
|
0
|
Step 3: Copy active records from TGT to INT table
These are the records
which are active and not updated. Since target table is going to be the truncate
and load, You will have to copy all active records to intermediate table.
Query: Below query can
perform the mentioned task
INSERT INTO PAT_DTLS_LOAD_INT
SELECT TGT.SK ,
TGT.PHONE_NO,
TGT.NAME,
EFF_FR_DT,
EFF_TO_DT,
FLAG
FROM PAT_DTLS_TGT TGT
WHERE FLAG = 1
AND NOT EXISTS (SELECT 1 FROM
PAT_DTLS_LOAD SRC
WHERE TGT.SK
= SRC.SK )
UNION ALL --union all is used to includeany unchaged records we got
inpresent cycle that are already present in target table
SELECT TGT.SK ,
TGT.PHONE_NO,
TGT.NAME,
EFF_FR_DT,
EFF_TO_DT,
FLAG
FROM PAT_DTLS_TGT TGT
WHERE TGT.FLAG = 1
AND EXISTS (SELECT 1 FROM
PAT_DTLS_LOAD SRC
WHERE TGT.SK = SRC.SK
AND (
TGT.PHONE_NO = SRC.PHONE_NO
AND TGT.NAME = SRC.NAME )
);
Results: Highlighted records are active records copied from
TGT table.These are the records which not updated in this load cycle
SK
|
PHONE_NO
|
NAME
|
EFF_FR_DT
|
EFF_TO_DT
|
FLAG
|
123
|
123456789
|
Jhoney
|
2016-06-01
|
2016-06-12
|
0
|
234
|
234567890
|
Stuart
|
2016-06-01
|
2016-06-12
|
0
|
345
|
345678901
|
Jhoney Stuart
|
2016-06-01
|
NULL
|
1
|
567
|
345678901
|
Max Stuart
|
2016-06-01
|
NULL
|
1
|
Step 4: Copy only updated records from LOAD table
These are
records which are updated in this load cycle. Since updated records are already
in the LOAD table, you have to compare
the LOAD and TGT records on Patient ID or SK column and copy the record which
is updated. You should also keep in mind to set EFF_FROM_DT and flag to
indicate active record. The updated records will be active from the moment we load them so EFF_FR_DT will be current date and EFF_TO_DT will be either NULL or future date.
Query: Below query can be used to perform task
SELECT SRC.SK ,
SRC.PHONE_NO,
SRC.NAME,
now(),
null,
1
FROM PAT_DTLS_LOAD src
WHERE EXISTS (SELECT 1
FROM PAT_DTLS_LOAD_INT INT1
WHERE src.sk = INT1.sk
AND flag = 0
AND NOT
EXISTS
(SELECT 1
FROM PAT_DTLS_LOAD_INT INT2
WHERE INT1.sk = INT2.sk
AND flag = 1 )
);
Results: Highlighted records are updated records. You can also
see the expired records in below intermediate results
SK
|
PHONE_NO
|
NAME
|
EFF_FR_DT
|
EFF_TO_DT
|
FLAG
|
123
|
123456789
|
Jhoney
|
2016-06-01
|
2016-06-12
|
0
|
123
|
987654321
|
Jhoney
|
2016-06-13
|
NULL
|
1
|
234
|
234567890
|
Stuart
|
2016-06-01
|
2016-06-12
|
0
|
234
|
876543210
|
Stuart
|
2016-06-13
|
NULL
|
1
|
345
|
345678901
|
Jhoney Stuart
|
2016-06-01
|
NULL
|
1
|
567
|
345678901
|
Max Stuart
|
2016-06-01
|
NULL
|
1
|
Step 5: Copy fresh records from LOAD to INT
At any given load cycle , you may receive either
updated or fresh records. In this step you are going to load the fresh records.
This is the simple step to compare the LOAD and TGT table on patient ID or SK
and copy the record which is new.
Query:Below query is useful to perform mentioned task
INSERT INTO PAT_DTLS_LOAD_INT
SELECT SRC.SK ,
SRC.PHONE_NO,
SRC.NAME,
NOW(),
NULL,
1
FROM PAT_DTLS_LOAD SRC
WHERE NOT EXISTS (SELECT 1
FROM PAT_DTLS_LOAD_INT INT
WHERE SRC.SK = INT.SK);
Results: The fresh record is highlighted.
SK
|
PHONE_NO
|
NAME
|
EFF_FR_DT
|
EFF_TO_DT
|
FLAG
|
123
|
123456789
|
Jhoney
|
2016-06-01
|
2016-06-12
|
0
|
123
|
987654321
|
Jhoney
|
2016-06-13
|
NULL
|
1
|
234
|
234567890
|
Stuart
|
2016-06-01
|
2016-06-12
|
0
|
234
|
876543210
|
Stuart
|
2016-06-13
|
NULL
|
1
|
345
|
345678901
|
Jhoney Stuart
|
2016-06-01
|
NULL
|
1
|
456
|
765432101
|
Jeff
|
2016-06-13
|
NULL
|
1
|
567
|
345678901
|
Max Stuart
|
2016-06-01
|
NULL
|
1
|
Step 6: Truncate and load TGT table.
Finally, you just need to truncate the TGT and
copy records from INT table.
Query: Below query can be used for this task
TRUNCATE TABLE PAT_DTLS_TGT;
INSERT INTO PAT_DTLS_TGT
SELECT * FROM PAT_DTLS_LOAD_INT;
Results: Since
TGT is replica of INT table. It hold all records which are available in INT
table
SK
|
PHONE_NO
|
NAME
|
EFF_FR_DT
|
EFF_TO_DT
|
FLAG
|
123
|
123456789
|
Jhoney
|
2016-06-01
|
2016-06-12
|
0
|
123
|
987654321
|
Jhoney
|
2016-06-13
|
NULL
|
1
|
234
|
234567890
|
Stuart
|
2016-06-01
|
2016-06-12
|
0
|
234
|
876543210
|
Stuart
|
2016-06-13
|
NULL
|
1
|
345
|
345678901
|
Jhoney Stuart
|
2016-06-01
|
NULL
|
1
|
456
|
765432101
|
Jeff
|
2016-06-13
|
NULL
|
1
|
567
|
345678901
|
Max Stuart
|
2016-06-01
|
NULL
|
1
|
Hope you have enjoyed reading this blog. Please
drop comment for feedback and any queries.
No comments:
Post a Comment