SCD Type -2 with Effective Date and Incremental load logic
Here I’m taking scenario where my dimension is changing slowly
and it require history to be maintained, with optimum load performance.
As we are trying to implement SCD2, it is necessary for Source to have Primary key and non-redundant data.
Taking sample data employee table where salary is changing over
period of time and rest are the attributes are similar.
Below is a source table, here my source has two extra column “Created
Date” and “Updated Date” using this two attributes we can get latest data and
overcome from unnecessary extraction.
And below Target table which is having two extra column “Effective
Start Date” and “Effective End Date” to find out which is latest data into
target dimension table.
Another additional column from source “Surrogate Key” which is
uniquely generated number using Sequence generator.
Now let’s start creating mapping.
Step1: Import Source definition into Source Analyzer using DB
username and password
Step2: Import Target definition into Target designer or create
Target definition and using Generate execute SQL we can create relational table
from informatica.
Step3: Drag Source/Target definition into mapping Designer and
Here we will do Query override to extract latest data using below query
SELECT EMPNO,ENAME,SAL FROM SRC_EMP_SCD2 WHERE
UPDATE_DATE>TO_DATE('$$SET_MAX_DATE','MM/DD/YYYY HH24:MI:SS')
Step4: Now create Lookup T/R and select Target table as Base
table for lookup to compare data between Source and target data.
Here we need to compare source and target data based on key
columns (which is EMPNO in this case), and lookup on required columns which
will help you to get latest data into cache from target table.
I’m using below query to get active data from target dim table
in Lookup SQL Override
SELECT SURR_KEY AS SURR_KEY,EMPNO AS EMPNO,ENAME AS ENAME,SAL AS
SAL FROM TGT_EMP_SCD2 WHERE EFF_END_DATE>SYSDATE
Step5: Now create
Expression T/R and drag Surrogate Key and SAL column from lookup T/R and all
columns from Source Qualifier.
Here we need to create
some additional columns for “Effective Start Date” & “Effective End Date”
and for Inactive record here will create “IN_ACT_RECORD” as Date data type with
SYSDATE.
Before moving ahead let’s
create Mapping variable for incremental load.
For Incremental load we
have to assign new Max value to mapping variable using “SETMAXVARIABLE”, once
session runs successfully. Mapping variable will be assigned new value and
using this value we can extract latest data from source.
Here below are some
additional port logic
UPDATED_DATE : SETMAXVARIABLE($$SET_MAX_DATE,UP_DT)
for extract latest data
Effective_start_Date:
SYSDATE
Effective_End_Date: TO_DATE('01/01/5000','MM/DD/YYYY')
to identify active and Inactive data based on this column
IN_ACT_RECORD: SYSDATE to
mark active data
Step6: Create Router T/R
to route data into different pipelines for this we have to create different
group in Router T/R
Group1(INSERT):
ISNULL(SURR_KEY) for new data
Group2 (INSERT+UPDATE): NOT
ISNULL(SURR_KEY) AND (SAL != SAL1) for updated data
Step7: create sequence generator
T/R to generate unique number into Target Dim table and map this NEXTVAL port
to SURR_KEY port into Target instances
Step8: Create Update Strategy
T/R to update existing record if data got updated into source table, in update
Strategy Expression mention “DD_UPDATE” to update record into Target table.
Step9: Drag all required
ports from Router “Insert” Group into Target Instance1
And drag all required
ports from Router “INSERT_UPDATE” into Target Instance2 for existing record for
INSERT.
Now drag SURR_KEY and SAL
ports from “INSERT_UPDATE” into Target Instance3 for updating.
Step10: With this mapping
is done now let’s create session and provide all source, target and lookup connections.
Below is created mapping
Now let’s run session,
below is target data after first load
Now modify record into source with “Updated_Date” column and rerun session again
Verify target table, here
we can verify updated Effective_End_Date column with current date as inactive
record.
Verify persistence value in session by right clicking on session, In next run it will consider persistent value for Mapping variable.
No comments:
Post a Comment