Sunday, September 20, 2015

SCD Type-2 with Effective Date and Incremental load logic





Here I’m taking a scenario where my dimension changes slowly and requires history to be maintained, with optimum load performance.
As we are trying to implement SCD Type-2, the source must have a primary key and non-redundant data.
I’m taking a sample employee table where the salary changes over a period of time and the rest of the attributes stay the same.
Below is the source table. It has two extra columns, “Created Date” and “Updated Date”; using these two attributes we can extract only the latest data and avoid unnecessary extraction.
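For reference, here is a minimal sketch of how such a source table could be created, assuming Oracle-style data types; EMPNO, ENAME, SAL and UPDATE_DATE match the columns used in the query override later, while CREATED_DATE and the column sizes are assumptions:

CREATE TABLE SRC_EMP_SCD2
(
    EMPNO        NUMBER(10)    PRIMARY KEY,   -- primary key, non-redundant data
    ENAME        VARCHAR2(50),
    SAL          NUMBER(10,2),
    CREATED_DATE DATE,                        -- when the row was first inserted
    UPDATE_DATE  DATE                         -- when the row was last changed
);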


And below is the target table, which has two extra columns, “Effective Start Date” and “Effective End Date”, used to identify the latest data in the target dimension table.
There is also an additional column, “Surrogate Key”, which is a unique number generated using a Sequence Generator.
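And a minimal sketch of the target dimension table, again assuming Oracle-style types; SURR_KEY, EMPNO, ENAME, SAL and EFF_END_DATE match the lookup override used later, while EFF_START_DATE and the column sizes are assumptions:

CREATE TABLE TGT_EMP_SCD2
(
    SURR_KEY       NUMBER(10)   PRIMARY KEY,  -- generated by the Sequence Generator
    EMPNO          NUMBER(10),
    ENAME          VARCHAR2(50),
    SAL            NUMBER(10,2),
    EFF_START_DATE DATE,                      -- when this version became active
    EFF_END_DATE   DATE                       -- 01/01/5000 while the version is active
);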



Now let’s start creating the mapping.
Step1: Import the source definition into the Source Analyzer using the DB username and password.



Step2: Import the target definition into the Target Designer, or create the target definition and use “Generate/Execute SQL” to create the relational table from Informatica.



Step3: Drag the source/target definitions into the Mapping Designer. Here we will do a query override to extract only the latest data, using the query below:

SELECT EMPNO,ENAME,SAL,UPDATE_DATE FROM SRC_EMP_SCD2 WHERE UPDATE_DATE>TO_DATE('$$SET_MAX_DATE','MM/DD/YYYY HH24:MI:SS')
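UPDATE_DATE is included in the select list so it can feed the SETMAXVARIABLE port in Step5. At run time the Integration Service substitutes the current value of $$SET_MAX_DATE into the override, so with a stored value of, say, 09/20/2015 10:30:00 the query issued against the source looks like this (the date value is only an illustration):

SELECT EMPNO, ENAME, SAL, UPDATE_DATE
FROM SRC_EMP_SCD2
WHERE UPDATE_DATE > TO_DATE('09/20/2015 10:30:00','MM/DD/YYYY HH24:MI:SS')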


Step4: Now create a Lookup T/R and select the target table as the base table for the lookup, to compare source and target data.
Here we need to compare source and target data based on the key column (EMPNO in this case) and look up the required columns, which brings the latest data from the target table into the lookup cache.
I’m using the query below as the Lookup SQL Override to get only the active data from the target dimension table:
SELECT SURR_KEY AS SURR_KEY,EMPNO AS EMPNO,ENAME AS ENAME,SAL AS SAL FROM TGT_EMP_SCD2 WHERE EFF_END_DATE>SYSDATE
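Since the active version of each employee carries the far-future end date 01/01/5000 (set in Step5), the same filter could also be written against that value explicitly; both versions return only the currently active rows:

SELECT SURR_KEY, EMPNO, ENAME, SAL
FROM TGT_EMP_SCD2
WHERE EFF_END_DATE = TO_DATE('01/01/5000','MM/DD/YYYY')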



Step5: Now create an Expression T/R and drag the Surrogate Key and SAL columns from the Lookup T/R and all columns from the Source Qualifier.
Here we need to create some additional ports for “Effective Start Date” and “Effective End Date”, and for marking a record inactive we will create “IN_ACT_RECORD” as a Date data type set to SYSDATE.
Before moving ahead, let’s create a mapping variable for the incremental load.



For the incremental load we assign a new max value to the mapping variable using “SETMAXVARIABLE”. Once the session runs successfully, the mapping variable holds the new value, and using this value we can extract only the latest data from the source.
Below is the logic for the additional ports:
UPDATED_DATE: SETMAXVARIABLE($$SET_MAX_DATE, UP_DT) to capture the latest update date for the next incremental extraction
Effective_Start_Date: SYSDATE
Effective_End_Date: TO_DATE('01/01/5000','MM/DD/YYYY') used to identify active and inactive data
IN_ACT_RECORD: SYSDATE used to close out the old record (it becomes the Effective_End_Date of the record being marked inactive)
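For the very first run, $$SET_MAX_DATE needs an initial value old enough to pull all source rows. This can be set as the variable’s initial value in the mapping, or seeded through a parameter file; a minimal sketch of such a parameter file entry is below (the folder, workflow and session names are just placeholders, not the ones used in this post):

[MyFolder.WF:wf_emp_scd2.ST:s_m_emp_scd2]
$$SET_MAX_DATE=01/01/1900 00:00:00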


Step6: Create a Router T/R to route data into different pipelines. For this we have to create different groups in the Router T/R:
Group1 (INSERT): ISNULL(SURR_KEY) for new data
Group2 (INSERT+UPDATE): NOT ISNULL(SURR_KEY) AND (SAL != SAL1) for updated data, where SAL is the source salary and SAL1 is the salary returned by the lookup


Step7: Create a Sequence Generator T/R to generate a unique number for the target dimension table, and map its NEXTVAL port to the SURR_KEY port of the target instances.


Step8: Create an Update Strategy T/R to update the existing record when data changes in the source table; in the Update Strategy Expression, mention “DD_UPDATE” to update the record in the target table.


Step9: Drag all required ports from the Router “INSERT” group into Target Instance1.
Then drag all required ports from the Router “INSERT_UPDATE” group into Target Instance2 to insert the new version of the existing record.
Now drag the SURR_KEY and IN_ACT_RECORD ports from the “INSERT_UPDATE” group through the Update Strategy into Target Instance3 for the update, mapping IN_ACT_RECORD to EFF_END_DATE so the old record is closed out.
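Conceptually, for each changed employee this update flow does something like the statement below: it closes out the old version, while the insert flow adds a new row for the same employee with a fresh surrogate key (this SQL is only an illustration of the effect; the surrogate key value is made up):

UPDATE TGT_EMP_SCD2
SET    EFF_END_DATE = SYSDATE   -- the old version is no longer active
WHERE  SURR_KEY = 101;          -- surrogate key of the outdated record (illustrative value)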
Step10: With this, the mapping is done. Now let’s create a session and provide all source, target and lookup connections.
Below is the created mapping:



Now let’s run the session; below is the target data after the first load.






 

Now modify a record in the source (making sure its “Updated_Date” column is refreshed) and rerun the session.

 

Verify the target table; here we can see the Effective_End_Date column updated with the current date, marking the old record as inactive.



Verify the persistent value of the mapping variable by right-clicking on the session; in the next run, the incremental extraction will use this persistent value.







Thursday, September 17, 2015

Using Normalizer Transformation, generating Extra Column


This scenario was discussed in a Facebook group, so I thought of putting together a solution.
Below was the actual requirement:
location,sales
mumbai,100,200,600
bengaluru,200,205,300
Kolkatta,150,100,200
hydrabad,300,250,200
and the required target was:
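Based on the month logic used later in the mapping (GCID 1 = JAN, 2 = FEB, 3 = MAR), the required target is along these lines:

LOCATION    MONTH   SALES
mumbai      JAN     100
mumbai      FEB     200
mumbai      MAR     600
bengaluru   JAN     200
bengaluru   FEB     205
bengaluru   MAR     300
Kolkatta    JAN     150
Kolkatta    FEB     100
Kolkatta    MAR     200
hydrabad    JAN     300
hydrabad    FEB     250
hydrabad    MAR     200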

Steps to create mapping
Step1: First, import/create the flat file definition in the Source Analyzer.


Step2: Create the target relational table definition in the Target Designer.
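A minimal sketch of what this target table could look like, assuming Oracle-style types; the table name TGT_LOC_SALES and the column sizes are assumptions, only the column names LOCATION, MONTH and SALES come from the mapping:

CREATE TABLE TGT_LOC_SALES
(
    LOCATION VARCHAR2(50),
    MONTH    VARCHAR2(10),
    SALES    NUMBER(10)
);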



Step3: Now let’s create a Normalizer transformation to convert columns into rows.
Edit the transformation to create ports under the Normalizer tab: “location” with occurrence ‘0’ and “sales” with occurrence ‘3’. After this, Informatica creates the default ports “GKID” and “GCID”.
As we know, if you create any column with an occurrence other than ‘0’, Informatica creates GKID and GCID ports corresponding to that column.
The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column. The generated column ID is an index for the instance of the multiple-occurring data. For example, if a column occurs four times in a source record, the Normalizer returns a value of 1, 2, 3, or 4 in the generated column ID based on which instance of the multiple-occurring data occurs in the row.
The naming convention for the Normalizer generated column ID is GCID_<occuring_field_name>.
GKID works as a generated key: a sequence number that stays the same for all output rows produced from a single source row.
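For example, the source row “mumbai,100,200,600” produces three rows out of the Normalizer. GCID_SALES tells you which occurrence each sales value came from, and GKID (shown here assuming the first source row gets key 1) stays the same for all rows generated from one source row:

location   sales   GCID_SALES   GKID
mumbai     100     1            1
mumbai     200     2            1
mumbai     600     3            1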



Step4: Now let’s create an Expression transformation to cleanse the data as well as to generate the new “month” column for the target. Pull the location, sales and GCID columns from the Normalizer T/R into the Expression and use the logic below:
Location: LTRIM(RTRIM(LOCATION))
SALES_0: TO_INTEGER(SALES)
MONTH: IIF(GCID_SALES=1,'JAN',IIF(GCID_SALES=2,'FEB','MAR'))


Step5: Drag and drop all output columns from the Expression to the target, then validate the mapping and save it.

Step6: Create the session and workflow, provide all connection strings, and run the workflow. Below is the output achieved from this mapping.
The target looks like this:




Please provide your comments below.