Wednesday, August 26, 2015

Pushdown Optimization in Informatica

PUSHDOWN Optimization: The pushdown optimization technique in Informatica pushes transformation logic to the source or target database. When you run a session configured for pushdown optimization, the Integration Service analyzes the mapping and transformations to determine the transformation logic it can push to the database.
The Integration Service reads the transformation logic, converts it into SQL queries, and sends those queries to the database. The source or target database then runs the queries to process the transformation logic.
How much transformation logic can be pushed to the database depends on the database type, the transformation (T/R) logic, the mapping, and the session configuration.
We can preview the SQL queries and mapping logic that the Integration Service will push in the Pushdown Optimization Viewer.
Pushdown optimization lets processing that would otherwise run inside Informatica transformations be pushed to the RDBMS for better performance, which also minimizes data movement between servers.

Types of pushdown optimization
1. Source pushdown optimization: When a session is configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source definition toward the target definition to determine which transformation logic it can push to the source database. It then generates a SQL SELECT statement based on the transformation logic of each transformation it can push (see the illustrative SQL after point 3 below). When you run the session, the Integration Service pushes all transformation logic that is valid to push to the database by executing the generated SQL statement. Then it reads the results of this SQL statement and continues to run the session.
If a session contains an SQL override or lookup override, the Integration Service generates a view based on the SQL override. It then generates a SELECT statement and runs it against this view. Once the session completes, the Integration Service drops the view from the database.
2. Target pushdown optimization: When we run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target definition back toward the source definition to determine which transformation logic it can push to the target database. It then generates SQL INSERT, UPDATE, or DELETE statements based on the transformation logic of each transformation it can push, starting with the first transformation in the pipeline that it can push to the database. The Integration Service processes the transformation logic up to the point from which it can push the logic to the target database, and then executes the generated SQL.
3. Full pushdown optimization: The Integration Service pushes as much transformation logic as possible to both the source and target databases. If we configure a session for full pushdown optimization and the Integration Service cannot push all the transformation logic to the database, it performs partial pushdown optimization instead.
To implement full pushdown optimization, the source and target must be on the same database. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping starting with the source and analyzes each transformation in the pipeline until it reaches the target. It generates SQL statements that are executed against the source and target database based on the transformation logic it can push. If the session contains an SQL override or lookup override, the Integration Service generates a view and runs a SELECT statement against this view.
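
As a rough illustration only (not actual Integration Service output; the table, column, and filter names are made up), a Filter and an Expression pushed to an Oracle source might be folded into a single generated SELECT, and with full pushdown the whole pipeline might become one INSERT ... SELECT:

-- Source-side pushdown: filter and expression logic folded into one SELECT
SELECT emp.employee_id,
       UPPER(emp.last_name) AS last_name_upper,
       emp.salary * 12      AS annual_salary
FROM   employees emp
WHERE  emp.status = 'ACTIVE';

-- Full pushdown: source and target in the same database, one INSERT ... SELECT
INSERT INTO emp_target (employee_id, last_name_upper, annual_salary)
SELECT emp.employee_id,
       UPPER(emp.last_name),
       emp.salary * 12
FROM   employees emp
WHERE  emp.status = 'ACTIVE';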

Session configuration for pushdown optimization:
A session can be configured for pushdown optimization by editing the session properties, where the available options are "To Source", "To Target", "Full", and "$$PushdownConfig" (an illustrative $$PushdownConfig parameter file entry is shown after the options below). Once you choose any of these, the following related options become enabled:
  • Allow Temporary View for Pushdown - Allows the Integration Service to create temporary view objects in the database when it pushes the session to the database.
  • Allow Temporary Sequence for Pushdown - Allows the Integration Service to create temporary sequence objects in the database.
  • Allow Pushdown for User Incompatible Connections - Indicates that the database user of the active database has read permission on the idle databases.
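If you choose $$PushdownConfig, the pushdown option is taken from a mapping parameter named $$PushdownConfig at run time, which is usually set in the parameter file. The entry below is only a sketch: the folder, workflow, and session names are placeholders, and the exact value strings should be confirmed in the PowerCenter documentation for your version.

[DEV_FOLDER.WF:wf_load_sales.ST:s_m_load_sales]
$$PushdownConfig=Source with View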

 Pushdown Optimization Viewer: Use the Pushdown Optimization Viewer to examine the transformations that can be pushed to the database. Select a pushdown option or pushdown group in the Pushdown Optimization Viewer to view the corresponding SQL statement that is generated for the specified selections.
The Pushdown Optimization Viewer opens in a new window and shows how the Integration Service converts the transformation logic into SQL statements for a particular mapping. Selecting a pushdown option or pushdown group in the viewer does not change the pushdown configuration; to change the configuration, we must update the pushdown option in the session properties.

Pushdown Optimization and SCD: Slowly changing dimension type 1 and type 3 logic can be pushed to the database using the pushdown optimization technique. Use the following rules and guidelines when pushing SCD logic to the database:
  • Type 1 and type 3 SCDs can be pushed to Oracle and IBM DB2 databases.
  • The source data must not have duplicate rows; the database can become deadlocked if it makes multiple updates to the same row.
  • You must create the SCDs using the wizard in version 8.5 or later. SCDs created in earlier versions cannot be pushed to the database.

Important points on pushdown optimization:
  • Use the pushdown optimization technique when the source or target database gives better performance than Informatica.
  • When you run a session with full pushdown optimization, the database must run a long transaction if the session contains a large quantity of data. Consider the following database performance issues when you generate a long transaction:
  • A long transaction uses more database resources.
  • A long transaction locks the database for longer periods of time, which reduces database concurrency and increases the likelihood of deadlocks.
  • A long transaction increases the likelihood that an unexpected event may occur.
  • In some cases the Integration Service and the database can produce different results even though the transformation logic is the same. This can happen with data conversion, case sensitivity, sequence generation, and sorted data.

Nulls treated as the highest or lowest value: While sorting data, the Integration Service treats null values as the lowest value, but the database may treat null values as the highest value in the sort order.
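For example, on Oracle or DB2 the null placement can be controlled explicitly in the ORDER BY clause (the table and column names here are made up):

SELECT customer_name
FROM   customers
ORDER BY customer_name NULLS FIRST;  -- or NULLS LAST, depending on the order you need
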
SYSDATE built-in variable: The SYSDATE built-in variable in the Integration Service returns the current date and time for the node running the service process. In the database, however, SYSDATE returns the current date and time for the machine hosting the database. If the time zone of the machine hosting the database is not the same as the time zone of the machine running the Integration Service process, the results can vary.
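For instance, if an expression containing SYSDATE is pushed down to Oracle, something like the following runs on the database host rather than on the Integration Service node, so its result reflects the database server's clock and time zone:

SELECT SYSDATE FROM dual;
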
Date Conversion: The Integration Service converts all dates before pushing transformations to the database, and if the database does not support the format, the session fails.
Logging: When the Integration Service pushes transformation logic to the database, it cannot trace all the events that occur inside the database server. The statistics the Integration Service can trace depend on the type of pushdown optimization. When the Integration Service runs a session configured for full pushdown optimization and an error occurs, the database handles the errors. When the database handles errors, the Integration Service does not write reject rows to the reject file.

Monday, August 24, 2015

Load multiple target tables using the same source table

How to load multiple target tables using the same relational source table

Step 1: Create a target-table-list flat file that contains the names of all the target tables into which you want to load data.
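For example, the flat file might simply contain one target table name per line (the names below are placeholders):

CUST_TARGET
ORDERS_TARGET
SALES_TARGET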

Step 2: Use the previously created target table list file as the source definition for this mapping, and create one input/output port called table_name in the flat file source definition.

Step 3: Drag this source definition into the mapping, create an SQL transformation after the Source Qualifier with the default options, and drag and drop the port from the SQ to the SQL T/R.

Step 4: Edit the SQL T/R, go to the SQL Settings tab, and enable "Add Statistic Output Port". This creates a "NumRowsAffected" output port in the SQL T/R, which returns the total number of rows affected by the SQL statements for each input row.

Step 5: Now click on the SQL Ports tab and click the SQL Query edit box to define the query that inserts data into the different target tables.
Click String Substitution under Ports in this window, select the table_name port, and use it in the query.
The SQL query will look like:
INSERT INTO ~table_name~ (col1, col2, col3...coln) SELECT col1, col2, col3...coln FROM source_table
The ~table_name~ value changes based on the value received from the flat file for each input row.
You cannot use an UPDATE query here; only an INSERT statement combined with a SELECT statement is supported.

Step 6: Now create an Expression transformation for row-level operations and drag all ports from the SQL T/R into it. Here we create one output port for the target load time (using the expression SYSDATE or the built-in variable SESSSTARTTIME) and another output port to capture any SQL error that occurred, for example IIF(ISNULL(SQL_error_input), 'No Error', SQL_error_input). Connect all ports to the target table SQL_Target_Load_audit.
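A minimal sketch of the two Expression output ports (the port names are illustrative, and SQL_error_input is assumed to be fed from the SQL transformation's SQLError output):

LOAD_TIME    (output) : SESSSTARTTIME
ERROR_STATUS (output) : IIF(ISNULL(SQL_error_input), 'No Error', SQL_error_input)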

Step 7: Now create an audit table that keeps the target load log for every run. This audit table is the target of the mapping; if any load fails, the SQL error is written to it.
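A minimal sketch of such an audit table (the column names and data types are assumptions, not a prescribed layout):

CREATE TABLE SQL_Target_Load_audit (
    table_name        VARCHAR(128),   -- target table loaded by this row
    num_rows_affected INTEGER,        -- value from the NumRowsAffected port
    error_status      VARCHAR(4000),  -- 'No Error' or the SQL error text
    load_time         TIMESTAMP       -- session start time / load timestamp
);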

By doing this you can verify the target load time, any SQL errors, and the number of records processed in each run.

Wednesday, August 19, 2015

Code Deployment in Informatica

Migrate/Deploy Code in Informatica
1. Using deployment groups: Create a deployment group in Repository Manager that includes all the objects to be copied to another environment, for example from the development environment to the QA environment. This approach is used where version control is enabled, so that if we later need to roll the code back, we can roll back to the previous version.
Below are the steps to create a deployment group:
In PowerCenter Repository Manager, go to Tools > Deployment > Groups, click New, provide a name, choose static or dynamic based on your requirement, and set permissions. Then start copying objects into the newly created deployment group.
A. Static deployment group: We manually add objects to a static deployment group; dependent objects can be included automatically. For example, if a mapplet is used in various mappings, it is automatically added to the deployment group.
B. Dynamic deployment group: We associate the deployment group with an object query so that Informatica adds objects to the group dynamically; the query can be built during deployment group creation.
We can then deploy the group to the new or QA environment.
2. Export and import as XML: All objects can be exported as XML files to a destination directory and later imported into the destination environment.
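This export/import can also be scripted with the pmrep command-line utility. The commands below are a rough sketch only: the repository, domain, user, folder, and object names are placeholders, and the exact flags should be verified with pmrep help for your PowerCenter version.

pmrep connect -r DEV_REPO -d Domain_Dev -n dev_user -x dev_password
pmrep objectexport -n m_load_customer -o mapping -f DEV_FOLDER -u m_load_customer.xml
pmrep connect -r QA_REPO -d Domain_QA -n qa_user -x qa_password
pmrep objectimport -i m_load_customer.xml -c import_control.xml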

3. Copy/Paste: Copy all objects and paste them into the destination repository using PowerCenter Repository Manager. Make sure to copy the source and target definitions first, and then the mappings, sessions, and workflows, in that order.

Important points during code migration:

Source and target definitions not available in the deployed environment: While copying a mapping, if the source and target definitions are not available in the target environment, the mapping is copied with the previous source and target shortcuts. Make sure to create the source and target definitions before migrating the mapping.
Impacts post migration: Before migration, check all source/target definitions that are being updated; those changes might cause other mappings to fail.
Connection setup missed: The database connections differ between QA and production, so a missing connection will make your mapping fail in production. After migration, make sure the proper database connections exist with the required permissions.
Permissions on target environment objects: After migration, make sure proper permissions are granted on all folders to users and administrators.
Parameters/Variables: Update the parameters and variables in the parameter file after creating it in the target environment.
Unix/shell scripts: Update the paths in the shell scripts and make sure all users have the required access.
Recreating views: If the base tables have changed, recreate the views on the new base tables that will be used in the target environment.
Creating indicator files: Whether we create an indicator file or receive one from upstream, make sure the required changes have been made.

Job schedule: Make the required changes to the job schedule; since multiple workflows depend on each other, make sure proper job scheduling is in place.

Friday, August 14, 2015

Degenerated Dimension

Degenerated Dimension: A degenerate dimension is a dimension key in the fact table that does not have its own dimension table.
A degenerate dimension is a dimension with a single attribute, represented as a single column in the fact table.
Data items that are not facts and that do not fit into the existing dimensions are called degenerate dimensions.
In the example below, there is only one key column that is not associated with any other dimension: product_no belongs to the product dimension and user_id belongs to the user dimension, but invoice_no does not belong to any dimension and can be treated as a primary key for the fact table.
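A minimal sketch of such a fact table (the column names and data types are illustrative):

CREATE TABLE sales_fact (
    product_no    INTEGER,        -- key to the product dimension
    user_id       INTEGER,        -- key to the user dimension
    invoice_no    VARCHAR(20),    -- degenerate dimension: no invoice dimension table
    sales_amount  DECIMAL(12,2),
    quantity      INTEGER
);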

Degenerate dimensions are the fastest way to group similar transactions and are typically used when fact tables represent transactional data.
They can be used as a primary key for the fact table, but they cannot act as foreign keys.
Pros
1. It avoids an expensive join between two tables with a one-to-many relationship.
2. By degenerating the date dimension, we can avoid the need for a join between the event table and the time dimension.
Cons: It increases the fact table size.

Constraint Based Loading

Constraint-based loading is used to load data into the parent target table first and then into the child target table. We can enable the constraint-based loading option in the session properties under the Config Object tab.
The Integration Service processes records on a row-by-row basis: for every row generated by the source, it first loads the data into the primary key (parent) table and then into the foreign key (child) table.
This is helpful for normalizing de-normalized source data.
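For example, with a parent/child pair of targets like the illustrative DDL below (table and column names are made up), constraint-based loading writes each customer row before the order rows that reference it:

CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(100)
);

CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES customers (customer_id),
    order_amount  DECIMAL(12,2)
);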

Important points:
1. The constraint-based loading option applies only to inserts.
2. Rows cannot be updated or deleted with this option.
3. It can be implemented only when the targets have a primary key / foreign key relationship.
4. The target tables must be in the same target connection group.
For updates and deletes with constraint-based loading, Informatica provides an option to insert, update, and delete in the target tables; in other words, complete constraint-based loading can be enabled by specifying the custom property "FullCBLOSupport=Yes" under the Config Object tab of the session.

We can also specify this property at the Integration Service level, which makes it applicable to all sessions and workflows.
If we are using complete constraint-based loading, the mapping should not contain active transformations that can change the row ID value (Aggregator, Joiner, Normalizer, Rank, Sorter).

Using the row ID, we can update, delete, and insert data into the target.