Tuesday, June 16, 2015

XML Transformation

XML Source Qualifier Transformation: You can add an XML Source Qualifier transformation to a mapping by dragging an XML source definition to the Mapping Designer workspace or by manually creating one.
 We can link one XML source definition to one XML Source Qualifier transformation.
 We cannot link ports from more than one group in an XML Source Qualifier transformation to ports in the same target transformation.

XML Parser Transformation: The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases.
 Used when we need to extract XML data from a TIBCO source and pass the data to relational targets.
 The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports.

XML Generator Transformation: The XML Generator transformation lets you read data from messaging systems, such as TIBCO and MQ Series, or from other sources, such as files or databases.
 Used when we need to extract data from relational sources and pass XML data to targets.
 The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.
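As a rough analogy (plain Python, not Informatica code), the XML Generator's job can be sketched like this: several values per row go in, and one XML document comes out through a single output port. The element names and the two-column schema below are hypothetical.

```python
import xml.etree.ElementTree as ET

def generate_xml(rows):
    """Build one XML document from row dicts (element names are hypothetical)."""
    root = ET.Element("EMPLOYEES")
    for row in rows:
        emp = ET.SubElement(root, "EMP")
        for port, value in row.items():              # one element per input port
            ET.SubElement(emp, port).text = str(value)
    return ET.tostring(root, encoding="unicode")     # single output "port"

xml_doc = generate_xml([{"EMPNO": 7369, "ENAME": "SMITH"},
                        {"EMPNO": 7499, "ENAME": "ALLEN"}])
print(xml_doc)
```

The XML Parser transformation is simply the inverse: one XML input, multiple output ports.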

Stage 1 (Oracle to XML)
 We are going to generate an XML file as output, using the Oracle EMP table as the source.

Step 1: Generate the XML target file.
• Import the same EMP table as the source table.
• Go to Targets and click Import XML Definition.
• Then choose Non-XML Sources from the left-hand pane.
• Move the EMP table (the source table) from All Sources to Selected Sources.
• Click Open to bring the target table into the Target Designer.
• The sequential steps to generate the XML target table are shown in the snapshots below.

Step 2: Design the mapping; connect the Source Qualifier (SQ) directly to the target table.
• Create the name of the mapping as per the naming convention.
• Save the changes.

Step 3: Create task and the work flow.
 • Double-click the task in the workflow, go to the Mapping tab, and specify the output file directory (C:/) ….
• Run the workflow, then check the C drive for a file named emp.xml …

Stage 2 (XML to Oracle)
 Here the source is going to be the XML file and the target is the Oracle table.

Step 1: Import the source XML file and the target table.
• Go to Sources and click Import XML Definition.
• Browse for the emp.xml file and open it.
• The first three windows are the same as in the previous case.
• The target table is the same EMP table.

Step 2: Design the mapping.
• The connections for this mapping are made the following way.
• Save the mapping.

Step 3: Create the task and work flow.
• Create the task and the work flow using the naming conventions.
• Go to the mappings tab and click on the Source on the left hand pane to specify the path for the input file.

 Step 4: Preview the output on the target table.

Saturday, June 13, 2015

Sorter & Router Transformation

Sorter Transformation


• Connected and Active Transformation
• The Sorter transformation allows us to sort data.
• We can sort data in ascending or descending order according to a specified sort key.
• We can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct.
When we create a Sorter transformation in a mapping, we specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order. We also configure sort criteria the Power Center Server applies to all sort key ports and the system resources it allocates to perform the sort operation.
The Sorter transformation contains only input/output ports. All data passing through the Sorter transformation is sorted according to a sort key. The sort key is one or more ports that we want to use as the sort criteria.
Sorter Transformation Properties
1. Sorter Cache Size:
The Power Center Server uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The Power Center Server passes all incoming data into the Sorter transformation before it performs the sort operation.
• We can specify any amount between 1 MB and 4 GB for the Sorter cache size.
• If it cannot allocate enough memory, the Power Center Server fails the Session.
• For best performance, configure Sorter cache size with a value less than or equal to the amount of available physical RAM on the Power Center Server machine.
• Informatica recommends allocating at least 8 MB of physical memory to sort data using the Sorter transformation.
2. Case Sensitive:
The Case Sensitive property determines whether the Power Center Server considers case when sorting data. When we enable the Case Sensitive property, the Power Center Server sorts uppercase characters higher than lowercase characters.
3. Work Directory
The directory the Power Center Server uses to create temporary files while it sorts data.
4. Distinct:
Check this option if we want to remove duplicates. Sorter will sort data according to all the ports when it is selected.
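The properties above can be sketched in plain Python (a conceptual analogue, not Informatica internals): sort by one or more key ports, ascending or descending, optionally case-insensitive, optionally distinct. The example rows and key choice are hypothetical.

```python
def sorter(rows, key_idx, descending=False, case_sensitive=True, distinct=False):
    if distinct:
        rows = list(dict.fromkeys(rows))    # Distinct compares all ports (tuple rows)
    def sort_key(row):
        vals = [row[i] for i in key_idx]
        if not case_sensitive:
            vals = [v.upper() if isinstance(v, str) else v for v in vals]
        return tuple(vals)
    return sorted(rows, key=sort_key, reverse=descending)

rows = [("smith", 20), ("ALLEN", 30), ("smith", 20), ("Jones", 20)]
out = sorter(rows, key_idx=[0], case_sensitive=False, distinct=True)
print(out)   # [('ALLEN', 30), ('Jones', 20), ('smith', 20)]
```

Note that with Case Sensitive enabled, uppercase strings would instead sort higher than lowercase ones, as described above.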
Performance Tuning:
The Sorter transformation is used to sort the input data.
1. While using the Sorter transformation, configure the Sorter cache size to be larger than the input data size.
2. At the Sorter transformation, use hash auto-keys partitioning or hash user-keys partitioning.


Router Transformation


• Active and connected transformation.
A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
For example, suppose Mapping A uses three Filter transformations while Mapping B produces the same result with one Router transformation.
A Router transformation consists of input and output groups, input and output ports, group filter conditions, and properties that we configure in the Designer.
Working with Groups
A Router transformation has the following types of groups:
• Input: the group that contains the input ports.
• Output: user-defined groups and the default group. We cannot modify or delete output ports or their properties.
User-Defined Groups: We create a user-defined group to test a condition based on incoming data. A user-defined group consists of output ports and a group filter condition. We can create and edit user-defined groups on the Groups tab in the Designer. Create one user-defined group for each condition that we want to specify.
The Default Group: The Designer creates the default group after we create one new user-defined group. The Designer does not allow us to edit or delete the default group. This group does not have a group filter condition associated with it. If all of the conditions evaluate to FALSE, the Integration Service passes the row to the default group.
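The group behaviour described above can be sketched in Python (hypothetical group names and conditions, not Informatica code): each row is tested against every user-defined group condition, can land in more than one group, and falls to the default group only when it matches none.

```python
def router(rows, groups):
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):              # a row can satisfy several groups
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)   # all conditions evaluated to FALSE
    return routed

rows = [{"DEPTNO": 10}, {"DEPTNO": 20}, {"DEPTNO": 99}]
out = router(rows, {"DEPT10": lambda r: r["DEPTNO"] == 10,
                    "DEPT20": lambda r: r["DEPTNO"] == 20})
print(out["DEFAULT"])   # [{'DEPTNO': 99}]
```

A Filter transformation, by contrast, would simply drop the row with DEPTNO 99.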

Friday, January 16, 2015

Rank Transformation, Sequence generator & Aggregator Transformation

Rank Transformation

·         Active and connected transformation
The Rank transformation allows us to select only the top or bottom rank of data. It allows us to select a group of top or bottom values, not just one value.
During the session, the Power Center Server caches input data until it can perform the rank calculations.
Rank Transformation Properties :
·         Cache Directory where cache will be made.
·         Top/Bottom Rank as per need
·         Number of Ranks Ex: 1, 2 or any number
·         Case Sensitive Comparison can be checked if needed
·         Rank Data Cache Size can be set
·         Rank Index Cache Size can be set
Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Power Center Server uses the Rank Index port to store the ranking position for each row in a group.
For example, if we create a Rank transformation that ranks the top five salaried employees, the rank index numbers the employees from 1 to 5.
·         The RANKINDEX is an output port only.
·         We can pass the rank index to another transformation in the mapping or directly to a target.
·         We cannot delete or edit it.
Defining Groups
Rank transformation allows us to group information. For example: If we want to select the top 3 salaried employees of each Department, we can define a group for Department.
·         By defining groups, we create one set of ranked rows for each group.
·         We define a group in Ports tab. Click the Group By for needed port.
·         We cannot Group By on port which is also Rank Port.
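A rough Python analogue of a grouped Rank (hypothetical port names, not Informatica code): keep the top N rows per group and attach a generated RANKINDEX giving each row's position within its group.

```python
from collections import defaultdict

def rank(rows, rank_port, group_by, top=True, n=3):
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)
    out = []
    for members in groups.values():
        ranked = sorted(members, key=lambda r: r[rank_port], reverse=top)[:n]
        for i, row in enumerate(ranked, start=1):
            out.append({**row, "RANKINDEX": i})   # the Designer adds this port
    return out

emps = [{"DEPTNO": 10, "SAL": 5000}, {"DEPTNO": 10, "SAL": 2450},
        {"DEPTNO": 20, "SAL": 3000}, {"DEPTNO": 20, "SAL": 2975},
        {"DEPTNO": 20, "SAL": 1100}]
top2 = rank(emps, rank_port="SAL", group_by="DEPTNO", n=2)   # top 2 per dept
print(top2)
```

With Top/Bottom set to Bottom, `top=False` would keep the lowest-salaried rows instead.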

Sequence Generator Transformation

·         Passive and Connected Transformation.
·         The Sequence Generator transformation generates numeric values.
·         Use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.
We use it to generate Surrogate Key in DWH environment mostly. When we want to maintain history, then we need a key other than Primary Key to uniquely identify the record. So we create a Sequence 1,2,3,4 and so on. We use this sequence as the key. Example: If EMPNO is the key, we can keep only one record in target and can’t maintain history. So we use Surrogate key as Primary key and not EMPNO.
Sequence Generator Ports :
The Sequence Generator transformation provides two output ports: NEXTVAL and CURRVAL.
·         We cannot edit or delete these ports.
·         Likewise, we cannot add ports to the transformation.
NEXTVAL:
Use the NEXTVAL port to generate sequence numbers by connecting it to a Transformation or target.
CURRVAL:
CURRVAL is NEXTVAL plus the Increment By value.
·         We typically only connect the CURRVAL port when the NEXTVAL port is already connected to a downstream transformation.
·         If we connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row.
·         When we connect the CURRVAL port in a Sequence Generator Transformation, the Integration Service processes one row in each block.
·         We can optimize performance by connecting only the NEXTVAL port in a Mapping.
Points to Ponder:
·         If Current Value is 1 and End Value is 10 with no Cycle option, and there are 17 records in the source, the session will fail.
·         If we connect just CURRVAL only, the value will be the same for all records.
·         If Current Value is 1 and End Value is 10 with the Cycle option on and Start Value 0, then for 17 source records the sequence is 1 2 … 10, then 0 1 2 3 …
·         To make the above sequence 1-10, 1-10 instead, give a Start Value of 1. Start Value is used along with the Cycle option only.
·         If Current Value is 1 and End Value is 10 with the Cycle option on and Start Value 1, then with 17 source records the session runs and generates 1-10, 1-7. The value 7 will be saved in the repository, so if we run the session again, the sequence will start from 8.
·         Use the Reset option if we want to start the sequence from Current Value every time.
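The behaviour in these points can be sketched in Python (conceptual only, not Informatica code; an Increment By of 1 is assumed throughout):

```python
def sequence(current, end, start=0, cycle=False, count=17):
    vals, v = [], current
    for _ in range(count):
        if v > end:
            if not cycle:
                # No Cycle option: the sequence is exhausted, the session fails
                raise RuntimeError("session fails: end value reached")
            v = start                     # Cycle restarts from the Start Value
        vals.append(v)
        v += 1
    return vals

seq = sequence(1, 10, start=1, cycle=True)   # 17 source rows, Cycle on
print(seq)   # 1..10 followed by 1..7
```

Setting `start=0` reproduces the 1-10, 0-6 pattern from the third point above.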


Aggregator Transformation

·         Connected and Active Transformation
·         The Aggregator transformation allows us to perform aggregate calculations, such as averages and sums.
·         Aggregator transformation allows us to perform calculations on groups.
Components of the Aggregator Transformation
1.    Aggregate expression
2.    Group by port
3.    Sorted Input
4.    Aggregate cache
1) Aggregate Expressions
·         Entered in an output port.
·         Can include non-aggregate expressions and conditional clauses.
The transformation language includes the following aggregate functions:
·         AVG, COUNT, MAX, MIN, SUM
·         FIRST, LAST
·         MEDIAN, PERCENTILE, STDDEV, VARIANCE
Single Level Aggregate Function: MAX(SAL)
Nested Aggregate Function: MAX( COUNT( ITEM ))
Nested Aggregate Functions
·         An Aggregator transformation can contain multiple single-level functions or multiple nested functions.
·         An Aggregator transformation cannot have both types of functions together.
·         An aggregate expression can include one aggregate function nested within another aggregate function: MAX( COUNT( ITEM )) is correct.
·         MIN( MAX( COUNT( ITEM ))) is not correct, because only one level of nesting is allowed.
Conditional Clauses
We can use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
·         SUM( COMMISSION, COMMISSION > QUOTA )
Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate expression.
·         IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
2) Group By Ports
·         Indicates how to create groups.
·         When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
The Aggregator transformation allows us to define groups for aggregations, rather than performing the aggregation across all input data.
For example, we can find Maximum Salary for every Department.
·         In Aggregator Transformation, Open Ports tab and select Group By as needed.
3) Using Sorted Input
·         Use to improve session performance.
·         To use sorted input, we must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
·         When we use this option, we tell Aggregator that data coming to it is already sorted.
·         We check the Sorted Input Option in Properties Tab of the transformation.
·         If the option is checked but we are not passing sorted data to the transformation, then the session fails.
4) Aggregator Caches
·         The Power Center Server stores data in the aggregate cache until it completes Aggregate calculations.
·         It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports.
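The note above can be illustrated with a conceptual Python sketch (not Informatica internals) of why sorted input avoids the aggregate cache: when rows arrive sorted by the group-by port, a group is complete as soon as the key changes, so it can be emitted immediately and only one group ever needs to be held in memory. Port names follow the MAX(SAL)-per-department example.

```python
def max_sal_by_dept_sorted(rows):
    """rows must already be sorted by DEPTNO; yields (DEPTNO, MAX(SAL))."""
    current_key, current_max = None, None
    for deptno, sal in rows:
        if deptno != current_key:
            if current_key is not None:
                yield current_key, current_max   # key changed: previous group done
            current_key, current_max = deptno, sal
        else:
            current_max = max(current_max, sal)
    if current_key is not None:
        yield current_key, current_max           # flush the final group

rows = [(10, 2450), (10, 5000), (20, 2975), (20, 3000), (30, 1600)]
result = list(max_sal_by_dept_sorted(rows))
print(result)   # [(10, 5000), (20, 3000), (30, 1600)]
```

With unsorted input, every group would have to be cached until the last row was read, which is exactly what the index and data caches are for.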


Wednesday, December 17, 2014

Filter Transformation

Filter transformation is an active & connected transformation. As an active transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether the row meets the specified condition. Only rows that return TRUE pass through this transformation; discarded rows do not appear in the session log or reject/bad files.
You cannot concatenate ports from more than one transformation into the Filter transformation. The input ports for the filter must come from a single transformation. The Filter transformation does not allow setting output default values.
Creating Filter Transformation:
Follow these steps to create a Filter transformation:
In the Mapping Designer, open a mapping or create a new mapping.
Go to the toolbar -> click Transformation -> Create -> select the Filter transformation.
Enter a name -> click Create, and then click Done.
You can add ports either by dragging them from other transformations or by creating them manually within the transformation.
Specifying Filter Condition:
To configure the filter condition, go to the properties tab and in the filter condition section open the expression editor. Enter the filter condition you want to apply. Click on validate button to verify the syntax and then click OK.
Components of Filter Transformation:
The filter transformation has the following components.
Transformation: You can enter the name and description of the transformation.
Ports: Create new ports and configure them
Properties: You can provide filter condition to filter the rows. You can also configure tracing levels.
Metadata Extensions: Specify the metadata details like name, data type.
Configuring Filter Transformation:
The following properties need to be configured on the Ports tab of the Filter transformation:
Port name: Enter the name of the ports created.
Port type: All the ports in filter transformation are input/output.
Data type, precision, and scale: Configure the data type and set the precision and scale for each port.
Filter Transformation examples:
Create a mapping to load into the target the details of employees who joined the organization after Dec-2012.
Filter condition: HIREDATE >= '01-01-2013'
It will reject the details of employees other than those who joined the organization after 31-Dec-2012.
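The row-by-row behaviour of this example can be sketched in Python (column names taken from the EMP example; the sample rows are hypothetical):

```python
from datetime import date

def filter_rows(rows, condition):
    # Rows for which the condition is FALSE are silently dropped:
    # they appear in neither the session log nor the reject file.
    return [row for row in rows if condition(row)]

emps = [{"ENAME": "SMITH", "HIREDATE": date(2012, 6, 1)},
        {"ENAME": "ALLEN", "HIREDATE": date(2013, 3, 15)}]
passed = filter_rows(emps, lambda r: r["HIREDATE"] >= date(2013, 1, 1))
print([r["ENAME"] for r in passed])   # ['ALLEN']
```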
Performance Tuning Tips:
    1.   Use the filter transformation as close as possible to the sources in the mapping. This will reduce the number of rows to be processed in the downstream transformations.
    2.   In case of relational sources, if possible use the source qualifier transformation to filter the rows. This will reduce the number of rows to be read from the source.
Note: The input ports to the Filter transformation must come from a single transformation. You cannot connect ports from more than one transformation to the Filter transformation.


Tuesday, December 16, 2014

Source Qualifier

Source Qualifier Transformation:
This is an Active and Connected Transformation. The Source Qualifier transformation represents the rows that the Power Center Server reads when it runs a session. The Source Qualifier transformation is not reusable. It is the default transformation, except in the case of XML or COBOL sources (which use the XML Source Qualifier and Normalizer, respectively).
Tasks performed by Source Qualifier:
  • Join data originating from the same source database: We can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
  • Filter rows when the Power Center Server reads source data: If we include a filter condition, the Power Center Server adds a WHERE clause to the default query.
  • Specify an outer join rather than the default inner join: If we include a user-defined join, the Power Center Server replaces the join information specified by the metadata in the SQL query.
  • Specify sorted ports: If we specify a number for sorted ports, the Power Center Server adds an ORDER BY clause to the default SQL query.
  • Select only distinct values from the source: If we choose Select Distinct, the Power Center Server adds a SELECT DISTINCT statement to the default SQL query.
  • Create a custom query to issue a special SELECT statement for the Power Center Server to read source data: For example, you might use a custom query to perform aggregate calculations. All of the above are configured in the Properties tab of the Source Qualifier transformation.
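How the Properties-tab options map onto the generated SELECT statement can be sketched in Python (an illustration of the idea only, not Informatica's actual query builder):

```python
def default_query(columns, table, source_filter=None, sorted_ports=0, distinct=False):
    select = "SELECT DISTINCT" if distinct else "SELECT"
    sql = f"{select} {', '.join(columns)} FROM {table}"
    if source_filter:
        sql += f" WHERE {source_filter}"                         # Source Filter
    if sorted_ports:
        sql += " ORDER BY " + ", ".join(columns[:sorted_ports])  # top N ports
    return sql

# Only columns connected to another transformation appear in the SELECT list.
q = default_query(["EMPNO", "ENAME", "SAL"], "EMP",
                  source_filter="SAL > 2000", sorted_ports=2)
print(q)   # SELECT EMPNO, ENAME, SAL FROM EMP WHERE SAL > 2000 ORDER BY EMPNO, ENAME
```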
 Source Qualifier Properties Tab:
1) SOURCE FILTER:
We can use a source filter to reduce the number of rows the Power Center Server queries.
Note: When we use a source filter in the session properties, we override the customized SQL query in the Source Qualifier transformation.
2) NUMBER OF SORTED PORTS:
When we use sorted ports, the Power Center Server adds the ports to the ORDER BY clause in the default query. By default it is 0. If we change it to 1, the data will be sorted by the column that is at the top in the Source Qualifier. If we change it to 2, the data will be sorted by the top two columns.
3) SELECT DISTINCT:
If we want the Power Center Server to select unique values from a source, we can use the Select Distinct option. Just check the option in the Properties tab to enable it.
4) PRE-SESSION and POST-SESSION Commands:
  • The Power Center Server runs pre-session SQL commands against the source database before it reads the source.
  • It runs post-session SQL commands against the source database after it writes to the target.
  • Use a semi-colon (;) to separate multiple statements.
5) USER DEFINED JOINS:
Entering a user-defined join is similar to entering a custom SQL query. However, we only enter the contents of the WHERE clause, not the entire query.
We can specify equi join, left outer join and right outer join only. We cannot specify full outer join. To use full outer join, we need to write SQL Query.
6) SQL QUERY:
For RDBMS sources, the Power Center Server generates a query for each Source Qualifier transformation when it runs a session. The default query is a SELECT statement for each source column used in the mapping. In other words, the Power Center Server reads only the columns that are connected to another transformation.
Note: If we do not cancel the SQL query, the Power Center Server overrides the default query with the custom SQL query. We can enter an SQL statement supported by our source database. Before entering the query, connect all the input and output ports we want to use in the mapping.
Important Points:
  • When creating a custom SQL query, the SELECT statement must list the port names in the order in which they appear in the transformation.
  • Make sure to test the query in the database first before using it in the SQL Query property. If the query does not run in the database, it will not work in Informatica either.
  • Also always connect to the database and validate the SQL in SQL query editor.