Learn Develop Data Engineering

Thursday, July 2, 2015

Informatica Installation Guide

Informatica Installation:

Step:1
A trial version of Informatica PowerCenter can be downloaded from https://edelivery.oracle.com.
Log on to https://edelivery.oracle.com and accept the Terms and Conditions.

Step:2
Choose the Product package as shown below and Click Continue.

Step:3
Locate the download package as shown in the image below.

Step : 4
Download the packages to D:\INFA9X

Unpack the Installation Package

Step : 1

Unzip all four downloaded zip files into D:\INFA9X.

Hint : Use the program WinRAR to unzip all the files. After unzipping, you will see the files and folders below.

Step : 2

Unzip dac_win_101314_infa_win_32bit_910.zip into the same folder, D:\INFA9X. After unzipping, you will see the files and folders below.

Install Informatica PowerCenter Server

Step:1
To locate install.exe, navigate to D:\INFA9X\dac_win_101314_infa_win_32bit_910 as shown in the image below, then double-click install.exe.

Step:2
Installation wizard Starts. Choose the installation type.
Click Next.

Step:3
Installation Pre-requisites will be shown before the installation starts as below.
Click Next.

Step : 4

Enter the license key. You can find the license key at D:\INFA9X\EXTRACT\Oracle_All_OS_Prod.key.

Click Next.

Step : 5

The pre-installation summary lists the items that will be installed, based on the license key.

Click Next.

Step:6
Installation begins. It takes a couple of minutes to finish. As soon as this step completes, the Configuring Domain window opens. Continue with the steps under Domain Configuration.

Domain Configuration.

Step :1

Choose “Create a Domain” radio button.
Check “Enable HTTPS for Informatica Administrator”
Leave the Port number as it is and choose “Use a keystore file generated by the installer”

Click Next.

Step : 2
Provide the Repository database details as below.

Database Type : Choose your Repository database (Oracle/SQL Server/Sybase)
Database user ID : Database user ID to connect database.
User Password : Password.
Schema Name : If no schema name is provided, the default schema will be used.
Database Address and Port : Machine on which the database is installed and the default port number.
Database Service Name : Database Name.

Below image shows the configuration using SQL Server.
Click Next.

Step : 3
Provide the Domain details and the Admin user details.

Domain Name : Name of your Domain.
Node Host Name : Machine name on which Informatica Server is running.
Node Name : Name of the Node.
Node Port Number : Leave the default port Number.
Domain user name : This is the Administrator user
Domain password : Administrator password

Note : Remember your Admin User ID and Password. You will need them to log on to the Admin Console later in the installation.

Step: 4
Use the default configuration and Click Next.

Step : 5
Installation is complete and you get the post-installation summary. You also get a link to the installation log file and a link to the Admin Console.
Click Done.

Configure Repository Service

Step : 1
Go to Start menu and Click on “Informatica Administrator Home Page”. This will open up the Admin Console in a web browser.

Step : 2
Log on to the Admin Console using your Admin User ID and Password, which you set in Step 3 of the “Domain Configuration” section.

Step :3
Once you log on, you will see a screen like the one shown below.

Step : 4
Choose your Domain Name from “Domain Navigator”, Click on “Actions”, Choose “New” and “PowerCenter Repository Service”.

Step : 5
A new screen will appear. Provide the details as shown below.

Repository Name : Your Repository Name.
Description : An optional description about the repository.
Location : Choose the Domain you have already created. If you have only one Domain, this value will be pre populated.
License : Choose the license key from the drop down list.
Node : Choose the node name from the drop down list.

Click Next.

Step : 6
A new screen will appear. Provide the Repository database details.

Database Type : Choose your Repository database (Oracle/SQL Server/Sybase)
Username : Database user ID to connect database.
Password : Database user Password.
Connection String : Database Connection String.
Code Page : Database Code Page
Table Space : Database Table Space Name
Choose “No content exists under specified connection string. Create new content”

Click Finish

Step : 7
It takes a couple of minutes to create the Repository content. After the repository is created, the screen below is shown.

Step : 8
The repository service will be running in “Exclusive” mode as shown below. This needs to be changed to “Normal” before we can configure the Integration Service.
Click “Edit” under Repository Properties.

Step : 9
A pop-up window appears. Set the following properties.

Operation Mode : Normal
Security Audit Trail : No

Click OK.

Click OK in the next two pop-up windows, which confirm the Repository restart needed to change the Repository Operating Mode.

Configure Integration Service

Step : 1
Choose your Domain Name from “Domain Navigator”, Click on “Actions”, Choose “New” and “PowerCenter Integration Service”.

Step : 2
A new window will appear. Provide the details as shown below.

Name : Your Integration Service Name.
Description : An optional description about the Integration Service.
Location : Choose the Domain you have already created. If you have only one Domain, this value will be pre populated.
License : Choose the license key from the drop down list.
Node : Choose the node name from the drop down list.

Click Next.

Step : 3
A new window will appear. Provide the details as shown below.

PowerCenter Repository Service : Choose your Repository Service Name from the drop down list.
Username : Admin user name.
Password : Admin password.
Data Movement Mode : ASCII.

Click Finish.

Step : 4
A pop-up window will appear. Choose ANSI as the Code Page.
Click OK.

Step : 5
The window closes and you can see all the configured services in the “Domain Navigator”.
With that, we are done with the installation and configuration of the Informatica PowerCenter Server.

Client Installation.

Step : 1
Go to D:\INFA9X as shown in the image below. Double-click install.bat.

Step : 2
Installation wizard Starts.
Click Start.

Step : 3
Choose the installation type as shown in the image below.
Click Next.

Step : 4
Installation Pre-requisites will be shown before the installation starts as below.
Click Next.

Step : 5
Choose the client tools you need. Only PowerCenter Client is mandatory.
Click Next.

Step : 6
Choose the client installation directory.
Click Next.

Step : 7
You can choose the type of Eclipse installation in this step. This window will be available if you choose to install “Informatica Developer” or “Data Transformation Studio”.
Click Next.

Step : 8
The pre-installation summary lists the items that will be installed during the installation process.
Click Next.

Step: 9
Installation Begins. It takes one or two minutes to complete this step.

Step : 10
Installation is complete and you get the post-installation summary.

With that we are all done with the installation and configuration for Informatica PowerCenter Client.

Tuesday, June 16, 2015

XML Transformation

XML Source Qualifier Transformation: You can add an XML Source Qualifier transformation to a mapping by dragging an XML source definition to the Mapping Designer workspace or by manually creating one.
We can link one XML source definition to one XML Source Qualifier transformation.
We cannot link ports from more than one group in an XML Source Qualifier transformation to ports in the same target transformation.

XML Parser Transformation: The XML Parser transformation lets you extract XML data from messaging systems, such as TIBCO or MQ Series, and from other sources, such as files or databases.
Use it when we need to extract XML data from a TIBCO source and pass the data to relational targets.
The XML Parser transformation reads XML data from a single input port and writes data to one or more output ports.

XML Generator Transformation: The XML Generator transformation lets you read data from messaging systems, such as TIBCO and MQ Series, or from other sources, such as files or databases.
Use it when we need to extract data from relational sources and pass XML data to targets.
The XML Generator transformation accepts data from multiple ports and writes XML through a single output port.
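
The two directions can be pictured with a small Python sketch. This is only an analogy using the standard library, not PowerCenter code: the function names, the EMPLOYEES/EMP tag names and the sample rows are made up for illustration.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_tag="EMPLOYEES", row_tag="EMP"):
    """Generator direction: many input fields in, one XML string out."""
    root = ET.Element(root_tag)
    for row in rows:
        elem = ET.SubElement(root, row_tag)
        for name, value in row.items():
            ET.SubElement(elem, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text, row_tag="EMP"):
    """Parser direction: one XML string in, many output fields out."""
    root = ET.fromstring(xml_text)
    # Every value comes back as text; type conversion is left to the caller.
    return [{child.tag: child.text for child in elem}
            for elem in root.iter(row_tag)]


# Hypothetical sample rows, loosely modelled on an emp table.
rows = [
    {"EMPNO": 7369, "ENAME": "SMITH", "SAL": 800},
    {"EMPNO": 7499, "ENAME": "ALLEN", "SAL": 1600},
]
xml_text = rows_to_xml(rows)
parsed = xml_to_rows(xml_text)
```

Note that the parsed values are strings, which is why a real mapping still needs datatypes defined on the output ports.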

Stage 1 (Oracle to XML)
We are going to generate an XML file as output, with the Oracle emp table as the source.

Step 1: Generate the XML target file.
• Import the emp table as the source table.
• Go to the targets and click on import XML definition.
• Then choose the Non XML source from the left hand pane.
• Move the emp table (source table) from All Sources to Selected Sources.
• Then click Open to get the target table in the Target Designer.
• The sequential steps to generate the XML target table are shown in the snapshots below.

Step 2: Design the mapping and connect the Source Qualifier (SQ) directly to the target table.
• Name the mapping as per the naming convention.
• Save the changes.

Step 3: Create the task and the workflow.
• Double-click the workflow, go to the Mapping tab and specify the output file directory (C:/) …
• Run the workflow, then check the C drive for a file named emp.xml …

Stage 2 (XML to Oracle)
Here the source is the XML file and the target is the Oracle table.

Step 1: Import the source XML file and the target table.
• Go to the sources and click on import XML definition.
• Browse for the emp.xml file and open it.
• The first three windows are the same as in the previous case.
• The target table is the same EMP table.

Step 2: Design the mapping.
• The connections for this mapping are made as follows.
• Save the mapping.

Step 3: Create the task and work flow.
• Create the task and the work flow using the naming conventions.
• Go to the mappings tab and click on the Source on the left hand pane to specify the path for the input file.

Step 4: Preview the output on the target table.

Saturday, June 13, 2015

Sorter & Router Transformation

Sorter Transformation

• Connected and Active Transformation

• The Sorter transformation allows us to sort data.

• We can sort data in ascending or descending order according to a specified sort key.

• We can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct.

When we create a Sorter transformation in a mapping, we specify one or more ports as a sort key and configure each sort key port to sort in ascending or descending order. We also configure sort criteria the Power Center Server applies to all sort key ports and the system resources it allocates to perform the sort operation.

The Sorter transformation contains only input/output ports. All data passing through the Sorter transformation is sorted according to a sort key. The sort key is one or more ports that we want to use as the sort criteria.

Sorter Transformation Properties

1. Sorter Cache Size:

The Power Center Server uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The Power Center Server passes all incoming data into the Sorter transformation before it performs the sort operation.

• We can specify any amount between 1 MB and 4 GB for the Sorter cache size.

• If it cannot allocate enough memory, the Power Center Server fails the session.

• For best performance, configure Sorter cache size with a value less than or equal to the amount of available physical RAM on the Power Center Server machine.

• Informatica recommends allocating at least 8 MB of physical memory to sort data using the Sorter transformation.

2. Case Sensitive:

The Case Sensitive property determines whether the Power Center Server considers case when sorting data. When we enable the Case Sensitive property, the Power Center Server sorts uppercase characters higher than lowercase characters.

3. Work Directory

The directory the Power Center Server uses to create temporary files while it sorts data.

4. Distinct:

Check this option if we want to remove duplicates. Sorter will sort data according to all the ports when it is selected.
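
How the sort key, Case Sensitive and Distinct properties interact can be sketched in Python. This is a rough illustration and not PowerCenter code: the function `sorter`, its parameters and the sample rows are hypothetical, and Python orders strings by code point, which may not match how the Power Center Server ranks uppercase against lowercase.

```python
def sorter(rows, sort_keys, case_sensitive=True, distinct=False):
    """Sort rows (dicts) by sort_keys, a list of (port_name, ascending) pairs
    given from the most significant key to the least significant one."""

    def norm(value):
        # With Case Sensitive off, strings are compared without regard to case.
        if isinstance(value, str) and not case_sensitive:
            return value.lower()
        return value

    if distinct:
        # With Distinct on, all ports take part in the comparison, so a row
        # is dropped only when every port matches an earlier row exactly.
        seen, unique = set(), []
        for row in rows:
            ident = tuple(sorted(row.items()))
            if ident not in seen:
                seen.add(ident)
                unique.append(row)
        rows = unique

    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last gives a multi-key sort.
    for port, ascending in reversed(sort_keys):
        result.sort(key=lambda r: norm(r[port]), reverse=not ascending)
    return result


# Hypothetical sample rows.
emp_rows = [
    {"ENAME": "smith", "SAL": 800},
    {"ENAME": "ALLEN", "SAL": 1600},
    {"ENAME": "smith", "SAL": 800},
    {"ENAME": "Blake", "SAL": 1600},
]
sorted_rows = sorter(emp_rows, [("SAL", False), ("ENAME", True)],
                     case_sensitive=False, distinct=True)
```

Here SAL is sorted in descending order, ENAME breaks ties in ascending order, and the duplicate row is removed.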

Performance Tuning:

Sorter transformation is used to sort the input data.

1. While using the Sorter transformation, configure the Sorter cache size to be larger than the input data size.

2. At the Sorter transformation, use hash auto-keys partitioning or hash user keys partitioning.

Router Transformation

• Active and connected transformation.

A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. A Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.

Mapping A uses three Filter transformations while Mapping B produces the same result with one Router transformation.

A Router transformation consists of input and output groups, input and output ports, group filter conditions, and properties that we configure in the Designer.

Working with Groups

A Router transformation has the following types of groups:

• Input: The Group that gets the input ports.

• Output: User Defined Groups and Default Group. We cannot modify or delete Output ports or their properties.

User-Defined Groups: We create a user-defined group to test a condition based on incoming data. A user-defined group consists of output ports and a group filter condition. We can create and edit user-defined groups on the Groups tab with the Designer. Create one user-defined group for each condition that we want to specify.

The Default Group: The Designer creates the default group after we create one new user-defined group. The Designer does not allow us to edit or delete the default group. This group does not have a group filter condition associated with it. If all of the conditions evaluate to FALSE, the Integration Service passes the row to the default group.
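
The group logic described above can be sketched in Python. This is a simplified illustration, not PowerCenter code: the function `route`, the group names and the sample rows are hypothetical. The sketch assumes that a row meeting several group filter conditions is passed to each matching group, which is my understanding of Router behaviour.

```python
def route(rows, groups):
    """groups maps a user-defined group name to a filter condition (a function).
    Returns a dict of group name -> list of rows, plus a DEFAULT group."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                # Assumption: a row goes to every group whose condition it meets.
                routed[name].append(row)
                matched = True
        if not matched:
            # All conditions evaluated to FALSE, so the row goes to the default group.
            routed["DEFAULT"].append(row)
    return routed


# Hypothetical sample rows and groups.
dept_rows = [{"DEPTNO": 10}, {"DEPTNO": 20}, {"DEPTNO": 30}, {"DEPTNO": 40}]
routed = route(dept_rows, {
    "DEPT10": lambda r: r["DEPTNO"] == 10,
    "DEPT20": lambda r: r["DEPTNO"] == 20,
    "DEPT30": lambda r: r["DEPTNO"] == 30,
})
```

One pass over the input fills all three groups, which is the same result three separate Filter transformations would give, and the row for department 40 ends up in the default group instead of being dropped.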

Friday, January 16, 2015

Rank Transformation, Sequence generator & Aggregator Transformation

Rank Transformation

· Active and connected transformation

The Rank transformation allows us to select only the top or bottom rank of data. It allows us to select a group of top or bottom values, not just one value.

During the session, the Power Center Server caches input data until it can perform the rank calculations.

Rank Transformation Properties :

· Cache Directory where cache will be made.

· Top/Bottom Rank as per need

· Number of Ranks Ex: 1, 2 or any number

· Case Sensitive Comparison can be checked if needed

· Rank Data Cache Size can be set

· Rank Index Cache Size can be set

Rank Index

The Designer automatically creates a RANKINDEX port for each Rank transformation. The Power Center Server uses the Rank Index port to store the ranking position for each row in a group.

For example, if we create a Rank transformation that ranks the top five salaried employees, the rank index numbers the employees from 1 to 5.

· The RANKINDEX is an output port only.

· We can pass the rank index to another transformation in the mapping or directly to a target.

· We cannot delete or edit it.

Defining Groups

Rank transformation allows us to group information. For example: If we want to select the top 3 salaried employees of each Department, we can define a group for Department.

· By defining groups, we create one set of ranked rows for each group.

· We define a group in the Ports tab by checking Group By for the needed port.

· We cannot Group By on a port that is also the Rank port.
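
A minimal Python sketch of ranking within groups, assuming a list of dicts as input. It is not PowerCenter code: the function `rank`, its parameters and the sample rows are hypothetical, and ties are simply numbered by position, which may differ from how the Power Center Server ranks equal values.

```python
from collections import defaultdict


def rank(rows, rank_port, group_by=None, top=True, number_of_ranks=3):
    """Keep the top (or bottom) number_of_ranks rows of each group and
    add a RANKINDEX field holding the position inside the group."""
    groups = defaultdict(list)
    for row in rows:
        key = row[group_by] if group_by else None
        groups[key].append(row)

    output = []
    for members in groups.values():
        # Top rank sorts descending, bottom rank sorts ascending.
        ordered = sorted(members, key=lambda r: r[rank_port], reverse=top)
        for index, row in enumerate(ordered[:number_of_ranks], start=1):
            output.append({**row, "RANKINDEX": index})
    return output


# Hypothetical sample rows.
emps = [
    {"ENAME": "KING", "DEPTNO": 10, "SAL": 5000},
    {"ENAME": "CLARK", "DEPTNO": 10, "SAL": 2450},
    {"ENAME": "MILLER", "DEPTNO": 10, "SAL": 1300},
    {"ENAME": "SCOTT", "DEPTNO": 20, "SAL": 3000},
    {"ENAME": "SMITH", "DEPTNO": 20, "SAL": 800},
]
top2 = rank(emps, "SAL", group_by="DEPTNO", number_of_ranks=2)
```

Each department gets its own set of ranked rows, numbered from 1.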

Sequence Generator Transformation

· Passive and Connected Transformation.

· The Sequence Generator transformation generates numeric values.

· Use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.

We mostly use it to generate surrogate keys in a DWH environment. When we want to maintain history, we need a key other than the primary key to uniquely identify each record. So we create a sequence 1, 2, 3, 4 and so on, and use this sequence as the key. Example: If EMPNO is the key, we can keep only one record per employee in the target and can’t maintain history. So we use the surrogate key as the primary key and not EMPNO.

Sequence Generator Ports :

The Sequence Generator transformation provides two output ports: NEXTVAL and CURRVAL.

· We cannot edit or delete these ports.

· Likewise, we cannot add ports to the transformation.

NEXTVAL:

Use the NEXTVAL port to generate sequence numbers by connecting it to a transformation or target.

CURRVAL:

CURRVAL is NEXTVAL plus the Increment By value.

· We typically only connect the CURRVAL port when the NEXTVAL port is already connected to a downstream transformation.

· If we connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row.

· When we connect the CURRVAL port in a Sequence Generator Transformation, the Integration Service processes one row in each block.

· We can optimize performance by connecting only the NEXTVAL port in a Mapping.

Points to Ponder:

· If Current Value is 1, End Value is 10 and the Cycle option is off, a source with 17 records makes the session fail.

· If we connect only CURRVAL, the value will be the same for all records.

· If Current Value is 1, End Value is 10, the Cycle option is on and Start Value is 0, a source with 17 records gets the sequence 1 to 10, then 0, 1, 2, 3 and so on.

· To make the sequence above restart from 1 after it reaches 10, give Start Value as 1. Start Value is used along with the Cycle option only.

· If Current Value is 1, End Value is 10, the Cycle option is on and Start Value is 1, a source with 17 records gets the sequence 1 to 10, then 1 to 7. The session runs and 7 is saved in the repository. If we run the session again, the sequence starts from 8.

· Use the Reset option if you want the sequence to start from the Current Value every time.
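
The cases in Points to Ponder can be replayed with a small Python simulation. It is a sketch of the behaviour as described in this post, not PowerCenter code: the function `sequence` and its parameter names are hypothetical stand-ins for the Current Value, End Value, Start Value, Increment By and Cycle properties.

```python
def sequence(current_value, end_value, rows, start_value=0,
             increment_by=1, cycle=False):
    """Return the NEXTVAL values generated for `rows` records and the value
    the next session run would start from."""
    values = []
    nxt = current_value
    for _ in range(rows):
        if nxt > end_value:
            if not cycle:
                # Without Cycle the sequence is exhausted and the session fails.
                raise OverflowError("End Value reached and Cycle is off")
            # With Cycle the sequence wraps around to Start Value.
            nxt = start_value
        values.append(nxt)
        nxt += increment_by
    return values, nxt


# Cycle on, Start Value 0: 1 to 10, then 0 to 6.
cycled_from_0, _ = sequence(1, 10, 17, start_value=0, cycle=True)

# Cycle on, Start Value 1: 1 to 10, then 1 to 7; the next run starts from 8.
cycled_from_1, next_run = sequence(1, 10, 17, start_value=1, cycle=True)

# Cycle off: the 11th record cannot get a value.
try:
    sequence(1, 10, 17)
    failed = False
except OverflowError:
    failed = True
```

Passing the original Current Value again on every call corresponds to the Reset option, while passing the returned `next_run` corresponds to continuing from the value saved in the repository.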

Aggregator Transformation

· Connected and Active Transformation

· The Aggregator transformation allows us to perform aggregate calculations, such as averages and sums.

· Aggregator transformation allows us to perform calculations on groups.

Components of the Aggregator Transformation

1. Aggregate expression

2. Group by port

3. Sorted Input

4. Aggregate cache

1) Aggregate Expressions

· Entered in an output port.

· Can include non-aggregate expressions and conditional clauses.

The transformation language includes the following aggregate functions:

· AVG, COUNT, MAX, MIN, SUM

· FIRST, LAST

· MEDIAN, PERCENTILE, STDDEV, VARIANCE

Single Level Aggregate Function: MAX(SAL)

Nested Aggregate Function: MAX( COUNT( ITEM ))

Nested Aggregate Functions

· In Aggregator transformation, there can be multiple single level functions or multiple nested functions.

· An Aggregator transformation cannot have both types of functions together.

· MAX( COUNT( ITEM )) is correct.

· MIN(MAX( COUNT( ITEM ))) is not correct. Only one aggregate function can be nested within another aggregate function.

Conditional Clauses

We can use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.

· SUM( COMMISSION, COMMISSION > QUOTA )

Non-Aggregate Functions

We can also use non-aggregate functions in the aggregate expression.

· IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
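
As a rough Python equivalent of the two expressions above (an illustration only; the helper `sum_if` and the sample rows are hypothetical, and NULL handling is ignored):

```python
def sum_if(rows, port, condition):
    """Sum a port over only those rows for which the condition is true,
    like SUM( COMMISSION, COMMISSION > QUOTA )."""
    return sum(row[port] for row in rows if condition(row))


# Hypothetical sample rows.
sales = [
    {"COMMISSION": 500, "QUOTA": 300, "QUANTITY": 5},
    {"COMMISSION": 200, "QUOTA": 300, "QUANTITY": 0},
    {"COMMISSION": 700, "QUOTA": 600, "QUANTITY": 2},
]

# Conditional clause: only rows where COMMISSION exceeds QUOTA are summed.
commission_over_quota = sum_if(
    sales, "COMMISSION", lambda r: r["COMMISSION"] > r["QUOTA"])

# Non-aggregate function wrapped around an aggregate:
# IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0 )
max_quantity = max(r["QUANTITY"] for r in sales)
quantity_or_zero = max_quantity if max_quantity > 0 else 0
```

The second row is left out of the sum because its commission does not exceed its quota.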

2) Group By Ports

· Indicates how to create groups.

· When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.

The Aggregator transformation allows us to define groups for aggregations, rather than performing the aggregation across all input data.

For example, we can find Maximum Salary for every Department.

· In Aggregator Transformation, Open Ports tab and select Group By as needed.
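
A small Python sketch of grouping, assuming a list of dicts as input. It is not PowerCenter code: the function `aggregate_max` and the sample rows are hypothetical. It returns one row per group, carrying the maximum of the chosen port and, for the other ports, the values of the last row seen in that group.

```python
def aggregate_max(rows, group_by, port):
    """Return one output row per group: the last row of the group plus
    a MAX_<port> field holding the group's maximum."""
    groups = {}
    for row in rows:
        key = row[group_by]
        if key in groups:
            best = max(groups[key]["MAX_" + port], row[port])
        else:
            best = row[port]
        # Later rows overwrite earlier ones, so the last row of each group wins.
        groups[key] = {**row, "MAX_" + port: best}
    return list(groups.values())


# Hypothetical sample rows.
staff = [
    {"ENAME": "KING", "DEPTNO": 10, "SAL": 5000},
    {"ENAME": "CLARK", "DEPTNO": 10, "SAL": 2450},
    {"ENAME": "SCOTT", "DEPTNO": 20, "SAL": 3000},
    {"ENAME": "SMITH", "DEPTNO": 20, "SAL": 800},
]
max_sal_by_dept = aggregate_max(staff, "DEPTNO", "SAL")
```

The output row for department 10 shows ENAME CLARK (the last row of the group) next to MAX_SAL 5000, which is why non-aggregated ports in an Aggregator should be read with care.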

3) Using Sorted Input

· Use to improve session performance.

· To use sorted input, we must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.

· When we use this option, we tell Aggregator that data coming to it is already sorted.

· We check the Sorted Input Option in Properties Tab of the transformation.

· If the option is checked but we are not passing sorted data to the transformation, then the session fails.
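
Why sorted input helps can be sketched in Python with `itertools.groupby`, which only ever holds one group at a time. This is an analogy, not PowerCenter code: the function `aggregate_sorted` and the sample rows are hypothetical, and the sketch assumes ascending order only.

```python
from itertools import groupby


def aggregate_sorted(rows, group_by, port):
    """Yield (group key, sum of port) for input already sorted by group_by.
    Raises ValueError if a key arrives out of ascending order."""
    previous = None
    for key, members in groupby(rows, key=lambda r: r[group_by]):
        if previous is not None and key < previous:
            # Mirrors the session failing when the data is not really sorted.
            raise ValueError("input is not sorted by the group by port")
        previous = key
        # Each group is finished as soon as its last row has been read,
        # so no cache of all groups is needed.
        yield key, sum(r[port] for r in members)


# Hypothetical sample rows, sorted by DEPTNO.
sorted_input = [
    {"DEPTNO": 10, "SAL": 100},
    {"DEPTNO": 10, "SAL": 200},
    {"DEPTNO": 20, "SAL": 50},
]
totals = list(aggregate_sorted(sorted_input, "DEPTNO", "SAL"))

# The same rows in the wrong order are rejected.
try:
    list(aggregate_sorted(list(reversed(sorted_input)), "DEPTNO", "SAL"))
    rejected = False
except ValueError:
    rejected = True
```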

4) Aggregator Caches

· The Power Center Server stores data in the aggregate cache until it completes the aggregate calculations.

· It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it stores overflow values in cache files.

Note: The Power Center Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports.