Wednesday, December 17, 2014

Filter Transformation

The Filter transformation is an active and connected transformation. As an active transformation, it can change the number of rows that pass through it. The filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether the row meets the specified condition. Only rows that return TRUE pass through the transformation; discarded rows do not appear in the session log or in the reject/bad files.
You cannot concatenate ports from more than one transformation into the Filter transformation. The input ports for the filter must come from a single transformation. The Filter transformation does not allow setting output default values.
Creating Filter Transformation:
Follow the steps below to create a Filter transformation:
In the Mapping Designer, open a mapping or create a new mapping.
Go to the toolbar, click Transformation -> Create, and select the Filter transformation.
Enter a name, click Create, and then click Done.
You can add ports either by dragging from other transformations or manually creating the ports within the transformation.
Specifying Filter Condition:
To configure the filter condition, go to the Properties tab and open the Expression Editor for the Filter Condition property. Enter the filter condition you want to apply, click the Validate button to verify the syntax, and then click OK.
Components of Filter Transformation:
The filter transformation has the following components.
Transformation: You can enter the name and description of the transformation.
Ports: Create new ports and configure them
Properties: You can provide filter condition to filter the rows. You can also configure tracing levels.
Metadata Extensions: Specify the metadata details like name, data type.
Configuring Filter Transformation:
The following properties need to be configured on the Ports tab of the Filter transformation:
Port name: Enter the name of the ports created.
Port type: All the ports in filter transformation are input/output.
Data type, precision, and scale: Configure the data type and set the precision and scale for each port.
Filter Transformation examples:
Create a mapping to load into the target only the details of employees who joined the organization after December 2012.
Filter condition: HIREDATE >= '01-01-2013'
This rejects the details of every employee who joined the organization on or before 31-Dec-2012.
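The filter condition can be any expression that evaluates to TRUE or FALSE, so multiple checks can be combined with AND/OR. As a hedged sketch (the DEPTNO port and the value 10 are assumptions, not part of the example above):
Filter condition: HIREDATE >= '01-01-2013' AND DEPTNO = 10
Only rows satisfying both conditions would pass to the downstream transformations.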
Performance Tuning Tips:
    1.   Use the filter transformation as close as possible to the sources in the mapping. This will reduce the number of rows to be processed in the downstream transformations.
    2.   In case of relational sources, if possible use the source qualifier transformation to filter the rows. This will reduce the number of rows to be read from the source.
Note: The input ports to the filter transformation must come from a single transformation. You cannot connect ports from more than one transformation to the filter Transformation.


Tuesday, December 16, 2014

Source Qualifier

Source Qualifier Transformation:
The Source Qualifier is an active and connected transformation. It represents the rows that the Power Center Server reads when it runs a session. The Source Qualifier transformation is not reusable. It is the default transformation for all sources except XML and COBOL files.
Tasks performed by Source Qualifier:
  • Join data originating from the same source database: We can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
  • Filter rows when the Power Center Server reads source data: If we include a filter condition, the Power Center Server adds a WHERE clause to the default query.
  • Specify an outer join rather than the default inner join: If we include a user-defined join, the Power Center Server replaces the join information specified by the metadata in the SQL query.
  • Specify sorted ports: If we specify a number for sorted ports, the Power Center Server adds an ORDER BY clause to the default SQL query.
  • Select only distinct values from the source: If we choose Select Distinct, the Power Center Server adds a SELECT DISTINCT statement to the default SQL query.
  • Create a custom query to issue a special SELECT statement for the Power Center Server to read source data: For example, you might use a custom query to perform aggregate calculations. All of the above are configured on the Properties tab of the Source Qualifier transformation.
 Source Qualifier Properties Tab:
1) SOURCE FILTER:
We can use a source filter to reduce the number of rows the Power Center Server queries.
Note: When we use a source filter in the session properties, we override the customized SQL query in the Source Qualifier transformation.
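For example, assuming an EMPLOYEES source with EMPNO and SALARY columns (hypothetical names, not taken from this post), a source filter entered on the Properties tab might be:
EMPLOYEES.SALARY > 5000
and the Power Center Server would then generate a default query along the lines of:
SELECT EMPLOYEES.EMPNO, EMPLOYEES.SALARY FROM EMPLOYEES WHERE EMPLOYEES.SALARY > 5000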
2) NUMBER OF SORTED PORTS:
When we use sorted ports, the Power Center Server adds the ports to the ORDER BY clause in the default query. By default it is 0. If we change it to 1, the data will be sorted by the column that is at the top of the Source Qualifier. If we change it to 2, the data will be sorted by the top two columns.
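As a sketch, if the first two connected ports in the Source Qualifier were EMPNO and HIREDATE (hypothetical names), setting the number of sorted ports to 2 should make the generated query end with something like:
... FROM EMPLOYEES ORDER BY EMPNO, HIREDATE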
3) SELECT DISTINCT:
If we want the Power Center Server to select unique values from a source, we can use the Select Distinct option. Just check the option on the Properties tab to enable it.
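With Select Distinct checked, the default query would take a form such as the following (the source and column names are assumptions):
SELECT DISTINCT EMPLOYEES.DEPTNO FROM EMPLOYEES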
4) PRE-SESSION and POST-SESSION Commands:
  • The Power Center Server runs pre-session SQL commands against the source database before it reads the source.
  • It runs post-session SQL commands against the source database after it writes to the target.
  • Use a semi-colon (;) to separate multiple statements.
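As a hedged illustration (the staging and audit tables are invented for this sketch, and the syntax is Oracle-style), a pre-session SQL command could clear a staging table and log the run, with the two statements separated by a semi-colon:
DELETE FROM STG_EMPLOYEES; INSERT INTO ETL_AUDIT_LOG (RUN_DATE) VALUES (SYSDATE)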
5) USER DEFINED JOINS:
Entering a user-defined join is similar to entering a custom SQL query. However, we only enter the contents of the WHERE clause, not the entire query.
We can specify an equi join, a left outer join, and a right outer join only. We cannot specify a full outer join. To use a full outer join, we need to write a custom SQL query.
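For example, to join EMP and DEPT sources on department number (the table and column names are assumptions), the user-defined join entry would contain only the join condition:
EMP.DEPTNO = DEPT.DEPTNO
For an outer join, the exact syntax depends on the source database (for instance, Oracle's (+) operator) or on Informatica's own join syntax, so verify it against your database before relying on it.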
6) SQL QUERY:
For RDBMS sources, the Power Center Server generates a query for each Source Qualifier transformation when it runs a session. The default query is a SELECT statement for each source column used in the mapping. In other words, the Power Center Server reads only the columns that are connected to another transformation.
Note: If we enter a custom SQL query and do not cancel it, the Power Center Server overrides the default query with the custom SQL query. We can enter any SQL statement supported by our source database. Before entering the query, connect all the input and output ports we want to use in the mapping.
Important Points:
  • When creating a custom SQL query, the SELECT statement must list the port names in the order in which they appear in the transformation.
  • Make sure to test the query in the database before using it in the SQL Query override. If the query does not run in the database, it will not work in Informatica either.
  • Also, always connect to the database and validate the SQL in the SQL query editor.
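Putting this together, a hedged sketch of a custom SQL override that performs an aggregate calculation (the EMP table, its columns, and the SUM are assumptions for illustration only) might look like:
SELECT EMP.DEPTNO, SUM(EMP.SAL) FROM EMP GROUP BY EMP.DEPTNO
Here the two selected expressions must appear in the same order as the corresponding DEPTNO and SAL ports in the Source Qualifier transformation.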




Expression Transformation

Expression Transformation is a connected, passive transformation (i.e., the number of input and output rows is the same). You can modify individual ports of a single row, or add or suppress ports. It helps implement complicated data transforms, apply business logic, and perform checks and validations. You can use various functions according to your business logic.

Built-in Functions:
Data Manipulation: concatenation (CONCAT or ||), case change (UPPER, LOWER), truncation, InitCap (INITCAP)
Data type conversion: (TO_DECIMAL, TO_CHAR, TO_DATE, TO_FLOAT, and TO_NUMBER)                 
Data cleansing: GREATEST, IN, INSTR, IS_DATE, IS_NUMBER, IS_SPACES, ISNULL, LEAST, LTRIM, RTRIM
Manipulate dates: GET_DATE_PART, IS_DATE, and DATE_DIFF
Scientific calculations and numerical operations: exponential, power, log, modulus, square root (EXP, POWER, LOG, MOD, SQRT)
Informatica Function: IIF, DECODE
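A few hedged examples of how such functions are typically combined in an output-port expression (the port names FIRST_NAME, LAST_NAME, COMM, and DEPTNO are assumptions, not from this post):
UPPER(FIRST_NAME) || ' ' || UPPER(LAST_NAME)
IIF(ISNULL(COMM), 0, COMM)
DECODE(DEPTNO, 10, 'ACCOUNTING', 20, 'RESEARCH', 'OTHER')
TO_DATE('01-01-2013', 'MM-DD-YYYY')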

Properties of Expression Transformation:
Active /Passive: Expression Transformation is a Passive transformation as it only modifies the incoming port data, but it won’t affect the number of rows processed.
Connected/Unconnected Transformation: Expression Transformation is a connected transformation.
Types of ports in Expression Transformation: Input, Output, Variable port.
Steps to create Expression transformation in Informatica:
  1. In the Mapping Designer, open a mapping.
  2. Click Transformation > Create. Select Expression transformation.
  3. You can also create the transformation by clicking its button on the transformation toolbar in the Designer.
  4. Enter a name and click done (ex: EXP_DIM_BUN_EMEA).
  5. Select and drag the ports from the source qualifier or other transformations to add to the Expression transformation. You can also open the transformation and create ports manually.
  6. Double-click the title bar and click the Ports tab. You can create output and variable ports within the transformation.
  7. In the Expression section of an output or variable port, open the Expression Editor.
  8. Enter an expression. Click Validate to verify the expression syntax.
  9. Click OK.
  10. Assign the port data type, precision, and scale to match the expression return value.
  11. To make it reusable, check the reusable option in the edit properties.
  12. Configure the tracing level on the Properties tab.
  13. Click OK.
  14. Connect the output ports to a downstream transformation or target.


Expression transformation Tabs:
  • Transformation: Enter the name and description of the transformation. The naming convention for an Expression transformation is EXP_meaningful_name. You can also make the transformation reusable.
  • Ports: Create ports and configure them.
  • Properties: Configure the tracing level to determine the amount of transaction detail reported in the session log file.
  • Metadata Extensions:  Specify the extension name, data type, precision, and value. You can also create reusable metadata extensions.
Configuring Ports: You can configure the following components on the Ports tab:
  • Port name:  Add the name of the port.
  • Data type, precision, and scale:  Configure the data type and set the precision and scale for each port.
  • Port type: A port can be input, output, input/output, or variable. The input ports receive data and output ports pass data. The input/output ports pass data unchanged. Variable ports store data temporarily and can store values across the rows.
  • Expression:  Use the Expression Editor to enter expressions. Expressions use the transformation language, which includes SQL-like functions, to perform calculations.
  • Default values and description:  Set default value for ports and add description.

Purpose of Expression Transformation: The Expression transformation is used to perform non-aggregate calculations on each row of data. Data can be modified using logical and numeric operators or built-in functions.

Performance Tuning on Expression Transformation:
  • Try to use numeric operations instead of string operations.
  • Operators are faster than functions (i.e. || vs. CONCAT).
  • Use transformation variables to break down complex transformation logic into smaller parts. You can define a variable port by checking the V check box while defining the port. Its expression can reference any input port and other variable ports (see the sketch after this list).
  • It is highly recommended to define a naming convention for the ports in expressions, for example an in_ prefix for input ports, out_ for output ports, and var_ for variable ports.
  • Ports are evaluated in the following order: input ports first, then variable ports and then output ports.
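A minimal sketch of the variable-port idea (the port names and the 10% bonus rule are assumptions): with input ports in_SAL and in_COMM, a variable port var_TOTAL_PAY could hold the expression
var_TOTAL_PAY = in_SAL + IIF(ISNULL(in_COMM), 0, in_COMM)
and two output ports could then reuse it, e.g. out_TOTAL_PAY = var_TOTAL_PAY and out_BONUS = var_TOTAL_PAY * 0.1, instead of repeating the calculation. Because input ports are evaluated first, then variable ports, then output ports, var_TOTAL_PAY is already populated when the output expressions run.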


Wednesday, November 26, 2014

DATA WAREHOUSE CONCEPTS

What is BI: Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision making. It leverages methods and technologies that focus on counts, statistics and business objectives to improve business performance.
The objective of Business Intelligence is to better understand customers and improve customer service, make the supply and distribution chain more efficient, and to identify and address business problems and opportunities quickly.
The warehouse is used for high-level data analysis. It is used for predictions, time series analysis, financial analysis, what-if simulations, etc. Basically, it is used for better decision making.

DATA WAREHOUSE CONCEPTS:
Data Warehousing is the process of constructing and using the data warehouse. The data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries, and decision making. Data Warehousing involves data cleaning, data integration, and data consolidation.

DATA WAREHOUSE DEFINITIONS:
A data warehouse is a de-normalized database structure that stores historical data in a summary-level format. It is specifically meant for heavy-duty querying and analysis.
A data warehouse is a "Subject-Oriented, Integrated, Time-Variant, Nonvolatile collection of data in support of decision making".
In general, a Data Warehouse is used at an enterprise level and a Data Mart is used at a business division/department level.

Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject. For example, the SRDB is not simply made accessible to end-users, but is given structure and organized according to specific needs.

Integrated: A single source of information for and about understanding multiple areas of interest. The data warehouse provides one-stop shopping and contains information about a variety of subjects. Thus the OIRAP data warehouse has information on students, faculty and staff, instructional workload, and student outcomes.

Non-Volatile: Stable information that doesn’t change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.
Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.

Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users.

Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.

Other Definitions
Data Warehouse: A data structure that is optimized for distribution. It collects and stores integrated sets of historical data from multiple operational systems and feeds them to one or more data marts. It may also provide end-user access to support enterprise views of data.

Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single, analytic application used by a distinct set of workers.

Staging Area: Any data store that is designed primarily to receive data into a warehousing environment.

Operational Data Store: A collection of data that addresses operational needs of various operational units. It is not a component of a data warehousing architecture, but a solution to operational needs.

OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs.

Multidimensional Analysis: The ability to manipulate information by a variety of relevant categories or “dimensions” to facilitate analysis and understanding of the underlying data. It is also sometimes referred to as “drilling-down”, “drilling-across” and “slicing and dicing”.

Hypercube: A means of visually representing multidimensional data.

Star Schema: A means of aggregating data based on a set of known dimensions. It stores data multidimensionally in a two dimensional Relational Database Management System (RDBMS), such as Oracle.
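As a rough illustration of a star schema in an RDBMS (the table and column names below are invented for this sketch, and the DDL is Oracle-style), a central fact table references each dimension through a key:
CREATE TABLE DIM_PRODUCT (PRODUCT_KEY NUMBER PRIMARY KEY, PRODUCT_NAME VARCHAR2(100));
CREATE TABLE DIM_DATE (DATE_KEY NUMBER PRIMARY KEY, CALENDAR_DATE DATE);
CREATE TABLE FACT_SALES (PRODUCT_KEY NUMBER REFERENCES DIM_PRODUCT, DATE_KEY NUMBER REFERENCES DIM_DATE, SALES_AMOUNT NUMBER);
Queries then aggregate the fact table and join out to the dimensions to slice and dice the data.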

Snowflake Schema: An extension of the star schema by means of applying additional dimensions to the dimensions of a star schema in a relational environment.

Multidimensional Database: Also known as MDDB or MDDBS. A class of proprietary, non-relational database management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions associated with traditional relational database management systems.

OLAP Tools: A set of software products that attempt to facilitate multidimensional analysis. Can incorporate data acquisition, data access, data manipulation, or any combination thereof.

COMPARISON OF DATA WAREHOUSE AND OPERATIONAL DATA 

HOW IS THE WAREHOUSE DIFFERENT?

The data warehouse is distinctly different from the operational data used and maintained by day-to-day operational systems. Data warehousing is not simply an “access wrapper” for operational data, where data is simply “dumped” into tables for direct access. Among the differences:


ETL Development project implementation


Business Requirements (Requirement Phase): The Business Relationship Manager commits to the business people (client) on project deliverable dates and the cost of the project.
The Business Relationship Manager then sets up a meeting with the offshore team, i.e. the project manager, and asks for the project estimations. Once the estimations are done, we can send a mail to the Business Relationship Manager if, for example, we need two more resources for this project. If he is okay with that, the project starts.

Client Review (Requirement Phase): Our onsite analyst sets up a meeting with the client, walks through the Business Requirement Document, and gets clarifications from the client. The analyst does the analysis based on the Business Requirement Document provided by the client. The final analysis sheet, the Functional Specification Document, is sent to the application Subject Matter Expert with the development team on CC.

Review (Requirement Phase): The development team asks the analyst for any clarifications required. If the Subject Matter Expert and the development team are okay with the analysis, the whole team goes to the Design Modeller.

Logical Design Modelling (Modelling Phase): The Design Modeller designs the tables with inputs from the team, using a modelling tool (ER/Studio, Erwin, etc.). After modelling is complete, the development team validates those tables and sends a confirmation mail to the Modeller.

Physical Design Modelling (Modelling Phase): The Modeller releases the logical design for physical design modelling. The tables are created as per the design in the development environment (database).

Construction (Development Phase and Unit Testing): After the tables are created, development begins. For example: Product comes from the PRODUCT table, Number of Products comes from the PRODUCT_SOLD table, and Percent of Product is a measure that needs to be calculated from the PRODUCT_SOLD_ALL table. The Modeller designs one table for all three objects, so you create three mappings against these three source tables to bring the data into your database and load it into one target table. Based on the data and the Functional Specification Document, you verify that you get the data you expect. You also need to prepare the Technical Specification Document, the unit test cases document, and the mapping design document, and review your code with the Subject Matter Expert.

SIT Phase: After development is complete, the client assigns the QA team to do the testing. We move the code to the QA environment using deployment tools and explain all the changes to the QA team.

UAT Test: The QA team then does the testing and gives the sign-off. Once that sign-off is done, we ask the client to test in the test environment.

UAT Sign-off: If the client is okay with the results, they provide the sign-off.

Cross-functional Team Reviews: You need to explain the code changes to the cross-functional teams if they are impacted.

Go Live: Move the code to production.