Learn Develop Data Engineering: Informatica-ETL performance Tuning

Informatica-ETL performance Tuning: Informatica performance tuning is an important aspect of ETL development or Data warehouse.

Now a days business people wants data to be more fast more responsive, in this situation we have to look for solution where we can improve our designed system.

While doing performance tuning we have to see all possible aspect like database, ETL tool and hardware resource.

Below are some of the points which will help to improve performance of data warehouse

1. In Database like Oracle, SQL Server, Sybase, DB2 we can sort, Grouping and aggregation and keep data into staging environment.

2. We try to avoid synonyms, DB links and remote access database and try to keep all tables in one instance.

3. Localize all target table and stored procedure, function, views & sequence in source DB.

4. Informatica runs well in RDBMS engine but not good on Java, OLAP and reporting engines.

5. Turn Off VERBOSE tracing in informatica session, due to this informatica takes more time to write into logs file.

6. Turn Off collect performance statistics from informatica.

7. If source is flat file use staging tables to reduce multiple transformation in mapping.

8. Try to avoid use of non-lookup cached if your lookup is growing.

9. Try to keep complex mapping, will divide complex mapping into parallel processing.

10. We need to keep equal balance between informatica and database, but here we have to use possible capability from DB to avoid more operation in informatica.

11. We can use DBMS for (Reading, Writing, Grouping, Filtering data) and for more complex logic, outside join, integrating various source, multiple source feed in this we have to identify what operation we can done in DB and informatica.

12. We have to Tune Session Setting by checking "Throttle Reader" and Increasing default buffer size.

13. While installing Informatica server we have to choose with good server so that PMServer will have good space to save the data.

14. Avoid DB sequence generator, if you cannot then use staging tables and if sequence generator is shared one then add Seq ID from flat file and call post target load stored procedure to populate column.

Tuning the Session

1. Partition the session this creates many connections to the source and target, and loads data in parallel pipelines. Each pipeline will be independent of the other. But the performance of the session will not improve if the number of records is less. Also the performance will not improve if it does updates and deletes. So session partitioning should be used only if the volume of data is huge and the job is mainly insertion of data.

2. Run the sessions in parallel rather than serial to gain time, if they are independent of each other.

3. Drop constraints and indexes before we run session. Rebuild them after the session run completes. Dropping can be done in pre-session script and Rebuilding in post session script. But if data is too much, dropping indexes and then rebuilding them etc. will be not possible. In such cases, stage all data, pre-create the index, use a transportable table space and then load into database.

4. Use bulk loading, external loading etc. Bulk loading can be used only if the table does not have an index.

5. In a session we have options to ‘Treat rows as ‘Data Driven, Insert, Update and Delete’. If update strategies are used, then we have to keep it as ‘Data Driven’. But when the session does only insertion of rows into target table, it has to be kept as ‘Insert’ to improve performance.

6. Increase the database commit level (The point at which the Informatica server is set to commit data to the target table.

Tuning Of Mapping

1. We can avoid executing major SQL queries from Mapplets or mappings.

2. Use optimized queries when we are using them.

3. Reduce the number of transformations in the mapping. Active transformations like rank, joiner, filter, aggregator etc. should be used as less as possible.

4. Remove all the unnecessary links between the transformations from mapping.

5. If a single mapping contains many targets, then dividing them into separate mappings can improve performance.

6. If we need to use a single source more than once in a mapping, then keep only one source and source qualifier in the mapping. Then create different data flows as required into different targets or same target.

7. If a session joins many source tables in one source qualifier, then an optimizing query will improve performance.

8. In the SQL query that Informatica generates, ORDER BY will be present. Remove the ORDER BY clause if not needed or at least reduce the number of column names in that list. For better performance it is best to order by the index field of that table.

9. Combine the mappings that use same set of source data.

10. On a mapping, field with the same information should be given the same type and length throughout the mapping. Otherwise time will be spent on field conversions.

11. Instead of doing complex calculation in query, use an expression transformer and do the calculation in the mapping.

12. If data is passing through multiple staging, remove the staging area will increase performance.

13. Try to keep the stored procedures simple in the mappings.

14. Unnecessary data type conversions should be avoided.

Tuning of Lookup Transformations

1. To improve performance, cache the lookup tables. Informatica can cache all the lookup and reference tables; this makes operations run very fast. (Meaning of cache is given in point 2 of this section and the procedure for determining the optimum cache size is given at the end of this document.)

2. Even after caching, the performance can be further improved by minimizing the size of the lookup cache. Reduce the number of cached rows by using a SQL override with a restriction.

Learn Develop Data Engineering

Pages

Wednesday, September 9, 2015

Informatica-ETL performance Tuning

6 comments: