Informatica-ETL performance Tuning: Informatica performance tuning is an important aspect of ETL development or Data warehouse.
Now a
days business people wants data to be more fast more responsive, in
this situation we have to look for solution where we can improve
our designed system.
While
doing performance tuning we have to see all possible aspect like database, ETL
tool and hardware resource.
Below are
some of the points which will help to improve performance of data
warehouse
1. In
Database like Oracle, SQL Server, Sybase, DB2 we can sort, Grouping and
aggregation and keep data into staging environment.
2. We try
to avoid synonyms, DB links and remote access database and try to keep all
tables in one instance.
3. Localize
all target table and stored procedure, function, views & sequence in source
DB.
4.
Informatica runs well in RDBMS engine but not good on Java, OLAP and reporting
engines.
5. Turn Off
VERBOSE tracing in informatica session, due to this informatica takes more time
to write into logs file.
6. Turn Off
collect performance statistics from informatica.
7. If
source is flat file use staging tables to reduce multiple transformation in
mapping.
8. Try
to avoid use of non-lookup cached if your lookup is growing.
9. Try to
keep complex mapping, will divide complex mapping into parallel processing.
10. We need
to keep equal balance between informatica and database, but here we have to use
possible capability from DB to avoid more operation in informatica.
11. We can
use DBMS for (Reading, Writing, Grouping, Filtering data) and for more
complex logic, outside join, integrating various source, multiple
source feed in this we have to identify what operation we can done in DB and
informatica.
12. We have
to Tune Session Setting by checking "Throttle Reader" and Increasing
default buffer size.
13. While
installing Informatica server we have to choose with good server so that
PMServer will have good space to save the data.
14. Avoid DB
sequence generator, if you cannot then use staging tables and if sequence
generator is shared one then add Seq ID from flat file and call post target
load stored procedure to populate column.
Tuning the Session
1. Partition
the session this creates many connections to the source and target, and loads
data in parallel pipelines. Each pipeline will be independent of the other. But
the performance of the session will not improve if the number of records is
less. Also the performance will not improve if it does updates and deletes. So
session partitioning should be used only if the volume of data is huge and the
job is mainly insertion of data.
2. Run the
sessions in parallel rather than serial to gain time, if they are independent
of each other.
3. Drop
constraints and indexes before we run session. Rebuild them after the session
run completes. Dropping can be done in pre-session script and Rebuilding in
post session script. But if data is too much, dropping indexes and then
rebuilding them etc. will be not possible. In such cases, stage all data,
pre-create the index, use a transportable table space and then load into
database.
4. Use bulk
loading, external loading etc. Bulk loading can be used only if the table does
not have an index.
5. In a
session we have options to ‘Treat rows as ‘Data Driven, Insert, Update and
Delete’. If update strategies are used, then we have to keep it as ‘Data Driven’.
But when the session does only insertion of rows into target table, it has to
be kept as ‘Insert’ to improve performance.
6. Increase
the database commit level (The point at which the Informatica server is set to
commit data to the target table.
Tuning Of Mapping
1. We can avoid
executing major SQL queries from Mapplets or mappings.
2. Use
optimized queries when we are using them.
3. Reduce
the number of transformations in the mapping. Active transformations like rank,
joiner, filter, aggregator etc. should be used as less as possible.
4. Remove
all the unnecessary links between the transformations from mapping.
5. If a
single mapping contains many targets, then dividing them into separate mappings
can improve performance.
6. If we
need to use a single source more than once in a mapping, then keep only one
source and source qualifier in the mapping. Then create different data flows as
required into different targets or same target.
7. If a
session joins many source tables in one source qualifier, then an optimizing
query will improve performance.
8. In the
SQL query that Informatica generates, ORDER BY will be present.
Remove the ORDER BY clause if not needed or at least reduce the number of
column names in that list. For better performance it is best to order by the
index field of that table.
9. Combine
the mappings that use same set of source data.
10. On
a mapping, field with the same information should be given the same type and
length throughout the mapping. Otherwise time will be spent on field
conversions.
11. Instead of doing complex calculation in query, use
an expression transformer and do the calculation in the mapping.
12. If data is passing through multiple staging,
remove the staging area will increase performance.
13. Try to keep
the stored procedures simple in the mappings.
14. Unnecessary data type conversions should be avoided.
Tuning of Lookup Transformations
1. To
improve performance, cache the lookup tables. Informatica can cache all the
lookup and reference tables; this makes operations run very fast. (Meaning of
cache is given in point 2 of this section and the procedure for determining the
optimum cache size is given at the end of this document.)
2. Even
after caching, the performance can be further improved by minimizing the size
of the lookup cache. Reduce the number of cached rows by using a SQL override
with a restriction.
6 comments:
Thanks for your post and very useful for me. Great content of different kinds of information. I want many of the ideas on this topic and please keep updating...
Informatica Training in Chennai
Informatica Training Center Chennai
Primavera Training in Chennai
Html5 Training in Chennai
Xamarin Training in Chennai
Social Media Marketing Courses in Chennai
Tableau Training in Chennai
Informatica Training in Tambaram
Informatica Training in Adyar
Thanks for your efforts in sharing this post with us. This was really awesome. kindly keep continuing the great work.
English Speaking Classes in Mumbai
English Speaking Course in Mumbai
Best English Speaking Classes in Mumbai
Spoken English Classes in Mumbai
English Classes in Mumbai
Spoken English in Chennai
IELTS Coaching in Chennai
IELTS Coaching in Mumbai
It is very informative blog and useful article thank you for sharing with us , keep posting learn Informatica Certification
I beyond question welcome the way wherein you keep posting such astonishing and significant articles and keep invigorating your blog with latest bearing.
C and C++ Training Institute in chennai | C and C++ Training Institute in anna nagar | C and C++ Training Institute in omr | C and C++ Training Institute in porur | C and C++ Training Institute in tambaram | C and C++ Training Institute in velachery
Thank you so much for highlighting how Informatica and its combination with ETL tools work to make up the best models and find solutions.
Informatica Read Soap API
oracle bpm online training
angular js online training
sql server dba online training
oracle golden gate online training
Post a Comment