Learn Develop Data Engineering: ETL Developer (Informatica, SQL & DWH) Interview Questions-3

ETL Developer (Informatica, SQL & DWH) Interview Questions-3

1. Can you copy the session to a different folder or repository?

Yes. By using copy session wizard, you can copy a session in a different folder or repository.

2. What is a command that used to run a batch?

pmcmd is used to start a batch.

3. What are the types of mapping in Getting Started Wizard?

Simple Pass through mapping, slowly growing target mapping.

4. What are the different types of Type-2 slowly changing dimensions?

There are three types of slowly changing dimensions

SCD with versioning

SCD with flags

SCD with Date

5. What are the different threads in DTM process?

Master thread

Mapping thread

Reader thread

Writer thread

Pre and post session threads

6. What are the active and passive transformations?

An active transformation can change the number of rows that pass through it.

A passive transformation does not change the number of rows that pass through it.

7. What are the scheduling options to run a session?

Different options of scheduling are

Run only on demand: Informatica server runs the session only when user starts session explicitly.

Run once: Informatica server runs the session only once at a specified date and time.

Run every: Informatica server runs the session at regular intervals as you configured.

Customized repeat: Informatica server runs the session at the date and time specified in the repeat dialog box.

8. How can we store previous session logs?

Just run the session in time stamp mode then automatically session log will not overwrite current session log.

9. What is the difference between Mapping and Mapplet?

Mapping is the collection of Source Definition, Target Definition, Transformation(s) and/or Mapplet.

Mapplet is a set of reusable transformations for applying same business logic.

10. What is the use of Repository Manager?

Repository Manager is used to manage folders, view Source/Target dependencies, Mapping Dependencies, adding and removing repositories, Versioning etc.

11. What are the various Tasks in Informatica Workflow Manager?

The various Tasks in Informatica are

a) Assignment Task

b) Command Task

c) Control Task

d) Decision Task

e) E-mail Task

f) Event Raise Task

g) Event Wait Task

h) Session Task

i) Timer Task

j) Link Task

12. What is the use of Source Analyzer?

Source Analyzer is used to create source definition.

13. What is the use of Target Designer?

Target Designer is used to create Target Definition.

14. What is Target Designer called prior to Informatica 8.6?

Warehouse Designer

15. Can we use Reusable Sequence Generator transformation in Mapplet?

16. Why Cache? Explain with an example?

If there are x number of records with y number of columns from source data and we need to extract z number of columns only (very less) then the cache stores those columns for respective records in the $PMCACHEDIR of Informatica Server so that we don’t need to extract each record from database and load into Informatica. Doing this increases the performance of the system.

17. What is the use of Shared Object?

Suppose there is an object which is used by many users in the company.

If the object undergoes a change, this change has to be updated to each and every user. Instead, if the object is made as shared, then the update has to be done to the object and all other users get the update.

18. What is Dual table?

Dual table is a table that is created by oracle along with data dictionary. It gives output of exactly one column name dummy and one record ‘x’.

Select * from Dual;

-----------

DUMMY

19. Can we install Power Center Informatica 8.X version on Windows 7?

No we can’t install Informatica on Power Center. We can Install Informatica 9.x on Windows 7.

20. What are the types of variables in Informatica?

There are three types of variables in Informatica

Predefined variable represented by $

User defined variable represented with $$

System variable denoted by $$$

21. Difference between Informatica normal join and Oracle Equi join?

Equi join in Oracle is performed on oracle sources (relational sources) while Informatica Equi joins can be performed on non-relational sources too (oracle and flat file etc.)

21. What is degenerate dimension?

Dimension which has no dimension table of its own and is derived from the fact table.

22. What is requirements gathering?

It is carried out by Business Analyst. It is nothing but interacting with end users and getting to know what his requirements are. Based on his requirements, the rest of the phases Analysis, Design, Implementation and Testing and finally Maintenance are carried on.

23. What is Junk dimension?

The dimension that is formed by lumping of smaller dimensions is called Junk dimension.

Collection of random transactional codes, flags that are unrelated to any particular dimension, simply a place where all junk attributes stores.

24. What is Staging Area?

Staging Area is indeed a database where data from different source systems are brought together and this database acts as an input to Data Cleansing.

25. What are the types of joins in Informatica and in Oracle?

There are four types of joins in oracle

equi join

Non equi join

Self-join

Outer join

Joins in Informatica

Master join (right outer join)

Detailed join (left outer join)

Normal join

27. What is the file extension or format of files for the Informatica Objects like sessions, mappings etc. in Repository?
The format of files for Informatica Objects in Repository is XML

28. Where can we find Versioning in Informatica? What happens if Versioning is turned off?
In Informatica, we can find Versioning in Repository Manager. If Versioning is turned off, we will not be able to track the changes for the respective Sessions/Mappings/Workflows.

29. What is tracing level? What are the types of tracing level?
Tracing level is the amount of information that Informatica server writes into a log file.
Types of tracing level
Normal
Terse
Verbose init
verbose data

30. In joiner transformation, we take the table with lesser number of rows as master while the more number of rows as detailed. Why?
In joiner, each and every row of the master is compared with every row of the detailed and so, the less number of rows in master, the less is the number of iterations and so better is the performance of the system.

31. What are all the databases the Informatica server on windows can connect to?
Informatica server on windows can connect to
SQL server database
Oracle
Sybase
Teradata
MS Access
MS Excel
Informix
DB2

32. What are the databases the Informatica server on UNIX can connect to?

Informatica on UNIX can connect to
Sybase
Teradata
Informix
DB2
Oracle

33. What is an overview window?
It is a window where you can see all the transformations in a mapping.

34. In how many ways we can update a source definition?
Two ways
we can reimport the source definition
we can edit the source definition

35. What is mapping?
Mapping is nothing but data flow between source and target.

36. What is a session?
Session is a set of instructions that tells the Informatica server when to and how to move the data from source to targets.

37. What are the different things we can do using PMCMD command?
We can START, STOP and ABORT session using PMCMD command.

38. What are the differences between Informatica 6 and 7?
Informatica 7 has
we can use pmcmd command
Union and custom transformation
Version controlling

39. What is the use of bitmap indexes?
Bitmap indexes are used to join large fact tables to smaller dimension tables.

40. How can we delete duplicate rows from flat files?
We can make use of sorter transformation and select distinct option.

41. If a session fails after loading 10000 records into the target how can we start loading into the target from the 10001th record?
We can run the session with the recovery strategy mode.

42. What is the limit of joiner transformations?
We cannot use sequence generator or update strategy transformations before or after joiner transformations.

43. How does server recognize the source and target databases?
By using ODBC if they are relational, FTP if they are flat files.

44. What is rank index in a group?
Power Center Designer automatically creates a RANK INDEX port while using Rank transformation. The purpose of this RANK INDEX port is to store the ranking for the column(s) we are interested in.

45. What are the constants or flags for each database operation and their numeric equivalent in Update Strategy?
Insert DD_INSERT 0
Update DD_UPDATE 1
Delete DD_DELETE 2
Reject DD_REJECT 3

46. Can you generate reports using Informatica?
No, we cannot generate reports using Informatica, it is just an ETL tool but we can generate metadata reports.

Yes, Using Informatica Data Analyzer Tool we can generate reports.

47. Can you start batches within a batch?
No, you can’t

48. What are the types of groups in Router Transformation?
There are three types of groups in Router Transformation namely
Input group
Otput group
Default group

49. What are batches? What are the types of batches?

Batches provide a way to group sessions for either sequential or parallel execution by Informatica server.

Concurrent batches – which run at the same time
Sequential batches – which run one after the other

50. What is Power Center Repository?
Power Center Repository allows you to share metadata among different repositories and to create a data-mart domain.

51. What are the types of metadata that stores in repository?
Source definition
Target definition
Mappings
Mapplet
Transformations

52. Differences between dynamic cache and static cache
In case of dynamic cache, if we want to insert a new row, then it will first looks in the lookup cache and if the row is not present in the cache, it inserts the row into the cache as well as target but in case of static cache, its stores only into target table and not in cache.

53. What is the use of source qualifier?
Source qualifier is used to convert different data types to Informatica compatible data types.

54. What is page code compatibility?
It is nothing but compatibility of code for maintaining data accuracy. It comes into picture when data is in different languages.

55. What are Synonyms?
Synonyms are alternative names for database objects such as tables, views, stored procedures etc.
Syntax
Create [replace] synonym for [schema.]object_name

56. Types of Lookup Cache?
Static cache
Dynamic cache
Persistent cache
Recache from database
Shared cache

57. What are various stages of SDLC?
Requirements Gathering, Analysis, Design, Implementation and Testing and Maintenance.

58. What is Star Schema?
Star Schema is a simplest form of schema which has one fact table and at least one dimension table. The dimensions here are de-normalized.

59. What is Fact table?
It is a centralized table in Star Schema. Fact table has two types of columns. First type is measures and second type is the foreign keys for the dimension tables.

60. What is Snowflake Schema?
In Snowflake Schema, the dimensions are further divided into sub dimensions. The dimensions here are normalized.

61. What is Dimension table?
A dimension table is one that describes the business entities of an enterprise.

62. What is operational data store (ODS)?
Operational data store is defined to be structure that is Subject-oriented
Integrated Volatile and current data that is a day or perhaps a month old.

63. What is Logical Data Modeling?
Logical Data Modeling is a type of data modeling which represents business requirements of an organization.

64. What is Physical Data Modeling?
Physical Data Modeling is a type of data modeling which includes all required tables, columns, and relationships for the physical implementation of a database.

65. Difference between mapplet and reusable transformation?
Using mapplet we can make set of transformations reusable whereas in reusable transformation, we can make only one transformation as reusable.

66. What is Data cleansing?

It is the process of converting data from different format of files or databases to single required format.

67. What are the types of repositories created by Informatica repository manager?
Four types of repositories are created using repository manager
Standalone repository, Global repository, Local repository, Versioned repository

68. What are the two types of processes that run the session?
The two types of processes that run the session are Load Manager DTM processes (Data Transformation Manager)

69. What is the difference between Union and Lookup transformation?
Union transformation is Active while Lookup transformation is Passive.
For Union transformation, the source tables or data should have similar structure while it’s not the case with the Lookup transformation.

Lookup transformation can run on source or target tables while Union tables work only on source tables.

70. What is throughput in Informatica?
Throughput is nothing but the rate at which Informatica server reads the data from sources and writes them successfully to the target.

71. Where can we find the throughput option in Informatica?
We can view this in workflow monitor
in workflow monitor, right click on session, then click on Get Run Properties and under Source/Target statistics we can find throughput option

72. What is code page?
Code page consists of encoding to specify characters in set of one or more languages and are selected based on source language.

73. What is the difference between Aggregator transformation and Expression Transformation?
Aggregator transformation use aggregator functions and performs calculation on entire group whereas in Expression transformation performs calculation on row by row basis.

74. What is the difference between Filter transformation and Router transformation?
Filter transformation drops the data that do not meet the condition whereas Router transformation captures the data even though the condition is not met and saves it in Default output group.
Filter transformation works on single condition only while Router transformation works on multiple conditions as well.

Filter transformation gives only one output. Router transformation can give more than one output.

75. What is the similarity between Router and Filter transformations?
Router transformation and Filter transformations are used to filter the data based on condition.

Both Filter and Router transformation are Active transformations.

Both Filter and Router transformation are connected.

76. What is the difference between Source Qualifier transformation and Joiner transformation?
Source Qualifier transformation is used to join the data from homogeneous sources while Joiner transformation is used to join data from heterogeneous sources as well as homogenous sources from different schemas.
We need matching keys to join two relational sources in Source Qualifier transformation and is not the case with joiner transformation.

77. What is PARAM file?
Parameter file is an ordinary text file where we can define the value for the parameter which is defined in session.
These parameter files are defined in session properties.

78. Which transformation should we use to normalize the COBOL and relational sources?
We need to make use of Normalizer Transformation

79. What are the various transformations which we cannot use in Mapplet?
Transformations which we cannot use in Mapplet are Normalizer Transformation,
XML Source Qualifier Transformation ,Target Definition Cobol Sources
Pre and Post Session Stored Procedures.

80. What is the difference between Joiner transformation and Lookup transformation?
Joiner is active transformation while Lookup is passive transformation.
Joiner works on source data only while Lookup works on source as well as target data.
Joiner transformation supports equi joins only while Lookup supports equi join as well as non equi joins.

Joiner transformation is connected while Lookup transformation can be either connected or unconnected.

81. What is the use of Lookup transformation?
Lookup transformation is used to check whether matching record exists in the target table and if the matching record doesn’t exist, it inserts accordingly.

82. How can we improve the session performance in Aggregator transformation?
We can increase the session performance by sending the sorted input to the aggregator transformation.

83. What is aggregated cache?
Aggregator cache is a temporary location which stores the input data values while the aggregation calculation is being carried out.

84. What is SQL override?
It is nothing but overriding SQL in source qualifier or lookup for additional logic.

85. What are the default values for variables?
The default variables for
Number=0
Variable=NULL
Date=1/1/1753

86. What are the types of data movement in Informatica server?
There are two types of modes in Informatica Ascii mode Unicode mode

87. What happens to the discarded records in Filter transformation?
Discarded rows do not appear in session logs and reject files.

88. How to increase the performance of session in using Filter transformation?
To increase the performance of session, we need to use Filter transformation as close as possible to the sources in the mapping.

89. In case of flat files (which comes through FTP) haven’t arrived what happens?
The session is going to fail because of fatal error

90. Why do we need to reinitialize aggregate cache?
We need to reinitialize the aggregate cache only to remove the previous data present in the aggregator cache and to be used by the new source.

91. What are the various mapping objects available in Informatica?
Mapping objects that are available in Informatica are
Source Definition
Target Definition
Link
Transformation
Mapplet

92. What is the default source option for Update Strategy?
Data Driven

93. What is the use of Update Strategy transformation?
Update Strategy is used to perform DML operations like Insert, Update, Delete and Reject on already populated targets.

94. What is the default join that the Joiner transformation provides?
Normal join

what are the types of Lookup Cache?
Static cache
Dynamic cache
Persistent cache
Recache from database
Shared cache

95. What are the differences between unique key and primary key?

Primary key cannot contain null value whereas unique key can contain one and only one null value.

In case of sql server, with default options, primary key is created as a clustered index while unique key is created as a non-clustered index

A unique key is similar to primary key but we can have more than one unique key per table

96. What are the types of joins in Informatica and in Oracle?

There are four types of joins in oracle

equi join

non equi join

self join

outer join

Joins in informatica

master join (right outer join)

detailed join (left outer join)

normal join

97. What is the difference between RowId and RowNum?

RowId is the physical address of a row. If we know the RowId, we can read entire row.

RowNum is the temporary number allocated during query execution

98. What are pseudo columns? What are the various types of pseudo columns?

Pseudo columns are columns which are not in the table but they can be used in sequel queries as if they are part of the table

RowNum

RowId

Sysdate

User

Currval

Nextval

99. What are the various types of Statements in Oracle?

The various Statements that are available in Oracle are

Data Manipulation Language (DML) Statements

Data Definition Language (DDL) Statements

Transaction Control (TC) Statements

Session Control (SC) Statements

System Control (SC) Statements

100. What are the various types of DML Statements?

Select

Update

Delete

Insert

Merge

101. What are various AGGREGATOR FUNCTIONS?
SUM
AVG
MIN
MAX
COUNT
STDDEV
VARIANCE
FIRST
LAST

102. What are the types of joins in oracle?
Equijoins or simple joins or inner joins
Non-equijoin
Outer join
Self join

103. Difference between char and varchar?
Char is a fixed length data type where as varchar is a variable length data type. So in case of char, though we entered less size for a column, it is going to allocate max size and in case of varchar, it is not going to allocate the maximum size.

104. Difference between varchar and varchar2
Varchar and varchar2 are variable length datatypes. Varchar datatype has a size of 2000 bytes whereas varchar2 has a size of 4000 bytes.
Varchar is in ascii whereas varchar2 is in unicode.

105. What is Normalization? Define 1NF, 2NF and 3NF.
It is the process of reducing the complex data structure into a simpler one by removing the redundancy.
First normal form: First normal form state that each field is atomic.
Second normal form: Second normal form states that data redundancy can be reduced if all the non key attributes which are dependent on one of the primary keys of a composite primary key are put to a separate table along with depended primary key. This should also satisfy the 1nf. For example, if a table has attributes like partid, city, state and country and partid, country for a composite primary key and city, state depend on country then the table is separated as two different tables as attributes having partid, country and country, state, city.

Third normal form: Third normal form states that if a dependency exist between non key attributes, then these attributes are isolated to a different table. This should also satisfy the 1nf + 2nf.

106. What are types of loading in Informatica?
The two types of loading available in Informatica are
Bulk Loading
Normal Loading

107. What is the difference between local index and global index?
Global index is nothing but a single index covering all partitions whereas local index has separate index for each partition.

108. What is Complex Mapping?
Complex Mapping will have the following features
difficult requirements
many number of transformations
Complex business logic

109. How many ways you can add ports?
Two ways
from other transformation
Click on add port button

110. How many number of sessions can you can you group in batches?
Any number of sessions but the lesser the number of sessions in a batch, the easier the migration.

111. What is Data Driven?
Informatica Server follows the instructions coded into update strategy within session mapping which determine how to flag the records for Insert, Update, Delete and Reject.

112. Difference between connected Lookup and Unconnected Lookup?
Connected lookup receives input directly from mapping pipeline whereas unconnected lookup receives input from: LKP expression of another transformation.
Connected lookup returns more than one column in a row whereas unconnected
lookup returns only one column in each row.
Connected lookup Supports user-defined values while unconnected doesn’t.
Connected lookup is not reusable whereas unconnected is.
Performance of connected lookup is lower compared to unconnected lookup.

113. What is a Mapplet?
Mapplet is an object which consists of set of reusable transformations which can be used in different mappings.

114. Can Lookup be done on flat files?
Yes

115. What is the transformation used in loading 4 flat files of similar structure to a single target?
We can make use of Union transformation

116. Difference between direct and indirect loading options in sessions?
Direct loading can be used on single transformations while indirect loading can be used on multiple transformations in direct loading, we can perform recovery process while in indirect loading, we cannot perform recovery process.

117. What are the various techniques for implementing a dimensional model?
Star schema
Snowflake schema

Galaxy schema

118. What are the types of dimensions?
There are three types of dimensions
slowly changing dimensions
Confirmed dimensions
Casual dimensions

2 comments:

meenati said...: It's a very informative blog and useful article thank you for sharing with us, keep posting learn Informatica Online Training; November 7, 2019 at 5:20 PM
James Zicrov said...: Thank you so much for arranging this question it really helpful for me.

Informatica Read Soap API; May 25, 2021 at 12:01 PM

Learn Develop Data Engineering

Pages

Saturday, September 5, 2015

ETL Developer (Informatica, SQL & DWH) Interview Questions-3

2 comments: