Friday, January 16, 2015

Rank Transformation, Sequence generator & Aggregator Transformation

·         Active and connected transformation
The Rank transformation allows us to select only the top or bottom rank of data. It allows us to select a group of top or bottom values, not just one value.
During the session, the Power Center Server caches input data until it can perform the rank calculations.
Rank Transformation Properties :
·         Cache Directory where cache will be made.
·         Top/Bottom Rank as per need
·         Number of Ranks Ex: 1, 2 or any number
·         Case Sensitive Comparison can be checked if needed
·         Rank Data Cache Size can be set
·         Rank Index Cache Size can be set
Rank Index
The Designer automatically creates a RANKINDEX port for each Rank transformation. The Power Center Server uses the Rank Index port to store the ranking position for each row in a group.
For example, if we create a Rank transformation that ranks the top five salaried employees, the rank index numbers the employees from 1 to 5.
·         The RANKINDEX is an output port only.
·         We can pass the rank index to another transformation in the mapping or directly to a target.
·         We cannot delete or edit it.
Defining Groups
Rank transformation allows us to group information. For example: If we want to select the top 3 salaried employees of each Department, we can define a group for Department.
·         By defining groups, we create one set of ranked rows for each group.
·         We define a group in Ports tab. Click the Group By for needed port.
·         We cannot Group By on port which is also Rank Port.

·         Passive and Connected Transformation.
·         The Sequence Generator transformation generates numeric values.
·         Use the Sequence Generator to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.
We use it to generate Surrogate Key in DWH environment mostly. When we want to maintain history, then we need a key other than Primary Key to uniquely identify the record. So we create a Sequence 1,2,3,4 and so on. We use this sequence as the key. Example: If EMPNO is the key, we can keep only one record in target and can’t maintain history. So we use Surrogate key as Primary key and not EMPNO.
Sequence Generator Ports :
The Sequence Generator transformation provides two output ports: NEXTVAL and CURRVAL.
·         We cannot edit or delete these ports.
·         Likewise, we cannot add ports to the transformation.
NEXTVAL:
Use the NEXTVAL port to generate sequence numbers by connecting it to a Transformation or target.
CURRVAL:
CURRVAL is NEXTVAL plus the Increment By value.
·         We typically only connect the CURRVAL port when the NEXTVAL port is already connected to a downstream transformation.
·         If we connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row.
·         When we connect the CURRVAL port in a Sequence Generator Transformation, the Integration Service processes one row in each block.
·         We can optimize performance by connecting only the NEXTVAL port in a Mapping.
Points to Ponder:
·         If Current value is 1 and end value 10, no cycle option. There are 17 records in source. In this case session will fail.
·         If we connect just CURR_VAL only, the value will be same for all records.
·         If Current value is 1 and end value 10, cycle option there. Start value is 0.
·         There are 17 records in source. Sequence: 1 2 – 10. 0 1 2 3 –
·         To make above sequence as 1-10 1-20, give Start Value as 1. Start value is used along with Cycle option only.
·         If Current value is 1 and end value 10, cycle option there. Start value is 1.
·         There are 17 records in source. Session runs. 1-10 1-7. 7 will be saved in repository. If we run session again, sequence will start from 8.
·         Use reset option if you want to start sequence from CURR_VAL every time.


·         Connected and Active Transformation
·         The Aggregator transformation allows us to perform aggregate calculations, such as averages and sums.
·         Aggregator transformation allows us to perform calculations on groups.
Components of the Aggregator Transformation
1.    Aggregate expression
2.    Group by port
3.    Sorted Input
4.    Aggregate cache
1) Aggregate Expressions
·         Entered in an output port.
·         Can include non-aggregate expressions and conditional clauses.
The transformation language includes the following aggregate functions:
·         AVG, COUNT, MAX, MIN, SUM
·         FIRST, LAST
·         MEDIAN, PERCENTILE, STDDEV, VARIANCE
Single Level Aggregate Function: MAX(SAL)
Nested Aggregate Function: MAX( COUNT( ITEM ))
Nested Aggregate Functions
·         In Aggregator transformation, there can be multiple single level functions or multiple nested functions.
·         An Aggregator transformation cannot have both types of functions together.
·         MAX( COUNT( ITEM )) is correct.
·         MIN(MAX( COUNT( ITEM ))) is not correct. It can also include one aggregate function nested within another aggregate function
Conditional Clauses
We can use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
·         SUM( COMMISSION, COMMISSION > QUOTA )
Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate expression.
·         IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))
2) Group By Ports
·         Indicates how to create groups.
·         When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
The Aggregator transformation allows us to define groups for aggregations, rather than performing the aggregation across all input data.
For example, we can find Maximum Salary for every Department.
·         In Aggregator Transformation, Open Ports tab and select Group By as needed.
3) Using Sorted Input
·         Use to improve session performance.
·         To use sorted input, we must pass data to the Aggregator transformation sorted by group by port, in ascending or descending order.
·         When we use this option, we tell Aggregator that data coming to it is already sorted.
·         We check the Sorted Input Option in Properties Tab of the transformation.
·         If the option is checked but we are not passing sorted data to the transformation, then the session fails.
4) Aggregator Caches
·         The Power Center Server stores data in the aggregate cache until it completes Aggregate calculations.
·         It stores group values in an index cache and row data in the data cache. If the Power Center Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. We do not need to configure cache memory for Aggregator transformations that use sorted ports.