·
Active and connected transformation
The Rank transformation allows us to select only the top or
bottom rank of data. It allows us to select a group of top or bottom values,
not just one value.
During the session, the Power Center Server caches input data
until it can perform the rank calculations.
Rank Transformation Properties :
·
Cache Directory where cache will be made.
·
Top/Bottom Rank as per need
·
Number of Ranks Ex: 1, 2 or any number
·
Case Sensitive Comparison can be checked if needed
·
Rank Data Cache Size can be set
·
Rank Index Cache Size can be set
Rank Index
The Designer automatically creates a RANKINDEX port for each
Rank transformation. The Power Center Server uses the Rank Index port to store
the ranking position for each row in a group.
For example, if we create a Rank transformation that
ranks the top five salaried employees, the rank index numbers the employees
from 1 to 5.
·
The RANKINDEX is an output port only.
·
We can pass the rank index to another transformation in the
mapping or directly to a target.
·
We cannot delete or edit it.
Defining Groups
Rank transformation allows us to group information. For example:
If we want to select the top 3 salaried employees of each Department, we can
define a group for Department.
·
By defining groups, we create one set of ranked rows for each
group.
·
We define a group in Ports tab. Click the Group By for needed
port.
·
We cannot Group By on port which is also Rank Port.
·
Passive and Connected Transformation.
·
The Sequence Generator transformation generates numeric values.
·
Use the Sequence Generator to create unique primary key values,
replace missing primary keys, or cycle through a sequential range of numbers.
We use it to generate Surrogate Key in DWH environment mostly.
When we want to maintain history, then we need a key other than Primary Key to
uniquely identify the record. So we create a Sequence 1,2,3,4 and so on. We use
this sequence as the key. Example: If EMPNO is the key, we can keep only one
record in target and can’t maintain history. So we use Surrogate key as Primary
key and not EMPNO.
Sequence Generator Ports :
The Sequence Generator transformation provides two output ports:
NEXTVAL and CURRVAL.
·
We cannot edit or delete these ports.
·
Likewise, we cannot add ports to the transformation.
NEXTVAL:
Use the NEXTVAL port to generate sequence numbers by connecting
it to a Transformation or target.
CURRVAL:
CURRVAL is NEXTVAL plus the Increment By value.
·
We typically only connect the CURRVAL port when the NEXTVAL port
is already connected to a downstream transformation.
·
If we connect the CURRVAL port without connecting the NEXTVAL
port, the Integration Service passes a constant value for each row.
·
When we connect the CURRVAL port in a Sequence Generator
Transformation, the Integration Service processes one row in each block.
·
We can optimize performance by connecting only the NEXTVAL port
in a Mapping.
Points to Ponder:
·
If Current value is 1 and end value 10, no cycle option. There
are 17 records in source. In this case session will fail.
·
If we connect just CURR_VAL only, the value will be same for all
records.
·
If Current value is 1 and end value 10, cycle option there.
Start value is 0.
·
There are 17 records in source. Sequence: 1 2 – 10. 0 1 2 3 –
·
To make above sequence as 1-10 1-20, give Start Value as 1.
Start value is used along with Cycle option only.
·
If Current value is 1 and end value 10, cycle option there.
Start value is 1.
·
There are 17 records in source. Session runs. 1-10 1-7. 7 will
be saved in repository. If we run session again, sequence will start from 8.
·
Use reset option if you want to start sequence from CURR_VAL
every time.
·
Connected and Active Transformation
·
The Aggregator transformation allows us to perform aggregate
calculations, such as averages and sums.
·
Aggregator transformation allows us to perform calculations on
groups.
Components of the Aggregator Transformation
1. Aggregate
expression
2. Group
by port
3. Sorted
Input
4. Aggregate
cache
1) Aggregate Expressions
·
Entered in an output port.
·
Can include non-aggregate expressions and conditional clauses.
The transformation language includes the following aggregate
functions:
·
AVG, COUNT, MAX, MIN, SUM
·
FIRST, LAST
·
MEDIAN, PERCENTILE, STDDEV, VARIANCE
Single Level Aggregate Function: MAX(SAL)
Nested Aggregate Function: MAX( COUNT( ITEM ))
Nested Aggregate Functions
·
In Aggregator transformation, there can be multiple single level
functions or multiple nested functions.
·
An Aggregator transformation cannot have both types of functions
together.
·
MAX( COUNT( ITEM )) is correct.
·
MIN(MAX( COUNT( ITEM ))) is not correct. It can also include one
aggregate function nested within another aggregate function
Conditional Clauses
We can use conditional clauses in the aggregate expression to
reduce the number of rows used in the aggregation. The conditional clause can
be any clause that evaluates to TRUE or FALSE.
·
SUM( COMMISSION, COMMISSION > QUOTA )
Non-Aggregate Functions
We can also use non-aggregate functions in the aggregate
expression.
·
IIF( MAX( QUANTITY ) > 0, MAX( QUANTITY ), 0))
2) Group By Ports
·
Indicates how to create groups.
·
When grouping data, the Aggregator transformation outputs the
last row of each group unless otherwise specified.
The Aggregator transformation allows us to define groups for
aggregations, rather than performing the aggregation across all input data.
For example, we can find Maximum Salary for every Department.
·
In Aggregator Transformation, Open Ports tab and select Group By
as needed.
3) Using Sorted Input
·
Use to improve session performance.
·
To use sorted input, we must pass data to the Aggregator
transformation sorted by group by port, in ascending or descending order.
·
When we use this option, we tell Aggregator that data coming to
it is already sorted.
·
We check the Sorted Input Option in Properties Tab of the
transformation.
·
If the option is checked but we are not passing sorted data to
the transformation, then the session fails.
4) Aggregator Caches
·
The Power Center Server stores data in the aggregate cache until
it completes Aggregate calculations.
·
It stores group values in an index cache and row data in the
data cache. If the Power Center Server requires more space, it stores overflow
values in cache files.
Note: The Power Center Server uses memory to process an
Aggregator transformation with sorted ports. It does not use cache memory. We
do not need to configure cache memory for Aggregator transformations that use
sorted ports.
No comments:
Post a Comment