Learn Develop Data Engineering: Datawarehouse & BI

Showing posts with label Datawarehouse & BI. Show all posts

Wednesday, February 24, 2016

Create a Deployment Group in informatica Power Center

Step 1. Create a label

In Informatica Power Repository Manager
1. Click on Versioning a Labels a New
2. Choose a name like INITXXXXXXX (depending on which INIT this label is for) and write a comment like “Due to Release 10.1”.

Step 2. Applying Labels to Objects

In Informatica Power Center Designer.
1. Choose the object that you want to apply a label on. For example a mapping like m_LoadSalesDetail.
2. Right click on the mapping a Versioning a View History
3. Choose the mapping.
4. Click on Tools a Labels a Apply Labels
5. If there is a totally new mapping/object you should choose both “Label all Children” and “Label all Parents” and if it’s only an update of a current mapping you only choose “Label all Children”.

Step 3. Create a Deployment Group

You can create two types of deployment groups; Static and Dynamic.
Static – You populate a static deployment group by manually selecting objects. Create a static deployment group if you do not expect the set of deployment objects to change. For example, you might group objects for deployment on a certain date and deploy all objects at once.
Dynamic - You use the result set from an object query to populate the deployment group. Create a dynamic deployment group if you expect the set of deployment objects to change frequently. For example, you can use a dynamic deployment group if you develop multiple objects to deploy on different schedules. You can run the dynamic deployment group query multiple times and add new objects to the group each time you run the query.
Most commonly you choose to create a Static Deployment group.

To create a deployment group:
In the Repository Manager, choose the first mapping.
1. Click Versioning a Deployment a Groups to view the existing deployment groups in the Deployment Group Browser.
2. Click New to configure the deployment group in the Deployment Group Editor.
3. Enter a name for the deployment group like INITXXXXXXX
4. Select whether to create a static or dynamic deployment group.
5. If you are creating a dynamic deployment group, click Queries to select a query from the Query Browser, and then click Close to return to the Deployment Group Editor.
6. Optionally, enter a comment for the deployment group for ex. Due to Release 10.1
7. Click OK.
After you create a deployment group, it appears in the Deployment Groups node in the Navigator of the Repository.

Step 4. Add objects to the deployment group.

In the Repository Manager
1. Right-click an object you want to add to the deployment group.
2. Click Versioning a View History a Tools a Add to Deployment group
3. Choose the deployment group that you want to add the object to.
4. Loop this step until you have moved all objects that you want to have in the deployment group.
5. If this was done for a release, don’t forget to update the release notes with the path and name of the deployment group that you have created.

Deployment Group Tasks

You can complete the following tasks when you work with deployment groups:
·         Create a deployment group. Create a global object for deploying objects from one or more folders.
·         Edit a deployment group. Modify a deployment group. For example, you can convert a static deployment group to a dynamic group, or you can convert a dynamic deployment group to a static group.
·         Configure privileges and permissions for a deployment group.Configure permissions on a deployment group and the privilege to copy a deployment group.
·         View the objects in a static or dynamic deployment group. Preview the objects that the Repository Service will deploy.
·         Add or remove objects in a static deployment group. Specify the objects that belong to a static deployment group.
·         Associate a query with a dynamic deployment group. Assign a query to a deployment to dynamically update the objects that the group contains.
·         View the history of a deployment group. View the history of a deployment group, including the source and target repositories, deployment date, and user who ran the deployment.
·         Post-deployment validation. Validate the objects in the target repository after you copy a deployment group to verify that the objects and dependent objects are valid.
·         Roll back a deployment group. Roll back a deployment group to purge deployed versions of objects from the target repository.

Configuring Privileges and Permissions for a Deployment Group

Configure object permissions when you create, edit, delete, or copy a deployment group. To limit the privilege to perform deployment group operations but provide the privilege to copy a deployment group without write permission on target folders, assign the Execute Deployment Groups privilege. An administrator can assign the Execute Deployment Groups privilege. You must have read permission on source folders and execute permission on the deployment group to copy the deployment group.

Adding or Removing Objects in Static Deployment Groups

You manually add or delete objects from a static deployment group. You can add checked-in objects to a static deployment group from the Repository Manager. You cannot add checked-out objects to a deployment group. You can add objects to a deployment group when you view the results of an object query or view the results of an object history query from the Repository Manager. To add objects from the Query Results or View History window, click Tools > Add to deployment group.

In the Repository Manager, right-click an object in the Navigator or in a detail window, and click Versioning > View History. In the View History window, click Tools > Add to deployment group.

To add several objects to a deployment group, select the objects in the Navigator and drag them into the deployment group. When you select a static deployment group in the Navigator, the Main window displays the objects within the deployment group.

When you add objects to a static deployment group, you can also add dependent objects to the deployment group.You can specify the following conditions to add dependencies:
·         All dependencies. Select to deploy all dependent objects.
·         Non-reusable. Select to deploy non-reusable dependent objects.
·         No dependencies. Select to skip deploying dependent objects.

When you click All Dependencies, you add all dependent objects to the static deployment group. Dependent objects include dependent objects within a workflow or mapping, original objects that shortcuts reference, and primary key sources where there is a primary-key/foreign-key relationship.

To have the Repository Manager use the recommended setting without prompting you, select the option to prevent the dialog box from appearing again. Alternatively, click Tools > Options, and clear Prompt User While Adding to Deployment Group.

Using Queries in Dynamic Deployment Groups

When you associate an object query with a deployment group, the Repository Service runs the query at the time of deployment. You can associate an object query with a deployment group when you edit or create a deployment group.

To deploy composite objects using a dynamic deployment group, you must deploy all components of the composite object the first time you deploy the deployment group to another repository. For example, if you deploy a mapping, you must also deploy the reusable and non-reusable child objects associated with the mapping. To do Deployment Group Tasks 81 this, you must create a query that returns parent objects and their dependent child objects. A common way to group versioned objects for deployment is to use labels to identify the objects you want to deploy.

To find the latest versions of objects in a dynamic deployment group, you must create all mappings in the group with labels. If the dynamic deployment group contains a non-reusable object in an unlabeled mapping, the group will not deploy.

When you use labels to identify versioned objects for a dynamic deployment group, the labels for parent and dependent child objects can become out of sync. When this occurs, queries may return part of a composite object, and the dynamic deployment fails. This can occur in the following cases:
· You apply a label to a parent object, but do not label the dependent child objects. When you apply a label to a parent object, the label does not apply to child objects. For example, you apply label 1 to mapping 1 without labeling the dependent child objects. Later, you run a dynamic deployment group using a query that searches for objects in a specified folder that use label 1. The query returns the parent object but not the child objects. The deployment fails because you attempted to deploy only the parent for a composite object. To ensure that dynamic deployment queries return these child objects, manually apply the label to dependent objects each time you apply a label or move a label to a different version of the parent object.

· You do not apply a specified label to the same version of the parent and child object. By default, object queries return the latest versions of objects. For example, you apply label 1 to version 1 of a child object and apply label 1 to version 2 of the parent object. In the query, you search for objects that use label 1 and reusable and non-reusable objects. The query returns the parent object but not the child objects because the most recent versions of the child objects do not have the label applied. To ensure that dynamic deployment queries return both parent and child objects when you apply a specified label to different versions of parent and child objects, include a Latest Status parameter in the query and specify the latest checked-in and older values.
· The dynamic deployment query does not return non-reusable child objects with parent objects. To ensure that the dynamic query returns reusable and non-reusable child objects, include the Reusable Status parameter in the query and specify reusable and non-reusable values. In addition, include a Latest Status parameter in the query and specify the latest checked-in and older values.

Viewing Deployment History

You can view the following information about groups you have deployed:
·         Date/time. The date and time you deployed the group.
·         User name. The user name of the person who deployed the group.
·         Deployment group name. The name of the deployment group.
·         Source repository. The repository you deployed the group from.
·         Target repository. The repository where you deployed the group.
·         Status. The status of the group as either deployed or not deployed.
·         Rollback time. The date and time the deployment group was rolled back.

To view the history of a deployment group:

1. Click Tools > Deployment > Groups to open the Deployment Group Browser.

2. Select a deployment group.

3. Click View History to view the history of the deployment group.

4. Optionally, click Details to view details about the objects in the deployment group.

5. Click OK to close the Deployment Group History window.

Validating the Target Repository

Validate the objects in the target repository after you copy a deployment group to verify that the objects or dependent objects are valid. You can also use the pmrep Validate command or the Repository Manager to validate the objects.

You can view the validation results in the deployment log. In the Repository Manager, the deployment log appears in the Output window.
Note: Validating objects in the target repository can take a long time.

Rolling Back a Deployment

You can roll back a deployment to purge the deployed versions from the target repository or folder. When you roll back a deployment, you roll back all the objects in a deployment group that you deployed at a specific date and time. You cannot roll back part of a deployment or roll back from a non-versioned repository.

To initiate a rollback, you must roll back the latest version of each object. The Repository Service ensures that the check-in time for the repository objects is the same as the deploy time. If the check-in time is different, then the repository object is not the same as the object in the deployment, and the rollback fails. The rollback also fails if the rollback process causes you to create duplicate object names. This might occur if you rename a deployed object, create a new object with the same name, and attempt to roll back the original deployment.

To roll back a deployment:

1. In the Repository Manager, connect to the target repository where you deployed the objects.

2. Click Tools > Deployment > History.

3. Select a deployment group in the Deployment Group History Browser, and click View History.

4. Select a deployment to roll back.

5. Click Rollback.

The Repository Service checks the object versions in the deployment against the objects in the target repository or folder, and the rollback either succeeds or fails. The rollback results appear at the end of processing. If the rollback fails, the Repository Service notifies you of the object that caused the failure.

Creating and Editing Deployment Groups

You can create the following types of deployment groups:
· Static – You populate a static deployment group by manually selecting objects. Create a static deployment group if you do not expect the set of deployment objects to change. For example, you might group objects for deployment on a certain date and deploy all objects at once.
· Dynamic - You use the result set from an object query to populate the deployment group. Create a dynamic deployment group if you expect the set of deployment objects to change frequently. For example, you can use a dynamic deployment group if you develop multiple objects to deploy on different schedules. You can run the dynamic deployment group query multiple times and add new objects to the group each time you run the query.

You can edit a deployment group to convert it into another deployment group type. You can view the objects in the deployment group before you copy a deployment group.

Creating a Deployment Group

You use the Deployment Group Editor to create and edit deployment groups.

To create a deployment group:

1. In the Repository Manager, click Tools > Deployment > Groups to view the existing deployment groups in the Deployment Group Browser.

2. Click New to configure the deployment group in the Deployment Group Editor.

3. Enter a name for the deployment group.

4. Select whether to create a static or dynamic deployment group.

5. If you are creating a dynamic deployment group, click Queries to select a query from the Query Browser, and then click Close to return to the Deployment Group Editor.

6. Optionally, enter a comment for the deployment group.

7. Click OK.

After you create a deployment group, it appears in the Deployment Groups node in the Navigator of the Repository Manager.

After you create a static deployment group, you can add objects to it.

Editing a Deployment Group

You edit a deployment group to convert a static deployment group into a dynamic deployment group, to convert a dynamic deployment group into a static group, or to associate a different query with a dynamic deployment group.

To edit a deployment group:

1. In the Repository Manager, click Tools > Deployment > Groups.

2. In the Deployment Group Browser, select the deployment group, and click Edit.

3. In the Deployment Group Editor, configure the static or dynamic deployment group.

4. Click OK.

Viewing the Objects in a Deployment Group

Before you deploy a static or dynamic deployment group, you can preview the objects that will be deployed.

To view the objects in a deployment group:

1. In the Repository Manager, click Tools > Deployment > Groups.

2. In the Deployment Group Browser, select the deployment group, and click View Group.

For a static deployment group, the deployment group objects appear in the Deployment Group Contents window. For a dynamic deployment group, the deployment group objects appear in the Query Results window.

Friday, September 11, 2015

Loading Dimension and Fact Tables

In Data warehousing dimension table are not related to each other in star schema because all dimension tables are de-normalized but on the other hand in snow flake schema dimension table are related to each other till some extent.

First we load dimension table one by one then fact table.

While loading fact table, dimension surrogate key as foreign key in fact table and dimension table are lookup’ed while loading fact table. We can load dimension table directly no need to use fact table while loading dimension table.

We can load fact and dimension table in one mapping using target load plan or in one session using event wait or using decision task.

While loading fact table you will have to lookup on dimension table to get the relevant attributes.

As Dimension table in star schema are not related to each other it is not common to see them being loaded simultaneously. In such cases workflow would have to start task followed by dimension table mapping in parallel and then decision take place which checks if all these dimension are loaded without errors. If there are no errors then we can proceed to load fact table.

If we are doing SCD-2 the following above step would be helpful to improve performance.

Thursday, September 10, 2015

Data Modeling

Data Model

A collection of concepts that can be used to describe structure of database and in other words representation of set of business requirement in standard framework which is understandable to biz users.

Data Model can be defined as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization.

A data model comprises of three components:

• A structural part, consisting of a set of rules according to which databases can be constructed.

• A manipulative part, defining the types of operation that are allowed on the data (this includes the operations that are used for updating or retrieving data from the database and for changing the structure of the database).

• Possibly a set of integrity rules, which ensures that the data is accurate.

In Data modeling some points we need to remember while design data model

1. Identify Entities: It should be classified by its properties and characteristics and business definition for that entity.

2. Identify Attributes: Charteristics and properties of entities, name should be unique and self-explanatory.

3. Identify Relationship: We have to identify relation between entities and use of attributes for this.

4. Naming Convention: Name should be short avoid using special character as well as common used terms , follow standards for all entities and attributes.

5. Defining Keys to attributes.

Three categories of Data model

1. Object Based Data Models: Object based data models use concepts such as entities, attributes, and relationships. An entity is a distinct object (a person, place, concept, and event) in the organization that is to be represented in the database. An attribute is a property that describes some aspect of the object that we wish to record, and a relationship is an association between entities.

Types of object based data models

• Entity-Relationship

• Object Oriented

• Semantic

• Functional

The Entity-Relationship model has emerged as one of the main techniques for modeling database design and forms the basis for the database design methodology.

The object oriented data model extends the definition of an entity to include, not only the attributes that describe the state of the object but also the actions that are associated with the object, that is, its behavior. The object is said to encapsulate both state and behavior.

Entities in semantic systems represent the equivalent of a record in a relational system or an object in an system but they do not include behavior (methods). They are abstractions 'used to represent real world (e.g. customer) or conceptual (e.g. bank account) objects.

The functional data model is now almost twenty years old. The original idea was to' view the database as a collection of extension ally defined functions and to use a functional language for querying the database.

2. Physical Data Models:

Physical data models describe how data is stored in the computer, representing information such as record structures, record ordering, and access paths. There are not as many physical data models as logical data models, the most common one being the Unifying Model.

3. Record Based Logical Models:

Record based logical models are used in describing data at the logical and view levels. In contrast to object based data models, they are used to specify the overall logical structure of the database and to provide a higher-level description of the implementation. Record based models are so named because the database is structured in fixed format records of several types. Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length.

The three most widely accepted record based data models are:

• Hierarchical Model

• Network Model

• Relational Model

The relational model has gained favor over the other two in recent years. The network and hierarchical models are still used in a large number of older databases.

Conceptual Data modeling

Conceptual data modeling is a map of concept and their relationship. This describes semantic of organization i.e. how organization doing business what are all components involve in this organization.

What are all information are required in terms of decision making for biz associates here it is considered in conceptual data modeling, in this we need to design solution.

Component of Conceptual Data modeling

1. Entity or Objects for example Product, customer, services.

2. Relationship between entities.

3. Identifiers which will distinguish the entity instances.

4. Various attributes required to maintain entity.

ER Model: ER Model is Conceptual Data model Here we need to describe data as entities, relationship and attributes.

Types of Attributes	Description	Example
Single	Not divisible	Aadhar no., Age, Sex
Composite	Divisible into small parts	Name(Fname, MName, LName)
Single-Valued	Single Value for entity	Age
Multi-Valued	Multiple value for entity	Academic marks of person
Derived	Contain calculated value	No of customer per region
Complex	Composite and multi value	Shipping address

Enhanced ER Modeling:

In this along with ER model we have additional concepts

· Subclass and super-class

· Top to Bottom

· Bottom to Top

· Category

· Attribute and relationship inheritance

Logical Data Modeling

Logical data modeling refers to actual implementation of conceptual data model in database. Logical data model is version of data model that represent the business requirement of an organization.

As and when conceptual data model is approved by design architect and functional team, development team start designing logical data model. A good logical data model created for future aspects and business requirement. Logical data model streamlined data structure and relationship between entities and attributes.

Logical data model includes all required entities, attributes, key groups and relationship that represents business information and business rules.

Logical data model characteristics

· Logical model works in iterative manner.

· Design is independent of database

· All Entities and relationship among them

· All attributes for each entity are specified

· Primary Key for each entity is specified

· Foreign key between entities specified

Dimension Data Model:

Dimension data modeling comprises of one or more dimension table and fact table.

Modeling techniques are nothing but way of storing data, dimension data model give us advantage to retrieve data fast.

In dimensional model, everything is divided in 2 distinct categories - dimension or measures. Anything we try to model, must fit in one of these two categories.

In Dimension data model some benefits

Faster Data Retrieval
Better Understand ability
Extensible

Step by step to design dimension data model

1. Identify Dimension: Dimensions are the object or context. That is - dimensions are the 'things' about which something is being spoken.

2. Identify Measures: Measures are the quantifiable subjects and these are often numeric in nature or aggregate value.

3. Identify attributes or properties of dimensions: In real time we have many attributes in one context but we have to select relevant attributes.

4. Identify the granularity of measures: Granularity refers to depth of information stored in data model, how granular is data stored in system

5. History stored: We need to make sure relevant data is getting stored and history is maintained and by using slowly changing dimension we can achieve this.

Physical Data Modeling

Physical data modeling is set of process to develop data structure for data warehousing in selected database.

Below are important points while developing physical data modeling

1. Convert Entities into Physical Tables

2. Convert Relationship to foreign keys

3. Convert Attributes into physical columns

4. Convert Unique Identifiers into define constraint on physical table-columns.

5. Creating Index, Views (materialized view), dimensions in table space.

6. Maintain datatype based on logical data model and conceptual data model.

Apart from these some more points we need to consider

How to insert, update and retrieve data into physical tables as well as how to store non-structured data into tables and Performance tuning, memory utilization.

Pages