Wednesday, November 26, 2014

DATA WAREHOUSE CONCEPTS

What is BI: Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision making. It leverages methods and technologies that focus on counts, statistics and business objectives to improve business performance.
The objective of Business Intelligence is to better understand customers and improve customer service, make the supply and distribution chain more efficient, and to identify and address business problems and opportunities quickly.
Warehouse is used for high level data analysis purpose. It is used for predictions, time series analysis, financial analysis, what -if simulations etc. Basically it is used for better decision making.

DATA WAREHOUSE CONCEPTS:
Data Warehousing is the process of constructing and using the data warehouse. The data warehouse is constructed by integrating the data from multiple heterogeneous sources. This data warehouse supports analytical reporting, structured and/or ad hoc queries and decision making. Data Warehousing involves data cleaning, data integration and data consolidations.

DATA WAREHOUSE DEFINITIONS:
Data warehouse is the de-normalized structure of database, which stores historical data in summary level format. It is specifically meant for heavy duty querying and analysis.
Data warehouse is a "Subject-Oriented, Integrated, Time- Variant, Nonvolatile collection of data in support of decision making".
In general a Data Warehouse is used on an enterprise level and a Data Marts is used on a business division/ department level.

Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject. For example, the SRDB is not simply made accessible to end-users, but is provided structure and organized according to the specific needs.

Integrated: A single source of information for and about understanding multiple areas of interest. The
data warehouse provides one-stop shopping and contains information about a variety of subjects. Thus the OIRAP data warehouse has information on students, faculty and staff, instructional workload, and student outcomes.

Non-Volatile: Stable information that doesn’t change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.
Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.

Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users.

Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.

Other Definitions
Data Warehouse: A data structure that is optimized for distribution. It collects and stores integrated sets of historical data from multiple operational systems and feeds them to one or more data marts. It may also provide end-user access to support enterprise views of data.

Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single, analytic application used by a distinct set of workers.

Staging Area: Any data store that is designed primarily to receive data into a warehousing environment.

Operational Data Store: A collection of data that addresses operational needs of various operational units. It is not a component of a data warehousing architecture, but a solution to operational needs.

OLAP (On-Line Analytical Processing): A method by which multidimensional analysis occurs. Multidimensional Analysis: The ability to manipulate information by a variety of relevant categories or “dimensions” to facilitate analysis and understanding of the underlying data. It is also sometimes referred to as “drilling-down”, “drilling-across” and “slicing and dicing”

Hypercube: A means of visually representing multidimensional data.

Star Schema: A means of aggregating data based on a set of known dimensions. It stores data multidimensionally in a two dimensional Relational Database Management System (RDBMS), such as Oracle.

Snowflake Schema: An extension of the star schema by means of applying additional dimensions to the dimensions of a star schema in a relational environment.

Multidimensional Database: Also known as MDDB or MDDBS. A class of proprietary, non-relational database management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions associated with traditional relational database management systems.

OLAP Tools: A set of software products that attempt to facilitate multidimensional analysis. Can incorporate data acquisition, data access, data manipulation, or any combination thereof.

COMPARISON OF DATA WAREHOUSE AND OPERATIONAL DATA 

HOW IS THE WAREHOUSE DIFFERENT?

The data warehouse is distinctly different from the operational data used and maintained by day-to-day operational systems. Data warehousing is not simply an “access wrapper” for operational data, where data is simply “dumped” into tables for direct access. Among the differences: