Data Warehousing 1
Module 2: Data Warehousing
1 Definition
A data warehouse (DW) is simply a single, complete, and consistent store of data obtained
from a variety of sources and made available to end users in a way they can understand and
use it in a business context
Notes:
We have seen operational databasesNow we will see data warehousing
Operational databases - the structure was built to avoid redundancy in order to easily
manage day to day transactions that you support all activities that need to be done in real
time. These can be complex.
Data warehousing has a diferent perspective- goal is to have a structure that is as simple as
possible. Goal is to make queries in order to retrieve info out of all of the data that we have.
We are building the data warehouse for bi. We are not interested in supporting day to day
activities, we want data about production, sales, etc. ,n order to retrieve the data, we have a
special kind of structure that is called the star diagram or the snowfake diagram. ,t has been
known for years. ,t is implemented into a db using a relational diagram. The structure is
much simpler though. A lot of profs like to use the star schema which is the simplest
structure. You have at most 2 links between two tables.
The idea is to centralize all of the data we need for decision making. THis is why a dw is built
out of diferent data sources. Our main concern is to be sure that we are having everything
we need to make decisions just form the dw. ,t is a single point that must be complete and
consistent. The data should be consistent, up to date,e tc.
You define the subject that are important for your analysis for ex. Sales or products
You need to select the right transaction
,t’s not real-time data, but make analysis about events that happenfor ex. ,f you have
data from last 6 months and adding the info from yesterday not will change everything.
You update DW specific time, once a week, once a month, etc.
The star schema gets its name from the physical model's[3] resemblance to a star shape
with a fact table at its center and the dimension tables surrounding it representing the star's
points.
,n computing, a snowflae scheal is a logical arrangement of tables in a multidimensional
database such that the entity relationship diagram resembles a snowfake shape. The
snowfake schema is represented by centralized fact tables which are connected to multiple
dimensions.[citation needed]. "Snowfaking" is a method of normalizing the dimension
tables in a star schema. When it is completely normalized along all the dimension tables, the
resultant structure resembles a snowfake with the fact table in the middle. The principle
behind snowfaking is normalization of the dimension tables by removing low cardinality
attributes and forming separate tables.
The snowfake schema is similar to the star schema. However, in the snowfake schema,
dimensions are normalized into multiple related tables, whereas the star schema's
1
,dimensions are denormalized with each dimension represented by a single table. A complex
snowfake shape emerges when the dimensions of a snowfake schema are elaborate,
having multiple levels of relationships, and the child tables have multiple parent tables
("forks in the road").
,n computing, the stlr scheal is the simplest style of data mart schema and is the approach
most widely used to develop data warehouses and dimensional data marts.[1] The star
schema consists of one or more fact tables referencing any number of dimension tables. The
star schema is an important special case of the snowfake schema, and is more efective for
handling simpler queries.
Whlt is l Dltl Wlrehouse? A Prlcttoners Viewpoint
A DW is a
- Subject -oriented
- ,ntegrated
- Time-varying the db in the state that it is right now, it is not exactly the same state
that it will be in 5 minutes, because of the operational usage, in the dw, it is diferent,
you build it into defined moments in time. ,t isn’t real-time data. You make analysis
about business events, sales, etc., but moment to moment data will not change your
analysis.
- Nonvolatile
Collection of data that is used primarily in organizational decision making.
Subject-oriented:
- A data warehouse is organized around subjects such as sales, products, customers
- Focuses on modeling and analysis of data for decision makers
- Excludes data not useful in decision support process.
,t centralized all data for decision making and built out diferent sources
This tool is only relevant for (middle) managers to make decisions
ETL tool
Integrlton
2
, - Data Warehouse is constructed by integrating multiple heterogeneous sources.
- Data Preprocessing is applied to ensure consistency.
Notes:
- The more we integrate the data management, easier to build DWgood ERP system
- Have ETL tools which take the data and converts it to the right place in the dw.
Integrlted
Everything is
put together
on the same
place
The
DW
integrates all the data necessary for analysis
Tiae-vlrying:
- Provides information from historical perspective e.g. past 5-10 years
- Data is collected at diferent times (e.g. every month to update the data warehouse)
- Every key structure contains either implicitly or explicitly an element of time
Note:
Within the dw, every transaction has a reference to a date of time within it.
,f you are collecting data about sales, it is important to know when the sale has been made
because this gives you relevant info about your sales.
Nonvolltle
- Once data is recorded it will not be updated anymore
- Historicll dltl will not be deleted
- Only new data is added
- A data warehouse requires two
operations in data accessing
o ,nitial loading of data (ETL)
o Access of data
Note:
3
Module 2: Data Warehousing
1 Definition
A data warehouse (DW) is simply a single, complete, and consistent store of data obtained
from a variety of sources and made available to end users in a way they can understand and
use it in a business context
Notes:
We have seen operational databasesNow we will see data warehousing
Operational databases - the structure was built to avoid redundancy in order to easily
manage day to day transactions that you support all activities that need to be done in real
time. These can be complex.
Data warehousing has a diferent perspective- goal is to have a structure that is as simple as
possible. Goal is to make queries in order to retrieve info out of all of the data that we have.
We are building the data warehouse for bi. We are not interested in supporting day to day
activities, we want data about production, sales, etc. ,n order to retrieve the data, we have a
special kind of structure that is called the star diagram or the snowfake diagram. ,t has been
known for years. ,t is implemented into a db using a relational diagram. The structure is
much simpler though. A lot of profs like to use the star schema which is the simplest
structure. You have at most 2 links between two tables.
The idea is to centralize all of the data we need for decision making. THis is why a dw is built
out of diferent data sources. Our main concern is to be sure that we are having everything
we need to make decisions just form the dw. ,t is a single point that must be complete and
consistent. The data should be consistent, up to date,e tc.
You define the subject that are important for your analysis for ex. Sales or products
You need to select the right transaction
,t’s not real-time data, but make analysis about events that happenfor ex. ,f you have
data from last 6 months and adding the info from yesterday not will change everything.
You update DW specific time, once a week, once a month, etc.
The star schema gets its name from the physical model's[3] resemblance to a star shape
with a fact table at its center and the dimension tables surrounding it representing the star's
points.
,n computing, a snowflae scheal is a logical arrangement of tables in a multidimensional
database such that the entity relationship diagram resembles a snowfake shape. The
snowfake schema is represented by centralized fact tables which are connected to multiple
dimensions.[citation needed]. "Snowfaking" is a method of normalizing the dimension
tables in a star schema. When it is completely normalized along all the dimension tables, the
resultant structure resembles a snowfake with the fact table in the middle. The principle
behind snowfaking is normalization of the dimension tables by removing low cardinality
attributes and forming separate tables.
The snowfake schema is similar to the star schema. However, in the snowfake schema,
dimensions are normalized into multiple related tables, whereas the star schema's
1
,dimensions are denormalized with each dimension represented by a single table. A complex
snowfake shape emerges when the dimensions of a snowfake schema are elaborate,
having multiple levels of relationships, and the child tables have multiple parent tables
("forks in the road").
,n computing, the stlr scheal is the simplest style of data mart schema and is the approach
most widely used to develop data warehouses and dimensional data marts.[1] The star
schema consists of one or more fact tables referencing any number of dimension tables. The
star schema is an important special case of the snowfake schema, and is more efective for
handling simpler queries.
Whlt is l Dltl Wlrehouse? A Prlcttoners Viewpoint
A DW is a
- Subject -oriented
- ,ntegrated
- Time-varying the db in the state that it is right now, it is not exactly the same state
that it will be in 5 minutes, because of the operational usage, in the dw, it is diferent,
you build it into defined moments in time. ,t isn’t real-time data. You make analysis
about business events, sales, etc., but moment to moment data will not change your
analysis.
- Nonvolatile
Collection of data that is used primarily in organizational decision making.
Subject-oriented:
- A data warehouse is organized around subjects such as sales, products, customers
- Focuses on modeling and analysis of data for decision makers
- Excludes data not useful in decision support process.
,t centralized all data for decision making and built out diferent sources
This tool is only relevant for (middle) managers to make decisions
ETL tool
Integrlton
2
, - Data Warehouse is constructed by integrating multiple heterogeneous sources.
- Data Preprocessing is applied to ensure consistency.
Notes:
- The more we integrate the data management, easier to build DWgood ERP system
- Have ETL tools which take the data and converts it to the right place in the dw.
Integrlted
Everything is
put together
on the same
place
The
DW
integrates all the data necessary for analysis
Tiae-vlrying:
- Provides information from historical perspective e.g. past 5-10 years
- Data is collected at diferent times (e.g. every month to update the data warehouse)
- Every key structure contains either implicitly or explicitly an element of time
Note:
Within the dw, every transaction has a reference to a date of time within it.
,f you are collecting data about sales, it is important to know when the sale has been made
because this gives you relevant info about your sales.
Nonvolltle
- Once data is recorded it will not be updated anymore
- Historicll dltl will not be deleted
- Only new data is added
- A data warehouse requires two
operations in data accessing
o ,nitial loading of data (ETL)
o Access of data
Note:
3