100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached 4.6 TrustPilot
logo-home
Summary

Summary Data Warehousing: Part 1

Rating
-
Sold
-
Pages
7
Uploaded on
19-02-2019
Written in
2018/2019

Business Information Management - Data Management. First part of Module 2 Data Warehousing. This document contains all the slides, notes from 2 students and extra information collected on Internet. I had a 15 out 20 for this subject.

Show more Read less
Institution
Course










Whoops! We can’t load your doc right now. Try again or contact support.

Written for

Institution
Study
Course

Document information

Uploaded on
February 19, 2019
Number of pages
7
Written in
2018/2019
Type
Summary

Subjects

Content preview

Data Warehousing 1
Module 2: Data Warehousing

1 Definition
A data warehouse (DW) is simply a single, complete, and consistent store of data obtained
from a variety of sources and made available to end users in a way they can understand and
use it in a business context

Notes:
We have seen operational databasesNow we will see data warehousing
Operational databases - the structure was built to avoid redundancy in order to easily
manage day to day transactions that you support all activities that need to be done in real
time. These can be complex.

Data warehousing has a diferent perspective- goal is to have a structure that is as simple as
possible. Goal is to make queries in order to retrieve info out of all of the data that we have.
We are building the data warehouse for bi. We are not interested in supporting day to day
activities, we want data about production, sales, etc. ,n order to retrieve the data, we have a
special kind of structure that is called the star diagram or the snowfake diagram. ,t has been
known for years. ,t is implemented into a db using a relational diagram. The structure is
much simpler though. A lot of profs like to use the star schema which is the simplest
structure. You have at most 2 links between two tables.
The idea is to centralize all of the data we need for decision making. THis is why a dw is built
out of diferent data sources. Our main concern is to be sure that we are having everything
we need to make decisions just form the dw. ,t is a single point that must be complete and
consistent. The data should be consistent, up to date,e tc.
You define the subject that are important for your analysis for ex. Sales or products
You need to select the right transaction
,t’s not real-time data, but make analysis about events that happenfor ex. ,f you have
data from last 6 months and adding the info from yesterday not will change everything.
You update DW specific time, once a week, once a month, etc.
The star schema gets its name from the physical model's[3] resemblance to a star shape
with a fact table at its center and the dimension tables surrounding it representing the star's
points.

,n computing, a snowflae scheal is a logical arrangement of tables in a multidimensional
database such that the entity relationship diagram resembles a snowfake shape. The
snowfake schema is represented by centralized fact tables which are connected to multiple
dimensions.[citation needed]. "Snowfaking" is a method of normalizing the dimension
tables in a star schema. When it is completely normalized along all the dimension tables, the
resultant structure resembles a snowfake with the fact table in the middle. The principle
behind snowfaking is normalization of the dimension tables by removing low cardinality
attributes and forming separate tables.

The snowfake schema is similar to the star schema. However, in the snowfake schema,
dimensions are normalized into multiple related tables, whereas the star schema's

1

,dimensions are denormalized with each dimension represented by a single table. A complex
snowfake shape emerges when the dimensions of a snowfake schema are elaborate,
having multiple levels of relationships, and the child tables have multiple parent tables
("forks in the road").

,n computing, the stlr scheal is the simplest style of data mart schema and is the approach
most widely used to develop data warehouses and dimensional data marts.[1] The star
schema consists of one or more fact tables referencing any number of dimension tables. The
star schema is an important special case of the snowfake schema, and is more efective for
handling simpler queries.

Whlt is l Dltl Wlrehouse? A Prlcttoners Viewpoint
A DW is a
- Subject -oriented
- ,ntegrated
- Time-varying the db in the state that it is right now, it is not exactly the same state
that it will be in 5 minutes, because of the operational usage, in the dw, it is diferent,
you build it into defined moments in time. ,t isn’t real-time data. You make analysis
about business events, sales, etc., but moment to moment data will not change your
analysis.
- Nonvolatile
Collection of data that is used primarily in organizational decision making.

Subject-oriented:
- A data warehouse is organized around subjects such as sales, products, customers
- Focuses on modeling and analysis of data for decision makers
- Excludes data not useful in decision support process.
,t centralized all data for decision making and built out diferent sources
This tool is only relevant for (middle) managers to make decisions




ETL tool
Integrlton




2

, - Data Warehouse is constructed by integrating multiple heterogeneous sources.
- Data Preprocessing is applied to ensure consistency.

Notes:
- The more we integrate the data management, easier to build DWgood ERP system
- Have ETL tools which take the data and converts it to the right place in the dw.

Integrlted


Everything is
put together
on the same
place
The
DW




integrates all the data necessary for analysis


Tiae-vlrying:
- Provides information from historical perspective e.g. past 5-10 years
- Data is collected at diferent times (e.g. every month to update the data warehouse)
- Every key structure contains either implicitly or explicitly an element of time

Note:
Within the dw, every transaction has a reference to a date of time within it.
,f you are collecting data about sales, it is important to know when the sale has been made
because this gives you relevant info about your sales.

Nonvolltle
- Once data is recorded it will not be updated anymore
- Historicll dltl will not be deleted
- Only new data is added
- A data warehouse requires two
operations in data accessing
o ,nitial loading of data (ETL)
o Access of data

Note:



3

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
shafaqsara Katholieke Universiteit Leuven
Follow You need to be logged in order to follow users or courses
Sold
14
Member since
6 year
Number of followers
7
Documents
12
Last sold
3 months ago

4.0

2 reviews

5
0
4
2
3
0
2
0
1
0

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their tests and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can instantly pick a different document that better fits what you're looking for.

Pay as you like, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

Student with book image

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions