
Summary Methodology for Premasters DSS, 800884-B-6

Pages
46
Uploaded on
21-06-2022
Written in
2021/2022

Detailed summary of all lectures, with additional notes, explanations, and examples, for the course "Methodology for Premasters DSS" at Tilburg University, which is part of the Pre-Master Data Science and Society. The course was given by B. Nicenboim and G. Saygili in the first semester of the academic year 2021/2022 (August to December 2021).


Tilburg University
Study Program: Pre-Master Data Science and Society
Academic Year 2021/2022, Semester 1 (August to December 2021)


Course: Methodology for Premasters DSS, 800884-B-6
Lecturers: B. Nicenboim and G. Saygili

2. The data science process


Data Science Introduction
• Data Science is the art of turning data into actions
• The term has existed since the 1960s
• The term "data science" in its current sense was introduced in the 1990s in the
statistics and data mining communities
• It was first named as an independent discipline in 2001
• data consists of
o 1) structured data → traditional relational database management system
o 2) unstructured data
o 3) semi-structured data
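
The three kinds of data listed above can be illustrated with a minimal Python sketch. The customer records, field names, and review text are made up for illustration and do not come from the lecture.

```python
import json

# Structured: fixed schema, every row has the same columns (as in a relational table).
structured_rows = [
    ("C001", "Anna", 34),
    ("C002", "Ben", 27),
]

# Semi-structured: self-describing keys, but records may differ in shape (e.g. JSON).
semi_structured = json.loads(
    '[{"id": "C001", "tags": ["new"]}, {"id": "C002", "address": {"city": "Tilburg"}}]'
)

# Unstructured: no predefined data model (e.g. free text, images, audio).
unstructured = "Great service, but the delivery took too long."

# Structured rows all share one shape; semi-structured records need not.
row_lengths = {len(row) for row in structured_rows}
record_keys = {frozenset(record) for record in semi_structured}
```

Here `row_lengths` contains a single value because every row follows the same schema, while `record_keys` contains two different key sets.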


Data Science in Organizations
• organizations use data to maintain competitiveness
o a 10 % increase in data usability leads to
→ 17 - 49 % increase in productivity
→ 11 - 42 % return on assets
→ 5 - 6 % performance improvement via data driven decision making
• Big Data Opportunities
o Top 3: Increasing Operational efficiency (51%), Informing Strategic Direction,
Better Customer Service (27%)
• Four key activities of data science in organizations
o acquire: obtaining needed data
o prepare: preprocessing operations on the data
o analyze: analyzing the data and interpreting the results
o act: taking actions based on results
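
The four activities above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the sales records, the function names, and the threshold of 100 are hypothetical, and real projects would acquire data from files, databases, or APIs.

```python
from statistics import mean

def acquire():
    # Acquire: obtain the needed data (hard-coded here for illustration).
    return [
        {"region": "north", "sales": "120"},
        {"region": "south", "sales": None},
        {"region": "south", "sales": "80"},
        {"region": "north", "sales": "100"},
    ]

def prepare(records):
    # Prepare: preprocessing, here dropping missing values and converting types.
    return [
        {"region": r["region"], "sales": float(r["sales"])}
        for r in records
        if r["sales"] is not None
    ]

def analyze(records):
    # Analyze: compute the average sales per region.
    regions = sorted({r["region"] for r in records})
    return {
        reg: mean(r["sales"] for r in records if r["region"] == reg)
        for reg in regions
    }

def act(averages, threshold=100.0):
    # Act: turn the result into a decision (the threshold is hypothetical).
    return [reg for reg, avg in averages.items() if avg < threshold]

averages = analyze(prepare(acquire()))
regions_needing_attention = act(averages)
```

Each function corresponds to one activity, so a step can be replaced (for example, a different data source in `acquire`) without touching the others.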


Data Mining Process
• Data Mining is the process of extracting previously unknown and potentially useful
information from the data using mathematical, statistical and machine learning
methods
• CRISP-DM: Cross-Industry Standard Process for Data Mining (late 1990s)
o guideline for a structured approach to execute a data mining process
o six phases (the whole process can restart several times)

o updated versions of CRISP-DM based on new demands
▪ IBM: ASUM-DM (Analytics Solutions Unified Method for Data Mining)
▪ SAS: SEMMA (Sampling, Exploring, Modifying, Modeling, Assessing)
▪ Microsoft: TDSP (Team Data Science Process)
• Phases of CRISP-DM
o 1) Business Understanding
▪ Understanding the business goal
▪ Situation assessment
▪ Translating the business goal to a data mining objective
▪ Development of a project plan
o 2) Data Understanding
▪ Considering data requirements
▪ Initial data collection, exploration, and quality assessment
o 3) Data Preparation
▪ Selection of required data
▪ Data cleaning
▪ Data transformation and enrichment
(margin note in the original: "close dependency")
o 4) Modeling
▪ Selection of the appropriate modeling technique
▪ Training and test set creation for evaluation
▪ Development and examination of alternative modeling algorithms
▪ Fine tuning the model parameters
o 5) Model Evaluation
▪ Evaluation of the model in the context of the business success criteria
▪ Model approval
o 6) Deployment
▪ Reporting of the findings
▪ Planning and development of deployment procedure
▪ Deployment of the model
▪ Development of a maintenance or update plan
▪ Review of the project and planning the next steps
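
The "training and test set creation" step of phase 4 can be sketched as a simple holdout split. This is one common approach and is an assumption here, since the notes do not say which splitting method the course used; the function name, the 80/20 ratio, and the seed are illustrative.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))  # stand-in for 10 labelled examples
train, test = train_test_split(rows)
```

The model is fitted on `train` only, and `test` is held back so that the evaluation in phase 5 uses data the model has not seen.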

• Team Data Science Process (TDSP)
o TDSP is Microsoft’s new version of CRISP-DM
o Key components:
▪ A data science lifecycle definition
▪ A standardized project structure
▪ Infrastructure and resources recommended for data science projects
▪ Tools and utilities recommended for project execution.




o TDSP Infrastructure and Resources for Data Science Projects: TDSP provides
recommendations for managing shared analytics and storage infrastructure
such as:
▪ Cloud file systems for storing datasets
▪ Databases
▪ Big Data (SQL or Spark) clusters
▪ Machine learning service
o TDSP provides recommendations for R and Python
• Data Science Trajectory / Process
o Raw Data → Process Data → Clean Data (suppress noise, add missing data)
o from exploratory data analysis you can either go back to get more data or proceed
o the resulting data product can be used in the real world

• A Data Scientist’s Role in This Process
o Initial step: What data is needed?
o Data can come from different fields
o raw data can be of different types
o process and clean data to suppress noise and discard outliers
o data scientists formulate a hypothesis and a research question which should be
answered within a study → we need to know where we are going
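
The cleaning steps mentioned above (adding missing data, discarding outliers) can be sketched as follows. The notes do not name specific techniques, so median imputation and the 1.5 × IQR rule are assumptions chosen for illustration, and the measurements are made up.

```python
from statistics import median, quantiles

raw = [10.0, 12.0, None, 11.0, 13.0, 95.0, 12.0]  # None marks a missing value

# Add missing data: replace missing values with the median of the observed ones.
observed = [x for x in raw if x is not None]
fill_value = median(observed)
filled = [fill_value if x is None else x for x in raw]

# Discard outliers: keep values within 1.5 * IQR of the quartiles.
q1, _, q3 = quantiles(filled, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [x for x in filled if lower <= x <= upper]
```

In this sketch the missing value is filled with 12.0 and the reading of 95.0 is dropped as an outlier. Whether an extreme value is noise or a genuine observation still depends on the research question.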




Big Data
• big data combines different sources of data (e.g., social media, transactions,
enterprise data…)
• Big data can be defined as a collection of diverse and large amounts of data that is
hard to process with conventional data processing platforms.
→ Doug Laney’s explanation: big data = 3 V’s (volume, velocity, variety) plus value
→ big data = high volume, high velocity, and high variety of information
o “Data are becoming the new raw material of business: Economic input is
almost equivalent to capital and labor” (Economist, 2010)
o “Information will be the ‘21st Century Oil’” (Gartner company, 2019)