Summary Methodology for Premasters DSS, 800884-B-6

Pages: 46
Uploaded on: 21-06-2022
Written in: 2021/2022

Detailed summary of all lectures, with additional notes, explanations and examples, of the course "Methodology for Premasters DSS" at Tilburg University, which is part of the Pre-Master Data Science and Society. The course was given by B. Nicenboim and G. Saygili in the first semester of the academic year 2021/2022 (August to December 2021).


Tilburg University
Study Program: Pre-Master Data Science and Society
Academic Year 2021/2022, Semester 1 (August to December 2021)


Course: Methodology for Premasters DSS, 800884-B-6
Lecturers: B. Nicenboim and G. Saygili

2. The data science process


Data Science Introduction
• Data science is the art of turning data into actions
• the terminology has existed since the 1960s
• the term "data science" in its current sense was introduced in the 1990s in the
statistics and data mining communities
• it was first named as an independent discipline in 2001
• data consists of
o 1) structured data → traditional relational database management system
o 2) unstructured data
o 3) semi-structured data
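
To make the three data categories concrete, here is a minimal sketch using only the Python standard library. The sample records and variable names are made up for illustration and do not come from the lecture.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table (hypothetical sample rows).
structured = io.StringIO("id,name,age\n1,Ann,34\n2,Bob,29\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but records need not share one schema.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "bob@example.org"}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Ann wrote that the delivery was late but the support was friendly."

column_names = list(rows[0].keys())
keys_per_record = [sorted(record.keys()) for record in semi_structured]
word_count = len(unstructured.split())
```

Every structured row has the same columns, the two JSON records carry different keys, and the free text can only be summarized after further processing (here, a simple word count).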


Data Science in Organizations
• organizations use data to maintain competitiveness
o an increase in data usability of 10% leads to
→ 17-49% increase in productivity
→ 11-42% return on assets
→ 5-6% performance improvement via data-driven decision making
• Big Data Opportunities
o Top 3: Increasing Operational Efficiency (51%), Informing Strategic Direction,
Better Customer Service (27%)
• Four key activities of data science in organizations
o acquire: obtaining the needed data
o prepare: preprocessing operations on the data
o analyze: analyzing the data and interpreting the results
o act: taking actions based on the results
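
The four activities can be read as a small pipeline in which each step feeds the next. The sketch below is only an illustration: the sales figures, the function names and the decision threshold are all hypothetical.

```python
from statistics import mean


def acquire():
    # Hypothetical raw sales records; None marks a missing value.
    return [120.0, None, 95.0, 130.0, None, 110.0]


def prepare(raw):
    # Preprocessing: drop the missing entries.
    return [x for x in raw if x is not None]


def analyze(clean):
    # Analysis: summarize the data with its mean.
    return mean(clean)


def act(average, threshold=100.0):
    # Action: a made-up decision rule based on the analysis result.
    return "increase stock" if average > threshold else "hold stock"


decision = act(analyze(prepare(acquire())))
```

With these sample values the mean of the four observed records is 113.75, which is above the assumed threshold, so the decision is "increase stock".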


Data Mining Process
• Data Mining is the process of extracting previously unknown and potentially useful
information from the data using mathematical, statistical and machine learning
methods
• CRISP-DM: Cross-Industry Standard Process for Data Mining (late 1990s)
o guideline for a structured approach to execute a data mining process
o six phases (the whole process can restart several times)

o updated versions of CRISP-DM based on new demands
▪ IBM: ASUM-DM (Analytics Solutions Unified Method for Data Mining)
▪ SAS: SEMMA (Sampling, Exploring, Modifying, Modeling, Assessing)
▪ Microsoft: TDSP (Team Data Science Process)
• Phases of CRISP-DM
o 1) Business Understanding
▪ Understanding the business goal
▪ Situation assessment
▪ Translating the business goal to a data mining objective
▪ Development of a project plan
o 2) Data Understanding
▪ Considering data requirements
▪ Initial data collection, exploration, and quality assessment
o 3) Data Preparation
▪ Selection of required data
▪ Data cleaning
▪ Data transformation and enrichment
o 4) Modeling
▪ Selection of the appropriate modeling technique
▪ Training and test set creation for evaluation
▪ Development and examination of alternative modeling algorithms
▪ Fine tuning the model parameters
o 5) Model Evaluation
▪ Evaluation of the model in the context of the business success criteria
▪ Model approval
o 6) Deployment
▪ Reporting of the findings
▪ Planning and development of deployment procedure
▪ Deployment of the model
▪ Development of a maintenance or update plan
▪ Review of the project and planning the next steps
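
The "training and test set creation" step of the Modeling phase can be sketched as a random split. The notes do not prescribe a method, so the 80/20 ratio, the fixed seed and the helper name `train_test_split` below are assumptions (this is a hand-written helper, not the scikit-learn function of the same name).

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into train and test sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))
train, test = train_test_split(records)
```

The model is fitted on `train` only and evaluated on `test`, so the evaluation reflects data the model has not seen.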

• Team Data Science Process (TDSP)
o TDSP is Microsoft’s new version of CRISP-DM
o Key components:
▪ A data science lifecycle definition
▪ A standardized project structure
▪ Infrastructure and resources recommended for data science projects
▪ Tools and utilities recommended for project execution.




o TDSP Infrastructure and Resources for Data Science Projects: TDSP provides
recommendations for managing shared analytics and storage infrastructure
such as:
▪ Cloud file systems for storing datasets
▪ Databases
▪ Big Data (SQL or Spark) clusters
▪ Machine learning service
o TDSP provides recommendations for R and Python
• Data Science Trajectory / Process
o Raw Data → Process Data → Clean Data (suppress noise, add missing data)
o from exploratory data analysis you can either go back to get more data or proceed
o the resulting data product can be used in the real world
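
The cleaning step (suppress noise, add missing data) could look like the sketch below. The notes name no technique, so mean imputation and a trailing moving average are assumed here purely as simple examples; the sample series is made up.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average to damp noise."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(mean(chunk))
    return smoothed


raw = [10.0, None, 14.0, 12.0, None, 14.0]
filled = fill_missing(raw)
smooth = moving_average(filled)
```

Here the two gaps are filled with 12.5, the mean of the four observed values, before the series is smoothed.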

• A Data Scientist’s Role in This Process
o Initial step: What data is needed?
o data can come from different fields
o raw data can be of different types
o process and clean the data to suppress noise and discard outliers
o data scientists formulate a hypothesis and a research question to be
answered within a study → we need to know where we are heading
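
Discarding outliers, as mentioned in the list above, needs a rule for what counts as an outlier. The notes give none, so the sketch below assumes the common 1.5 × IQR rule; the measurements and the function name are hypothetical.

```python
from statistics import quantiles


def discard_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


measurements = [10, 12, 11, 13, 12, 11, 95]
kept = discard_outliers(measurements)
```

In this sample the value 95 lies far outside the interquartile fences and is dropped, while the other six measurements are kept.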




Big Data
• big data combines different sources of data (e.g., social media, transactions,
enterprise data…)
• Big data can be defined as a collection of diverse and large amounts of data that is
hard to process with conventional data processing platforms.
→ Doug Laney’s explanation: big data = 3 V’s (volume, velocity, variety) plus value
→ big data = high volume, high velocity, and high variety of information
o “Data are becoming the new raw material of business: Economic input is
almost equivalent to capital and labor” (Economist, 2010)
o “Information will be the ‘21st Century Oil’” (Gartner, 2019)
Seller: hannahgruber, Tilburg University