
Summary of the methodology course for the premasters of DSS

Pages: 40
Uploaded on: 1 February 2022
Written in: 2021/2022

All the stuff covered in the course













1. A brief overview of data science and methods
Data science definition
Data science is the art of turning data into actions. It is the practice of using
computational methods to obtain real knowledge from data.
The term has existed since the 1960s.
The term data science, as used in the context of this course, was introduced in the 1990s in
the statistics and data mining communities.
It was first named as an independent discipline in 2001.

Data types




- Structured data: data that is stored, processed, and manipulated in a traditional relational
database management system (RDBMS). Examples of structured data include names,
dates, addresses, credit card numbers, stock information, geolocation, and more.
- Unstructured data: information that is not arranged according to a pre-set data
model or schema and therefore cannot be stored in a traditional relational database
or RDBMS. Examples include Word documents, email messages, PowerPoint presentations,
survey responses, transcripts of call center interactions, and posts from blogs and
social media sites. Other types of unstructured data include images, audio, and video
files.
- Semi-structured data: data structured by text (for example tags or markers) that is useful
for creating a form of order or hierarchy; it sits somewhere between structured data and
unstructured data. A few examples of semi-structured data sources are emails, XML and
other markup languages, binary executables, TCP/IP packets, zipped files, data integrated
from different sources, and web pages.
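
The three data types above can be illustrated with a minimal Python sketch. The table name, the JSON record, and the free-text message are all made-up examples, not material from the course; the sketch only uses the standard library.

```python
import json
import sqlite3

# Structured: rows with a fixed schema in a relational database
# (an in-memory SQLite database stands in for an RDBMS here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, signup_date TEXT)")
conn.execute("INSERT INTO customers VALUES (?, ?, ?)", ("Ann", "Tilburg", "2021-09-01"))
structured_rows = conn.execute("SELECT name, city FROM customers").fetchall()
conn.close()

# Semi-structured: no fixed table schema, but keys and nesting give a hierarchy.
semi_structured = json.loads(
    '{"name": "Ann", "orders": [{"id": 1, "items": ["book", "pen"]}]}'
)
first_order_items = semi_structured["orders"][0]["items"]

# Unstructured: free text with no pre-set data model; any structure must be inferred.
unstructured = "Hi, I ordered a book last week but it has not arrived yet."
word_count = len(unstructured.split())

print(structured_rows)
print(first_order_items)
print(word_count)
```

The structured rows can be queried by column name, the semi-structured record is navigated through its keys, and the unstructured text only supports generic operations such as splitting into words.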

Sources of data

Impact of data science in business
Data science is one of the driving forces in maintaining competitiveness in the business world.
Organizations can achieve:
- A 17-49% increase in their productivity when they increase data usability by 10%
- An 11-42% return on assets when they increase data access by 10%
- A 5-6% performance improvement via data-driven decisions

The four key activities
The key activities of Data Science in an organization can be classified into four groups:
- Acquire: This step consists of obtaining the needed data
- Prepare: This step includes the preprocessing operations on the data (cleaning the
data)
- Analyze: The data is analyzed, and the results are interpreted at this step
- Act: In accordance with the results that are obtained in the previous step, actions are
taken
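
As a rough illustration, the four activities can be chained as small Python functions. This is only a sketch: the sales figures, the function names, and the decision threshold are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def acquire():
    # Acquire: obtain the needed data (hard-coded, hypothetical daily sales).
    return [120.0, None, 95.5, 130.0, None, 110.5]


def prepare(raw):
    # Prepare: clean the data by dropping missing values.
    return [x for x in raw if x is not None]


def analyze(clean):
    # Analyze: compute a simple summary that can be interpreted.
    return {"n": len(clean), "mean": mean(clean)}


def act(result, threshold=100.0):
    # Act: take a decision based on the analysis result.
    return "increase stock" if result["mean"] > threshold else "hold stock"


result = analyze(prepare(acquire()))
decision = act(result)
print(result, decision)
```

Each function takes the output of the previous one, which mirrors the order Acquire, Prepare, Analyze, Act.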

Effort vs activities




Data mining
Data mining aims to reveal patterns in data using machine learning, statistics and database
systems.

A more extensive definition: Data Mining is the process of extracting previously unknown
and potentially useful information from the data using mathematical, statistical, and
machine learning methods.

Cross Industry Standard Process for Data Mining (CRISP-DM)
Considering the variety of industries and the variety of data that is generated, there is a
need for a structured approach to executing a data mining project (guidelines, a step-by-step
procedure). CRISP-DM is the combined effort of different institutions in an EU project in the
late 1990s. It breaks the whole data mining process into six phases.

CRISP-DM Phases
- The starting phase is Business Understanding
- The process goes back and forth
- The BA and DA processes influence each other
- The process is circular




Business Understanding:
- Understanding the business goal
- Situation assessment
- Translating the business goal to a data mining objective
- Development of a project plan (most important of the phases)

Data Understanding:
- Considering data requirements
- Initial data collection, exploration, and quality assessment

Data Preparation:
- Selection of required data
- Data cleaning
- Data transformation and enrichment (crucial step before modeling)

Modeling:
- Selection of the appropriate modeling technique
- Training and test set creation for evaluation
- Development and examination of alternative modeling algorithms
- Fine tuning the model parameters
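
The "training and test set creation" step can be sketched in plain Python as a random hold-out split. The function name `train_test_split`, the 25% test fraction, and the seed are assumptions made for this example; the course may use a different tool or split ratio.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=42):
    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # The first n_test shuffled records are held out for evaluation.
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))  # placeholder records
train, test = train_test_split(data)
print(len(train), len(test))
```

The model is fitted on `train` only, and `test` is kept aside so that the evaluation reflects performance on unseen records.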

Model Evaluation:
- Evaluation of the model in the context of the business success criteria
- Model approval

Deployment:
- Reporting of the findings
- Planning and development of deployment procedure
- Deployment of the model
- Development of a maintenance or update plan
- Review of the project and planning the next steps
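
The six phases and their circular order can be written down as a small Python sketch. The enum and the `next_phase` helper are illustrative names only, and the sketch models just the forward cycle, not the back-and-forth moves between phases.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(phase):
    # After Deployment the cycle starts again at Business Understanding.
    return Phase(phase.value % len(Phase) + 1)


phase = Phase.BUSINESS_UNDERSTANDING
order = []
for _ in range(7):
    order.append(phase.name)
    phase = next_phase(phase)
print(order)
```

Running seven steps shows the wrap-around: the seventh entry is Business Understanding again.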

Approaches that followed CRISP-DM
CRISP-DM was followed by other approaches from different companies:
- IBM introduced ASUM-DM
- SAS introduced SEMMA
- Microsoft introduced TDSP (Team Data Science Process)
They can be considered as updated versions of CRISP-DM based on new demands.



Team Data Science Process (TDSP)
TDSP is Microsoft’s new version of CRISP-DM.

Key components:
- A data science lifecycle definition
- A standardized project structure
- Infrastructure and resources recommended for data science projects
- Tools and utilities recommended for project execution




TDSP Infrastructure and Resources for Data Science Projects
TDSP provides recommendations for managing shared analytics and storage infrastructure
such as:
- Cloud file systems for storing datasets
- Databases
- Big Data (SQL or Spark) clusters
- Machine learning service
TDSP provides recommendations for R and Python

Data Science Trajectory





Seller: adata, Tilburg University