Summary: Guide to Intelligent Data Science - Data Mining and its Applications (EBB056B05)

Pages: 22
Uploaded: 15-06-2023
Written in: 2022/2023

This summary covers all required chapters and all lectures.


Whole book summarised: Yes

Summary: Data Mining and its Applications
Week 1
- Lecture 1
Data mining is the extraction of interesting information or patterns from large data sources,
which may originally have been developed for other purposes, employing machine and
statistical learning and possibly high-end computational power, in order to serve business
purposes.
Data mining examples: Risk assessment, demand forecasting, fraud detection, anomaly
detection.
From data → knowledge




Data can be at rest, on the move or in use.




There are several data mining stakeholders:
● Business user: business understanding
● Project sponsor: project driver
● Project manager: end-to-end project delivery
● Business intelligence analyst: data understanding
● Data administrator & integrator: data preparation & solution delivery
● Data scientist/engineer: data modeling and evaluation

Data mining project workflow:
Inception and discovery → Data preparation → Model planning → Model building →
Communicate results → Operationalise
ETL: Extraction, Transformation, Loading
The goal of the data understanding phase is to gain general insights about the data that will
potentially be helpful for further steps in the data analysis process. Never trust data until you
have carried out some simple plausibility checks.
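
As an illustration of what "simple plausibility checks" might look like, here is a minimal sketch in plain Python. The records, the field names (`id`, `age`, `signup`), the allowed age range and the reference date are all hypothetical assumptions made for this example; they do not come from the lecture.

```python
from datetime import date

# Hypothetical records; field names and value ranges are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "signup": date(2021, 5, 3)},
    {"id": 2, "age": -7, "signup": date(2022, 1, 15)},
    {"id": 2, "age": 51, "signup": date(2035, 8, 1)},
    {"id": 4, "age": None, "signup": date(2020, 11, 30)},
]

def plausibility_report(rows, today=date(2023, 6, 15)):
    """Return (record id, problem) pairs found by a few simple sanity checks."""
    problems = []
    seen_ids = set()
    for row in rows:
        # Identifiers are assumed to be unique.
        if row["id"] in seen_ids:
            problems.append((row["id"], "duplicate id"))
        seen_ids.add(row["id"])
        # Missing values and values outside an assumed plausible range.
        if row["age"] is None:
            problems.append((row["id"], "missing age"))
        elif not 0 <= row["age"] <= 120:
            problems.append((row["id"], "age out of range"))
        # Dates that lie after the assumed reference date.
        if row["signup"] > today:
            problems.append((row["id"], "signup date in the future"))
    return problems

report = plausibility_report(records)
for record_id, problem in report:
    print(record_id, problem)
```

Which checks are worth running depends on the data set; the point is only that each one is cheap and catches errors before any modelling starts.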
Attributes: Features, variables
Instances: Records, data objects, entries
Data can usually be described in terms of tables or matrices.
Attributes differ in their scale type, according to the type of values that they can assume.
Three scale types: • Categorical / Nominal • Ordinal • Numeric
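
The scale type determines which summaries are meaningful. The sketch below, using only Python's standard library, picks one typical summary per scale type: the mode for a nominal attribute, the median for an ordinal one, and the mean for a numeric one. The attribute values and the ranking in `grade_rank` are hypothetical assumptions for illustration.

```python
from statistics import mean, median_low, mode

# Hypothetical attribute values, one list per scale type.
colour = ["red", "blue", "red", "green"]             # nominal: only equality is meaningful
grade = ["low", "high", "medium", "low", "medium"]   # ordinal: values can be ranked
height_cm = [172.0, 181.5, 165.0, 178.5]             # numeric: differences and averages are meaningful

# The ranking of the ordinal values is an assumption supplied by the analyst.
grade_rank = {"low": 0, "medium": 1, "high": 2}
rank_to_grade = {rank: name for name, rank in grade_rank.items()}

# Nominal: the most frequent value is a valid summary, an average is not.
most_common_colour = mode(colour)

# Ordinal: take the median on the ranks, then map it back to the label.
median_grade = rank_to_grade[median_low(grade_rank[g] for g in grade)]

# Numeric: the arithmetic mean is meaningful.
mean_height = mean(height_cm)

print(most_common_colour, median_grade, mean_height)
```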
Granularity, the state of existing in grains or granules, refers to the extent to which a
material or system is composed of distinguishable pieces.
Some attributes have a fixed domain (months), some change over time (products in a catalog)
Data quality issues: Availability, usability, reliability, relevance, presentation quality.
Accuracy is defined as the closeness between the value in the data and the true value
→ Syntactic, the value might not be correct but it belongs at least to the domain of the
corresponding attribute
→ Semantic, the value might be in the domain of the corresponding attribute, but it is not
correct.
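
The difference between the two kinds of accuracy can be sketched as a small check. This is an illustration only: the month domain and the function name `classify_accuracy` are hypothetical, and the sketch assumes the true value is known, which is rarely the case in practice (semantic errors are usually much harder to detect than syntactic ones).

```python
# Hypothetical attribute domain: abbreviated month names.
MONTH_DOMAIN = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
}

def classify_accuracy(value, true_value, domain=MONTH_DOMAIN):
    """Label a stored value, assuming the true value is available for comparison."""
    if value not in domain:
        # The value is not even a legal month: a syntactic accuracy problem.
        return "syntactic error"
    if value != true_value:
        # The value is a legal month, but not the right one: a semantic accuracy problem.
        return "semantic error"
    return "accurate"

print(classify_accuracy("Fbe", "Feb"))  # typo, outside the domain
print(classify_accuracy("Mar", "May"))  # inside the domain, but wrong
print(classify_accuracy("May", "May"))
```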
Data quality issues: completeness
Visualisation charts: Comparison, time series, correlation, value distribution

Chapter 1 - Motivation
Data refer to single instances, describe individual properties, are often available in large
amounts, are easy to collect or obtain, and do not allow us to make predictions.
Knowledge refers to classes of instances, describes general patterns, structures, laws etc,
consists of as few statements as possible, is often difficult and time consuming to find or to
obtain and allows us to make predictions and forecasts.
Criteria to assess knowledge:
- Correctness
- Generality
- Usefulness
- Comprehensibility
- Novelty
Descriptive statistics summarise data without making specific assumptions about the data.
Inferential statistics provide more rigorous methods than descriptive statistics, based on
certain assumptions about the random process that generates the data.
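
To make the contrast concrete, the sketch below first summarises a sample (descriptive) and then computes an approximate 95% confidence interval for the mean (inferential). The sample values are made up. The interval rests on assumptions stated in the comments: independent draws from a normal distribution, and a normal quantile instead of a t quantile, which is only a rough approximation for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical measurements.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.2, 4.6, 5.0]

# Descriptive: summarise the observed values, no assumptions needed.
sample_mean = mean(sample)
sample_sd = stdev(sample)

# Inferential: ASSUME independent draws from a normal distribution and
# approximate a 95% confidence interval for the unknown population mean.
z = NormalDist().inv_cdf(0.975)  # about 1.96
half_width = z * sample_sd / sqrt(len(sample))
ci = (sample_mean - half_width, sample_mean + half_width)

print(f"mean={sample_mean:.2f}, sd={sample_sd:.2f}")
print(f"approx. 95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

If the assumptions do not hold, the descriptive summary is still valid but the interval is not.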
In an experimental study one can control and manipulate the data generating process.
In an observational study one cannot control the data generating process.
Exploratory data analysis is concerned with generating hypotheses from the collected data.
Data science: the opportunity to analyse large real-world data repositories that were initially
collected for different purposes, which arose with the availability of powerful tools and
technologies that can process and analyse massive amounts of data.
CRISP-DM: [process model figure]

Problem categories:
- Classification, predict the outcome of an experiment with a finite number of possible
results.
- Regression, a prediction task with a numerical value of interest.
- Clustering, summarise the data to get a better overview by forming groups of similar
cases.
- Association analysis, find any correlations or associations to better understand or
describe the interdependencies of all attributes.
- Deviation analysis, knowing already the major trends or structures, find any exceptional
subgroup that behaves differently with respect to some target attribute.
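
The first two categories differ mainly in the type of the target: a finite set of labels versus a number. The sketch below shows both on the same toy data. The data set, the nearest-neighbour classifier and the least-squares line are illustrative stand-ins chosen for brevity; the summary does not name any particular method here.

```python
# Hypothetical toy data: (hours_studied, exam_score, outcome).
data = [
    (1.0, 35.0, "fail"),
    (2.0, 45.0, "fail"),
    (4.0, 65.0, "pass"),
    (5.0, 75.0, "pass"),
]

def classify_1nn(hours):
    """Classification: predict a label by copying the nearest known case."""
    nearest = min(data, key=lambda row: abs(row[0] - hours))
    return nearest[2]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line([row[0] for row in data], [row[1] for row in data])

print(classify_1nn(4.5))        # a label from a finite set
print(slope * 3.0 + intercept)  # a numerical prediction
```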

Chapter 2 - Practical data science: an example
An example is described with a naive and a sound approach

Chapter 3 - Project understanding
Determine the project objective: objective, deliverable, success criteria
Assess the situation: assess resources, clarify access, evaluate assumptions and risks,
and verify the suitability of the data for the project, to avoid wasting resources on potentially
unsuccessful endeavours.
Determine analysis goals: It is crucial to carefully consider the limitations and practical
implications of the chosen architecture to ensure that the developed model aligns with the
intended use and produces valuable results.
Desirable properties: Interpretability, reproducibility, model flexibility, runtime, interestingness

Chapter 4 - Data understanding
Domain is the set of possible values for an attribute.
Scale type: nominal, ordinal, numeric
Granularity is the level of refinement chosen.
Data quality refers to how well the data fit their intended use.
- Accuracy is defined as the closeness between the value in the data and the true value.
- Syntactic accuracy means that a considered value might not be correct, but it belongs at
least to the domain of the corresponding attribute.