Summary: Guide to Intelligent Data Science - Data Mining and its Applications (EBB056B05)

Pages: 22
Uploaded on: 15-06-2023
Written in: 2022/2023

All mandatory chapters and lectures are covered in this summary.

Summarized whole book: Yes
Summary: Data Mining and its Applications
Week 1
- Lecture 1
Data mining is the extraction of interesting information or patterns from large data sources,
which may originally have been developed for other purposes, employing machine and
statistical learning and possibly high-end computational power, in order to serve business
purposes.
Data mining examples: Risk assessment, demand forecasting, fraud detection, anomaly
detection.
From data → knowledge




Data can be at rest, on the move or in use.




There are several data mining stakeholders:
● Business user: business understanding
● Project sponsor: project driver
● Project manager: end-to-end project delivery
● Business intelligence analyst: data understanding
● Data administrator & integrator: data preparation & solution delivery
● Data scientist/engineer: data modeling and evaluation

Data mining project workflow:
Inception and discovery → Data preparation → Model planning → Model building →
Communicate results → Operationalise
ETL: Extraction, Transformation, Loading
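The three ETL steps can be sketched as three small functions. This is only a minimal illustration: the CSV content, the column names, and the cleaning rules are invented for the example and do not come from the lecture.

```python
import csv
import io

# Hypothetical raw export; columns and values are invented for illustration.
RAW_CSV = """customer,amount,month
alice, 12.50 ,Jan
bob,7,feb
carol,,Mar
"""


def extract(text):
    """Extraction: read rows from a source (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: trim whitespace, normalise case, parse numbers,
    and drop rows that have no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed rule: rows without an amount are discarded
        cleaned.append({
            "customer": row["customer"].strip().title(),
            "amount": float(amount),
            "month": row["month"].strip().capitalize(),
        })
    return cleaned


def load(rows, target):
    """Loading: append the cleaned rows to a target store (here a plain list)."""
    target.extend(rows)
    return target


warehouse = load(transform(extract(RAW_CSV)), [])
```

In a real project the source and target would typically be databases or files, and the transformation rules would follow from the data preparation phase.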
The goal of the data understanding phase is to gain general insights about the data that will
potentially be helpful for further steps in the data analysis process. Never trust data until you
have carried out some simple plausibility checks.
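What a "simple plausibility check" looks like depends on the data. The sketch below assumes three common checks (duplicate keys, missing values, values outside a plausible range); the records, field names, and the 0 to 120 age range are hypothetical.

```python
# Hypothetical records; field names and thresholds are assumptions for illustration.
records = [
    {"id": 1, "age": 34, "month": "Mar"},
    {"id": 2, "age": -5, "month": "Jun"},
    {"id": 3, "age": 41, "month": None},
    {"id": 3, "age": 29, "month": "Oct"},
]


def plausibility_report(rows, key="id", numeric="age", low=0, high=120):
    """Return the row positions that fail each of three simple checks."""
    seen = set()
    report = {"duplicates": [], "missing": [], "out_of_range": []}
    for position, row in enumerate(rows):
        # Check 1: the key attribute should be unique.
        if row[key] in seen:
            report["duplicates"].append(position)
        seen.add(row[key])
        # Check 2: no attribute should be missing (None).
        if any(value is None for value in row.values()):
            report["missing"].append(position)
        # Check 3: the numeric attribute should lie in a plausible range.
        value = row[numeric]
        if value is not None and not (low <= value <= high):
            report["out_of_range"].append(position)
    return report


report = plausibility_report(records)
```

Each list in `report` holds the zero-based positions of the suspicious rows, so they can be inspected before any modelling starts.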
Attributes: features, variables
Instances: records, data objects, entries
Data can usually be described in terms of tables or matrices
Attributes differ in their scale type, according to the type of values that they can assume
Three scale types: • Categorical / Nominal • Ordinal • Numeric
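The scale type determines which summaries are meaningful. A minimal sketch, assuming the usual rule of thumb (mode for nominal, median for ordinal, mean for numeric); the example attributes and the size ordering are invented.

```python
from statistics import mean, mode

colours = ["red", "blue", "red", "green"]  # nominal: only equality is meaningful
sizes = ["S", "L", "M", "M", "S"]          # ordinal: values can be ranked
weights = [1.5, 2.0, 4.5]                  # numeric: arithmetic is meaningful

# Assumed ordering of the ordinal domain, from smallest to largest.
SIZE_ORDER = ["S", "M", "L"]


def ordinal_median(values, order):
    """Median of an ordinal attribute: sort by rank and take the middle value.
    For an even number of values this returns the upper of the two middle values."""
    ranked = sorted(values, key=order.index)
    return ranked[len(ranked) // 2]


most_common_colour = mode(colours)               # nominal -> mode
median_size = ordinal_median(sizes, SIZE_ORDER)  # ordinal -> median
average_weight = mean(weights)                   # numeric -> mean
```

Taking the mean of `colours` or `sizes` would not make sense, which is why the scale type should be known before choosing an analysis method.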
Granularity, the state of existing in grains or granules, refers to the extent to which a
material or system is composed of distinguishable pieces.
Some attributes have a fixed domain (months), some change over time (products in a catalog)
Data quality issues: Availability, usability, reliability, relevance, presentation quality.
Accuracy is defined as the closeness between the value in the data and the true value
→ Syntactic, the value might not be correct, but it belongs at least to the domain of the
corresponding attribute
→ Semantic, the value might be in the domain of the corresponding attribute, but it is not
correct.
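The difference between the two kinds of accuracy can be made concrete: a syntactic check only needs the attribute's domain, while a semantic check needs a trusted reference for the true values. The sketch below assumes such a reference exists; the order ids and values are hypothetical.

```python
MONTH_DOMAIN = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
}

# Hypothetical recorded values and a trusted reference holding the true values.
recorded = {"order_1": "Mar", "order_2": "Mrch", "order_3": "May"}
reference = {"order_1": "Mar", "order_2": "Mar", "order_3": "Apr"}


def syntactic_errors(values, domain):
    """Keys whose value lies outside the attribute's domain."""
    return sorted(key for key, value in values.items() if value not in domain)


def semantic_errors(values, truth, domain):
    """Keys whose value is inside the domain but differs from the true value."""
    return sorted(
        key for key, value in values.items()
        if value in domain and value != truth[key]
    )


syntactic = syntactic_errors(recorded, MONTH_DOMAIN)
semantic = semantic_errors(recorded, reference, MONTH_DOMAIN)
```

Here `"Mrch"` is a syntactic problem (not a valid month), while `"May"` for `order_3` is a semantic problem (a valid month, but not the true one). In practice a reference with true values is rarely available, so semantic errors are much harder to detect.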
Data quality issues: completeness
Visualisation charts: Comparison, time series, correlation, value distribution

Chapter 1 - Motivation
Data refer to single instances, describe individual properties, are often available in large
amounts, are easy to collect or obtain, and do not allow us to make predictions.
Knowledge refers to classes of instances, describes general patterns, structures, laws, etc.,
consists of as few statements as possible, is often difficult and time consuming to find or to
obtain and allows us to make predictions and forecasts.
Criteria to assess knowledge:
- Correctness
- Generality
- Usefulness
- Comprehensibility
- Novelty
Descriptive statistics summarises data without making specific assumptions about the data.
Inferential statistics provides more rigorous methods than descriptive statistics; these methods
are based on certain assumptions about the random process that generates the data.
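A small example may help to separate the two. The descriptive part below only summarises a sample. The inferential part adds an assumption that is not in the notes: the sample mean is treated as approximately normally distributed, which gives a rough 95% interval for the unknown population mean. The sample values are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample; the values are invented for illustration.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 13.0]

# Descriptive statistics: summarise the sample, no assumptions needed.
summary = {
    "n": len(sample),
    "mean": mean(sample),
    "median": median(sample),
    "stdev": stdev(sample),  # sample standard deviation (n - 1 in the denominator)
}

# Inferential statistics: ASSUMES the sample mean is approximately normal.
# With a sample this small, a t-based interval would usually be preferred.
z = NormalDist().inv_cdf(0.975)
half_width = z * summary["stdev"] / sqrt(summary["n"])
interval = (summary["mean"] - half_width, summary["mean"] + half_width)
```

The summary is valid for any data, whereas the interval is only as trustworthy as the assumption behind it.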
In an experimental study one can control and manipulate the data generating process.
In an observational study one cannot control the data generating process.
Exploratory data analysis is concerned with generating hypotheses from the collected data.
Data science arose from the opportunity to analyse large real-world data repositories that were
initially collected for different purposes. This opportunity came with the availability of powerful
tools and technologies that can process and analyse massive amounts of data.
CRISP-DM:

Problem categories:
- Classification, predict the outcome of an experiment with a finite number of possible
results.
- Regression, a prediction task with a numerical value of interest.
- Clustering, summarise the data to get a better overview by forming groups of similar
cases.
- Association analysis, find any correlations or associations to better understand or
describe the interdependencies of all attributes.
- Deviation analysis, knowing already the major trends or structures, find any exceptional
subgroup that behaves differently with respect to some target attribute.
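The first three categories differ mainly in what is predicted or produced. The sketch below uses deliberately simple stand-in techniques that are not taken from the notes: a one-nearest-neighbour lookup for classification and regression, and a gap-based grouping for clustering. All data are invented.

```python
def nearest(train, x):
    """Return the target of the training pair whose input is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


# Classification: the target takes one of a finite number of values.
classification_data = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
predicted_class = nearest(classification_data, 7.2)

# Regression: the target is a numerical value.
regression_data = [(1.0, 10.5), (2.0, 19.0), (8.0, 81.0), (9.0, 88.5)]
predicted_value = nearest(regression_data, 7.2)


def gap_clusters(values, max_gap):
    """Clustering: no target at all. Sort the values and start a new group
    whenever the distance to the previous value exceeds max_gap."""
    groups = []
    for value in sorted(values):
        if groups and value - groups[-1][-1] <= max_gap:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


clusters = gap_clusters([1.0, 9.0, 2.0, 8.0, 1.5], max_gap=2.0)
```

The same nearest-neighbour idea serves both prediction tasks; only the type of the target attribute changes. Clustering needs no target attribute and only groups similar cases.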

Chapter 2 - Practical data science: an example
An example is described with a naive and a sound approach

Chapter 3 - Project understanding
Determine the project objective: objective, deliverable, success criteria
Assess the situation: assess resources, clarify access, evaluate assumptions and risks,
and verify the suitability of the data for the project, to avoid wasting resources on potentially
unsuccessful endeavours.
Determine analysis goals: It is crucial to carefully consider the limitations and practical
implications of the chosen architecture to ensure that the developed model aligns with the
intended use and produces valuable results.
Desirable properties: Interpretability, reproducibility, model flexibility, runtime, interestingness

Chapter 4 - Data understanding
Domain is the set of possible values for an attribute.
Scale type: nominal, ordinal, numeric
Granularity is the level of refinement chosen.
Data quality refers to how well the data fit their intended use.
- Accuracy is defined as the closeness between the value in the data and the true value.
- Syntactic accuracy means that a considered value might not be correct, but it belongs at
least to the domain of the corresponding attribute.
Seller: ayebdrenth, Rijksuniversiteit Groningen