Garantie de satisfaction à 100% Disponible immédiatement après paiement En ligne et en PDF Tu n'es attaché à rien 4.2 TrustPilot
logo-home
Resume

Summary Data Mining for Business and Governance

Note
-
Vendu
6
Pages
66
Publié le
01-02-2023
Écrit en
2022/2023

English summary of the Data Mining fro Business and Governance course from Master in Data Science and Society. Summary of lecture materials, readings, and notes.

Établissement
Cours











Oups ! Impossible de charger votre document. Réessayez ou contactez le support.

École, étude et sujet

Établissement
Cours
Cours

Infos sur le Document

Publié le
1 février 2023
Nombre de pages
66
Écrit en
2022/2023
Type
Resume

Sujets

Aperçu du contenu

Data mining for business and governance
Week 1

What is data mining: the study of collecting, cleaning, processing, analyzing and gaining
useful insight of information.
- It is an umbrella term and the methods used relates to different disciplines:
o Knowledge discovery in databases
o Statistics
o Artificial intelligence
o Machine learning

Key aspects:
- Computation vs large data sets: trade-off between processing time and memory.
- Computation enables analysis of the large data sets: computers as a tool and with
growing data.
- Data mining often implies knowledge discovery from databases: from unstructured
data to structured knowledge.

What are large amounts or big data:
- Volume
o Too big for manual analysis
o Too big to fit in RAM
o Too big to store on disk
- Variety
o Range of values: variance
o Outliers, confounders and noise
o Different data types
- Velocity
o Data changes quickly: require results before data changes
o Streaming data (no storage)
It is not only about volume but also about complexity (variety) and for example the speed of
the database.

Applications of data mining:
- Companies: business intelligence
- Science: knowledge discovery

,Datapoints could be observations that you have in a dataset. They could be related
(dependent) or independent.
Dependency oriented data types: explicit, like in social networks, relationships can be
captured by edges in graph representations, or implicit like temperature readings of a
sensor, similar values, this relationship is not market explicitly:
- Spatial data, spatio temporal data
- Network data
- Time series data
- String data (text)
- Discrete sequences (event logs)
The dependency properties relate for example to the assumptions of the methods that can
be applied.

In short: Implicit data is information that is not provided intentionally but gathered from
available data streams, either directly or through analysis of explicit data. Explicit data is
information that is provided intentionally, for example through surveys and membership
registration forms.

Non dependency oriented: no specified dependency records:
- Multidimensional data, can be quantitative, categorical, binary (can be seen as
quantitative or categorical)
- Text data with a representation ignoring the relationship, for example looking into
the frequency of words.
For machine learning models, observations are assumed to be independent of each other.

,The general pipeline:




What makes prediction possible?
- Fitting data is easy, but predictions are hard




- Associations between features/target
o Numerical: correlation coefficient
o Categorical: mutual information value of x1 contains information about value
of x2

Statistical descriptions of data:
Measures of central tendency:
- Mean: average
- Median: the middle vale in a set of ordered data values
- Mode: value that occurs most frequently in the set




Statistical descriptions are relevant for summarizing the data, for having an overview when
we are exploring the data.

, Measuring the spread of data, five number summary:
- Range: (max()) – (min())
- Quantiles: points taken at regular intervals of a data distribution, dividing it into
essentially equal size consecutive sets. The second quantile is the median, the 4
quantiles are quartiles (3 data points Q1, Q2, Q3), and 100 quartiles are percentiles.
- Interquartile range: IQR= Q3 – Q1

Basic plots: Box plot: includes Q1, median, Q3, min and max values as well as outliers, points
are at least 1,5 IQR further away from Q1 and Q3. It shows you information about how the
data is spread.




When there are outliers, instead of min max, points that are at 1,5IQR distance from the
median are used instead of the min max values for the whisks (horizontal lines outside the
box). Box plots allow us to compare distributions of several features.

Measuring the dispersion of data:
- Variance standard deviation



Basic plot: scatter plot: when you have two features. It gives you an overview about how the
data is distributed over the x, y plane.

Correlation: measure of how two features are moving together. There is also a coefficient
related to that that you can calculate.
- Pearson’s r measures the strength of linear relationship (dependency)
€4,49
Accéder à l'intégralité du document:

Garantie de satisfaction à 100%
Disponible immédiatement après paiement
En ligne et en PDF
Tu n'es attaché à rien

Faites connaissance avec le vendeur

Seller avatar
Les scores de réputation sont basés sur le nombre de documents qu'un vendeur a vendus contre paiement ainsi que sur les avis qu'il a reçu pour ces documents. Il y a trois niveaux: Bronze, Argent et Or. Plus la réputation est bonne, plus vous pouvez faire confiance sur la qualité du travail des vendeurs.
liekebuuron Avans Hogeschool
S'abonner Vous devez être connecté afin de suivre les étudiants ou les cours
Vendu
170
Membre depuis
5 année
Nombre de followers
103
Documents
15
Dernière vente
2 semaines de cela

3,3

12 revues

5
5
4
2
3
1
2
0
1
4

Récemment consulté par vous

Pourquoi les étudiants choisissent Stuvia

Créé par d'autres étudiants, vérifié par les avis

Une qualité sur laquelle compter : rédigé par des étudiants qui ont réussi et évalué par d'autres qui ont utilisé ce document.

Le document ne convient pas ? Choisis un autre document

Aucun souci ! Tu peux sélectionner directement un autre document qui correspond mieux à ce que tu cherches.

Paye comme tu veux, apprends aussitôt

Aucun abonnement, aucun engagement. Paye selon tes habitudes par carte de crédit et télécharge ton document PDF instantanément.

Student with book image

“Acheté, téléchargé et réussi. C'est aussi simple que ça.”

Alisha Student

Foire aux questions