Data Mining 2017/2018 - Summary

Pages: 43
Uploaded: 10-01-2018
Written in: 2017/2018

Extended summary of Data Mining (Data Science): regression, classification, clustering, dimensionality reduction.

Data Mining W1
What is Data Mining?
“Data mining is the computational process of discovering patterns in large data sets involving methods at the intersection of:

• Statistics (branch of mathematics focused on data);
• Machine Learning (branch of Computer Science studying learning from data);
• Artificial Intelligence (interdisciplinary field aiming to develop intelligent machines);
• Database systems.”

Key aspects
• Computation vs. large data sets (trade-off between processing time and memory)
• Computation enables analysis of large data sets (computers are the tool, and data keeps growing)
• Data mining often implies knowledge discovery from databases (from unstructured data to structured knowledge)
• Text mining (natural language processing): going from unstructured text to structured knowledge

What counts as large amounts of data, or big data?
• Volume (too big for manual analysis, to fit in RAM, or to store on disk)
• Variety (range of values: variance | outliers, confounders and noise | interactions: data is co-dependent)
• Velocity (data changes quickly: results are required before the data changes | streaming data, no storage)

Applications of data mining
• Companies: Business Intelligence (Amazon, Booking, AH)
  o Market analysis and management
• Science: Knowledge Discovery (universities, laboratories)
  o Scientific discovery in large data

What makes prediction possible?
• Associations between features and the target (Amazon)
• Numerical: correlation coefficient
• Categorical: mutual information (the value of x1 contains information about the value of x2)

• Fitting data is easy, but predictions are hard!
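
To make the mutual-information idea above concrete, here is a minimal sketch that estimates it from co-occurrence counts. The function name and the toy sequences x1, x2, x3 are hypothetical and only serve as an illustration; they do not come from the course material.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n                # joint probability of the pair
        p_x = count_x[x] / n        # marginal probability of x
        p_y = count_y[y] / n        # marginal probability of y
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical toy data (assumption, for illustration only)
x1 = ["a", "a", "b", "b"]
x2 = ["u", "u", "v", "v"]   # fully determined by x1
x3 = ["u", "v", "u", "v"]   # independent of x1

mi_dependent = mutual_information(x1, x2)
mi_independent = mutual_information(x1, x3)
print(mi_dependent, mi_independent)
```

When x2 is fully determined by x1, knowing x1 removes all uncertainty about x2, so the mutual information is high; for the independent pair it drops to zero.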

Iris dataset




Pearson’s r (correlation coefficient)
• Numerator: covariance (to what extent the features change together)
• Denominator: product of standard deviations (makes correlations independent of units)
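
The numerator and denominator described above can be sketched in a few lines of plain Python. This is an illustrative implementation under the assumption of two equally long numeric lists; the measurements are made-up toy values, not the Iris data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how much the two features vary together
    # (the 1/n factors cancel between numerator and denominator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the two spreads, which removes the units
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical toy measurements (assumption, for illustration only)
lengths_cm = [1.0, 2.0, 3.0, 4.0]
widths_cm = [2.0, 4.0, 6.0, 9.0]

r_cm = pearson_r(lengths_cm, widths_cm)
# Rescaling one feature (cm to mm) leaves r unchanged
r_mm = pearson_r([10 * x for x in lengths_cm], widths_cm)
print(r_cm, r_mm)
```

Dividing by the standard deviations is what makes the two printed values agree even though one feature was rescaled.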




Pearson’s coefficient of Petal Length by Petal Width:

Caveats
• Pearson’s r only measures linear dependency
• Other types of dependency can also be used for prediction!
• Correlation does not imply causation, but it may still enable prediction.
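
The first caveat can be illustrated with a small sketch: y below is completely determined by x, yet the dependency is quadratic rather than linear, so Pearson's r does not pick it up. The data is a made-up example chosen to be symmetric around zero.

```python
import math

def pearson_r(xs, ys):
    """Compact Pearson's r for two equally long numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

# Hypothetical data: y depends perfectly on x, but not linearly
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # r is 0 (up to rounding) despite the perfect dependency
```

A model that can represent the quadratic relation could still predict y from x exactly, which is why other types of dependency remain useful for prediction.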

What is machine learning?
“A program is said to learn from experience (E) with respect to a task (T) and a performance measure (P) if its performance at tasks in T, as measured by P, improves with E.”

Supervised Learning
INPUT → OUTPUT

• Classification: the output is a class label
• Regression: the output is a continuous value



[Figure: classification | regression]




Supervised learning workflow
1. Collect data (How do you select your sample? Consider reliability, privacy and other regulations.)
2. Label examples (annotation guidelines, measuring inter-annotator agreement, crowdsourcing)
3. Choose an example representation
• Features: attributes describing examples
  o Numerical
  o Categorical
• Possibly convert to feature vectors
  o A vector is a fixed-size list of numbers
  o Some learning algorithms require examples represented as vectors
4. Train model(s)
• Keep some examples for final evaluation: test set
• Use the rest for
  o Learning: training set
  o Tuning: validation set
5. Evaluate
• Check performance of the tuned model on the test set
• Goal: estimate how well your model will do in the real world
• Keep evaluation realistic!
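
The split into training, validation and test sets from steps 4 and 5 could look roughly like the sketch below. The function name, the 60/20/20 proportions and the fixed seed are assumptions made for illustration; the notes do not prescribe particular fractions.

```python
import random

def train_val_test_split(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and cut them into three disjoint sets."""
    shuffled = list(examples)               # copy, so the input stays untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                    # held out for final evaluation
    val = shuffled[n_test:n_test + n_val]       # used for tuning
    train = shuffled[n_test + n_val:]           # used for learning
    return train, val, test

# Hypothetical examples: ten items identified by an integer
examples = list(range(10))
train, val, test = train_val_test_split(examples)
print(len(train), len(val), len(test))
```

Setting the test set aside before any training or tuning is what keeps the final evaluation realistic: the model never sees those examples until the end.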

Parameter or model tuning
• Learning algorithms typically have settings (aka hyperparameters)
• For each hyperparameter setting:
  o Apply the algorithm to the training set to learn a model
  o Check the model's performance on the validation set
• Choose the best-performing setting
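
The tuning loop above can be sketched with a tiny k-nearest-neighbours classifier, where k is the hyperparameter. The choice of k-NN, the candidate values of k and the one-dimensional toy data are all assumptions for illustration; any learning algorithm with settings would follow the same pattern.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the majority label among the k training examples closest to x."""
    neighbours = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of examples in data that are classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)

# Hypothetical (feature, label) pairs, made up for this sketch
train = [(1.0, "A"), (1.2, "A"), (2.1, "A"), (1.9, "B"), (3.0, "B"), (3.2, "B")]
val = [(1.1, "A"), (2.0, "A"), (2.9, "B"), (3.1, "B")]

# Try each hyperparameter setting and score it on the validation set
scores = {k: accuracy(train, val, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The test set plays no role in this loop; it is only used once, afterwards, to evaluate the model trained with the chosen setting.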

Unsupervised learning
INPUT (no labelled OUTPUT)

• Clustering: group similar objects
• Dimensionality reduction: reduce the number of random variables (features)

[Figure: clustering | dimensionality reduction]




Clustering
Task of grouping a set of objects in such a way that objects in the same group (called a cluster) are
more similar (in some sense or another) to each other than to those in other groups (clusters).
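
One common way to realise this definition is k-means, which is not named in these notes and is used here purely as an illustrative choice. The sketch below works on one-dimensional points, with "similar" taken to mean "close in value"; the points and the starting centroids are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Very small k-means for 1-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical data with two visually obvious groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(points, [0.0, 6.0])
print(centroids, clusters)
```

No labels are involved: the grouping follows only from the distances between the points themselves.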

Dimensionality reduction
• Feature selection: reduce the number of features
  o Reduces complexity and eases interpretation
  o Reduces demand on resources (computation / memory)
  o Reduces the ‘curse of dimensionality’
  o Reduces the chance of over-fitting
• Feature extraction: often domain-specific
  o Image processing: edge detection
  o From pixels to a reduced set of features
  o Often part of pre-processing, but might contain the hard problems
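
As one simple illustration of feature selection, the sketch below drops features whose variance falls below a threshold, on the assumption that a nearly constant feature carries little information. The variance criterion, the threshold and the feature names are assumptions for this example; the notes do not specify a selection method.

```python
from statistics import pvariance

def select_features(rows, names, min_variance):
    """Keep only the columns whose population variance reaches min_variance."""
    columns = list(zip(*rows))   # transpose rows into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) >= min_variance]
    kept_names = [names[i] for i in keep]
    reduced = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced

# Hypothetical examples with three features; f2 is constant, f3 barely varies
names = ["f1", "f2", "f3"]
rows = [
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.2],
]

kept_names, reduced = select_features(rows, names, min_variance=0.01)
print(kept_names, reduced)
```

Each example shrinks from three numbers to one, which lowers memory and computation demands for any model trained afterwards.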