Summary: Data Mining for Business and Governance (880022-M-6)

Pages: 17
Uploaded: 21-06-2022
Written in: 2021/2022

Detailed summary of all lectures, with additional notes, explanations, and examples, for the course "Data Mining for Business and Governance" at Tilburg University, which is part of the Master Data Science and Society. The course was given by Ç. Güven and G.R. Nápoles during the second semester, block three, of the academic year 2021/2022 (January to March 2022).


Tilburg University
Study Program: Master Data Science and Society
Academic Year 2021/2022, Semester 2, Block 3 (January to March 2022)


Course: Data Mining for Business and Governance (880022-M-6)
Lecturers: Ç. Güven, G.R. Nápoles

Introduction to Data Mining
• no fixed definition, umbrella term
o Knowledge discovery in databases, Statistics, Artificial Intelligence, Machine learning
• Computation vs large data sets: trade-off between processing time and memory
o the larger the dataset, the more computational resources are needed
• Large amounts or big data: Volume, Variety, Velocity

Pipeline of a data mining task




Basic data types
• Dependency oriented: explicit or implicit relationships
• Non-dependency oriented: no specified dependency between records (multidimensional data)
• For many machine learning models, observations are assumed to be independent

What makes prediction possible?
• Associations between features/target, understand how datapoints are related
• Numerical: correlation coefficient
• Categorical: mutual information (the value of x1 contains information about the value of x2)
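
To make the mutual information idea concrete, here is a minimal sketch that estimates it from observed counts, using the standard definition I(X; Y) = Σ p(x, y) · log2( p(x, y) / (p(x) · p(y)) ). The function name and the toy features (weather, umbrella, coin) are hypothetical and not taken from the course material.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of the first feature
    count_y = Counter(ys)            # marginal counts of the second feature
    count_xy = Counter(zip(xs, ys))  # joint counts of value pairs
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Toy data (assumed): umbrella use is fully determined by the weather,
# while the coin flip is unrelated to it.
weather = ["sun", "sun", "rain", "rain"]
umbrella = ["no", "no", "yes", "yes"]
coin = ["h", "t", "h", "t"]

mi_dependent = mutual_information(weather, umbrella)
mi_independent = mutual_information(weather, coin)
print(mi_dependent, mi_independent)
```

In this toy data, knowing the weather removes all uncertainty about the umbrella (1 bit of shared information), whereas it says nothing about the coin (0 bits).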

Correlation coefficient
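• Pearson's r/R measures the strength of a linear relationship (dependency); it does not capture other (non-linear) shapes
• range [-1, 1]: values close to -1 or 1 indicate a strong negative or positive linear relationship; the closer r is to 0, the more dispersed the data is; 0 = no linear relationship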
• for a strong linear relationship between two features, one of the features can be linearly
expressed in terms of the other and that makes one of those redundant in analysis





• Numerator: covariance (to what extent the features change together)
• Denominator: product of standard deviations (makes correlations independent of units)
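
The numerator and denominator described above correspond to the usual definition r = cov(x, y) / (s_x · s_y). The following is a minimal sketch of that computation in plain Python; the function name and the toy data (study hours and scores) are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sample covariance (how the two features change together).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Denominator: product of the sample standard deviations.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 6, 8, 10]      # perfectly increasing with hours
score_down = [10, 8, 6, 4, 2]    # perfectly decreasing with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)

# Changing the unit (hours to minutes) leaves r unchanged.
minutes = [h * 60 for h in hours]
r_minutes = pearson_r(minutes, score_up)

# A perfect but non-linear (parabolic) relationship is not detected by r.
xs = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(xs, [x ** 2 for x in xs])
print(r_up, r_down, r_minutes, r_parabola)
```

The parabola case illustrates why r only measures linear dependency: y is fully determined by x, yet the covariance, and therefore r, is zero.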

Correlation versus causation
• Correlation does not imply causation
• an observed correlation can be a coincidence
• explain and check causation in an experimental study
o vary a single variable while the others are kept equal

Supervised learning
• use labeled data to train the algorithm
• classification and regression problems

learning workflow
• 1) collect data
o consider reliability of measurement, privacy, and other regulations
o split data into training, validation, and test set with similar structure
▪ training set for learning
▪ validation set for tuning and setting hyperparameters
▪ test set for final evaluation
• 2) label examples (sometimes part of data collection)
o Annotation guidelines, Measure inter-annotator agreement, Crowdsourcing
• 3) choose representation (part of preprocessing)
o Features: attributes describing examples
o Observations: observed values for a given attribute
▪ numerical features: discrete or continuous
▪ categorical / nominal features, binary features
▪ ordinal features (scale)
o features can be converted to a vector
o ‘feature transformation’: e.g., use dummy coding to transform a categorical feature
to a numerical one
o ‘feature extraction’: select relevant features which represent the input and define
the output
• 4) train model(s)
o hyperparameters: settings for an algorithm decided by the programmer
▪ for each candidate value of the hyperparameter:
1) Apply the algorithm to the training set to learn
2) Check performance on the validation set
▪ afterwards, choose the best-performing setting
• 5) evaluate
o Check performance of tuned model on test set
o Goal: estimate how well your model will do in the real world (generalization)
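
The split, tune, and evaluate steps of the workflow above can be sketched as follows. This is an illustration under stated assumptions, not the course's own code: the toy data, the 60/20/20 split, and the use of a small k-nearest-neighbours classifier with hyperparameter k are all hypothetical choices.

```python
import random


def knn_predict(train, x, k):
    """Predict the majority label among the k training points closest to x."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, examples, k):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == label for x, label in examples)
    return correct / len(examples)


# Steps 1-3: collect, label, and represent the data (toy data, assumed):
# one numerical feature x, binary label 1 if x >= 50 else 0.
data = [(x, 1 if x >= 50 else 0) for x in range(100)]
random.Random(0).shuffle(data)  # fixed seed so the split is reproducible

# Split into training, validation, and test sets (60/20/20 is an assumption).
train, validation, test = data[:60], data[60:80], data[80:]

# Step 4: for each hyperparameter value, learn on the training set and
# check performance on the validation set, then keep the best setting.
candidates = [1, 3, 5]
validation_scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(candidates, key=lambda k: validation_scores[k])

# Step 5: evaluate the tuned model once on the held-out test set.
test_accuracy = accuracy(train, test, best_k)
print(best_k, validation_scores, test_accuracy)
```

The test set is touched only once, after the hyperparameter has been fixed, so its accuracy serves as an estimate of generalization rather than a number that was optimized for.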

regression task: predicting a numeric quantity
• regression analysis describes the relationship between random variables
• it can predict the value of one variable based on another variable and show trends
• output of regression problem is a function describing the relation between x and y
• unlike classification, regression makes numerical predictions (it predicts values of continuous variables)

linear regression
• simplest regression technique, with two types of variables: the independent variable(s) x and the dependent variable y
• aim is to minimize the difference between the predicted and the actual values
• measurements
o sum of squared errors




o or different loss functions
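
The sum of squared errors is conventionally defined as SSE = Σ (y_i − ŷ_i)², where ŷ_i is the predicted value for observation i. The sketch below fits a line with one independent variable using the closed-form least-squares solution and compares its SSE with that of another candidate line. The function names and the toy data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sse(xs, ys, slope, intercept):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Toy data (assumed).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 10]

slope, intercept = fit_line(xs, ys)
sse_fit = sse(xs, ys, slope, intercept)

# Any other line, here y = 2x + 1, has an SSE at least as large.
sse_other = sse(xs, ys, 2.0, 1.0)
print(slope, intercept, sse_fit, sse_other)
```

For this data the fitted line is approximately y = 2.3x + 0.5 with an SSE of about 0.3, compared with 1.0 for y = 2x + 1.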