Lecture notes

Data Analytics (ITNPBD6) Class Notes

Rating
-
Sold
1
Pages
74
Uploaded on
12-07-2021
Written in
2020/2021

Handwritten notes for the Data Analytics (ITNPBD6) course at the University of Stirling. I obtained a high 1st in the module. The notes cover the basic Machine Learning principles and techniques. The outline is as follows:
Chapter 1: Introduction to Data Analytics
Chapter 2: Model Accuracy
Chapter 3: Classification
Chapter 4: Regression
Chapter 5: Clustering
Chapter 6: Neural Networks and Deep Learning
Chapter 7: Metaheuristics, hyper-parameters and dimensionality reduction
Chapter 8: Natural Language Processing
Chapter 9: Visualization, ethics, trust and explainability
The notes are 74 pages in total and contain graphs and figures.


Document information

Uploaded on
12 July 2021
Number of pages
74
Written in
2020/2021
Type
Lecture notes
Lecturer(s)
Multiple
Contains
All lectures

Preview of the content

ITNPBD6

DATA ANALYTICS

CHAPTER 1

INTRODUCTION TO DATA ANALYTICS

Objectives:

• Describe CRISP-DM and how it can be applied to real-world problems
• Recognise the differences between variable types
• Discuss the differences between continuous and discrete distributions
• Identify the need for data cleaning
• Load a dataset and use visualisations to clean the data in both Orange and Python


1.1 Data Analysis
1.1.1 Model
In data analysis, the approach is driven by learning something about data that would have been hard, or even impossible, to write computer code for by hand.

The knowledge learnt is then embedded in what is called a model, a general framework capable of performing a particular task. Typically, the model will take in data points and output predictions or estimates. It has a number of parameters that are determined as part of the learning process and is a representation of what has been learned about a data set.

The functionality of the model is determined by the data and not by pre-programmed rules.




• Data mining: process of learning patterns, making predictions and building the model.
• Hyper-parameters: settings that control how the model learns and operates.
• Learning / training: process by which a model's parameters are determined.
• Inference: process of providing previously unseen data to a trained model and making predictions or estimates about them.
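
The terms above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the toy data, the one-parameter model y = rate * x, and the names `train`, `predict`, `learning_rate` and `epochs` are all hypothetical and do not come from the notes.

```python
# Hypothetical toy data: inputs x and observed outputs y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def train(xs, ys, learning_rate=0.01, epochs=2000):
    """Learn the single parameter `rate` of the model y = rate * x.

    `learning_rate` and `epochs` are hyper-parameters: they control how
    learning proceeds and are chosen by the analyst, not learned.
    """
    rate = 0.0  # the model parameter, determined by training
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to `rate`.
        gradient = sum(2 * (rate * x - y) * x for x, y in zip(xs, ys)) / n
        rate -= learning_rate * gradient
    return rate


def predict(rate, x):
    """Inference: apply the trained model to a previously unseen input."""
    return rate * x


rate = train(xs, ys)
estimate = predict(rate, 7.5)
print(f"learned parameter: {rate:.3f}")
print(f"estimate for x = 7.5: {estimate:.2f}")
```

Changing `learning_rate` or `epochs` changes how the parameter is found, while the value of `rate` itself is dictated by the data.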

1.1.2 Data
Data is the raw material used for machine learning, consisting of a set of variables. Each variable can take a range of values known as its domain.




Water volume = (?) × minutes
(variable)     (parameter)  (variable)



The data in question is a snapshot of the real world, and data mining assumes that whatever produced the data will, in some way, continue to produce it in the same way in the future.

We might encounter problems with this approach, as the data we're provided with might:

• have errors
• be incorrect
• be missing parts
• be insufficient in quantity
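
The problems listed above can often be detected with simple checks before any model is built. The following is a minimal sketch, assuming rows are stored as dictionaries and `None` marks a missing value; the function name `audit`, the plausible ranges and the `min_rows` threshold are hypothetical choices for illustration, not standard values.

```python
def audit(rows, valid_ranges, min_rows=30):
    """Report common data-quality problems.

    rows: list of dicts, one per data point; None marks a missing value.
    valid_ranges: {variable: (low, high)} plausible bounds (assumed known).
    min_rows: assumed rule-of-thumb threshold, not a standard value.
    """
    report = {
        "missing": [],
        "out_of_range": [],
        "too_few_rows": len(rows) < min_rows,
    }
    for index, row in enumerate(rows):
        for variable, (low, high) in valid_ranges.items():
            value = row.get(variable)
            if value is None:
                report["missing"].append((index, variable))
            elif not low <= value <= high:
                report["out_of_range"].append((index, variable, value))
    return report


readings = [
    {"minutes": 1.0, "litres": 2.1},
    {"minutes": 2.0, "litres": None},   # a missing part
    {"minutes": -3.0, "litres": 6.2},   # impossible value, likely an error
]
report = audit(readings, {"minutes": (0, 120), "litres": (0, 500)})
print(report)
```

A range check only catches values that are impossible; values that are plausible but wrong need domain knowledge or visual inspection.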



A collection of data, known as a data set, contains a set of values for a number of variables. It is often represented in tabular format, in which one row is a single data point (or instance) and is made up of a value for each of the variables in the table. A column of the table corresponds to a single variable.
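
As a rough illustration of the row and column view, the sketch below stores a tiny, made-up data set as a list of dictionaries and summarises the values each variable takes. The variable names and the helpers `column` and `observed_domain` are assumptions introduced for this example.

```python
# Hypothetical data set: each dict is one data point (an instance, a row);
# each key is a variable (a column).
dataset = [
    {"minutes": 1.0, "tap": "kitchen", "litres": 2.1},
    {"minutes": 2.0, "tap": "bathroom", "litres": 3.9},
    {"minutes": 3.0, "tap": "kitchen", "litres": 6.2},
]


def column(rows, variable):
    """Return all values of one variable, i.e. one column of the table."""
    return [row[variable] for row in rows]


def observed_domain(rows, variable):
    """Summarise the values a variable takes in this data set.

    Numeric variables are summarised as a (min, max) range and nominal
    variables as the set of distinct labels. This is only the observed
    domain; the true domain may be wider.
    """
    values = column(rows, variable)
    if all(isinstance(v, (int, float)) for v in values):
        return (min(values), max(values))
    return set(values)


for name in dataset[0]:
    print(name, observed_domain(dataset, name))
```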




1.1.3 Supervised vs. Unsupervised Learning
In supervised learning, the data the model is trying to learn from is marked with the correct values, and it can be used to test the quality of a model.

It involves data that describes both the inputs and outputs to the system and requires a mapping to be learned from the inputs to the outputs.

In unsupervised learning, there is no existing set of clusters to compare against.

It involves only the inputs and requires the algorithm to organize and characterize the data in some way.

1.1.4 Tasks performed with Data Mining
SUPERVISED LEARNING

Classification:

An input pattern is classified as belonging to one of a number of possible classes. The output variable is nominal and the inputs can be a mix of numerical and nominal.
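
A minimal sketch of a classifier follows. It uses a 1-nearest-neighbour rule, chosen here only because it is short; the notes do not name a particular algorithm at this point. The training data and the function name `classify` are hypothetical, and the inputs are restricted to numeric variables to keep the distance calculation simple.

```python
import math

# Hypothetical labelled training data: (height_cm, weight_kg) -> class label.
training = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]


def classify(point, labelled_points):
    """Return the nominal label of the closest training point (1-NN)."""
    _, nearest_label = min(
        labelled_points, key=lambda pair: math.dist(point, pair[0])
    )
    return nearest_label


print(classify((22.0, 4.5), training))
print(classify((58.0, 28.0), training))
```

The output is always one of the class labels seen in training, which is what makes the output variable nominal.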





Prediction / Regression:

A continuous output value is calculated from an input pattern. The learning task is to find the relationship between the input variables and one or more output variables. The inputs can be a mixture of numeric and nominal variables, but the output of a regression task is always numeric.
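
As an illustration, the sketch below fits a straight line to made-up data using the closed-form least-squares solution for one numeric input. The data and the name `fit_line` are assumptions; handling nominal inputs would require encoding them as numbers first, which is left out here.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one numeric input."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 10.0 + intercept  # a continuous output for a new input
print(slope, intercept, prediction)
```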




UNSUPERVISED LEARNING

Clustering:

Data points that are close to each other, by some distance metric, are assigned to one of a number of clusters so that members of different clusters are far apart.

The input variables can be numeric or nominal.

Clustering is similar to classification, except that the class labels are not given by the training data but are inferred from the distribution of points in the input data.
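
One way to see this in code is a bare-bones k-means sketch. k-means is used here as an assumed example algorithm; it handles numeric inputs only and uses Euclidean distance. Seeding the centroids with the first k points is a simplification that keeps the example deterministic, and the data is made up.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (a simplifying
    assumption made to keep this sketch deterministic)."""
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assignments, centroids


# Hypothetical unlabelled points forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
assignments, centroids = k_means(points, k=2)
print(assignments)
print(centroids)
```

No labels are supplied: the cluster indices come only from how the points are distributed.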





Novelty detection:

Requires the system to spot patterns of data that have not been seen before. There is no output variable in the training data, but the resulting system will have a binary output that classifies each input pattern as novel or not.
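
A very simple detector can be sketched by flagging values that lie far from what was seen during training. The three-standard-deviation rule, the name `fit_novelty_detector` and the data below are assumptions for illustration; practical systems usually use richer models.

```python
import statistics


def fit_novelty_detector(normal_values, threshold=3.0):
    """Build a binary novel / not-novel test from unlabelled training values.

    `threshold` (in standard deviations) is an assumed hyper-parameter.
    """
    mean = statistics.mean(normal_values)
    spread = statistics.stdev(normal_values)

    def is_novel(value):
        return abs(value - mean) > threshold * spread

    return is_novel


# Hypothetical training data containing only ordinary readings.
seen = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7]
is_novel = fit_novelty_detector(seen)

print(is_novel(20.05))
print(is_novel(35.0))
```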





Probability distribution estimation:

Build a model that takes a single data point as input and produces an estimate of the density of the population data at that point.

A model is built from the data in the form of a function from the inputs X to a probability estimate, p(X), which is not known and must be inferred.
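
The idea of inferring p(X) from data can be sketched with a Gaussian kernel density estimate for a single numeric variable. This is one assumed technique among several; the name `gaussian_kde`, the `bandwidth` value and the samples are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density p(x) built from the samples.

    Each sample contributes a Gaussian bump of width `bandwidth`
    (an assumed hyper-parameter); the bumps are averaged.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical observations: most lie near 5, one lies near 9.
samples = [4.8, 5.0, 5.1, 5.3, 4.9, 5.2, 9.0]
p = gaussian_kde(samples, bandwidth=0.5)

print(p(5.0))  # density where the data is concentrated
print(p(7.0))  # density in a sparsely populated region
```

The returned function takes a single data point and gives a density estimate at that point, matching the description above.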