Lecture notes

Data Analytics (ITNPBD6) Class Notes

Pages
74
Uploaded on
12-07-2021
Written in
2020/2021

Handwritten notes for the Data Analytics (ITNPBD6) course at the University of Stirling. I obtained a high 1st in the module. The notes cover all the basic Machine Learning principles and techniques. The outline is as follows:
Chapter 1: Introduction to Data Analytics
Chapter 2: Model Accuracy
Chapter 3: Classification
Chapter 4: Regression
Chapter 5: Clustering
Chapter 6: Neural Networks and Deep Learning
Chapter 7: Metaheuristics, hyper-parameters and dimensionality reduction
Chapter 8: Natural Language Processing
Chapter 9: Visualization, ethics, trust and explainability
The notes are 74 pages in total and contain graphs and figures.

Document information

Type
Lecture notes
Professor(s)
Multiple
Contains
All classes

Content preview

ITNPBD6

DATA ANALYTICS

CHAPTER 1

INTRODUCTION TO DATA ANALYTICS

Objectives:

• Describe CRISP-DM and how it can be applied to real-world problems
• Recognise the differences between variable types
• Discuss the differences between continuous and discrete distributions
• Identify the need for data cleaning
• Load a dataset and use visualisations to clean the data in both Orange and Python


1.1 Data Analysis
1.1.1 Model
In data analysis, the approach is driven by learning something about data that would have been hard or even impossible to write computer code for by hand.

The knowledge learnt is then embedded in what is called a model, a general framework capable of performing a particular task. Typically, the model will take in data points and output predictions or estimates. It has a number of parameters that are determined as part of the learning process, and it is a representation of what has been learned about a data set.

The functionality of the model is determined by the data and not by pre-programmed rules.




• Data mining: the process of learning patterns, making predictions and building the model.

• Hyper-parameters: settings that control how the model learns and operates.

• Learning / training: the process by which a model's parameters are determined.

• Inference: the process of providing previously unseen data to a trained model and making predictions or estimates about them.
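
The four terms above can be seen together in a minimal sketch. It assumes a straight-line model y = a*x + b fitted by gradient descent; the function names `train` and `predict`, the data values and the hyper-parameter settings are all hypothetical choices for illustration and do not come from the notes.

```python
def train(xs, ys, learning_rate=0.01, epochs=5000):
    """Learning/training: determine the parameters a and b of y = a*x + b.

    learning_rate and epochs are hyper-parameters: they control how the
    model learns, but they are not learned from the data themselves.
    """
    a, b = 0.0, 0.0  # parameters, to be determined by the learning process
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to a and b.
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


def predict(model, x):
    """Inference: apply the trained model to a previously unseen input."""
    a, b = model
    return a * x + b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = train(xs, ys)
estimate = predict(model, 10.0)  # 10.0 was not in the training data
```

Nothing in `train` encodes the rule "multiply by 2 and add 1"; the behaviour of the model comes from the data, which is the point made in the section above.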

1.1.2 Data
Data is the raw material used for machine learning, consisting of a set of variables. Each variable can take a range of values known as its domain.




[Figure: the example equation "Water volume = (?) minutes - b", annotated to label water volume and minutes as variables and the unknown (?) as a parameter.]



The data in question is a snapshot of the real world, and data mining assumes that whatever produced the data will, in some way, continue to produce it in the same way in the future.

We might encounter problems with this approach, as the data we're provided with might:

• have errors
• be incorrect
• be missing parts
• be insufficient in quantity

A collection of data, known as a data set, contains a set of values for a number of variables. It is often represented in tabular format, in which one row is a single data point (or instance) and is made up of a value for each of the variables in the table. A column of the table corresponds to a single variable.
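
A small sketch of the tabular view, using only the Python standard library. The variable names (`minutes`, `tap`, `volume`) and every value are invented for illustration; the helpers `column`, `observed_domain` and `missing_counts` are hypothetical names, not part of any library.

```python
# Each dict is one row: a single data point (instance) with a value
# for every variable. The values are hypothetical.
rows = [
    {"minutes": 1.0, "tap": "hot", "volume": 2.1},
    {"minutes": 2.0, "tap": "cold", "volume": 4.0},
    {"minutes": 3.0, "tap": "hot", "volume": None},  # a missing part
    {"minutes": 4.0, "tap": "cold", "volume": 8.2},
]


def column(rows, name):
    """Return one column of the table, i.e. all values of one variable."""
    return [row[name] for row in rows]


def observed_domain(rows, name):
    """Return the set of distinct values a variable takes in this data set.

    This is only the observed part of the domain; the true domain may
    contain values that never occur in the snapshot.
    """
    return {value for value in column(rows, name) if value is not None}


def missing_counts(rows):
    """Count missing values per variable, a first step in data cleaning."""
    return {name: sum(row[name] is None for row in rows) for name in rows[0]}


minutes = column(rows, "minutes")
tap_domain = observed_domain(rows, "tap")
missing = missing_counts(rows)
```

Here `missing` reports one gap in the `volume` column, which is the kind of problem listed above as "be missing parts".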




1.1.3 Supervised vs. Unsupervised Learning
In supervised learning, the data the model is trying to learn from is marked with the correct values, and it can be used to test the quality of a model.

It involves data that describes both the inputs and outputs to the system and requires a mapping to be learned from the inputs to the outputs.

In unsupervised learning, there is no existing set of clusters to compare against.

It involves only the inputs and requires the algorithm to organize and characterize the data in some way.

1.1.4 Tasks performed with Data Mining
SUPERVISED LEARNING

Classification:
An input pattern is classified as belonging to one of a number of possible classes. The output variable is nominal, and the inputs can be a mix of numerical and nominal.

Prediction / Regression:
A continuous output value is calculated from an input pattern. The learning task is to find the relationship between the input variables and one or more output variables. The inputs can be a mixture of numeric and nominal variables, but the output of a regression task is always numeric.
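
To make the difference in output type concrete, here is a minimal sketch using k-nearest neighbours. This technique is chosen here purely for illustration and is not introduced in this part of the notes. The same neighbours give a nominal output by majority vote (classification) or a numeric output by averaging (regression). All data values and function names are hypothetical, and the inputs are assumed to be numeric.

```python
import math
from collections import Counter


def nearest_indices(inputs, query, k):
    """Indices of the k training inputs closest to the query (Euclidean)."""
    order = sorted(range(len(inputs)), key=lambda i: math.dist(inputs[i], query))
    return order[:k]


def knn_classify(inputs, labels, query, k=3):
    """Classification: the output is a nominal class label (majority vote)."""
    votes = Counter(labels[i] for i in nearest_indices(inputs, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(inputs, targets, query, k=3):
    """Regression: the output is a number (mean of the neighbours' targets)."""
    idx = nearest_indices(inputs, query, k)
    return sum(targets[i] for i in idx) / len(idx)


# Hypothetical labelled data: each input comes with its correct output.
inputs = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["small", "small", "small", "large", "large", "large"]
targets = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]

predicted_class = knn_classify(inputs, labels, (1.1, 1.0))
predicted_value = knn_regress(inputs, targets, (5.0, 4.9))
```

Both functions are supervised: they can only run because the training data carries the correct labels or target values.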




UNSUPERVISED LEARNING

Clustering:
Data points that are close to each other, by some distance metric, are assigned to one of a number of clusters so that members of different clusters are far apart.

The input variables can be numeric or nominal.

Clustering is similar to classification, except that the class labels are not given by the training data; they are inferred from the distribution of points in the input data.
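
One common way to do this is k-means, sketched below under some stated assumptions: the inputs are numeric, the distance metric is Euclidean, and the starting centroids are supplied by hand so that the run is repeatable. The notes do not name a specific algorithm at this point, and the data values are hypothetical.

```python
import math


def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to clusters and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[closest(point, centroids)].append(point)
        # Move each centroid to the mean of its cluster; keep it if empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    assignments = [closest(point, centroids) for point in points]
    return centroids, assignments


# Hypothetical unlabelled data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
start = [points[0], points[3]]  # assumed starting centroids

centroids, assignments = kmeans(points, start)
```

No labels are passed in anywhere; the cluster indices in `assignments` are inferred purely from where the points lie.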





Novelty detection:
Requires the system to spot patterns of data that have not been seen before. There is no output variable in the training data, but the resulting system will have a binary output that classifies each input pattern as novel or not.
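
A very simple detector of this kind can be sketched as a distance check: an input is flagged as novel when it lies further than some threshold from every training point. The threshold rule, the function name `fit_novelty_detector` and the data are assumptions made for illustration, not the method used in the course.

```python
import math


def fit_novelty_detector(train_points, threshold):
    """Build a detector from unlabelled training points.

    The returned function has a binary output: True if the input is
    novel, False if it resembles something seen during training.
    """
    def is_novel(point):
        nearest = min(math.dist(point, seen) for seen in train_points)
        return nearest > threshold

    return is_novel


# Hypothetical training data with no output variable.
seen_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
is_novel = fit_novelty_detector(seen_points, threshold=0.5)

familiar = is_novel((1.1, 1.0))  # close to the training data
unusual = is_novel((4.0, 4.0))   # far from anything seen before
```

The threshold acts as a hyper-parameter: a smaller value flags more inputs as novel.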





Probability distribution estimation:
Build a model that takes a single data point as input and produces an estimate of the density of the population data at that point.

A model is built from the data in the form of a function from the inputs X to a probability estimate, p(X), which is not known and must be inferred.
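
As a hedged illustration, the sketch below infers p(X) with a Gaussian kernel density estimate for a single numeric variable. The choice of estimator, the bandwidth value, the function name `fit_kde` and the sample values are all assumptions; the notes do not say which estimator is intended.

```python
import math


def fit_kde(samples, bandwidth):
    """Return a function p(x) estimating the density of the population.

    Each sample contributes a Gaussian bump of width `bandwidth`; the
    estimate at x is the average height of those bumps at x.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def p(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return p


# Hypothetical observations of one numeric variable.
samples = [1.8, 2.0, 2.1, 2.3, 5.0]
p = fit_kde(samples, bandwidth=0.5)

dense = p(2.0)    # many samples nearby, so the estimate is high
sparse = p(10.0)  # no samples nearby, so the estimate is close to zero
```

Because each bump integrates to one and the bumps are averaged, the estimated density itself integrates to one over the whole real line.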