What is machine learning?
● Give computers the ability to learn from data without being explicitly programmed
● We require sufficient historical data for effective learning
○ Representative data vs. extreme data
● The system should improve predictions based on prior experiences
○ Compare results
● There is no need for pre-established rules to determine outputs
● ML focuses on learning from data (experience) to solve problems that are difficult to model with traditional programming
● ML uses training data (experience) to learn patterns and rules, which are then applied to new, unseen data
● ML methods are used for tasks that are hard to model with fixed rules but where data (experience) is available.
● Clear performance measures are essential to evaluate success.
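To make the points above concrete, here is a minimal sketch of learning a rule from data instead of hard-coding it, and then scoring it on unseen data with a clear performance measure (accuracy). The data values and the names `learn_threshold` and `accuracy` are assumptions made up for illustration; they are not from the course material.

```python
# Toy data (made up for illustration): hours of sunshine -> dry day (1) or rainy day (0).
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test = [(2.5, 0), (6.5, 1), (9.0, 1), (0.5, 0)]  # unseen during learning


def accuracy(threshold, data):
    """Performance measure: fraction of examples classified correctly."""
    correct = sum(1 for x, label in data if int(x > threshold) == label)
    return correct / len(data)


def learn_threshold(data):
    """Pick the candidate threshold with the best accuracy on the training data."""
    xs = sorted(x for x, _ in data)
    # Candidate thresholds are the midpoints between neighbouring training values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(t, data))


threshold = learn_threshold(train)
print("learned threshold:", threshold)                        # learned threshold: 4.5
print("accuracy on unseen data:", accuracy(threshold, test))  # accuracy on unseen data: 1.0
```

No threshold is written into the program; it comes from the training data, and the held-out examples show whether the learned rule generalizes.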
Example:
ML/AI-based weather prediction
How does AI forecast the weather?
● Collect data on the earth
● Traditional: Use physics to determine the relationships and create a forecast
● Forecasters zoom in to one area based on their expertise
● Ensemble forecasting - create a lot of forecasts, thousands rather than 50 forecasts
● AI learns how the model moves, trained on the datasets
○ Create snapshots
○ Compare the prediction with the real outcome
○ All data driven
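The data-driven loop described above (take snapshots, learn how one snapshot leads to the next, compare the prediction with the real outcome) can be sketched as follows. This is only a toy illustration: the temperature values are synthetic, the one-step linear model and the names `fit_step_model` and `forecast` are assumptions, and real AI weather models are far larger and work on gridded global data.

```python
from statistics import mean

# Synthetic hourly temperature snapshots (made-up values, not real observations).
snapshots = [10.0, 11.0, 12.5, 13.5, 15.0, 16.0, 17.5, 18.5]
history, held_out = snapshots[:6], snapshots[6:]


def fit_step_model(series):
    """Least-squares fit of next = a * current + b from consecutive snapshot pairs."""
    xs, ys = series[:-1], series[1:]
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b


def forecast(a, b, start, steps):
    """Roll the learned step forward from the last known snapshot."""
    out, current = [], start
    for _ in range(steps):
        current = a * current + b
        out.append(current)
    return out


a, b = fit_step_model(history)
predicted = forecast(a, b, history[-1], len(held_out))

# Compare the prediction with the real (held-out) snapshots.
mae = mean(abs(p - r) for p, r in zip(predicted, held_out))
print("predicted:", [round(p, 2) for p in predicted])
print("real:     ", held_out)
print("mean absolute error:", round(mae, 2))
```

Nothing in the sketch encodes physics; the step rule comes entirely from the snapshots. That is also why a model like this has no basis for predicting values outside the range it was trained on.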
What are the challenges of these new methods, which are built on artificial intelligence rather than
on physics-based forecasting?
● Don't take into account extreme values (like climate change)
● Missing local data
Key Tasks in ML - Supervised vs. unsupervised learning
Supervised learning task
● The algorithm is trained on a labeled dataset
● Each input data point is associated with a corresponding output (label)
● Input → output
● Algorithm learns a mapping function from input to outputs
● The goal is to make accurate predictions or classifications on unseen data
● Attributes are given
● Learn from historical experience to predict something
● Ex: Classification: given the attributes "has fur", "meows", "likes height", the instance is a cat
● Classification
○ Predict what class an instance of data should fall into
● Regression
○ The prediction of a numeric value
○ Ex: best-fit line
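Both supervised tasks above can be sketched in a few lines. The block below is a minimal illustration, not a recommended implementation: the labeled examples are made up, the classifier is a simple 1-nearest-neighbour lookup chosen for brevity, and the names `classify` and `best_fit_line` are assumptions.

```python
# Labeled toy data (made up): (has_fur, meows, likes_height) -> label.
labeled = [
    ((1, 1, 1), "cat"),
    ((1, 1, 0), "cat"),
    ((1, 0, 0), "dog"),
    ((0, 0, 1), "bird"),
]


def classify(attributes):
    """Classification: return the label of the closest labeled example."""
    def distance(example):
        # Number of attributes that differ from the query.
        return sum(a != b for a, b in zip(example[0], attributes))
    return min(labeled, key=distance)[1]


def best_fit_line(points):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x


print(classify((1, 1, 1)))  # cat

slope, intercept = best_fit_line([(1, 3), (2, 5), (3, 7), (4, 9)])
print(slope, intercept)     # 2.0 1.0
print("prediction for x = 5:", slope * 5 + intercept)
```

In both cases the input-to-output mapping is learned from examples that already carry the answer: a class label for classification, a numeric value for regression.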
Unsupervised learning task
● Algorithm is trained on unlabeled data
● We are not telling the algorithm what to predict
● The goal is to identify patterns, structures, or clusters within the data without prior knowledge of output labels
● Discovering the patterns
● Does not have a label
● You need to decide on the groupings
● Trial and error - mentally attach a label
● Does not have a definite answer
● Ex: grouping the cards by different attributes
● Clustering
○ Group similar items together
● Density estimation
○ Finding statistical values that describe the data
● Reducing the data from many features to a small number so that we can visualize it in 2D or 3D (dimensionality reduction)
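As a small illustration of clustering, the sketch below groups unlabeled values with a plain k-means loop. The data, the choice of two groups, the starting centres, and the name `k_means_1d` are all assumptions made for the example. Choosing them is the "you need to decide on the groupings" and trial-and-error part noted above.

```python
from statistics import mean

# Unlabeled toy data (made up): one numeric feature per item, no labels.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]


def k_means_1d(data, centers, iterations=10):
    """Plain k-means: assign each value to its nearest centre, then re-average."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in data:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old centre if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


centers, groups = k_means_1d(values, centers=[0.0, 6.0])
print("centres:", centers)
print("groups: ", groups)
```

The algorithm returns groups but never names them. Attaching a meaning such as "low" and "high" is left to the person reading the result, and different starting centres or a different number of groups can give a different answer.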
Standardized processes for developing ML workflows
● Knowledge Discovery in Databases (KDD)
○ End-to-end process that encompasses many individual steps in converting data into knowledge
○ ML is about predictive and prescriptive analytics
○ Collect data, then unify the data in a centralized database
○ Linear, step-by-step process