Machine learning:
What is machine learning:
- Machine learning (ML) is a branch of AI that enables computers to learn from data and
improve their performance on tasks over time without being explicitly programmed for
each task
Learning from examples:
- Data ⇒ collect examples of spam and non-spam emails
- Learning algorithm ⇒ learns patterns and rules from the data
- Evaluation ⇒ tests the model on new emails and compares results with known labels
- Deployment ⇒ deploy the trained system for real-world use
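The four steps above can be sketched on a toy scale. This is only an illustration, not a real spam filter: the emails are made up, the `not_spam` label name is an assumption, and the word-counting "learning algorithm" is just a stand-in for the algorithms listed below.

```python
from collections import Counter

# Data: a handful of made-up labeled emails (a real system needs far more).
train_emails = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "not_spam"),
    ("lunch at noon tomorrow", "not_spam"),
]
# Held-out emails with known labels, used only for evaluation.
test_emails = [
    ("win cheap offer", "spam"),
    ("meeting tomorrow", "not_spam"),
]

def train(examples):
    """Learning step: count how often each word occurs in each class."""
    counts = {"spam": Counter(), "not_spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the class whose training emails contained these words more often."""
    words = text.split()
    spam_score = sum(counts["spam"][w] for w in words)
    not_spam_score = sum(counts["not_spam"][w] for w in words)
    # Ties fall back to "not_spam".
    return "spam" if spam_score > not_spam_score else "not_spam"

def accuracy(counts, examples):
    """Evaluation step: compare predictions with the known labels."""
    correct = sum(predict(counts, text) == label for text, label in examples)
    return correct / len(examples)

model = train(train_emails)
test_accuracy = accuracy(model, test_emails)
print(test_accuracy)
```

Deployment would then mean calling `predict(model, text)` on incoming emails whose labels are not known.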
Learning algorithms:
- Tree-based: decision trees, random forests, gradient boosted trees
- Linear classifiers and regressors: perceptron, linear and logistic regression
- Neural networks: multi-layer perceptrons, deep learning
Typical ML applications:
- Flag suspicious credit-card transactions
- Guess a person’s age from a sample of their writing
- Recognize handwritten numbers and letters
- Determine whether a text expresses positive, negative or no opinion
- Recognize faces in photos
- Classify images
- Recommend books and movies to users based on their own and others’ purchase history
Types of machine learning:
Supervised learning:
- Learns from pre-labeled examples
Unsupervised learning:
- No labels; discovers natural groupings and relationships in the data
- We have data and explore it for patterns, then check whether those patterns are meaningful
- For example, customer segmentation: find which customers have a baby and recommend
different things to them
Reinforcement learning:
- We have an agent that learns by acting in an environment
- Learns by trial and error: there are no labeled correct answers, only reward feedback
from the environment
- Each agent learns, or adapts, according to a few parameters:
  - Learning rate α (alpha)
  - Inverse temperature β (beta)
  - Discount rate γ (gamma)
- Learns from interaction and feedback
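The notes do not say which algorithm the three parameters belong to. A common setting where all three appear is Q-learning with softmax action selection, so the sketch below assumes that setting; the function names are hypothetical.

```python
import math

def softmax_policy(q_values, beta):
    """Turn action values into choice probabilities.

    beta is the inverse temperature: beta = 0 gives random choice,
    a large beta almost always picks the highest-valued action.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def q_update(q, reward, next_q_values, alpha, gamma):
    """One Q-learning step (assumed algorithm, not named in the notes).

    alpha is the learning rate: how far q moves toward the target.
    gamma is the discount rate: how much future value counts.
    """
    target = reward + gamma * max(next_q_values)
    return q + alpha * (target - q)

random_choice = softmax_policy([1.0, 0.0], beta=0.0)
greedy_choice = softmax_policy([1.0, 0.0], beta=5.0)
new_q = q_update(0.0, reward=1.0, next_q_values=[0.0, 0.0], alpha=0.5, gamma=0.9)
print(random_choice, greedy_choice, new_q)
```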
Supervised learning; regression:
- Target is a real number, not a class
- Predict the price of a house
- Predict a person’s age
- Predict the price of a stock
- Predict a student’s score on an exam
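As a small illustration of a real-valued target, here is a least-squares line fit with a single feature. The house sizes and prices are invented numbers chosen to lie exactly on a line, and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: size in square meters, price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90.0 + intercept  # prediction is a real number
print(slope, intercept, predicted_price)
```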
Binary classification:
- One of two options:
  - Detect spam
  - Predict polarity of product review: positive vs negative
  - Predict sex of an organism (male/female)
Multiclass classification:
- Exactly one of a finite set of options
Multilabel classification:
- Multiple labels from a set of options can apply to the same example
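The difference between multiclass and multilabel shows up in how the target is encoded. A minimal sketch, assuming a made-up label set `CLASSES` and hypothetical helper names:

```python
CLASSES = ["cat", "dog", "bird"]  # made-up label set

def one_hot(label):
    """Multiclass target: exactly one position is 1."""
    return [1 if c == label else 0 for c in CLASSES]

def multi_hot(labels):
    """Multilabel target: any number of positions can be 1."""
    return [1 if c in labels else 0 for c in CLASSES]

multiclass_target = one_hot("dog")
multilabel_target = multi_hot({"cat", "bird"})
print(multiclass_target, multilabel_target)
```

Binary classification is the special case with two classes, where a single 0/1 value is enough.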
Ranking:
- In a ranking problem, the same observations can have different ranks for different queries/prompts
- For example, the first result for a certain query can be different if you change the query a
little bit
- The output is an ordering of objects
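To make the query dependence concrete, here is a toy ranker that orders documents by how many words they share with the query. The scoring rule and the documents are assumptions for illustration; real ranking models learn the score from data.

```python
def rank(query, documents):
    """Order documents by word overlap with the query (highest first)."""
    query_words = set(query.lower().split())

    def score(doc):
        return len(query_words & set(doc.lower().split()))

    # sorted() is stable, so ties keep their original order.
    return sorted(documents, key=score, reverse=True)

docs = [
    "python snake care",
    "python programming tutorial",
    "java programming tutorial",
]

top_for_tutorial = rank("python tutorial", docs)[0]
top_for_snake = rank("python snake", docs)[0]
print(top_for_tutorial)
print(top_for_snake)
```

A small change in the query moves a different document to the top, while the set of documents stays the same.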
Sequence labeling:
- A label for each element in the input sequence
- For example, a system that recognizes each word in a sequence
- Sequence labeling = we have a sequence, and each part of this sequence gets its own label
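A minimal sketch of the input/output shape: one label per token, so the label sequence always has the same length as the input. The tiny `LEXICON` and the tag names are made up; a real tagger would learn them from labeled sequences and use context.

```python
# Hypothetical word-to-tag lookup standing in for a trained model.
LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB"}

def tag(tokens):
    """Return one label per token; unknown words get the label UNK."""
    return [LEXICON.get(token.lower(), "UNK") for token in tokens]

tokens = "The dog barks loudly".split()
labels = tag(tokens)
print(list(zip(tokens, labels)))
```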
Sequence-to-sequence prediction:
- For each input sequence we have an output sequence
- For example, a translator: the input and output are different sequences, and they don’t have
to match element by element
- Problems:
  - Word order changes between input and output
  - Grammar differs between languages
  - Input and output have different lengths
Model:
- Variable x ⇒ model ⇒ variable y
- Input: feature, independent variable
- Output: target, dependent variable
- Model:
  - Linear regression
  - Logistic regression
  - Perceptron
  - Decision trees, random forests
  - Gradient boosted trees
  - Artificial neural networks (ANN)
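As one concrete instance of x ⇒ model ⇒ y, here is a sketch of the classic perceptron learning rule on a toy problem (the logical AND of two inputs). The data set, the learning rate of 1, and the function names are assumptions made for the example.

```python
def perceptron_predict(weights, bias, x):
    """Output 1 if the weighted sum of the features is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=10):
    """Perceptron rule: nudge weights by the error on each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = y - perceptron_predict(weights, bias, x)  # -1, 0, or 1
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias

# Features x (input) and target y (output) for logical AND.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train_perceptron(and_samples)
predictions = [perceptron_predict(weights, bias, x) for x, _ in and_samples]
print(predictions)
```

The perceptron only converges when the classes are linearly separable, which is the case for AND.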
Data:
- Data can be:
  - Numbers
  - Text
  - Images
  - Any form of input that can be processed by an algorithm
Data splitting:
- Create the model, train it, and evaluate it
- Shuffle the data randomly, then split it into a training set and a test set
- Make sure there is no data leakage: no sample from the test data may appear in the
training data, otherwise the evaluation will look better than it really is
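A minimal sketch of a random split with a leakage check, using only the standard library. The 80/20 ratio, the fixed seed, and the helper name `train_test_split` are assumptions for the example; the leakage check as written only works when samples are unique and hashable.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data, then cut it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))  # stand-in for ten unique samples
train_set, test_set = train_test_split(data)

# Leakage check: no sample may appear in both sets.
leaked = set(train_set) & set(test_set)
print(len(train_set), len(test_set), leaked)
```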