Natural Language Processing - Summary Slides

Pages: 87 | Uploaded on: 30-12-2024 | Written in: 2022/2023

A summary of all the slides for the course Natural Language Processing, MSc AI.

Natural Language Processing - slides


Lecture 1
Natural Language
● Natural language is quite complex
● Yet, children acquire language easily and are able to understand and produce utterances
they have never heard, despite limited and imperfect input (the Poverty of the Stimulus argument)
● How is this possible?
○ Language is innate?
○ Learn through imitation?
○ Learn through interaction?
○ Language is just like any other cognitive faculty, but with more input?

Language Acquisition




Natural language is:
● Compositional
○ The meaning of a sentence is determined by the meanings of its individual words
and the way they are combined/composed
■ "The cat is on the mat." → the animal referred to as a "cat" is located "on"
top of the object referred to as a "mat."
● Arbitrary
○ There is no inherent or logical relationship between the form of a word or
expression and its meaning
■ "Dog" → refers to the domesticated four-legged animal we commonly
associate with the word "dog," but there is no inherent reason why the
sounds "d", "o", and "g" arranged in that particular order should convey
that meaning
● Creative
○ Ability of speakers of a natural language to generate new and meaningful
expressions that may not have been previously encountered or explicitly learned
■ “Selfie”
● Displaced
○ Ability of speakers to refer to things that are not directly perceivable or present
■ "Yesterday, I went to the store and bought some groceries."






What does an NLP system need to know?
● Language consists of many levels of structure.
● Humans fluently integrate all of these in producing and understanding language.
● Ideally, so would a computer!




● Morphology
○ Study of words and their parts, i.e., their smallest meaningful units (morphemes)
■ prefixes, suffixes and base words
● Parts of speech
○ Word classes or grammatical categories such as noun, verb, adjective, adverb,
pronoun, preposition, conjunction, interjection
● Syntax
○ Rules that govern the arrangement of words and phrases in a sentence, including
rules for word order, word agreement (e.g., subject-verb agreement), and the
formation of phrases (such as noun phrases, verb phrases, and adjective
phrases)
● Semantics
○ Meaning of words, phrases, sentences
● Pragmatics/discourse
○ Analysis of extended stretches of language use, such as conversations, texts,
and narratives, in their social and cultural contexts

What is Natural Language Processing?
● Core technologies:
○ Language modeling / text generation
○ Sequence / POS tagging
○ Syntactic parsing
○ Named Entity Recognition (NER)
○ Coreference resolution
○ Word sense disambiguation
○ Semantic role labeling





● Natural language processing (NLP) refers to the branch of computer science—and more
specifically, the branch of artificial intelligence or AI—concerned with giving computers
the ability to understand text and spoken words in much the same way human beings
can.
● NLP combines computational linguistics—rule-based modeling of human language—with
statistical, machine learning, and deep learning models. Together, these technologies
enable computers to process human language in the form of text or voice data and to
‘understand’ its full meaning, complete with the speaker’s or writer’s intent and sentiment.

What is Natural Language Processing?
● Represent language in a way that a computer can process it
○ representing input
● Process language in a way that is useful for humans
○ generating output
● Understanding language structure and language use
○ computational modelling
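As an illustration of "representing input", one very simple numeric representation is a bag-of-words count vector over a vocabulary. The slides do not prescribe a representation; this is a minimal sketch assuming whitespace tokenization and lowercasing, and `build_vocab` and `bag_of_words` are hypothetical helper names.

```python
from collections import Counter

def build_vocab(sentences):
    """Assign each distinct lowercased, whitespace-separated token an integer id (first-seen order)."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def bag_of_words(sentence, vocab):
    """Return a count vector over vocab for one sentence; tokens outside vocab are ignored."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector

corpus = ["The cat is on the mat", "The dog is on the mat"]
vocab = build_vocab(corpus)
print(vocab)  # {'the': 0, 'cat': 1, 'is': 2, 'on': 3, 'mat': 4, 'dog': 5}
print(bag_of_words("the cat saw the dog", vocab))  # [2, 1, 0, 0, 0, 1]
```

Note that word order is lost and the unseen word "saw" simply disappears, which is one reason richer representations are needed.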

Why is NLP hard?
1. Ambiguity
2. Sparse data due to Zipf’s Law
3. Variation
4. Expressivity
5. Context dependence
6. Unknown representation

Ambiguity at many levels
● Word senses: bank (noun: a place where people deposit money; verb: to bounce off something)
● Part of speech: chair (noun: a seat, or the person in charge of an organization; verb: to act as
chairperson)
● Syntactic structure: I saw a man with a telescope (either I had the telescope or the man did)
● Quantifier scope: Every child loves some movie (each child loves at least one, possibly different,
movie, or there is one particular movie that every child loves)
● Multiple: I saw her duck (saw: past tense of "see" or present tense of "to saw"; her: object
pronoun or possessive; duck: the bird or the verb "to duck")
● How can we model ambiguity, and choose the correct analysis in context?
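The syntactic ambiguity of "I saw a man with a telescope" can be made concrete by counting parse trees. Below is a minimal CKY-style sketch that counts, rather than lists, the analyses. The grammar, lexicon and the function name `count_parses` are hypothetical toy assumptions (in Chomsky Normal Form), not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: I used the telescope
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: the man has the telescope
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> possible preterminal categories.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "a": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}

def count_parses(words, start="S"):
    """CKY chart where chart[i][j][A] = number of distinct trees rooted in A spanning words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i][i + 1][tag] += 1
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        # Every left subtree combines with every right subtree.
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)

print(count_parses("I saw a man with a telescope".split()))  # 2
print(count_parses("I saw a man".split()))  # 1
```

The two trees correspond exactly to the two readings on the slide; a non-probabilistic parser has no basis for preferring one of them.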

What can we do about ambiguity?
● Non-probabilistic methods (FSMs for morphology, CKY parsers for syntax)
○ Return all possible analyses
● Probabilistic models (HMMs for POS tagging, PCFGs for syntax) and algorithms (Viterbi,
probabilistic CKY)
○ Return the best possible analysis
● But the “best” analysis is only good if our probabilities are accurate. Where do they come
from?
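To show how a probabilistic model returns a single best analysis, here is a minimal Viterbi decoder for a hidden Markov model (HMM) tagger applied to "I saw her duck". The tag set, all probabilities and the `viterbi` signature are invented for illustration; in a real tagger they would be estimated from a tagged corpus, which is exactly why their accuracy matters.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for words under an HMM, and its probability."""
    # best[i][t] = (prob. of the best tag sequence for words[:i + 1] ending in tag t, previous tag)
    best = [{t: (start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = emit_p[t].get(words[i], 0.0)
            # Extend every possible previous tag s by one transition s -> t and keep the best.
            column[t] = max(
                (best[i - 1][s][0] * trans_p[s].get(t, 0.0) * emit, s) for s in tags
            )
        best.append(column)
    # Pick the best final tag, then follow the backpointers to recover the full sequence.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical toy model (made-up numbers, not estimated from data).
TAGS = ["PRON", "VERB", "DET", "NOUN"]
START = {"PRON": 0.6, "DET": 0.2, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "PRON": {"VERB": 0.7, "NOUN": 0.1, "PRON": 0.1, "DET": 0.1},
    "VERB": {"PRON": 0.4, "DET": 0.3, "NOUN": 0.2, "VERB": 0.1},
    "DET": {"NOUN": 0.9, "VERB": 0.05, "PRON": 0.05},
    "NOUN": {"VERB": 0.5, "NOUN": 0.2, "PRON": 0.2, "DET": 0.1},
}
EMIT = {
    "PRON": {"I": 0.5, "her": 0.5},
    "VERB": {"saw": 0.5, "duck": 0.5},
    "DET": {"her": 1.0},  # possessive "her" treated as a determiner
    "NOUN": {"saw": 0.5, "duck": 0.5},
}

path, prob = viterbi("I saw her duck".split(), TAGS, START, TRANS, EMIT)
print(path)  # ['PRON', 'VERB', 'DET', 'NOUN']
```

Under these made-up numbers the possessive reading (her = DET, duck = NOUN, probability about 0.014) beats the reading where she ducks (her = PRON, duck = VERB, about 0.0074). Change the numbers and the "best" analysis can flip.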





Statistical NLP
● Like most other parts of AI, NLP is dominated by statistical methods
○ Typically more robust than earlier rule-based methods
○ Relevant statistics/probabilities are learned from data
○ Normally requires lots of data about any particular phenomenon
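The simplest way to learn probabilities from data is relative frequency, i.e., the maximum-likelihood estimate P(w) = count(w) / N for a unigram model. This is a minimal sketch of that one standard estimator, assuming whitespace tokenization; `unigram_mle` is a hypothetical name, and the lecture may use other estimators.

```python
from collections import Counter

def unigram_mle(tokens):
    """Maximum-likelihood unigram estimate: P(w) = count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

tokens = "the cat sat on the mat".split()
probs = unigram_mle(tokens)
print(probs["the"])            # 0.3333333333333333
print(probs.get("dog", 0.0))   # 0.0: an unseen word gets zero probability
```

The zero for "dog" is the data-hungriness problem in miniature: anything absent from the training data is treated as impossible.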

Sparse data due to Zipf’s Law
● Words occur with very different counts/frequencies in large text corpora




○ Takeaway: frequency falls off steeply with rank; a few words are very frequent, most words are rare
■ To really see what’s going on, use logarithmic axes:




● Assume “word” is a string of letters separated by spaces (a great oversimplification…)
● Zipf’s law
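○ Summarizes the behaviour above: f × r ≈ k, where f is a word's frequency, r its rank (when words
are sorted by frequency) and k a constant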




○ Implications
■ Regardless of how large our corpus is, there will be a lot of infrequent
(and zero-frequency) words
■ In fact, the same holds for many other levels of linguistic structure (e.g.,
syntactic rules in a CFG)
■ This means we need to find clever ways to estimate probabilities for
things we have rarely or never seen
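Zipf's law says that frequency times rank is roughly constant. The sketch below tabulates rank, frequency and f × r, plus the share of word types seen only once, using the same "string of letters separated by spaces" simplification as the slide. The helper names `zipf_table` and `singleton_share` and the toy sentence are hypothetical.

```python
from collections import Counter

def zipf_table(tokens):
    """Return (rank, word, frequency, frequency * rank) rows, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending frequency; break ties alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, freq * rank) for rank, (word, freq) in enumerate(ranked, start=1)]

def singleton_share(tokens):
    """Fraction of word types that occur exactly once (hapax legomena)."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

tokens = "the cat saw the dog and the dog saw a cat in the garden".split()
for row in zipf_table(tokens):
    print(row)
print(singleton_share(tokens))  # 0.5
```

On a string this small f × r is far from constant; Zipf's law is an empirical regularity that only emerges, approximately, on large corpora. The singleton share still hints at the sparse-data implication: many types are seen only once, and many more are never seen at all.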

Variation


