Natural Language Processing - Concise Summary per Lecture

A more concise summary of the course Natural Language Processing (NLP), MSc AI.

Lecture 1
Properties of natural language
● Compositional: the meaning of a sentence is determined by the meanings of its individual words
and the way they are combined/composed
○ The principle of compositionality: meaning of a whole expression (semantics) is a
function of the meaning of its parts and the way they are put together (syntax)
● Arbitrary: there is no inherent or logical relationship between the form of a word or expression and
its meaning
○ There is no inherent reason why the sounds “d”, “o”, “g” arranged in that particular order
should convey the meaning of the four-legged animal
● Creative: one could generate new and meaningful expressions that have not previously
been encountered or explicitly learned → e.g., Selfie
● Displaced: we could refer to things that are not directly perceivable or present
○ Also referred to as displacement

What does an NLP system need to know?
● Language consists of many levels of structure
● Humans fluently integrate all of these in producing and understanding language → ideally, so
would a computer!

Definitions
● Morphology: study of words and their parts or smallest meaningful units
○ i.e., prefixes, suffixes, and base words
● Parts of speech: word classes or grammatical categories like noun, verb, adjective, adverb,
pronoun, preposition, conjunction and interjection
● Syntax: rules that govern the arrangement of words and phrases in a sentence, including rules for
word order, word agreement and the formation of phrases (such as noun phrases, verb phrases
and adjective phrases)
● Semantics: meaning of words, phrases, sentences
● Pragmatics/discourse: analysis of extended stretches of language use, such as conversations,
texts, and narratives in their social/cultural contexts
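
As a concrete illustration of these levels (not part of the original notes), here is a minimal sketch using spaCy; the library and its small English model are assumptions. It prints, per token, the lemma and morphological features (morphology), the part of speech, and the dependency relation to the token's head (syntax).

```python
# Hedged sketch, not from the course: assumes spaCy is installed and the small
# English model has been fetched with `python -m spacy download en_core_web_sm`.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The two dogs were chasing a ball in the park.")

for token in doc:
    # morphology (lemma + features), part of speech, syntax (dependency to head)
    print(f"{token.text:10} lemma={token.lemma_:8} morph={str(token.morph):30} "
          f"pos={token.pos_:5} dep={token.dep_:10} head={token.head.text}")
```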

NLP is concerned with giving computers the ability to understand text and spoken words in much the
same way as humans can
● Combines computational linguistics with statistical, Machine Learning and Deep Learning models
● Represent language in a way that a computer understands it → representing input
● Process language in a way that is useful for humans → generating output
● Understanding language structure and language use → computational modelling

Why is NLP hard?
1. Ambiguity at many levels
○ Word senses: bank (noun: place to deposit money, verb: to bounce off of something)
○ Parts of speech: chair (noun: seat or person in charge of an organization, verb: act as a
chairperson)
○ Syntactic structure: I saw a man with a telescope (who had the telescope?)
○ Quantifier scope: every child loves some movie
○ Multiple: I saw her duck (“saw” as the past tense of “see” or as the hand tool; “duck” as the animal or the action)
➢ To fix ambiguity:
■ Non-probabilistic models → return all possible analyses
■ Probabilistic models → return the best possible analysis: only good if the
probabilities are accurate
2. Sparse data due to Zipf’s law
○ We have different word counts/frequencies in large corpora
○ Rank-frequency distribution is an inverse relation → to see what’s really going on, plot
both axes on a logarithmic scale (see the sketch after this list)
○ Regardless of how large our corpus is, there will be a lot of infrequent (and
zero-frequency) words
3. Variation
○ There are many different languages, dialects, accents
○ People use slang
4. Expressivity
○ Not only can one form have different meanings (ambiguity), but the same meaning can
be expressed with different forms
5. Context dependence
○ Correct interpretation is context dependent
○ Depends on groundedness
6. Unknown representation
○ Difficulty in representing the meaning and structure of language in a way that can be
understood by machines
7. Natural language is not only written but often spoken and grounded (i.e., our language
understanding is influenced by our sensory and motor experiences in the physical world)
○ We use emojis, we have sign language/dialects
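
The sketch referred to under point 2: a minimal, hypothetical illustration of Zipf’s law (corpus.txt is a placeholder for any large plain-text corpus, not a file from the course).

```python
# Sketch of the Zipf effect: count word frequencies and inspect the rank-frequency
# relation. "corpus.txt" is a hypothetical file; any large plain-text corpus will do.
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

counts = Counter(words)
ranked = counts.most_common()

# Zipf's law: frequency is roughly proportional to 1 / rank, so plotting
# log(frequency) against log(rank) gives an approximately straight line.
for rank, (word, freq) in enumerate(ranked[:10], start=1):
    print(f"{rank:2d}  {word:15} {freq}")

# The long tail: most word types occur only once, no matter how large the corpus is.
hapaxes = sum(1 for freq in counts.values() if freq == 1)
print(f"{hapaxes} of {len(counts)} word types occur exactly once")
```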

Lecture 2
Human (natural) language
● Language can be manipulated to say infinite things (recursion)
● But the brain is finite → some sort of set of rules
● We can manipulate these rules to say anything (e.g., things that don’t exist or that are totally
abstract)
● There’s a critical period for acquiring language → children need to receive real input
● Language is interconnected with other cognitive abilities

There is structure underlying language
● This is separate from the words we use and the things we say → e.g., we can come up with a
plural for non-existent words (wug → wugs)
● Structure dictates how we use language
● We implicitly know rules about structure → a community of speakers (e.g., Dutch people) share a
roughly consistent set of these implicit rules
○ All the utterances we can generate from these rules are grammatical
○ But, people don’t fully agree as they have their own idiolect → grammaticality is graded

Subject, Verb, and Object appear in SVO order
● Subject pronouns → I, she, he, they
● Object pronouns → me, her, him, them
● Sentences can be grammatical without meaning
● Not everyone is as strict about some wh- constraints

Why do we even need rules?
● Grammaticality rules accept useless utterances
○ And block out perfectly communicative ones (e.g., me cupcake at)
● A basic fact about language is that we can say anything
○ If we ignore the rules because we know what was probably intended, we are actually
limiting possibilities!
○ Rules give us expressivity

Before self-supervised learning, the way to approach NLP was through understanding the human
language system and trying to imitate it
● Now we have language models (LMs) like GPT that catch on to a lot of language patterns →
extract linguistic information
● LLMs often don’t care about word order
● LMs are not engineered around discrete linguistic rules but the pre-training process is not just a
bunch of surface-level memorization → there’s syntactic knowledge but this is complicated
● But there is no ground truth for how language works

Meaning plays a role in linguistic structure
● There’s a lot of rich information in words that affect the final structure of language
● The rich semantics of words is always playing a role in forming and applying the rules of
language

How we train our models these days:
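The diagram that accompanied this heading is not reproduced here. As a rough illustration of the self-supervised idea these notes refer to (predict a held-out word from its context), here is a minimal masked-language-model sketch; the Hugging Face transformers library and the bert-base-uncased checkpoint are assumptions, not something the notes specify.

```python
# Hedged sketch, not the course's training pipeline: it only shows the
# self-supervised objective (predict a masked word from its context) with an
# already pre-trained model. Assumes `pip install transformers torch`.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("I deposited the money at the [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and print the model's top guesses for it.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
top_ids = logits[0, mask_pos].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```
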
Differential object marking
● Structurally, anything can be an object, but many languages have a special syntactic way of
dealing with this
● LMs are also aware of these gradations

Outlandish meanings are not impossible to express but not all structure-word combinations are possible
● Marking less plausible things more prominently is a pervasive feature of grammar

Meaning is not always compositional (in a strict sense)
● Language is full of idioms and metaphors/wisdoms
● We’re constantly using constructions that we couldn’t get from just a syntactic + semantic parse
● And even mixed constructions that can compositionally take arguments (e.g., the bigger, the better,
he won’t X, let alone Y)
● Construction grammar provides unique insight into natural LMs
● The meaning of words is sensitive and influenced by context
● Fine-grained lexical semantics in LMs

A big question in NLP: how to strike the balance?
● Language is characterized by the fact that it’s an amazingly abstract system → and we want our
models to capture that
● But, meaning is so rich and multifaceted → high-dimensional spaces are much better at capturing
these specificities and subtleties than any rules
● Unsolved question: where do deep learning models stand?
● “While language is full of both broad generalizations and item-specific properties, linguists have
been dazzled by the quest for general patterns”

Lecture 3
Corpora in NLP
● Definition: body of utterances, as words or sentences, assumed to be representative and used for
lexical, grammatical or other linguistic analysis
● Also includes metadata: side information about where the language comes from, such as
author, date, topic, publication
● Corpora with linguistic annotations → humans marked categories or structures describing their
syntax or meaning

Sentiment analysis: predict sentiment in text → positive/negative/neutral etc.
● Is hard because sentiment is a measure of a person’s private state, which is unobservable
● Sometimes words (e.g., amazing) are an indicator of sentiment but many times it requires deep
world + contextual knowledge
● Text classification: a mapping h from input data x (drawn from instance space X) to a label (or
labels) y from some enumerable output space Y
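
A minimal sketch of that mapping h: X → Y for sentiment, assuming scikit-learn and a few toy reviews (both illustrative, not from the notes):

```python
# Sketch of text classification as a learned mapping h: X -> Y, here Y = {pos, neg}.
# The toy reviews and scikit-learn are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["amazing movie, loved it", "truly awful, a waste of time",
               "great acting and a great plot", "boring and predictable"]
train_labels = ["pos", "neg", "pos", "neg"]

# h maps bag-of-words features of a review to a sentiment label
h = make_pipeline(CountVectorizer(), LogisticRegression())
h.fit(train_texts, train_labels)

print(h.predict(["what an amazing plot"]))  # expected: ['pos']
```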

Why is data-driven evaluation important?
● Good science requires controlled experimentation
● Good engineering requires benchmarks
● Your intuitions about typical inputs are probably wrong
● We also need text corpora to help our systems work well

Annotations
● Supervised learning
● To evaluate and compare sentiment analyzers, we need reviews with gold labels (+ or -)
● These can be either
○ Derived automatically from the original data artifact (metadata such as star ratings)
○ Added by human annotator
■ Issue to consider: how consistent are they?
■ Interannotator agreement (IA)
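
A small sketch of interannotator agreement via Cohen’s kappa, over two hypothetical annotators labelling the same ten reviews (scikit-learn and the toy labels are assumptions):

```python
# Cohen's kappa corrects the raw agreement rate for the agreement two annotators
# would reach by chance: kappa = (p_observed - p_chance) / (1 - p_chance).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["+", "+", "-", "+", "-", "-", "+", "+", "-", "+"]
annotator_b = ["+", "-", "-", "+", "-", "+", "+", "+", "-", "+"]

print(cohen_kappa_score(annotator_a, annotator_b))
```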