Computational Linguistics (800962-B-6)
2021/2022
Summary Final
Lecture 8-12
Written by Saskia Kriege

Computational Linguistics Lecture 8 – Parsing

YOUTUBE – PCFGs
Probabilities are assigned to rules.
Every non-terminal has a number of possible expansions, whose probabilities sum to 1.
Each rule probability is the conditional probability of an expansion given a particular non-terminal.
Probability of an entire tree = product of the probabilities of the rules used.

Non-terminal → sequence of non-terminals / one terminal
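As a toy sketch in Python (the grammar and numbers are made up for illustration, not from the lecture), a PCFG can be stored as a mapping from each non-terminal to its possible expansions, with the sum-to-1 constraint checked explicitly:

```python
# Toy PCFG as a dict: each non-terminal maps to (RHS, probability) pairs.
# The grammar and probabilities are illustrative, not estimated from data.
pcfg = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 0.6), (("NP", "PP"), 0.4)],
    "VP": [(("V", "NP"), 0.7), (("VP", "PP"), 0.3)],
    "PP": [(("P", "NP"), 1.0)],
}

# Sanity check: the expansions of each non-terminal must form a
# probability distribution, i.e. their probabilities sum to 1.
for lhs, expansions in pcfg.items():
    total = sum(p for _, p in expansions)
    assert abs(total - 1.0) < 1e-9, f"{lhs} expansions sum to {total}, not 1"
```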


PARSING
Probabilities, chunks, dependencies

PCFG = probabilistic context-free grammar
Tells us not only whether a sentence is well-formed under the grammar, but also delivers the likeliest parse for the target sentence given the grammar.
From a declarative grammar to a procedure: given a sequence of words, we want to find the best structural analysis the grammar allows.
Statistical parsing uses a probabilistic model of syntax in order to assign probabilities to each parse tree.
Used for resolving syntactic ambiguity.
Differs from a CFG by augmenting each rule with a conditional probability.

Structural Ambiguity
When more than one parse tree can be assigned to a sentence based on the rules in the
grammar.
Attachment ambiguity and coordination ambiguity

Attachment ambiguity (what do we attach it to?):
"I shot an elephant in my pyjamas" →
"I" = NP, "shot an elephant" = VP.
What does "in my pyjamas" attach to? The subject "I", or was the elephant in my pyjamas when I shot it?
Or is the whole thing one VP, with "elephant" as the direct object and the prepositional phrase modifying it?
We need semantics/lexical information to disambiguate this (the meaning of "elephant").
Or is one of these structures simply more likely in the language? Then we don't need meaning, only the likelihood of each structure in the language.

Coordination ambiguity: "and" (what does the coordination cover?)
"Tasty sandwiches and flowers make my grandma happy"
What does "tasty" attach to?
Are both the sandwiches and the flowers tasty?
Or only the sandwiches → two coordinated NPs in that case.
Considering structure alone, both readings are possible.
Flowers are not tasty, so we bring in that knowledge to disambiguate and prefer the attachment to the sandwiches only.

More structures can be assigned to one sentence by the same grammar.

CFG → don’t consider meaning, only structural coherence

Use CKY to check whether a sentence is correct under the grammar.
If we want to pick a specific parse at the level of structure and get the likeliest parse, we need some way to assign probabilities to the different options and pick the most likely one.


Parsing as Recognition
CKY only tells us whether a sentence is well-formed given the grammar, not which parse best captures the sentence's structure.
Sometimes only one parse is possible, but how do we choose if two or more are valid?
PCFGs: which parse best captures the sentence's structure?

Resolving Ambiguity
Rules in CFGs can be augmented with probabilities indicating how likely the RHS is given the LHS, in order to compute the likelihood of each parse tree given the grammar.

Assign probabilities to the different production rules, and use them to assign a probability to each parse tree.

A probabilistic CFG (PCFG) is a CFG whose rules are augmented with probabilities.

Extending CFG to PCFG
In a CFG in CNF, each rule has the form X → YZ or X → y.
In a PCFG in CNF, each rule is augmented with a probability indicating how likely the RHS is given the LHS:

X → YZ [p(YZ | X)] or X → y [p(y | X)]

Difference between a CFG and a PCFG:
A CFG has the rules: a list of rules that defines the set of all well-formed sentences in a language.
A PCFG also adds a probability of how likely an RHS is given the LHS.
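This rule-with-bracketed-probability notation is also what NLTK's PCFG reader accepts; a minimal sketch, assuming NLTK is installed (the grammar itself is a toy, not the course's):

```python
import nltk

# Each rule carries p(RHS | LHS) in brackets; per LHS they sum to 1.
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP    [1.0]
    NP  -> Det N    [0.6]
    NP  -> NP PP    [0.4]
    VP  -> V NP     [0.7]
    VP  -> VP PP    [0.3]
    PP  -> P NP     [1.0]
    Det -> 'the'    [1.0]
    N   -> 'dog'    [0.5]
    N   -> 'park'   [0.5]
    V   -> 'saw'    [1.0]
    P   -> 'in'     [1.0]
""")

# ViterbiParser is a probabilistic CKY-style parser: it returns the most
# probable parse of the sentence under the grammar.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("the dog saw the dog in the park".split()):
    print(tree.prob())
    tree.pretty_print()
```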

Probability of a parse tree T
The product of the probabilities of each rule used to expand the non-terminals in the parse tree T.
Given rules LHS → RHS, the probability of any parse tree over a sentence is defined as

p(T, s) = ∏ p(RHS_i | LHS_i)

with one factor for every rule LHS_i → RHS_i used to build T.

The probabilities can be multiplied because each rule application is independent of context and independent of the others (the PCFG independence assumptions).

Pick the parse tree with the highest probability.
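A minimal sketch of this product, reusing the illustrative probabilities from the toy grammar above: each tree is represented simply as the list of rules used to build it, and the pyjamas example gets one tree per attachment site.

```python
from math import prod

# p(RHS | LHS) for the toy rules; the numbers are illustrative only.
rule_prob = {
    ("VP", ("V", "NP")):  0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.4,
    ("PP", ("P", "NP")):  1.0,
}

def tree_prob(rules_used):
    """p(T): one factor p(RHS | LHS) per rule expansion in the tree."""
    return prod(rule_prob[rule] for rule in rules_used)

# "shot an elephant in my pyjamas": PP attached to the VP (I wore the
# pyjamas) vs. to the object NP (the elephant wore them).
vp_attach = [("VP", ("VP", "PP")), ("VP", ("V", "NP")),
             ("NP", ("Det", "N")), ("PP", ("P", "NP")), ("NP", ("Det", "N"))]
np_attach = [("VP", ("V", "NP")), ("NP", ("NP", "PP")),
             ("NP", ("Det", "N")), ("PP", ("P", "NP")), ("NP", ("Det", "N"))]

print(tree_prob(vp_attach), tree_prob(np_attach))  # ≈ 0.0756 vs ≈ 0.1008
best = max([vp_attach, np_attach], key=tree_prob)
```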

Joint probability of parse tree and sentence
Loop through all parse trees; for each, multiply the probabilities p(RHS | LHS) of its rules.

Simplify a bit
p(Ti, s) represents:
- the joint probability of sentence s and parse tree Ti
- the probability of parse tree Ti alone, since
p(Ti, s) = p(s | Ti) p(Ti)
boils down to
p(Ti, s) = p(Ti)
as p(s | Ti) = 1, because a parse tree includes all the words in s.

p(s | Ti) is a constant (1), so it can be left out of the equation.

Argmax
Out of all parse trees that can parse s (s is called the yield of a parse tree T), the disambiguation algorithm picks the best parse tree T^ with the highest probability given s:

T^ = argmax over {T : yield(T) = s} of p(T | s)
   = argmax over {T : yield(T) = s} of p(T, s) / p(s)
   = argmax over {T : yield(T) = s} of p(T, s)
   = argmax over {T : yield(T) = s} of p(T)

We restrict to T such that yield(T) = s, since we don't care about parse trees with a different yield than the sentence we're parsing: they won't produce meaningful results. In the maximization we can also drop p(s), because it is the same constant for every candidate tree and so doesn't change which tree wins.

Probability of a sentence
PCFGs assign probabilities to strings of words, i.e. sentences, just like a language model would.
If a sentence is unambiguous, its probability under the PCFG is the probability of its only parse tree with yield s.

If s is ambiguous, then its probability is the sum over the probabilities of all parse trees with yield s.
This is the forward algorithm rather than the decoding algorithm: all options put together rather than the single best option.
Probability of a sentence → consider every interpretation and sum them.
Best one → compare those interpretations and take the one with the highest probability.
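In terms of the toy sketch from the parse-tree section, the difference is just sum versus max over the same candidate trees:

```python
# Continuing the sketch above: p(s) sums over all parse trees whose
# yield is s; disambiguation instead takes the max over the same trees.
trees = [vp_attach, np_attach]
sentence_prob = sum(tree_prob(t) for t in trees)    # every interpretation
best_parse_prob = max(tree_prob(t) for t in trees)  # best interpretation
```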

Cognitive Science → studies have shown that people are sensitive to the kind of probabilities across constituents that a PCFG captures, but that a Markov chain does not → garden path sentences.
"The horse raced past the barn fell" → "fell" doesn't fit in the parse tree built so far.
The parse tree has to be adjusted while analysing the sentence.
"The student forgot the solution was on the back of the book"
"The student forgot the solution" is a perfectly fine sentence on its own,
but the sentence isn't over yet, so you have to update the structure you inferred for it.

People form structural relationships; if these are violated, there are processing costs.

AI → for a conversational agent that should be human-like, such costs should be incorporated.
If it should be better than humans, don't incorporate such costs.

N-gram statistics with language modelling