Cheat Sheet NLG
Strategic choices: what to say (based on input, knowledge, target language)
Tactical choices: how to say it (dependent on language)
Classic pipeline for NLG and its subtasks (p3)
Modular architecture breaks down the main task into sub-tasks, modelling each one separately.
Dominant and classical approach.
In end-to-end models: no/fewer explicit subtasks.
For raw, unstructured data, two stages precede document planning: (1) signal analysis – extracting patterns and trends – and (2) data interpretation
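A minimal sketch (hypothetical function names) of how the classic modular pipeline chains its subtasks; with raw signal data, signal analysis and data interpretation run before document planning:

```python
# Sketch of the classic modular NLG pipeline: each stage is a separate module;
# end-to-end models collapse these explicit subtasks.

def signal_analysis(raw_data):
    """Extract patterns and trends from raw, unstructured input."""
    ...

def data_interpretation(patterns):
    """Turn patterns into messages/events the generator can talk about."""
    ...

def document_planning(messages):
    """Strategic choices: decide what to say and in what order."""
    ...

def microplanning(doc_plan):
    """Tactical choices: lexicalisation, aggregation, referring expressions."""
    ...

def surface_realisation(sentence_plans):
    """Map sentence plans to grammatical text in the target language."""
    ...

def generate(raw_data):
    patterns = signal_analysis(raw_data)
    messages = data_interpretation(patterns)
    doc_plan = document_planning(messages)
    plans = microplanning(doc_plan)
    return surface_realisation(plans)
```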
“Classic” BabyTalk system (p5)
Classic systems vs Contemporary models (p7): tension between control and fluency
NLG subtasks in more detail – image to caption (p7)
NLG as (cognitive) modelling of language (p10)
Production errors
Syntax errors arise through spreading activation; human memory is associative
Levelt’s “blueprint” for the speaker (p11)
Modularity (p16) and incrementality (p17)
Relationship between blueprint and classic NLG pipeline (p15)
Conceptual preparation and role of self-perception
How do people identify objects using language? (p17)
A referring expression identifies an entity that is contextually available
REG takes a more intertwined approach to what to say + how to say it
REG algorithms and the Gricean maxims of conversation (p18)
Referential form depends on salience of discourse entities and context; salience depends on:
o Centering theory (syntactic role)
o Accessibility theory (accessibility/ availability of entity; more = shorter form)
Older models were deterministic; ML models capture human variation
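A toy sketch of the accessibility idea that more salient entities get shorter referential forms; the thresholds and labels are made up for illustration:

```python
def choose_referential_form(salience: float) -> str:
    """Toy rule: the more accessible the entity, the shorter the form
    (accessibility theory); thresholds here are illustrative only."""
    if salience > 0.8:
        return "pronoun"           # highly salient: "she"
    elif salience > 0.4:
        return "short definite"    # "the dog"
    else:
        return "full description"  # "the small black dog in the corner"

print(choose_referential_form(0.9))  # -> pronoun
```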
Are REG algorithms cognitively plausible? (p19)
Cooperative principle: people behave rationally
Default expectations not fulfilled → implicatures (hidden meaning, not explicitly stated)
Conversational maxims: Quantity, Quality, Relation, Manner
Choosing the content of definite descriptions (p20)
Greedy algorithm: at each step pick the most discriminatory property
Incremental algorithm: pick what a speaker would be likely to select, following a preference order (see the sketch after this list)
o + efficient, psychologically plausible, accounts for overspecification
o – deterministic; people are not (PRO model: use a fully discriminatory property if available, otherwise follow the preference order)
High scene variation → higher eagerness to overspecify
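A minimal sketch of the incremental algorithm, assuming a simple attribute–value representation for objects; the attribute names and preference order are illustrative:

```python
def incremental_algorithm(target, distractors, preference_order):
    """Walk through attributes in a fixed preference order, keeping a
    property only if it rules out at least one remaining distractor;
    stop once no distractors are left. Earlier choices are never
    retracted, which is how overspecification can arise."""
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        value = target.get(attr)
        if value is None:
            continue
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            break
    return description

# Example: refer to a small black dog among other animals
target = {"type": "dog", "colour": "black", "size": "small"}
distractors = [{"type": "dog", "colour": "white", "size": "small"},
               {"type": "cat", "colour": "black", "size": "small"}]
print(incremental_algorithm(target, distractors, ["type", "colour", "size"]))
# -> {'type': 'dog', 'colour': 'black'}
```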
Alignment of data and text to train NLG systems (p25)
Source pairs from web = loosely aligned
Automatic alignment = more tightly aligned; noisy
Crowdsourcing = even more tightly aligned; expensive, smaller datasets
Opportunistic data collection favours better represented languages
Data-driven content selection: Learning statistical models to decide what to say (p26)
Content selection as a classification problem, but facts have dependencies, so classifying them independently gives poor coherence
Collective content selection: consider individual preference + probability of linked facts → optimisation problem (toy greedy sketch below)
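A toy greedy sketch of the intuition behind collective content selection; the original work frames this as a global optimisation over linked facts, and the scores, link bonus, and threshold here are invented:

```python
def select_content(facts, links, indiv_score, link_bonus=0.5, threshold=1.0):
    """Greedy approximation: select a fact if its individual preference
    score plus a bonus for already-selected linked facts passes a
    threshold. The real model solves this jointly, not greedily."""
    selected = set()
    # Consider facts from most to least individually preferred
    for fact in sorted(facts, key=indiv_score, reverse=True):
        bonus = link_bonus * sum(1 for other in selected
                                 if (fact, other) in links or (other, fact) in links)
        if indiv_score(fact) + bonus >= threshold:
            selected.add(fact)
    return selected

facts = ["score", "injury", "weather"]
links = {("score", "injury")}          # linked facts reinforce each other
scores = {"score": 1.2, "injury": 0.8, "weather": 0.3}
print(select_content(facts, links, lambda f: scores[f]))
# selects 'score' and 'injury'; 'injury' only makes the cut via its link to 'score'
```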
Using Language Models to decide how to say it (p28)
N-gram models look at a limited number of preceding words when predicting the next word
Markov models only look at the immediate past state (the previous word as the only context)
Long-distance dependencies: challenge for classical LMs
Overgenerate-and-rank: + captures variation & handles probabilistic linguistic rules; – ambiguity (toy ranking sketch after this block)
o HALogen input: recursive, order-independent, contains grammatical and/or semantic elements; recasting helps convert between different representations within it
o HALogen Base Generator rules: recasting, ordering, filling, morphing
o Output: forest of trees represents all possible realisations, ranked using a pretrained LM
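A toy illustration of overgenerate-and-rank: a tiny bigram model (a Markov model over the previous word only) scores candidate realisations and the highest-scoring one is kept. The corpus and candidates are invented; HALogen itself ranks a packed forest of realisations with a large pretrained n-gram LM.

```python
from collections import Counter
import math

def train_bigram(corpus):
    """Estimate P(w_i | w_{i-1}) with add-one smoothing from a toy corpus."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    V = len(vocab)
    def logprob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
                   for a, b in zip(tokens, tokens[1:]))
    return logprob

corpus = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a mat"]
score = train_bigram(corpus)

# Overgenerate: several candidate realisations of the same content...
candidates = ["the cat sat on the mat", "on the mat sat the cat", "the cat on the mat sat"]
# ...then rank by LM score and keep the most fluent one.
print(max(candidates, key=score))  # -> "the cat sat on the mat"
```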
Rational Speech Act (RSA) model (p34)
Cooperative language use; utility-based reasoning
Pragmatic inference; iterative process
Pragmatic speaker chooses an utterance based on expected utility: utility = informativeness (negative listener surprisal) – cost (speaker's effort); speakers try to avoid ambiguity
Utility-based reasoning: be informative but not overly verbose (similar in spirit to the greedy algorithm)
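A minimal RSA sketch for a toy reference game (two objects, three utterances; the lexicon and cost values are illustrative): the literal listener L0 interprets utterances by their truth conditions, and the pragmatic speaker S1 prefers utterances with high utility = log P_L0(object | utterance) − cost.

```python
import math

# Toy lexicon: which utterances are literally true of which objects
objects = ["blue_circle", "blue_square"]
utterances = ["blue", "circle", "square"]
true_of = {
    "blue":   {"blue_circle", "blue_square"},
    "circle": {"blue_circle"},
    "square": {"blue_square"},
}
cost = {"blue": 0.0, "circle": 0.1, "square": 0.1}  # illustrative costs

def L0(utterance):
    """Literal listener: uniform over the objects the utterance is true of."""
    consistent = [o for o in objects if o in true_of[utterance]]
    return {o: (1 / len(consistent) if o in consistent else 0.0) for o in objects}

def S1(target, alpha=1.0):
    """Pragmatic speaker: softmax over utility = log L0(target|u) - cost(u)."""
    utilities = {}
    for u in utterances:
        p = L0(u)[target]
        utilities[u] = alpha * (math.log(p) - cost[u]) if p > 0 else float("-inf")
    Z = sum(math.exp(v) for v in utilities.values() if v > float("-inf"))
    return {u: (math.exp(v) / Z if v > float("-inf") else 0.0)
            for u, v in utilities.items()}

print(S1("blue_circle"))  # "circle" wins: fully informative despite its small cost
```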
Combining computer vision and NLG: Reference in the ReferIt Game ( p38)
Model predicts the correct colour term for an object in a scene by analysing colour histograms
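A toy sketch of grounding a colour term in image statistics: build a histogram over an object's pixel hues and pick the colour term whose prototype range contains the most mass. The hue ranges and bin count are made up; the ReferIt work uses richer visual features.

```python
import numpy as np

# Hypothetical prototype hue ranges (degrees) for a few colour terms
COLOUR_TERMS = {"red": (0, 30), "green": (90, 150), "blue": (200, 260)}

def dominant_colour_term(hues):
    """Given the hue values of an object's pixels, build a histogram and
    return the colour term whose hue range captures the most mass."""
    hist, edges = np.histogram(hues, bins=36, range=(0, 360))
    best_term, best_mass = None, -1
    for term, (lo, hi) in COLOUR_TERMS.items():
        mass = hist[(edges[:-1] >= lo) & (edges[:-1] < hi)].sum()
        if mass > best_mass:
            best_term, best_mass = term, mass
    return best_term

# Example: mostly blueish pixels with a little noise
hues = np.concatenate([np.random.normal(220, 10, 500), np.random.uniform(0, 360, 50)])
print(dominant_colour_term(hues))  # -> "blue" (with high probability)
```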
Short introduction to Feedforward neural networks (p41)
A type of NN that accepts a fixed-size input and computes a predicted value
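A minimal NumPy sketch of a feedforward network: a fixed-size input vector is pushed through one hidden layer and a linear output layer to produce a predicted value. Sizes and weights are arbitrary and no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def feedforward(x, W1, b1, W2, b2):
    """One hidden layer with ReLU, then a linear output layer."""
    h = relu(W1 @ x + b1)   # hidden representation
    return W2 @ h + b2      # predicted value(s)

# Fixed-size input of 4 features, hidden layer of 8 units, scalar output
x  = rng.normal(size=4)
W1 = rng.normal(size=(8, 4)); b1 = np.zeros(8)
W2 = rng.normal(size=(1, 8)); b2 = np.zeros(1)
print(feedforward(x, W1, b1, W2, b2))  # prints a 1-element array
```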