100% de satisfacción garantizada Inmediatamente disponible después del pago Tanto en línea como en PDF No estas atado a nada 4.2 TrustPilot
logo-home
Resumen

Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1

Puntuación
3.5
(2)
Vendido
10
Páginas
25
Subido en
18-04-2020
Escrito en
2015/2016

Dit is een samenvatting van het eerste deeltentamen van het vak Taaltheorie en Taalverwerking van de Universiteit van Amsterdam. De samenvatting is een mix van stof uit het boek (Jurafsky) en extra stof uit de colleges. De samenvatting is op volgorde van de colleges

Mostrar más Leer menos
Institución
Grado










Ups! No podemos cargar tu documento ahora. Inténtalo de nuevo o contacta con soporte.

Libro relacionado

Escuela, estudio y materia

Institución
Estudio
Grado

Información del documento

¿Un libro?
Desconocido
Subido en
18 de abril de 2020
Número de páginas
25
Escrito en
2015/2016
Tipo
Resumen

Temas

Vista previa del contenido

Taaltheorie en Taalverwerking Samenvatting 1
Lecture 1a
Reasons for studying natural language processing (NLP) within AI:
 You want a computer to communicate with users in their terms.
 There is a vast store of information recorded in natural language that can be accessible via
computers: news, government reports, social media etc.
 Three main types of applications:
o Enabling human-computer communication
o Improving human-human communication
o Doing useful processing of text or speech

Formal (FLs) and Natural (NLs) languages:
 Formal (computer) languages: e.g. Java, Prolog, Python, HTML
 Natural (human) languages: e.g. Dutch, English, Spanish, German, Japanese
 ‘Languages’ that represent the behavior of a machine or a system: e.g. think about
‘communicating’ with a vending machine via coin insertions and button presses: pressButton1

Syntax trees:
 Syntax: Describes the structural properties of the language. Natural language is much more
complicated than the formal languages used for the artificial languages of logics and computer
programs.




Natural language semantics:
 Semantics: Represents the meaning of words and sentences. Logic is a good candidate.
 Consider the sentence:
o Every student has access to a computer
 The meaning of this can be expressed as two different logical formulas:
o ∀x.(student(x) ⇒ ∃y.(computer(y) ∧ hasAccessTo(x, y)))
 Every student has access to their own computer
o ∃y.(computer(y) ∧ ∀x.(student(x) ⇒ hasAccessTo(x, y)))
 There is a computer to which every student has access to
 Problem: How can (either of) these formulas be mechanically generated from a syntax tree for
the original sentence?

FLs and NLs:
 There are close relationships between FLs and NLs, but also important differences:
o FLs can be pinned down by a precise definition.
o NLs are fluid, fuzzy at the edges and constantly evolving.
o NLs are riddled with ambiguity at all levels.
 This is normally avoidable in FLs.

,Ambiguity in Natural Language:
 Phonological ambiguity: ‘an ice lolly’ vs. ‘a nice lolly’
 Lexical ambiguity: ‘bass’ has at least two meanings: fish and musical instrument
 Syntactic ambiguity: two possible syntax trees for ‘complaints about referees multiplying
o Could mean:
 There are more and more complaints about referees
 Referees are multiplying and there are complaints about that
 Semantic ambiguity: ‘Please use all available doors while boarding the train’ vs. ‘Please fill all
sections in the questionnaire’
o First sentence: Doesn’t mean that you personally have to use every single door
o Second sentence: Does mean that you have to fill in every single section

Levels of Language Complexity:
 Some languages features are ‘more complex’ (harder to describe, harder to process) than
others. We can classify languages on a scale of complexity known as the Chomsky Hierarchy:
o Regular languages: Those whose phrases can be ‘recognized’ by a finite state machine
o Context-free languages: Many aspects of NLs can be described at this level (also used
for most programming languages)
o Context-sensitive languages: Some NLs involve features at this level of complexity
o Recursively enumerable languages: All languages that can in principle be defined via
mechanical rules.

Formal Languages:
 A formal language is a set of strings
 Each string is composed of symbols from a set called an alphabet (or a vocabulary)
o Examples of alphabets:
 English letters: Σ = {a, b, c . . . z}
 Decimal digits: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
 Programming language ‘tokens’: Σ = {if, while, x, ==}
 ‘Primitive’ actions performed by a machine or system, e.g. a vending machine
Σ = {insert50c, pressButton1, ...}
 Words in (some fragment of) a natural language: Σ = {the, an, a, dog, cat,
sleeps}
o Examples of strings over alphabets:
 Let Σ1 = {0, 1} be an alphabet. Then 01101, 000001, 1101 are strings over Σ1.
— in fact, all binary numbers are strings over Σ1
 Let Σ2 = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face
are strings over Σ2, as are fffff and agagag.
 Let Σ3 = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ3 but
bee, dad or cabbage are not.
 Let Σ4 = {♠, ©, ♣} be an alphabet. Then ♠♠ and ♣©♣ are strings over Σ4.
 The length of a string is the number of token symbols from the alphabet it contains
o Examples:
 The length of ‘face’ over Σ2 = {a, b, c, d, e, f, g} is 4
 The length of ‘face’ over Σ3 = {ba, ca, fa, ce ,fe, ge} is 2
 The string of length 0 is called the empty string, denoted by ε (epsilon).
 Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the
order in which they occur in s.
 An initial substring is called prefix and a final substring a suffix.

, o Let unthinkable be a string over Σ = {a, b, c . . . x, y, z} Then, ε, un, unth, unthinkable
are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings
include nthi, inka, bl.
 Σ* denotes the set of all strings over an alphabet Σ
o Σ is always infinite, regardless of the number of symbols Σ contains.
 We may now define a formal language L over an alphabet Σ as any subset of Σ*: L ⊆ Σ*
o Examples: let Σ = (a, b, c … x, y, z}. Then Σ* is the set of strings over the Latin alphabet
and the following subsets of Σ* are possible formal languages:
 The set of strings consisting of consonants (medeklinker) only
 The set of strings containing at least one vowel (klinker) and one consonant
 The set of strings whose length is less than 9 symbols
 The set {one, two, tree, four, five, six, seven, eight, nine, ten}
 The set of all English words
 The empty set

Ways to define a Formal Language:
 Given an alphabet Σ and the infinite set Σ* of formal languages it can give rise to, how can we
select a particular formal language?
o Direct mathematical definition: Σ = {a, b, c}, L1 = {aa, bb, abc, abcc}, L2 = {a nbn |n > 0}
o Formalisms (formal expressions and grammars): sets of rules
o Automata: computational devices for computing languages
 Specify some machine for testing whether a string is legal or not
 Formalisms and automata allow us to distinguish a formal language of interest (a set of
strings) from other possible languages over a given alphabet
o They capture the patterns that characterize a language
o As such, they act as a definition of the language they capture
 From an abstract point of view, a natural language, like Dutch, is a set of strings
(sounds/letters/words etc.)
 Therefore, formalisms and automata can help us to model aspects of natural languages

Regular expressions:
 Regular expressions are a formal notation for characterizing sets of strings that follow a fairly
simple regular pattern.
 We can construct regular expressions over an alphabet Σ as follows:
(these regular expressions are written in mathematical notation)




o We often ignore the dot in concatenation, and simply write ab
o Disjunction (or union) may be written as a|b a+b or a∪b
o a+ is the set of a-strings with at least one a (same as a*a or aa*)
o an can be used to abbreviate the concatenation of a with itself n times
o the notation Σ* can be seen as abbreviating (a|b|…)* for any symbol a, b, … in Σ
 Examples: Let Σ = {a, b, c … x, y, z}
o me(o)*w mew, meow, meooow, meooooooooooow etc.
$6.01
Accede al documento completo:

100% de satisfacción garantizada
Inmediatamente disponible después del pago
Tanto en línea como en PDF
No estas atado a nada


Documento también disponible en un lote

Reseñas de compradores verificados

Se muestran los 2 comentarios
1 año hace

3 año hace

3.5

2 reseñas

5
0
4
1
3
1
2
0
1
0
Reseñas confiables sobre Stuvia

Todas las reseñas las realizan usuarios reales de Stuvia después de compras verificadas.

Conoce al vendedor

Seller avatar
Los indicadores de reputación están sujetos a la cantidad de artículos vendidos por una tarifa y las reseñas que ha recibido por esos documentos. Hay tres niveles: Bronce, Plata y Oro. Cuanto mayor reputación, más podrás confiar en la calidad del trabajo del vendedor.
kimgouweleeuw Universiteit Twente
Seguir Necesitas iniciar sesión para seguir a otros usuarios o asignaturas
Vendido
86
Miembro desde
5 año
Número de seguidores
59
Documentos
34
Última venta
11 meses hace

3.7

7 reseñas

5
1
4
3
3
3
2
0
1
0

Recientemente visto por ti

Por qué los estudiantes eligen Stuvia

Creado por compañeros estudiantes, verificado por reseñas

Calidad en la que puedes confiar: escrito por estudiantes que aprobaron y evaluado por otros que han usado estos resúmenes.

¿No estás satisfecho? Elige otro documento

¡No te preocupes! Puedes elegir directamente otro documento que se ajuste mejor a lo que buscas.

Paga como quieras, empieza a estudiar al instante

Sin suscripción, sin compromisos. Paga como estés acostumbrado con tarjeta de crédito y descarga tu documento PDF inmediatamente.

Student with book image

“Comprado, descargado y aprobado. Así de fácil puede ser.”

Alisha Student

Preguntas frecuentes