Summary

Summary Speech and Language Processing

Name: Speech and Language Processing
SKU: doc_409555
Rating: 3.80 (5 reviews)
Author: KenzaS

Rating

3.8

(5)

Sold

Pages

Uploaded on

25-03-2018

Written in

2017/2018

Summary of Speech and Language Processing for Introduction to Linguistics for Artificial Intelligence. Book written by Daniel Jurafsky and James H. Martin (J & M). Contents: H.2, Regular Expressions and Automata: 1, 2, 3 H.3, Words and Transducers: 1, 2, 3, 4, 5, 6, 7 H.7, Phonetics: 1, 2, 3, 4 , 5, 6 H.12, Formal Grammars of English: 1, 2, 3, 5, 6 H.13, Syntactic Parsing: 1, 2, 3, 4 H.14, Statistical Parsing: 1, 2 H.15, Features and Unifications: 1, 2, 3 H.16, Language and Complexity: 1, 2, 3, 4 H.17, The Representation of Meaning: 1, 2, 3, 4 H.18, Computational Semantics: 1, 2 , 3 The content of this summary corresponds to the material to be learned for the exam of Introduction to Linguistics on 29 March 2018 of the Artificial Intelligence course at Utrecht University.

Show more Read less

Institution

Course

Whoops! We can’t load your doc right now. Try again or contact support.

Report Copyright Violation

Connected book

Dan Jurafsky, James H. Martin Speech and Language Processing

Edition:2008-04-29
ISBN:9780135041963
Edition:2

Written for

Institution: Universiteit Utrecht (UU)
Study: Kunstmatige Intelligentie
Course: Inleiding Taalkunde

All documents for this subject (2)

Document information

Summarized whole book?: No
Which chapters are summarized?: H.2, 3, 7, 12, 13, 14, 15, 16, 17, 18
Uploaded on: March 25, 2018
Number of pages: 31
Written in: 2017/2018
Type: Summary

Subjects

inleiding taalkunde
taalkunde
kunstmatige intelligentie
speech
language
processing
regular expressions
automata
words
transducers
phonetics
formal grammars
formal grammar
syntactic pars
speech and language processing
daniel jurafsky en james h martin
jm

Content preview

Speech and Language Processing
REGULIERE EXPRESSIES

Chapter 2 - Regular Expressions and Automata

2.1 Regular Expressions
A regular expression is a formula in a special language that specifies simple classes of
strings. A string is a sequence of symbols. Regular expression search requires a pattern
that we want to search for and a corpus of texts to search through. When we give a
search pattern, we assume that the search engine returns the line of the document
returned.

2.1.1 Basic Regular Expression Patterns
// Begin and end of a regular expression. REs are case sensitive.
[ ] Specifies a disjunction of characters to match
- Range of characters, e.g. [A-Z]
^ What a character cannot be, when in the brackets, e.g. [^a]
? The preceding character or nothing
* The Kleene *, zero or more occurrences of the immediately previous character or
regular expression
+ The Kleene +, one or more of the previous character
. Wildcard, any single character (except a carriage return)
/.*/ Any string of characters

Anchors are special characters that anchor regular expressions to particular places in a
string:
^ /^The/ means it will only match The at the start of a line
$ Matches the end of a line
\b Matches a word boundary
\B Matches a non-boundary

2.1.2 Disjunction, Grouping, and Precedence
| Disjunction, also called pipe symbol
() Causes precedence

Operator precedence hierarchy:
Parenthesis ()
Counters +*?{}
Sequences and anchors the ^my end$
Disjunction |

We say patterns are greedy, expanding to cover as much of a string as they can.

2.1.3
False positives: strings that we incorrectly matched
False negatives: strings that we incorrectly missed

2.1.5 Advanced Operators
{3} Exactly 3 occurrences of the previous character or expression

,{n, m} from n to m occurrences of the previous char or expression
{n, } at least n occurrences of the previous char or expression
\n newline
\t tab
For the special characters, use \ to use them as themselves.

2.1.6 Regular Expression Substitution, Memory and ELIZA
An importent use of regular expressions is in substitutions.
<\1> Number operator to refer back.To refer to the integer we’ve found so that we can
easily add the brackets.
Registers: numbered memories. This memory feature is not part of every regular
expression language and is often considered an “extended” feature of regular
expressions.

FORMELE TALEN I

2.2 Finite-State Automata
A RE is one way of describing a finite-state automation (FSA). Second, a RE is one way
of characterising a particular kind of formal language called a regular language.

2.2.1 Use of an FSA to Recognise Sheeptalk
An automaton for modelling a RE, also called FSA, recognises a set of strings. The
automaton has diﬀerent states, which are represented by nodes. State 0 is the start
state. The last state is the final state or accepting state, which we represent by the
double circle. It also has transitions, which we represent by arcs in the graph. Think of
the input as being written on a long tape broken up into cells, with one symbol written in
each cell of the tape. If some input doesn’t match an arc, or if it just happens to get stuck
in some non-final state, we say the machine rejects or fails to accept an input. We can
also represent an automaton with a state-transition table.
A finite automaton is defined by the following 5 parameters:

Q = q0, q1, A finite set of N states
q2….qN-1

Σ A finite input alphabet of symbols

q0 the start state

F the set of final states, F Q

δ(q,i) the transition function or transition matrix between states. Given a state q
relation from Q x Σ to Q

A deterministic algorithm is one that has no choice points; the algorithm always knows
what to do for any input. We can think of “empty” elements in the table as if they all
pointed at one “ empty” state, which we might call the fail state or sink state.

,2.2.2 Formal Languages
Formal language: a model that van both generate and recognise all and only the strings of
a formal language acts as a definition of the formal language. A formal language is a set
of strings, each string composed of symbols from a finite set called an alphabet. Formal
languages are not the same as natural languages, which are the languages that real
people speak. The term generative grammar is sometimes used in linguistics to mean a
grammar of a formal language.

2.2.4 Non-Deterministic FSAs
Automata with decision points are called non-deterministic FSAs (NFSAs). A
deterministic automation is one whose behaviour during recognition is fully determined
by the state it is in and the symbol it is looking at (DFSA). Another common type of non-
determinism is one caused by arcs that have no symbols on them, called ε-transitions.

2.3 Regular Languages and FSAs
The class if languages that are definable by regular expressions is exactly the same as the
class of languages that are characterisable by finite-state automata. We therefore call
these languages the regular languages. The class of regular languages (or regular sets)
over Σ is formally defined as follows:

1. ∅ is a regular language
2. a Σ ε, {a} is a regular language
3. If L1 and L2 are regular languages, then so are:
1. L1 * L2 = {xy | x L1, y L2}, the concatenation of L1 and L2
2. L1 L2, the union or disjunction of L1 and L2
3. L1*, the Kleene closure of L1

Any regular expression can be turned into a (perhaps larger) expression that only makes
use of the three primitive operations.
REs are also closed under the following operations:
Intersection: if L1 and L2 are regular languages, then so is L1 L2, the language
consisting of the set of strings that are in both L1 and L2
Difference: If L1 and L2 are regular languages, then so is L1 - L2, the language consisting
of the set of strings that are in L1 but not L2
Complementation: if L1 is a regular language, then so is Σ* - L1, the set of all possible
strings that aren’t in L1 (Σ* means the infinite set of all possible strings formed from the
alphabet Σ)
Reversal: If L1 is a regular language, then so is L1R, the language consisting of the set of
reversals of all the strings in L1

Showing that an automaton can be built for each regular language and, conversely, that
a regular language can be built for each automaton. For the inductive step, we show that
each of the primitive operations of a regular expression can be imitated by an automaton:

Concatenation: we string two FSAs next to each other by connecting all the final
states of FSA1 to the initial state of FSA2 by an ε-transition.
Closure: We create a new final and initial state, connect the original final states of
the FSA back to the initial states by ε-transitions, and then put direct links between
the new initial and final states by ε-transitions.

, Union: We add a single new initial state q’0, and add new ε-transitions from it to the
former initial states of the two machines to be joined.

MORFOLOGIE

Chapter 3 - Words and Transducers

3.1 Survey of (Mostly) English Morphology
Morphology is the study of the way words are built up from smaller meaning-bearing units,
morphemes. A morpheme is often defined as the minimal meaning-bearing unit in a
language. It is useful to distinguish two broad classes of morphemes: stems and affixes.
The stem is the “main” morpheme of the word, supplying the main meaning, and the
affixes add “additional” meanings of various kinds. Affixes are further divided into prefixes,
suffixes, infixes, and circumfixes. Prefixes precede the stem, suffixes follow the
stem, circumfixes do both, and infixes are inserted inside the stem. A word can have more
than one affix. Languages that tend to string affixes together, as Turkish does, are called
agglutinative languages.
There are many ways to combine morphemes to create words:

Inflection is the combination of a word stem with a grammatical morpheme, usually
resulting in a word of the same class as the original stem and usually filling some
syntactic function like agreement.
Derivation is the combination of a word stem with a grammatical morpheme, usually
resulting in a word of a different class, often with a meaning hard to predict exactly.
Compounding is the combination of multiple word stems together
Cliticization is the combination of a word stem with a clitic. A clitic is a morpheme
that acts syntactically like a word but is reduced in form and attached to another
word.

3.1.1 Inflectional Morphology
English nouns have only two kinds of inflection: an affix that marks plural and an affix that
marks possessive. English has three kinds of verbs; main verbs (eat, sleep), modal
verbs (can, will), and primary verbs (be, have, do). Of the main and primary verbs (these
have inflectional endings), a large class are regular, that is, all verbs of this class have the
same endings marking the same functions. Regular verbs and forms are significant in the
morphology of English: first, because they cover a majority of the verbs; and second,
because the regular class is productive. A productive class is one that automatically
includes any new words that enter the language. The irregular verbs are those that have
some more or less idiosyncratic forms of inflection. Note that an irregular verb can inflect in
the past form (also called preterite) by changing its vowel (eat/ate), its vowel and some
consonants (catch/caught), or with no change at all (cut/cut).
The -ing participle is used in the progressive construction to mark present or
ongoing activity or when the vern is treated as a noun; this latter kind of nominal use of a
verb is called a gerund. The -ed/-en participle is used in the perfect construction or the
passive construction. A number of regular spelling changes occur at these morpheme
boundaries (beg/begging/begged).

3.1.2 Derivational Morphology
A common end of derivation in English is the formation of new nouns, often from verbs or
adjectives. This process is called nominalization (computerize —> computerization).

$6.70

Get access to the full document:

Purchased by 24 students

100% satisfaction guarantee

Immediately available after payment

Both online and in PDF

No strings attached

Get to know the seller

KenzaS

4.0

(46)

Reviews from verified buyers

Showing all 5 reviews

ramtinbashardoost80 Kunstmatige Intelligentie · 8 reviews

1 year ago

pieterj1 Informatiekunde · 3 reviews

4 year ago

karmacharyaanjan Masters in Information Technology · 1 review

5 year ago

jaapstefels Kunstmatige intelligentie · 7 reviews

6 year ago

jtenpas Kunstmatige Intelligentie · 102 reviews

6 year ago

3.8

5 reviews

Trustworthy reviews on Stuvia

All reviews are made by real Stuvia users after verified purchases.

Get to know the seller

KenzaS Universiteit Utrecht

View profile

Sold

203

Member since

9 year

Number of followers

128

Documents

Last sold

3 weeks ago

4.0

46 reviews

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their tests and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can instantly pick a different document that better fits what you're looking for.

Pay as you like, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller KenzaS. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy these notes for $6.70. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews) 57595 documents were sold in the last 30 days Founded in 2010, the go-to place to buy study notes for 16 years now

Summary Speech and Language Processing

Connected book

Written for

Document information

Subjects

Content preview

Reviews from verified buyers

Get to know the seller

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Didn't get what you expected? Choose another document

Pay as you like, start learning right away

Frequently asked questions

What do I get when I buy this document?

Satisfaction guarantee: how does it work?

Who am I buying these notes from?

Will I be stuck with a subscription?

Can Stuvia be trusted?