Summary

Summary of lesson 3 through the last lesson: Inleiding tot de Digitale Tekstanalyse (Introduction to Digital Text Analysis)

Uploaded on

10 December 2025

Written in

2025/2026

Connected book

Veronique Hoste, Cynthia Van Hee: Taaltechnologie ontrafeld

Edition: 13 May 2024
ISBN: 9789463106221
Edition: 1

Written for

Institution: Universiteit Gent (UGent)
Study: Toegepaste Taalkunde: Combinatie Van Ten Minste Twee Talen
Course: Inleiding tot de digitale tekstanalyse

Document information

Summarized whole book?: No
Which chapters are summarized?: Lesson 3 through the last lesson
Number of pages: 94
Type: Summary

Subjects

sentiment analysis
text processing
robots
AI

Content preview

Lesson 3: analysing and visualising text collections

Concordances

Concordance = word usage in context
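To make "word usage in context" concrete, here is a minimal keyword-in-context sketch. It is not taken from the course material: the function name `concordance`, the `width` parameter, the simple regex tokenizer and the sample sentence are all assumptions for illustration.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, hit, right context) for each occurrence of keyword."""
    # Naive tokenization (assumption): lowercase, keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Hypothetical sample text, for illustration only.
sample = "The cat sat on the mat. The dog chased the cat around the garden."
for left, hit, right in concordance(sample, "cat", width=2):
    print(f"{left:>15} [{hit}] {right}")
```

Each printed line shows one occurrence of the keyword with a fixed window of neighbouring words on both sides, which is the basic shape of a concordance view.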

Distant reading → automated analysis of text

• Analysing large volumes of text
• e.g. computational literary studies

Popular text-analysis methods:

• Concordances: word usage in context
• Topic modeling: identifying important themes/topics
• Stylometry: statistical analysis of writing style (lexical and syntactic information)

Stylometry: applications

- Determining authorship (authorship attribution)
- Forensic linguistics

Authorship attribution

→ Determining who wrote a text on the basis of lexical and syntactic features

E.g. frequencies of function words, PoS patterns, frequent word sequences
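One of the features named above, function-word frequencies, can be sketched as follows. This is only an illustration under stated assumptions: the five-word `FUNCTION_WORDS` list, the function name and the sample sentence are hypothetical, and real stylometric studies use far longer word lists and proper statistical comparison.

```python
import re
from collections import Counter

# Hypothetical, very short list of English function words (assumption).
FUNCTION_WORDS = ["the", "of", "and", "to", "in"]

def function_word_profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    if total == 0:
        return {word: 0.0 for word in FUNCTION_WORDS}
    return {word: counts[word] / total for word in FUNCTION_WORDS}

# Hypothetical sample sentence, for illustration only.
profile = function_word_profile("The king of the north and the queen of the south")
for word, share in profile.items():
    print(f"{word:>4}: {share:.3f}")
```

Profiles like this, computed for texts by known authors and for a disputed text, could then be compared; how that comparison is done is outside the scope of this sketch.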

Well-known examples:

• Authenticity of works by Shakespeare (Frantzeskou et al., 2006)
• J.K. Rowling / Robert Galbraith (pseudonym)

Forensic linguistics

➔ Everything that relates to language and the law / criminal law
• Forensic phonetics: identifying a speaker through speech and voice analysis
• Studying texts of legal and criminal interest whose precise author is unknown, e.g. threatening letters, blackmail letters, plagiarism, confessions, wills, …

Goal: identifying the author of anonymous texts (analysis of handwriting and writing style)

Well-known example: the Unabomber case

- 16 bombs (various locations in the United States)
- The perpetrator's manifesto (35,000 words) appeared in a newspaper
- His brother recognised the writing style and called the police

Word frequencies

Word frequencies are analysed to investigate how important words are in a corpus.

Frequency = number of times a word occurs in the corpus

➢ This is the absolute frequency, as opposed to the relative frequency:

Relative frequency = frequency / number of words in the corpus

=> Why relative frequency matters: it allows frequencies to be compared across corpora of different sizes
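A small sketch of why the normalisation matters. The counts and corpus sizes below are hypothetical, chosen only to show that a higher absolute frequency can correspond to a lower relative frequency.

```python
def relative_frequency(count, corpus_size):
    """Relative frequency = absolute frequency / number of words in the corpus."""
    return count / corpus_size

# Hypothetical numbers (assumption): the same word counted in two corpora.
small_corpus = relative_frequency(50, 1_000)      # 50 hits in 1,000 words
large_corpus = relative_frequency(500, 100_000)   # 500 hits in 100,000 words

print(f"small corpus: {small_corpus:.3f}")
print(f"large corpus: {large_corpus:.3f}")
```

The word occurs ten times more often in absolute terms in the large corpus, yet relative to corpus size it is ten times rarer there.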

Frequency list = word list sorted by frequency in the corpus
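A frequency list can be sketched with Python's standard library. The tokenizer (a simple regex) and the sample sentence are assumptions; the course may well use a different tool.

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (word, count) pairs sorted by descending frequency."""
    # Naive tokenization (assumption): lowercase, keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common()

# Hypothetical sample text, for illustration only.
for word, count in frequency_list("To be or not to be"):
    print(f"{word:<5} {count}")
```

`Counter.most_common()` sorts by count in descending order; words with equal counts keep the order in which they were first seen.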

Examples:

2 corpora based on freely available books (via Project Gutenberg: https://www.gutenberg.org/) + a sample from a Flemish book:

1. Shakespeare – “Romeo en Julia” (translation)
2. Multatuli – “Max Havelaar”
3. Sample from Lize Spit – “Het smelt”

Corpus                                        Number of words
Shakespeare – “Romeo en Julia” (translation)  31,817
Multatuli – “Max Havelaar”                    120,824
Sample from Lize Spit – “Het smelt”           1,005

Frequency list for “Het smelt”: nouns

• Most frequent words per part of speech, sorted by descending frequency
• Computing the distribution per part of speech:
o 206 nouns / 1,005 words in total ≈ 20%
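The distribution per part of speech can be sketched as below. The tagged token list and the tag names (`NOUN`, `VERB`, `DET`) are hypothetical; only the figures 206 and 1,005 come from the notes.

```python
from collections import Counter

def pos_distribution(tagged_tokens):
    """Share of each part-of-speech tag among all tokens."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Hypothetical tagged tokens (assumption), just to show the input shape.
tagged = [("house", "NOUN"), ("runs", "VERB"), ("dog", "NOUN"), ("the", "DET")]
print(pos_distribution(tagged))

# The figures from the notes: 206 nouns out of 1,005 words.
noun_share = 206 / 1005
print(f"{noun_share:.1%}")
```

With the figures from the notes the share works out to roughly 20.5%, which the notes round to 20%.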

Type-token ratio (TTR)

• Tokens = total number of words in a text
• Types = total number of unique words in a text
o Type-token ratio = #types / #tokens
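The ratio can be computed in a few lines. This is a sketch that assumes a simple regex tokenizer and case-insensitive matching; both choices affect the result and are not specified in the notes.

```python
import re

def type_token_ratio(text):
    """TTR = number of unique words (types) / total number of words (tokens)."""
    # Naive tokenization (assumption): lowercase, keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical sample: 6 tokens, 4 types (to, be, or, not).
ttr = type_token_ratio("To be or not to be")
print(f"{ttr:.2f}")
```

A value close to 1 means almost every word is different; a lower value means more repetition.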
