Computational Social Science (CADC)
Full summary 2023-2024
VU Amsterdam
Master communication science
Table of contents
Week 1: Introduction to computational methods in communication science
Lecture 1
Required readings
Week 2: Automated Text Analysis and Dictionary Approaches
Lecture 2
Required readings
Week 3: Text classification using classic machine learning
Lecture 3
Required readings
Week 4: Word embeddings, transformers and large language models
Lecture 4
Required readings
Week 1: Introduction to computational methods in communication science
Lecture 1
Computational social science is a field of social science that uses algorithmic tools and
large/unstructured data to understand human and social behavior.
It complements rather than replaces traditional methodologies: methods are not the goal, but
contribute to data generation.
It includes methods such as:
- Data mining (e.g., scraping and gathering of large data sets)
- Software development for social science experiments
- Automated text analysis (e.g., sentiment analysis, keyword extraction, dictionary
approaches)
- Image classification (e.g., face recognition, visual topic modeling)
- Machine learning approaches (e.g., for classification, prediction, topic modeling)
- Actor-based modeling (e.g., simulation of social behavior, spreading of information)
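To make one of the listed methods concrete, here is a minimal sketch of a dictionary approach to sentiment analysis. The word lists, the function name `dictionary_sentiment`, and the example posts are hypothetical and only for illustration; real studies rely on validated dictionaries and more careful preprocessing.

```python
import re

# Hypothetical mini-dictionary (assumption); real studies use validated word lists.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "awful", "sad", "hate"}


def dictionary_sentiment(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos - neg


# Made-up example messages (assumption).
posts = [
    "I love this great show",
    "What an awful, sad ending",
    "The episode aired on Monday",
]
scores = [dictionary_sentiment(post) for post in posts]
print(scores)  # [2, -2, 0]
```

A score above zero suggests a positive text, below zero a negative one, and zero means that no dictionary words (or equally many of each kind) were found.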
Why is this important now?
- Vast amounts of digitally available data, ranging from social media messages and
other digital traces to web archives and newly digitized newspaper and other
historical archives
- Large-scale records (big data) of persons or businesses are created constantly
- Powerful and comparatively cheap processing power, and easy-to-use computing
infrastructure for processing these data
- Improved tools to analyze these data, including network analysis methods and
automatic text analysis methods such as supervised text classification, topic
modeling, word embeddings, as well as large language models
10 characteristics of big data
Pros and cons of computational methods
Opportunities
- We can study actual behavior instead of simply self-reports
- We can study human beings in their social context instead of in an artificial lab
setting
- We can increase our N (higher power)
- Potential to uncover patterns and insights that we couldn’t investigate before
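The link between a larger N and higher statistical power can be illustrated with a rough calculation. The sketch below assumes a two-sided z-test comparing the means of two equally sized groups and uses a normal approximation; the function name `approx_power`, the effect size of 0.2, and the sample sizes are assumptions chosen for illustration, not values from the lecture.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized mean difference (Cohen's d).
    The negligible probability in the opposite tail is ignored.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    noncentrality = d * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)


# A small effect (d = 0.2) at increasing sample sizes per group.
for n in (50, 200, 1000):
    print(n, round(approx_power(0.2, n), 2))
```

Under these assumptions, a small effect is detected only rarely with 50 participants per group, but almost always with 1000 per group.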
Pitfalls
- Techniques are often (rather) complicated
- Data are often proprietary (not shared openly)
- Samples are often biased
- Data often have insufficient metadata
Theoretical definition
“Computational Communication Science (CCS) is the label applied to the emerging subfield
that investigates the use of computational algorithms to gather and analyze big and often
semi- or unstructured data sets to develop and test communication science theories”
Typical research areas
Computational communication science studies thus usually involve: