
Summary Socio-info R-studio notes A2

Pages
114
Uploaded on
25-10-2025
Written in
2025/2026

These are in-depth notes based on the textbook R for Data Science.


Document information

Summarized whole book?
No
Which chapters are summarized?
Chapter 1 - 26
Type
Summary


Content preview

INTRO TO COMPUTATIONAL SOCIAL SCIENCE
TODAY’S READING
• Bit by Bit – Chapter 2 (by Salganik)
• R for Data Science – Introduction (by Wickham et al.)

WHAT IS COMPUTATIONAL SOCIAL SCIENCE (CSS)?
CSS uses digital data and computational methods to study human behaviour.
It combines:
• Social science theory – understanding why people act as they do
• Data science methods – statistics, programming, algorithms
Examples
• Social media analysis
• Digital trace data
• Text mining
• Network analysis


DIGITAL DATA & BIG DATA
Traditional vs Digital Data
• Collecting behavioural data used to be expensive and rare.
• Digitisation now makes it cheap and continuous.
New Sources
• Digital trace data: online interactions (e.g. tweets, likes)
• Government records: census, tax data
• Corporate data: sales, customer information
These are observational data – collected without active intervention.

“3 VS” OF BIG DATA
1. Volume – huge amounts of data
2. Variety – different types (text, images, video)
3. Velocity – generated very quickly
Big data is often created for non-research purposes, so it must be repurposed.
Example: Twitter posts vs formal survey data.

REPURPOSING DATA
• Found data: already exists (e.g. tweets)
• Collected data: actively gathered (e.g. survey)
Understand why and how the data was created and how it differs from your ideal dataset.

TEN COMMON CHARACTERISTICS OF BIG DATA
1. Big
• Large datasets enable fine-grained analysis but can still contain errors.
o Example: Chetty et al. (2014) used US tax data (40 million records).
o Warning: Back et al. (2010) pager data was corrupted by bot messages.
2. Always-On
• Data collected continuously → real-time insights (e.g. crisis tracking).
3. Non-Reactive
• People don’t realise they’re observed → natural behaviour but ethical issues.
4. Incomplete
• Missing variables are common; can use imputation or record linkage.
5. Inaccessible
• Controlled by governments or companies; partnerships may limit research.
6. Non-Representative
• Large ≠ representative (e.g. social media users ≠ general population).
7. Drifting
• Population, behavioural, or system drift affects long-term trends.
8. Algorithmically Confounded
• Algorithms shape behaviour (e.g. Facebook recommendations).
• Verify that observed patterns are not system-designed.
9. Dirty
• Automatically collected data is often noisy and must be cleaned.
10. Sensitive
• Privacy and consent issues are crucial (especially government/corporate data).
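
Two of the characteristics above, "Dirty" and "Incomplete", can be sketched in base R. The data frame, its values, and the cut-off of 1000 are made up for illustration, and mean imputation is only one simple option among many.

```r
# Hypothetical digital trace data: daily post counts per user (made-up values).
posts <- data.frame(
  user  = c("a", "b", "c", "d", "e"),
  count = c(3, NA, 5, 4000, 4)   # NA = missing, 4000 = likely bot activity
)

# Dirty: drop implausible values (the 1000 threshold is an assumption).
# Rows with a missing count are kept so they can be imputed below.
clean <- posts[is.na(posts$count) | posts$count < 1000, ]

# Incomplete: replace missing counts with the mean of the observed counts.
observed_mean <- mean(clean$count, na.rm = TRUE)
clean$count[is.na(clean$count)] <- observed_mean

clean
```

Whether dropping or imputing is appropriate depends on why the values are missing or extreme, so each cleaning decision should be recorded in the script.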

THREE MAIN RESEARCH STRATEGIES
1. Counting things – measure behaviour/events.
Example: Chetty et al. (2014) on intergenerational mobility.
2. Forecasting things – predict present/future states (“nowcasting”).
Example: Google Flu Trends failed due to algorithm changes and behaviour drift.
3. Approximating experiments – infer causal effects without real experiments.
o Natural experiments: random real-world events.
o Matching: pair similar subjects with/without “treatment”.
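
The idea behind matching can be sketched with a toy example in base R. This is a simplified version (comparing treated and untreated people within the same age group), not a full matching procedure, and all names and numbers are invented for illustration.

```r
# Made-up observational data: treated = 1 if the person received the "treatment".
people <- data.frame(
  age_group = c("young", "young", "young", "old", "old", "old"),
  treated   = c(1, 0, 0, 1, 1, 0),
  outcome   = c(10, 7, 9, 20, 22, 18)
)

# Mean outcome for every combination of age group and treatment status.
cell_means <- aggregate(outcome ~ age_group + treated, data = people, FUN = mean)

# Put treated and untreated means side by side within each age group.
treated_means   <- cell_means[cell_means$treated == 1, c("age_group", "outcome")]
untreated_means <- cell_means[cell_means$treated == 0, c("age_group", "outcome")]
matched <- merge(treated_means, untreated_means, by = "age_group",
                 suffixes = c("_treated", "_untreated"))

# Within-group difference: compares similar people with and without treatment.
matched$difference <- matched$outcome_treated - matched$outcome_untreated
matched
```

Comparing within groups avoids mixing up the effect of the treatment with the effect of age, but it only adjusts for the variables you actually match on.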


R, TIDYVERSE, AND WORKFLOWS

1. THE TYPICAL DATA SCIENCE PROJECT MODEL
Adapted from R FOR DATA SCIENCE (R4DS) by Wickham et al. (2023), the typical workflow for a
data project includes:
1. Importing data into R
2. Tidying the data so it’s structured properly
3. Transforming the data (filtering, calculating, summarising)
4. Visualising results
5. Modelling relationships or patterns
6. Communicating your findings
You’ll follow this process throughout the course.
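
The six steps can be sketched end to end in base R. This is a minimal illustration, assuming the built-in mtcars dataset stands in for an imported file; a real project would usually read a file and use Tidyverse functions instead.

```r
# 1. Import: mtcars ships with R and stands in for a file you would read in.
cars <- mtcars

# 2. Tidy: turn the row names into a proper column.
cars$model <- rownames(cars)
rownames(cars) <- NULL

# 3. Transform: keep a few columns and compute an average per group.
cars <- cars[, c("model", "mpg", "wt", "cyl")]
mean_mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# 4. Visualise: weight against fuel efficiency.
plot(cars$wt, cars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 5. Model: a simple linear regression of fuel efficiency on weight.
fit <- lm(mpg ~ wt, data = cars)

# 6. Communicate: print the results you would report.
print(mean_mpg_by_cyl)
print(coef(fit))
```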

2. WHAT ARE R AND RSTUDIO?
R
R is a programming language used mainly for data analysis, statistics, and visualization.
RSTUDIO
RStudio is an interface (a program you use to interact with R).
It makes it easier to:
• Write and run R code
• Organise your files
• View data and plots
• Manage packages
WHY USE R AND RSTUDIO?
• Reproducibility: Anyone can re-run your code and get the same results.
• Power: It handles large and complex data easily.
• Open-source: Free to use, modify, and share.
• Community: Supported by a huge online network of users and tutorials.

3. BASICS OF R
The slides labelled “Basics” refer to introductory skills like:
• Assigning values to variables
• Running simple operations
• Using RStudio’s console and script editor
Although no code was shown, let’s look at typical examples you’ll encounter.
EXAMPLE 1: ASSIGNING VALUES
x <- 5
Explanation:
• x is the variable name (a label for storing information).
• <- is the assignment operator (it means “store the value on the right into the variable on the
left”).
• 5 is the value being stored.
This means “store the number 5 in a variable called x.”
You can then use that variable in calculations:
x + 2
This takes the value in x (which is 5), adds 2, and returns 7.
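
A slightly longer sketch of the same idea, showing that a result can itself be stored and that a variable can be overwritten (the name y is just an illustration):

```r
x <- 5          # store 5 in x
y <- x + 2      # use x in a calculation and store the result in y
x <- x * 10     # overwrite x with a new value
x               # 50
y               # 7 (y does not change when x is overwritten)
```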

4. PACKAGES AND FUNCTIONS
PACKAGES
Packages are collections of functions that extend what R can do.
For example, the ggplot2 package adds graphing tools, while dplyr helps you clean and manipulate
data.
You install and load packages like this:
install.packages("ggplot2")
library(ggplot2)
Explanation:
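• install.packages("ggplot2") downloads the package from the internet and installs it. You only need to do this once per computer.
• library(ggplot2) loads it into your R session so you can use it. You need to do this in every new session.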
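
Two related patterns are sketched below: checking whether a package is installed before trying to load it, and calling a single function with the package::function notation. The variable names are illustrative.

```r
# Check whether ggplot2 is installed, without attaching it to the session.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
if (!has_ggplot2) {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") once.")
}

# Call one function from a package without library(), using package::function.
# stats is a package that comes with every R installation.
middle_mark <- stats::median(c(60, 70, 80))
middle_mark
```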

FUNCTIONS
Functions are pre-written instructions that do something specific.
They take inputs (called ARGUMENTS) and return an output (a result).
EXAMPLE 1: THE MEAN() FUNCTION
mean(student_marks, na.rm = TRUE)
Explanation:
• mean() is the function name – it calculates the average of numbers.
• student_marks is the input, usually a numeric vector of marks (like c(60, 70, 80)).
• na.rm = TRUE tells R to remove missing values (NAs) before calculating.
If you don’t include na.rm = TRUE, R keeps the missing values and returns NA whenever the input contains at least one NA.
By default, na.rm (short for “NA remove”) is set to FALSE.
So, this example overrides the default behaviour.
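
The difference is easy to see with a small made-up vector of marks that contains one missing value:

```r
# Hypothetical marks; NA marks a student whose result is missing.
student_marks <- c(60, 70, NA, 80)

with_default <- mean(student_marks)                # na.rm = FALSE by default
with_na_rm   <- mean(student_marks, na.rm = TRUE)  # drop the NA first

with_default   # NA
with_na_rm     # 70
```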


EXAMPLE 2: THE SEQ() FUNCTION
seq(from = 1, to = 10, by = 1)
Explanation:
• seq() stands for sequence – it generates a series of numbers.
• from = 1 means the sequence starts at 1.
• to = 10 means it ends at 10.
• by = 1 means each step increases by 1.
So the result is:
1 2 3 4 5 6 7 8 9 10
If you omit by, R uses a step of 1 by default (or -1 when to is smaller than from).
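
A few variations show how the arguments change the result (length.out is another argument of seq() that sets how many numbers you want):

```r
odd_numbers <- seq(from = 1, to = 10, by = 2)        # 1 3 5 7 9
countdown   <- seq(from = 10, to = 1)                # 10 9 8 7 6 5 4 3 2 1
quarters    <- seq(from = 0, to = 1, length.out = 5) # 0.00 0.25 0.50 0.75 1.00

odd_numbers
countdown
quarters
```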

5. SCRIPTS
Scripts are files where you save your R code (ending in .R).
You can open one by clicking:
File ▷ New File ▷ R Script
WHY USE SCRIPTS?
• Saves time – you can re-run code anytime.
• Ensures reproducibility – your analysis can be repeated by others.
• Supports collaboration – you can share your code easily.
• Allows version control – track changes over time.
• Helps debugging – you can test and fix code in parts.
• Enables automation – R can perform repetitive tasks automatically.
TIP:
Use consistent naming for scripts, such as:
week01_practical.R
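
A saved script can also be run in one go with source(). The sketch below writes a tiny script to a temporary folder and runs it; the file contents and variable names are made up for illustration, and in practice you would write the script in the RStudio editor.

```r
# Create a small example script in a temporary folder.
script_path <- file.path(tempdir(), "week01_practical.R")
writeLines(
  c("# Week 1 practical: average mark",
    "marks <- c(60, 70, 80)",
    "average_mark <- mean(marks)"),
  script_path
)

# Run every line of the script, top to bottom.
source(script_path)

average_mark   # 70
```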

6. THE TIDYVERSE
The Tidyverse is a collection of R packages designed for data science.
They all share the same principles and syntax, making them easy to use together.
Core Tidyverse packages include:
• ggplot2 – data visualisation
• dplyr – data manipulation (filtering, summarising)
annistafford04 Stellenbosch University