Contents
Lecture 1
Tutorial 1
Data Carpentry – R for Social Scientists Ch. 1 - 4
A Guide to Scrum for Researchers
Lecture 2
Data Carpentry – R for Social Scientists Ch. 7
Tutorial 3
Principles of Project Setup and Workflow Management
Lecture 4
Tutorial 4
Data preparation process and challenges
Lecture 5
Tutorial 5
Use Makefiles to Manage, Automate, and Reproduce Projects
This is why you should automate the pipeline of your research project
Four steps to automate & make reproducible your empirical research project
Lecture 1
What’s efficient?
- Develop and prototype the “final pipeline” of your project, refine later.
- Reduce setup costs to return to a project (e.g., finding relevant files, getting your code to
work)
- Reduce coding mistakes (or catch them at all)
- Rotate and collaborate on tasks (e.g., team members taking over)
- Share/reuse code (e.g., in so-called “packages” or “libraries”)
- Receive feedback from others
Tutorial 1
RStudio = graphical user interface to R
Why learn R?
- No pointing and clicking
- Great for reproducibility
- Interdisciplinary & extensible
o It has packages from many different disciplines
- Supports all data formats
- Can produce high-quality graphics
- Has a large community
- It’s free, open-source, and cross-platform
- Works even in the cloud!
Working directory = the directory in which R (or the terminal) is currently operating
Absolute path = a path that starts from the root directory, providing the complete location
of a file or folder.
Relative path = a path given relative to another directory, usually the working directory
(our main directory), instead of starting from the root.
- Use ‘../’ to go “up” one directory.
Create a directory by: mkdir data # continue w/ other directories
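The same ideas can be tried from within R. The sketch below is only an illustration: the folder names `my_project`, `data`, and `src` are hypothetical, and everything is created inside a temporary directory so no real project is touched. `dir.create()` is used here as the R counterpart of the shell command `mkdir`.

```r
# Absolute path of the current working directory.
wd <- getwd()
print(wd)

# Hypothetical project layout inside a temporary directory.
project <- file.path(tempdir(), "my_project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "src"), showWarnings = FALSE)

# A relative path from src/ to data/ goes "up" one level with "..".
relative_path <- file.path("..", "data")

# Resolve the relative path from inside src/ to its absolute form.
old_wd <- setwd(file.path(project, "src"))
resolved <- normalizePath(relative_path)
setwd(old_wd)  # restore the original working directory

print(resolved)
```

The relative path only makes sense once the working directory is known, which is why the sketch switches into `src/` before resolving it.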
Using R:
- As a calculator
- Variable assignment
- = vs. ==
o == compares two things
o = assigns a value to a variable (R code more commonly uses <- for this)
- Functions
o Using functions (e.g., round())
o Getting help on functions
?function_name (e.g., ?round)
o What functions exist?
- Vectors and data types (c(1,2,3), c("Berlin", "Barcelona"), NA)
- Subsetting vectors (explicitly, conditionally with logical operators)
o x[1] will give you the first value of a vector x (R counts positions from 1)
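A minimal sketch of the points above, using base R only. All variable names (`total`, `rounded_pi`, `numbers`, `cities`) are made up for illustration.

```r
# Use R as a calculator and store the result in a variable.
total <- 3 + 5 * 2

# == compares two things and returns TRUE or FALSE.
is_thirteen <- total == 13

# Call a function with an argument; ?round opens its help page.
rounded_pi <- round(3.14159, digits = 2)

# Vectors hold values of one data type; NA marks a missing value.
numbers <- c(1, 2, 3, NA)
cities <- c("Berlin", "Barcelona")

# Explicit subsetting by position.
first_city <- cities[1]

# Conditional subsetting with logical operators; !is.na() drops the NA.
large_numbers <- numbers[!is.na(numbers) & numbers > 1]

print(first_city)
print(large_numbers)
```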
Starting with data
- Individual values vs “holding data” (rows, columns)
- Difference between Excel & R? In R, all values in a column share the same data type.
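data$column_name shows data - $ allows us to access data that is stored in the columns of a
data frame; it returns the column's values as a vector.
data %>% select(column_name) also subsets/selects columns, in the more modern dplyr style; it
returns a data frame that contains only the selected columns.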
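The difference between the two ways of accessing a column can be seen on a small table. This is a sketch under assumptions: the data frame below is made up (only the track name "Pepas" and the column names come from the notes), and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for the streams data set.
streams <- tibble(
  `Track Name` = c("Pepas", "Song B", "Song C"),
  Artist = c("Artist A", "Artist B", "Artist B"),
  Streams = c(1500000, 2300000, 900000)
)

# $ returns the values of one column as a plain vector.
stream_values <- streams$Streams

# select() returns a data frame, even for a single column.
stream_column <- streams %>% select(Streams)

# select() can also keep several columns at once.
several <- streams %>% select(`Track Name`, Streams)

print(stream_values)
print(several)
```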
filter(): subsets rows on conditions
- Example: streams %>% filter(`Track Name` == "Pepas")
mutate(): create new columns by using information from other columns
- Example:
streams <- streams %>% mutate(weekly_streams = `Streams` * 7)
streams %>% select(`Track Name`, `Streams`, weekly_streams)
arrange(): sorts rows by a column, from smallest to largest by default
- Example: streams <- streams %>% arrange(Streams)
- To sort from largest to smallest:
o streams <- streams %>% arrange(-Streams)
o streams <- streams %>% arrange(desc(Streams))
group_by() and summarize(): create summary statistics on grouped data
- Example: streams %>% group_by(Artist) %>% summarize(total_streams = sum(Streams))
count(): count discrete values
- Example: streams %>% count(Streams>1E6)
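The verbs above can be tried together on a small table. This is a sketch, not the course's data: the rows below are made up (only the track name "Pepas" and the column names come from the examples), and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for the streams data set.
streams <- tibble(
  `Track Name` = c("Pepas", "Song B", "Song C", "Song D"),
  Artist = c("Artist A", "Artist B", "Artist B", "Artist C"),
  Streams = c(1500000, 2300000, 900000, 400000)
)

# filter(): keep only the rows that meet a condition.
pepas <- streams %>% filter(`Track Name` == "Pepas")

# mutate(): add a column computed from an existing one.
streams <- streams %>% mutate(weekly_streams = Streams * 7)

# arrange(): sort rows, here from largest to smallest.
sorted_desc <- streams %>% arrange(desc(Streams))

# group_by() + summarize(): one summary row per artist.
totals <- streams %>%
  group_by(Artist) %>%
  summarize(total_streams = sum(Streams))

# count(): number of rows for each value of the condition.
counts <- streams %>% count(Streams > 1E6)

print(sorted_desc)
print(totals)
print(counts)
```

Each step stores its result under a new name so that the intermediate tables can be inspected one at a time.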
Data Carpentry – R for Social Scientists Ch. 1 - 4
Why learn R?
- No pointing and clicking
o Results of your analysis rely on a series of written commands. Working with
scripts makes the steps you used in your analysis clear, and the code you write
can be inspected by someone else. It also forces you to have a deeper
understanding of what you are doing, and facilitates your learning and
comprehension of the methods you use.
- Reproducibility
o Reproducibility = when someone else can obtain the same results from the same
dataset when using the same analysis.
o If your manuscript is generated from your code, then when you collect more data or fix a
mistake in your dataset, the figures and statistical tests are updated automatically.
- Interdisciplinary and extensible