Summary Notes - Skills: Data Preparation and Workflow Management (dPrep)

Uploaded on: 03-02-2025

Written in: 2024/2025

Notes on the DPREP course from all lectures, tutorials, and assigned readings.

Written for

Institution: Tilburg University (UVT)
Program: Marketing Analytics
Course: Skills: Data Preparation and Workflow Management

Document information

Uploaded on: 3 February 2025
Number of pages: 22
Written in: 2024/2025
Type: Summary

Topics

rstudio
data

Preview of the content

Contents
Lecture 1
Tutorial 1
Data Carpentry – R for Social Scientists Ch. 1 - 4
A Guide to Scrum for Researchers
Lecture 2
Data Carpentry – R for Social Scientists Ch. 7
Tutorial 3
Principles of Project Setup and Workflow Management
Lecture 4
Tutorial 4
Data preparation process and challenges
Lecture 5
Tutorial 5
Use Makefiles to Manage, Automate, and Reproduce Projects
This is why you should automate the pipeline of your research project
Four steps to automate & make reproducible your empirical research project

Lecture 1

What’s efficient?

- Develop and prototype the “final pipeline” of your project, refine later.
- Reduce setup costs to return to a project (e.g., finding relevant files, getting your code to
work)
- Reduce coding mistakes (or catch them at all)
- Rotate and collaborate on tasks (e.g., team members taking over)
- Share/reuse code (e.g., in so-called “packages” or “libraries”)
- Receive feedback from others

Tutorial 1

RStudio = graphical user interface to R

Why learn R?

- No pointing and clicking
- Great for reproducibility
- Interdisciplinary & extensible
o It has packages from many different disciplines
- Supports all data formats
- Can produce high-quality graphics
- Has a large community
- It’s free, open-source, and cross-platform
- Works even in the cloud!

Working directory = the directory that R (or your terminal) is currently operating in; relative
paths are resolved from it.

Absolute path = a path that starts from the root directory, providing the complete location
of a file or folder.

Relative path = a path given relative to one main directory (usually the working directory)
instead of the root directory.

- Use ‘../’ to go “up” one directory.

Create a directory in the terminal with: mkdir data # continue w/ other directories
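
The path concepts above can be tried out from within R. The following is only a sketch: the folder name `my_project` and the use of a temporary directory are assumptions made for illustration, not part of the course material.

```r
# Absolute path of the current working directory.
wd <- getwd()

# Hypothetical project folder inside R's temporary directory (assumption for this sketch).
project_dir <- file.path(tempdir(), "my_project")

# R equivalent of `mkdir data`, created inside the project folder.
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Switch the working directory; setwd() returns the previous one.
old_wd <- setwd(project_dir)

rel_exists <- dir.exists("data")   # relative path, resolved from project_dir
up_one <- normalizePath("..")      # "../" points one directory up

# Restore the original working directory.
setwd(old_wd)
```

Restoring the working directory at the end keeps the sketch from affecting the rest of a session.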

Using R:

- As a calculator
- Variable assignment
- = vs. ==
o == compares two things
o = assigns a value to a variable (as does <-)
- Functions
o Using functions (e.g., round())
o Getting help on functions
▪ ?function (e.g., ?round)
o What functions exist?
- Vectors and data types (c(1,2,3), c("Berlin", "Barcelona"), NA)
- Subsetting vectors (explicitly, conditionally with logical operators)
o [1] will give you the first value of a vector
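
The basics listed above fit in a few lines of base R. This is a minimal sketch; the variable names are chosen for illustration only.

```r
# R as a calculator: multiplication binds tighter than addition.
calc <- 3 + 5 * 2

# Assignment vs. comparison.
x <- 10               # assigns 10 to x (x = 10 would do the same)
is_ten <- x == 10     # compares x with 10 and yields a logical value

# Using a function; `?round` opens its help page in an interactive session.
rounded <- round(3.14159, digits = 2)

# Vectors and data types.
numbers <- c(1, 2, 3)                  # numeric vector
cities  <- c("Berlin", "Barcelona")    # character vector
with_na <- c(4, NA, 6)                 # NA marks a missing value

# Subsetting: explicitly by position, or conditionally with a logical test.
first_city  <- cities[1]
big_numbers <- numbers[numbers > 1]

# Many functions need to be told to skip missing values.
mean_no_na <- mean(with_na, na.rm = TRUE)
```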

Starting with data

- Individual values vs “holding data” (rows, columns)
- Difference between Excel & R? In R, each column holds the same data type.

data$column_name shows the data in one column: $ gives access to data stored in the columns
of a data frame.

data %>% select(column_name) does the same thing in the more modern dplyr style. It
subsets/selects columns.
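
To make the two ways of accessing a column concrete, here is a small sketch. It assumes the dplyr package is installed; the `streams` table and its values are made up for illustration and only mimic the column names used in these notes.

```r
library(dplyr)

# Toy data (invented values); check.names = FALSE keeps the space in `Track Name`.
streams <- data.frame(
  `Track Name` = c("Pepas", "Song B", "Song C"),
  Artist = c("Artist A", "Artist X", "Artist X"),
  Streams = c(1500000, 800000, 1200000),
  check.names = FALSE
)

# `$` returns the column as a plain vector.
artist_vector <- streams$Artist

# select() returns a table that contains only the chosen column(s).
artist_table <- streams %>% select(Artist)
```

Note the difference in the result: `$` drops the table structure, while select() keeps it, which makes it easier to chain further steps with %>%.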

filter(): subsets rows on conditions

- Example: streams %>% filter(`Track Name` == "Pepas")

mutate(): create new columns by using information from other columns

- Example: streams <- streams %>% mutate(weekly_streams = `Streams` * 7)
  streams %>% select(`Track Name`, `Streams`, weekly_streams)
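
The filter() and mutate() examples above can be run on a toy table. This sketch assumes dplyr is installed; the data values are invented for illustration.

```r
library(dplyr)

# Toy data (invented values) with the column names used in the notes.
streams <- data.frame(
  `Track Name` = c("Pepas", "Song B", "Song C"),
  Artist = c("Artist A", "Artist X", "Artist X"),
  Streams = c(1500000, 800000, 1200000),
  check.names = FALSE
)

# filter(): keep only the rows that satisfy the condition.
pepas <- streams %>% filter(`Track Name` == "Pepas")

# mutate(): add a column computed from an existing one.
streams <- streams %>% mutate(weekly_streams = Streams * 7)

# Inspect the new column next to the ones it was derived from.
preview <- streams %>% select(`Track Name`, Streams, weekly_streams)
```

Column names that contain spaces, such as `Track Name`, must be wrapped in backticks; plain names such as Streams do not need them.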

arrange(): sort rows by the values of a column, from smallest to largest by default

- Example: streams <- streams %>% arrange(Streams)
- To sort from largest to smallest:
o streams <- streams %>% arrange(-Streams)
o streams <- streams %>% arrange(desc(Streams))
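
A short sketch of both sort directions, assuming dplyr is installed and using invented data. Note that arrange() takes the name of a column (here Streams), not the name of the table.

```r
library(dplyr)

# Toy data (invented values).
streams <- data.frame(
  `Track Name` = c("Pepas", "Song B", "Song C"),
  Streams = c(1500000, 800000, 1200000),
  check.names = FALSE
)

# Smallest to largest (the default).
ascending <- streams %>% arrange(Streams)

# Largest to smallest; desc() also works for non-numeric columns,
# whereas the minus sign only works for numeric ones.
descending <- streams %>% arrange(desc(Streams))
```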

group_by() and summarize(): create summary statistics on grouped data

- Example: streams %>% group_by(Artist) %>% summarize(total_streams = sum(Streams))

count(): count how often each discrete value (or each outcome of a condition) occurs

- Example: streams %>% count(Streams > 1E6)
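
The grouping and counting examples above can be sketched as follows, again assuming dplyr is installed and using invented data.

```r
library(dplyr)

# Toy data (invented values).
streams <- data.frame(
  Artist = c("Artist A", "Artist X", "Artist X"),
  Streams = c(1500000, 800000, 1200000)
)

# One row per artist, with the sum of that artist's streams.
totals <- streams %>%
  group_by(Artist) %>%
  summarize(total_streams = sum(Streams))

# count() on a condition: one row for FALSE and one for TRUE,
# with the number of rows in column `n`.
over_million <- streams %>% count(Streams > 1e6)
```

In `over_million`, the first column holds the outcome of the condition and `n` holds how many rows had that outcome.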

Data Carpentry – R for Social Scientists Ch. 1 - 4

Why learn R?

- No pointing and clicking
o Results of your analysis rely on a series of written commands. Working with
scripts makes the steps you used in your analysis clear, and the code you write
can be inspected by someone else. It also forces you to have a deeper
understanding of what you are doing, and facilitates your learning and
comprehension of the methods you use.
- Reproducibility
o Reproducibility = when someone else can obtain the same results from the same
dataset when using the same analysis.
o If you collect more data, or fix a mistake in your dataset, your figures and the
statistical tests in your manuscript are updated automatically.
- Interdisciplinary and extensible
