Summary chapter 1-3

Pages: 25
Uploaded on: 20-11-2025
Written in: 2025/2026

A summary of the main points of chapters 1 through 3 of the professor's course text. R code was marked in purple.

STATISTICS 5
CHAPTER 1: VISUALISATION

1.2 BACKGROUND

https://www.youtube.com/watch?v=FddF4b_GuTI&t=363s

Each row = observation
Each column = variable

glimpse(): gives you a glimpse of a dataset

?dataframe: opens the help page describing what is in the data frame (for datasets that come with a package)

nrow(): number of rows
ncol(): number of columns
dim(): dimensions (rows, columns)
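
A minimal sketch of these inspection functions. The built-in mtcars dataset is used as a stand-in (an assumption, it is not the course data). glimpse() comes from dplyr, which is loaded with the tidyverse.

```r
library(dplyr)  # provides glimpse()

# mtcars is a built-in R dataset, used here only as a stand-in
glimpse(mtcars)   # one line per column: name, type, first values
nrow(mtcars)      # number of rows (observations)
ncol(mtcars)      # number of columns (variables)
dim(mtcars)       # rows and columns together
# ?mtcars         # opens the help page describing the dataset
```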

Data visualization → ggplot2 (part of the tidyverse)
ggplot(data = dataframe, mapping = aes(x = …, y = …)) +
geom_point() (to present each observation as a point)

https://www.youtube.com/watch?v=s2NF2J36ljE&t=1s

ggplot: after y = …, you can add further aesthetics inside aes() to distinguish groups

- colour = …
- shape = …
- size = … (larger or smaller points)
- alpha = … (more or less transparent)

You can leave the argument names data = and mapping = out of the code

Mapping: determine the size, alpha … of points based on the values of a variable in the data → goes
into aes()
Setting: determine the size, alpha … of points NOT based on the values of a variable in the data →
goes into geom_*() (ex: geom_point())
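
The difference between mapping and setting can be sketched as follows. The mpg dataset that ships with ggplot2 is used as a stand-in, and the chosen variables and the colour "blue" are illustrative assumptions.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the course data here

# Mapping: colour depends on a variable, so it goes inside aes()
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# Setting: one fixed colour, size and transparency for all points,
# so the values go inside geom_point(), outside aes()
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "blue", size = 3, alpha = 0.5)

p_mapped  # coloured by class, with a legend
p_set     # all points blue, no legend
```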

1.4 THE ‘MISE EN PLACE’: SETTING UP

1. Create a working directory or folder (statistics V)
2. Save the data in that folder
3. Open and save an R script
4. Specify the working directory (use the top menu Session -> Set Working Directory -> To Source File Location)
5. Install and load the necessary packages
6. Read in the data
7. Check your data
a. read.csv(): gives a data frame
b. read_csv(): gives a tibble (the tidyverse version of a data frame)
c. You can also use glimpse()
1.5 VISUALISATION: LET’S START SIMPLE!

Data: every plot starts with ggplot() and a data frame or tibble → ex: ggplot(data = dfsatisf)

→ Gives an empty canvas

Mapping: tells the plot which columns in the data should be represented by which aspects of the
plot (x-axis, y-axis, line color, …) → ex: ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness))

Geoms: add layers onto the base plot → ex: ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() => gives a scatterplot

→ The + has to be at the end of the previous line, not at the start of the next line
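
The three building blocks can be tried one at a time. The course data is not available here, so the sketch simulates a stand-in for dfsatisf with the same column names. All values are made up.

```r
library(ggplot2)

# Stand-in for dfsatisf: simulated values, NOT the course data
set.seed(1)
dfsatisf <- data.frame(
  GDPpercap = exp(runif(60, log(500), log(60000))),
  Continent = rep(c("Africa", "Asia", "Europe"), times = c(25, 20, 15))
)
dfsatisf$Happiness <- 2 + 0.5 * log(dfsatisf$GDPpercap) + rnorm(60, sd = 0.5)

# Data only: an empty canvas
p_canvas <- ggplot(data = dfsatisf)

# Data + mapping: axes appear, but nothing is drawn yet
p_axes <- ggplot(data = dfsatisf,
                 mapping = aes(x = GDPpercap, y = Happiness))

# Data + mapping + geom: a scatterplot (note the + at the end of the line)
p_points <- p_axes +
  geom_point()

p_points
```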




Multiple geoms: you can add multiple layers; they are drawn in the order in which you add them → ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm) → the line is drawn on top of the dots

Assigning plots to objects: you can save the output of ggplot() in an object → ex:
point_first <- ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm)

Combining plots: use functions from the patchwork package → ex: point_first + line_first +
plot_layout(nrow = 1)

Saving plots: you can also save a plot as a file → ggsave("plot_name.png") (the file name needs an extension such as .png or .pdf; by default the last plot displayed is saved)
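
Assigning, combining and saving can be sketched together. The notes do not show how line_first is defined, so its definition below is an assumption, as are the output file name and the simulated stand-in for dfsatisf.

```r
library(ggplot2)
library(patchwork)

# Stand-in for dfsatisf: simulated values, NOT the course data
set.seed(1)
dfsatisf <- data.frame(GDPpercap = exp(runif(60, log(500), log(60000))))
dfsatisf$Happiness <- 2 + 0.5 * log(dfsatisf$GDPpercap) + rnorm(60, sd = 0.5)

# Store each plot in an object instead of printing it directly
point_first <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point()
line_first <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +  # assumed definition
  geom_smooth(method = lm, formula = y ~ x)

# patchwork: + puts plots side by side, plot_layout() controls the grid
combined <- point_first + line_first + plot_layout(nrow = 1)

# Save to a file; passing plot = avoids relying on the last plot displayed
out_file <- file.path(tempdir(), "combined_plot.pdf")
ggsave(out_file, plot = combined, width = 8, height = 4)
```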

Flexible smooth regression lines → ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = "gam")

1.6 CUSTOMIZING PLOTS

Styling geoms

- To change the overall style of a geom, set the arguments color, alpha, shape, size and
linetype
→ ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, linetype = 1)

→ To type ~ on a Mac keyboard: Option + N

Setting aesthetics overall versus by category

- If you want, for example, points to have different colors depending on the category they
belong to, set the argument color = inside the aes() function of the mapping
→ Ex: … color = Continent)) + …

Logarithmic axes

- Here, not differences, but ratios are meaningful → a logarithmically scaled axis is more meaningful
→ ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, color = rgb(0, .5, .8), linetype = 1) +
scale_x_continuous(trans = "log10")
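
On a log10 axis, equal distances correspond to equal ratios: the step from 1 000 to 10 000 is as wide as the step from 10 000 to 100 000. A small sketch using scale_x_log10(), the ggplot2 shorthand for a log10-transformed continuous x axis. Recent ggplot2 versions appear to prefer the argument name transform over trans inside scale_x_continuous(), so the shorthand avoids that difference. The data is a simulated stand-in.

```r
library(ggplot2)

# Stand-in for dfsatisf: simulated values, NOT the course data
set.seed(1)
dfsatisf <- data.frame(GDPpercap = exp(runif(60, log(500), log(60000))))
dfsatisf$Happiness <- 2 + 0.5 * log(dfsatisf$GDPpercap) + rnorm(60, sd = 0.5)

p_log <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
  scale_x_log10()  # x positions are computed on the log10 scale

p_log
```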

Separate lines per group

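- If color is mapped to Continent in the aes() of ggplot() and we delete the fixed argument color = … from the
geom_smooth() line, the smooth inherits that mapping and we get a regression line per continent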
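
A sketch of why this happens, on a simulated stand-in for dfsatisf (made-up values). A fixed color set inside geom_smooth() overrides the inherited color mapping for that layer, so the grouping by continent disappears there.

```r
library(ggplot2)

# Stand-in for dfsatisf: simulated values, NOT the course data
set.seed(1)
dfsatisf <- data.frame(
  GDPpercap = exp(runif(60, log(500), log(60000))),
  Continent = rep(c("Africa", "Asia", "Europe"), times = c(25, 20, 15))
)
dfsatisf$Happiness <- 2 + 0.5 * log(dfsatisf$GDPpercap) + rnorm(60, sd = 0.5)

base_plot <- ggplot(dfsatisf,
                    aes(x = GDPpercap, y = Happiness, color = Continent)) +
  geom_point()

# Fixed color in geom_smooth(): one overall regression line
p_single <- base_plot +
  geom_smooth(method = lm, formula = y ~ x, color = "black")

# No fixed color: the smooth inherits color = Continent, one line per group
p_groups <- base_plot +
  geom_smooth(method = lm, formula = y ~ x)

p_single
p_groups
```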

Format axes

- Functions to change the axis labels → scale_…
- You need to use a scale function that matches the data you're plotting → here continuous
o Ex: scale_x_continuous() and scale_y_continuous()
- name = "…" changes the axis label
- breaks = c(…) sets the major units and needs a vector of values
→ Add them at the end of the existing code, after a +
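
For instance, on a simulated stand-in for dfsatisf. The label texts and break values below are illustrative assumptions, not taken from the course text.

```r
library(ggplot2)

# Stand-in for dfsatisf: simulated values, NOT the course data
set.seed(1)
dfsatisf <- data.frame(GDPpercap = exp(runif(60, log(500), log(60000))))
dfsatisf$Happiness <- 2 + 0.5 * log(dfsatisf$GDPpercap) + rnorm(60, sd = 0.5)

p_axes <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  scale_x_continuous(name = "GDP per capita",              # axis label
                     breaks = c(0, 20000, 40000, 60000)) + # major units
  scale_y_continuous(name = "Happiness score",
                     breaks = 0:10)

p_axes
```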

Axis limits

- Change the minimum and maximum values shown on an axis → coord_cartesian()
- Ex: coord_cartesian(ylim = c(0, 10))
→ Also added at the end of the code
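
A short sketch on a simulated stand-in for dfsatisf (made-up values). coord_cartesian() zooms the view without discarding observations, so a regression line is still fitted on all the data.

```r
library(ggplot2)

# Stand-in for dfsatisf: simulated values, NOT the course data
set.seed(1)
dfsatisf <- data.frame(GDPpercap = exp(runif(60, log(500), log(60000))))
dfsatisf$Happiness <- 2 + 0.5 * log(dfsatisf$GDPpercap) + rnorm(60, sd = 0.5)

p_limits <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x) +
  coord_cartesian(ylim = c(0, 10))  # show the y-axis from 0 to 10

p_limits
```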

Making dots proportional to population size

- We insert a new aes() into geom_point() in which we specify the size
- Ex: … geom_point(aes(size = PopSize), alpha = 0.4, shape = 20) …
- We get small dots → scale_size_continuous(range = c(2, 15)) (the range indicates the minimum
and maximum size)
- If you want the legend to go → add: show.legend = FALSE (in this case only for the points, so
you have to add it to geom_point())
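
Put together, this could look as follows. The stand-in for dfsatisf, including the PopSize column, is simulated and does not contain real population figures.

```r
library(ggplot2)

# Stand-in for dfsatisf: simulated values, NOT the course data
set.seed(1)
dfsatisf <- data.frame(
  GDPpercap = exp(runif(60, log(500), log(60000))),
  PopSize   = round(exp(runif(60, log(1e5), log(1e9))))
)
dfsatisf$Happiness <- 2 + 0.5 * log(dfsatisf$GDPpercap) + rnorm(60, sd = 0.5)

p_size <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point(aes(size = PopSize),      # mapping: size follows PopSize
             alpha = 0.4, shape = 20,  # settings: same for all points
             show.legend = FALSE) +    # hide the legend of this layer
  scale_size_continuous(range = c(2, 15))  # smallest and largest point size

p_size
```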

Themes

- Ex: theme_minimal() and theme_bw()
- You add this at the end of the code
- Ex: change font size → theme_bw(base_size = 12)
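
A minimal example, with the mpg dataset from ggplot2 as a stand-in (an assumption). Note that the argument is spelled base_size, with an underscore.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the course data here
p_theme <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  theme_bw(base_size = 12)  # black-and-white theme, base font size 12

p_theme
```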

Adding text to the plot → see online example

- Labels = …

1.7 OTHER PLOT TYPES

See ggplot2 cheat sheet

Bar plot: counting categories

- geom_bar() → you only need to provide x = …
- Ex: ggplot(dfsatisf, aes(x = Continent)) + geom_bar()

Column plot

1. Use count() to make a table with a row for each continent and a column called n with the
number of observations in each continent
→ count_data <- count(dfsatisf, Continent)
2. For a column plot you need to set both x and y
→ ggplot(count_data, aes(x = Continent, y = n)) + geom_col()
→ This gives the same result as the bar plot
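
Both routes can be compared directly. The stand-in for dfsatisf is simulated, and the group sizes are made up.

```r
library(ggplot2)
library(dplyr)  # count()

# Stand-in for dfsatisf: made-up group sizes, NOT the course data
dfsatisf <- data.frame(
  Continent = rep(c("Africa", "Asia", "Europe"), times = c(25, 20, 15))
)

# Bar plot: geom_bar() counts the rows per category itself
p_bar <- ggplot(dfsatisf, aes(x = Continent)) +
  geom_bar()

# Column plot: count first, then plot the pre-computed heights
count_data <- count(dfsatisf, Continent)
p_col <- ggplot(count_data, aes(x = Continent, y = n)) +
  geom_col()

p_bar
p_col
```

geom_bar() is convenient for raw data, while geom_col() fits when the heights have already been computed, for example in a summary table.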

lunathijssen Katholieke Universiteit Leuven