
Summary chapters 1-3

Pages: 25
Published: 20-11-2025
Written in: 2025/2026

A summary of the most important points from chapters 1 to 3 of the professor's course text. R code was marked in purple.

STATISTICS 5
CHAPTER 1: VISUALISATION

1.2 BACKGROUND

https://www.youtube.com/watch?v=FddF4b_GuTI&t=363s

Each row = observation
Each column = variable

glimpse(): gives you a glimpse of a dataset (column names, types and first values)

?dataframe: opens the help page that describes what is in the data frame

nrow(): number of rows
ncol(): number of columns
dim(): dimensions (rows, columns)
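
As a minimal sketch of these inspection functions, the snippet below builds a small made-up data frame and inspects it. The name `dfsatisf` is borrowed from later in these notes, the values are invented for illustration, and `glimpse()` is assumed to be loaded through dplyr.

```r
library(dplyr)  # glimpse() becomes available after loading dplyr (part of the tidyverse)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  Country   = c("A", "B", "C", "D", "E", "F"),
  Continent = c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa"),
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.9, 5.8, 5.4, 4.6, 4.3)
)

glimpse(dfsatisf)          # one line per column: name, type, first values
n_rows <- nrow(dfsatisf)   # number of observations (rows)
n_cols <- ncol(dfsatisf)   # number of variables (columns)
dims   <- dim(dfsatisf)    # both at once: rows first, then columns
```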

Data visualization -> ggplot2 (part of the tidyverse)
ggplot(data = dataframe, mapping = aes(x = …, y = …)) +
geom_point() (to present each observation as a point)

https://www.youtube.com/watch?v=s2NF2J36ljE&t=1s

ggplot: after y = …, you can add further aesthetics inside aes() to distinguish groups

- colour = …
- shape = …
- size = … (larger or smaller points)
- alpha = … (more or less transparent)

You can leave data = and mapping = out of the code, as long as the arguments stay in that order

Mapping: the size, alpha … of the points is determined by the values of a variable in the data -> goes
into aes()
Setting: the size, alpha … of the points is NOT determined by the values of a variable in the data ->
goes into geom_*() (ex: geom_point())
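
The difference between mapping and setting can be sketched as follows. The data frame is a made-up stand-in (invented values) and the object names `p_mapped` and `p_set` are only for illustration.

```r
library(ggplot2)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  Continent = c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa"),
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.9, 5.8, 5.4, 4.6, 4.3)
)

# Mapping: the colour depends on a variable in the data, so it goes inside aes()
p_mapped <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness, colour = Continent)) +
  geom_point()

# Setting: one fixed colour, size and transparency for all points,
# so these go inside geom_point() and outside aes()
p_set <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point(colour = "dodgerblue", size = 3, alpha = 0.5)
```

Printing `p_mapped` should give one colour per continent plus a legend, while `p_set` draws every point in the same colour and needs no legend.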

1.4 THE ‘MISE EN PLACE’: SETTING UP

1. Create a working directory or folder (Statistics V)
2. Save the data in that folder
3. Open and save an R script
4. Specify the working directory (use the top menu Session -> Set Working Directory -> To Source File Location)
5. Install and load the necessary packages
6. Read in the data
7. Check your data
a. read.csv(): gives a data frame
b. read_csv(): gives a tibble (the tidyverse version of a data frame)
c. You can also use glimpse()
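
Steps 5 to 7 could look roughly like the sketch below. The file name `satisfaction.csv` and its contents are hypothetical: the snippet writes a tiny CSV to a temporary folder first so that it runs on its own, whereas in practice the file would already sit in your working directory.

```r
# install.packages("tidyverse")  # step 5, run once if the packages are not installed yet
library(readr)   # read_csv()
library(dplyr)   # glimpse()

# Hypothetical data file, created here only so the example is self-contained
csv_path <- file.path(tempdir(), "satisfaction.csv")
write.csv(
  data.frame(Country = c("A", "B"), Happiness = c(7.2, 5.4)),
  csv_path,
  row.names = FALSE
)

# Step 6: read in the data
df_base <- read.csv(csv_path)                          # base R: plain data frame
df_tidy <- read_csv(csv_path, show_col_types = FALSE)  # readr: tibble

# Step 7: check the data
glimpse(df_tidy)
```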

1.5 VISUALISATION: LET’S START SIMPLE!

Data: every plot starts with ggplot() and a tibble -> ex: ggplot(data = dfsatisf)

-> Gives an empty canvas

Mapping: tells the plot which columns in the data should be represented by which aspects of the
plot: the x-axis, y-axis, line color… -> ex: ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y =
Happiness))

Geoms: add layers onto the base plot -> ex: ggplot(data = dfsatisf, mapping = aes(x = GDPpercap,
y = Happiness)) +
geom_point() => gives a scatterplot

-> The + has to be at the end of the previous line, not at the start of the next line




Multiple geoms: you can add multiple layers; they are drawn in the order in which you add them -> ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm) -> line on top of the dots
Assigning plots to objects: you can save the output of ggplot() -> ex: point_first <- ggplot(data =
dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm)

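Combining plots: using functions from the patchwork package -> ex: point_first + line_first +
plot_layout(nrow = 1)

Saving plots: you can also save a plot as a file -> ggsave("plot_name.png") (the file extension determines the format; by default the last plot displayed is saved)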
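
A minimal sketch of combining and saving plots, assuming the patchwork package is installed. `line_first` is not defined in these notes, so here it is assumed to be a plot that shows only the regression line. The data values, the output file name and the width and height are likewise invented for illustration.

```r
library(ggplot2)
library(patchwork)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.9, 5.8, 5.4, 4.6, 4.3)
)

point_first <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point()

# Assumption: line_first is a plot with only the regression line
line_first <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_smooth(method = lm, formula = y ~ x)

# patchwork: + places plots next to each other, plot_layout() controls the grid
combined <- point_first + line_first + plot_layout(nrow = 1)

# Save to a temporary folder; the .pdf extension selects the file format
out_file <- file.path(tempdir(), "combined_plot.pdf")
ggsave(out_file, plot = combined, width = 8, height = 4)
```

Passing `plot = combined` explicitly avoids relying on whichever plot happened to be displayed last.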

Flexible smooth regression lines -> ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y =
Happiness)) +
geom_point() +
geom_smooth(method = "gam")

1.6 CUSTOMIZING PLOTS

Styling geoms

- To change the overall style of a geom, you can set the arguments color, alpha, shape, size and
linetype
-> ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, linetype = 1)

-> To type ~ on a Mac keyboard: Option + N

Setting aesthetics overall versus by category

- If you want, for example, points to have different colors depending on which category they
belong to, set the argument color = inside the aes() function of the mapping
-> Ex: … color = Continent)) + …

Logarithmic axes

- Here, ratios rather than differences are meaningful -> a logarithmically scaled axis is more informative
-> ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, color = rgb(0, .5, .8), linetype = 1) +
scale_x_continuous(trans = "log10")
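
A short sketch of a logarithmic x-axis on made-up data (invented values). It uses `scale_x_log10()`, the ggplot2 shorthand for a continuous x scale with a log10 transformation, instead of the `trans = "log10"` argument shown above.

```r
library(ggplot2)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.9, 5.8, 5.4, 4.6, 4.3)
)

# Equal distances on the x-axis now correspond to equal ratios of GDP per capita
p_log <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
  geom_smooth(method = lm, formula = y ~ x) +
  scale_x_log10()
```

Because the scale transforms the data before the statistics are computed, the regression line here is fitted to log10(GDPpercap) rather than to the raw values.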

Separate lines per group

- If color = Continent is mapped in aes() and we delete the argument color = … from the geom_smooth() line, we get a separate regression line per
continent
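
This can be sketched on made-up data (invented values, three countries per continent). The argument `se = FALSE` is an addition for this tiny example, to suppress confidence bands that would be very wide with so few points.

```r
library(ggplot2)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  Continent = rep(c("Europe", "Asia", "Africa"), each = 3),
  GDPpercap = c(45000, 38000, 30000, 12000, 9000, 15000, 2500, 1800, 4000),
  Happiness = c(7.2, 6.9, 6.5, 5.8, 5.4, 6.0, 4.6, 4.3, 4.9)
)

base_plot <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness, color = Continent)) +
  geom_point()

# A fixed color set in geom_smooth() overrides the mapping: one overall line
p_single <- base_plot +
  geom_smooth(method = lm, formula = y ~ x, color = rgb(0, .5, .8), se = FALSE)

# Without the fixed color, the color = Continent mapping is inherited: one line per continent
p_per_group <- base_plot +
  geom_smooth(method = lm, formula = y ~ x, se = FALSE)
```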

Format axes

- Functions to change the axes -> scale_…
- You need to use a scale function that matches the data you're plotting -> here continuous
o Ex: scale_x_continuous() and scale_y_continuous()
- name = "…" changes the axis label
- breaks = c(…) sets the major tick marks and needs a vector of values
-> Add them at the end of the plot code, joined with +

Axis limits

- Change the minimum and maximum values shown on an axis -> coord_cartesian()
- Ex: coord_cartesian(ylim = c(0, 10))
-> Also added at the end of the code
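
Axis formatting and axis limits can be combined as in the sketch below. The data values, the axis labels and the break positions are invented for illustration.

```r
library(ggplot2)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.9, 5.8, 5.4, 4.6, 4.3)
)

p_axes <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  # name sets the axis label, breaks sets where the major tick marks go
  scale_x_continuous(name = "GDP per capita", breaks = c(0, 10000, 20000, 30000, 40000, 50000)) +
  scale_y_continuous(name = "Happiness score", breaks = c(0, 2, 4, 6, 8, 10)) +
  # zoom the y-axis to the range 0 to 10 without dropping any observations
  coord_cartesian(ylim = c(0, 10))
```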

Making dots proportional to population size

- We insert a new aes() into geom_point() in which we specify the size
- Ex: … geom_point(aes(size = PopSize), alpha = 0.4, shape = 20) …
- By default the dots are small -> scale_size_continuous(range = c(2, 15)) (the range gives the minimum
and maximum point size)
- If you want to hide the legend -> add show.legend = FALSE (here only for the points, so
you add it inside geom_point())
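
Put together, the points above might look like this sketch. The data values, including the `PopSize` column, are invented for illustration.

```r
library(ggplot2)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.9, 5.8, 5.4, 4.6, 4.3),
  PopSize   = c(10, 60, 120, 30, 45, 20)
)

p_size <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  # size is mapped to a variable (inside aes()); alpha and shape are set for all points;
  # show.legend = FALSE hides the legend for this layer only
  geom_point(aes(size = PopSize), alpha = 0.4, shape = 20, show.legend = FALSE) +
  # smallest population gets point size 2, largest gets point size 15
  scale_size_continuous(range = c(2, 15))
```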

Themes

- Ex: theme_minimal() and theme_bw()
- You add this at the end of the plot code
- Ex: change font size -> theme_bw(base_size = 12)
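
A brief sketch of adding a theme, on made-up data (invented values). Note that the argument is spelled `base_size`, with an underscore.

```r
library(ggplot2)

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.9, 5.8, 5.4, 4.6, 4.3)
)

# The theme is added as the last element of the plot code
p_theme <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  theme_bw(base_size = 12)   # black-and-white theme with base font size 12
```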

Adding text to the plot -> see online example

- label = …

1.7 OTHER PLOT TYPES

See ggplot2 cheat sheet

Bar plot: counting categories

- geom_bar() -> you only need to provide x = …
- Ex: ggplot(dfsatisf, aes(x = Continent)) + geom_bar()

Column plot

1. Use count() to make a table with a row for each continent and a column called n with the
number of observations in each continent
-> count_data <- count(dfsatisf, Continent)
2. For a column plot you need to set both x and y
-> ggplot(count_data, aes(x = Continent, y = n)) + geom_col()
-> This gives the same result as the bar plot
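
The equivalence of the two approaches can be sketched on made-up data (invented values). `count()` is assumed to come from dplyr.

```r
library(ggplot2)
library(dplyr)  # count()

# Made-up stand-in for the course data; all values are invented for illustration
dfsatisf <- data.frame(
  Country   = c("A", "B", "C", "D", "E", "F"),
  Continent = c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa")
)

# Bar plot: geom_bar() counts the rows per category itself, so only x is needed
p_bar <- ggplot(dfsatisf, aes(x = Continent)) +
  geom_bar()

# Column plot: count first, then plot the precomputed counts with x and y
count_data <- count(dfsatisf, Continent)   # one row per continent, counts in column n
p_col <- ggplot(count_data, aes(x = Continent, y = n)) +
  geom_col()
```

`geom_col()` is the natural choice when the heights are already available in the data, for example means or totals computed beforehand.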