Summary of chapters 1-3

Pages: 25
Uploaded on: 20-11-2025
Written in: 2025/2026

A summary of the most important points of chapters 1 through 3 of the professor's course text. R code was marked in purple.

STATISTICS 5
CHAPTER 1: VISUALISATION

1.2 BACKGROUND

https://www.youtube.com/watch?v=FddF4b_GuTI&t=363s

Each row = observation
Each column = variable

glimpse(): gives a compact overview of a dataset (column names, types and first values)

?dataframe: opens the help page that describes what is in the data frame (for datasets that come with documentation)

nrow(): number of rows
ncol(): number of columns
dim(): dimensions (rows, columns)
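
The inspection functions above can be tried on any data frame. The sketch below uses a small made-up data frame; the name dfsatisf and its column names are taken from later in these notes, but the values are invented purely for illustration. It assumes the dplyr package is installed.

```r
library(dplyr)  # provides glimpse()

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  Country   = c("A", "B", "C", "D", "E", "F"),
  Continent = c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa"),
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.8, 5.6, 5.1, 4.4, 4.0),
  PopSize   = c(10, 60, 1400, 270, 45, 200)
)

glimpse(dfsatisf)  # one line per column: name, type, first values
nrow(dfsatisf)     # number of rows (observations)
ncol(dfsatisf)     # number of columns (variables)
dim(dfsatisf)      # rows first, then columns
```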

Data visualization -> ggplot2 (part of the tidyverse)
ggplot(data = dataframe, mapping = aes(x = …, y = …))
+ geom_point() (to present the data as points)

https://www.youtube.com/watch?v=s2NF2J36ljE&t=1s

ggplot: after y = …, you can add further arguments inside aes() to distinguish groups

- colour = …
- shape = …
- size = … (larger or smaller points)
- alpha = … (more or less transparent)

You can leave the argument names data = and mapping = out of the code

Mapping: determine the size, alpha … of points based on the values of a variable in the data -> goes
into aes()
Setting: determine the size, alpha … of points NOT based on the values of a variable in the data ->
goes into geom_*() (ex: geom_point())
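
To make the difference between mapping and setting concrete, here is a minimal sketch. It assumes ggplot2 is installed and uses a made-up data frame (the values are invented; only the column names follow these notes).

```r
library(ggplot2)

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  Continent = c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa"),
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.8, 5.6, 5.1, 4.4, 4.0)
)

# Mapping: the colour depends on a variable, so it goes inside aes()
p_mapped <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness, color = Continent)) +
  geom_point()

# Setting: every point gets the same fixed colour and transparency,
# so the values go inside geom_point(), outside aes()
p_set <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point(color = "dodgerblue", alpha = 0.4)
```

In p_mapped a legend for Continent appears automatically; in p_set there is no legend because no variable is involved.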

1.4 THE ‘MISE EN PLACE’: SETTING UP

1. Step 1: create a working directory or folder (statistics V)
2. Step 2: save the data in that folder
3. Step 3: open and save an R script
4. Step 4: specify the working directory (use the top menu Session -> Set Working Directory -> To Source File Location)
5. Step 5: install and load the necessary packages
6. Step 6: read in the data
7. Step 7: check your data
a. read.csv(): gives a data frame
b. read_csv(): gives a tibble (the tidyverse version of a data frame)
c. You can also use glimpse()
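
Steps 5 to 7 can be sketched as below. This is only an illustration: the CSV file is a tiny made-up stand-in written to a temporary location, because the real course file and its name are not given in these notes. It assumes the readr and dplyr packages (both part of the tidyverse) are installed.

```r
# install.packages("tidyverse")  # step 5, run once; left commented out here
library(readr)  # read_csv()
library(dplyr)  # glimpse()

# Stand-in for steps 1-2: write a tiny invented CSV file to a temporary path
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Country,GDPpercap,Happiness",
    "A,45000,7.2",
    "B,12000,5.6"),
  csv_path
)

# Step 6: read in the data, once with base R and once with readr
df_base <- read.csv(csv_path)                          # plain data frame
df_tidy <- read_csv(csv_path, show_col_types = FALSE)  # tibble

# Step 7: check the data
class(df_base)
class(df_tidy)
glimpse(df_tidy)
```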

1.5 VISUALISATION: LET’S START SIMPLE!

Data: every plot starts with ggplot() and a tibble -> ex: ggplot(data = dfsatisf)

-> Gives an empty canvas

Mapping: tells the plot which columns in the data should be represented by different aspects of the
plot, the x-axis, y-axis, line color… -> ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness))

Geoms: to add layers onto the base plot -> ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() => gives a scatterplot

-> The + has to be at the end of the previous line, not at the start of the next line




Multiple geoms: you can add multiple layers; they are drawn in the order in which you add them -> ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm) -> line on top of the dots

Assigning plots to objects: you can save the output of ggplot() -> ex:
point_first <- ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm)
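
The build-up from empty canvas to layered plot can be run step by step as in the sketch below. It assumes ggplot2 is installed and uses made-up values for dfsatisf; the object names canvas and axes are only illustrative.

```r
library(ggplot2)

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.8, 5.6, 5.1, 4.4, 4.0)
)

# Data only: an empty canvas
canvas <- ggplot(data = dfsatisf)

# Data + mapping: axes appear, but nothing is drawn yet
axes <- ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness))

# Data + mapping + geoms: points first, regression line drawn on top.
# Note that each + sits at the END of a line.
point_first <- axes +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x)

# A plot stored in an object is only shown when it is printed
print(point_first)
```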

Combining plots: using functions from the patchwork package -> ex:
point_first + line_first + plot_layout(nrow = 1)

Saving plots: you can also save a plot as a file -> ggsave("plot name.png") (the file extension determines the file type)
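
Combining and saving can be sketched as follows. The notes do not show how line_first is built, so the version here (a plot with only a regression line) is an assumption, as are the toy values and the output file name. It assumes ggplot2 and patchwork are installed; the file is written to a temporary folder so nothing in the working directory is touched.

```r
library(ggplot2)
library(patchwork)  # lets you combine plots with +

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.8, 5.6, 5.1, 4.4, 4.0)
)

point_first <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point()

# Assumed definition: the notes only mention the name line_first
line_first <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_smooth(method = lm, formula = y ~ x)

# Place both plots next to each other in one row
combined <- point_first + line_first + plot_layout(nrow = 1)

# Save to a file; the extension (.pdf here) sets the file type
out_file <- file.path(tempdir(), "combined_plot.pdf")
ggsave(out_file, plot = combined, width = 8, height = 4)
```

Without the plot = argument, ggsave() saves the last plot that was displayed.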

Flexible smooth regression lines ->
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = "gam")

1.6 CUSTOMIZING PLOTS

Styling geoms

- To change the overall style of a geom, you can set the arguments color, alpha, shape, size and
linetype
-> ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, linetype = 1)

-> to type ~ on a Mac keyboard: Option + N

Setting aesthetics overall versus by category

- If you want, for example, points to have different colors depending on which category they
belong to, you set the argument color = inside the aes() function for the mapping
-> Ex: … color = Continent)) + …

Logarithmic axes

- Here, ratios rather than differences are meaningful -> a logarithmically scaled axis is more informative
-> ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, color = rgb(0, .5, .8), linetype = 1) +
scale_x_continuous(trans = "log10")
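
Why a log axis suits ratios can be checked with a little arithmetic: on a log10 scale, every tenfold increase takes up the same distance. The sketch below shows this and then builds the plot with scale_x_log10(), which is ggplot2's shorthand for a continuous x scale with a log10 transformation. The toy values are invented, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Each step is a tenfold increase, so the log10 distances are equal
gdp_steps <- c(1000, 10000, 100000)
diff(gdp_steps)         # raw differences are very unequal
diff(log10(gdp_steps))  # log10 differences are identical

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.8, 5.6, 5.1, 4.4, 4.0)
)

p_log <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
  geom_smooth(method = lm, formula = y ~ x, linetype = 1) +
  scale_x_log10()  # shorthand for a log10-transformed continuous x axis
```

The regression line is fitted on the transformed x values, so it describes Happiness as a function of log10(GDPpercap).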

Separate lines per group

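- If we delete the argument color = … from the geom_smooth() line, we get a regression line per
continent. This assumes color is mapped to the continent variable inside aes() in ggplot(), so that geom_smooth() inherits it.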

Format axes

- Functions to change the axis labels -> scale_ …
- You need to use a scale function that matches the data you're plotting -> here continuous
o Ex: scale_x_continuous() and scale_y_continuous()
- name = "…" changes the axis label
- breaks = c() sets the positions of the major tick marks and takes a vector of values
-> Add them with + at the end of the other lines of code

Axis limits

- Change the minimum and maximum values on an axis -> coord_cartesian()
- Ex: coord_cartesian(ylim = c(0, 10))
-> Also added at the end of the code
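
Axis labels, breaks and limits can be combined as in the sketch below. The label texts and the toy values are invented for illustration, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.8, 5.6, 5.1, 4.4, 4.0)
)

p_axes <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  # name sets the axis label (label texts here are made up)
  scale_x_continuous(name = "GDP per capita") +
  # breaks sets where the major tick marks are placed
  scale_y_continuous(name = "Happiness score", breaks = c(0, 2, 4, 6, 8, 10)) +
  # zoom the y-axis to the range 0-10
  coord_cartesian(ylim = c(0, 10))
```

coord_cartesian() only zooms the view: all observations are kept, so statistics such as regression lines are still computed on the full data.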

Making dots proportional to population size

- We insert a new aes() into geom_point() in which we specify the size
- Ex: … geom_point(aes(size = PopSize), alpha = 0.4, shape = 20) …
- The dots turn out small -> add scale_size_continuous(range = c(2, 15)) (the range indicates the minimum
and maximum size)
- If you want to remove the legend -> add: show.legend = FALSE (in this case only for the points, so
you have to add it to geom_point())
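
Put together, the size mapping looks like the sketch below. The toy values (including PopSize) are invented, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  GDPpercap = c(45000, 38000, 12000, 9000, 2500, 1800),
  Happiness = c(7.2, 6.8, 5.6, 5.1, 4.4, 4.0),
  PopSize   = c(10, 60, 1400, 270, 45, 200)
)

p_size <- ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
  # size is mapped to a variable, so it goes inside aes();
  # alpha and shape are fixed settings, so they stay outside aes();
  # show.legend = FALSE hides the legend for this layer only
  geom_point(aes(size = PopSize), alpha = 0.4, shape = 20, show.legend = FALSE) +
  # smallest population gets size 2, largest gets size 15
  scale_size_continuous(range = c(2, 15))
```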

Themes

- Ex: theme_minimal() and theme_bw()
- You add this at the end of the plot code
- Ex: change font size -> theme_bw(base_size = 12)

Adding text to the plot -> see online example

- Labels = …

1.7 OTHER PLOT TYPES

See ggplot2 cheat sheet

Bar plot: counting categories

- geom_bar() -> you only need to provide x = … (the counts are computed for you)
- Ex: ggplot(dfsatisf, aes(x = Continent)) + geom_bar()

Column plot

1. Use count() to make a table with a row for each continent and a column called n with the
number of observations in each continent
-> count_data <- count(dfsatisf, Continent)
2. For a column plot you need to set both x and y
-> ggplot(count_data, aes(x = Continent, y = n)) + geom_col()
-> This gives the same result as the bar plot
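
That the two routes give the same bars can be checked with the sketch below. The toy values are invented; ggplot2 and dplyr are assumed to be installed, and the object names bar_plot and col_plot are only illustrative.

```r
library(ggplot2)
library(dplyr)  # provides count()

# Hypothetical toy data: values are invented, not the course data
dfsatisf <- data.frame(
  Country   = c("A", "B", "C", "D", "E", "F"),
  Continent = c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa")
)

# Bar plot: geom_bar() counts the rows per category itself
bar_plot <- ggplot(dfsatisf, aes(x = Continent)) +
  geom_bar()

# Column plot: count first, then plot the counts as given
count_data <- count(dfsatisf, Continent)  # columns: Continent, n
col_plot <- ggplot(count_data, aes(x = Continent, y = n)) +
  geom_col()

count_data
```

geom_col() is the natural choice when the data already contain one summarised value per category.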

Seller: lunathijssen (Katholieke Universiteit Leuven)