STATISTICS 5
CHAPTER 1: VISUALISATION
1.2 BACKGROUND
https://www.youtube.com/watch?v=FddF4b_GuTI&t=363s
Each row = observation
Each column = variable
glimpse(): gives you a compact overview (a glimpse) of a dataset
?dataframe: opens the help page describing what’s in the data frame (for datasets that come with R or a package)
nrow(): number of rows
ncol(): number of columns
dim(): dimensions (rows, then columns)
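A minimal sketch of these inspection functions. The data frame toy and its values are made up for illustration only (the column names are borrowed from later in these notes), and glimpse() is assumed to come from dplyr, which is loaded with the tidyverse.

```r
library(dplyr)  # provides glimpse()

# Made-up toy data, for illustration only (not the course data)
toy <- data.frame(
  Continent = c("Europe", "Europe", "Asia", "Asia"),
  GDPpercap = c(40000, 30000, 10000, 5000),
  Happiness = c(7.1, 6.5, 5.2, 4.8)
)

glimpse(toy)  # one line per column: name, type, first values
nrow(toy)     # number of rows (observations)
ncol(toy)     # number of columns (variables)
dim(toy)      # rows first, then columns
# ?mtcars     # opens the help page of a dataset that ships with R
```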
Data visualization: ggplot2 (part of the tidyverse)
ggplot(data = dataframe, mapping = aes(x = …, y = …)) +
geom_point() (presents each observation as a point)
https://www.youtube.com/watch?v=s2NF2J36ljE&t=1s
ggplot: after y = …, you can add further aesthetics inside aes() to distinguish groups
- colour = …
- shape = …
- size = … (larger or smaller dots)
- alpha = … (more or less transparent)
You can leave data = and mapping = out of the code
Mapping: determines the size, alpha … of points based on the values of a variable in the data; goes into aes()
Setting: determines the size, alpha … of points NOT based on the values of a variable in the data; goes into geom_*() (ex: geom_point())
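The difference between mapping and setting can be sketched as follows. The data frame toy is made up for illustration; only the column names are taken from these notes.

```r
library(ggplot2)

# Made-up toy data, for illustration only
toy <- data.frame(
  Continent = c("Europe", "Europe", "Asia", "Asia"),
  GDPpercap = c(40000, 30000, 10000, 5000),
  Happiness = c(7.1, 6.5, 5.2, 4.8)
)

# Mapping: colour depends on a variable, so it goes inside aes()
p_mapped <- ggplot(toy, aes(x = GDPpercap, y = Happiness, colour = Continent)) +
  geom_point()

# Setting: one fixed colour for all points, so it goes inside geom_point()
p_set <- ggplot(toy, aes(x = GDPpercap, y = Happiness)) +
  geom_point(colour = "dodgerblue", size = 3, alpha = 0.5)
```

In p_mapped the points get one colour per continent plus a legend; in p_set every point is dodgerblue and there is no legend.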
1.4 THE ‘MISE EN PLACE’: SETTING UP
1. Step 1: create a working directory or folder (statistics V)
2. Step 2: save the data in that folder
3. Step 3: open and save an R script
4. Step 4: specify the working directory (use the top menu Session -> Set Working Directory -> To Source File Location)
5. Step 5: install and load the necessary packages
6. Step 6: read in the data
7. Step 7: check your data
a. read.csv(): gives a data frame
b. read_csv(): gives a tibble (the tidyverse version of a data frame)
c. You can also use glimpse()
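Steps 5 to 7 might look like the sketch below. Because the course data file is not part of these notes, the example writes a tiny made-up CSV to a temporary file first; the file contents and the object names df_base and df_tidy are assumptions for illustration.

```r
# install.packages("tidyverse")  # step 5: install once per computer
library(readr)  # read_csv()
library(dplyr)  # glimpse()

# Stand-in for steps 1-2: a tiny made-up CSV in a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Continent,GDPpercap,Happiness",
             "Europe,40000,7.1",
             "Asia,10000,5.2"), csv_path)

# Step 6: read in the data
df_base <- read.csv(csv_path)                          # base R: data frame
df_tidy <- read_csv(csv_path, show_col_types = FALSE)  # readr: tibble

# Step 7: check your data
class(df_base)
class(df_tidy)
glimpse(df_tidy)
```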
1.5 VISUALISATION: LET’S START SIMPLE!
Data: every plot starts with ggplot() and a tibble ex: ggplot(data = dfsatisf)
This gives an empty canvas
Mapping: tells the plot which columns in the data should be represented by which aspects of the plot (x-axis, y-axis, line color …) ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness))
Geoms: add layers onto the base plot ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point()
=> gives a scatterplot
The + has to be on the end of the previous line, not at the start of the next line
Multiple geoms: you can add multiple layers; they are drawn in the order in which you add them ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm)
=> draws the regression line on top of the dots
Assigning plots to objects: you can save the output of ggplot() ex:
point_first <- ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = lm)
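Combining plots: using functions from the patchwork package ex:
point_first + line_first + plot_layout(nrow = 1)
Saving plots: you can also save a plot as a file with ggsave("plot name.png"); the file extension (.png, .pdf …) determines the file type, and by default the last plot displayed is saved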
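A minimal sketch of assigning, combining and saving plots. The data frame toy is made up for illustration, line_first is a hypothetical second plot (these notes do not define it), and the output goes to a temporary PDF file rather than the working directory.

```r
library(ggplot2)
library(patchwork)  # lets you combine plots with +

# Made-up toy data, for illustration only
toy <- data.frame(
  GDPpercap = c(1000, 5000, 10000, 20000, 30000, 40000),
  Happiness = c(4.1, 4.8, 5.2, 6.0, 6.5, 7.1)
)

# Dots first, then the line on top
point_first <- ggplot(toy, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x)

# Hypothetical second plot: line first, then the dots on top
line_first <- ggplot(toy, aes(x = GDPpercap, y = Happiness)) +
  geom_smooth(method = lm, formula = y ~ x) +
  geom_point()

# Both plots side by side in one row
combined <- point_first + line_first + plot_layout(nrow = 1)

# Save to a file; the extension decides the file type
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = combined, width = 8, height = 4)
```

Passing plot = explicitly avoids relying on whichever plot happened to be displayed last.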
Flexible smooth regression lines ex:
ggplot(data = dfsatisf, mapping = aes(x = GDPpercap, y = Happiness)) +
geom_point() +
geom_smooth(method = "gam")
1.6 CUSTOMIZING PLOTS
Styling geoms
- To change the overall style of a geom, set the arguments color, alpha, shape, size and linetype inside the geom function
ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, linetype = 1)
To type ~ on a Mac keyboard: Option + N
Setting aesthetics overall versus by category
- If you want, for example, points to have different colors depending on which category (here: continent) they belong to, set the argument color = inside the aes() function for the mapping
Ex: … color = Continent)) + …
Logarithmic axes
- When ratios rather than differences are meaningful (as for GDP per capita here), a logarithmically scaled axis is more appropriate
ggplot(dfsatisf, aes(x = GDPpercap, y = Happiness)) +
geom_point(color = "dodgerblue", alpha = 0.4, shape = 18, size = 2) +
geom_smooth(method = lm, formula = y ~ x, color = rgb(0, .5, .8), linetype = 1) +
scale_x_continuous(trans = "log10")
(scale_x_log10() is a shorthand for the same log10 scale)
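Why a log scale suits ratios can be checked with a little arithmetic. The values in gdp are made up so that each is 10 times the previous one.

```r
# Made-up values: each is 10 times the previous one
gdp <- c(100, 1000, 10000)

diff(gdp)         # raw differences: 900 and 9000 (unequal steps)
diff(log10(gdp))  # differences on the log10 scale: 1 and 1 (equal steps)
```

On a log10 axis equal ratios therefore take up equal distances, which is what we want when "ten times richer" should mean the same everywhere on the axis.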
Separate lines per group
- If color = Continent is mapped in aes() and we delete the fixed argument color = … from the geom_smooth() line, we get a separate regression line per continent
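The effect of a fixed colour in geom_smooth() on grouping can be sketched as follows. The data frame toy is made up for illustration and only has two continents.

```r
library(ggplot2)

# Made-up toy data, for illustration only
toy <- data.frame(
  Continent = rep(c("Europe", "Asia"), each = 4),
  GDPpercap = c(20000, 30000, 40000, 50000, 1000, 2000, 5000, 10000),
  Happiness = c(6.0, 6.4, 6.9, 7.3, 4.1, 4.6, 5.0, 5.7)
)

# Colour is MAPPED to Continent for the whole plot
base <- ggplot(toy, aes(x = GDPpercap, y = Happiness, colour = Continent)) +
  geom_point()

# Fixed colour SET in geom_smooth(): overrides the mapping, one line overall
one_line <- base +
  geom_smooth(method = lm, formula = y ~ x, colour = "black")

# No fixed colour: the mapping applies, one line per continent
per_group <- base +
  geom_smooth(method = lm, formula = y ~ x)
```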
Format axes
- Functions to change the axes: scale_…
- You need to use a scale function that matches the type of data you’re plotting (here: continuous)
o Ex: scale_x_continuous() and scale_y_continuous()
- name = "…" changes the axis label
- breaks = c(…) sets the major units (tick marks) and needs a vector of values
Add them with + after the last line of the plot code
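A minimal sketch of formatting both axes. The data frame toy, the axis labels and the break values are made up for illustration.

```r
library(ggplot2)

# Made-up toy data, for illustration only
toy <- data.frame(
  GDPpercap = c(5000, 10000, 20000, 30000, 40000),
  Happiness = c(4.8, 5.2, 6.0, 6.5, 7.1)
)

p <- ggplot(toy, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  # name = axis label, breaks = positions of the major tick marks
  scale_x_continuous(name = "GDP per capita", breaks = c(0, 20000, 40000)) +
  scale_y_continuous(name = "Happiness score", breaks = c(4, 5, 6, 7))
```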
Axis limits
- Change the minimum and maximum values shown on an axis with coord_cartesian()
- Ex: coord_cartesian(ylim = c(0, 10))
Also added with + at the end of the plot code
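A short sketch of setting axis limits. coord_cartesian() only zooms the view, so all observations are still used for things like regression lines; setting limits inside a scale_*() function would instead drop observations outside the limits. The data frame toy is made up for illustration.

```r
library(ggplot2)

# Made-up toy data, for illustration only
toy <- data.frame(
  GDPpercap = c(5000, 10000, 20000, 30000, 40000),
  Happiness = c(4.8, 5.2, 6.0, 6.5, 7.1)
)

p <- ggplot(toy, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  coord_cartesian(ylim = c(0, 10))  # y-axis runs from 0 to 10
```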
Making dots proportional to population size
- We insert a new aes() into geom_point() in which we specify the size
- Ex: … geom_point(aes(size = PopSize), alpha = 0.4, shape = 20) …
- If the dots turn out too small, add scale_size_continuous(range = c(2, 15)) (the range gives the minimum and maximum dot size)
- If you want to remove the legend, add show.legend = FALSE (in this case only for the points, so add it inside geom_point())
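Putting these pieces together might look like the sketch below. The data frame toy, including the PopSize values, is made up for illustration.

```r
library(ggplot2)

# Made-up toy data, for illustration only (PopSize in millions)
toy <- data.frame(
  GDPpercap = c(5000, 10000, 20000, 30000, 40000),
  Happiness = c(4.8, 5.2, 6.0, 6.5, 7.1),
  PopSize   = c(200, 50, 80, 10, 5)
)

p <- ggplot(toy, aes(x = GDPpercap, y = Happiness)) +
  # size is MAPPED to PopSize; alpha and shape are SET; legend hidden
  geom_point(aes(size = PopSize), alpha = 0.4, shape = 20,
             show.legend = FALSE) +
  # smallest dot gets size 2, largest dot gets size 15
  scale_size_continuous(range = c(2, 15))
```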
Themes
- Ex: theme_minimal() and theme_bw()
- Add the theme with + at the end of the plot code
- Ex: change font size: theme_bw(base_size = 12)
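A minimal sketch of adding a theme. The data frame toy is made up for illustration. Note that the argument is spelled base_size with an underscore; base.size is not an argument of theme_bw() and gives an error.

```r
library(ggplot2)

# Made-up toy data, for illustration only
toy <- data.frame(
  GDPpercap = c(5000, 10000, 20000, 30000, 40000),
  Happiness = c(4.8, 5.2, 6.0, 6.5, 7.1)
)

p <- ggplot(toy, aes(x = GDPpercap, y = Happiness)) +
  geom_point() +
  theme_bw(base_size = 12)  # black-and-white theme, base font size 12

# The misspelled argument fails with an "unused argument" error
wrong <- try(theme_bw(base.size = 12), silent = TRUE)
```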
Adding text to the plot: see online example
- Labels = …
1.7 OTHER PLOT TYPES
See ggplot2 cheat sheet
Bar plot: counting categories
- geom_bar(): you only need to provide x = …; the bar heights are the counts per category
- Ex: ggplot(dfsatisf, aes(x = Continent)) + geom_bar()
Column plot
1. Use count() to make a table with a row for each continent and a column called n with the number of observations in each continent
count_data <- count(dfsatisf, Continent)
2. For a column plot you need to set both x and y
ggplot(count_data, aes(x = Continent, y = n)) + geom_col()
This gives the same result as the bar plot
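That both routes give the same bar heights can be sketched as follows. The data frame toy is made up for illustration, and count() is assumed to come from dplyr.

```r
library(ggplot2)
library(dplyr)  # provides count()

# Made-up toy data, for illustration only: one row per country
toy <- data.frame(
  Continent = c("Europe", "Europe", "Europe", "Asia", "Asia", "Africa")
)

# Bar plot: geom_bar() counts the rows per category itself
bar_plot <- ggplot(toy, aes(x = Continent)) +
  geom_bar()

# Column plot: count first, then plot the counts
count_data <- count(toy, Continent)
col_plot <- ggplot(count_data, aes(x = Continent, y = n)) +
  geom_col()
```

The only visible difference is the default y-axis label: count for the bar plot, n for the column plot.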