Summary

BW3 Data Science with R summary

Pages
16
Uploaded on
06-10-2022
Written in
2021/2022

Summary of the structure and basic knowledge of working with the R software package and programming language for data analysis. Contents:
> R and RStudio: introduction, calculator, variable, projects, scripts
> Data structures: install/use package, vectors, factors, list, formulas
> Data manipulation: Tidyverse, data, tibble, select variables/cols, filter observations/rows, add/modify variables, pipe operator, summarise variables, groups, joining tables, reshaping tables
> Graphics: plots/ggplot2, axes/scales, panels/facets/size, types

Summary Data Science with R

R and RStudio

INTRODUCTION R (15/11)
https://lumc.github.io/rcourse/B3DS_202111/S01L02l_introduction0.html
Statisticians develop new methods and first make them available as R packages (no need to wait until these
methods are programmed into SPSS).
R is now known as “a free software environment for statistical computing and graphics”.
CRAN (Comprehensive R Archive Network) = a central repository for the R language interpreter and R packages.
It also contains manuals and mailing lists (well indexed on Google).
RStudio = an open-source integrated development environment for programming in the R language. It provides
useful features that help in developing R code and organizing projects.

CALCULATOR (15/11)
https://lumc.github.io/rcourse/B3DS_202111/S01L03l_basic_calculator0.html
RStudio consists of various panes, which are parts of a window: the console, source,
environment (workspace) and files (directory).

The console is the place to type & execute commands
● The “#” sign = comment. All text after it is simply ignored
● The “>” sign = prompt, meaning that one can type any expression. Press Enter to see the
result (output) of the expression.
● R can be used as a simple calculator with arithmetic operators:
○ Addition: +
○ Subtraction: -
○ Multiplication: *
○ Division: /
○ Exponentiation: ^
○ Use (...) for the correct order
■ Multiline commands: if a parenthesis is left open, R expects that the rest is still coming → no error
or result, but a + instead of the > symbol on the next line, which means the command is not
finished yet.
If this happens by mistake → pressing the Esc key or Ctrl + C stops the command.
○ Absolute value: |x| = abs(x)
○ Square root: √x = sqrt(x)
○ Decimal separator: use a dot instead of a comma for decimals.
● Useful console keystrokes:
○ Ctrl + L clears the console pane, but not the history.
○ Ctrl + R shows the history (which can also be checked with history()) in the environment pane.
■ The environment can be set to list
○ Use the up-arrow and down-arrow keys to scroll through the history of commands typed before;
select one to replay the command
● Get help by typing ?name in the console
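
The operators and functions listed above can be tried directly at the console. The following is a minimal sketch; the numbers are arbitrary examples and not taken from the course material.

```r
# Basic arithmetic, as typed at the console prompt
1 + 2          # addition
7 - 4          # subtraction
3 * 5          # multiplication
10 / 4         # division; decimals are written with a dot
2 ^ 3          # exponentiation
(1 + 2) * 3    # parentheses control the order of evaluation
abs(-5)        # absolute value
sqrt(16)       # square root
```

Everything after a # is a comment and is ignored by R, so the lines can be pasted as they are.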

VARIABLE (15/11)

https://lumc.github.io/rcourse/B3DS_202111/S01L04l_basic_variables0.html
A variable is a named object that stores a value or result.
Variable names can be chosen freely. They are case sensitive; use an underscore “_” rather than a dot to separate words.
Digits are allowed, except at the beginning of the name.
Typing an assignment in the console changes the environment pane after each ENTER.
The symbol “<-” is the assignment operator, which assigns a value/number/… to a variable.
e.g.: x <- 5 puts the number 5 in variable x
(one can also use the = sign, but this is less common and one has to be consistent)
if the variable is the outcome of a calculation: x <- (calculation)
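
As a small illustration of assignment, the sketch below stores values and a calculated result in variables. The names x, y, X and total_height are made up for this example.

```r
x <- 5                          # store the number 5 in variable x
y = 2                           # "=" also assigns, but "<-" is more common
total_height <- (x + y) * 10    # a variable holding the outcome of a calculation
X <- 100                        # names are case sensitive: X is not the same as x

total_height                    # typing a name prints its value
```

After running these lines, the Environment pane lists x, y, X and total_height with their values.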

The variables are stored in R memory and RStudio shows them in the Environment pane (top right).
Scripts provide reproducibility (they show the calculation + answer).

PROJECTS (15/11)
https://lumc.github.io/rcourse/B3DS_202111/S01L05l_basic_projects0.html
At the start of every new data analysis, a new project is created in RStudio.
For every data analysis, the files involved (input files/scripts/reports) have to be placed in the new R project folder
in order to use them.
Create a new project: Menu → File → New Project → New Directory → New Project
- Give the directory a name
- Choose where to create/store the directory
- (or → New Project: upper right corner)
Copy input files:
- download the data files to your computer
- go to File Explorer and copy-paste the downloaded files into the project folder
- the downloaded files should appear in the bottom right pane (or otherwise search for the files in the
‘Files’ panel)


SCRIPTS (15/11)
https://lumc.github.io/rcourse/B3DS_202111/S01L06l_basic_scripts0.html
An R script is a simple text file containing commands written in the R language.
R Markdown document
An R Markdown document is an extension of an R script that enables the development of elegant and
reproducible reports.
The file is created through:
1. Menu → File → New File → R Markdown → name it, choose HTML → (switch old/new view with the letter A top
right)
2. Save the file under a name in the project folder (it will then be shown in the Files panel)
3. Knit the document: this converts the R Markdown document into a report file, which is shown in the
bottom right pane.
→ When typing in the console: the commands are executed directly in the memory of R (local
machine/computer). An R script stores such commands in a text file. Sourcing = running a file
written in the R language.
→ When typing in an R Markdown document, nothing appears in the R environment; when pressing Knit,
the result is written to another document on the computer.
The environment and knitting are not combined/do not communicate with each other.
Knitting is the process of recalculating all lines/steps, which checks the reproducibility (but is
very slow), and allows you to create a report (a text document with commands in R language
and free text).
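
To make sourcing concrete, the sketch below writes a two-line script to a temporary file and then runs it with source(). The file location and the variable names a and b are assumptions made for this illustration; in practice the script would be a .R file saved in the project folder.

```r
# Create a small script file (a temporary location is used for this sketch)
script_path <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "b <- a ^ 3"), script_path)

# Sourcing = running all commands in the file, in order
source(script_path)

b   # the variable created by the script is now available
```

Because the script contains every step, running it again reproduces the same result.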

(Make sure tidyverse is installed)
R Markdown is a mixture of free text and R code:
● Titles are indicated with # (level 1)
● Subtitles are indicated with ## or ###
● - (minus) introduces a bullet list
● Hyperlinked words: [word](link) inserts a link.
● ** … ** = bold words
● R code is typed in chunks
○ Insert a chunk (shortcut: Ctrl + Alt + I)
○ Start with ```{r} and end with ``` (the area in between should become grey)
○ Vectors: v <-
○ library(…)
■ Needed every time you want to use a package (make sure tidyverse is installed!!)
○ ```{r warning=FALSE,message=FALSE} to no longer see warnings and messages
○ To ‘open’ csv files in markdown: read_csv (or read.csv)
○ To make new csv files use: write_csv(...)



Data structures

INSTALL/USE PACKAGES (15/11)
https://lumc.github.io/rcourse/B3DS_202111/S02L04l_packages0.html
R packages are collections of related functions, possibly with data, built to tackle a specific problem. R comes
with several pre-installed packages such as base, stats, datasets etc. that cover basic data science exercises.
The Comprehensive R Archive Network, CRAN for short, is possibly the only package repository you need to know for now.
How to install packages
- type into the console pane: install.packages("your-package-name")
- or look it up via the search bar of the ‘Packages’ tab in the lower right pane
- or go to Menu → Tools → Install Packages
How to load a package
- type the command in the console: library(haven)
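
Installing and loading can be combined in a script, as in the sketch below. The helper install_if_missing is hypothetical (it is not part of R or of the course material), and the pre-installed package stats is used so that nothing has to be downloaded; for a CRAN package such as haven the same calls apply.

```r
# Hypothetical helper: install a package only when it is not yet available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

install_if_missing("stats")     # stats ships with R, so no download happens
library(stats)                  # load the package for this session

"package:stats" %in% search()   # TRUE once the package is loaded
```

Installing is needed only once per computer, whereas library() has to be called in every new R session.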

VECTORS (INTRODUCTION) (15/11)
https://lumc.github.io/rcourse/B3DS_202111/S02L03l_basic_vectors0a.html
A vector is a container that holds (multiple) elements at the same time:
- all elements are of the same type
- elements are kept at numbered positions
- elements might be given names
Types of data:
- Numerical: a vector of numbers (height of students)
- Character: a vector of texts (names of students)
- Logical: a vector of FALSE/TRUE values (if students have siblings)
- Factor: a vector of values from a limited choice list (eye color of students)
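
The four types can be created with c() and factor(), as in the sketch below. The student data and variable names are invented for this illustration.

```r
height       <- c(1.70, 1.82, 1.65)                  # numerical vector
student_name <- c("Ann", "Bob", "Cas")               # character vector
has_siblings <- c(TRUE, FALSE, TRUE)                 # logical vector
eye_color    <- factor(c("blue", "brown", "blue"))   # factor: limited set of choices

class(height)                   # "numeric"
height[2]                       # element at position 2
names(height) <- student_name   # elements can be given names
height["Bob"]                   # element selected by name
levels(eye_color)               # the allowed values of the factor
```

Positions in R start at 1, so height[2] is the second element.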

General:


