Summary of practicals

Type: Summary
Pages: 16
Uploaded on: 07-04-2021
Written in: 2020/2021

All practicals of MDCE summarized.

Notes on practicals
Practical 1

We know that something is a function in R because its name is followed by parentheses.

If you create a data frame in the way
dat3a <- as.data.frame(mat3)
from an object in which the properties are not correct, the resulting data frame is not correct either.
Therefore, you should create the data frame from the data itself
dat3b <- data.frame(V1 = vec1, V2 = vec2)
when your objects contain both numbers and characters
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c("A", "B", "C", "D", "E", "F")
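
A minimal sketch of the difference, assuming mat3 was built by binding the two vectors into a matrix (the notes do not show how mat3 was created, so the cbind call is an assumption). A matrix can hold only one data type, so the numbers are converted to text before the data frame is made.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c("A", "B", "C", "D", "E", "F")

# Assumed construction of mat3: a matrix stores one type only,
# so the numbers in vec1 are coerced to character strings.
mat3 <- cbind(vec1, vec2)

dat3a <- as.data.frame(mat3)               # inherits the coerced type
dat3b <- data.frame(V1 = vec1, V2 = vec2)  # each column keeps its own type

is.numeric(dat3a$vec1)  # FALSE
is.numeric(dat3b$V1)    # TRUE
```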

Factor = a categorical variable with a numerical representation.
With the factor function you can change the labels of your factor, for example assign the label 'Utrecht' to the value 1.
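
As an illustration, a small sketch of labelling a numeric code with factor. The vector city_code and the labels other than 'Utrecht' are hypothetical and only serve the example.

```r
# Hypothetical numeric codes for a city variable
city_code <- c(1, 2, 1, 3)

# Attach a label to each code; 1 becomes "Utrecht"
city <- factor(city_code,
               levels = c(1, 2, 3),
               labels = c("Utrecht", "Amsterdam", "Rotterdam"))

levels(city)      # the labels
as.integer(city)  # the underlying numerical representation
table(city)       # frequency per category
```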


To get the dimensions of a dataset (the number of rows and columns), use
dim(boys)
With the head and tail functions, you get the first or last 6 cases. This is a way of inspecting your
dataset.
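
A short sketch of these inspection functions on a made-up data frame (toy is hypothetical; the notes use the dataset boys).

```r
# Hypothetical data frame with 10 rows and 2 columns
toy <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

dim(toy)          # number of rows, then number of columns
head(toy)         # first 6 rows by default
tail(toy, n = 3)  # last 3 rows; n changes the number of rows shown
```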

Labels for missing data: <NA> (non-numeric data) or NA (numeric data). Both mean 'not available'.

The exclamation mark (!) turns TRUE into FALSE and FALSE into TRUE.
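
A minimal sketch that combines the two ideas, using a hypothetical vector x: is.na flags the missing values and ! reverses the flags.

```r
x <- c(4, NA, 7)  # hypothetical numeric vector with one missing value

miss <- is.na(x)  # TRUE where a value is missing
obs  <- !miss     # negation: TRUE where a value is observed

x[obs]            # keep only the observed values
```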

To inspect your data you can use different functions:
The structure function gives you an overview of the measurement levels, the first few values of
each variable, and the class of the variables
str(boys)
The summary function gives you information about the distribution for numeric data, and a
frequency table for categorical data, for all the variables.
summary(boys)
If you want to explore a certain variable, you use the dollar sign ($). For example, the
standard deviation of age in the dataset boys.
sd(boys$age)
For a variable with missing values, we cannot calculate a standard deviation without telling R
how to deal with the missingness. The argument
na.rm = TRUE
means remove the missing values. The standard deviation is then calculated on the observed
data only.
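
A small sketch of the effect of this argument, using a hypothetical vector height with one missing value.

```r
height <- c(150, NA, 160, 170)  # hypothetical measurements

sd_default  <- sd(height)                # NA: R does not know how to treat the missing value
sd_observed <- sd(height, na.rm = TRUE)  # 10: computed on the three observed values
```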

If you want to select data on two conditions at once, you need two separate evaluations, combined with &.
mean(subset(boys, age < 15 & reg != "north")$age, na.rm = TRUE)
Within subset you specify your two conditions, and then you only use the variable age from that subset.
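
The same pattern on a made-up data frame, so the result can be checked by hand. The data frame toy and its values are hypothetical. Note that a row with a missing region is dropped by subset, because the condition evaluates to NA for that row.

```r
toy <- data.frame(
  age = c(3, 10, 14, 16, 12),
  reg = c("north", "south", "west", "south", NA)
)

# Two evaluations combined with &: younger than 15 AND not from "north"
sel <- subset(toy, age < 15 & reg != "north")

mean_age <- mean(sel$age, na.rm = TRUE)  # mean of 10 and 14
mean_age
```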

When you have loaded a dataset you can open a help screen with
?mammalsleep
which gives you information about the variable names.

To calculate each correlation on all completely observed pairs, use
cor(sleepdata, use = "pairwise.complete.obs")
Exclude the categorical columns, for example column one, by using
cor(sleepdata[, -1], use = "pairwise.complete.obs")
However, the correlation matrix has many decimals, so take this into account with the round
function. You can for example round the correlations to two decimals
round(cor(sleepdata[, -1], use = "pairwise.complete.obs"), 2)
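
A sketch of the same steps on a hypothetical data frame toy with one categorical column and some missing values. Each correlation is computed on the rows where both variables are observed.

```r
toy <- data.frame(
  name = c("a", "b", "c", "d", "e"),  # categorical column, excluded below
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 6, NA, 10),
  z = c(5, 3, 4, 1, 2)
)

# Drop column one, use all complete pairs, round to two decimals
r <- round(cor(toy[, -1], use = "pairwise.complete.obs"), 2)
r
```

In this toy data x and y are perfectly related on the three rows where both are observed, so their entry in r is 1.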

Convenient functions: any object in the workspace can be saved. save.image saves the whole workspace, save saves the objects you name.
save.image("Practical_X.RData")
save(sleepdata, file = "Sleepdata.RData")
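
A minimal round trip of saving and loading one object. The object toy and the use of a temporary directory are assumptions for the example, so that nothing is written to the working directory.

```r
toy <- data.frame(id = 1:3)  # hypothetical object to save

path <- file.path(tempdir(), "toy.RData")
save(toy, file = path)       # write the named object to disk

restored <- new.env()
load(path, envir = restored) # objects come back under their original names
ls(restored)                 # "toy"
```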

If you want to exclude cases, you can do this with the names of the species
exclude <- c("Echidna", "Lesser short-tailed shrew", "Musk shrew")
which <- sleepdata$species %in% exclude
The object which is a logical vector with as many elements as the data has rows. It is TRUE for
the rows whose species is in exclude. Indexing with !which therefore keeps only the other rows,
so your new dataset without the excluded species is
sleepdata2 <- sleepdata[!which, ]
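
The same exclusion on a small made-up data frame. The values in toy are hypothetical, and the logical vector is called drop here to avoid confusion with the base R function which().

```r
toy <- data.frame(
  species = c("Echidna", "Cat", "Musk shrew", "Dog"),
  brw = c(25, 25.6, 0.33, 70)  # hypothetical brain weights
)

exclude <- c("Echidna", "Musk shrew")
drop <- toy$species %in% exclude  # TRUE for rows that must go

toy2 <- toy[!drop, ]              # keep rows where drop is FALSE
toy2
```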

When plotting your variables, you use ~ which indicates that you want to model something,
based on something else. It separates the outcome part from the predictor, allowing for a
visual representation.
plot(brw ~ species, data = sleepdata2)

If you want to find all cases that are more than one standard deviation above the
mean, you take several steps
sd.brw <- sd(sleepdata2$brw)
mean.brw <- mean(sleepdata2$brw)
which <- sleepdata2$brw > (mean.brw + (1 * sd.brw))
as.character(sleepdata2$species[which])
So, you calculate the standard deviation and the mean of brain weight, then you make a new
object which (this overwrites the earlier object with the same name). It marks the cases that
are more than one standard deviation above the mean, and the last line returns the species for
which which is TRUE, as a character vector.
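
The steps above can be sketched on hypothetical data in which only one case is clearly larger than the rest.

```r
toy <- data.frame(
  species = c("A", "B", "C", "D", "E"),
  brw = c(1, 2, 3, 4, 50)  # hypothetical brain weights
)

cutoff <- mean(toy$brw) + 1 * sd(toy$brw)  # one SD above the mean
above <- toy$brw > cutoff                  # logical vector, one entry per case

big <- as.character(toy$species[above])
big
```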

Practical 2

Object names in R are case-sensitive. This means that
a <- 100
A <- 200
are different objects, each with their own value.

To learn more about the data, use one of the two following help commands
help(nhanes)
?nhanes
To get an overview of the data, use
summary(nhanes)

When you want to explore the missingness in the dataset you can use the summary command,
or
apply(nhanes, MARGIN = 2, FUN = function(x) sum(is.na(x)))
The code ‘applies’ the function that calculates the sum (sum()) over the missings (is.na) on a
set of data (x). The nice thing about apply is that you can apply functions on two-dimensional
objects. In this case you execute a function that calculates the sum of missings (FUN =
function(x) sum(is.na(x))) over the columns (MARGIN = 2) of object nhanes. If you would
change MARGIN = 2 to MARGIN = 1, you would do the same, but over the rows of nhanes.
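
A sketch of both margins on a hypothetical data frame toy with a few missing values. The base R function colSums gives the same column counts and is shown only for comparison.

```r
toy <- data.frame(
  age = c(1, 2, NA, 3),
  bmi = c(NA, 22.7, NA, 20.4),
  chl = c(187, NA, 199, 204)
)

count_na <- function(x) sum(is.na(x))

per_column <- apply(toy, MARGIN = 2, FUN = count_na)  # missings per variable
per_row    <- apply(toy, MARGIN = 1, FUN = count_na)  # missings per case

per_column
per_row
colSums(is.na(toy))  # same result as per_column
```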

The function colMeans() calculates the mean of numerical columns
colMeans(nhanes, na.rm=TRUE)
However, you have to specify how you would like to handle the missing values. By using
na.rm=TRUE
it tells R that you would like to remove (rm) the missings (na).

To determine how many cases would be available if only the complete cases were used, there
are multiple ways
1 You could look at the data and determine the number of completely observed cases
2 You could use the missing data pattern to deduce the number of cases for which the pattern
1 1 1 1 (everything observed) holds.
3 You could use code to determine the number of cases (rows) that have no missings. For
example:
nrow(na.omit(nhanes))
It performs listwise deletion on the object you use the function on. In other words, it removes
any incomplete row.
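
A short sketch of option 3 on a hypothetical data frame. The function complete.cases is not mentioned in the notes; it is a base R alternative that flags the complete rows without deleting anything.

```r
toy <- data.frame(
  age = c(1, 2, NA, 3),
  bmi = c(NA, 22.7, NA, 20.4),
  chl = c(187, NA, 199, 204)
)

n_listwise <- nrow(na.omit(toy))        # rows left after listwise deletion
n_complete <- sum(complete.cases(toy))  # same count, data left untouched

n_listwise
```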

To check the missing data pattern, use
md.pattern(nhanes)
Looking at the missing data pattern is always useful (but may be difficult for datasets with
many variables). It can give you an indication on how much information is missing and how
the missingness is distributed.

If you want to create a missingness indicator to indicate if your variable is missing or not
missing you create a new vector
rbmi <- is.na(nhanes$bmi)
rbmi
You create a new vector rbmi (you can see it as a variable) that indicates whether bmi is
missing (TRUE) or not missing (FALSE), with the same length as the old variable.

To test if the missingness in one variable depends on another variable, perform a t-test with
t.test(age ~ rbmi, data=nhanes)
You test here whether the missingness in bmi depends on age.
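
A sketch of the indicator and the t-test on hypothetical data, kept small enough to check the group means by hand. The values in toy are made up and are not the nhanes data.

```r
toy <- data.frame(
  age = c(1, 1, 2, 2, 3, 3, 1, 3),
  bmi = c(22, NA, 25, 27, NA, NA, 21, NA)
)

rbmi <- is.na(toy$bmi)  # TRUE = bmi missing, FALSE = bmi observed

# Compare mean age between cases with observed and with missing bmi
res <- t.test(age ~ rbmi, data = toy)

res$estimate  # mean age per group (FALSE first, then TRUE)
res$p.value
```

A clear difference in mean age between the two groups would suggest that the missingness in bmi is related to age.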

With a bivariate dataset you can calculate the correlation between the variables with the
following code
cor(data)

With partially incomplete data you can use ad hoc imputation methods to impute the missing
values.
First you need to evaluate the means and correlation of the incomplete data set.
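
A minimal sketch of that first step, followed by mean imputation as one example of an ad hoc method. The data frame toy is hypothetical, and the choice of mean imputation is an assumption; the notes do not say which method the practical uses.

```r
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, NA, 5.8, 8.2, NA, 12.1)  # two missing values
)

# Step 1: means and correlation of the incomplete data
means_incomplete <- colMeans(toy, na.rm = TRUE)
cor_incomplete <- cor(toy, use = "pairwise.complete.obs")

# Assumed ad hoc method: replace each missing y by the observed mean of y
imp <- toy
imp$y[is.na(imp$y)] <- mean(toy$y, na.rm = TRUE)

colMeans(imp)  # the mean of y is unchanged by mean imputation
cor(imp)       # compare with cor_incomplete
```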

Seller: willemijnvanes (Universiteit Utrecht)