Summary of practicals

Uploaded on: 07-04-2021

Written in: 2020/2021

All practicals of MDCE summarised.

Written for

Institution: Universiteit Utrecht (UU)
Programme: Minor methoden en statistiek
Course: Missing Data Theory and Causal Effects

Document information

Number of pages: 16
Type: Summary

Notes on practicals
Practical 1

We can recognise a function in R by the parentheses that follow its name.

If you create a data frame with
dat3a <- as.data.frame(mat3)
from an object whose properties are not correct, the resulting data frame is not correct either.
A matrix can hold only one data type, so a matrix that mixes numbers and characters stores
everything as characters.
Therefore, you should create the data frame from the data itself
dat3b <- data.frame(V1 = vec1, V2 = vec2)
when your objects are both numerical and character
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c("A", "B", "C", "D", "E", "F")
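
A minimal sketch of the difference between the two routes. It assumes that mat3 was built by binding the two vectors together with cbind(); the notes do not say how mat3 was created, so that line is an assumption.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c("A", "B", "C", "D", "E", "F")

# Assumption: mat3 was created by column-binding the vectors.
# A matrix holds one type, so the numbers become characters here.
mat3 <- cbind(vec1, vec2)

dat3a <- as.data.frame(mat3)               # inherits the character matrix
dat3b <- data.frame(V1 = vec1, V2 = vec2)  # keeps each vector's own type

is.numeric(dat3a$vec1)  # the numbers were lost on the way
is.numeric(dat3b$V1)    # the numbers are still numbers
```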

Factor = a categorical variable with a numerical representation.
With the factor function you can change the labels of your factors, for example assign the label 'Utrecht' to the value 1.
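
As an illustration, a small sketch of relabelling with factor(). The vector city and the labels other than 'Utrecht' are made up for this example.

```r
# Hypothetical numeric codes for a city variable
city <- c(1, 2, 1, 3)

# Attach labels to the codes; 1 becomes "Utrecht"
city_f <- factor(city,
                 levels = c(1, 2, 3),
                 labels = c("Utrecht", "Amsterdam", "Rotterdam"))

levels(city_f)       # the labels
as.integer(city_f)   # the underlying numerical representation
```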

Overview of the dimensions of a dataset, rows and columns
dim(boys)
With the head and tail functions, you get the first or last 6 cases. This is a quick way of
inspecting your dataset.
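
The same inspection can be tried without extra packages. This sketch uses the built-in airquality data as a stand-in for boys, which is an assumption made only so the example runs in a fresh R session.

```r
# airquality ships with R (datasets package); it stands in for boys here
dim(airquality)           # number of rows, then number of columns

head(airquality)          # first 6 cases by default
tail(airquality, n = 3)   # last 3 cases; n changes the default of 6
```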

Labels for missing data: <NA> (non-numeric data) or NA (numeric data); both mean not available.

The exclamation mark (!) turns TRUE into FALSE and FALSE into TRUE.
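
A short sketch of the negation, with a made-up vector. It also shows the common combination with is.na() to flag the observed values.

```r
x <- c(TRUE, FALSE, NA)
neg <- !x                  # TRUE and FALSE swap, NA stays NA

values <- c(1, NA, 3)      # hypothetical data with one missing value
observed <- !is.na(values) # TRUE where the value is observed
```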

To inspect your data you can use different functions:
The structure function gives you an overview of the measurement levels, the first few values
of each variable, and the class of the variables
str(boys)
The summary function gives you information about the distribution for numeric variables, and
a frequency table for categorical variables, for all the variables at once.
summary(boys)
If you want to explore a single variable, you use the dollar sign ($). For example the
standard deviation of age in the dataset boys.
sd(boys$age)
When a variable contains missing values, we cannot calculate a standard deviation without
telling R how to deal with the missingness. The argument
na.rm = TRUE
means remove the missing values. So, then you will only calculate the standard deviation on
the observed data.
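
To see the effect of na.rm, here is a sketch on the built-in airquality data, used as a stand-in because its Ozone column is known to contain missing values.

```r
str(airquality)       # classes and first few values per variable
summary(airquality)   # distribution per variable, including the NA count

ozone <- airquality$Ozone

sd_default  <- sd(ozone)                # NA, because of the missing values
sd_observed <- sd(ozone, na.rm = TRUE)  # computed on the observed values only

# na.rm = TRUE is the same as dropping the missing values by hand
sd_by_hand <- sd(ozone[!is.na(ozone)])
```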

If you want to select data on two variables combined, you need two separate evaluations, joined with &.
mean(subset(boys, age < 15 & reg != "north")$age, na.rm = TRUE)
Within subset you specify your two conditions, and then you use only the variable age from that subset.
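
A small worked sketch of the same selection. The data frame kids and its values are made up for illustration; only the column names age and reg follow the notes.

```r
# Hypothetical data: six children with an age and a region
kids <- data.frame(
  age = c(3, 10, 14, 16, 12, NA),
  reg = c("north", "east", "west", "east", "north", "south"),
  stringsAsFactors = FALSE
)

# Both conditions must hold: younger than 15 AND not from the north.
# subset() drops rows where the condition is NA (the last row here).
sel <- subset(kids, age < 15 & reg != "north")

mean_age <- mean(sel$age, na.rm = TRUE)
```

Only the children aged 10 and 14 satisfy both conditions, so the mean is taken over those two cases.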

When a dataset is available in your session you can open its help page with
?mammalsleep
which gives you information about the dataset and its variable names.

To compute a correlation from each completely observed pair of variables, the input for the correlation function is
cor(sleepdata, use = "pairwise.complete.obs")
cor() only accepts numeric columns, so exclude the categorical columns, for example column one, by using
cor(sleepdata[, -1], use = "pairwise.complete.obs")
However, the correlation matrix has many decimals, so tidy it up with the round
function. You can for example round the correlations to two decimals
round(cor(sleepdata[, -1], use = "pairwise.complete.obs"), 2)
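
The steps can be sketched on a tiny made-up data frame. sleep_demo and its values are hypothetical and chosen so that the correlations are easy to check by eye; they are not the real sleep data.

```r
# Hypothetical data: one categorical column and three numeric columns with NAs
sleep_demo <- data.frame(
  species = c("a", "b", "c", "d", "e"),
  bw  = c(1, 2, 3, 4, NA),
  brw = c(2, 4, 6, 8, 10),
  ts  = c(5, NA, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Drop column one (categorical), use every pair of columns on the
# rows where both are observed, and round to two decimals
cm <- round(cor(sleep_demo[, -1], use = "pairwise.complete.obs"), 2)
cm
```

Each cell of cm is based on its own set of rows, so different cells can rest on different numbers of cases.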

Convenient functions: any object in the workspace can be saved. save.image saves the whole workspace, save saves only the objects you name.
save.image("Practical_X.RData")
save(sleepdata, file = "Sleepdata.RData")
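
A round trip makes clear what save() stores. This sketch writes to a temporary folder instead of the working directory, and the object sleep_demo is made up.

```r
sleep_demo <- data.frame(x = 1:3)   # hypothetical object to save

# Write to a temporary location so the working directory stays clean
path <- file.path(tempdir(), "Sleepdata.RData")
save(sleep_demo, file = path)

rm(sleep_demo)   # remove the object from the workspace
load(path)       # restores it under its original name
```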

If you want to exclude cases, you can do this with their names, here the names of the species
exclude <- c("Echidna", "Lesser short-tailed shrew", "Musk shrew")
which <- sleepdata$species %in% exclude
The object which is a logical vector with the same length as the number of rows in the data;
it is TRUE for the rows whose species is in exclude. Indexing with which returns only the rows
for which it says TRUE, so negating it with ! keeps everything else. Your new dataset without
the excluded cases would be
sleepdata2 <- sleepdata[!which, ]
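
A sketch of the same exclusion on a small made-up data frame. The name drop_row is used instead of which to avoid masking the base R function which(); that renaming is a choice made for this example, not something from the practical.

```r
# Hypothetical data with made-up brain weights
sleep_demo <- data.frame(
  species = c("Echidna", "Cow", "Musk shrew", "Goat"),
  brw = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

exclude <- c("Echidna", "Lesser short-tailed shrew", "Musk shrew")

# TRUE for every row whose species appears in exclude
drop_row <- sleep_demo$species %in% exclude

# Keep the rows that are NOT flagged
sleep_demo2 <- sleep_demo[!drop_row, ]
```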

When plotting your variables, you use ~ which indicates that you want to model something,
based on something else. It separates the outcome part from the predictor, allowing for a
visual representation.
plot(brw ~ species, data = sleepdata2)

If you want to find all your cases that are more than one standard deviation above the
mean, you take several steps
sd.brw <- sd(sleepdata2$brw)
mean.brw <- mean(sleepdata2$brw)
which <- sleepdata2$brw > (mean.brw + (1 * sd.brw))
as.character(sleepdata2$species[which])
So, you calculate the standard deviation and the mean of brain weight, then you make a new
object which (this overwrites the earlier object with that name). The object which is TRUE for
the cases whose brain weight is more than one standard deviation above the mean, and the last
line shows the species for which which holds, as character values.
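
The four steps can be checked by hand on made-up numbers. The data frame demo and its values are hypothetical.

```r
# Hypothetical data: one clearly large brain weight
demo <- data.frame(
  species = c("a", "b", "c", "d", "e"),
  brw = c(1, 2, 3, 4, 20),
  stringsAsFactors = FALSE
)

sd.brw   <- sd(demo$brw)     # about 7.9
mean.brw <- mean(demo$brw)   # 6

# TRUE where brain weight exceeds mean + 1 sd (about 13.9)
above <- demo$brw > (mean.brw + (1 * sd.brw))

big <- as.character(demo$species[above])
```

Only the value 20 lies above the cut-off, so big contains a single species.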

Practical 2

Objects in R are case-sensitive. This means that
a <- 100
A <- 200
are different objects, each with their own value.

To learn more about the data, use one of the two following help commands
help(nhanes)
?nhanes
To get an overview of the data, use
summary(nhanes)

When you want to explore the missingness in the dataset you can use the summary command,
or
apply(nhanes, MARGIN = 2, FUN = function(x) sum(is.na(x)))
The code ‘applies’ the function that calculates the sum (sum()) over the missings (is.na) on a
set of data (x). The nice thing about apply is that you can apply functions on two-dimensional
objects. In this case you execute a function that calculates the sum of missings (FUN =
function(x) sum(is.na(x))) over the columns (MARGIN = 2) of object nhanes. If you would
change MARGIN = 2 to MARGIN = 1, you would do the same, but over the rows of nhanes.
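
A sketch of both margins, using the built-in airquality data as a stand-in for nhanes so that it runs without extra packages.

```r
# Missing values per column (MARGIN = 2)
col_miss <- apply(airquality, MARGIN = 2, FUN = function(x) sum(is.na(x)))

# Missing values per row (MARGIN = 1)
row_miss <- apply(airquality, MARGIN = 1, FUN = function(x) sum(is.na(x)))

col_miss
table(row_miss)   # how many rows have 0, 1, 2, ... missing values
```

For counting missing values per column, colSums(is.na(airquality)) gives the same numbers in one call.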

The function colMeans() calculates the mean of numerical columns
colMeans(nhanes, na.rm = TRUE)
However, you have to specify how you would like to handle the missing values. By using
na.rm = TRUE
you tell R that you would like to remove (rm) the missings (na).
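
What happens without na.rm can be seen in this sketch, again on the built-in airquality data as a stand-in for nhanes.

```r
means_default  <- colMeans(airquality)                # NA for incomplete columns
means_observed <- colMeans(airquality, na.rm = TRUE)  # means of the observed values

means_default
means_observed
```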

To determine how many cases would be available if only the complete cases were used, there
are multiple ways:
1. You could look at the data and determine the number of completely observed cases.
2. You could use the missing data pattern to deduce the number of cases for which the pattern
1 1 1 1 (everything observed) holds.
3. You could use code to determine the number of cases (rows) that have no missings. For
example:
nrow(na.omit(nhanes))
It performs listwise deletion on the object you use the function on. In other words, it removes
any incomplete row.
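
A sketch of the third approach on the built-in airquality data (a stand-in for nhanes), cross-checked with complete.cases(), which flags the rows without missing values.

```r
n_all      <- nrow(airquality)            # all cases
n_complete <- nrow(na.omit(airquality))   # cases left after listwise deletion

# Same count via a logical flag per row
n_flagged <- sum(complete.cases(airquality))

n_all - n_complete   # number of rows removed
```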

To check the missing data pattern, use
md.pattern(nhanes)
Looking at the missing data pattern is always useful (but may be difficult for datasets with
many variables). It can give you an indication on how much information is missing and how
the missingness is distributed.
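
If the mice package is not at hand, the idea behind a missing data pattern can be approximated in base R. This is a rough sketch and not what md.pattern() does internally; it uses the built-in airquality data as a stand-in, and codes observed as 1 and missing as 0.

```r
obs <- !is.na(airquality)   # logical matrix, TRUE = observed

# One pattern string per row, e.g. "1 1 1 1 1 1" for a complete case
pattern <- apply(obs, 1, function(r) paste(as.integer(r), collapse = " "))

# Count how often each pattern occurs, most frequent first
patterns <- sort(table(pattern), decreasing = TRUE)
patterns
```

The count for the all-ones pattern equals the number of complete cases.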

If you want to create a missingness indicator to indicate if your variable is missing or not
missing you create a new vector
rbmi <- is.na(nhanes$bmi)
rbmi
You create a new vector rbmi (you can see it as a variable) that indicates whether bmi is
missing (TRUE) or not missing (FALSE), with the same length as the old variable.

To test if the missingness in one variable depends on another variable, perform a t-test with
t.test(age ~ rbmi, data = nhanes)
You test here whether the missingness in bmi depends on age, by comparing the mean age of the cases with and without an observed bmi.
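
The indicator and the test can be tried together in this sketch. It uses the built-in airquality data in place of nhanes, with Ozone as the incomplete variable and Temp as the variable the missingness might depend on; both choices are assumptions for illustration.

```r
# Missingness indicator: TRUE where Ozone is missing
r_ozone <- is.na(airquality$Ozone)
table(r_ozone)

# Compare mean temperature between rows with and without Ozone observed
res <- t.test(Temp ~ r_ozone, data = airquality)

res$estimate   # group means of Temp
res$p.value    # a small value suggests missingness is related to Temp
```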

With a bivariate dataset you can calculate the correlation between the variables with the
following code
cor(data)

With partially incomplete data you can use ad hoc imputation methods to impute the missing
values.
First you need to evaluate the means and correlation of the incomplete data set.

Seller: willemijnvanes, Universiteit Utrecht