Summary Advanced biological data analysis theory and codes

Name: Advanced biological data analysis theory and codes
SKU: doc_2259392
Rating: 4.00 (2 reviews)
Author: lauravandenend

Uploaded on: 15-01-2023
Written in: 2022/2023

This summary contains the theory given in the lectures and the codes used in the practical sessions. Since notes are allowed on the exam, this is all the information needed to answer the questions.

Laura van den End

0. Introduction
Cases and variables
Cases: sampling unit - individuals

Response variable: dependent outcome
- Measured variable you want to explain in function of the predictor variables - species abundances, gene expression, mortality

Predictor variable: independent variable
- Measured variable to help explain variation in response variable - pH, nutrient abundance, environmental conditions, body size, age

Types of variables
Categorical: non-numerical, factors
- Exp. treatment, sex → have discrete levels

Continuous: scale
- Body size, weight, pH, concentration, time

Count: integer
- Number of offspring, species abundance

Ordinal: ordered categories
- Preference on a scale from 1-7

Descriptive vs inferential statistics
Descriptive statistics: describe the data
- Mean, standard deviation, correlation coefficient
- Distribution of data, histograms, box plots

Inferential statistics: make inferences about a population based on a sample
- Testing hypotheses with statistical tests
- Calculating confidence intervals
- Drawing conclusions

Variance and standard deviation
Variance: σ² (pop variance) or s² (sample variance)
- Average squared deviation from the mean

Length   Dev from mean   Squared dev from mean
5        2               4
2        -1              1
2        -1              1
2        0               3
3        0               2

Standard deviation: σ (pop) or s (sample) = √variance

Percentiles
Value of variable below which x% of values lie
- e.g. 25% of the data lie below the 25th percentile
- Interquartile range: range between 25th and 75th percentile

The normal distribution
- Common distribution for continuous data
- Bell-shaped, symmetrical around the mean (x = µ)
- Mean µ ± 1.96 * σ includes 95% of the observations
- Probability density function: f(x) = 1 / (σ√(2π)) · exp(−(x − µ)² / (2σ²))

Skewness and kurtosis
Skewness: measure of asymmetry of the distribution - 3rd standardized moment (mean = 1st moment, variance = 2nd central moment).

Kurtosis: pointiness (peakedness) of the distribution - 4th standardized moment.

The standard normal distribution
A normal distribution with mean 0 and standard deviation 1

'Standardizing' your data means:
- Subtracting the mean
- Dividing by the st. deviation
- The resulting numbers are the 'z-scores' of your data points

Descriptive statistics
(arithmetic) mean
- All values summed divided by # of observations
- Not informative for multimodal or asymmetric distributions
- Sensitive to outliers

Median
- Middle value if all values are ordered
- Better summary statistic for asymmetrically distributed data
- Not sensitive to outliers

Mode
- Value that appears most frequently in a data set
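
The descriptive statistics of this chapter (mean, median, mode, variance, standard deviation, percentiles, z-scores) can be computed in base R. The sketch below is only an illustration: the vector `len` holds hypothetical lengths chosen for easy arithmetic, and all variable names are assumptions. Note that the sample variance divides the summed squared deviations by n - 1, not by n.

```r
# Hypothetical body lengths (illustrative values, not course data)
len <- c(5, 2, 2, 3)

m   <- mean(len)                       # arithmetic mean
md  <- median(len)                     # middle value of the ordered data
# Base R has no function for the statistical mode: take the most frequent value
mode_val <- as.numeric(names(which.max(table(len))))

dev <- len - m                         # deviations from the mean
s2  <- sum(dev^2) / (length(len) - 1)  # sample variance (same as var(len))
s   <- sqrt(s2)                        # sample standard deviation (same as sd(len))

# Percentiles and interquartile range
q   <- quantile(len, probs = c(0.25, 0.75))
iqr <- IQR(len)

# Standardizing: subtract the mean, divide by the standard deviation
z <- (len - m) / s                     # z-scores: mean 0, standard deviation 1

# Share of a normal distribution that lies within mean +/- 1.96 sd
p95 <- pnorm(1.96) - pnorm(-1.96)
```

Here the mean is 3 and the sample variance is 2. `scale(len)` would return the same z-scores as a one-column matrix.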

Inferential statistics
We want to draw general conclusions about a
population based on a sample
- Sample: part of pop that you studied
- Pop: all cases you could have studied

Standard error
When we calculate a statistic of a sample (e.g. the
mean), this is an estimate of that statistic for the
population. If we sampled again, we would get
a slightly different estimate every time. The standard
error is the standard deviation of that statistic across
these repeated samples.

This is a measure of the precision that we have in
estimating the actual population statistic. We can
calculate the standard error from just a
single sample. For the mean: SE = s / √n, with s = sample
standard deviation and n = sample size.
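
A minimal simulation sketch of this idea, assuming a hypothetical normal population (mean 10, standard deviation 2) and an arbitrary seed: the standard error of the mean estimated from one sample as sd / sqrt(n) should be close to the standard deviation of the means of many repeated samples.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: mean 10, standard deviation 2
n <- 25
one_sample <- rnorm(n, mean = 10, sd = 2)

# Standard error of the mean, estimated from a single sample
se_single <- sd(one_sample) / sqrt(n)

# Repeat the sampling many times and take the standard deviation
# of the resulting sample means
sample_means <- replicate(5000, mean(rnorm(n, mean = 10, sd = 2)))
se_resampled <- sd(sample_means)

# Theoretical value for this population: 2 / sqrt(25) = 0.4
se_theory <- 2 / sqrt(n)

c(single = se_single, resampled = se_resampled, theory = se_theory)
```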

Standard deviation vs standard error
The standard deviation is a measure of spread in our
sample ~ higher = more variability in the data.

The standard error is a measure of precision ~ higher
= lower confidence in the accuracy of the estimate.
- More data (higher n) = lower SE
- Confidence intervals are based on the SE
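
The difference between the two measures can be sketched as follows, assuming two hypothetical samples of different size drawn from the same population. The helper names `se` and `ci95` are illustrative, and the interval uses the t distribution, which is one common choice.

```r
set.seed(42)  # arbitrary seed

# Two hypothetical samples from the same population (mean 50, sd 10)
small <- rnorm(10,   mean = 50, sd = 10)
large <- rnorm(1000, mean = 50, sd = 10)

# Standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# 95% confidence interval for the mean, built from the SE
ci95 <- function(x) {
  mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * se(x)
}

# The SD estimates the same population spread in both samples,
# while the SE shrinks as n grows
c(sd_small = sd(small), sd_large = sd(large))
c(se_small = se(small), se_large = se(large))
ci95(small)
ci95(large)
```

The interval returned by `ci95()` matches `t.test(x)$conf.int` for the same data.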

Using statistics to test hypotheses
H0: no effect. Question: can we reject H0? → Yes, when there is only a small
chance of getting our data, assuming H0 is true

Types of errors
Type I error (false positive) - we reject a true H0
- Expected in 5% of the tests where H0 is true (at significance level α = 0.05)
- Multiple testing increases the chance of at least one false positive
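
Why multiple testing inflates the false positive rate can be sketched with a short calculation, assuming independent tests in which H0 is true every time. The numbers of tests, the sample sizes and the seed are illustrative choices.

```r
alpha <- 0.05
k <- c(1, 5, 10, 20)   # number of independent tests (illustrative)

# Probability of at least one false positive when every H0 is true
fwer <- 1 - (1 - alpha)^k
round(fwer, 3)

# Simulation check for k = 20: both groups come from the same population,
# so every significant t-test is a Type I error
set.seed(7)  # arbitrary seed
one_family <- function(k) {
  p <- replicate(k, t.test(rnorm(20), rnorm(20))$p.value)
  any(p < alpha)
}
sim_rate <- mean(replicate(500, one_family(20)))
sim_rate
```

With 20 tests the chance of at least one false positive is already about 64%, far above the 5% of a single test.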

Type II error (false negative) - we don't reject a false H0
- e.g. because sample size is too low (not enough
statistical power)
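
The link between sample size and statistical power can be illustrated with `power.t.test()` from the stats package. This is a sketch for a two-sample t-test; the true difference (1), the standard deviation (1.5) and the group sizes are hypothetical values.

```r
# Power to detect a true difference of 1 unit between two groups
# (difference, sd and sample sizes are hypothetical)
n_per_group <- c(5, 10, 20, 50)

power <- sapply(n_per_group, function(n) {
  power.t.test(n = n, delta = 1, sd = 1.5, sig.level = 0.05)$power
})

# Type II error rate = 1 - power
type2 <- 1 - power

data.frame(n_per_group, power = round(power, 2), type2 = round(type2, 2))
```

Power rises, and the Type II error rate falls, as the number of cases per group increases.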

Note: we never accept or confirm H0 – we only do or
do not reject it

1. Linear models
Packages used by the code in this chapter: car (Anova, spreadLevelPlot, ncvTest, residualPlots, outlierTest), effects (allEffects), emmeans (emmeans, contrast), nlme (gls)
> library(car); library(effects); library(emmeans)

Continuous predictors
STEP 1: visual inspection of raw data
> plot(body.length~heavy.metal.conc, data=caterpillars)

STEP 2: regression line
- Draw the line → minimize the sum of squares of the difference between a datapoint and its prediction
- OLS - ordinary least squares regression
- Resulting line is given by 2 numbers: intercept and slope: y = intercept + slope * x

STEP 3: fit a model → gives slope and intercept
> fit2 <- lm(body.length~heavy.metal.conc, data = data)
> summary(fit2)

STEP 4: visualize results with effect plot
> plot(allEffects(fit4), multiline = TRUE, confint = list(style = "auto"))

STEP 5: hypothesis testing
- Take the summary table
- Take the precision of each estimate, given by its SE
- T value (estimate divided by SE) → more extreme = less likely to get data if H0 is true

Categorical predictors
2 levels
STEP 1 + 2 + 3 + 5: same

STEP 4: same
- R standard: 'treatment coding' = 1st alphabetical level as the reference level
- Sum coding → mean of all levels as reference level
- Useful if collinearity in the data

More than 2 levels
STEP 1 + 2 + 3 + 4: same

STEP 5: check anova table for overall effect of the categorical predictor with more than 2 levels
> Anova(fit4, type="III")

STEP 6: post-hoc comparisons
- which levels of our predictor are different from each other?
> emmeans(fit4, ~samp.loc)
> contrast(emmeans(fit4, ~samp.loc), method='pairwise', adjust='Tukey')

Testing assumptions
Homogeneity of variances
VISUALLY
> spreadLevelPlot(fit3)
- High absolute residuals = far away from regression line
- Low absolute residuals = close to regression line
- We want the spread to be equal along the line. If the blue line is more or less straight we have no problem.

TEST
> ncvTest(fit2)
- If the p value is above 0.05: OK (no significant deviation from homogeneous variances).

NOT OK?
- Transform data
- See if there are outliers
- Use a model that allows for non-homogeneous variances (gls)

Normality of residuals
VISUALLY
> hist(rstudent(fit4), probability=TRUE, ylim=c(0,0.5), main="Distribution of Studentized Residuals", xlab="Studentized residuals")
- Histogram of the studentized residuals of the model

> xfit <- seq(-3, 3, length.out=100)
- Create a vector of x values for the normal distribution from -3 to 3

> yfit <- dnorm(xfit)
> lines(xfit, yfit, col="red", lwd=2)
- Calculate the standard normal density for the x values given above and add it to the histogram as a red line

TEST
> shapiro.test(residuals(fit4))
- If W > 0.9: OK

Linearity
> residualPlots(fit2)
- No strong relation is OK

Outliers and influential observations
> outlierTest(fit2)
> cd <- cooks.distance(fit2)
> inflobs <- which(cd > 1); inflobs
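
The workflow for a continuous predictor can be tried out without the course data. The sketch below uses base R only and a simulated stand-in for the caterpillar data set: the true intercept (30), slope (-1.2), noise level and seed are assumptions made for illustration. It checks that the t value in the summary table equals estimate / SE, that the OLS slope gives a smaller residual sum of squares than a slightly different slope, and it runs two of the assumption checks.

```r
set.seed(123)  # arbitrary seed

# Simulated stand-in for the caterpillar data (all values hypothetical)
caterpillars <- data.frame(heavy.metal.conc = runif(60, 0, 10))
caterpillars$body.length <- 30 - 1.2 * caterpillars$heavy.metal.conc +
  rnorm(60, sd = 2)

# STEP 3: the OLS fit gives intercept and slope
fit <- lm(body.length ~ heavy.metal.conc, data = caterpillars)
coefs <- summary(fit)$coefficients

# STEP 5: t value = estimate divided by its standard error
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# STEP 2: OLS minimizes the residual sum of squares,
# so shifting the slope away from the estimate increases it
rss <- function(b0, b1) {
  with(caterpillars, sum((body.length - (b0 + b1 * heavy.metal.conc))^2))
}
rss_ols   <- rss(coef(fit)[1], coef(fit)[2])
rss_other <- rss(coef(fit)[1], coef(fit)[2] + 0.1)

# Normality of residuals
sw <- shapiro.test(residuals(fit))

# Influential observations: Cook's distance above 1
cd <- cooks.distance(fit)
inflobs <- which(cd > 1)

coefs
```

Because the simulated errors are normal with constant variance, the checks are expected to pass here; with real data they are the point where problems show up.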

Written for

Institution: Katholieke Universiteit Leuven (KU Leuven)
Study: Biology
Course: Advanced biological data analysis (G0F87A)

Document information

Uploaded on: January 15, 2023
Number of pages: 25
Written in: 2022/2023
Type: SUMMARY
