Summary Data Science In Biomedicine (WMBM023-05)

Uploaded on: 09-10-2022
Written in: 2022/2023

The lectures are clearly summarized and contain everything you need to know for the exam.

Summary Data Science in Biomedicine
Lecture 1: Introduction
Using next generation sequencing (NGS), we can sequence whole genomes very quickly, which produces a lot of
data as output. These huge datasets are analyzed with programming languages like R or Python. They can be
used to retrieve data from a database, apply statistical analyses, and visualize results. R is dedicated
to statistics and very popular in research. As opposed to Excel, where cells are edited by hand, R works on
the data through scripts, so the original data is not changed by accident. Data is plotted using the
ggplot() function (from the ggplot2 package), which allows easy plotting of subsets, multiple graphs in
1 plot, and many more useful options.


Lecture 2: Statistics 1 → P-values, T-tests, and linear regression
P-value
Measurements show variation. Based on the main source of the
variation, you might want to re-think your experiment. A p-value
is the probability of getting a result at least as extreme as the
observed one if the null hypothesis is true. Often a cutoff of 5% is
used. However, in some cases, it is important to include the
‘impact of risk’. P-values do not tell you if a result is good or bad:
evaluate whether the starting point of 0.05 is appropriate (ethical discussions).
- H0 (null hypothesis): the thing we are trying to provide
evidence against (often something like ‘no effect’ or ‘no
difference’).
- Ha (alternative hypothesis): what we are trying to provide evidence for.
- If using a significance level of 0.05: if p < 0.05, H0 can be rejected.

T-tests
But how can we calculate the p-value? A t-test compares data sets and tells you if their means are
significantly different from each other (e.g. a group with drug and a group with placebo). There are different t-tests:
1. Independent samples: compares the means of two independent groups
a. Students from different universities
2. Paired samples: compares the means of the same group measured twice
a. Different time points (before and after)
3. One sample: tests the mean of a single group against a known mean
a. Is the alcohol consumption of a group higher than the average?
Paired T-test
If we test the same sample or patient before and after treatment, the null hypothesis is
that there is no difference. We can visualize the difference in R, using for example boxplots or
violin plots, and test it for significance. However, it can also be done by hand, with the formula on the
right. You can calculate the t-value, where ΣD is the sum of the differences (before – after) and N is
the number of samples. When using this formula, and getting e.g.
the value of t = -2.77 (we disregard the minus sign), we look at
the T-distribution table, and use our set cutoff of 0.05 and the degrees
of freedom (which is the sample size - 1). The value that is found in
the table forms the borders of the rejection zone. If the value in the
table is smaller than the absolute t-value, we can reject the null hypothesis (before and after are not
equal). This can easily be calculated in R.
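
The hand calculation can be sketched in Python. The formula itself is not shown in this summary, so the common textbook form in terms of ΣD and N is assumed: t = ΣD / sqrt((N·ΣD² - (ΣD)²) / (N - 1)). The data and the function name are hypothetical, and the critical value 2.776 is taken from a standard two-sided T-distribution table for a cutoff of 0.05 and 4 degrees of freedom.

```python
import math


def paired_t(before, after):
    """Return (t-value, degrees of freedom) for a paired t-test."""
    diffs = [b - a for b, a in zip(before, after)]  # D = before - after
    n = len(diffs)
    sum_d = sum(diffs)                     # ΣD
    sum_d2 = sum(d * d for d in diffs)     # ΣD²
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1


# Hypothetical measurements of the same five patients
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 14]

t_value, df = paired_t(before, after)
critical = 2.776  # table value for cutoff 0.05 (two-sided), df = 4
reject_h0 = abs(t_value) > critical
print(round(t_value, 2), df, reject_h0)
```

The minus sign of the t-value only reflects the direction of the change, which is why the absolute value is compared with the table.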

Independent T-test
If we compare the means of two sets of independent data
(categorical groups like females vs males), this test is used. The
formula is slightly more complicated (see on the right) but it still gives
a t-value. The two groups can have different numbers of samples. The only
new symbol is μ, which is the mean of a data set. The degrees of freedom are calculated as (nA - 1) +
(nB - 1). Using the cutoff and the degrees of freedom, we can find a value in the T-distribution table (again
forming the – and + borders of the rejection area). If the t-value lies within
these borders, the null hypothesis cannot be rejected.
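
A minimal Python sketch of this test follows. The exact formula from the lecture is not reproduced in this summary. Because the degrees of freedom are nA - 1 + nB - 1, the pooled-variance (Student's) form is assumed here. The data and function name are hypothetical, and 2.365 is the two-sided table value for a cutoff of 0.05 and 7 degrees of freedom.

```python
import math
import statistics


def independent_t(a, b):
    """Return (t-value, degrees of freedom), assuming a pooled variance."""
    na, nb = len(a), len(b)
    df = (na - 1) + (nb - 1)
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, df


# Hypothetical groups with different numbers of samples
group_a = [5, 7, 6, 8]
group_b = [9, 10, 8, 11, 12]

t_value, df = independent_t(group_a, group_b)
critical = 2.365  # table value for cutoff 0.05 (two-sided), df = 7
inside_borders = -critical <= t_value <= critical
print(round(t_value, 2), df, inside_borders)
```

Here the t-value falls outside the borders, so the null hypothesis is rejected for this made-up data.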

Sometimes linear regression (y = ax + b) is used to predict the value of a
variable based on the value of another variable. If for example looking at
cells that double each cycle, a log scale can be used for the cell count (this gives a straight line).
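
The doubling example can be sketched as follows. The cell counts are hypothetical, and the slope and intercept are fitted with ordinary least squares, which is assumed here as the fitting method. Taking log2 of the counts turns the doubling curve into a straight line with a slope of 1 per cycle.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b, returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b


# Hypothetical culture that starts at 100 cells and doubles every cycle
cycles = [0, 1, 2, 3, 4]
cells = [100, 200, 400, 800, 1600]

log_cells = [math.log2(c) for c in cells]
slope, intercept = fit_line(cycles, log_cells)

# Predict the cell count at cycle 6 by transforming back from log2
predicted = 2 ** (slope * 6 + intercept)
print(round(slope, 3), round(predicted))
```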


Lecture 3: Statistics 2 → outliers, permutation, FDR,
Fisher’s, Chi-squared
Outliers
One outlier in a (small) data set can drastically change the outcome of statistical tests (a different t-value, or
different means). For t-tests, we want reliable means, and therefore we remove outliers. A widely used
method for outlier detection is based on the interquartile range (IQR). Q1 is the
middle number between the smallest number and the median of the sorted data set. Q2 is
the median (literally the middle number), and Q3 is the middle number
between the median and the largest number (its position is N - position of Q1 + 1). The
IQR = Q3 – Q1. The solution for outliers: remove all values < Q1 – 1.5*IQR,
and remove all values > Q3 + 1.5*IQR (see example).
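
The rule above can be sketched in Python. Several conventions for computing quartiles exist. This sketch assumes the one described here, where Q1 and Q3 are the medians of the lower and upper half of the sorted data. The function name and the data are hypothetical.

```python
import statistics


def iqr_filter(values):
    """Split values into (kept, removed) using the 1.5*IQR rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd N the median itself belongs to neither half
    upper = data[half + 1:] if n % 2 else data[half:]
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    iqr = q3 - q1
    low_border = q1 - 1.5 * iqr
    high_border = q3 + 1.5 * iqr
    kept = [v for v in values if low_border <= v <= high_border]
    removed = [v for v in values if v < low_border or v > high_border]
    return kept, removed


# Hypothetical data set with one obvious outlier
data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 50]
kept, removed = iqr_filter(data)
print(kept, removed)
```

For this data Q1 = 4 and Q3 = 10, so the IQR is 6 and the borders are -5 and 19.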




Permutation testing
T-tests assume that the data is normally distributed. With permutation testing, you can check a result
without relying on that assumption. For paired t-tests, we pool all our data (ignoring before and after), and
randomly divide it over A and B. This is done 1000 to 10000 times, and each time the p-value is calculated.
For independent samples, the same is done (and the categories are ignored). If the original result was real, we
expect that the p-values of the randomized data are higher (at least 95% of the randomized p-values >= the
original p-value).
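
A minimal sketch of this procedure for independent samples is given below. Two simplifications are assumptions of this sketch and do not come from the lecture. First, each shuffle is scored with the absolute t-value (pooled-variance form) instead of a p-value; for fixed group sizes a larger absolute t-value corresponds to a smaller p-value, so the ranking is the same. Second, the result is reported as the fraction of shuffles that are at least as extreme as the original data. The data and names are hypothetical.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """t-value for two independent groups, assuming a pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return ((statistics.mean(a) - statistics.mean(b))
            / math.sqrt(pooled_var * (1 / na + 1 / nb)))


def permutation_p_value(a, b, n_permutations=10000, seed=0):
    """Fraction of random relabelings at least as extreme as the original."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = abs(pooled_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # ignore the original categories
        shuffled_t = abs(pooled_t(pooled[:len(a)], pooled[len(a):]))
        if shuffled_t >= observed - 1e-12:  # tolerance for rounding errors
            hits += 1
    return hits / n_permutations


# Hypothetical groups
group_a = [5, 7, 6, 8]
group_b = [9, 10, 8, 11, 12]
p_perm = permutation_p_value(group_a, group_b, n_permutations=2000)
print(p_perm)
```

A small fraction means that random relabeling rarely produces a difference as large as the original one, which supports the original result.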

Multiple testing correction (FDR testing)
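If a p-value is lower than or equal to 0.05, a result this extreme would occur at most 5% of the time if the
null hypothesis were true. This is not the same as 95% certainty that the claim (alternative hypothesis) is true.
However, 0.05 cannot be used in every situation. Especially if a lot of tests are done at once (typically in
transcriptomics, genomics, and proteomics), a huge number of them will be false positives: at a cutoff of
0.05, about 5% of the tests without a real effect still come out significant. Therefore,
multiple testing correction is required: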
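
One widely used correction that controls the false discovery rate (FDR) is the Benjamini-Hochberg procedure. Whether the lecture uses exactly this procedure is an assumption, since this part of the summary is not available. The sketch sorts the p-values, compares the p-value at rank i with (i / m) * cutoff, and rejects H0 for all tests up to the largest rank that passes. The p-values are hypothetical.

```python
def benjamini_hochberg(p_values, cutoff=0.05):
    """Return a list of booleans: True where H0 is rejected after correction."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * cutoff:
            max_rank = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values from eight tests (e.g. eight genes)
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]

uncorrected = [p <= 0.05 for p in p_values]
corrected = benjamini_hochberg(p_values)
print(sum(uncorrected), sum(corrected))
```

Without correction five of the eight tests pass the 0.05 cutoff, and after correction only the two smallest p-values remain significant.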

Uploaded by: sarajasmijn84 (Rijksuniversiteit Groningen)