Big Data in Biomedical Sciences Summary

Pages: 28
Uploaded on: 23-10-2025
Written in: 2023/2024

This summary contains all information from the lectures, important for the exam of Big Data in Biomedical Science at the Vrije Universiteit Amsterdam.

Big Data in Biomedical Science
Week 1
Lecture: what is data
Big data is characterised by four V's: Volume, Variety, Velocity and Value.

Data is getting bigger and cheaper to collect, more data is available, and computing power is growing fast, which makes working with big data attractive. In short: there is more data, accumulated data needs to be analysed, and hypothesis-driven research has its limitations. This is why big data is becoming more and more popular.




Week 2
Lecture 3: Genetics
In the past decades, human genetics has changed rapidly through new technologies, methods, large-scale collaborations and novel disease insights.

There are still open issues in human genetics.
- Relative influence of G(enes) and E(nvironment) is still under debate.
o We need more reliable estimates and an overview across age, sex and population
- Nature of the G influence is still under debate
o Do all genes act in an additive way, or are there non-additive effects?
- Determining causal mechanisms
o This is a challenge for polygenic traits. We still have questions such as: how do we detect and interpret the findings?

Twin studies were widely used to determine the relative influence of genes and environment. Two types of twins are used:
- MZ (monozygotic): share 100% of genes, 100% of shared environment, 0% of non-shared environment
- DZ (dizygotic): share on average 50% of genes, 100% of shared environment, 0% of non-shared environment
Let's say a trait is...
- 100% heritable and additive: similarity in MZ twins is twice as high as in DZ twins:
o rMZ = 2 rDZ
- 100% determined by (shared) environment: similarity in MZ and DZ twins is more or less the same (because their environment was 100% the same):
o rMZ = rDZ
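
The two extreme cases above (MZ similarity twice the DZ similarity for a fully heritable additive trait, equal similarity for a fully environmental trait) can be illustrated with a rough variance decomposition. This is a minimal sketch using Falconer's formulas, a standard textbook approximation that these notes do not name; the function name and the example correlations are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance decomposition from MZ and DZ twin correlations.

    Assumes purely additive genetic effects and equal shared environment
    for both twin types (assumptions of this sketch).
    """
    h2 = 2 * (r_mz - r_dz)  # share of variance due to additive genes
    c2 = 2 * r_dz - r_mz    # share due to shared environment
    e2 = 1 - r_mz           # share due to non-shared environment
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations matching the two extreme cases.
fully_genetic = falconer_estimates(r_mz=1.0, r_dz=0.5)     # r_MZ = 2 * r_DZ
fully_shared_env = falconer_estimates(r_mz=1.0, r_dz=1.0)  # r_MZ = r_DZ

print(fully_genetic)
print(fully_shared_env)
```

Real traits fall between these extremes, and published estimates usually come from model fitting rather than from these simple formulas.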
Posthuma wanted to do a meta-analysis combining the various twin studies done between 1900 and 2012. From each study they extracted the sample size (N) and the correlation (r) for MZ and DZ twins, as well as the estimated influence of genes (h2) and of shared environment (c2). Where possible, they extracted these separately for males/females, four age groups and populations. They then standardized the trait classification, because studies defined traits such as IQ differently.
- Almost 3000 twin studies, published from 1958 until 2012.
- 17 804 traits reported
- Total sample size was 14.5 million (partly dependent) twin pairs
They combined all of the traits together. The question they wanted to answer was: how heritable is any measurable trait in a person?
- Averaged over all traits, the twin correlation was 0.64 in MZ twins and 0.34 in DZ twins.
- For any trait that we can measure, the estimated heritability is almost 50%. So roughly half of the differences between people on a trait can be attributed to differences in their genes. Only 17% was explained by shared environment.
Main conclusions:
- All traits are heritable to some extent
- Influence of c (shared environment) is relatively small
- Majority of traits are consistent with a model where all genetic variance is additive.

Heritability
- Proportion of trait variance attributable to genetic variance
- The extent to which observed individual differences can be traced back to genetic differences
- Causes genetically related individuals to correlate on a trait
- Suggests that variations in genes underlie trait differences between individuals
Two important discoveries about the genome:
- Structure of DNA (1953)
- Sequencing of the human genome (3*10^9 base pairs) (2002).

DNA facts
- 23 pairs of chromosomes in each cell
- Chromosomes are transmitted from parent to offspring via meiosis
- Each single chromosome is a DNA molecule
- A DNA molecule consists of nucleotides (A, C, G, T)
- DNA is a double helix in which A pairs with T and G pairs with C
- Codon: three bases. Through transcription and translation, codons are translated into amino acids.
o Multiple codons code for the same amino acid: DNA is robust against errors.
- Codons between start and end sites form genes, which provide blueprints for proteins
- One chromosome consists of non-genic (90%) and genic regions (10%)
- Humans have around 22-24 thousand genes
- Not every gene is expressed in a cell
- The specific set of genes expressed in a cell determines its cell type

We share 87.5% of our DNA with a mouse, 99% with a chimpanzee and 99.9% with another human individual. A million sites in our DNA differ between individuals, which results in phenotypic differences.
- Genetic variations (SNPs) can occur:
o In a gene: protein coding, regulatory region, exonic, intronic
o Outside genes: regulatory or of unknown function.
They can be:
o Harmless (small, no harmful change in phenotype)
o Harmful
o Latent: dependent on another factor
o Silent
- Causes of variation:
o Mutation: at the level of base pairs
o Recombination: at the level of parts of the chromosome
o Segregation: at the level of combinations of chromosomes

Monogenic vs polygenic disorders
- Monogenic: influenced by one gene. Most genetic causes are already known.
- Polygenic disorders/traits: influenced by more than one gene, of which the causes are mostly unknown. They are often very complex, because they are caused by multiple genetic and environmental factors with possible interaction.

Why do we want to find genetic variants linked to disease?
- Novel biological insights → clinical advances → therapeutic targets, biomarkers, prevention
- Improved measures of individual aetiological processes → personalized medicine → diagnostics, prognostics, therapeutic optimization.

Association study design: we compare a control group with a case group. For example, 5% of the controls carry variant X, whereas 21% of the cases carry variant X.
Candidate gene studies (until 1990): preselect several genes based on prior knowledge and convenience, then test them for association with a trait.
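
The case/control example above (variant X in 5% of controls and 21% of cases) can be turned into an association test. The sketch below is illustrative only: the group sizes of 1000 each are hypothetical, and the odds ratio and 2x2 chi-square test are common choices that the notes do not specify.

```python
import math


def carrier_association(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio and 1-df chi-square test for a 2x2 carrier table."""
    a = case_carriers                     # cases with the variant
    b = case_total - case_carriers        # cases without the variant
    c = control_carriers                  # controls with the variant
    d = control_total - control_carriers  # controls without the variant
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical group sizes: 1000 cases (21% carriers), 1000 controls (5% carriers).
odds_ratio, chi2, p_value = carrier_association(210, 1000, 50, 1000)
print(f"odds ratio = {odds_ratio:.2f}, chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

With smaller groups the same percentages would give a weaker test statistic, which is one reason sample size matters so much in association studies.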

Genome Wide Association Studies:
Microarrays can contain more than 1 million tagging SNPs covering the genome in high density.

Strategy:
- Genotype a large set of individuals (cases + controls) on ~1 million SNPs.
o We don't need to genotype all 3*10^9 base pairs, since nearby SNPs are often correlated
- For each SNP: compare the allele frequency across cases and controls, and conduct a statistical test for a difference in frequency.

The first outcome of GWAS is a Manhattan plot.
- Every dot: association test of one single SNP in a genome
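
As a rough illustration of what each dot represents, the sketch below runs one association test per SNP and converts the p-value to -log10(p), the value conventionally shown on the vertical axis of a Manhattan plot. The plotting convention, the allelic chi-square test and all allele counts are assumptions for illustration, not taken from the lecture.

```python
import math


def minus_log10_p(case_alt, case_ref, control_alt, control_ref):
    """-log10 of the p-value of a 1-df chi-square test on allele counts."""
    n = case_alt + case_ref + control_alt + control_ref
    chi2 = (
        n * (case_alt * control_ref - case_ref * control_alt) ** 2
        / ((case_alt + case_ref) * (control_alt + control_ref)
           * (case_alt + control_alt) * (case_ref + control_ref))
    )
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return -math.log10(p_value)


# Hypothetical SNPs: (chromosome, position, case_alt, case_ref, control_alt, control_ref)
snps = [
    ("1", 1_000_000, 405, 1595, 400, 1600),  # almost no frequency difference
    ("1", 2_000_000, 520, 1480, 400, 1600),  # clear frequency difference
    ("2", 500_000, 410, 1590, 400, 1600),    # small frequency difference
]

# One point per SNP: genomic position on the x-axis, -log10(p) on the y-axis.
points = [(chrom, pos, minus_log10_p(*counts)) for chrom, pos, *counts in snps]
for chrom, pos, height in points:
    print(f"chr{chrom}:{pos}  -log10(p) = {height:.2f}")
```

A real GWAS repeats this for about a million SNPs, so peaks of high dots mark regions associated with the trait.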
Advantages:
- May identify several possible loci, because it spans the whole genome
- Relationships between loci may identify new biological pathways
- Results from multiple studies can be integrated, aiding prioritization of genes for replication and increasing statistical confidence
Disadvantages:
- Increased likelihood of false positives
- Population stratification
- Large number of samples needed
- Vast amounts of data are analysed and produced, which requires cluster computers.
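
The increased likelihood of false positives follows from the number of tests. A minimal sketch, assuming about 1 million independent tests and a Bonferroni correction; the correction method is an assumption here, since the notes do not say how the problem is handled.

```python
n_tests = 1_000_000  # roughly one test per genotyped SNP
alpha = 0.05         # conventional per-test significance level

# If no SNP were truly associated, this many would still pass p < alpha by chance.
expected_false_positives = alpha * n_tests

# Bonferroni correction: divide alpha by the number of tests.
bonferroni_threshold = alpha / n_tests

print(f"expected false positives at p < {alpha}: {expected_false_positives:.0f}")
print(f"Bonferroni-corrected threshold: {bonferroni_threshold:.0e}")
```

The resulting threshold of 5e-8 is the value commonly used as genome-wide significance.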

The initial sample sizes for GWAS were 5 to 10 thousand, whereas hundreds of thousands or millions of individuals were actually needed to detect genetic variants with very small effects.
Schizophrenia:
- N = 3300 cases / 3500 controls: 2 risk genes found in the whole genome
- N = 9000 cases / 12 000 controls: 7 risk genes found in the whole genome
- N = 14 000 cases / 18 000 controls: 22 risk genes found in the whole genome
- N = 40 000 cases / 50 000 controls: 131 risk genes found → current largest GWAS of schizophrenia
o However: these genes explain <3% of the liability to schizophrenia
o However: the risk associated with 8300 genetic variants explains ~32% of the liability; this includes variants found in the study that were not statistically significant
Huge sample sizes were needed and have now been reached. GWAS results have been found to be very reliable, and many genes have now been discovered. The GWAS of their own group included 1 million individuals.

Many complex traits are highly heritable. GWAS has detected some genetic variants, but, as said, these explain only a very small portion of the total genetic variance (<2%).
- Effect sizes are small, so large samples are needed
- No functional information is obtained: most findings lie outside genes rather than inside them.
The majority of human complex traits are probably caused by thousands of genes of very small effect.

How do we biologically interpret GWAS results? → gain mechanistic insight
- GWAS provides associated variants and genes, but not the function of a variant
- Current GWAS results are difficult to interpret, complicating the formulation of hypotheses that can be tested in functional experiments
There are 4 issues with GWAS hits for polygenic traits:
1. They are mostly outside genes or in non-coding genic regions, with likely regulatory functions that are currently unknown
2. They have small effects
3. SNPs are correlated, which complicates pinpointing the causal SNP
4. Hundreds of genes are involved in polygenic traits: a single gene will not provide the whole picture

What about the SNPs that are correlated?
The first output of a GWAS is a Manhattan plot. A locus is a peak in the plot. When we zoom in on a locus, we find multiple SNPs in that locus that are all statistically significant. The top SNP (the most significant one) is shown as a diamond in the plot, but it is not necessarily the causal SNP. The colour of the other SNPs denotes how strongly they are correlated with the top SNP. SNPs located close to the top SNP have correlated genotypes, so they also show very small p-values. The area contains many genes. If there had been only one, it would have been easier to say which gene is associated with the phenotype. However, because multiple genes and SNPs are found in the locus, it is hard to say which one is most likely to cause the phenotype.
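
The correlation between SNPs in a locus can be made concrete with a small calculation. This sketch assumes genotypes coded as 0, 1 or 2 copies of an allele and uses the squared Pearson correlation (often written r2) as the measure; both the coding and the genotype data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


# Hypothetical genotypes of 8 individuals (0, 1 or 2 copies of the allele).
top_snp = [0, 1, 2, 1, 0, 2, 1, 0]
nearby_snp = [0, 1, 2, 1, 0, 2, 1, 1]   # differs from the top SNP in one person
distant_snp = [1, 0, 0, 2, 1, 1, 0, 2]  # unrelated pattern

r2_nearby = r_squared(top_snp, nearby_snp)
r2_distant = r_squared(top_snp, distant_snp)
print(f"r2 with nearby SNP:  {r2_nearby:.2f}")
print(f"r2 with distant SNP: {r2_distant:.2f}")
```

A SNP that is strongly correlated with the top SNP shows almost the same association signal, which is why the statistics alone cannot tell which of them is causal.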