YEAR 1
QUARTER 4
2019
Genomics and Big Data
SUMMARY OF THE COURSE GENOMICS AND BIG DATA
NWI-BP031
ELISE REUVEKAMP
Content
Lecture 1: Introduction
Lecture 2: Genomes
Lecture 3: Visualization
Lecture 4: Exploratory data analysis and hypothesis testing
Lecture 5a: Microbial Metagenomics
Lecture 5b: Reads, mapping, epigenomics
Lecture 6: Genomics going forward
Lecture 7: RNA sequencing in the clinic
Open (data) and reproducibility
Visualization
Eukaryotic genome organisation
Genomics
Statistics
Lecture 1: Introduction
The 5 V’s of big data:
1) Volume: Scale of data
2) Velocity: Analysis of streaming data
3) Variety: Different forms of data
4) Veracity: Uncertainty of data
5) Value: What are the results we get out?
- Data science is the process by which data becomes understanding, knowledge and insight
- Data wrangling involves obtaining data, transforming it into the right format, cleaning it
and integrating data from different sources
- Data visualisation is what you do to show your data
- Hypothesis-driven data science tests an assumption
- Fishing data science means looking through the data to see what we find
- Exploratory research is something you do to better understand your data. It includes
exploratory data analysis (visualisation and descriptive statistics) to discover new
patterns and generate new hypotheses.
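A minimal sketch of the wrangling and descriptive-statistics steps mentioned above, using only the Python standard library. The gene lengths are hypothetical toy values chosen for illustration, not real measurements, and the helper names `clean` and `describe` are made up for this example.

```python
import statistics

# Hypothetical toy values (not real measurements): gene lengths in base pairs,
# including the kind of missing or malformed entries that wrangling has to handle.
raw_lengths = ["1200", "950", "", "1800", "n/a", "2400", "1500"]


def clean(values):
    """Data wrangling: keep only whole-number entries and convert them to int."""
    return [int(v) for v in values if v.strip().isdigit()]


def describe(values):
    """Descriptive statistics commonly used in exploratory data analysis."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


lengths = clean(raw_lengths)
summary = describe(lengths)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A summary like this is usually the starting point: an unexpected minimum, maximum or spread is often what suggests the next plot or hypothesis.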
Lecture 2: Genomes
Humans are diploid, but they have some haploid cells (gametes) and some polyploid cells (liver
cells, bone marrow, cancer cells, placenta). Not all eukaryotes are diploid: polyploidy is very
common in plants, and some vertebrates can be polyploid too.
DNA structure:
1) Double-stranded helix
2) Complex with histones
3) The DNA and histone complex coils up into a thick fibre
4) Loops of the coiled fibre
5) Compressed and folded
6) Coils again to form the chromatids of the chromosome
Sanger sequencing → first-generation sequencing
1. Fragment the genome
2. Clone small fragments into plasmids, lots of colonies
3. Use a primer on the plasmid to sequence into the genomic DNA insert
4. Assemble the genome from overlapping reads
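Step 4 can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under strong assumptions: the reads are error-free, all come from the same strand, and the sequence has no repeats. The example sequence and the function names `overlap` and `greedy_assemble` are made up for illustration; real assemblers use more robust methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Stops when one sequence is left or when no pair overlaps by `min_len`.
    """
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining reads cannot be joined
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical overlapping reads from the toy sequence ATGCGTACGTTAGC.
toy_reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
contigs = greedy_assemble(toy_reads)
print(contigs)  # prints ['ATGCGTACGTTAGC']
```

With repeated sequence, the same overlap can occur at several places, so a greedy merge may join the wrong reads. This is one reason why assembling large genomes from short fragments is hard.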
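Sequencing the human genome first was a competition between the universities and Celera. The
universities used a hierarchical fragmentation method (the genome was first split into large
mapped fragments, which were then sequenced), while the Celera corporation cut up the whole
genome at once.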