1. Databases & alignment
Section 1: databases
Exercise 1.1: NCBI databases
Exercise 1.2: UniProt
Section 2: pairwise alignment
Exercise 2.1: getting familiar with dot plots
Exercise 2.2: studying the influence of the word size on dot plots
Exercise 2.3: Read a dot plot
Exercise 2.4: Exploring the NW and SW algorithms
Exercise 2.5: Selecting the proper alignment tool
Exercise 2.6: Studying the effect of the parameters on pairwise alignments
Section 3: Multiple sequence alignment
Exercise 3.1: progressive versus iterative methods
Exercise 3.2: Comparison of orthologous gene products
Overview question
Homework
Q1: Extracting phenotype information from DNA
Q2: Drawing a dot plot by hand
Q3: Describe briefly how you would solve the following problems
2. Search & motifs
Section 4: Sequence similarity search
Exercise 4.1: Getting familiar with sequence similarity search: word size
Exercise 4.2: getting familiar with sequence similarity search: BLAST
Section 5: Sequence motifs
Exercise 5.1: basic motif exercises
Exercise 5.2: motif visualization and searching
3. Phylogenetic trees, protein structure & gene ontology
Section 6: Phylogenetic trees
Exercise 6.1: Unknown organisms and their position in the tree of life
Exercise 6.2: orthologs and paralogs: the evolution of hemoglobin
Section 7: Protein structure
Exercise 7.1: primary sequence based information
Exercise 7.2: Secondary structure prediction
Exercise 7.3: viewing 3D protein structures
Section 8: Gene ontology
Exercise 8.2: searching the GO database (GO terms)
Exercise 8.3: Searching the GO database (protein)
Overview exercise: analysis of SARS-CoV-2 and other coronaviruses
Coronavirus phylogeny
Infectivity of SARS-CoV-2
Assessing the infection risk in other vertebrates
4. Next Generation Sequencing
Notes
Introduction to the practical exercises
Section 9: genome analysis
Exercise 9.1: on FASTQ file format
Appendix: description of file types
SAM/BAM files
VCF/BCF files
Practice tests for practicals 1-4
PRACTICAL 1: databases & alignment
1. Databases & alignment
Section 1: databases
Exercise 1.1: NCBI databases
1.1.1 Gene database
NCBI (National Center for Biotechnology Information) is a US-based organisation that maintains several
reference databases for biological and molecular data. Here we use the Gene database, which you can
reach by navigating to Genes & Expression and then Gene. Below the search box, click Advanced to build
a custom query. Use it to search for the human insulin gene.
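The Advanced builder turns your choices into an Entrez query string, which can also be sent to NCBI programmatically. Below is a minimal Python sketch, assuming the field tags [Gene Name] and [Organism] as offered by the Gene advanced search and NCBI's public E-utilities esearch endpoint; gene_query and esearch_url are hypothetical helper names. It only builds the query and URL and sends no request.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (returns matching record IDs).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_query(symbol: str, organism: str) -> str:
    """Restrict a gene symbol to one organism.

    The [Gene Name] and [Organism] field tags are assumed to match the
    fields offered by the Gene database's Advanced search builder.
    """
    return f'{symbol}[Gene Name] AND "{organism}"[Organism]'


def esearch_url(db: str, term: str) -> str:
    """Build (but do not open) an esearch URL for database `db`."""
    # urlencode escapes spaces, brackets and quotes in the query term.
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


term = gene_query("INS", "Homo sapiens")
print(term)                       # INS[Gene Name] AND "Homo sapiens"[Organism]
print(esearch_url("gene", term))
```

Pasting the printed query into the Gene search box should give the same hit list as the Advanced builder; opening the URL in a browser returns the matching Gene IDs (as XML by default).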
1. Which diseases are associated with problems of insulin metabolism?
Insulin-dependent diabetes mellitus, permanent neonatal diabetes mellitus, maturity-onset diabetes
of the young type 10 and hyperproinsulinemia.
- The first column gives the ID of the gene in the database
- The second column gives the description of the gene
- The third column gives the chromosome and the location on that chromosome
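The three result columns can be modelled as a small record. This Python sketch assumes the results were exported as tab-separated lines in exactly this column order, and that the location is written in cytogenetic band notation (chromosome, arm p or q, then region/band), e.g. 11p15.5. GeneHit and parse_row are hypothetical names, and the example row uses the human INS entry (Gene ID 3630, 11p15.5).

```python
import re
from dataclasses import dataclass

# Cytogenetic notation (assumed format): chromosome 1-22, X or Y,
# arm p (short) or q (long), then region/band digits, e.g. "11p15.5".
BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$")


@dataclass
class GeneHit:
    gene_id: str      # column 1: ID of the gene in the database
    description: str  # column 2: description of the gene
    location: str     # column 3: chromosome and location on it

    def chromosome_arm_band(self) -> tuple[str, str, str]:
        """Split the location into (chromosome, arm, band)."""
        m = BAND.match(self.location)
        if m is None:
            raise ValueError(f"unrecognised location: {self.location!r}")
        return m.group(1), m.group(2), m.group(3)


def parse_row(line: str) -> GeneHit:
    """Parse one tab-separated row: ID, description, location."""
    gene_id, description, location = line.rstrip("\n").split("\t")
    return GeneHit(gene_id, description, location)


hit = parse_row("3630\tinsulin\t11p15.5")
print(hit.gene_id, hit.chromosome_arm_band())  # 3630 ('11', 'p', '15.5')
```

Band ranges such as 11p15.5-p15.4 do not match the pattern and raise ValueError; extend the regular expression if your export contains them.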
Under Phenotypes, then ‘Associated conditions’, you find the diseases associated with insulin. Here
these are mainly different forms of diabetes, plus hyperproinsulinemia. MedGen, another NCBI
database, can be used to gather more information about the cause and phenotype of a disease,
relevant publications,…