Contents
Chapter 1
Getting started
Compare sequences
Search sequence databases
Sequence evolution
Discover sequence patterns
Sequence function
A closer look at the human genome
Sequence technology and challenges
Science in a world of sequenced genomes
From DNA to understanding biological complexity
DNA integration towards systems biology
What is bioinformatics?
Chapter 2
Databases
Algorithm
A simple DNA translation algorithm
Principles of classification
A classification algorithm
The confusion matrix
Graph theory
Graph representation
Definitions
Computational complexity
Solving a complex problem
Big-O notation
Complexity classes
Heuristic versus exhaustive approaches
Numerical optimization
A few typical optimisation methods
Exercises
Chapter 3
Introduction
Substitution matrix
A few more terms
Dot matrix method
Back to basics
Global alignment: NW
Needleman-Wunsch
Local alignment: SW
Smith-Waterman
Substitution matrices
Building the substitution matrix
Log-odds calculation
Dayhoff PAM
Relative mutability
PAM
BLOSUM
PAM versus BLOSUM
Exercises
Chapter 4
Introduction
Dynamic
The Lipman MSA algorithm
Progressive
Feng & Doolittle
CLUSTAL
Conclusion
Iterative
Why?
MUSCLE
MAFFT
SAGA
Consistency-based
ProbCons
T-Coffee
Phylogeny-aware
Wrap-up
Two pitfalls
What is the optimal alignment?
Further optimization
Editing multiple alignments
Tips
Which method?
Exercise
Chapter 5
Introduction
Some search applications
Step-by-step plan
Dynamic vs heuristic
Smith-Waterman
FASTA
Step-by-step plan
Significance
BLAST
BLAST vs FASTA
Step-by-step plan
E-values and p-values
Exercises
Chapter 6
Introduction
Frequent words problem
Conclusion so far
Sequence motifs
Common definitions
Motif representation
Consensus string
Regular expression
Probabilistic representation
Motif discovery
Motif discovery methods
Deterministic methods
Probabilistic methods: Gibbs sampling
Exercises
Chapter 7
Introduction
Evolution
Phylogeny
Definitions
Tree nomenclature
Sequence selection
Role of a synonymous substitution
Non-coding regions contain additional, unique information
Sometimes proteins are more informative
Multiple alignment
Substitution models
Distance estimation
Modelling evolution as a Markov process
Jukes & Cantor
Kimura substitution model (α≠β)
Tamura substitution model
Generalised time-reversible model (GTR)
Conclusion
Tree construction
Tree & branch length
Tree construction
Distance-based
Character-based
Tree evaluation
Bootstrapping
Exercises
Chapter 8
Introduction
Read alignment
De novo assembly
Sequence assembly theory
The assembly problem
First law of assembly
Second law of assembly
Overlap graphs
Shortest common superstring problem
Third law of assembly
Hoofdstuk 1 ............................................................................................................................................. 6
Getting started .................................................................................................................................... 6
Compare sequences ........................................................................................................................ 6
Search sequence databases ............................................................................................................ 6
Sequence evolution ......................................................................................................................... 6
Discover sequence patterns ............................................................................................................ 6
Sequence function ........................................................................................................................... 6
A closer look at the humane genome ................................................................................................. 6
Sequence technology and challenges ................................................................................................. 6
Science in a world of sequenced genomes ......................................................................................... 6
From DNA to understanding biological complexity ............................................................................ 6
DNA integration towards system biology............................................................................................ 7
What is bioinformatics? ...................................................................................................................... 7
Hoofdstuk 2 ............................................................................................................................................. 8
Databases ............................................................................................................................................ 8
Algorithm ............................................................................................................................................. 8
A simple DNA translation algorithm ................................................................................................ 8
Principles of classification ................................................................................................................... 8
A classification algorithm ................................................................................................................ 8
The confusion matrix ....................................................................................................................... 8
Graph theory ....................................................................................................................................... 9
Graph representation .......................................................................................................................... 9
Definitions ........................................................................................................................................... 9
Computational complexity .................................................................................................................. 9
Solving a complex problem ............................................................................................................. 9
Big-O notation ................................................................................................................................. 9
Complexity classes......................................................................................................................... 10
Heuristic versus exhaustive approaches ....................................................................................... 10
Numerical optimization ..................................................................................................................... 10
A few typical optimisation methods ............................................................................................. 11
Oefeningen ........................................................................................................................................ 11
Hoofdstuk 3 ........................................................................................................................................... 12
Introduction....................................................................................................................................... 12
Substitution matrix ........................................................................................................................ 12
, A few more terms .......................................................................................................................... 12
Dot matrix method ............................................................................................................................ 13
Back to basics ................................................................................................................................ 13
Global alignment: NW ....................................................................................................................... 13
Needleman-Wunsh........................................................................................................................ 13
Local alignement: SW ........................................................................................................................ 14
Smith-Waterman ........................................................................................................................... 14
Substitution matrices ........................................................................................................................ 14
Building the substitution matrix .................................................................................................... 14
Log odds calculation ...................................................................................................................... 14
Dayhoff PAM ................................................................................................................................. 14
Relative mutability......................................................................................................................... 14
PAM ............................................................................................................................................... 14
BLOSUM......................................................................................................................................... 15
PAM versus BLOSUM..................................................................................................................... 16
Oefeningen ........................................................................................................................................ 16
Hoofdstuk 4 ........................................................................................................................................... 19
Introduction....................................................................................................................................... 19
Dynamic ............................................................................................................................................. 20
The Lipman MSA algorithm ........................................................................................................... 20
Progressive ........................................................................................................................................ 20
Feng & Doolittle............................................................................................................................. 21
CLUSTAL ......................................................................................................................................... 21
Conclusie ....................................................................................................................................... 21
Iterative ............................................................................................................................................. 22
Waarom? ....................................................................................................................................... 22
MUSCLE ......................................................................................................................................... 22
MAFT ............................................................................................................................................. 22
SAGA .............................................................................................................................................. 22
Consistency-based ............................................................................................................................. 22
ProbCons ....................................................................................................................................... 22
T-Coffee ......................................................................................................................................... 23
Phylogeny-aware ............................................................................................................................... 23
Wrap-up............................................................................................................................................. 23
Two pitfalls .................................................................................................................................... 23
What is the optimal alignement? .................................................................................................. 23
, Further optimalization................................................................................................................... 23
Editing multiple alignement .......................................................................................................... 23
Tips ................................................................................................................................................ 24
Which method? ............................................................................................................................. 24
Oefening ............................................................................................................................................ 24
Chapter 5
Introduction
Some search applications
Step-by-step plan
Dynamic vs. heuristic
Smith-Waterman
FASTA
Step-by-step plan
Significance
BLAST
BLAST vs. FASTA
Step-by-step plan
E-values and p-values
Exercises
Chapter 6
Introduction
Frequent words problem
Conclusion so far
Sequence motifs
Common definitions
Motif representation
Consensus string
Regular expression
Probabilistic representation
Motif discovery
Motif discovery methods
Deterministic methods
Probabilistic methods: Gibbs sampling
Exercises
Chapter 7
Introduction
Evolution
Phylogeny
Definitions
Tree nomenclature
Sequence selection
Role of synonymous substitutions
Non-coding regions contain additional, unique information
Sometimes proteins are more informative
Multiple alignment
Substitution models
Distance estimation
Modelling evolution as a Markov process
Jukes & Cantor
Kimura substitution model (α≠β)
Tamura substitution model
Generalised time-reversible model (GTR)
Conclusion
Tree construction
Tree & branch length
Tree construction
Distance-based
Character-based
Tree evaluation
Bootstrapping
Exercises
Chapter 8
Introduction
Read alignment
De novo assembly
Sequence assembly theory
The assembly problem
First law of assembly
Second law of assembly
Overlap graphs
Shortest common superstring problem
Third law of assembly