BIOINFORMATICS
TABLE OF CONTENTS
1 – Introduction
Getting started
Expectations
(Not) in this course
2 – Computer science background
Databases
Algorithms
Principles of classification
Graph theory
Computational complexity
Exercises
3 – Pairwise alignment
Introduction
Dot matrix method
Global alignment: NW
Local alignment: SW
Substitution matrices
PAM
BLOSUM
Exercises
4 – Multiple alignment
Introduction
Dynamic programming
Progressive
Iterative
Consistency-based
Phylogeny-aware
Wrap-up
Exercises
5 – Sequence search
Introduction
Smith-Waterman
FASTA
BLAST
Exercises
6 – Sequence patterns
Introduction
Motif representation
Position-specific scoring matrix (PSSM)
Hidden Markov Model (HMM)
Motif discovery
Exercises
7 – Phylogeny
Introduction
Sequence selection
Multiple alignment
Substitution models
Tree construction
Distance methods
Character-based methods
Tree evaluation
Exercises
8 – Sequence assembly
Introduction
Sequence assembly theory
Exercises
10 – Mass spectrometry proteomics
Introduction
A primer on MS
Protein and peptide MS
Spectrum identification
Sequence database search
Spectral library search
De novo sequencing
Statistical considerations
11 – Network biology
Introduction
Molecular networks: types
Molecular networks: properties
Molecular networks: inference
Molecular networks: analysis
Exercises
1 – INTRODUCTION
GETTING STARTED
• What can we learn from sequence information, and how do we do that = bioinformatics
• Sequence differences are hard to spot with the naked eye, so a computer is needed
• Searching databases: comparing a sequence with what has already been seen before somewhere in the world
• Inferring evolutionary relationships: relations between different organisms
o E.g., how different variants of a virus relate to one another
• Discovering frequent patterns
o Patterns that occur regularly in a sequence
o A short piece of sequence that occurs several times within a short stretch of DNA is likely biologically meaningful; this is unlikely to be coincidence
• Predicting protein function
o Various techniques exist for this
• Human genome
o Every cell contains an exact copy of the genome
o 23 chromosome pairs
o Complementary double helix
o A single strand can be written down as a string of bases (A, C, G, T): see figure
• DNA is extracted from tissue (a sample of the organism) → the DNA is copied (amplification) → cut into short fragments → read in parallel on a machine (= sequencer) → each of the dots (see figure) is one of the fragments, and each color corresponds to one base read from it (via fluorescent excitation)
o Result: a collection of short sequences
o Analogy: take 5000 books, copy each of them 30 times, shred them → read the snippets
• Genome assembly: searching for overlap between fragments → if two fragments overlap, they may be neighbors, i.e. adjacent pieces of DNA
o Errors can occur, and sometimes there is uncertainty
• Two sequences, one healthy and one diseased → what distinguishes healthy from diseased individuals?
o Not very difficult for a small piece of sequence, but difficult for the whole genome
o Sometimes a single-letter difference can be decisive, but diseases are often more complex
• In 2000, reading out a genome cost billions of dollars; techniques now exist that cost only hundreds of dollars
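The idea behind the "Discovering frequent patterns" bullet can be sketched in a few lines of Python. This is a minimal illustration, assuming that a "pattern" is simply an exact substring of fixed length k (a k-mer); the function name `repeated_kmers` and the parameters `k` and `min_count` are hypothetical choices for this example, not part of the course material. It counts every overlapping window of length k and keeps those that occur at least `min_count` times.

```python
from collections import Counter


def repeated_kmers(seq: str, k: int, min_count: int = 2) -> dict[str, int]:
    """Hypothetical helper: return every k-mer of seq that occurs at least min_count times."""
    # Slide a window of length k over the sequence (overlapping windows)
    # and count how often each substring appears.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Keep only the substrings that repeat often enough to be "frequent".
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


print(repeated_kmers("TTGACATTTTGACAGGTTGACA", 6))
```

On the toy sequence `TTGACATTTTGACAGGTTGACA` with k = 6, only `TTGACA` is reported, with 3 occurrences. Counting only flags candidates; deciding whether a repeat occurs more often than expected by chance needs a statistical model.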
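The overlap idea from the "Genome assembly" bullet can be illustrated with a greedy merge: repeatedly join the two fragments whose suffix/prefix overlap is longest. This is a minimal sketch under simplifying assumptions (error-free reads from a single strand, exact overlaps of at least `min_len` bases); greedy merging is only one simple strategy and not necessarily the assembly method covered later in the course. The names `overlap`, `greedy_assemble` and `min_len` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    # Try the longest possible overlap first, then shrink it.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Hypothetical greedy assembler: merge the best-overlapping pair until no overlap is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads that lie entirely inside another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # Compare every ordered pair (a before b) and remember the longest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: several separate contigs remain
        # Glue b onto a without repeating the shared part.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"]))
```

With the reads `ATGCGTAC`, `GTACGTTA` and `CGTTAGC`, the longest overlap (`CGTTA`, 5 bases) is merged first and the result is the single contig `ATGCGTACGTTAGC`. Because it always takes the locally best overlap, a greedy assembler can join the wrong fragments when the genome contains repeats, which is one source of the errors and uncertainty mentioned above.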
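For the "two sequences, one healthy and one diseased" bullet, the simplest possible comparison lists the positions where two equal-length sequences differ. This sketch assumes the sequences are already aligned base by base and contain no insertions or deletions (handling those requires alignment, the subject of the pairwise alignment chapter); the function name `point_differences` is hypothetical and positions are 0-based.

```python
def point_differences(healthy: str, diseased: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: list (position, healthy base, diseased base) for every mismatch."""
    # Without equal lengths the base-by-base comparison is meaningless.
    if len(healthy) != len(diseased):
        raise ValueError("sequences must have equal length; insertions/deletions require alignment")
    # Walk over both sequences in parallel and keep only the positions that differ.
    return [
        (pos, h, d)
        for pos, (h, d) in enumerate(zip(healthy, diseased))
        if h != d
    ]


print(point_differences("ATGGAGGAG", "ATGGTGGAG"))
```

For the toy sequences `ATGGAGGAG` (healthy) and `ATGGTGGAG` (diseased), the function returns `[(4, 'A', 'T')]`: a single-letter difference at position 4. On a full genome, finding the differences is not the hard part; deciding which of the many differences matter for the disease is.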