Lecture notes

Summary: Bioinformatics (Genomics), Biology, UU

Pages: 18
Uploaded on: 03-07-2021
Written in: 2020/2021

Notes from the Bioinformatics lectures, with the accompanying slides and tables.

Contains: all lectures


Learning objectives: Genomics
HC1: Intro, BLAST
Why study bioinformatics?
• Explain why a biologist should know Bioinformatic Data Analysis

• Describe the ‘omics: (meta-)genomics, (meta-)transcriptomics, (meta-)proteomics, metabolomics, etc.

Genomics: Sequence all of the DNA of one organism

Transcriptomics: Sequence all of the mRNA in an organism/tissue/cell

Proteomics: Sequence all of the proteins in an organism/tissue/cell

Metagenomics: Sequence the DNA of all organisms in a sample

Metatranscriptomics: Sequence the mRNA of all organisms in a sample

Metaproteomics: Sequence the proteins of all organisms in a sample

Metabolomics: Measure all of the metabolites in an organism/tissue/cell

• Explain the biology behind the ‘omics revolution: reduce bias by measuring all of a thing
Omics solves a major problem in science: biases
- People are mostly interested in: 1. Their diseases 2. Their food 3. Themselves
- This causes biases in our general understanding of biology, and biases in our databases
- For example, most studied bacteria are associated with humans

• Compare the two ways a bioinformatician exploits existing data to make new discoveries (top-down and bottom-up)

Sequence similarity searches
• Explain what a sequence alignment is and the difference between a global and a local sequence alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or
protein to identify regions of similarity that may be a consequence of functional, structural,
or evolutionary relationships between the sequences. Aligned sequences of nucleotide or
amino acid residues are typically represented as rows within a matrix. Gaps are inserted
between the residues so that identical or similar characters are aligned in successive
columns.

Local alignment: finds the optimal sub-alignment within two sequences; suited to partial homologs.
Global alignment: aligns two sequences from end to end; suited when you know the two sequences are full-length homologs, e.g. resulting from gene duplication.
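
To make the global/local distinction concrete, here is a minimal Python sketch that computes only the optimal score (no traceback) with a linear gap penalty. The function name `align_score` and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not values from the lecture. The global mode follows the Needleman-Wunsch recurrence; the local mode follows Smith-Waterman, where cell scores are floored at 0 so an alignment can start anywhere.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score of a and b with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch), end to end.
    local=True:  local alignment (Smith-Waterman), best sub-alignment.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for a[:i] versus b[:j] (local: ending at i, j)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: leading gaps cost points; local: first row/column stay 0
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)       # start a fresh local alignment here
                best = max(best, cell)    # remember the best cell anywhere
            dp[i][j] = cell
    # Global: score of aligning both full sequences; local: best cell anywhere
    return dp[n][m] if not local else best
```

For example, `align_score("TTTACGTAAA", "CCACGTCC", local=True)` returns 4, the score of the shared `ACGT` segment, while the global score of the same pair is lower because the unmatched ends must be aligned too.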

• Explain the BLAST algorithm
1. Identifies all words (length W) in the query
- Default lengths: W = 3 for protein, W = 11 for DNA
2. Quickly finds similar words in the database
- "Similar" words are defined by substitution scores from a substitution matrix (e.g. BLOSUM62)
- A precomputed index quickly locates all potential hit sequences
3. Extends the seeds in both directions to find HSPs (high-scoring segment pairs) between query and hit
- HSP: a region that can be aligned with a score above a certain threshold
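
The three steps can be sketched in Python for DNA, where blastn-style seeds are exact word matches. For proteins, step 2 would instead accept neighbourhood words that score above a threshold with a matrix such as BLOSUM62; this sketch leaves that out, and it also omits gapped extension. All function names, the match/mismatch scores, the X-drop cutoff and the HSP threshold are hypothetical choices for illustration; this is not NCBI's implementation.

```python
from collections import defaultdict


def build_index(subject, w):
    """Step 2 helper: map every length-w word of the database sequence to its positions."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def best_extension(pairs, match, mismatch, x_drop):
    """Walk outward over (query, subject) residue pairs; return (best gain, length).

    Stops once the running score falls more than x_drop below the best seen
    (an X-drop rule, one of the heuristics that keeps BLAST fast).
    """
    best = score = best_len = 0
    for length, (a, b) in enumerate(pairs, start=1):
        score += match if a == b else mismatch
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
    return best, best_len


def extend_seed(query, subject, q, s, w, match=1, mismatch=-3, x_drop=5):
    """Step 3: extend an exact seed query[q:q+w] == subject[s:s+w] both ways, ungapped."""
    right, r_len = best_extension(zip(query[q + w:], subject[s + w:]),
                                  match, mismatch, x_drop)
    left, l_len = best_extension(zip(reversed(query[:q]), reversed(subject[:s])),
                                 match, mismatch, x_drop)
    score = w * match + left + right
    # (score, query start, query end (exclusive), subject start) of the HSP
    return score, q - l_len, q + w + r_len, s - l_len


def find_hsps(query, subject, w=11, threshold=15):
    """Steps 1-3: seed with exact query words, extend, keep HSPs above threshold."""
    index = build_index(subject, w)
    hsps = set()  # seeds inside the same HSP often give the same tuple
    for q in range(len(query) - w + 1):          # step 1: every query word
        for s in index.get(query[q:q + w], []):  # step 2: index lookup
            hit = extend_seed(query, subject, q, s, w)
            if hit[0] >= threshold:
                hsps.add(hit)
    return sorted(hsps, reverse=True)
```

With `query = "ACGTACGTTAGCCATG"` and `subject = "GGGG" + query + "TTTT"`, six different seeds hit, but they all extend to the same HSP covering the whole query. Only seeds are ever extended and extension stops early under the X-drop rule; skipping a full dynamic-programming alignment against every database sequence is where the speed comes from.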

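• List the factors, including heuristics, that make BLAST fast
The fastest algorithms generally use heuristics. Heuristic: a practical method that is not guaranteed to be optimal, but is sufficient for the present goals. In BLAST, for example, only short word matches (seeds) located via a database index are considered, and only these seeds are extended, instead of computing a full optimal alignment of the query against every database sequence.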

Running BLAST
• Evaluate BLAST output/results

• Decide which BLAST flavor to use for your similarity search
BLAST flavors: direct searches
o Nucleotide-nucleotide searches
  - Nucleotide database & nucleotide query
  - blastn (default: W = 11 nucleotides)
    • Find homologous genes in different species
  - Megablast (default: W = 28 nucleotides)
    • Designed to efficiently find longer alignments between very similar nucleotide sequences
    • Best tool to find highly identical hits for a query sequence, for example sequences from the same species
  - Discontiguous Megablast
    • Uses discontiguous words (e.g. W = 11 nucleotides: AT-GT-AC-CG-CG-T)
    • For example, this can focus the search on codons (the third nucleotide of a codon is less conserved due to the degeneracy of the genetic code)
    • Best tool to find nucleotide-nucleotide hits at larger evolutionary distances for protein-coding query sequences
o Protein-protein searches
  - Protein database & protein query sequences
  - blastp (default: W = 3 amino acids)
    • Find homologous proteins in different species
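
The discontiguous-word idea can be sketched as a spaced-seed template in which '1' marks a position that must match and '0' a position that is ignored. The template below is an assumption chosen to reproduce the AT-GT-AC-CG-CG-T pattern from the Discontiguous Megablast item (11 matched positions out of 16, skipping every third base); it is not necessarily one of the templates NCBI actually uses, and `spaced_words` is a hypothetical helper name.

```python
def spaced_words(seq: str, template: str = "1101101101101101") -> list[str]:
    """Extract discontiguous words: keep positions marked '1', skip those marked '0'.

    The default template (an assumption) keeps 11 of every 16 positions and
    ignores every third base, i.e. the codon wobble position.
    """
    keep = [i for i, c in enumerate(template) if c == "1"]
    span = len(template)
    return ["".join(seq[start + i] for i in keep)
            for start in range(len(seq) - span + 1)]
```

Two coding sequences that differ only at the third codon positions, such as `ATAGTAACACGACGAT` and `ATGGTGACGCGGCGGT`, produce the same spaced word `ATGTACCGCGT`, so they can still seed a hit although none of their aligned contiguous 11-nucleotide words match.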

BLAST flavors: translated searches

o We can exploit the conservation of protein sequences when aligning DNA sequences by using translated searches
o This allows more sensitive searches that detect homology at greater evolutionary distances
  - For example: homologous genes in distantly related species
o blastx (nucleotide query vs protein database) and tblastx first translate the nucleotide query into protein (in all six reading frames) before identifying high-scoring words
o tblastn (protein query vs nucleotide database) and tblastx search a nucleotide database whose sequences are translated into protein (in all six reading frames)
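
The decision rules in the two BLAST-flavor lists above can be condensed into a small lookup. This is a hypothetical helper, not part of any BLAST distribution, and the `goal` labels are made up for illustration. Note that in the BLAST+ command-line tools, megablast and discontiguous megablast are run as tasks of the `blastn` program (`-task megablast`, `-task dc-megablast`) rather than as separate programs.

```python
def choose_blast(query: str, database: str, goal: str = "homologs") -> str:
    """Suggest a BLAST flavor from the sequence types (hypothetical helper).

    query, database: "nucleotide" or "protein".
    goal (only used for nucleotide vs nucleotide; labels are made up):
      "same_species"   -> highly identical hits
      "distant_coding" -> protein-coding query, larger evolutionary distance
      "translated"     -> compare both nucleotide sets at the protein level
      anything else    -> homologous genes in different species
    """
    pair = (query, database)
    if pair == ("protein", "protein"):
        return "blastp"            # W = 3 amino acids
    if pair == ("nucleotide", "protein"):
        return "blastx"            # query translated into protein
    if pair == ("protein", "nucleotide"):
        return "tblastn"           # database translated into protein
    if pair != ("nucleotide", "nucleotide"):
        raise ValueError(f"unknown sequence types: {pair}")
    if goal == "same_species":
        return "megablast"         # W = 28 nucleotides
    if goal == "distant_coding":
        return "dc-megablast"      # discontiguous words
    if goal == "translated":
        return "tblastx"           # query and database both translated
    return "blastn"                # W = 11 nucleotides
```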

HC2: Quantifying Sequence Similarity
Evolution
• List the mechanisms of DNA mutation
Nucleotide substitutions
- Replication error
- Physical or chemical reaction
Insertions or deletions (indels)
- Unequal crossing over during meiosis
- Replication slippage
Inversions or rearrangements
Duplications of:
- Partial or whole gene
- Partial chromosome (segmental duplication) or whole chromosome (aneuploidy, e.g. polysomy)
- Whole genome (polyploidy)
Horizontal gene transfer (HGT)
- Transfer of genetic material between individuals of the same generation (i.e. not from parent to offspring), possibly between different species
• Define homology, similarity, and identity
Homology
- Property of two sequences that have a shared ancestor
- Homology is TRUE or FALSE: either you’re family or you’re not
Identity
- Percentage of identical residues in an alignment
- Used for amino acids or nucleotides
Similarity
- Percentage of amino acid residues in an alignment with a positive substitution score
- Not used for DNA
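
Both percentages can be computed from a finished alignment as sketched below. Two assumptions: gap columns count towards the alignment length (tools differ on this convention), and instead of the full BLOSUM62 matrix the sketch uses a short hand-picked list of non-identical amino-acid pairs that score positively in BLOSUM62 (I/V, K/R, D/E, ...). A real analysis would load the complete matrix, for example from Biopython. `POSITIVE_PAIRS` and `identity_and_similarity` are hypothetical names.

```python
# Assumed excerpt: a few non-identical amino-acid pairs with a positive
# BLOSUM62 score. Incomplete on purpose; use the full matrix in practice.
POSITIVE_PAIRS = {frozenset(p) for p in
                  ["IV", "IL", "LV", "LM", "KR", "DE", "QE", "FY", "ST"]}


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two aligned protein sequences.

    '-' marks a gap; gap columns count towards the length but never match.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1        # identical residues also have a positive score
        elif frozenset((a, b)) in POSITIVE_PAIRS:
            similar += 1
    n = len(aln1)
    return 100 * identical / n, 100 * similar / n
```

For `MKIVD` aligned to `MRLVD`, three columns are identical (60 % identity) and the K/R and I/L columns also score positively, giving 100 % similarity. For DNA only the identity value would be reported.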
• List four properties of amino acids that might be important in determining their physicochemical similarity
Size, polarity, hydrophobicity, preferred protein fold

Probability & Permutation Statistics
• Work with P-values obtained using permutation statistics
P-value: the probability of observing a hit as good as or better than your score by chance.
In permutation statistics, this corresponds to the fraction of permutations in which the permuted score is equal to or higher than your score.
A meaningful observation has a low P-value: randomly permuted data rarely scores equal to or higher than your score.
The minimum P-value depends on the number of random permutations: if no permuted score reaches your score, you can only conclude that P < 1/(number of permutations).
Example: for 100 permutations, the best P-value is < 0.01
For 1000 permutations, the best P-value is < 0.001
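
Written as a formula, with N permutations, permuted scores S_1, ..., S_N and observed score S_obs (symbols introduced here for illustration, not taken from the lecture):

```latex
P = \frac{\#\{\, i \in \{1,\dots,N\} : S_i \ge S_{\mathrm{obs}} \,\}}{N},
\qquad
\#\{\, i : S_i \ge S_{\mathrm{obs}} \,\} = 0 \;\Rightarrow\; P < \frac{1}{N}
```

With N = 100 this gives P < 0.01 and with N = 1000 it gives P < 0.001, matching the examples above. Some tools use (count + 1)/(N + 1) instead, which never returns exactly 0.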
• Explain how permutation statistics help us evaluate the strength of a result
Statistics are not well defined for many bioinformatic analyses. A simple solution is data permutation:
- Permute (shuffle) the sequences many times (e.g. 1000)
- Make a new alignment matrix for each permutation (e.g. 1000)
- Record whether the alignment score of the permuted sequences is equal to or higher than your score
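
A minimal sketch of this procedure in Python. The scoring function is a placeholder assumption: the hypothetical `ungapped_identity` simply counts identical positions without gaps, standing in for a real alignment score, and any function that takes two sequences and returns a number can be passed instead. Only the second sequence is shuffled, which keeps its residue composition but destroys its order; the fixed random seed is there only to make runs reproducible.

```python
import random


def ungapped_identity(a, b):
    """Toy alignment score (assumption): number of identical positions, no gaps."""
    return sum(x == y for x, y in zip(a, b))


def permutation_p_value(seq1, seq2, score_fn=ungapped_identity, n_perm=1000, seed=0):
    """Return (observed score, empirical P-value) by shuffling seq2 n_perm times."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    observed = score_fn(seq1, seq2)
    residues = list(seq2)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(residues)           # same composition, random order
        if score_fn(seq1, "".join(residues)) >= observed:
            at_least_as_good += 1
    # Fraction of permutations scoring equal to or higher than the observed score
    return observed, at_least_as_good / n_perm
```

If the returned P-value is 0.0, none of the shuffled sequences reached the observed score, and the result should be reported as P < 1/n_perm (P < 0.001 for the default 1000 permutations).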
Author: milofonville (Universiteit Utrecht)