Chapter 3: Pairwise Alignments
➢ Define homologs, paralogs, orthologs
♥ Homology
♪ Similarity attributed to descent from a common ancestor.
♪ Orthologs
❖ Homologous sequences in different species that arose from a common ancestral gene during speciation; they may or may not have a similar function.
❖ Example: members of a gene (protein) family in various organisms.
♪ Paralogs
❖ Homologous sequences within a single species that arose by gene duplication.
❖ Example: members of a gene (protein) family within a species.
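One way to make these definitions operational is to look at the event at the last common ancestor of two genes in a gene tree: a speciation event means orthologs, a duplication event means paralogs. The sketch below assumes the tree's internal nodes are already labeled with those events (building and labeling such a tree is not covered here). `GeneNode`, `lineage`, `classify` and the gene names are hypothetical illustrations, not a library API.

```python
class GeneNode:
    """A node in a gene tree; internal nodes record the event that split the lineage."""

    def __init__(self, name, event=None, parent=None):
        self.name = name
        self.event = event    # "speciation" or "duplication" (internal nodes only)
        self.parent = parent  # None for the root


def lineage(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def classify(gene_a, gene_b):
    """Call two genes orthologs or paralogs from the event at their last common ancestor."""
    ancestors_of_a = {id(n) for n in lineage(gene_a)}
    for node in lineage(gene_b):          # walk upward from gene_b
        if id(node) in ancestors_of_a:    # first shared node = last common ancestor
            return "orthologs" if node.event == "speciation" else "paralogs"
    raise ValueError("the two genes are not in the same tree")


# Hypothetical family: one gene duplication, then a speciation in each copy.
root = GeneNode("ancestral gene", event="duplication")
copy_a = GeneNode("copy A", event="speciation", parent=root)
copy_b = GeneNode("copy B", event="speciation", parent=root)
human_a = GeneNode("human A", parent=copy_a)
mouse_a = GeneNode("mouse A", parent=copy_a)
human_b = GeneNode("human B", parent=copy_b)
mouse_b = GeneNode("mouse B", parent=copy_b)

print(classify(human_a, mouse_a))  # orthologs: split by speciation
print(classify(human_a, human_b))  # paralogs: split by duplication, same species
```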
➢ Perform pairwise alignments (NCBI BLAST)
♥ Pairwise sequence alignment is the most fundamental operation of bioinformatics
♪ It is used to decide if two proteins (or genes) are related structurally or functionally
♪ It is used to identify domains or motifs that are shared between proteins
♪ It is the basis of BLAST searching
♪ It is used in the analysis of genomes
♥ General approach to pairwise alignment
❖ Choose two sequences
❖ Select an algorithm that generates a score
❖ Allow gaps (insertions, deletions)
❖ Score reflects degree of similarity
❖ Alignments can be global or local
❖ Estimate probability that the alignment occurred by chance
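The steps above can be illustrated with a minimal global alignment (the Needleman-Wunsch dynamic-programming approach). This is a sketch under simplifying assumptions: the scores match = +1, mismatch = -1 and a linear gap penalty of -2 are chosen for illustration only; they are not the PAM/BLOSUM scores or the gap scheme BLAST uses. A local alignment (Smith-Waterman) differs by flooring every cell at 0 and tracing back from the highest-scoring cell. Estimating the chance probability of the score is not shown.

```python
def global_alignment(x, y, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_x, aligned_y); "-" marks a gap.
    """
    def pair_score(a, b):
        return match if a == b else mismatch

    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                    # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap                    # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                                 # gap in y
                          F[i][j - 1] + gap)                                 # gap in x

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, top, bottom = global_alignment("GATTACA", "GATACA")
print(score)   # 4: six matches (+6) and one gap (-2)
print(top)
print(bottom)
```

Allowing the gap is what lets the six shared residues line up; without gaps the shorter sequence could only be slid along the longer one.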
♥ Protein sequences can be more informative than DNA
♪ Protein is more informative: 20 amino acids vs. 4 nucleotides
❖ many amino acids share related biophysical properties
♪ codons are degenerate: changes in the third position often do not alter the amino acid that is specified
♪ protein sequences offer a longer “look-back” time
♪ DNA sequences can be translated into protein, and then used in pairwise alignments
♪ In many cases, DNA alignments are appropriate:
❖ to confirm the identity of a cDNA
❖ to study noncoding regions of DNA
❖ to study DNA polymorphisms
❖ example: Neanderthal vs. modern human DNA
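The points about translation and codon degeneracy can be checked with a short sketch. It assumes the standard genetic code (codons listed in TCAG order, "*" for stop) and reads only frame 1; `translate` and `synonymous_third_position` are hypothetical helper names, not a library API.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA (or RNA) coding sequence in frame 1, ignoring a trailing partial codon."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_third_position(codon):
    """Codons that differ from `codon` only at position 3 yet encode the same amino acid."""
    codon = codon.upper()
    return [codon[:2] + b for b in BASES
            if b != codon[2] and CODON_TABLE[codon[:2] + b] == CODON_TABLE[codon]]


print(translate("GCTGAA"), translate("GCCGAG"))  # AE AE: two DNA differences, same protein
print(synonymous_third_position("GCT"))          # ['GCC', 'GCA', 'GCG'] (all alanine)
```

The two DNA sequences differ at two third positions, yet their translations are identical, which is why a protein alignment can still detect a relationship after the DNA has drifted.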
♥ Definition: pairwise alignment
♪ The process of lining up two sequences to achieve maximal levels of identity (and conservation, in the case of
amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
♥ Definitions: identity, similarity, conservation
♪ Identity: The extent to which two (nucleotide or amino acid) sequences are invariant, i.e., carry the same residue at the same aligned positions.
♪ Similarity: The extent to which nucleotide or protein sequences are related. It is based upon identity plus
conservation.
♪ Conservation: Changes at a specific position of an amino acid or (less commonly, DNA) sequence that preserve the
physico-chemical properties of the original residue.
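A small sketch of how identity and similarity (identity plus conservation) could be counted over an existing alignment. Which substitutions count as "conservative" is an assumption here: the residue groups below are an illustrative choice, not a standard. BLAST, for example, reports "Positives" as aligned pairs with a positive substitution-matrix score instead. Gap columns count toward the alignment length but never as identical or similar.

```python
# Illustrative conservative-substitution groups (an assumption, not a standard).
CONSERVATIVE_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def is_conservative(a, b):
    """True if residues a and b fall in the same (illustrative) group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aligned_x, aligned_y):
    """Return (percent identity, percent similarity) of two aligned sequences."""
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned sequences must have the same length")
    identical = conserved = 0
    for a, b in zip(aligned_x, aligned_y):
        if a == "-" or b == "-":
            continue                      # gap column: neither identical nor similar
        if a == b:
            identical += 1
        elif is_conservative(a, b):
            conserved += 1
    length = len(aligned_x)
    return 100 * identical / length, 100 * (identical + conserved) / length


print(identity_and_similarity("KVLE-", "RVIEA"))  # (40.0, 80.0)
```

Here V/V and E/E are identities, K/R and L/I are conservative substitutions, and the last column is a gap.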
➢ Understand how scores are assigned to aligned amino acids using Dayhoff’s PAM matrices
♥ Calculation of an alignment score
♪ Goal: log-odds scores
♪ Dayhoff’s approach to assigning scores for any two aligned amino acid residues
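The log-odds idea can be sketched as the logarithm of the ratio between how often residues a and b are aligned in truly related sequences (target probability) and how often they would be aligned by chance (the product of their background frequencies). A positive score means the pair is seen more often than chance, zero means no preference, and a negative score means less often than chance. Dayhoff's PAM log-odds matrices are commonly presented as 10 × log10 of such a ratio, rounded to integers; the `scale` parameter mimics that. The probabilities in the example are hypothetical numbers, not Dayhoff's data.

```python
import math


def log_odds_score(target_prob, freq_a, freq_b, scale=10.0):
    """Scaled log-odds score for aligning residue a with residue b.

    target_prob:   probability of seeing a aligned with b in related sequences
    freq_a/freq_b: background frequencies of a and b (the chance model)
    scale:         multiplier applied to the log10 ratio (10 gives PAM-style units)
    """
    if min(target_prob, freq_a, freq_b) <= 0:
        raise ValueError("probabilities must be positive")
    return scale * math.log10(target_prob / (freq_a * freq_b))


# Hypothetical numbers, not taken from Dayhoff's data.
print(round(log_odds_score(0.0100, 0.05, 0.05)))  # 6: pair seen 4x more often than chance
print(round(log_odds_score(0.0025, 0.05, 0.05)))  # 0: pair seen exactly as often as chance
print(round(log_odds_score(0.0010, 0.05, 0.05)))  # -4: pair seen less often than chance
```

Because the scores are logarithms, the score of a whole ungapped alignment is simply the sum of the per-position scores, which corresponds to multiplying the per-position odds ratios.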