Fundamentals of Bioinformatics - Summary

Pages
34
Uploaded on
13-10-2020
Written in
2020/2021

Summary of the Fundamentals of Bioinformatics course at the VU and UvA. It covers all lecture material, lecture slides and compulsory reading, and also includes notes from the Python tutorials (for the conversion class). The lecture material is the same as for the minor course Fundamentals of Bioinformatics.

FUNDAMENTALS OF BIOINFORMATICS

TABLE OF CONTENTS
LECTURES

Lecture 1 Fundamentals of Bioinformatics – Introduction
Lecture 2 Evolution
Lecture 3 Introduction to Machine Learning
Lecture 4 Protein structures & sequence profiles
Lecture 5 Impact prediction: SIFT and PolyPhen-2
Lecture 6 DNA Sequencing & Computational Analysis of Massively Parallel Sequencing (MPS) Data
Lecture 7 Title
Lecture 8 Big Data in Life Sciences

TUTORIALS

Tutorial 1
Tutorial 2 First steps in Python
Tutorial 3 Lists and loops
Tutorial 4 Functions
Tutorial 5 File I/O and dictionaries





Lectures
Lecture 1 Fundamentals of Bioinformatics – Introduction
Chapter 1

Keywords of bioinformatics: programming, knowledge of molecular biology, machine learning,
translating the research question into methods, designing workflows, general data science skills,
genomics/genetics, algorithms and statistics. Bioinformaticians often work on workflows.

Bioinformatics: studying the informatic processes in biotic systems.

Sequencing data analysis flow: images → reads (short stretches of DNA, 200 – 500 nucleotides long) →
alignment / matching of the reads to existing genomes (computationally expensive) → significance values (SNP, …).
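The alignment step in this flow can be sketched in a few lines of Python. This is only a brute-force illustration, not how production aligners work: the function name `map_read`, the toy reference and the toy reads are all made up for this example, and gaps are ignored.

```python
def map_read(reference: str, read: str, max_mismatches: int = 1) -> list[int]:
    """Return every 0-based start position where `read` fits `reference`
    with at most `max_mismatches` mismatching bases (no gaps allowed)."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Toy data, invented for illustration only.
reference = "ACGTTAGCCGATAGGCTA"
reads = ["TAGCCG", "GATTGG", "TTTTTT"]

for read in reads:
    print(read, map_read(reference, read))
```

Every read is compared against every window of the reference, so the work grows with reference length × read length × number of reads. That gives a feel for why matching millions of reads to a full genome is computationally expensive.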

Small sequencing devices that connect to a laptop are available; however, the computation remains heavy and is sent to a server.

We measure molecules / variants in the cells. We can measure proteins, DNA and RNA; these
measurements are called omics. In essence, you get a profile of the molecular state of a sample.

The dimensionality is often very difficult to handle, e.g. 10000 features per sample for only 150 people.

Applications are found in: biomedicine, pharmacy, ecogenomics, plant breeding.

Data sources – Molecular profiling:
- Genome (genomics): point mutations / small variants (exome sequencing, DNAseq),
structural variants (arrayCGH, DNAseq), methylation.
- RNA transcription/expression (transcriptomics): RNAseq, microarrays, single cell RNAseq.
- Proteins (proteomics): immunohistochemistry, mass spectrometry.
- Metabolites – small molecules (metabolomics): NMR, mass spectrometry.
- Microbes – bacteria/virus/parasite: DNAseq (metagenomics), RNAseq (metatranscriptomics).

Genomics → transcriptomics → proteomics → metabolomics.

40% group project, 30% individual assignment, 30% conversion class.

Project: benchmark impact prediction methods.
Impact prediction methods try to predict what the effect of a mutation is. PolyPhen-2 and SIFT are
existing methods and will be used.

We need an annotated set in which the effect of each mutation is known. We use the ClinVar
dataset. Use the predictions from your method plus the gold standard annotations to create a ROC curve.

Factors that determine whether or not a mutation has an effect on the organism:
- Kind of mutation: point mutation, indel, deletion/duplication, frameshift mutation, nonsense mutation.
- Effect on the codon: introduction of an extra start or stop codon; a change to another amino acid,
especially an amino acid from another group; a mutation that doesn't change the amino acid. A mutation
in the first two bases of the codon is more likely to change the amino acid, whereas nothing might
happen if it is in the wobble base.
- Location: the mutation needs to be in a gene and in an exon to change the protein; nothing might
happen if it is in an intron or in other non-coding DNA. Relevant locations: exonic/intronic,
intergenic regions (IGR), non-coding regions (NCR), RNA, repeat regions, pseudogenes.
- Regulation: a mutation can heavily affect a promoter; regulation signals and splicing signals can be
hit, so that introns and exons might change; changed regulatory sequences can lead to overexpression
or underexpression.
- Protein: the structure of the protein (for example a mutation at a position that determines the
structure); whether the mutation is in the active part of the protein; mutations might alter the
eventual protein in different ways.
- Cellular context: whether the gene is expressed in the cell type concerned; redundancy (are gene
copies available?); whether the gene is part of a major signalling pathway or another system, e.g.
DNA repair; the number of mutations already present; repair mechanisms of the cell; ability to kill
the cell; epigenetic modification; environmental changes (spontaneous mutations).
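The codon-level factors listed above (stop codons, wobble base, change of amino acid) can be made concrete with a small Python sketch. It builds the standard genetic code and labels a single-base substitution within one codon. The function name `classify_substitution` and the label strings are chosen for this illustration only. Location, regulation and protein structure are not modelled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Label the effect of replacing the base at `position` (0, 1 or 2) of `codon`."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"   # amino acid unchanged, e.g. a wobble-base change
    if after == "*":
        return "nonsense"     # new stop codon, protein is truncated
    if before == "*":
        return "stop-loss"    # stop codon removed, protein is extended
    return "missense"         # a different amino acid is built in


print(classify_substitution("GGT", 2, "C"))  # wobble base: Gly stays Gly
print(classify_substitution("TGG", 2, "A"))  # Trp becomes a stop codon
print(classify_substitution("GAG", 1, "T"))  # Glu becomes Val
```

A real impact predictor would go further than this label, for example by asking whether the new amino acid belongs to another group than the original one.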

An aim of bioinformatics is to find which mutations cause a disease: you find many mutations, but not
all of them are important. The aim is to select the biomarker (to tell the difference between disease and no disease).

Biomarker: an observation on which to base a clinical decision/diagnosis.

We use ClinVar as the gold standard dataset for human SNPs. We will compare the predictions from
each method with the gold standard to create a ROC curve, in order to assess and compare the quality
of the predictions made by the different methods.
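How a ROC curve follows from predictions and gold standard labels can be sketched in plain Python. The labels and scores below are invented toy values, not ClinVar data, and the function names are chosen for this example. The sketch assumes that a higher score means "more likely damaging" and that both classes occur in the labels. Scores where lower means more damaging, as SIFT reports, would need to be inverted first.

```python
def roc_curve(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 = damaging according to the gold standard, 0 = benign.
    scores: predictor output, higher = more likely damaging (assumption).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all variants tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy example: six variants with known labels and predicted scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_curve(labels, scores)
print(points)
print(round(auc(points), 3))
```

Running the same two functions on the output of each method gives curves and areas that can be compared directly: a curve closer to the top left corner, and an area closer to 1, means better agreement with the gold standard.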

SNV – single nucleotide variant: a position in the DNA where a person/cell has a different nucleotide
than the reference genome. SNVs are often heteroallelic (present on only one of the two chromosomes).
SNP – single nucleotide polymorphism: a position in the DNA where, throughout a population, a variant
relative to the common reference may be observed.
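The difference between the two definitions can be illustrated with a short Python sketch: SNVs are found by comparing one sample with the reference, while a SNP is a statement about how often a position varies across a population. The sequences are invented, assumed to be already aligned and of equal length, and the function names are chosen for this example.

```python
from collections import Counter


def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


def variant_frequencies(reference: str, samples: list[str]) -> dict[int, float]:
    """Fraction of samples that differ from the reference, per variable position."""
    counts = Counter(
        pos for sample in samples for pos, _, _ in find_snvs(reference, sample)
    )
    return {pos: n / len(samples) for pos, n in sorted(counts.items())}


# Toy data, invented for illustration only.
reference = "ACGTACGT"
population = ["ACGAACGC", "ACGAACGT", "ACGTACGT", "ACGAACGT"]

print(find_snvs(reference, population[0]))
print(variant_frequencies(reference, population))
```

In this toy population, position 4 varies in three of the four samples, which fits the idea of a polymorphism, while position 8 differs in a single sample only.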

To make predictions, we look at conserved regions: we look into the past to see whether previous
changes at a certain position were allowed. Impact prediction can reduce the number of SNVs to
consider in biomarker selection.
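The idea of looking at conservation can be sketched as follows. Given a multiple sequence alignment of related proteins, each column shows which residues evolution has allowed at that position. The alignment below is invented, gaps are not handled, and the two functions are a simplification for illustration, not the scoring that SIFT or PolyPhen-2 actually use.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per column, the fraction of sequences that share the most common residue."""
    return [
        Counter(column).most_common(1)[0][1] / len(alignment)
        for column in zip(*alignment)
    ]


def observed_at(alignment: list[str], column: int, residue: str) -> bool:
    """True if `residue` occurs at the 0-based `column` in any aligned sequence."""
    return any(sequence[column] == residue for sequence in alignment)


# Toy alignment of four related protein fragments, invented for illustration.
alignment = [
    "MKVLA",
    "MKILA",
    "MRVLA",
    "MKVLG",
]

print(column_conservation(alignment))
print(observed_at(alignment, 1, "R"))  # K -> R was seen before at this position
print(observed_at(alignment, 0, "L"))  # M -> L was never seen at this position
```

A substitution at a fully conserved column that has never been observed in related sequences is a more likely candidate for a damaging change than one that evolution has already tolerated.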

SNPs occur almost once every 1000 nucleotides on average, so roughly 4 to 5 million SNPs are present in one genome.

Reading
Central feature of life: the ability to reproduce itself.

Three components of an evolutionary process: inheritance, the passing of characteristics from
parents to offspring; variation, the processes that make offspring other than exact copies of their
parents; and selection, the process that differentially favours the reproduction of some organisms,
and hence their characteristics, over others. Evolution is a cumulative process.
Life: the result of an evolutionary process taking place on Earth.

Reproductive fitness: a measure of how many surviving offspring an organism can produce.

All life is divided into four groups: viruses, archaea, bacteria and eucarya.
Vertebrates (animals with backbones: fish, reptiles, amphibians, birds, mammals) are 3% of species.

Viruses: small amount of genetic material surrounded by a protein coat.

Sperm and eggs are germ cells (they arise through meiosis); all other kinds of cells in the body are somatic.
Differentiated cells cannot reproduce a whole animal (except reproductive cells).

Membranes: boundaries between the cell and the outside world. All cells have a phospholipid cell
membrane (phospholipids are lipids with a phosphate group attached). The phosphate is hydrophilic and
the lipid is hydrophobic. The membrane contains all sorts of signal transduction mechanisms.

Proteins: molecules that accomplish most of the functions of the living cell. They can for example
function as enzymes (which catalyse chemical reactions).
Proteins are built up out of 20 naturally occurring amino acids, sometimes as many as 4500 per protein.
Prosthetic groups: groups of atoms that some proteins need to bind in order to function.
