Summary

Bioinformatics summary/help file for the exam

Pages: 33
Uploaded on: 26-05-2024
Written in: 2023/2024

Summary of the bioinformatics slides and notes. Code with explanations, and the practical sessions (werkzittingen) worked out with explanations. Explanation of the various databases and frequently used code.

Bioinformatics
Contents
Lesson 1: Intro + databases
  Gene database
  Protein database
Lesson 2: Databases
  Ontologies
  Gene expression
  Phenotypes/diseases
  Model organism databases
Lesson 3: Genome browsers + SQL
  Genome browsers
  Homology
  Database architectures
Lesson 4: Linux + Jupyter
  Navigating the file system
  Additional Jupyter notebook notes
Lesson 5: EMBOSS + BedTools exercises
Lesson 6: Gene prediction
Lessons 7+8: Python
Lesson 9: Alignment, pattern matching, gene set analysis
Practical session 1
  Information retrieval
  CpG islands
  Unknown sequence study
Practical session 2
  Python CpG island
  miRNA

Lesson 1: Intro + databases




Gene database
- Entrez Gene
o Part of NCBI: https://www.ncbi.nlm.nih.gov/gene/
o Each line in the graphical view is a transcript isoform (arising from alternative promoters and alternative splicing); look at the exons, introns, non-coding exons (light green: 5’ UTR, 3’ UTR) and coding exons (dark green)
o Each transcript has a unique NM_ identifier, which is its RefSeq accession
o Each NM transcript corresponds to a unique NP_ protein entry
o More details about each NM/NP and links to the sequence in Entrez
Nucleotide are at the bottom of the Gene page
▪ Entrez Nucleotide contains all nucleotide sequences
▪ e.g., search the Nucleotide database for NM_000564
▪ The number after the dot “.” in an accession is the version number
- RefSeq
o https://www.ncbi.nlm.nih.gov/refseq/
o Many sequences were/are represented more than once in GenBank
o RefSeq = curated “secondary” database that aims to provide a
comprehensive, integrated, nonredundant set of sequences
o Goal is to provide a reference sequence for each molecule in the central dogma
(DNA, mRNA, and protein)
o Each RefSeq represents a single, naturally occurring molecule from one
organism
o Nucleotide and protein sequences in RefSeq are explicitly linked to one
another
o Distinct accession format: two letters, an underscore and a number (six digits in the classic 2+6 format; newer records use more digits)
▪ NT_123456 (Genomic contigs), NM_123456 (mRNAs), NP_123456
(Proteins)
▪ XM_123456 (Model mRNAs), XP_123456 (model proteins):
computational predictions
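A minimal Python sketch of how the accession rules above can be checked programmatically. It splits an accession into prefix, number and optional version, and looks the prefix up in a table built only from the prefixes listed in these notes. The function name, the regular expression (six or more digits) and the example version suffix `.2` are assumptions for illustration; this is not an official RefSeq validator.

```python
import re

# Prefixes as listed in these notes; RefSeq uses further prefixes
# that are not covered here.
REFSEQ_PREFIXES = {
    "NT": "genomic contig",
    "NM": "mRNA",
    "NP": "protein",
    "XM": "model mRNA (computational prediction)",
    "XP": "model protein (computational prediction)",
}

# Two capital letters, an underscore, six or more digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d{6,})(?:\.(\d+))?$")


def parse_refseq_accession(accession):
    """Split a RefSeq accession such as 'NM_123456.2' into its parts."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "number": number,
        # None when no ".version" suffix was given
        "version": int(version) if version is not None else None,
        "molecule": REFSEQ_PREFIXES.get(prefix, "prefix not listed in these notes"),
    }


if __name__ == "__main__":
    print(parse_refseq_accession("NM_000564"))
    print(parse_refseq_accession("XP_123456.2"))
```

Keeping the prefix table separate from the pattern makes it easy to extend when other prefixes come up later in the course.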

To inspect the data, download the record in GenBank format (.gb) as a text file and open it in a text editor, such as Visual Studio Code, or in Jupyter Notebook.
- How to download:
o Click “Send to” (top right of the page)
o Select “Complete Record” and “File”
o Choose GenBank format, or FASTA (sequence only, without the GenBank header and features)
- In the FEATURES section of the example record:
o The coding sequence (CDS) is made up of five exons
▪ The first exon begins at base 201 and ends at base 224
▪ It is then joined to bases 1550 to 1920, and so forth.
o In the CDS location line, each comma represents a splicing event (the junction between two exons), and each “..” spans the bases between the two coordinates, both included.
o The gene product is eukaryotic initiation factor 4E-II, and the gene name is
eIF4E
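The join logic described above can be sketched in Python: the functions split a CDS location into (start, end) exon pairs and compute the spliced length, using 1-based inclusive coordinates as in GenBank. Only the first two exons from the example (201..224 and 1550..1920) are used, because the remaining exon coordinates are not given here. The helper names are hypothetical, and `complement(...)`, single positions and references to other records are not handled; a dedicated parser such as Biopython's `SeqIO` GenBank reader covers those cases.

```python
import re


def parse_join(location):
    """Turn a location like 'join(201..224,1550..1920)' into a list of
    (start, end) exon coordinates (1-based, inclusive)."""
    location = location.strip()
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    exons = []
    for part in location.split(","):  # each comma separates two exons
        # drop partial-end markers '<' and '>' before splitting on '..'
        start, end = re.sub(r"[<>]", "", part.strip()).split("..")
        exons.append((int(start), int(end)))
    return exons


def spliced_length(exons):
    """Length of the joined sequence; '..' ranges include both ends."""
    return sum(end - start + 1 for start, end in exons)


def splice(sequence, exons):
    """Concatenate the exon substrings of a record sequence (1-based coordinates)."""
    return "".join(sequence[start - 1:end] for start, end in exons)


if __name__ == "__main__":
    exons = parse_join("join(201..224,1550..1920)")
    print(exons, spliced_length(exons))
```

For the two exons quoted above, the spliced length is 24 + 371 = 395 bases; the `- 1` in `splice` converts GenBank's 1-based positions to Python's 0-based slicing.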




- EMBL-EBI
o https://www.ebi.ac.uk/
o European database
o DBFETCH provides an easy way to retrieve entries from various databases at
the EMBL-EBI
o Format: https://www.ebi.ac.uk/Tools/dbfetch/db=refseqn;id=NM_000231;format=fasta&style=raw
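A hedged Python sketch of fetching an entry through dbfetch and reading the FASTA reply. It assumes the query-string form `https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=...&id=...&format=...&style=raw`, which differs slightly from the URL noted above; check the current EBI documentation before relying on it. The function names are hypothetical; `fetch_fasta` needs network access, while `dbfetch_url` and `parse_fasta` work offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed dbfetch endpoint; verify against the EBI documentation.
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db, entry_id, fmt="fasta"):
    """Build a dbfetch query URL; style=raw asks for plain text instead of HTML."""
    query = urlencode({"db": db, "id": entry_id, "format": fmt, "style": "raw"})
    return f"{DBFETCH_BASE}?{query}"


def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence lines are wrapped; join them
    return records


def fetch_fasta(db, entry_id):
    """Download one entry (requires network access) and parse the FASTA text."""
    with urlopen(dbfetch_url(db, entry_id)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Usage: `fetch_fasta("refseqn", "NM_000231")` would return a dictionary with one header/sequence pair, provided the assumed endpoint is correct.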
Protein database
- UniProt: https://www.uniprot.org/
o Features can be downloaded in General Feature Format (GFF), a text file
▪ Click download
▪ Choose GFF format
- Protein sequences in databases can be derived from translation of nucleotide
sequences (secondary databases)
o e.g., RefSeq NM_ to RefSeq NP_
o e.g., TrEMBL
o To reach the Protein database, follow one of the NP_ isoforms
- There are also curated databases: experts enhance the original data by adding new
information

o e.g., Swiss-Prot (in the UniProt Knowledgebase)
▪ Information from literature
▪ Curator-evaluated computational analysis/predictions
- 3D structures
o https://www.ncbi.nlm.nih.gov/structure/ or UniProt → Structure section
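UniProt's GFF download described above is a tab-separated text file, so it can be read with a few lines of Python. This sketch assumes the GFF3 layout (nine columns: seqid, source, type, start, end, score, strand, phase, attributes; lines starting with `#` are comments or directives) and is not an official UniProt parser. The function name is hypothetical, an embedded `##FASTA` section is not handled, and the accession `EXAMPLE1` in the usage below is made up.

```python
from urllib.parse import unquote

GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def parse_gff(text):
    """Parse GFF3 text into one dict per feature line."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, '##' directives and '#' comments
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        feature = dict(zip(GFF_COLUMNS, fields))
        feature["start"] = int(feature["start"])
        feature["end"] = int(feature["end"])
        # Column 9: key=value pairs separated by ';'; values may be URL-escaped
        attributes = {}
        for pair in feature["attributes"].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key] = unquote(value)
        feature["attributes"] = attributes
        features.append(feature)
    return features


if __name__ == "__main__":
    example = ("##gff-version 3\n"
               "EXAMPLE1\tUniProtKB\tDomain\t10\t80\t.\t.\t.\tNote=Made-up%20domain\n")
    for feature in parse_gff(example):
        print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```

Filtering the result on `feature["type"]` then gives, for example, all domains of a protein with their start and end positions.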


Lesson 2: Databases




Ontologies
- Gene Ontology (GO)
o https://geneontology.org/ or https://www.ebi.ac.uk/QuickGO/ (note: human gene symbols are usually written in capitals)
▪ Downloading data from QuickGO
• Click “Export”
• Choose the format Gene Association File (GAF) (then add .txt to the file name)
• Adjust the number of annotations to export
o Specific purpose: “Annotation of genes and proteins in genomic and protein
databases”
o Facilitate complex queries
o Applicable to all species
o Databases involved:
▪ FlyBase (Drosophila)
▪ MGI (Mouse)
▪ SGD (S. cerevisiae)
▪ TAIR (Arabidopsis)
▪ TIGR (microbes including prokaryotes)
▪ SWISS-PROT (several thousand species, including human)
▪ PSU (P. falciparum)
▪ ZFIN (zebrafish)
▪ PAMGO (plant pathogens)
o GO structure
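The QuickGO export described in this lesson yields a Gene Association File (GAF): tab-separated text in which lines starting with `!` are headers. As a sketch, assuming the GAF 2.x column order (gene symbol in column 3, qualifier in column 4, GO ID in column 5, aspect in column 9), the function below groups GO IDs per gene. The function and constant names are hypothetical.

```python
from collections import defaultdict

# 0-based GAF 2.x column positions used below
SYMBOL = 2      # DB Object Symbol (gene/protein symbol)
QUALIFIER = 3   # e.g. "enables" or "NOT|enables"
GO_ID = 4       # GO term identifier, e.g. GO:0005515
ASPECT = 8      # P = biological process, F = molecular function, C = cellular component


def go_terms_per_gene(gaf_text, aspect=None):
    """Collect the GO IDs annotated to each gene symbol in GAF text.

    Annotations whose qualifier contains NOT are skipped, because they
    state that the gene is *not* associated with the term.
    """
    genes = defaultdict(set)
    for line in gaf_text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # '!' marks GAF header/comment lines
        cols = line.split("\t")
        if len(cols) <= ASPECT:
            raise ValueError(f"too few GAF columns: {line!r}")
        if "NOT" in cols[QUALIFIER].split("|"):
            continue
        if aspect is not None and cols[ASPECT] != aspect:
            continue  # keep only the requested ontology (P, F or C)
        genes[cols[SYMBOL]].add(cols[GO_ID])
    return dict(genes)
```

Usage, with a hypothetical file name: `with open("quickgo_annotations.txt") as handle: terms = go_terms_per_gene(handle.read(), aspect="P")` keeps only biological process terms. Returning sets avoids counting a term twice when it is supported by several evidence lines.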
Uploaded by sisivorst (Katholieke Universiteit Leuven)