CONTENTS
1. Databases
Entrez gene (NCBI): https://www.ncbi.nlm.nih.gov/gene/
Entrez nucleotide (NCBI): https://www.ncbi.nlm.nih.gov/nucleotide/
Entrez protein (NCBI): https://www.ncbi.nlm.nih.gov/protein
3D structures Entrez Structure (NCBI): https://www.ncbi.nlm.nih.gov/Structure/index.shtml
EMBL/EBI search (European database): https://www.ebi.ac.uk/
ENA (counterpart of Entrez nucleotide): https://www.ebi.ac.uk/ena/browser/home
PUBMED: https://pubmed.ncbi.nlm.nih.gov/
Uniprot: https://www.uniprot.org/
Gene ontology: https://geneontology.org/
MGI (mouse base): https://www.informatics.jax.org/
Flybase: https://flybase.org/
OBO Open Biomedical Ontologies (OBO): https://obofoundry.org/
OMIM Online Mendelian Inheritance in Man: https://www.omim.org/
Human Phenotype Ontology (HPO): https://hpo.jax.org/
UCSC Genome Browser: https://genome.ucsc.edu/
Ensembl genome browser (European version of the UCSC browser): https://www.ensembl.org/index.html
UCSC cell browser: https://cells.ucsc.edu/
2. Query
MySQL
UCSC table browser: https://genome.cse.ucsc.edu/cgi-bin/hgTables
BioMart: https://www.ensembl.org/biomart/
3. Linux
Logging in
• Jupyter
Commands: (Shift+Enter to run the code; always put a space between the first command and what follows)
Notebooks
5. BLAST
BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi
BLAT: https://genome.ucsc.edu/cgi-bin/hgBlat
6. Exercises in Jupyter
LINUX Exercises
Intro
Notebook: Introduction_linux.ipynb & Copying_files
Gene ontology
Notebooks: Exercise_Linux_GOdownload.ipynb
HeidiSQL
Table browser
GO-exercise
Processing a QuickGO file in Jupyter (see notebook: GO_download)
Processing sequences in Jupyter
Notebook: Exercise_Linux_Emboss.ipynb
Overlapping 2 files in Jupyter (see notebook: BEDTools)
Notebook: BEDTools_example.ipynb
Gene prediction in Jupyter (2 notebooks)
ORF prediction: determining the reading frame
Basic codon statistics - ORF prediction: codon usage
Advanced codon statistics - ORF prediction
Gene ID (ORF prediction + custom track UCSC)
Predicting genes with GeneID to make a custom track in UCSC as a check:
Pandas: SQL query with Python
Dot plot
Notebook: Les 9
Alignment
Notebook: Les 9
Comparing sequences and finding repeats with a dot plot
Alignments + finding parameters
Gene lists from expression analysis
Pattern matching: searching for patterns in sequences
7. Exercise sessions
Exercise session 1
Information retrieval
CpG islands (see notebook werkzittingen 1)
Unknown sequence study
Exercise session 2
CpG islands with Python (see notebook: python CpgIsland excercise)
miRNA target genes discovery: (see notebook: miRNA exercise)
1. DATABASES
ENTREZ GENE (NCBI): https://www.ncbi.nlm.nih.gov/gene/
=> Look up a gene and its structure annotation
• Click the correct gene in the list (you can filter by organism)
• Contents:
• Summary
• Visualisation of the gene's location
o Each separate line is a different isoform of the gene (which can lead to different proteins) and has
its own NM identifier and its own NP identifier
o Dark-coloured blocks = coding sequences
o Light-coloured blocks = UTR
o Light-coloured lines = introns
• Expression of the gene (which organs/tissues)
• Bibliography (PubMed articles linked to the gene)
• Associated diseases
• Pathways
• Gene Ontology
• RefSeq identifiers of all mRNAs and the corresponding proteins (with links to Entrez nucleotide) -> this
shows how many different proteins the gene encodes.
• mRNAs/CDS/exon/intron features
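The NM and NP identifiers can be told apart by their prefix alone. The sketch below is a minimal illustration in Python, assuming the usual RefSeq pattern of a two-letter prefix, an underscore, a number and an optional version. The function name `describe_refseq` and the XM/XP/NR entries are added for illustration and do not come from the course material.

```python
import re

# Common RefSeq prefixes (NM/NP are the ones named in these notes;
# XM, XP and NR are added here for illustration).
REFSEQ_PREFIXES = {
    "NM": "mRNA (curated)",
    "NP": "protein (curated)",
    "XM": "mRNA (predicted)",
    "XP": "protein (predicted)",
    "NR": "non-coding RNA (curated)",
}

# Two capital letters, underscore, digits, optional ".version"
ACCESSION_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def describe_refseq(accession):
    """Return prefix, molecule type and version, or None if not RefSeq-like."""
    match = ACCESSION_PATTERN.match(accession.strip())
    if match is None:
        return None
    prefix, _number, version = match.groups()
    return {
        "prefix": prefix,
        "molecule": REFSEQ_PREFIXES.get(prefix, "unknown"),
        "version": int(version) if version else None,
    }


info = describe_refseq("NM_005225.3")
print(info)
```

An identifier that does not follow this pattern, such as a UniProt accession, returns `None` and has to be handled separately.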
ENTREZ NUCLEOTIDE (NCBI): https://www.ncbi.nlm.nih.gov/nucleotide/
=> database of nucleotide sequences, including all mRNAs
• Summary: header
• Publications
• Features:
o Length
o Exon by exon: mRNA -> shows where the coding regions lie and where the individual exons are located
o CDS (from which nucleotide to which nucleotide the actual coding sequence runs -> everything before it =
5'UTR and everything after it = 3'UTR, relative to the total length of the mRNA record)
o Poly A site
o Protein sequence
• At the bottom, the mRNA sequence: nucleotide sequence
• Downloading FASTA
o Send to (top right)
o Complete record
o File
o Format: FASTA (sequence only) or GENBANK (header + all features + sequence) -> both are flat text
files
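The CDS coordinates in the feature table are enough to split a downloaded mRNA into 5'UTR, CDS and 3'UTR. The sketch below is a minimal illustration in Python, assuming 1-based inclusive coordinates as written in GenBank feature tables (for example `CDS 4..12`). The function names and the toy sequence are hypothetical and are not taken from the course notebooks.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def split_mrna(sequence, cds_start, cds_end):
    """Split an mRNA into (5'UTR, CDS, 3'UTR).

    cds_start and cds_end are 1-based and inclusive
    (assumption: GenBank-style feature coordinates).
    """
    if not 1 <= cds_start <= cds_end <= len(sequence):
        raise ValueError("CDS coordinates fall outside the sequence")
    five_utr = sequence[:cds_start - 1]     # everything before the CDS
    cds = sequence[cds_start - 1:cds_end]   # the coding sequence itself
    three_utr = sequence[cds_end:]          # everything after the CDS
    return five_utr, cds, three_utr


# Toy record (hypothetical, not a real transcript)
toy_fasta = ">toy_mrna\nGGCATGAAA\nTAGCCAAA\n"
seq = read_fasta(toy_fasta)["toy_mrna"]
five_utr, cds, three_utr = split_mrna(seq, 4, 12)
print(five_utr, cds, three_utr)
```

For a real record, the two coordinates would be read from the CDS line of the GenBank file and the sequence from the FASTA download.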
ENTREZ PROTEIN (NCBI): https://www.ncbi.nlm.nih.gov/protein
=> database of protein sequences
• Summary
• Publications
• Features:
o Parts of the protein with specific functions (for example, looking up a DNA-binding site; note: this can
differ between isoforms of the gene)
• At the bottom, the amino acid sequence
3D STRUCTURES ENTREZ STRUCTURE (NCBI): https://www.ncbi.nlm.nih.gov/Structure/index.shtml
• View 3D structures of proteins
EMBL/EBI SEARCH (EUROPEAN DATABASE): https://www.ebi.ac.uk/
=> same information as in NCBI, but a European back-up
• Gene and protein summary
• Link to ENSEMBL
• DBFETCH: with this link you can extract the FASTA file directly if you put it in your code as a
command
o http://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=refseqn;id=NM_005225.3;format=fasta&style=raw
o You can simply enter it in the address bar at the top, and you immediately get a text file in your browser
o You can choose fasta (= nucleotide sequence only) or default (= all information) as the format
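The dbfetch link can also be assembled in code instead of typed by hand. The sketch below is a minimal illustration in Python, assuming the service accepts the same `db`, `id`, `format` and `style` parameters over https with `&` separators. The helper names are hypothetical, and whether a given database name is still served has not been checked here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address taken from the link above (https is an assumption)
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db, entry_id, fmt="fasta", style="raw"):
    """Build a dbfetch URL for a single entry."""
    query = urlencode({"db": db, "id": entry_id, "format": fmt, "style": style})
    return f"{DBFETCH_BASE}?{query}"


def fetch_entry(db, entry_id, fmt="fasta"):
    """Download one entry as text (requires network access)."""
    with urlopen(dbfetch_url(db, entry_id, fmt), timeout=30) as response:
        return response.read().decode("utf-8")


url = dbfetch_url("refseqn", "NM_005225.3")
print(url)
# fasta_text = fetch_entry("refseqn", "NM_005225.3")  # uncomment to download
```

Passing `fmt="default"` instead of `"fasta"` would request the full record rather than the sequence only.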
ENA (COUNTERPART OF ENTREZ NUCLEOTIDE): https://www.ebi.ac.uk/ena/browser/home
=> overnight back-up of Entrez nucleotide
• Use it only when Entrez nucleotide is not working.
• Accession number: U… (can also be entered in Entrez nucleotide)
• This is where you get files in EMBL format
PUBMED: https://pubmed.ncbi.nlm.nih.gov/
=> scientific articles
• At the bottom of an article, under 'related information', you find the RefSeq entries for the genes and proteins mentioned.
UNIPROT: https://www.uniprot.org/
=> at the bottom of Entrez gene, below the NM and NP identifiers, there is also an identifier for UniProt (P…)
• Function of the protein
• A great deal of information about the protein
• Download via the download button
o Text -> like the EMBL and GenBank formats
o FASTA file
o GFF -> 9 columns, including protein ID, source, start, stop, and a note in the last column
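A GFF file is tab-separated, so its nine columns can be read with a simple split. The sketch below is a minimal illustration in Python, assuming the standard GFF3 column order (seqid, source, type, start, end, score, strand, phase, attributes). The example line and the helper names are hypothetical and not copied from a real UniProt download.

```python
from collections import namedtuple

GffRecord = namedtuple(
    "GffRecord",
    ["seqid", "source", "type", "start", "end",
     "score", "strand", "phase", "attributes"],
)


def parse_gff_line(line):
    """Parse one GFF line; return None for blank lines and comments."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    fields[3] = int(fields[3])  # start
    fields[4] = int(fields[4])  # end (stop)
    return GffRecord(*fields)


def parse_attributes(attributes):
    """Split 'key=value;key=value' from the last column into a dict."""
    pairs = (item.split("=", 1) for item in attributes.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


# Hypothetical example line (made up for illustration)
example = "EXAMPLE1\tUniProtKB\tDomain\t10\t25\t.\t.\t.\tNote=hypothetical example"
record = parse_gff_line(example)
note = parse_attributes(record.attributes)["Note"]
print(record.seqid, record.start, record.end, note)
```

The first column holds the protein ID, and the note sits in the attributes of the last column.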