Resumen

Summary Bioinformatics and Systems Biology: Evolution

Puntuación

Vendido

Páginas

Subido en

23-04-2025

Escrito en

2024/2025

Samenvatting van alle commando's gezien in de lessen bij Evolution, voorbeelden, afbeeldingen van outputs, elke code uitgelegd

Institución

Grado

Ups! No podemos cargar tu documento ahora. Inténtalo de nuevo o contacta con soporte.

Informar violación de derechos de autor

Escuela, estudio y materia

Institución: Katholieke Universiteit Leuven (KU Leuven)
Estudio: Biomedische Wetenschappen
Grado: Bioinformatics: Sequence, Structure, Evolution

Todos documentos para esta materia (2)

Información del documento

Subido en: 23 de abril de 2025
Número de páginas: 41
Escrito en: 2024/2025
Tipo: Resumen

Temas

bash
jupyter notebooks
commands
r

Vista previa del contenido

Bio informatics: Evolution

Session 1: Database querying with BLAST, multiple sequence alignment
and phylogenetic reconstruction
Jupyter notebook: exerciseYFV.blast.search.AliView
- AliView: not on the exam
Goal of this session:
- Query GenBank database via web interface and through command line
- Look up sequences similar to a query in a database with BLAST via the
command line
- create and edit a multiple sequence alignment
- reconstruct a phylogeny with the NJ algorithm

Workflow:
- Look up patients virus sequence => YFV sequence
- Fetch a set of highly similar sequences
- Create and edit multiple sequence alignment
- Reconstruct NJ tree
The goal is to investigate the genetic relationship of a patient’s genome with publicly
available sequences (focusing on samples from 1 outbreak), to determine whether
the patient’s virus strain is more closely related to strains from outbreak or if it
represents a district lineage from a different region.
1) Load Python modules to be used
= from Bio.Blast import NCBIWWW, NCBIXML
from Bio import Entrez, SeqIO, Phylo, AlignIO
import pandas as pd
import time, os, alv, re
from IPython.display import Image
*Change nothing*
2) Load interface to run R embedded in a Python process:
= %load_ext rpy2.ipython
3) Load R packages to be used:
= %%R

,library("ape")
4) Define variables for input names that will only will have to be changed once
= baseName = "namefile(without .fasta)"
 Base name is main part of the filename without details
= folderName = "exercise_xxx"
 Name of directory where these files are stored

5) Push that variable to R using Rpush
= %Rpush baseName folderName
6) Check if this works:
= %%R
print(paste0("baseName: ", baseName, " -- folderName: ", folderName))
7) Let’s check this for Bash:
= %%bash -s "$baseName" "$folderName"
echo baseName: "$1" -- folderName: "$2"
8) Create a folder where analysis files can be saved:
= %%bash -s "$folderName"
= mkdir "$1"
 If folder already exists, try $mkdir -p "$1
Database querying
First step = look up patient’s sequence in GenBank with accession number MF347613
=> download it => browse to https://www.ncbi.nlm.nih.gov/nuccore and save file as
fasta-format => upload in Jupyter
(1) Go to GenBank: paste accession number in search bar

(2) Download as fasta & change name
(3) Upload in Jupyter

,1) Move uploaded file into the folder for this exercise or upload sequence file to the
exercise folder (skip this command then)
= %%bash -s "$baseName" "$folderName"
mv ${1}.fasta ${2}/${1}.fasta

2) Change directory to exercise’s folder
= os.chdir(folderName)
3) Use query sequence to fetch the 100 most similar sequence available in NCBI
nucleotide database:
= inputFile = baseName+'.fasta' #your sequence file
query = SeqIO.read(inputFile, format="fasta") #Read sequence
search_type = "blastn"
database = "nt"
max_nr_hits = 100 #Adjust if you want more/less results
before = round(time.time(), ndigits=0)
blast_stream = NCBIWWW.qblast(program = search_type,
database = database,
sequence = query.seq,
alignments = 1,
hitlist_size = max_nr_hits)
after = round(time.time(), ndigits=0)
text = f"The BLAST search took {after - before} seconds."
print(text)
 This script performs BLAST search using NCBIWWW.qblast function from
Biopython, which queries the NCBI BLAST database
 Defines input file
 Reads query sequence: uses SeqIO.read() from Biopython to read a FASTA file
& stores the sequence object in query
 Uses BLASTN for nucleotide sequence comparison
 Searches against NCBI’s nucleotide database
 Limits results to 100 matches
 Stores current time before BLAST search starts
 NCBIWWW.qblast() submits the query sequence (query.seq) to NCBI’s online
BLAST service

,  alignments = 1 → Retrieves only the top alignment per hit
 hitlist_size = max_nr_hits → Limits the result list to 100 hits
 captures the time after the BLAST search finishes
 computes time taken for the BLAST search
In Python, a stream refers to a sequence of data elements, often used for input and
output operations. In this case, blast_stream holds the result of our search
4) Save the result to file:
= blastn_result_file = baseName+".blastn.result.xml"
with open(blastn_result_file, 'w') as file:
file.write(blast_stream.read())
 Saves BLAST search results in a XML file
5) Extract the sequences in fasta format from the BLAST search results
=all_data_fasta = baseName+".blastn.result.fasta"

with open(blastn_result_file) as result_handle:
blast_records = NCBIXML.parse(result_handle)
with open(all_data_fasta, 'w') as fasta_handle:
for blast_record in blast_records:
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
db_seq = hsp.sbjct
db_id = alignment.accession
fasta_handle.write(f">{db_id}\n{db_seq}\n")
 Extracts the BLAST search results from previously saved XML file and saves
the matched sequences in FASTA format

Multiple sequence alignment, visualization and trimming
1) [Bash]: Check whether required files are present
= %%bash -s "$all_data_fasta"
ls | grep ${1}
 Should give you the .blastn.result.fasta file
2) [Bash]: perform alignment using mafft: mafft –auto inputFile.fas > outputFile.fas

$12.52

Accede al documento completo:

100% de satisfacción garantizada

Inmediatamente disponible después del pago

Tanto en línea como en PDF

No estas atado a nada

Conoce al vendedor

noavdn

3.5

(2)

Conoce al vendedor

noavdn Katholieke Universiteit Leuven

Ver perfil

Seguir

Vendido

Miembro desde

1 año

Número de seguidores

Documentos

Última venta

22 horas hace

3.5

2 reseñas

Recientemente visto por ti

Por qué los estudiantes eligen Stuvia

Creado por compañeros estudiantes, verificado por reseñas

Calidad en la que puedes confiar: escrito por estudiantes que aprobaron y evaluado por otros que han usado estos resúmenes.

¿No estás satisfecho? Elige otro documento

¡No te preocupes! Puedes elegir directamente otro documento que se ajuste mejor a lo que buscas.

Paga como quieras, empieza a estudiar al instante

Sin suscripción, sin compromisos. Paga como estés acostumbrado con tarjeta de crédito y descarga tu documento PDF inmediatamente.

“Comprado, descargado y aprobado. Así de fácil puede ser.”

Alisha Student

Preguntas frecuentes

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

100% de satisfacción garantizada: ¿Cómo funciona?

Nuestra garantía de satisfacción le asegura que siempre encontrará un documento de estudio a tu medida. Tu rellenas un formulario y nuestro equipo de atención al cliente se encarga del resto.

Who am I buying this summary from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller noavdn. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy this summary for $12.52. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews) 45,681 summaries were sold in the last 30 days Founded in 2010, the go-to place to buy summaries for 15 years now