100% de satisfacción garantizada Inmediatamente disponible después del pago Tanto en línea como en PDF No estas atado a nada 4.2 TrustPilot
logo-home
Resumen

Summary Bioinformatics and Systems Biology: Evolution

Puntuación
-
Vendido
-
Páginas
41
Subido en
23-04-2025
Escrito en
2024/2025

Samenvatting van alle commando's gezien in de lessen bij Evolution, voorbeelden, afbeeldingen van outputs, elke code uitgelegd

Institución
Grado











Ups! No podemos cargar tu documento ahora. Inténtalo de nuevo o contacta con soporte.

Escuela, estudio y materia

Institución
Estudio
Grado

Información del documento

Subido en
23 de abril de 2025
Número de páginas
41
Escrito en
2024/2025
Tipo
Resumen

Temas

Vista previa del contenido

Bio informatics: Evolution



Session 1: Database querying with BLAST, multiple sequence alignment
and phylogenetic reconstruction
Jupyter notebook: exerciseYFV.blast.search.AliView
- AliView: not on the exam
Goal of this session:
- Query GenBank database via web interface and through command line
- Look up sequences similar to a query in a database with BLAST via the
command line
- create and edit a multiple sequence alignment
- reconstruct a phylogeny with the NJ algorithm

Workflow:
- Look up patients virus sequence => YFV sequence
- Fetch a set of highly similar sequences
- Create and edit multiple sequence alignment
- Reconstruct NJ tree
The goal is to investigate the genetic relationship of a patient’s genome with publicly
available sequences (focusing on samples from 1 outbreak), to determine whether
the patient’s virus strain is more closely related to strains from outbreak or if it
represents a district lineage from a different region.
1) Load Python modules to be used
= from Bio.Blast import NCBIWWW, NCBIXML
from Bio import Entrez, SeqIO, Phylo, AlignIO
import pandas as pd
import time, os, alv, re
from IPython.display import Image
*Change nothing*
2) Load interface to run R embedded in a Python process:
= %load_ext rpy2.ipython
3) Load R packages to be used:
= %%R

,library("ape")
4) Define variables for input names that will only will have to be changed once
= baseName = "namefile(without .fasta)"
 Base name is main part of the filename without details
= folderName = "exercise_xxx"
 Name of directory where these files are stored


5) Push that variable to R using Rpush
= %Rpush baseName folderName
6) Check if this works:
= %%R
print(paste0("baseName: ", baseName, " -- folderName: ", folderName))
7) Let’s check this for Bash:
= %%bash -s "$baseName" "$folderName"
echo baseName: "$1" -- folderName: "$2"
8) Create a folder where analysis files can be saved:
= %%bash -s "$folderName"
= mkdir "$1"
 If folder already exists, try $mkdir -p "$1
Database querying
First step = look up patient’s sequence in GenBank with accession number MF347613
=> download it => browse to https://www.ncbi.nlm.nih.gov/nuccore and save file as
fasta-format => upload in Jupyter
(1) Go to GenBank: paste accession number in search bar




(2) Download as fasta & change name
(3) Upload in Jupyter

,1) Move uploaded file into the folder for this exercise or upload sequence file to the
exercise folder (skip this command then)
= %%bash -s "$baseName" "$folderName"
mv ${1}.fasta ${2}/${1}.fasta


2) Change directory to exercise’s folder
= os.chdir(folderName)
3) Use query sequence to fetch the 100 most similar sequence available in NCBI
nucleotide database:
= inputFile = baseName+'.fasta' #your sequence file
query = SeqIO.read(inputFile, format="fasta") #Read sequence
search_type = "blastn"
database = "nt"
max_nr_hits = 100 #Adjust if you want more/less results
before = round(time.time(), ndigits=0)
blast_stream = NCBIWWW.qblast(program = search_type,
database = database,
sequence = query.seq,
alignments = 1,
hitlist_size = max_nr_hits)
after = round(time.time(), ndigits=0)
text = f"The BLAST search took {after - before} seconds."
print(text)
 This script performs BLAST search using NCBIWWW.qblast function from
Biopython, which queries the NCBI BLAST database
 Defines input file
 Reads query sequence: uses SeqIO.read() from Biopython to read a FASTA file
& stores the sequence object in query
 Uses BLASTN for nucleotide sequence comparison
 Searches against NCBI’s nucleotide database
 Limits results to 100 matches
 Stores current time before BLAST search starts
 NCBIWWW.qblast() submits the query sequence (query.seq) to NCBI’s online
BLAST service

,  alignments = 1 → Retrieves only the top alignment per hit
 hitlist_size = max_nr_hits → Limits the result list to 100 hits
 captures the time after the BLAST search finishes
 computes time taken for the BLAST search
In Python, a stream refers to a sequence of data elements, often used for input and
output operations. In this case, blast_stream holds the result of our search
4) Save the result to file:
= blastn_result_file = baseName+".blastn.result.xml"
with open(blastn_result_file, 'w') as file:
file.write(blast_stream.read())
 Saves BLAST search results in a XML file
5) Extract the sequences in fasta format from the BLAST search results
=all_data_fasta = baseName+".blastn.result.fasta"


with open(blastn_result_file) as result_handle:
blast_records = NCBIXML.parse(result_handle)
with open(all_data_fasta, 'w') as fasta_handle:
for blast_record in blast_records:
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
db_seq = hsp.sbjct
db_id = alignment.accession
fasta_handle.write(f">{db_id}\n{db_seq}\n")
 Extracts the BLAST search results from previously saved XML file and saves
the matched sequences in FASTA format


Multiple sequence alignment, visualization and trimming
1) [Bash]: Check whether required files are present
= %%bash -s "$all_data_fasta"
ls | grep ${1}
 Should give you the .blastn.result.fasta file
2) [Bash]: perform alignment using mafft: mafft –auto inputFile.fas > outputFile.fas
$12.52
Accede al documento completo:

100% de satisfacción garantizada
Inmediatamente disponible después del pago
Tanto en línea como en PDF
No estas atado a nada

Conoce al vendedor

Seller avatar
Los indicadores de reputación están sujetos a la cantidad de artículos vendidos por una tarifa y las reseñas que ha recibido por esos documentos. Hay tres niveles: Bronce, Plata y Oro. Cuanto mayor reputación, más podrás confiar en la calidad del trabajo del vendedor.
noavdn Katholieke Universiteit Leuven
Seguir Necesitas iniciar sesión para seguir a otros usuarios o asignaturas
Vendido
17
Miembro desde
1 año
Número de seguidores
0
Documentos
14
Última venta
22 horas hace

3.5

2 reseñas

5
0
4
1
3
1
2
0
1
0

Recientemente visto por ti

Por qué los estudiantes eligen Stuvia

Creado por compañeros estudiantes, verificado por reseñas

Calidad en la que puedes confiar: escrito por estudiantes que aprobaron y evaluado por otros que han usado estos resúmenes.

¿No estás satisfecho? Elige otro documento

¡No te preocupes! Puedes elegir directamente otro documento que se ajuste mejor a lo que buscas.

Paga como quieras, empieza a estudiar al instante

Sin suscripción, sin compromisos. Paga como estés acostumbrado con tarjeta de crédito y descarga tu documento PDF inmediatamente.

Student with book image

“Comprado, descargado y aprobado. Así de fácil puede ser.”

Alisha Student

Preguntas frecuentes