Essay

Bioinformatics essay assignment

Pages
9
Grade
A+
Uploaded on
19-01-2025
Written in
2021/2022

Bioinformatics assignment using data interpretation and graphical analysis on the topic. Submitted for biomedical science, first class grade for final year assignment.











The use of Bioinformatic Tools to Assess the Structure and Function of an
Unidentified Coding and Non-Coding Sequence
19002410


Abstract


Nucleotide sequences 4a and 4b were analysed using bioinformatic tools, beginning with NCBI
Nucleotide BLAST to establish their identity. The aim was to use a range of bioinformatic tools to
characterise each sequence and its function in regulating cellular processes, or its role in
pathology. Outputs were interpreted according to the purpose of each tool, providing
information on the structural features of each sequence or its role in cell signalling. Sequence 4a
was identified as the long non-coding RNA HOTAIR, which is upregulated in several diseases.
Sequence 4b was identified as the coding sequence BECN1, which is involved in autophagy and is
closely associated with other protein subunits of the PI3K complex. Further databases, namely
STRING, HuRI, AlphaFold, Genevisible and DisGeNET, were then used to characterise each
sequence and establish its function. Outputs were interpreted in terms of protein-protein
interactions, splice variants, or the level of expression of each sequence across a range of
tissues. The level of gene expression in each tissue was associated with normal or abnormal
cellular function, which is of value to medical research. The information provided by
each database informed subsequent investigation into the role of each sequence within
the Homo sapiens genome.


Introduction


Bioinformatics has allowed large volumes of biological and statistical information to be
processed, stored, and collated in databases. Querying these complex datasets can provide a wide
array of information with which to determine the identity and characteristics of sequences. Of
interest here was the role of non-coding and coding nucleotide sequences, which were analysed in
terms of their homology to a range of genes; results differed according to splice variants and
interactions with structural proteins. Approximately 98% of genetic information can be categorised as
non-coding, demonstrating that a great deal of information can be explored and deduced from sequences
that do not code for proteins (Perenthaler et al., 2019). A variety of bioinformatic tools can be
used together to gain further insight into genomics, whereby sequences are analysed to reveal their
role in disease presentation, cellular function or signalling pathways. Notably, the Human Genome
Project has enabled the field of bioinformatics to gain traction within the scientific community as
an efficient way to store information, and has established a valuable relationship with the related
disciplines of mathematics and computer science (Hood and Rowen, 2013). The availability of a range
of bioinformatic tools allows a more objective view of the role of a gene within the body, and a
clearer understanding of its functional significance. This is beneficial because deductions are
drawn from a greater number of resources with broad coverage across several databases. The aims of
this project were to gain an understanding of the function of two sequences within the human
genome, and to describe the key features that distinguish them from other elements of the
transcriptome. Bioinformatic tools were used as the basis of research into each sequence, with each
search result leading to further investigation using other databases, to gain insight into the
significance of non-coding sequence 4a and coding sequence 4b.




Methods


The bioinformatic tools used to obtain structural and functional information on sequences 4a
and 4b during this investigation are summarised in Table 1.

Table 1. Bioinformatic tools used, and their purpose in the characterisation of sequences 4a and 4b. A
range of bioinformatic tools was used to determine the structure and function of each sequence, starting with NCBI Nucleotide
BLAST. Each sequence was identified as either coding or non-coding and named according to its homology to
predicted sequences stored within the chosen database. Outputs from each tool informed subsequent investigation using
other databases to explore the impact of the sequences in more detail.

Name of Bioinformatic Tool    Purpose of Tool
NCBI Nucleotide BLAST         To confirm the identity of each sequence by comparison with several predicted and experimentally confirmed outputs
STRING                        To assess the interaction of sequence 4b with other co-regulatory proteins within protein complexes
HuRI                          To confirm the coverage of STRING and the objectivity of its outputs for sequence 4b by comparing the two databases
AlphaFold                     To visualise the structural features of sequence 4b in relation to its function
Genevisible                   To compare the level of expression of sequence 4a across an array of tissue types in healthy samples and in cancer presentation
DisGeNET                      To compare the role of splice variants within exonic and intronic regions of sequence 4a in disease presentation, and to determine the chromosome on which it is located




Results


Non-coding sequence


A total of 11 sequences were returned by NCBI Nucleotide BLAST, confirming the identity of
sequence 4a as the non-coding HOX antisense intergenic RNA (HOTAIR). HOTAIR displayed
100% sequence identity across the full query length of 2158 nucleotides. The search was restricted
to the Homo sapiens genome, and only highly similar sequences were selected (Figure 1).




Figure 1. Output from NCBI Nucleotide BLAST for nucleotide sequence 4a (NCBI, 2021). Sequences similar to the
query are listed, with results restricted to the Homo sapiens genome.
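
The refinement described above (restricting hits to Homo sapiens and keeping only highly similar sequences) can be illustrated with a minimal Python sketch. This is not the procedure used in the investigation, which relied on the NCBI web interface. The field names, the 95% identity and 90% coverage thresholds, and the example hits are all assumptions made for illustration; only the query length of 2158 nucleotides comes from the text.

```python
from dataclasses import dataclass

QUERY_LENGTH = 2158  # length of sequence 4a, as reported in the text


@dataclass
class BlastHit:
    """One row of a tabular BLAST result (field names are illustrative)."""
    subject_id: str
    organism: str
    percent_identity: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def query_coverage(hit, query_length):
    """Percentage of the query spanned by the aligned region."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / query_length


def filter_hits(hits, query_length, organism="Homo sapiens",
                min_identity=95.0, min_coverage=90.0):
    """Keep hits from one organism above the (assumed) thresholds."""
    kept = [
        h for h in hits
        if h.organism == organism
        and h.percent_identity >= min_identity
        and query_coverage(h, query_length) >= min_coverage
    ]
    # Best hits first: highest identity, then highest coverage.
    return sorted(
        kept,
        key=lambda h: (h.percent_identity, query_coverage(h, query_length)),
        reverse=True,
    )


# Made-up hits for illustration only; not the actual BLAST output.
example_hits = [
    BlastHit("hypothetical_hit_1", "Homo sapiens", 100.0, 1, 2158),
    BlastHit("hypothetical_hit_2", "Homo sapiens", 99.2, 1, 2000),
    BlastHit("hypothetical_hit_3", "Pan troglodytes", 98.5, 1, 2158),
    BlastHit("hypothetical_hit_4", "Homo sapiens", 88.0, 300, 900),
]

best = filter_hits(example_hits, QUERY_LENGTH)
for hit in best:
    print(hit.subject_id, hit.percent_identity,
          round(query_coverage(hit, QUERY_LENGTH), 1))
```

In this sketch a hit is retained only if it passes all three conditions, so a full-length match from another species and a short, lower-identity human match are both discarded. The thresholds would need to be chosen to suit the question being asked.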


Following the identification of sequence 4a, the level of HOTAIR expression
in healthy tissue was examined using data from Genevisible (Figure 2).
