
Summary of Data Science in Biomedicine

Pages
41
Uploaded on
08-10-2022
Written in
2022/2023

Summary for the subject of Data Science in Biomedicine. Every subject compulsory for the test is summarized.

Summary Data Science in Biomedicine

Lecture 1: Introduction to Data Science in Biomedicine
Date: 26-09-2022

- Patient data collection -> Biomedical data: Electronic Health Record (EHR) and
Omics -> Personalised health data analysis: large-volume data, data
management and high-performance computing

- Translate large data sets to something you can understand and discuss

There are many types of (big) data available:
- Numerical
- Textual
- Categorical
- Imaging
- Clinical
- Demographic
- Psychosocial
- Lifestyle
- Environmental
- Genomic
- DNA
- Genes
- Proteins
- RNA
- SNPs
- ncRNA
- Splice variants
- RNA expression levels

Next Generation Sequencing (NGS)
- 1 Illumina NovaSeq 6000 run reads 6,000,000,000,000 bases (6,000,000,000 kb,
6,000,000 Mb, 6,000 Gb, 6 Tb) in ~44 hr (computers and software are necessary)
- Bioinformatics pipelines, e.g. for analyzing NGS data:
Reference mapping -> transcript assembly, comparison, merging -> detection of
differentially expressed genes/transcripts (understand the input and output of programs,
know your statistics, modify the graphical output).
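
The unit conversions in the NovaSeq bullet can be checked with a few lines of Python. This is a minimal sketch: the base count and run time come from the notes, while the constant and function names are chosen here for illustration.

```python
# Figures taken from the notes: one run reads 6e12 bases in about 44 hours.
BASES_PER_RUN = 6_000_000_000_000
RUN_HOURS = 44

# Number of bases per unit (kilo-, mega-, giga-, terabases).
UNITS = {"kb": 10**3, "Mb": 10**6, "Gb": 10**9, "Tb": 10**12}


def convert_bases(n_bases):
    """Express a base count in each unit of UNITS."""
    return {unit: n_bases / size for unit, size in UNITS.items()}


converted = convert_bases(BASES_PER_RUN)
bases_per_second = BASES_PER_RUN / (RUN_HOURS * 3600)

for unit, amount in converted.items():
    print(f"{amount:,.0f} {unit}")
print(f"about {bases_per_second:,.0f} bases per second")
```

The throughput, tens of millions of bases per second, is the reason the notes stress that computers and software are necessary.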

Using R or Python
R -> Retrieve data from a database, apply statistical analyses and visualize results
Python -> If the data is in the wrong format, write a small Python script to convert it
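
As an illustration of such a small conversion script, the sketch below assumes a hypothetical export that uses semicolons as separators and decimal commas, and rewrites it as tab-separated text with decimal points. The input format, the gene names and the function name `convert_table` are assumptions made for this example, not something from the course material.

```python
import csv
import io


def convert_table(text):
    """Convert semicolon-separated text with decimal commas to tab-separated text.

    Simplification: every comma in every cell is replaced by a point, so this
    only suits tables whose text cells contain no commas.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow([cell.strip().replace(",", ".") for cell in row])
    return out.getvalue()


# Hypothetical input, made up for illustration.
raw = "gene;expression\nGENE_A;1,5\nGENE_B;0,25\n"
converted = convert_table(raw)
print(converted)
```

The converted text can then be read by R or another analysis tool without manual editing.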

R vs Python
- R is dedicated to statistics
- R is very popular in research
- Many good libraries for R; Genomics, GWAS, Proteomics, Transcriptomics,
Metabolomics etc.
- R is used more as a statistical scripting tool than as a general-purpose programming language
- Python is easier and much better at handling text files and text-based data files
- R and Python are generally slower than C++
- Although loads of people use R, there will be a decline, so why still learn it?

R
- Open-source package for statistics
- Most popular statistics program in bioinformatics
- Also popular: pandas, the Python data analysis library
- MATLAB

R vs Excel
- In Excel you can load data by opening a file or by copy-pasting a data table
- You can edit this data by hand in Excel
- In R you do NOT edit the data by hand; changes are made through commands in a script

R Graphics
- A popular graphics library is ggplot2 (Python has ports such as plotnine)
- You can also log-transform the data with log(my_data)
- How to plot multiple classes: multiple_classes <- c("N", "O", "P") and
my_multi_subset <- subset(my_annotated_subset, classID %in% multiple_classes)
- c() combines values into a vector
- To show additional dimensions of the data, graphs are often arranged in a matrix of plots
- You have: Script, Data Sets, Text output and graphic output
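
For comparison, the R subset call in the notes has a direct counterpart in plain Python. This is a minimal sketch: the variable names mirror the R snippet, while the rows and their values are hypothetical and only there to make the example runnable.

```python
import math

# Hypothetical annotated data: one dict per sample (values made up for illustration).
my_annotated_subset = [
    {"sampleID": "s1", "classID": "N", "value": 1.0},
    {"sampleID": "s2", "classID": "A", "value": 4.0},
    {"sampleID": "s3", "classID": "P", "value": 16.0},
    {"sampleID": "s4", "classID": "O", "value": 2.0},
]

# Equivalent of: multiple_classes <- c("N", "O", "P")
multiple_classes = {"N", "O", "P"}

# Equivalent of: subset(my_annotated_subset, classID %in% multiple_classes)
my_multi_subset = [
    row for row in my_annotated_subset if row["classID"] in multiple_classes
]

# Equivalent of: log(my_data), the natural logarithm of each value
log_values = [math.log(row["value"]) for row in my_multi_subset]

print([row["sampleID"] for row in my_multi_subset])
print(log_values)
```

With pandas, which the notes mention, the same filter is usually written with the `isin` method on a column.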

Lecture 2: Data Science in Biomedicine, Basic Statistics 1
Date: 27-09-2022

What is statistics?
- Why do we need statistics?
- When is there a difference?
- p-value?
- impact of risk
- identify problems
- where does the data come from?
- which data and conclusions are trustworthy
- properties?
- Reliable p-value

Measurements
- experiments -> variation
- Variation between persons, equipment and time of day
- Define the experiments properly
- What is the main source of variation?
- After standardization, do we always get exactly the same value?
- measurements show variation!

P-value
- A p-value is the probability of obtaining a result at least as extreme as the
observed one, assuming the null hypothesis is true
- Commonly used cutoff: 0.05
- x-axis = set of possible results
- y-axis = probability density
- Same statistics, same p-value, different "impact of risk"
- You can calculate p-values, but a p-value never tells you whether a result is good or bad
- Especially in biomedical sciences this can be an ethical discussion: the risk of
treating/not treating patients, and until which age you should treat a patient
- 0.05 is a good starting point, but always evaluate this assumption
- p-value cutoff = A result below the cutoff means that, if your null hypothesis is
indeed correct and there is no difference between the groups, the result that you
obtained is very rare. You would expect to obtain such a result fewer than 1
in 20 times if you collected samples over and over again.
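
The "1 in 20" reading of the 0.05 cutoff can be illustrated with a small simulation: draw two groups from the same distribution many times and count how often the test still reports p < 0.05. This is a sketch under stated assumptions, not the course's method: it uses a two-sided z-test with a known standard deviation as a stand-in for the t-test (the Python standard library has no t-distribution), and the group size, number of repeats and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(sample_a, sample_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means, with sigma assumed known."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=4000, group_size=10, cutoff=0.05, seed=1):
    """Fraction of experiments with p < cutoff when the null hypothesis is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: there is no real difference.
        group_a = [rng.gauss(0, 1) for _ in range(group_size)]
        group_b = [rng.gauss(0, 1) for _ in range(group_size)]
        if two_sided_p(group_a, group_b) < cutoff:
            hits += 1
    return hits / n_experiments


rate = false_positive_rate()
print(f"fraction of 'significant' results without a real effect: {rate:.3f}")
```

The printed fraction should land close to 0.05, which is the sense in which such a result is expected fewer than 1 in 20 times when there is no difference.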

Generating data
- A statistician wants: a well-designed study, trustworthy data and many
replicates
- A statistician knows how to: analyze data and calculate p-values
- A statistician does not know: the detailed theoretical background, the impact of risk
(threshold) and potential pitfalls.

Some basic statistics in this course
- t-test
- linear regression
- permutation testing
- FDR testing
- Fisher’s exact test
- Chi-squared test
- Pearson vs Spearman correlation
- PCA

T-statistic
- Compares two data sets and tells you whether they differ from each other
- E.g. compare two groups, one treated with a drug and the other with a placebo
- Pearson (born 1857)
- Fisher (born 1890)
- Neyman (born 1894) (Random stats)
- Bayes (born 1702) (probability stats)
- A t-test is a statistical test that is used to compare the means of two
groups. It is often used in hypothesis testing to determine whether a process
or treatment actually has an effect on the population of interest, or whether
two groups are different from one another.

Types of t-test
1. Independent samples: compares the means of two independent groups
2. Paired samples: compares means from the same group (e.g. at different time
points)
3. One sample: tests the mean of a single group against a known mean (a standard or
reference)
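
The three variants listed above differ only in how the t statistic is built. The sketch below uses the textbook formulas; the function names are chosen here for illustration, and the independent-samples version assumes equal variances in both groups (Student's pooled form), which is an assumption the notes do not state. It returns only the t statistic; turning it into a p-value requires the t-distribution, for example from R or SciPy.

```python
from math import sqrt
from statistics import mean, stdev, variance


def one_sample_t(sample, reference_mean):
    """t statistic for one group tested against a known reference mean."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - reference_mean) / standard_error


def paired_t(before, after):
    """t statistic for paired measurements: a one-sample test on the differences."""
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0.0)


def independent_t(group_a, group_b):
    """t statistic for two independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Small made-up numbers, only to show the calls.
print(one_sample_t([1, 2, 3, 4, 5], 3))
print(paired_t([1, 2, 3], [2, 4, 6]))
print(independent_t([1, 2, 3], [4, 5, 6]))
```

Writing the paired test as a one-sample test on the differences makes explicit why pairing matters: variation between subjects drops out and only the within-subject change is tested.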

Paired data
- A group of 8 mice, measured before and after albumin treatment
- The null hypothesis is that the mean pairwise difference between the two
measurements is zero (H0: μd = 0)
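
For paired data such as the mouse example, the test statistic is computed from the per-animal differences. The block below is the standard paired t-test definition, given as a sketch: d_i is the difference (after minus before) for mouse i, and the symbols for the mean difference, its standard deviation and the sample size are notation introduced here, not taken from the notes.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i ,
\qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2} ,
\qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}
```

With n = 8 mice, t is compared against a t-distribution with n - 1 = 7 degrees of freedom; under H0: μd = 0 the expected value of the mean difference is zero.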
willemdevries99 Rijksuniversiteit Groningen