Summary Solution practical 3 to 5 - data mining

Pages
117
Uploaded on
13-09-2024
Written in
2024/2025

Complete solution (and assignment) of practicals 3 to 5 in English, plus a summary of the accompanying slides (ppt). Includes code and screenshots of the solutions in R. Everything is easy to find and clearly divided per practical. This part is covered in the final exam.

PRACTICAL 3
SUMMARY SLIDES

Automation, add-on packages and reshaping

AUTOMATION OF REPETITIVE ANALYSES

Dataset

• Genetic analysis of age-related hearing impairment

• Phenotype: Z-score

- Standardized measure of hearing quality

- Lower Z-score = better hearing

• Association between Z-score and genotype

- SNP genotypes: aa, ab, bb

• ANOVA: SNP genotype (GT) treated as categorical

• Regression: SNP GT coded as 0, 1, 2

• Goal: find the SNPs associated with the phenotype



Two ways to analyse this:

1) Consider the genotype as a categorical variable and do a one-way ANOVA

2) Consider the genotype as a number: count the number of rare alleles (aa = 0, ab = 1, bb = 2; bb has 2 rare alleles)
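A minimal sketch of both codings in R. The vectors `Y` and `gt` are made-up toy values, not the practical's data. Option 1 turns the genotype into a factor, so `lm()` plus `anova()` performs a one-way ANOVA with 2 degrees of freedom for the three genotype groups; option 2 counts the rare b alleles, which gives a simple linear regression with 1 degree of freedom.

```r
# Hypothetical toy data: Z-scores (Y) and SNP genotypes (gt) for 9 individuals
Y  <- c(5.71, -0.93, 2.58, 1.10, 0.35, 3.20, -0.50, 2.10, 4.00)
gt <- c("aa", "aa", "ab", "ab", "bb", "bb", "aa", "ab", "bb")

# 1) Genotype as a categorical variable -> one-way ANOVA
gt_factor <- factor(gt, levels = c("aa", "ab", "bb"))
fit_cat   <- lm(Y ~ gt_factor)
anova(fit_cat)   # genotype term has 2 degrees of freedom

# 2) Genotype as the number of rare (b) alleles -> simple linear regression
gt_count <- unname(c(aa = 0, ab = 1, bb = 2)[gt])
fit_num  <- lm(Y ~ gt_count)
anova(fit_num)   # genotype term has 1 degree of freedom

# p-value of the genotype effect (row 1, column 5 of the ANOVA table)
anova(fit_cat)[1, 5]
anova(fit_num)[1, 5]
```

The count coding assumes each extra b allele shifts the Z-score by the same amount (an additive effect); the factor coding makes no such assumption but spends an extra degree of freedom.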



Repetitive analysis

• Regression:

- one dependent numeric variable (Y)

- several independent (X) variables

• Regress Y on each X variable separately

- → several simple linear regressions/ANOVAs





Y      Xvar1  Xvar2  Xvar3  Xvar4
5.71   aa     ab     aa     ab
-0.93  aa     ab     ab     aa
2.58   ab     bb     bb     aa


Each X column holds the genotype of one SNP

In the regression, Y (the numeric phenotype) is the dependent variable (outcome)

We want the p-values, to know which associations are significant

Don't run every analysis (ANOVA) individually → there are smarter ways to do this in R

R is well suited for repetitive analyses
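As an illustration, the three rows of the example table could be stored in R as below (the data frame name `geno` is hypothetical). `Y` is the phenotype vector and `allXvars` holds one SNP per column, so `allXvars[, i]` is the genotype of the i-th SNP.

```r
# The three example rows from the table (hypothetical object name: geno)
geno <- data.frame(
  Y     = c(5.71, -0.93, 2.58),
  Xvar1 = c("aa", "aa", "ab"),
  Xvar2 = c("ab", "ab", "bb"),
  Xvar3 = c("aa", "ab", "bb"),
  Xvar4 = c("ab", "aa", "aa"),
  stringsAsFactors = FALSE
)

Y        <- geno$Y        # dependent variable: the numeric phenotype
allXvars <- geno[, -1]    # all SNP genotypes, one column per SNP

ncol(allXvars)   # number of SNPs to test
allXvars[, 2]    # genotypes of the second SNP
```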




For the first X variable
myModel<- lm(Y ~ Xvar1)

For the second X variable
myModel<- lm(Y ~ Xvar2)

For the third X variable
myModel<- lm(Y ~ Xvar3)




• Put all X variables as the columns of one matrix or data frame (allXvars); 1 analysis is then:

• myModel<- lm(Y ~ allXvars[,1])


For the first X variable
myModel<- lm(Y ~ allXvars[,1])

For the second X variable
myModel<- lm(Y ~ allXvars[,2])

For the third X variable
myModel<- lm(Y ~ allXvars[,3])



Could also run
i<-1
myModel<- lm(Y ~ allXvars[ ,i])

i<-2
myModel<- lm(Y ~ allXvars[ ,i])

i<-3
myModel<- lm(Y ~ allXvars[ ,i])

Then the formula is the same every time; only i changes
We can ask R to let i run through all the values

For-loop

• Let i run through all values from 1 to n, and execute all commands within the {curly braces} for each value

for(i in 1:3) {

myModel<- lm(Y ~ allXvars[ ,i])

}



In the round brackets, give the values that i must run through (here 1:3); in the curly braces, put the commands to repeat

The commands are executed for i = 1, i = 2 and i = 3
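A runnable sketch of this loop on simulated data (the simulated `Y` and `allXvars` are assumptions made only so the code runs). It also shows a limitation of the bare loop: `myModel` is overwritten in every iteration, so after the loop only the model for the last X variable is left.

```r
set.seed(1)  # reproducible simulated data
n <- 30
Y <- rnorm(n)                                   # simulated phenotype
allXvars <- data.frame(                         # 3 simulated SNPs
  Xvar1 = sample(c("aa", "ab", "bb"), n, replace = TRUE),
  Xvar2 = sample(c("aa", "ab", "bb"), n, replace = TRUE),
  Xvar3 = sample(c("aa", "ab", "bb"), n, replace = TRUE)
)

for (i in 1:3) {
  myModel <- lm(Y ~ allXvars[, i])   # same formula, only i changes
}

i              # the loop variable keeps its last value
coef(myModel)  # coefficients of the model for Xvar3 only
```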



Parameter estimates in a loop


• Suppose in each step you estimate a parameter or a p-value
• First create an empty vector to store the output of each step
p.value<-rep(NA,3)
for(i in 1:3) {
myModel<-lm(Y ~ allXvars[ ,i])
p.value[i]<-anova(myModel)[1,5]   # row 1, column 5 = p-value of the X variable
}


We need the p-values

We have to extract the p-value from each model, without overwriting the p-value from the previous step

So we extract it and store it somewhere

First create an empty vector

rep() = repeat: rep(NA,3) repeats NA three times

Before starting the loop, create an empty vector with 3 slots

In the first iteration, the p-value is assigned to the first slot of the vector (p.value[1]); in the second iteration to the second slot, and so on

When R has finished the loop, the vector is filled and contains the p-values of all iterations
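A self-contained version of the p-value loop, again on simulated data (an assumption so the code runs). Here `rep(NA, ncol(allXvars))` replaces the hard-coded 3, so the loop adapts to the number of X variables; row 1, column 5 of the `anova()` table is the p-value (`Pr(>F)`) of the X variable.

```r
set.seed(2)
n <- 40
Y <- rnorm(n)                                   # simulated phenotype
allXvars <- data.frame(                         # 3 simulated SNPs
  Xvar1 = sample(c("aa", "ab", "bb"), n, replace = TRUE),
  Xvar2 = sample(c("aa", "ab", "bb"), n, replace = TRUE),
  Xvar3 = sample(c("aa", "ab", "bb"), n, replace = TRUE)
)

n.vars  <- ncol(allXvars)
p.value <- rep(NA, n.vars)            # empty vector: one slot per X variable

for (i in 1:n.vars) {
  myModel    <- lm(Y ~ allXvars[, i])
  p.value[i] <- anova(myModel)[1, 5]  # store the p-value in slot i
}

names(p.value) <- colnames(allXvars)  # label each p-value with its SNP
p.value
which(p.value < 0.05)                 # SNPs significant at the 5% level
```

With many SNPs, a multiple-testing correction such as `p.adjust()` would normally be applied to this vector before calling anything significant.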



Automation with a new function
• A piece of script that has to be carried out multiple times:
Data<- read.table("input_1.txt")
p.value<-t.test(…)
write.table(p.value,file="output_1.txt")

• Suppose there are 3 similar input files

Data<- read.table("input_1.txt")
p.value<-t.test(…)
write.table(p.value,file="output_1.txt")
Data<- read.table("input_2.txt")
p.value<-t.test(…)
write.table(p.value,file="output_2.txt")
Data<- read.table("input_3.txt")
p.value<-t.test(…)
write.table(p.value,file="output_3.txt")




• Wrap the piece of code into a new function
- Give the function a name
- List the necessary arguments
- Use the argument names in the code

doMyAnalysis<- function(inputfile,outputfile) {
Data<- read.table(inputfile)
p.value<-t.test(…)
write.table(p.value,file=outputfile)
}



To run the new function


• First initialize (define) the new function
- Select the function code and run it
- Do this again each time you restart R
- This gives no output
• Then run it on each file:
doMyAnalysis("input_1.txt","output_1.txt")
doMyAnalysis("input_2.txt","output_2.txt")
doMyAnalysis("input_3.txt","output_3.txt")
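A runnable sketch of `doMyAnalysis()`. The notes leave the test as `t.test(…)`, so everything about the data here is an assumption: each hypothetical input file has a header line, a numeric column `value` and a two-level column `group`. The files are first written to a temporary directory so the example runs on its own.

```r
# Wrap the repeated steps in a function: read input, test, write output
doMyAnalysis <- function(inputfile, outputfile) {
  Data <- read.table(inputfile, header = TRUE)
  # Assumed test: compare 'value' between the two levels of 'group'
  p.value <- t.test(value ~ group, data = Data)$p.value
  write.table(p.value, file = outputfile)
}

# Create 3 hypothetical input files so the example is self-contained
set.seed(3)
tmp.dir <- tempdir()
for (k in 1:3) {
  fake <- data.frame(value = rnorm(20),
                     group = rep(c("A", "B"), each = 10))
  write.table(fake, file = file.path(tmp.dir, paste0("input_", k, ".txt")),
              row.names = FALSE)
}

# Run the same analysis on each file
doMyAnalysis(file.path(tmp.dir, "input_1.txt"), file.path(tmp.dir, "output_1.txt"))
doMyAnalysis(file.path(tmp.dir, "input_2.txt"), file.path(tmp.dir, "output_2.txt"))
doMyAnalysis(file.path(tmp.dir, "input_3.txt"), file.path(tmp.dir, "output_3.txt"))
```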




Further automation


• Using a list object
- A list consists of other objects (its elements)
- Individual elements are accessed with double square brackets:
list.object[[i]]
• Here:
- Put the input files and/or output files in a list
- 1 list object, containing the 3 input data frames
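A small sketch of a list holding data frames (the three toy data frames are hypothetical). Double square brackets return the element itself, whereas single brackets return a shorter list.

```r
# A list with 3 elements, each a (hypothetical) input data frame
list.object <- list(
  data.frame(value = c(1.2, 3.4)),
  data.frame(value = c(5.6, 7.8, 9.0)),
  data.frame(value = 2.5)
)

length(list.object)        # number of elements: 3
list.object[[2]]           # the 2nd element itself: a data frame
class(list.object[[2]])    # "data.frame"
class(list.object[2])      # "list": single brackets give a sub-list
nrow(list.object[[2]])     # 3 rows
```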




Combine list and for-loop


• First create an empty list
Mylist<-vector("list",n.elements)

• Read in the 3 input files
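A minimal sketch of how the empty list and a for-loop could be combined to read and analyse the 3 input files. All file names, the column names `value` and `group`, and the t-test are assumptions; the files are created in a temporary directory so the code runs on its own. Each file is read into one element of the list, and a second loop analyses every element.

```r
set.seed(4)
tmp.dir    <- tempdir()
n.elements <- 3

# Hypothetical input files input_1.txt ... input_3.txt (created here so the example runs)
inputfiles <- file.path(tmp.dir, paste0("input_", 1:n.elements, ".txt"))
for (k in 1:n.elements) {
  fake <- data.frame(value = rnorm(20), group = rep(c("A", "B"), each = 10))
  write.table(fake, file = inputfiles[k], row.names = FALSE)
}

# First create an empty list with one slot per input file
Mylist <- vector("list", n.elements)

# Read the 3 input files into the list
for (i in 1:n.elements) {
  Mylist[[i]] <- read.table(inputfiles[i], header = TRUE)
}

# Analyse every element and store the p-values (assumed t-test)
p.value <- rep(NA, n.elements)
for (i in 1:n.elements) {
  p.value[i] <- t.test(value ~ group, data = Mylist[[i]])$p.value
}
p.value
```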
