Summary

Summary Statistical computing TUe

Uploaded on

24-06-2020

Written in

2019/2020

This is a summary of the Statistical Computing (JBM050) course at the Eindhoven University of Technology. It is based on the lectures and the examples that we covered in class. Our teacher was K. Deun. It contains my notes on all twelve lectures, with some screenshots of the slides.


Written for

Institution: Technische Universiteit Eindhoven (TUE)
Programme: Data Science
Course: Statistical Computing (JBM050)


Document information

Uploaded on: 24 June 2020
Number of pages: 25
Written in: 2019/2020
Type: Summary


Statistical computing: JBM050 Quartile 4
2019-2020

Course organization
Books:
- Monte Carlo simulation and resampling methods for social science (lecture 1-4)
- An introduction to statistical learning (rest)

Lecture 1: Introduction

Learning from your data

Mathematical models for experiments can be studied in two ways: with analytical solutions and with simulations. For the simulations, we
generate data with our computer and study the resulting values.

The data generating model in random experiments
Bernoulli distribution
Use a Bernoulli trial when you want to know the probability of, for example, heads in a coin toss. To
get a good approximation of that probability, you need a very large sample.

A coin toss has two possible outcomes, so it is a binary variable (values 1 and 0). The outcome 1
occurs with a certain probability π; X is a binary random variable. There is only 1 toss.
P ( X=1 )=π
P ( X=0 )=1−π

Binary random variables follow a Bernoulli distribution: X ~ Bern(π)
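A minimal sketch of the coin toss as a simulation, assuming a fair coin (π = 0.5) and 10,000 tosses; both values and the variable names are illustrative choices, not taken from the lecture:

```r
set.seed(1)                      # fix the seed so the run is reproducible
pi_true <- 0.5                   # assumed probability of heads
# size = 1 makes each draw a single Bernoulli trial (1 = heads, 0 = tails)
tosses <- rbinom(n = 10000, size = 1, prob = pi_true)
pi_hat <- mean(tosses)           # proportion of heads approximates pi
print(pi_hat)
```

With only a handful of tosses the proportion can be far from π, which is why a large sample is needed.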

Binomial distribution
It is an extension of the Bernoulli distribution, only now you toss several times. If we toss the coin five
times, our number of trials, n, is 5. The outcome can take the values 0, 1, …, n. The number of successes
is represented by k.

$$P(X = k) = \frac{n!}{k!\,(n-k)!}\, \pi^{k} (1-\pi)^{n-k}$$

The binomial coefficient $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ counts the number of ways k successes can occur in
a sequence of n trials. The distribution is written as X ~ Bin(n, π)
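As a quick check of the formula, the sketch below evaluates P(X = 4) for n = 5 tosses of a fair coin in three ways: with factorials, with R's `choose`, and with the built-in `dbinom`. The values n = 5, k = 4 and π = 0.5 are assumed for illustration.

```r
n <- 5          # number of trials (assumed)
k <- 4          # number of successes (assumed)
pi_true <- 0.5  # success probability (assumed)

# Formula written out with factorials
p_formula <- factorial(n) / (factorial(k) * factorial(n - k)) *
  pi_true^k * (1 - pi_true)^(n - k)

# Same formula using the binomial coefficient
p_choose <- choose(n, k) * pi_true^k * (1 - pi_true)^(n - k)

# Built-in binomial probability mass function
p_dbinom <- dbinom(k, size = n, prob = pi_true)

print(c(p_formula, p_choose, p_dbinom))  # all three equal 0.15625
```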

Hypergeometric distribution
The lady tasting tea: There are 8 cups of tea. In 4 of them milk was poured first, and in the other 4 tea
was poured first. She said that she could tell which was poured first.
She got 6 of the 8 cups right. But what is the probability of getting 6 or more of the 8 correct if you are just
guessing?

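$$P(\text{6 or more} \mid \text{guessing}) = \frac{\text{nr of sequences with 6 or more correct}}{\text{total nr of possible sequences}} = \frac{\text{nr}(6 \mid \text{guessing}) + \text{nr}(8 \mid \text{guessing})}{n}$$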

This is not binomially distributed, because the outcomes for the teacups depend on each other.
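The count can be sketched in R under the usual assumption for this experiment: the lady knows that 4 cups are milk-first and picks exactly 4 cups as milk-first. Getting 6 cups right then means 3 of her 4 picks are correct, and 8 right means all 4 are. The hypergeometric function `dhyper` gives the same answer as counting by hand.

```r
# Total number of ways to pick 4 cups out of 8
total <- choose(8, 4)                      # 70

# 6 cups right: 3 of the 4 milk-first cups picked, plus 1 tea-first cup
n_six <- choose(4, 3) * choose(4, 1)       # 16
# 8 cups right: all 4 milk-first cups picked
n_eight <- choose(4, 4) * choose(4, 0)     # 1

p_count <- (n_six + n_eight) / total       # 17/70, about 0.243

# Same probability from the hypergeometric distribution:
# x = correct milk-first picks, m = milk-first cups, n = tea-first cups, k = cups picked
p_hyper <- sum(dhyper(3:4, m = 4, n = 4, k = 4))

print(c(p_count, p_hyper))
```

So guessing alone gives 6 or more correct roughly one time in four.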

Other random experiments
Monty Hall paradox:
There are three doors; behind two of them are goats, and behind one is a car. The player first chooses a door.
Then one of the other doors is opened and has a goat behind it. The player is now asked whether they want to switch.
Which is better? Switching. The probability of winning after staying is 1/3 and for switching it is 2/3.
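A small simulation can make the 1/3 versus 2/3 result plausible. The sketch assumes the standard rules: the player picks a door first, and the host then always opens a goat door among the other two. Under those rules staying wins exactly when the first pick was the car, and switching wins in all other games. The number of games is an arbitrary choice.

```r
set.seed(42)
n_games <- 10000                                  # assumed number of simulated games

car  <- sample(1:3, size = n_games, replace = TRUE)  # door hiding the car
pick <- sample(1:3, size = n_games, replace = TRUE)  # player's first choice

# Staying wins only if the first pick was the car.
win_stay <- mean(pick == car)
# The host removes the other goat door, so switching wins whenever the first pick was a goat.
win_switch <- mean(pick != car)

print(c(stay = win_stay, switch = win_switch))
```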

Random experiments with R
Generating data from discrete random variables (rolling dice or tossing coin)
- The sample R function
- Bernoulli, binomial, hypergeometric, multinomial, Poisson distribution

The R function sample:
[ marbles = c("red", "blue", "yellow") ]
[ sample(x = marbles, size = 2, replace = FALSE) ]

Bernoulli and Binomial distribution in R
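X ~ Bin(n, π) can be simulated with [ rbinom(n=5, size=1, prob=0.5) ]; this gives 5 Bernoulli tosses, and their sum is the number of successes.
To estimate the probability of 4 successes, run this line repeatedly and take the proportion of runs in which the number of successes is 4:
$\hat{P}(X = 4) = \hat{\pi}$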
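One way to carry this out, sketched with an assumed 10,000 repetitions (the repetition count and variable names are illustrative): each repetition tosses the coin 5 times and records the number of successes, and the proportion of repetitions with exactly 4 successes estimates P(X = 4).

```r
set.seed(123)
n_rep <- 10000   # assumed number of repetitions

# Each repetition: 5 Bernoulli tosses, summed to get the number of successes
successes <- replicate(n_rep, sum(rbinom(n = 5, size = 1, prob = 0.5)))

p_hat   <- mean(successes == 4)              # simulated estimate of P(X = 4)
p_exact <- dbinom(4, size = 5, prob = 0.5)   # exact binomial probability for comparison

print(c(estimate = p_hat, exact = p_exact))
```

The same draws can be obtained in one call with `rbinom(n = n_rep, size = 5, prob = 0.5)`, which returns the number of successes directly.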

Pseudo code
Steps to create your own pseudo code:
1. Give the function a name and specify the arguments
2. Define the input and output
3. Show underlying procedure
a. Programming statements
b. Indentation to add clear structure
4. Make sure it returns the correct output
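As an illustration of the four steps, here is a small R function written from such a plan. The function name `roll_dice` and its arguments are hypothetical and only serve as an example.

```r
# Step 1: name the function and specify its arguments
roll_dice <- function(n_rolls, sides = 6) {
  # Step 2: input is the number of rolls and the number of sides;
  #         output is the proportion of rolls that landed on each face
  # Step 3: procedure - draw faces with replacement, then tabulate them
  rolls <- sample(x = 1:sides, size = n_rolls, replace = TRUE)
  counts <- table(factor(rolls, levels = 1:sides))  # keep faces that never occurred
  proportions <- as.numeric(counts) / n_rolls
  names(proportions) <- 1:sides
  # Step 4: return the output
  proportions
}

set.seed(7)
props <- roll_dice(n_rolls = 6000)
print(props)
```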

Lecture 2: Basics: Sampling

A variable is normally distributed with a mean μ and a variance σ², written as $N(\mu, \sigma^2)$.
The mean and the variance are the model parameters.

In statistics, a (point) estimator is an approximation of a population parameter that uses observed data.
It is also called a statistic. The rule is the estimator, and the value it produces is the estimate.

Determine the population model and its values: (possibly a test question)
For example, coin tossing experiment:
- Population model: X ~ Bern(π)
- Population parameter of interest: π
- Estimator/statistic: $\hat{\pi} = \frac{k}{n} = \frac{\text{nr. heads up}}{\text{total nr. of tosses}}$
- Estimate: $\hat{\pi} = 0.53$

The sampling distribution
It is the probability distribution of the sample statistic for a given sample size n. It describes how the
sample statistic varies over different samples of size n. The statistic is considered to be a random
variable.
Draw random samples: [ rnorm(n=5, mean=x, sd=y) ]
We know that the sample average is distributed as: $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$
The sampling distribution of the average of a sample of size n is itself a normal distribution. The
standard deviation of a sample statistic is also called the standard error of that statistic.
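The sampling distribution of the average can be approximated by drawing many samples and computing the average of each. The sketch below assumes μ = 10, σ = 2, n = 5 and 10,000 samples; these numbers are illustrative. The standard deviation of the simulated averages should be close to the theoretical standard error σ/√n.

```r
set.seed(2020)
mu    <- 10      # assumed population mean
sigma <- 2       # assumed population standard deviation
n     <- 5       # sample size
S     <- 10000   # assumed number of samples

# One sample average per simulated sample
means <- replicate(S, mean(rnorm(n = n, mean = mu, sd = sigma)))

se_sim    <- sd(means)          # simulated standard error of the average
se_theory <- sigma / sqrt(n)    # theoretical standard error

print(c(mean_of_means = mean(means), se_sim = se_sim, se_theory = se_theory))
```

A histogram of the simulated averages, `hist(means)`, shows the bell shape of the sampling distribution.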

Mean, bias, MSE, RE

Data is used to estimate the population parameter θ, which yields the estimator $\hat{\theta}$.
The bias is $E(\hat{\theta}) - \theta$.
The variance is $E\big( (E(\hat{\theta}) - \hat{\theta})^2 \big)$.
The standard deviation of the estimator is called the standard error, which is $\sqrt{\text{variance}}$.

The MSE is $\text{bias}^2 + \text{variance}$.
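Why the MSE splits into these two parts can be seen from a short derivation. It assumes the standard definition of the MSE as the expected squared distance between the estimator and the parameter; the bias enters only as a square, so its sign convention does not matter here.

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= E\big[\big(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\big)^2\big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,\big(E(\hat{\theta}) - \theta\big)\,E\big[\hat{\theta} - E(\hat{\theta})\big]
     + \big(E(\hat{\theta}) - \theta\big)^2 \\
  &= \mathrm{variance} + 0 + \mathrm{bias}^2
\end{align*}
```

The middle term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$.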

Lecture 3: Monte Carlo Simulation
Sampling distribution of the difference in means: Assume that two samples are drawn independently.
The analytical result is that, under the assumptions, the sampling distribution of the difference in sample
means $\bar{x}_2 - \bar{x}_1$ is a t-distribution with $n_1 + n_2 - 2$ degrees of freedom, centered at $\mu_2 - \mu_1$, with a standard
error equal to

$$SE(\bar{x}_2 - \bar{x}_1) = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{with} \quad S_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}.$$

The t-test was invented to test the difference between two independent samples:

$$t = \frac{\bar{x}_2 - \bar{x}_1}{SE(\bar{x}_2 - \bar{x}_1)} \sim t_{n_1 + n_2 - 2}$$
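The t statistic can be computed by hand from the pooled standard deviation and compared with R's built-in `t.test`. The data below are simulated with assumed means, standard deviation and sample sizes, purely for illustration; `var.equal = TRUE` asks `t.test` for the pooled-variance version.

```r
set.seed(11)
x1 <- rnorm(12, mean = 5, sd = 2)   # sample 1 (assumed parameters)
x2 <- rnorm(15, mean = 6, sd = 2)   # sample 2 (assumed parameters)
n1 <- length(x1)
n2 <- length(x2)

# Pooled standard deviation and standard error of the difference in means
s_p <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
se  <- s_p * sqrt(1 / n1 + 1 / n2)

t_manual  <- (mean(x2) - mean(x1)) / se
t_builtin <- t.test(x2, x1, var.equal = TRUE)   # pooled-variance two-sample t-test

print(c(manual = t_manual, builtin = unname(t_builtin$statistic)))
print(t_builtin$parameter)   # degrees of freedom: n1 + n2 - 2
```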

Central limit theorem:
For large n, the sample mean of X, where X comes from some probability distribution with
mean μ and variance σ² (not necessarily a normal distribution), is approximately normal with mean μ and variance $\frac{\sigma^2}{n}$:

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
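The theorem can be checked by simulation with a clearly non-normal distribution. The sketch below assumes an exponential distribution with rate 1 (mean 1, variance 1), samples of size n = 50 and 10,000 repetitions; all of these are illustrative choices.

```r
set.seed(5)
n    <- 50       # sample size (assumed)
S    <- 10000    # number of simulated samples (assumed)
rate <- 1        # Exp(1) has mean 1 and variance 1

# Sample mean of each simulated exponential sample
means <- replicate(S, mean(rexp(n, rate = rate)))

print(mean(means))   # should be close to mu = 1
print(var(means))    # should be close to sigma^2 / n = 0.02
```

Plotting `hist(means)` shows a roughly symmetric bell shape even though the exponential distribution itself is strongly skewed.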

Properties of statistical methods must be established so that the methods may be used with confidence.
Exact derivations are rarely possible. Analytical results may require assumptions. If these assumptions
are violated, the results may not be valid.

With MC simulation we can find out how a method behaves when its assumptions are violated. MC simulation can be used for
1) QC for estimators and 2) QC for hypothesis testing, all in the absence of analytical results.

MC Simulation for properties of estimators:
An estimator has a true sampling distribution under a particular set of conditions. When we cannot derive
this true sampling distribution, we can approximate it with MC simulation.

How to approximate:
1. Generate S independent data sets of given sample size n.
2. Compute the numerical value of the estimator for each data set.
3. If S is large enough, the summary statistics should be good approximations to the true
sampling properties of the estimator under the conditions of interest.

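Compute for each of the estimators $\hat{\theta}^{(k)}$:
- Average: $\bar{\hat{\theta}}^{(k)} = \frac{1}{S} \sum_{s=1}^{S} \hat{\theta}_s^{(k)}$
- Bias: $\bar{\hat{\theta}}^{(k)} - \theta$
- Variance: $\frac{1}{S-1} \sum_{s=1}^{S} \big( \hat{\theta}_s^{(k)} - \bar{\hat{\theta}}^{(k)} \big)^2$
- MSE: $\frac{1}{S} \sum_{s=1}^{S} \big( \hat{\theta}_s^{(k)} - \theta \big)^2$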

Relative efficiency: $RE = \frac{MSE(\hat{\theta}^{(1)})}{MSE(\hat{\theta}^{(2)})}$
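The whole procedure can be sketched for two estimators of the center of a normal distribution: the sample mean (estimator 1) and the sample median (estimator 2). The true value θ = 0, the sample size n = 25, S = 5000 data sets and the helper `mc_summary` are all assumptions made for this example.

```r
set.seed(2024)
S     <- 5000   # number of simulated data sets (assumed)
n     <- 25     # sample size per data set (assumed)
theta <- 0      # true population mean (assumed)

est_mean   <- numeric(S)
est_median <- numeric(S)
for (s in 1:S) {
  x <- rnorm(n, mean = theta, sd = 1)   # step 1: generate a data set
  est_mean[s]   <- mean(x)              # step 2: value of estimator 1
  est_median[s] <- median(x)            # step 2: value of estimator 2
}

# Step 3: summary statistics over the S estimates (hypothetical helper)
mc_summary <- function(est, theta) {
  c(average  = mean(est),
    bias     = mean(est) - theta,
    variance = var(est),                 # divides by S - 1
    mse      = mean((est - theta)^2))    # divides by S
}

res_mean   <- mc_summary(est_mean, theta)
res_median <- mc_summary(est_median, theta)
re <- unname(res_mean["mse"] / res_median["mse"])   # relative efficiency

print(rbind(mean = res_mean, median = res_median))
print(re)
```

A relative efficiency below 1 here indicates that the sample mean has the smaller MSE under these conditions.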
