Summary: Statistics in Business Sciences
Table of contents
1 Overview of the different methods
1.1 One variable
1.2 Two or more variables
1.3 Time series
2 Introduction to probability
2.1 Definitions of Probability
2.1.1 Harold Jeffreys' definition
2.1.2 Probability theory
2.2 Bayes' theorem
2.3 Sensitivity and specificity
2.3.1 Sensitivity
2.3.2 Specificity
2.4 Multinomial Naive Bayes Classifier
2.4.1 Interaction effects
2.4.2 Zero probabilities
2.5 Law of large numbers
3 Probability Distributions
3.1 Statistical Measures for Probability Distributions
3.2 Discrete Distributions
3.2.1 Bernoulli Distribution
3.2.2 Binomial Distribution
3.2.3 Multinomial Distribution
3.3 Continuous Distributions
3.3.1 Uniform Distribution (Rectangular Distribution)
3.3.2 Normal Distribution (Gaussian Distribution)
3.3.3 Gaussian Naive Bayes Classifier
4 Descriptive Statistics & Exploratory Data Analysis
4.1 Types of data
4.1.1 Qualitative data
4.1.2 Quantitative data
4.2 Qualitative data
4.2.1 Frequency plot
4.2.2 Contingency table
4.2.3 Binomial classification
4.3 Quantitative data
4.3.1 Stem-and-leaf plot
4.3.2 Histogram
4.3.3 Quantiles
4.3.4 Central tendency
4.3.5 Variability
4.3.6 Skewness & Kurtosis
4.3.7 Notched Boxplot
4.3.8 Scatter plot
4.3.9 Pearson correlation
4.3.10 Phi coefficient
4.3.11 Rank Correlation
4.3.12 Partial Pearson correlation
4.3.13 Simple linear regression
4.3.14 Quantile-Quantile Plot (QQ-plot)
4.3.15 Probability Plot Correlation Coefficient Plot (PPCC plot)
4.3.16 Kernel Density Estimation
4.3.17 Bivariate Kernel Density plot
4.3.18 Bootstrap plot
4.3.19 Survey Scores Rank Order Comparison (SSROC)
4.3.20 Cronbach's alpha
4.4 Quantitative Data with Time Dimension (time series)
4.4.1 Equidistant Time Series
4.4.2 Time series plot (run sequence plot)
4.4.3 Mean plot
4.4.4 Blocked bootstrap plot
4.4.5 Standard Deviation-Mean Plot
4.4.6 Variance Reduction Matrix
4.4.7 (Partial) Autocorrelation Function
4.4.8 Periodogram & Cumulative Periodogram
5 Hypothesis testing
5.1 Normal distribution
5.2 Population
5.3 The sample
5.3.1 Theorem 1
5.3.2 Theorem 2
5.4 Central limit theorem
5.5 Statistical tests of the population mean with known variance
5.6 Statistical tests for the mean
5.6.1 Determining the critical value
5.6.2 Determining the p-value
5.6.3 Determining the type II error (beta error)
5.6.4 Determining the sample size
5.6.5 The variance is unknown
5.7 Statistical tests for the variance
5.7.1 Determining the critical value
5.7.2 Determining the p-value
5.7.3 Confidence interval for the sample variance
5.7.4 Confidence interval for the population variance
5.8 Statistical tests for a population proportion
5.9 Statistical test for the difference between means
5.10 Hypothesis Testing for Research Purposes
5.10.1 One-sample t-test
5.10.2 Skewness & Kurtosis test
5.10.3 Paired Two Sample t-test
5.10.4 Wilcoxon Signed-Rank Test
5.10.5 Unpaired Two Sample t-test
5.10.6 Unpaired Two Sample Welch test
5.10.7 Mann-Whitney U test (Wilcoxon Rank-Sum test)
5.10.8 Bayesian Two Sample t-test
5.10.9 Median test based on Notched Boxplots
5.10.10 Chi-Squared test for Count Data
5.10.11 One Way Analysis of Variance (1-way ANOVA)
5.10.12 Two Way Analysis of Variance (2-way ANOVA)
5.10.13 Testing correlation
5.10.14 Causality
6 Regression models
6.1 Simple Linear Regression Model
6.1.1 Least squares criterion
6.1.2 Ordinary least squares for simple linear regression
6.1.3 Statistical inference
6.2 Multiple Linear Regression Model (MLRM)
6.3 Hypothesis Testing with Linear Regression Models
6.3.1 On the relationship between Linear Regression models and ANOVA
6.3.2 Model assumptions
6.3.3 Testing Regression Coefficients
6.3.4 Testing regression model assumptions
6.3.5 Misspecification in Regression
7 Time series
7.1 Extrapolation
7.2 Decomposition of time series
7.2.1 Classical Decomposition
7.2.2 Decomposition by Loess
7.2.3 Decomposition by Structural Time Series Models
7.3 Ad Hoc Forecasting of Time Series
7.3.1 Regression analysis of time series
7.3.2 Smoothing models
8 Univariate Box-Jenkins analysis
8.1 Theoretical concepts
8.1.1 Stationarity
8.2 Making a time series stationary
8.2.1 Stationarity in the mean
8.2.2 Stationarity in the variance
8.2.3 Why do we need stationarity?
8.3 Determining the ARMA parameters
8.3.1 The AR(1) model
8.3.2 The AR(2) model
8.3.3 General AR model (AR(p) process)
8.3.4 The MA(1) model
8.3.5 The full ARIMA model
1 OVERVIEW OF THE DIFFERENT METHODS