Exam (elaborations)

Solutions Manual for Foundations of Statistics for Data Scientists With R and Python 1st Edition by Alan Agresti, Maria Kateri

Name: Solutions Manual for Foundations of Statistics for Data Scientists With R and Python 1st Edition by Alan Agresti, Maria Kateri
SKU: doc_2302195
Rating: 4.33 (3 reviews)
Author: tutorsection



Written for

Institution: Foundations of Statistics
Module: Foundations of Statistics

Document information

Uploaded on: January 26, 2023
Number of pages: 106
Written in: 2022/2023
Type: Exam (elaborations)
Contains: Questions & answers

Subjects

exam bank
solutions manual
instructor manual
exam manual
mcqs
test questions
test bank

Content preview

Solutions Manual for Foundations of Statistics for Data
Scientists With R and Python, 1e by Alan Agresti, Maria Kateri
(All Chapters)

Chapter 1
1.1 (a) (i) an individual voter, (ii) the 1882 voters in the exit poll, (iii) the 11.1 million
people who voted
(b) Statistic: Sample percentage of 52.5% who voted for Feinstein
Parameter: Population percentage of 54.2% who voted for Feinstein
1.2 (a) Use a command such as in R,
> Students <- read.table("
+ header=TRUE)

(b) (i) What proportion of the students in this sample responded yes for whether
abortion should be legal in the first three months; (ii) Same question but for some
population, such as all social science graduate students at the University of Florida
1.3 (a) Quantitative; (b) categorical; (c) categorical; (d) quantitative
1.4 (a) Religious affiliation (possible categories Christianity, Islam, Judaism,
Hinduism, Buddhism, other, none)
(b) Body mass index (BMI = (weight in kg)/(height in meters)^2)
(c) Number of children in family
(d) Height of a person
1.5 Ordinal, because categories have natural ordering
1.6 (a) College board score (e.g., SAT between 200 and 800)
(b) Time spent in college (measured by integer number of years)
1.7 In R, for students numbered 00001 to 52000,
> sample(1:52000, 10)
[1] 1687 18236 26783 35366 14244 11429 20973 31436 48476
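The numbers printed above change on every run because sample() draws a new random sample each time. A minimal sketch of a repeatable version follows; the seed value 2023 and the zero-padding step are assumptions added for illustration, not part of the original solution.

```r
# Draw a simple random sample of 10 student IDs, without replacement.
set.seed(2023)                      # arbitrary seed (assumption) so the draw can be repeated
ids <- sample(1:52000, 10)          # same call as in the solution

# Pad to five digits to match the 00001 to 52000 numbering of the exercise
padded <- formatC(ids, width = 5, format = "d", flag = "0")
print(padded)
```

Rerunning with the same seed returns the same ten IDs; removing set.seed() restores a fresh draw on each run.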

1.8 (a) observational, (b) experiment, (c) observational, (d) experiment
1.9 Median = 4, mode = 2, expect mean larger than median because distribution is skewed
right
1.10 (a)

Solutions Manual: Foundations of Statistical Science for Data Scientists

> Carbon <- read.table("http://stat4ds.rwth-aachen.de/data/Carbon_West.dat",
+ header=TRUE)
> breaks <- seq(2.0, 18.0, by=2.0)
> freq <- table(cut(Carbon$CO2, breaks, right=FALSE))
> cbind(freq, freq/nrow(Carbon))
        freq
[2,4)      4 0.11428571
[4,6)     15 0.42857143
[6,8)      7 0.20000000
[8,10)     6 0.17142857
[10,12)    0 0.00000000
[12,14)    0 0.00000000
[14,16)    2 0.05714286
[16,18)    1 0.02857143
> hist(Carbon$CO2)

(b) Mean = 6.72, median = 5.90, standard deviation = 3.36
> mean(Carbon$CO2); median(Carbon$CO2); sd(Carbon$CO2)
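The relative frequencies in part (a) can be checked without downloading the data. This sketch re-enters the counts from the frequency table by hand and assumes no CO2 value falls outside [2, 18), so that the counts sum to nrow(Carbon).

```r
# Counts per CO2 interval, copied from the frequency table in part (a)
freq <- c("[2,4)" = 4, "[4,6)" = 15, "[6,8)" = 7, "[8,10)" = 6,
          "[10,12)" = 0, "[12,14)" = 0, "[14,16)" = 2, "[16,18)" = 1)

n <- sum(freq)            # total number of observations in the table
rel_freq <- freq / n      # relative frequency of each interval

print(n)
print(round(rel_freq, 4))
```

The counts sum to 35, and 15/35 = 0.4286 matches the second row of the printed table.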

1.11 Skewed to the right, because the mean is much larger than the median.
1.12 Number of times you went to a gym in the last week; median = 0 if more than half of
persons in the sample never went.
1.13 (a) 63,000 to 75,000; (b) 57,000 to 81,000; (c) 51,000 to 87,000. 100,000 would be unusual
because it is more than 5 standard deviations above the mean.
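The three intervals in 1.13 are the mean plus and minus 1, 2, and 3 standard deviations. The sketch below assumes a mean of 69,000 and a standard deviation of 6,000; both values are inferred from the intervals quoted above rather than stated in this solution.

```r
m <- 69000              # mean, inferred from the midpoint of the intervals above
s <- 6000               # standard deviation, inferred from the interval widths

k <- 1:3
lower <- m - k * s      # 63000 57000 51000
upper <- m + k * s      # 75000 81000 87000
print(data.frame(k, lower, upper))

# How many standard deviations above the mean is 100,000?
z <- (100000 - m) / s
print(round(z, 2))      # 5.17
```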
1.14 A quarter of the states had less than 6% without insurance, and a quarter had more than
9.5% without insurance. Half the states had between 6% and 9.5% without insurance,
encompassing an interquartile range of 3.5%.
1.15 Skewed to the right, because distances of median from LQ and minimum are less than
from UQ and maximum.
1.16 (a) The percentages in 2018 (with the default composite weight) for (0, 1, 2, 3, 4, 5,
6, ≥ 7) are (9.4, 24.8, 24.9, 14.8, 10.7, 5.3, 3.5, 6.7), somewhat skewed to the right.
(b) Mode = 2, median = 2
(c) Mean = 2.8, standard deviation = 2.6. The lowest possible observation is only
slightly more than a standard deviation below the mean, whereas in bell-shaped
distributions, observations can occur two or three standard deviations from the
mean in each direction.
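The mode and median in (b) can be read off the percentage distribution in (a), and the remark in (c) amounts to a z-score for the smallest possible value, 0. A minimal sketch, using the percentages quoted above (they sum to 100.1 because of rounding) and treating the last category as "7 or more":

```r
value <- 0:7   # the last entry stands for the category "7 or more"
pct <- c(9.4, 24.8, 24.9, 14.8, 10.7, 5.3, 3.5, 6.7)

mode_val <- value[which.max(pct)]                 # category with the largest percentage
median_val <- value[which(cumsum(pct) >= 50)[1]]  # first category where cumulative % reaches 50
print(c(mode = mode_val, median = median_val))

# z-score of the lowest possible observation, using the mean and sd reported in (c)
z_min <- (0 - 2.8) / 2.6
print(round(z_min, 2))                            # -1.08
```

The mean itself cannot be recovered exactly from these percentages, because the open-ended top category hides the actual values of 7 or more.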
1.17 > Murder <- read.table("http://stat4ds.rwth-aachen.de/data/Murder.dat", header=TRUE)
> Murder1 <- Murder[Murder$state!="DC",] # data frame without D.C.

(a) Mean = 4.87, standard deviation = 2.59
> mean(Murder1$murder); sd(Murder1$murder)

(b) Minimum = 1.0, LQ = 2.6, median = 4.85, UQ = 6.2, maximum = 12.4, somewhat
skewed right
> summary(Murder1$murder); boxplot(Murder1$murder)

(c) Repeat the analysis above for Murder$murder (the full data, including D.C.). D.C. is a
large outlier, causing the mean to increase (from 4.87 to 5.25) and the range to increase
dramatically (from 11.4 to 23.2).
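The size of the D.C. outlier can be backed out from the summaries alone. This sketch assumes the data frame without D.C. holds 50 states (an assumption, since the row count is not shown) and uses the rounded values quoted in (a) through (c), so the two estimates agree only approximately.

```r
n_states <- 50            # assumption: 50 observations once D.C. is removed
mean_without <- 4.87      # mean without D.C., from (a)
mean_with <- 5.25         # mean with D.C., from (c)

# Total with D.C. minus total without D.C. gives the D.C. value
dc_from_mean <- (n_states + 1) * mean_with - n_states * mean_without
print(dc_from_mean)       # about 24.25

# The minimum (1.0) is unchanged, so the new maximum is minimum + new range
dc_from_range <- 1.0 + 23.2
print(dc_from_range)      # 24.2
```

Both routes put the D.C. murder rate near 24, roughly twice the largest state value of 12.4.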

1.18 (a) Histogram is skewed right.

> Income <- read.table("http://stat4ds.rwth-aachen.de/data/Income.dat",
+ header=TRUE); attach(Income)
> hist(income)

(b) Five number summary is min. = 16, lower quartile = 22, median = 30, upper
quartile = 46.5, max. = 120; also mean = 37.52 and standard deviation = 20.67.
> summary(income); sd(income)

(c) Density approximation with default bandwidth = 6.85 is skewed right. Increasing
the bandwidth (such as to 12) makes the curve smoother and more bell-shaped, but still
skewed. Decreasing it (such as to 3) makes it much bumpier and probably a poorer
portrayal of a corresponding population distribution.
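> plot(density(income)) # default bandwidth = 6.85
> plot(density(income, bw=12)) # larger bandwidth: smoother curve
> plot(density(income, bw=3)) # smaller bandwidth: bumpier curve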
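The bandwidth comparison can be tried without the Income file. The sketch below uses simulated right-skewed values as a stand-in (an assumption; these are not the Income data, so the default bandwidth will not equal 6.85) and shows how the bw argument overrides the default that density() selects.

```r
set.seed(1)                                # arbitrary seed, for repeatability
x <- rgamma(80, shape = 3, scale = 12)     # simulated right-skewed values, NOT the Income data

d_default <- density(x)                    # bandwidth chosen by the default rule
d_smooth <- density(x, bw = 12)            # wider kernel: smoother estimate
d_bumpy <- density(x, bw = 3)              # narrower kernel: bumpier estimate

# The bandwidth actually used is stored in the bw component
print(c(default = d_default$bw, smooth = d_smooth$bw, bumpy = d_bumpy$bw))
```

Passing any of the three objects to plot() draws the corresponding curve.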

(d) > boxplot(income ~ race, xlab="Income", horizontal=TRUE)
> tapply(income, race, summary)
$B
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  16.00   19.50   24.00   27.75   31.00   66.00
$H
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   16.0    20.5    30.0    31.0    32.0    58.0
$W
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.00   24.00   37.00   42.48   50.00  120.00
> install.packages("tidyverse")
> library(tidyverse)
> Income %>% group_by(race) %>% summarize(n=n(),mean=mean(income),sd=sd(income))
race n mean sd
1 B 16 27.8 13.3
2 H 14 31 12.8
3 W 50 42.5 22.9
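As a consistency check, the overall mean in (b) should equal the average of the three group means weighted by group size. A minimal sketch using only the group sizes and means printed above:

```r
n_group <- c(B = 16, H = 14, W = 50)              # group sizes from the summarize() output
mean_group <- c(B = 27.75, H = 31.00, W = 42.48)  # group means from the tapply() output

overall <- weighted.mean(mean_group, n_group)     # size-weighted average of the group means
print(sum(n_group))                               # 80 observations in total
print(overall)                                    # about 37.5
```

The result agrees, up to rounding, with the mean of 37.52 reported in (b).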

1.19 (a) Highly skewed right
> Houses <- read.table("http://stat4ds.rwth-aachen.de/data/Houses.dat",
+ header=TRUE); attach(Houses)
> PriceH <- hist(price); hist(price) # save histogram to use its breaks
> breaks <- PriceH$breaks # breaks used in histogram
> freq <- table(cut(Houses$price,breaks, right=FALSE))
> cbind(freq,freq/nrow(Houses)) # frequency table (not shown)

(b) ȳ = 233.0, s = 151.9; 85% of the prices fall within one standard deviation of the
mean, not close to 68% because the distribution is not bell-shaped but highly skewed
> length(case[mean(price)-sd(price)<price & price<mean(price)+sd(price)]) /
+ nrow(Houses)
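The proportion within one standard deviation of the mean can also be computed as the mean of a logical vector, which avoids indexing a second variable. The sketch below uses simulated right-skewed values as a stand-in for the price column (an assumption; the result will not be the 85% obtained for the Houses data).

```r
set.seed(1)                                          # arbitrary seed, for repeatability
price_sim <- rlnorm(100, meanlog = 5, sdlog = 0.6)   # simulated stand-in, NOT the Houses data

# TRUE where a value lies within one standard deviation of the mean
within <- abs(price_sim - mean(price_sim)) < sd(price_sim)

prop_within <- mean(within)                          # proportion of TRUE values
prop_count <- length(price_sim[within]) / length(price_sim)  # same quantity by counting
print(c(prop_within, prop_count))
```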

(c) The boxplot shows many large observations that are outliers.
> boxplot(price)

(d) > tapply(Houses$price, Houses$new, summary)
$`0`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   31.5   135.0   190.8   207.9   240.0   880.5
$`1`
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  158.8   256.9   427.5   436.4   519.7   866.2

New homes tend to have higher selling prices.

1.20 (a) Clear trend that price tends to increase as size increases.

> plot(size, price)

(b) 0.834, strong positive association
> cor(size, price)

(c) Predicted price = −76.39 + 0.19(size), which is 113.5 thousand dollars at 1000
square feet and 683.2 thousand dollars at 4000 square feet.
> summary(lm(price ~ size)) # linear model: read the coefficient estimates
> pred <- function(x){-76.3894+0.1899*x}; pred(1000); pred(4000)

1.21 Correlation = 0.278 (positive but weak), predicted college GPA is 2.75 + 0.22(high
school GPA), which is 3.6 for high school GPA of 4.0.
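The prediction in 1.21 is a direct evaluation of the fitted line. A minimal sketch, using the rounded coefficients quoted above; the function name pred_gpa is hypothetical.

```r
# Fitted line from 1.21: predicted college GPA = 2.75 + 0.22 * (high school GPA)
pred_gpa <- function(hs_gpa) 2.75 + 0.22 * hs_gpa

print(pred_gpa(4.0))               # 3.63, reported as 3.6
print(pred_gpa(c(2.0, 3.0, 4.0)))  # predictions rise by 0.22 per GPA point
```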
1.22 > Happy <- read.table("http://stat4ds.rwth-aachen.de/data/Happy.dat", header=TRUE)
> Happiness <- factor(Happy$happiness); Marital <- factor(Happy$marital)
> levels(Happiness) <- c("Very happy", "Pretty happy", "Not too happy")
> levels(Marital) <- c("Married", "Divorced/Separated", "Never married")
> table(Marital, Happiness) # forms contingency table
                    Happiness
Marital              Very happy Pretty happy Not too happy
  Married                   432          504            61
  Divorced/Separated         92          282           103
  Never married             124          409           135
> prop.table(table(Marital,Happiness), 1)
                    Happiness
Marital              Very happy Pretty happy Not too happy
  Married            0.43329990   0.50551655    0.06118355
  Divorced/Separated 0.19287212   0.59119497    0.21593291
  Never married      0.18562874   0.61227545    0.20209581

Married subjects are more likely to be very happy and less likely to be not too happy
than the other subjects.
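The row proportions can be reproduced from the counts alone, which is useful when the Happy data file is unavailable. This sketch re-enters the contingency table by hand and applies prop.table() with margin 1, so that each row sums to 1.

```r
# Counts copied from the contingency table above
counts <- matrix(c(432, 504,  61,
                    92, 282, 103,
                   124, 409, 135),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(
                   Marital = c("Married", "Divorced/Separated", "Never married"),
                   Happiness = c("Very happy", "Pretty happy", "Not too happy")))

row_prop <- prop.table(counts, 1)   # margin 1: divide each count by its row total
print(round(row_prop, 3))
```

Using margin 2 instead would condition on happiness and give the marital-status distribution within each happiness level.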
1.23 > attach(Students)
> table(relig, abor)
     abor
relig  0  1
    0  1 14
    1  4 25
    2  1  6
    3  7  2

The very religious (attending every week) are less likely to support legal abortion (only
2 of the 9 observations in support).
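The comparison in 1.23 is easier to see as the proportion supporting legal abortion within each religiosity level. A minimal sketch that re-enters the table above by hand, assuming abor = 1 means support and relig = 3 means attending every week, as the text indicates:

```r
# Counts copied from table(relig, abor); rows are relig 0-3, columns are abor 0/1
tab <- matrix(c(1, 14,
                4, 25,
                1,  6,
                7,  2),
              nrow = 4, byrow = TRUE,
              dimnames = list(relig = c("0", "1", "2", "3"),
                              abor = c("0", "1")))

support <- tab[, "1"] / rowSums(tab)   # proportion with abor = 1 in each row
print(round(support, 3))
```

The first three levels show support between roughly 0.86 and 0.93, against 2/9 = 0.222 for the most religious group.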
1.24 (a) Values are skewed right, with mean 153.9 and median 119.8 and a very high outlier
of 716 for the U.S.
(b) 0.90 between GDP and HDI.
(c) correlation = 0.674, predicted CO2 = 1.926 + 0.178(GDP), which increases
dramatically between 2.71 at the minimum GDP = 4.4 and 13.11 at the maximum GDP
= 62.9.
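The two predictions in (c) come from evaluating the fitted line at the smallest and largest GDP values. A minimal sketch with the rounded coefficients quoted above; the function name pred_co2 is hypothetical. With rounded coefficients the upper prediction comes out near 13.12 rather than the 13.11 quoted, a difference that presumably reflects unrounded coefficients in the original fit.

```r
# Fitted line from (c): predicted CO2 = 1.926 + 0.178 * GDP (rounded coefficients)
pred_co2 <- function(gdp) 1.926 + 0.178 * gdp

at_min <- pred_co2(4.4)    # prediction at the minimum GDP
at_max <- pred_co2(62.9)   # prediction at the maximum GDP
print(round(c(at_min, at_max), 2))
```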
1.25 > Races <- read.table("http://stat4ds.rwth-aachen.de/data/ScotsRaces.dat", header=TRUE)
> attach(Races)
> par(mfrow=c(2,2)) # a matrix of 2x2 plots in one graph
> boxplot(timeM); boxplot(timeW)
> hist(timeM); hist(timeW)
> summary(timeM)
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.10 47.63 67.17 84.88 113.91 439.15
