Solutions Manual for Foundations of Statistics for Data Scientists With R and Python 1st Edition by Alan Agresti, Maria Kateri

Type: Exam (elaborations), questions and answers
Pages: 106
Uploaded on: 26-01-2023
Written in: 2022/2023
Institution and module: Foundations of Statistics

Solutions Manual for Foundations of Statistics for Data Scientists With R and Python, 1e by Alan Agresti, Maria Kateri
(All Chapters)


Chapter 1
1.1 (a) (i) an individual voter, (ii) the 1882 voters in the exit poll, (iii) the 11.1 million
people who voted
(b) Statistic: Sample percentage of 52.5% who voted for Feinstein
Parameter: Population percentage of 54.2% who voted for Feinstein
1.2 (a) In R, use a command such as
> Students <- read.table("
+ header=TRUE)

(b) (i) What proportion of the students in this sample responded yes to whether
abortion should be legal in the first three months? (ii) The same question, but for a
population, such as all social science graduate students at the University of Florida.
1.3 (a) Quantitative; (b) categorical; (c) categorical; (d) quantitative
1.4 (a) Religious affiliation (possible categories Christianity, Islam, Judaism,
Hinduism, Buddhism, other, none)
(b) Body mass index (BMI = (weight in kg)/(height in meters)²)
(c) Number of children in family
(d) Height of a person
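The BMI in (b) is a quantitative variable computed from two measurements. A one-function Python sketch of the formula (Python being the book's other language); the function name and the 70 kg, 1.75 m example are hypothetical:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by the square of height in meters."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# Hypothetical person: 70 kg and 1.75 m tall
print(round(bmi(70, 1.75), 1))  # 22.9
```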
1.5 Ordinal, because categories have natural ordering
1.6 (a) College board score (e.g., SAT between 200 and 800)
(b) Time spent in college (measured as an integer number of years)
1.7 In R, for students numbered 00001 to 52000,
> sample(1:52000, 10)
[1] 1687 18236 26783 35366 14244 11429 20973 31436 48476
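The same kind of simple random sample can be drawn in Python with the standard-library `random.sample`, which samples without replacement. A minimal sketch; the helper name and seed are illustrative, and the IDs drawn will differ from the R output above:

```python
import random

def sample_student_ids(population_size=52000, n=10, seed=None):
    """Simple random sample of n distinct IDs from 1..population_size (no replacement)."""
    rng = random.Random(seed)  # seeded generator, so the draw is reproducible
    return rng.sample(range(1, population_size + 1), n)

ids = sample_student_ids(seed=2023)
print(sorted(ids))
```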

1.8 (a) observational, (b) experiment, (c) observational, (d) experiment
1.9 Median = 4, mode = 2; expect the mean to be larger than the median because the
distribution is skewed right.
1.10 (a)


, 2 Solutions Manual: Foundations of Statistical Science for Data Scientists

> Carbon <- read.table("http://stat4ds.rwth-aachen.de/data/Carbon_West.dat",
+ header=TRUE)
> breaks <- seq(2.0, 18.0, by=2.0)
> freq <- table(cut(Carbon$CO2, breaks, right=FALSE))
> cbind(freq, freq/nrow(Carbon))
freq
[2,4) 4 0.11428571
[4,6) 15 0.42857143
[6,8) 7 0.20000000
[8,10) 6 0.17142857
[10,12) 0 0.00000000
[12,14) 0 0.00000000
[14,16) 2 0.05714286
[16,18) 1 0.02857143
> hist(Carbon$CO2)

(b) Mean = 6.72, median = 5.90, standard deviation = 3.36
> mean(Carbon$CO2); median(Carbon$CO2); sd(Carbon$CO2)
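Each relative frequency in 1.10(a) is a class count divided by the total number of observations. The raw CO2 values are not reproduced here, so this Python sketch starts from the counts in the printed table and recomputes the second column:

```python
# Class counts copied from the 1.10(a) table; intervals are closed on the left,
# matching cut(..., right=FALSE) in R
bins = ["[2,4)", "[4,6)", "[6,8)", "[8,10)", "[10,12)", "[12,14)", "[14,16)", "[16,18)"]
counts = [4, 15, 7, 6, 0, 0, 2, 1]

n = sum(counts)                     # total number of observations (35)
rel_freq = [c / n for c in counts]  # relative frequency of each class

for b, c, p in zip(bins, counts, rel_freq):
    print(f"{b:>8} {c:3d} {p:.8f}")
```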

1.11 Skewed to the right, because the mean is much larger than the median.
1.12 Number of times you went to a gym in the last week; median = 0 if more than half of
persons in the sample never went.
1.13 (a) 63,000 to 75,000; (b) 57,000 to 81,000; (c) 51,000 to 87,000. 100,000 would be unusual
because it is more than 5 standard deviations above the mean.
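The intervals in 1.13 are the mean plus or minus 1, 2 and 3 standard deviations, which corresponds to a mean of 69,000 and a standard deviation of 6,000 (inferred here from the midpoint and half-width of the interval in (a)). A short Python check of the intervals and of how far 100,000 lies above the mean:

```python
mean, sd = 69_000, 6_000  # inferred from the interval 63,000 to 75,000 in (a)

intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
z = (100_000 - mean) / sd  # number of standard deviations above the mean

for k, (lo, hi) in intervals.items():
    print(f"mean ± {k} sd: {lo:,} to {hi:,}")
print(f"z = {z:.2f}")  # z = 5.17
```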
1.14 A quarter of the states had less than 6% without insurance, and a quarter had more than
9.5% without insurance. Half the states had between 6% and 9.5% without insurance, so
the interquartile range is 9.5 - 6 = 3.5 percentage points.
1.15 Skewed to the right, because the median is closer to the lower quartile and the minimum
than to the upper quartile and the maximum.
1.16 (a) The percentages in 2018 (with the default composite weight) for (0, 1, 2, 3, 4, 5,
6, ≥ 7) are (9.4, 24.8, 24.9, 14.8, 10.7, 5.3, 3.5, 6.7), somewhat skewed to the right.
(b) Mode = 2, median = 2
(c) Mean = 2.8, standard deviation = 2.6. The lowest possible observation is only
slightly more than a standard deviation below the mean, whereas in bell-shaped
distributions, observations can occur two or three standard deviations from the
mean in each direction.
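The mode and median in (b) can be read off the percentage distribution, and the remark in (c) amounts to a z-score for the smallest possible value, 0. A Python sketch using the reported figures; coding the open-ended category "≥ 7" as 7 is an assumption that affects neither the mode nor the median:

```python
# Percentages for 0, 1, ..., 6 and ">= 7" (coded as 7 here), from 1.16(a)
values = [0, 1, 2, 3, 4, 5, 6, 7]
pct = [9.4, 24.8, 24.9, 14.8, 10.7, 5.3, 3.5, 6.7]

mode = values[pct.index(max(pct))]  # category with the largest percentage

# Median: first category at which the cumulative percentage reaches 50%
cumulative = 0.0
median = None
for v, p in zip(values, pct):
    cumulative += p
    if cumulative >= 50:
        median = v
        break

# Position of the lowest possible value (0) relative to the mean, in standard deviations
mean, sd = 2.8, 2.6
z_min = (0 - mean) / sd

print(mode, median, round(z_min, 2))  # 2 2 -1.08
```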
1.17 > Murder <- read.table("http://stat4ds.rwth-aachen.de/data/Murder.dat", header=TRUE)
> Murder1 <- Murder[Murder$state!="DC",] # data frame without D.C.


(a) Mean = 4.87, standard deviation = 2.59
> mean(Murder1$murder); sd(Murder1$murder)

(b) Minimum = 1.0, LQ = 2.6, median = 4.85, UQ = 6.2, maximum = 12.4, somewhat
skewed right
> summary(Murder1$murder); boxplot(Murder1$murder)

(c) Repeat the analysis above for Murder$murder, which includes D.C. D.C. is a large
outlier, causing the mean to increase (from 4.87 to 5.25) and the range to increase
dramatically (from 11.4 to 23.2).
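Part (c) rests on the fact that one extreme value pulls the mean and stretches the range while barely moving the median. A small Python sketch of that effect, using made-up numbers rather than the Murder data; the helper name is hypothetical:

```python
from statistics import mean, median

def summarize(xs):
    """Return (mean, median, range) of a list of numbers."""
    return mean(xs), median(xs), max(xs) - min(xs)

# Made-up values for illustration only (not the Murder data)
rates = [2.0, 3.0, 4.0, 5.0, 6.0]
with_outlier = rates + [30.0]

print(summarize(rates))         # (4.0, 4.0, 4.0)
print(summarize(with_outlier))  # mean about 8.33, median 4.5, range 28.0
```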

1.18 (a) Histogram is skewed right.


> Income <- read.table("http://stat4ds.rwth-aachen.de/data/Income.dat",
+ header=TRUE); attach(Income)
> hist(income)

(b) The five-number summary is min. = 16, lower quartile = 22, median = 30, upper
quartile = 46.5, max. = 120; also mean = 37.52 and standard deviation = 20.67.
> summary(income); sd(income)

(c) The density estimate with the default bandwidth (6.85) is skewed right. Increasing
the bandwidth (for example, to 12) makes the curve smoother and closer to bell-shaped,
though still skewed. Decreasing it (for example, to 3) makes it much bumpier and probably
a poorer portrayal of the corresponding population distribution.
> plot(density(income)) # default bandwidth = 6.85
> plot(density(income, bw=12))
> plot(density(income, bw=3))

(d) > boxplot(income ~ race, xlab="Income", horizontal=TRUE)
> tapply(income, race, summary)
$B
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.00 19.50 24.00 27.75 31.00 66.00
$H
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.0 20.5 30.0 31.0 32.0 58.0
$W
Min. 1st Qu. Median Mean 3rd Qu. Max.
18.00 24.00 37.00 42.48 50.00 120.00
> install.packages("tidyverse")
> library(tidyverse)
> Income %>% group_by(race) %>% summarize(n=n(),mean=mean(income),sd=sd(income))
race n mean sd
1 B 16 27.8 13.3
2 H 14 31 12.8
3 W 50 42.5 22.9
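The bandwidth in 1.18(c) is the standard deviation of the Gaussian kernel placed at each observation (this is how R's `density()` scales its kernels), so larger values average over wider neighborhoods and give smoother curves. A minimal Python sketch of a Gaussian kernel density estimate; the data are made up and the helper name is hypothetical:

```python
import math

def gaussian_kde(data, bw):
    """Return f(x) = (1 / (n * bw)) * sum of standard normal densities at (x - xi) / bw."""
    n = len(data)
    norm = 1.0 / (n * bw * math.sqrt(2 * math.pi))

    def f(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in data)

    return f

# Made-up, right-skewed incomes, for illustration only
incomes = [16, 18, 20, 22, 24, 26, 30, 34, 40, 55, 80, 120]

grid = [x * 0.5 for x in range(-100, 501)]  # -50 to 250 in steps of 0.5
for bw in (3, 6.85, 12):
    f = gaussian_kde(incomes, bw)
    area = sum(f(x) for x in grid) * 0.5  # Riemann-sum check: the curve integrates to about 1
    print(f"bw={bw}: area ~ {area:.3f}, f(30) = {f(30):.4f}")
```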

1.19 (a) Highly skewed right
> Houses <- read.table("http://stat4ds.rwth-aachen.de/data/Houses.dat",
+ header=TRUE); attach(Houses)
> PriceH <- hist(price) # draws the histogram and saves it, to reuse its breaks
> breaks <- PriceH$breaks # breaks used in histogram
> freq <- table(cut(Houses$price,breaks, right=FALSE))
> cbind(freq,freq/nrow(Houses)) # frequency table (not shown)

(b) ȳ = 233.0, s = 151.9; 85% of the prices fall within one standard deviation of the mean,
not close to 68% because the distribution is not bell-shaped but highly skewed.
> mean(price > mean(price) - sd(price) & price < mean(price) + sd(price))

(c) The boxplot shows many large observations that are outliers.
> boxplot(price)

(d) > tapply(Houses$price, Houses$new, summary)
$`0`
Min. 1st Qu. Median Mean 3rd Qu. Max.
31.5 135.0 190.8 207.9 240.0 880.5
$`1`
Min. 1st Qu. Median Mean 3rd Qu. Max.
158.8 256.9 427.5 436.4 519.7 866.2

New homes tend to have higher selling prices.
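The 85% in 1.19(b) is the share of observations falling between ȳ - s and ȳ + s. The same computation in Python, on made-up values rather than the Houses data; the helper name is hypothetical, and `statistics.stdev` uses the n - 1 divisor, like R's `sd()`:

```python
from statistics import mean, stdev

def share_within_one_sd(xs):
    """Proportion of observations strictly between mean - s and mean + s (s = sample sd)."""
    m, s = mean(xs), stdev(xs)
    return sum(m - s < x < m + s for x in xs) / len(xs)

# Made-up, right-skewed prices, for illustration only
prices = [90, 110, 120, 130, 140, 150, 160, 180, 200, 850]
print(round(share_within_one_sd(prices), 2))  # 0.9
```

With one extreme value inflating s, the one-standard-deviation band widens enough to contain most of the data, which is why a skewed sample can exceed the 68% of the empirical rule.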

1.20 (a) Clear trend that price tends to increase as size increases.


> plot(size, price)

(b) 0.834, strong positive association
> cor(size, price)

(c) Predicted price = −76.39 + 0.19(size), which is 113.5 thousand dollars at 1000
square feet and 683.2 thousand dollars at 4000 square feet.
> summary(lm(price ~ size)) # linear model: read the coefficients estimates
> pred <- function(x){-76.3894+0.1899*x}; pred(1000); pred(4000)
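The prediction equation in 1.20(c) is the least squares line, with slope b = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and intercept a = ȳ - b·x̄. A minimal Python sketch of both steps; the helper names are illustrative, and the predictions use the coefficients reported by `lm()`:

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def predict(a, b, x):
    """Predicted response a + b*x."""
    return a + b * x

# Coefficients reported in 1.20(c): price in thousands of dollars, size in square feet
a, b = -76.3894, 0.1899
print(round(predict(a, b, 1000), 1), round(predict(a, b, 4000), 1))  # 113.5 683.2
```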

1.21 Correlation = 0.278 (positive but weak). Predicted college GPA = 2.75 + 0.22(high
school GPA), which is 3.6 for a high school GPA of 4.0.
1.22 > Happy <- read.table("http://stat4ds.rwth-aachen.de/data/Happy.dat", header=TRUE)
> Happiness <- factor(Happy$happiness); Marital <- factor(Happy$marital)
> levels(Happiness) <- c("Very happy", "Pretty happy", "Not too happy")
> levels(Marital) <- c("Married", "Divorced/Separated", "Never married")
> table(Marital, Happiness) # forms contingency table
Happiness
Marital Very happy Pretty happy Not too happy
Married 432 504 61
Divorced/Separated 92 282 103
Never married 124 409 135
> prop.table(table(Marital,Happiness), 1)
Happiness
Marital Very happy Pretty happy Not too happy
Married 0.43329990 0.50551655 0.06118355
Divorced/Separated 0.19287212 0.59119497 0.21593291
Never married 0.18562874 0.61227545 0.20209581

Married subjects are more likely to be very happy and less likely to be not too happy
than the other subjects.
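The conditional proportions from `prop.table(..., 1)` divide each count by its row total. A Python sketch that reproduces them from the counts in the 1.22 contingency table:

```python
# Counts from the 1.22 contingency table (rows: marital status; columns: happiness)
columns = ["Very happy", "Pretty happy", "Not too happy"]
table = {
    "Married": [432, 504, 61],
    "Divorced/Separated": [92, 282, 103],
    "Never married": [124, 409, 135],
}

# Row (conditional) proportions: each count divided by its row total
row_props = {row: [c / sum(counts) for c in counts] for row, counts in table.items()}

for row, props in row_props.items():
    print(f"{row:<20}", " ".join(f"{p:.4f}" for p in props))
```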
1.23 > attach(Students)
> table(relig, abor)
abor
relig 0 1
0 1 14
1 4 25
2 1 6
3 7 2

The very religious (attending every week) are less likely to support legal abortion (only
2 of the 9 observations in support).
1.24 (a) Values are skewed right, with mean 153.9 and median 119.8 and a very high outlier
of 716 for the U.S.
(b) Correlation = 0.90 between GDP and HDI.
(c) Correlation = 0.674; predicted CO2 = 1.926 + 0.178(GDP), which increases dramatically
from 2.71 at the minimum GDP = 4.4 to 13.11 at the maximum GDP = 62.9.
1.25 > Races <- read.table("http://stat4ds.rwth-aachen.de/data/ScotsRaces.dat", header=TRUE)
> attach(Races)
> par(mfrow=c(2,2)) # a matrix of 2x2 plots in one graph
> boxplot(timeM); boxplot(timeW)
> hist(timeM); hist(timeW)
> summary(timeM)
Min. 1st Qu. Median Mean 3rd Qu. Max.
15.10 47.63 67.17 84.88 113.91 439.15
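R's `summary()` reports the five-number summary plus the mean, taking its quartiles from `quantile()` with the default method (type 7). A Python sketch of the five-number summary; `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation. The helper name and the data in the example are made up:

```python
from statistics import quantiles

def five_number_summary(xs):
    """(min, Q1, median, Q3, max), with quartiles interpolated as in R's default type 7."""
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, med, q3, max(xs)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.75, 4.5, 6.25, 8)
```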