Uploaded on:
23-03-2026

Written in:
2025/2026

Data Science with Python Part 2 2025 — 40 Q&A Machine Learning and Statistics Verified

Series:
CrashCourses Professional Study Series

Author:
Dr Z. Moomba, MBChB, MRCPsych | BethelWellness Ltd

Exam Target:
Data Science Python Part 2

Year:
2025/2026

Format:
40 Questions with Verified Answers and Rationales

Author's Note:
This document is an original work produced for the CrashCourses Professional Study Series.
Clinical questions and professional scenarios were composed by Dr Z. Moomba based on current
exam objectives, published guidelines, and evidence-based sources (2024–2025). All patient
names, ages, and case details are fictional. Any resemblance to existing published Q&A banks is
coincidental. For personal study use only — not for reproduction or redistribution.

SECTION A — FOUNDATIONS

1. A data science team at a major metropolitan hospital is analyzing the wait times in the emergency
department (ED). The wait times are normally distributed with a mean of 45 minutes and a standard
deviation of 10 minutes. A new triage protocol is implemented, and the team wants to calculate the
95% confidence interval for a sample of 100 patients whose average wait time was 42 minutes.
Which statistical principle must they apply to determine if the new protocol significantly changed
wait times?
A) The standard error of the mean must be calculated as 10 divided by the square root of 100.
B) The margin of error will be exactly 1.96 multiplied by 10.
C) A Chi-square goodness-of-fit test should be used to establish the interval.
D) The confidence interval is purely determined by the median wait time rather than the sample
mean.

Answer: A

Rationale:
a) To calculate a confidence interval for a sample mean, the standard error (SE) is required, which
is the population standard deviation divided by the square root of the sample size ($\sigma /
\sqrt{n}$).
b) The key discriminating feature is recognizing that the standard error accounts for sample size,
distinguishing it from population standard deviation. Here, $10 / \sqrt{100} = 1.0$.
c) Option B fails because the margin of error multiplies the critical z-score (1.96) by the standard
error (1.0), not the standard deviation (10).
d) In clinical statistics, narrowing a confidence interval at a fixed confidence level requires a
larger sample, because SE is inversely proportional to the square root of $n$: quadrupling $n$
halves the interval width. [NHS Health Data Analytics Guidelines 2024]
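The arithmetic in this rationale can be checked with a short sketch. It assumes the population standard deviation of 10 minutes is known and unchanged under the new protocol, so a z-based interval applies; the variable names are illustrative and not part of the original question.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the question stem
sigma = 10.0          # population standard deviation (minutes)
n = 100               # sample size
sample_mean = 42.0    # mean wait time under the new protocol
baseline_mean = 45.0  # mean wait time before the change

# Standard error of the mean: sigma / sqrt(n)
se = sigma / sqrt(n)

# Two-sided 95% critical value of the standard normal (about 1.96)
z_crit = NormalDist().inv_cdf(0.975)

# Margin of error uses the standard error, not the standard deviation
margin = z_crit * se
ci_low, ci_high = sample_mean - margin, sample_mean + margin

# If the old mean falls outside the interval, the change is
# significant at the 5% level under these assumptions
baseline_in_ci = ci_low <= baseline_mean <= ci_high

print(f"SE = {se:.2f}, margin = {margin:.2f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"Baseline mean inside CI: {baseline_in_ci}")
```

Under these assumptions the interval is roughly 42 ± 1.96 minutes, which excludes the previous mean of 45 minutes.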

2. A machine learning algorithm is developed to screen for early-stage pancreatic cancer based on
blood biomarkers. The model incorrectly classifies a healthy patient as having cancer. In the context
of hypothesis testing where the null hypothesis ($H_0$) is that the patient is healthy, what type of
statistical error has occurred?
A) Type II error (False Negative)
B) Type I error (False Positive)
C) Power error
D) Standard error of measurement

Answer: B

Rationale:
a) A Type I error occurs when the null hypothesis is true, but is incorrectly rejected (a false
positive).
b) The scenario explicitly describes classifying a healthy person ($H_0$ true) as diseased, which
is the defining characteristic of a false positive.
c) Option A fails because a Type II error would involve failing to reject $H_0$ when it is false (i.e.,
telling a patient with cancer that they are healthy).
d) In medical screening, the trade-off between Type I and Type II errors is a critical clinical
decision; lowering the decision threshold to raise sensitivity reduces Type II errors but typically
increases Type I errors. [Statistical Foundations for Health Data 2025]
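To make the two error types concrete, here is a minimal sketch that counts them from predicted and true labels. The label lists are invented for illustration only (0 = healthy, so $H_0$ is true; 1 = cancer), and the variable names are assumptions rather than anything from the original question.

```python
# Synthetic labels for ten hypothetical patients (illustration only)
# 0 = healthy (H0 true), 1 = cancer (H0 false)
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

pairs = list(zip(y_true, y_pred))

# Type I error: healthy patient flagged as having cancer (false positive)
type_i = sum(1 for t, p in pairs if t == 0 and p == 1)

# Type II error: patient with cancer classified as healthy (false negative)
type_ii = sum(1 for t, p in pairs if t == 1 and p == 0)

true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)

# Sensitivity falls as Type II errors rise; specificity falls as Type I errors rise
sensitivity = true_pos / (true_pos + type_ii)
specificity = true_neg / (true_neg + type_i)

print(f"Type I (FP) = {type_i}, Type II (FN) = {type_ii}")
print(f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")
```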

3. A clinical researcher wants to compare the mean reduction in systolic blood pressure across three
different antihypertensive medications (Drug A, Drug B, and Drug C). The dependent variable is
continuous, and the independent variable is categorical with three levels. Which statistical test is
most appropriate?
A) Independent two-sample t-test
B) Chi-square test of independence
C) One-way ANOVA
D) Pearson correlation coefficient

Answer: C

Rationale:
a) One-way Analysis of Variance (ANOVA) is used to compare means across three or more
independent groups.
b) The presence of three distinct drug groups (categorical independent variable) and a continuous
outcome (blood pressure reduction) dictates the use of ANOVA.
c) Option A is incorrect because a t-test compares exactly two groups; running several pairwise
t-tests instead would inflate the overall Type I error rate.
d) If the ANOVA yields a significant p-value, post-hoc testing (such as Tukey's HSD) is needed to
determine exactly which drugs differ from one another. [Scikit-learn Statistics Primer 2025]
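A one-way ANOVA of this kind can be sketched with `scipy.stats.f_oneway`. The blood pressure reductions below are synthetic values chosen only to illustrate the call, and the use of SciPy is an assumption, since the question does not name a library.

```python
from scipy.stats import f_oneway

# Synthetic systolic blood pressure reductions in mmHg (illustrative values only)
drug_a = [10, 12, 11, 13, 12]
drug_b = [15, 17, 16, 18, 16]
drug_c = [20, 22, 21, 19, 23]

# One-way ANOVA: tests the null hypothesis that all three group means are equal
result = f_oneway(drug_a, drug_b, drug_c)
f_stat, p_value = result.statistic, result.pvalue

print(f"F = {f_stat:.2f}, p = {p_value:.4g}")

if p_value < 0.05:
    # ANOVA only says that some difference exists, not where it lies
    print("At least one group mean differs; follow up with a post-hoc test.")
```

A significant result here would be followed by a post-hoc comparison, for example `scipy.stats.tukey_hsd` in recent SciPy releases.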

4. A health informatics team is building an automated diagnostic pipeline. They write the following
scikit-learn code to split their dataset containing 10,000 patient records, where only 2% of patients
have the target rare disease:
`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)`

What is the critical flaw in this implementation for a highly imbalanced medical dataset?
A) `test_size=0.2` is too small for clinical data; it must be at least 0.3.
B) The code fails to include the `stratify=y` parameter, risking a test set with zero positive cases.
C) `random_state=42` introduces deterministic bias and should be set to `None`.
D) `train_test_split` cannot process categorical string labels in `y`.

Answer: B
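The effect of `stratify=y` can be seen in a minimal sketch on synthetic data with the same size and 2% prevalence as the question. The feature matrix is random noise and exists only so the split has something to partition; the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: 10,000 records, 2% positive
n_records = 10_000
y = np.zeros(n_records, dtype=int)
y[:200] = 1
rng = np.random.default_rng(0)
X = rng.normal(size=(n_records, 3))  # placeholder features

# Plain random split: the positive count in the test set varies with the seed
_, _, _, y_test_plain = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stratified split: class proportions are preserved in both partitions
_, _, y_train_strat, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Positives in plain test set:     ", int(y_test_plain.sum()))
print("Positives in stratified test set:", int(y_test_strat.sum()))
print("Stratified test prevalence:      ", y_test_strat.mean())
```

With stratification, the 2,000-record test set holds 40 positive cases and the training set holds 160, matching the 2% prevalence of the full dataset.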

Rationale:

Document information

Number of pages: 26
Type: Exam
Contains: Questions and answers