Summary FODS - Exam Questions

Uploaded on: 3 June 2024

Written in: 2023/2024

A collection of questions from the slides, previous exam questions and questions found online for 'Fundamentals of Data Science'. Used as preparation for the oral exam.

Exam questions + Questions from slides, online …

Recap: Pre-processing
Q1: What is the importance of pre-processing?

Importance of Pre-processing:
- Data Quality Improvement: Pre-processing ensures the data is clean and free
from errors or inconsistencies (e.g., removing duplicates, handling missing
values).
- Data Consistency: Brings data into a uniform format (e.g., the same units,
date formats and category labels), making it suitable for analysis.
- Feature Engineering: Transforms raw data into meaningful features that
enhance the predictive power of models.
- Algorithm Compatibility: Prepares data to meet the requirements of specific
algorithms, such as encoding categorical variables for models that only handle
numerical data.
- Enhanced Performance: Improves the efficiency and accuracy of models by
ensuring that the input data is appropriately formatted and scaled.
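Three of the steps above can be sketched in plain Python: duplicate removal, mean imputation of missing values, and scaling. This is a minimal sketch and not the course's method. The helper `preprocess`, the field names and the toy records are all hypothetical. In practice a library such as pandas or scikit-learn would usually do this work.

```python
from statistics import mean, pstdev

def preprocess(rows, numeric_key):
    """Hypothetical helper: clean a list of dict records.

    1. Drop exact duplicate records (keeping the first occurrence).
    2. Replace missing values (None) in `numeric_key` with the mean of the observed values.
    3. Standardize `numeric_key` to mean 0 and (population) standard deviation 1.
    """
    # 1. Remove exact duplicates while preserving the original order
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(dict(row))  # copy so the input records are not modified

    # 2. Mean imputation, using only the observed (non-missing) values
    fill = mean(r[numeric_key] for r in unique if r[numeric_key] is not None)
    for r in unique:
        if r[numeric_key] is None:
            r[numeric_key] = fill

    # 3. Z-score standardization: (x - mean) / std
    values = [r[numeric_key] for r in unique]
    mu, sigma = mean(values), pstdev(values)
    for r in unique:
        r[numeric_key] = (r[numeric_key] - mu) / sigma if sigma else 0.0
    return unique


raw = [
    {"id": 1, "age": 20},
    {"id": 1, "age": 20},    # duplicate record
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 40},
]
clean = preprocess(raw, "age")
print([round(r["age"], 3) for r in clean])  # [-1.225, 0.0, 1.225]
```

The imputation value and the scaling statistics are computed here on whatever data is passed in. In a real project they would be computed on the training data only, then reused unchanged on test and production data.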

Q2: True or false? Explain. Pre-processing is a standardized procedure that is
independent of the model that will be used afterwards.

False.
Explanation: Pre-processing is not entirely standardized and often depends on
the specific requirements of the model to be used. Different models have
different requirements; for example:
- Decision Trees: May not require normalization or scaling of features.
- Linear Models and Neural Networks: Often require features to be
normalized or standardized.
- Algorithms Handling Categorical Data: Some models (e.g., many tree-based
models) can handle categorical variables directly, while others (e.g.,
linear regression, SVM) require these variables to be encoded (e.g., one-hot
encoding).
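Why trees may not need scaling: a tree split only compares a feature against a threshold, so it depends only on the ordering of the values. Min-max scaling and standardization are strictly increasing transformations. They therefore leave that ordering, and the chosen partition, unchanged.

The sketch below illustrates this. The one-feature split function `best_threshold_split` and the income data are hypothetical. Real tree learners are more elaborate and typically use impurity measures such as Gini or entropy rather than raw misclassification counts.

```python
def best_threshold_split(xs, ys):
    """Hypothetical toy version of a single decision-tree split on one feature.

    Tries every cut point between consecutive sorted values (assumes distinct
    feature values) and returns the set of sample indices sent to the left branch
    for the cut with the fewest misclassifications (each branch predicts its
    majority label).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_err, best_left = None, None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        err = 0
        for branch in (left, right):
            labels = [ys[i] for i in branch]
            # errors = samples that do not carry the branch's majority label
            err += len(labels) - max(labels.count(y) for y in set(labels))
        if best_err is None or err < best_err:
            best_err, best_left = err, frozenset(left)
    return best_left


# Hypothetical data: yearly income vs. whether the customer bought
income = [18_000, 25_000, 40_000, 52_000, 90_000]
bought = [0, 0, 1, 1, 1]

lo, hi = min(income), max(income)
scaled = [(x - lo) / (hi - lo) for x in income]  # min-max scaling to [0, 1]

print(best_threshold_split(income, bought))  # frozenset({0, 1})
print(best_threshold_split(income, bought) == best_threshold_split(scaled, bought))  # True
```

Other methods use the magnitudes of the values. These include linear models fitted with gradient descent or with regularization, and distance-based methods. Such methods can behave differently after scaling.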

Q3: True or false? Explain. One-hot encoding a categorical feature with
originally 3 separate categories results in 3 new columns.

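False, assuming the n-1 (dummy) encoding convention.
Explanation: Under this convention, one-hot encoding a categorical feature
with 3 categories results in 2 new columns. The n categories are transformed
into n-1 new binary columns to avoid multicollinearity in linear models. If all
n columns were kept, they would sum to 1 in every row, making them perfectly
collinear with the intercept (the "dummy variable trap"). Each new column
represents a distinct category, with 1 indicating the presence of the category
and 0 indicating absence. The dropped category serves as the reference level
and is encoded as all zeros. Note that full one-hot encoding, which many
libraries produce by default, does keep all 3 columns.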
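The two conventions can be compared with a small hypothetical helper, `one_hot`. It is a sketch, not a library API. pandas (`get_dummies(..., drop_first=True)`) and scikit-learn (`OneHotEncoder(drop="first")`) expose the same choice.

```python
def one_hot(values, drop_first=False):
    """Hypothetical helper: encode category labels as binary columns.

    drop_first=False -> full one-hot encoding: one column per category (n columns).
    drop_first=True  -> dummy encoding: the first category (in sorted order) is the
                        reference level, encoded as all zeros (n - 1 columns).
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    columns = [f"is_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


colors = ["red", "green", "blue", "green"]
full_cols, full_rows = one_hot(colors)
dummy_cols, dummy_rows = one_hot(colors, drop_first=True)

print(full_cols)      # ['is_blue', 'is_green', 'is_red']
print(dummy_cols)     # ['is_green', 'is_red']
print(dummy_rows[2])  # [0, 0]  -> "blue" is the all-zeros reference level
# With all n columns every row sums to 1, i.e. the columns are collinear with an intercept
print(all(sum(r) == 1 for r in full_rows))  # True
```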

Q4: When one-hot encoding, what happens to the original categorical feature?
Why?

When one-hot encoding, the original categorical feature is replaced by the new
binary columns.

Reason:
- The original categorical feature is transformed into a set of binary (0 or 1)
columns, each representing a unique category. This transformation allows
algorithms that require numerical input to process the categorical data
effectively.
- Removing the original categorical feature prevents redundancy, since the
binary columns already carry all of its information. Keeping it (e.g., as a
numeric code) would also introduce multicollinearity (when one predictor
variable in a model can be linearly predicted from the others with a
substantial degree of accuracy), which can negatively affect model
performance and interpretability in linear models.
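The sketch below shows the replacement step and why keeping the original column would be redundant. The helper `replace_with_one_hot` and the records are hypothetical. The integer code in the check is an assumption for illustration, standing in for a label-encoded version of the original feature.

```python
def replace_with_one_hot(records, key):
    """Hypothetical helper: replace the categorical field `key` in each dict
    record by one binary column per category (full one-hot encoding)."""
    categories = sorted({r[key] for r in records})
    encoded = []
    for r in records:
        new = {k: v for k, v in r.items() if k != key}  # drop the original feature
        for c in categories:
            new[f"{key}_{c}"] = 1 if r[key] == c else 0
        encoded.append(new)
    return encoded, categories


records = [
    {"age": 31, "plan": "basic"},
    {"age": 45, "plan": "pro"},
    {"age": 27, "plan": "free"},
]
encoded, cats = replace_with_one_hot(records, "plan")
print(encoded[0])  # {'age': 31, 'plan_basic': 1, 'plan_free': 0, 'plan_pro': 0}

# Redundancy check: an integer code for the original feature (its index in `cats`)
# is an exact linear combination of the binary columns, so keeping it adds no
# information and makes the predictors perfectly collinear.
for r, e in zip(records, encoded):
    code = cats.index(r["plan"])
    assert code == sum(i * e[f"plan_{c}"] for i, c in enumerate(cats))
```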

Q5: Campaign Example:

Consider a company that wants to use data science to improve its targeting of
costly personally targeted advertisements. The company runs a test campaign,
targeting those who are most likely to respond according to their expert. As a
campaign progresses, more and more data arrive on people who make
purchases after having seen the ad versus those who do not. These data can be
used to build models to discriminate between those to whom we should and
should not advertise. Examples can be put aside to evaluate how accurate the
models are in predicting whether consumers will respond to the ad.

When the resulting models are put into production, targeting their full
customer base “in the wild,” the company is surprised that the models do not
work as well as they did in the lab. Why does it not work?

Scenario: A company uses data science to target ads, builds models based on
test campaign data, but finds the models underperform in production. Why
does it not work?
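• Sampling Bias: The test campaign only targeted the people the expert judged
most likely to respond, so the training data are not representative of the
entire customer base.
Solution: Use a more representative sample for training (e.g., also target a
randomly selected group of customers during the test campaign).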
• Data Drift: Customer behaviour changes over time, making the model
outdated.
Solution: Continuously update models with new data.
• Overfitting: Models fit the training data too closely and fail to generalize.
Solution: Apply regularization, use cross-validation, and prefer simpler models.
• Feature Mismatch: Features available in the lab might differ from those in
production.
Solution: Ensure consistency in feature availability and quality.
• Environmental Differences: The operational setup in production (e.g., data
pipelines or the timing of data availability) differs from the lab.
Solution: Test models in environments that mimic production setups.
• Evaluation Metrics: Metrics used in the lab may not align with business
objectives.
Solution: Align model evaluation metrics with business goals and test
accordingly.
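One way to detect the sampling-bias and data-drift problems above is to compare each feature's distribution in the training data with its distribution in production. The sketch below uses the Population Stability Index (PSI), a common drift measure that is not part of the original answer. The function name, the bin edges and the ages are hypothetical. A commonly quoted rule of thumb reads a PSI above roughly 0.25 as a large shift, but such thresholds are conventions rather than guarantees.

```python
import math

def population_stability_index(expected, actual, edges):
    """Population Stability Index (PSI) between a reference sample and a new one.

    expected: feature values the model was trained on (e.g. test-campaign customers)
    actual:   values seen in production (e.g. the full customer base)
    edges:    ascending bin edges; bin i holds values with exactly i edges <= x

    PSI = sum_i (a_i - e_i) * ln(a_i / e_i), with e_i, a_i the share of each sample in bin i.
    """
    def bin_shares(xs):
        counts = [0] * (len(edges) + 1)
        for x in xs:
            counts[sum(x >= e for e in edges)] += 1
        # small floor so empty bins do not cause log(0) or division by zero
        return [max(c / len(xs), 1e-6) for c in counts]

    e_shares, a_shares = bin_shares(expected), bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


# Hypothetical ages: the expert mostly targeted young customers in the test campaign
campaign_ages = [22, 24, 25, 26, 27, 28, 31, 33, 47, 61]
customer_base_ages = [23, 29, 34, 38, 44, 49, 52, 58, 63, 70]
edges = [30, 45, 60]

print(round(population_stability_index(campaign_ages, customer_base_ages, edges), 2))  # 0.77
```

A large value for a feature signals that the lab data do not look like the production population. In that case, performance measured on held-out lab examples is unlikely to carry over.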

School, study and course

Institution: Vrije Universiteit Brussel (VUB)
Study: Business engineering
Course: Fundamentals of Data Science

Document information

Number of pages: 22
Type: Summary

Seller: jefdecuyper