Samenvatting

Summary FODS - Exam Questions

Beoordeling

Verkocht

Pagina's

Geüpload op

03-06-2024

Geschreven in

2023/2024

A collection of questions from the slides, previous exam questions and questions found online for 'Fundamentals of Data Science'. Used as preparation for oral exam.

Instelling

Vak

Voorbeeld van de inhoud

Exam questions + Questions from slides, online …

Recap: Pre-processing
Q1: What is the importance of pre-processing?

Importance of Pre-processing:
- Data Quality Improvement: Pre-processing ensures the data is clean and free
from errors or inconsistencies (e.g., removing duplicates, handling missing
values).
- Data Consistency: Standardizes data to ensure consistency, making it suitable
for analysis.
- Feature Engineering: Transforms raw data into meaningful features that
enhance the predictive power of models.
- Algorithm Compatibility: Prepares data to meet the requirements of specific
algorithms, such as encoding categorical variables for models that only handle
numerical data.
- Enhanced Performance: Improves the efficiency and accuracy of models by
ensuring that the input data is appropriately formatted and scaled.

Q2: True or false? Explain. Pre-processing is a standardized procedure that is
independent of the model that will be used afterwards.

False.
Explanation: Pre-processing is not entirely standardized and often depends on
the specific requirements of the model to be used. Different models have
different requirements; for example:
- Decision Trees: May not require normalization or scaling of features.
- Linear Models and Neural Networks: Often require features to be
normalized or standardized.
- Algorithms Handling Categorical Data: Some models (e.g., tree-based
models) can handle categorical variables directly, while others (e.g.,
linear regression, SVM) require these variables to be encoded (e.g., one
hot encoding).

,Q3: True or false? Explain. One-hot encoding a categorical feature with
originally 3 separate categories results in 3 new columns.

False.
Explanation: One-hot encoding a categorical feature with 3 categories results
in 2 new columns. In one-hot encoding, n categories are transformed into n-1
new binary columns to avoid multicollinearity in linear models. Each new
column represents a distinct category, with a 1 indicating the presence of the
category and 0 indicating absence.

Q4: When one-hot encoding, what happens to the original categorical feature?
Why?

When one-hot encoding, the original categorical feature is replaced by the new
binary columns.

Reason:
- The original categorical feature is transformed into a set of binary (0 or 1)
columns, each representing a unique category. This transformation allows
algorithms that require numerical input to process the categorical data
effectively.
- Removing the original categorical feature helps prevent redundancy and
multicollinearity (when one predictor variable in a model can be linearly
predicted from the others with a substantial degree of accuracy), which can
negatively affect model performance and interpretability in linear models.

, Q5: Campaign Example:

Consider a company that wants to use data science to improve its targeting of
costly personally targeted advertisements. The company runs a test campaign,
targeting those who are most likely to respond according to their expert. As a
campaign progresses, more and more data arrive on people who make
purchases after having seen the ad versus those who do not. These data can be
used to build models to discriminate between those to whom we should and
should not advertise. Examples can be put aside to evaluate how accurate the
models are in predicting whether consumers will respond to the ad.

When the resulting models are put into production, targeting their full
customer base “in the wild,” the company is surprised that the models do not
work as well as they did in the lab. Why does it not work?

Scenario: A company uses data science to target ads, builds models based on
test campaign data, but finds the models underperform in production. Why
does it not work?
• Sampling Bias: Training data from the test campaign may not be
representative of the entire customer base.
Solution: Use a more representative sample for training.
• Data Drift: Customer behaviour changes over time, making the model
outdated.
Solution: Continuously update models with new data.
• Overfitting: Models fit too closely to the training data and fail to generalize.
Solution: Apply regularization, cross-validation, and simpler models.
• Feature Mismatch: Features available in the lab might differ from those in
production.
Solution: Ensure consistency in feature availability and quality.
• Environmental Differences: Differences in operational environments
between lab and production.
Solution: Test models in environments that mimic production setups.
• Evaluation Metrics: Metrics used in the lab may not align with business
objectives.
Solution: Align model evaluation metrics with business goals and test
accordingly.

Meld schending auteursrecht

Geschreven voor

Instelling: Vrije Universiteit Brussel (VUB)
Studie: Business engineering
Vak: Fundamentals of Data Science

Alle documenten voor dit vak (3)

Documentinformatie

Geüpload op: 3 juni 2024
Aantal pagina's: 22
Geschreven in: 2023/2024
Type: SAMENVATTING

Onderwerpen

exam questions
fods
business engineering
data science
vub
fundamentals of data science

€3,75

Krijg toegang tot het volledige document:

100% tevredenheidsgarantie

Direct beschikbaar na je betaling

Lees online óf als PDF

Geen vaste maandelijkse kosten

Maak kennis met de verkoper

jefdecuyper

5,0

(1)

Ook beschikbaar in voordeelbundel

Maak kennis met de verkoper

jefdecuyper Vrije Universiteit Brussel

Bekijk profiel

Volgen

Verkocht

Lid sinds

5 jaar

Aantal volgers

Documenten

Laatst verkocht

1 maand geleden

5,0

1 beoordelingen

Populaire documenten

Recent door jou bekeken

Waarom studenten kiezen voor Stuvia

Gemaakt door medestudenten, geverifieerd door reviews

Kwaliteit die je kunt vertrouwen: geschreven door studenten die slaagden en beoordeeld door anderen die dit document gebruikten.

Niet tevreden? Kies een ander document

Geen zorgen! Je kunt voor hetzelfde geld direct een ander document kiezen dat beter past bij wat je zoekt.

Betaal zoals je wilt, start meteen met leren

Geen abonnement, geen verplichtingen. Betaal zoals je gewend bent via iDeal of creditcard en download je PDF-document meteen.

“Gekocht, gedownload en geslaagd. Zo makkelijk kan het dus zijn.”

Alisha Student

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper jefdecuyper. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €3,75. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews) Afgelopen 30 dagen zijn er 51655 samenvattingen verkocht Opgericht in 2010, al 16 jaar dé plek om samenvattingen te kopen

Summary FODS - Exam Questions

Voorbeeld van de inhoud

Geschreven voor

Documentinformatie

Onderwerpen

Meer vakken binnen Vrije Universiteit Brussel (VUB) > Business engineering

Ook beschikbaar in voordeelbundel

Maak kennis met de verkoper

Populaire documenten

Recent door jou bekeken

Waarom studenten kiezen voor Stuvia

Gemaakt door medestudenten, geverifieerd door reviews

Niet tevreden? Kies een ander document

Betaal zoals je wilt, start meteen met leren

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Tevredenheidsgarantie: hoe werkt dat?

Van wie koop ik deze samenvatting?

Zit ik meteen vast aan een abonnement?

Is Stuvia te vertrouwen?