Summary FODS - Exam Questions

Uploaded on: 03-06-2024
Written in: 2023/2024
Pages: 22

A collection of questions from the slides, previous exam questions, and questions found online for 'Fundamentals of Data Science', used as preparation for the oral exam.

Exam questions + Questions from slides, online …

Recap: Pre-processing
Q1: What is the importance of pre-processing?

Importance of Pre-processing:
- Data Quality Improvement: Pre-processing ensures the data is clean and free
from errors or inconsistencies (e.g., removing duplicates, handling missing
values).
- Data Consistency: Standardizes formats, units, and category labels (e.g.,
"be" vs "BE", one date format) so that records are comparable and suitable
for analysis.
- Feature Engineering: Transforms raw data into meaningful features that
enhance the predictive power of models.
- Algorithm Compatibility: Prepares data to meet the requirements of specific
algorithms, such as encoding categorical variables for models that only handle
numerical data.
- Enhanced Performance: Improves the efficiency and accuracy of models by
ensuring that the input data is appropriately formatted and scaled.
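A minimal plain-Python sketch of several of these steps (deduplication, label standardization, missing-value imputation, scaling). The records, field names, and the choice of mean imputation and min-max scaling are hypothetical illustrations, not a prescribed procedure.

```python
# Hypothetical raw records: one duplicate row, one missing value (None),
# and an inconsistently spelled category label ("be" vs "BE")
raw = [
    {"id": 1, "age": 25, "country": "BE"},
    {"id": 2, "age": None, "country": "be"},
    {"id": 1, "age": 25, "country": "BE"},  # exact duplicate of the first row
    {"id": 3, "age": 40, "country": "NL"},
]

def preprocess(records):
    # 1. Data quality: drop exact duplicate rows, keeping the first occurrence
    seen, rows = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            rows.append(dict(r))  # copy so the input is not mutated
    # 2. Consistency: standardize category labels to a single spelling
    for r in rows:
        r["country"] = r["country"].upper()
    # 3. Missing values: impute age with the mean of the observed ages
    observed = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(observed) / len(observed)
    for r in rows:
        if r["age"] is None:
            r["age"] = mean_age
    # 4. Scaling: min-max scale age to [0, 1] for scale-sensitive models
    lo = min(r["age"] for r in rows)
    hi = max(r["age"] for r in rows)
    span = (hi - lo) or 1  # avoid division by zero if all ages are equal
    for r in rows:
        r["age_scaled"] = (r["age"] - lo) / span
    return rows

clean = preprocess(raw)
print([(r["id"], r["age"], r["age_scaled"]) for r in clean])
# [(1, 25, 0.0), (2, 32.5, 0.5), (3, 40, 1.0)]
```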


Q2: True or false? Explain. Pre-processing is a standardized procedure that is
independent of the model that will be used afterwards.

False.
Explanation: Pre-processing is not fully standardized; it often depends on the
requirements of the model that will be used. Different models have different
requirements; for example:
- Decision Trees: Usually do not require normalization or scaling of features,
because their splits depend only on the ordering of feature values.
- Linear Models and Neural Networks: Often require features to be
normalized or standardized, since coefficient penalties and gradient-based
training are sensitive to feature scale.
- Algorithms Handling Categorical Data: Some models (e.g., certain tree-based
implementations) can handle categorical variables directly, while others (e.g.,
linear regression, SVM) require these variables to be encoded (e.g., one-hot
encoding).
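To make the model dependence concrete, the sketch below uses hypothetical data (income in euros, age in years) in plain Python. Min-max scaling changes which training point is nearest to a query, which matters for distance-based models such as k-NN, whereas a tree-style threshold split on income partitions the points identically before and after scaling, because scaling preserves the order of values. The points, query, and threshold are illustrative assumptions.

```python
import math

# Hypothetical training points: (income in euros, age in years)
train = {"A": (50_400, 30), "B": (50_000, 60), "C": (90_000, 31)}
query = (50_000, 30)

def nearest(points, q):
    """Label of the training point closest to q (Euclidean distance)."""
    return min(points, key=lambda k: math.dist(points[k], q))

# Fit min-max parameters on the training data only
columns = list(zip(*train.values()))
lo = [min(c) for c in columns]
hi = [max(c) for c in columns]

def scale(p):
    """Min-max scale a point per feature using the fitted lo/hi values."""
    return tuple((v - l) / (h - l) for v, l, h in zip(p, lo, hi))

scaled_train = {k: scale(p) for k, p in train.items()}

# Distance-based model: the nearest neighbour changes after scaling
print(nearest(train, query))                # B: a 400-euro gap outweighs 30 years
print(nearest(scaled_train, scale(query)))  # A: both features now count comparably

# Tree-style split "income <= 60000": same partition before and after scaling
t = 60_000
left_raw = {k for k, p in train.items() if p[0] <= t}
t_scaled = (t - lo[0]) / (hi[0] - lo[0])
left_scaled = {k for k, p in scaled_train.items() if p[0] <= t_scaled}
print(left_raw == left_scaled)              # True
```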

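Q3: True or false? Explain. One-hot encoding a categorical feature with
originally 3 separate categories results in 3 new columns.

False, under the drop-one (dummy encoding) convention assumed here.
Explanation: With that convention, a categorical feature with 3 categories
results in 2 new columns: n categories are transformed into n-1 new binary
columns to avoid perfect multicollinearity (the "dummy variable trap") in
linear models with an intercept. Each new column represents one of the
remaining categories, with a 1 indicating the presence of that category and
0 indicating absence; the dropped category is the reference level and is
encoded as all zeros. Full one-hot encoding, which keeps one column per
category, does produce 3 columns and is often used when collinearity is not
a concern (e.g., for tree-based models).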
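A minimal plain-Python sketch of both conventions; the helper `one_hot` and the `colors` data are hypothetical. In practice, pandas `get_dummies(..., drop_first=True)` and scikit-learn `OneHotEncoder(drop="first")` provide the drop-one variant.

```python
def one_hot(values, drop_first=False):
    """Encode category labels as binary (0/1) columns.

    drop_first=False: full one-hot, one column per category (n columns).
    drop_first=True:  dummy encoding, n - 1 columns; the first category in
                      sorted order is dropped and becomes the all-zeros
                      reference level.
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    names = [f"is_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return names, rows

colors = ["red", "green", "blue", "green"]  # hypothetical feature, 3 categories

full_names, full_rows = one_hot(colors)
dummy_names, dummy_rows = one_hot(colors, drop_first=True)

print(full_names)     # ['is_blue', 'is_green', 'is_red']
print(dummy_names)    # ['is_green', 'is_red']
print(dummy_rows[2])  # [0, 0]: "blue" is the reference level
# Full one-hot rows always sum to 1, i.e. they reproduce an intercept column
print(all(sum(r) == 1 for r in full_rows))  # True
```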



Q4: When one-hot encoding, what happens to the original categorical feature?
Why?

When one-hot encoding, the original categorical feature is replaced by the new
binary columns.

Reason:
- The original categorical feature is transformed into a set of binary (0 or 1)
columns, each representing a unique category. This transformation allows
algorithms that require numerical input to process the categorical data
effectively.
- Removing the original categorical feature helps prevent redundancy and
multicollinearity (when one predictor variable in a model can be linearly
predicted from the others with a substantial degree of accuracy), which can
negatively affect model performance and interpretability in linear models.
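A minimal sketch of the replacement step on a list of records, using a hypothetical helper `replace_with_one_hot` and hypothetical `age`/`plan` fields. It keeps one column per category for simplicity; the original `plan` key is removed so only the numeric binary columns remain.

```python
def replace_with_one_hot(records, feature):
    """Return new records in which `feature` is replaced by binary columns."""
    categories = sorted({r[feature] for r in records})
    encoded = []
    for r in records:
        # Copy every field except the original categorical feature
        new = {k: v for k, v in r.items() if k != feature}
        # Add one 0/1 column per category
        for c in categories:
            new[f"{feature}_{c}"] = int(r[feature] == c)
        encoded.append(new)
    return encoded

rows = [{"age": 31, "plan": "basic"}, {"age": 45, "plan": "premium"}]
print(replace_with_one_hot(rows, "plan"))
# [{'age': 31, 'plan_basic': 1, 'plan_premium': 0}, {'age': 45, 'plan_basic': 0, 'plan_premium': 1}]
```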

Q5: Campaign Example:

Consider a company that wants to use data science to improve its targeting of
costly personally targeted advertisements. The company runs a test campaign,
targeting those who are most likely to respond according to their expert. As a
campaign progresses, more and more data arrive on people who make
purchases after having seen the ad versus those who do not. These data can be
used to build models to discriminate between those to whom we should and
should not advertise. Examples can be put aside to evaluate how accurate the
models are in predicting whether consumers will respond to the ad.

When the resulting models are put into production, targeting their full
customer base “in the wild,” the company is surprised that the models do not
work as well as they did in the lab. Why does it not work?

Possible reasons why the models underperform in production, with solutions:
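• Sampling Bias: The test campaign only targeted people whom the expert judged
most likely to respond, so the training data (and the held-out evaluation data
drawn from the same campaign) are not representative of the entire customer
base.
Solution: Train and evaluate on a more representative sample, e.g., by also
advertising to a random subset of customers.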
• Data Drift: Customer behaviour changes over time, making the model
outdated.
Solution: Continuously update models with new data.
• Overfitting: Models fit too closely to the training data and fail to generalize.
Solution: Apply regularization, cross-validation, and simpler models.
• Feature Mismatch: Features available in the lab might differ from those in
production.
Solution: Ensure consistency in feature availability and quality.
• Environmental Differences: Differences in operational environments
between lab and production.
Solution: Test models in environments that mimic production setups.
• Evaluation Metrics: Metrics used in the lab may not align with business
objectives.
Solution: Align model evaluation metrics with business goals and test
accordingly.
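Sampling bias and data drift can be checked by comparing a feature's distribution in the training data with its distribution in production. The sketch below uses the Population Stability Index (PSI), one commonly used drift measure swapped in here as an illustration (the answer above does not name a specific check); the ages and bin edges are hypothetical. A frequently cited rule of thumb treats PSI below 0.1 as little shift, 0.1 to 0.25 as moderate, and above 0.25 as significant, but these thresholds are heuristics.

```python
import math
from collections import Counter

def bin_shares(values, edges):
    """Share of values per bin; bin i holds values above exactly i cut points."""
    counts = Counter(sum(v > e for e in edges) for v in values)
    n = len(values)
    # Floor empty bins at a tiny share so the log below stays defined
    return [max(counts.get(i, 0) / n, 1e-6) for i in range(len(edges) + 1)]

def psi(expected, actual):
    """Population Stability Index between two lists of bin shares."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

edges = [30, 45, 60]                          # hypothetical age cut points
campaign_ages = [25, 35, 50, 65]              # hypothetical test-campaign sample
customer_ages = [20, 35, 40, 50, 55, 58, 65, 70, 75, 80]  # hypothetical full base

score = psi(bin_shares(campaign_ages, edges), bin_shares(customer_ages, edges))
print(round(score, 3))  # 0.228: a moderate shift under the rule of thumb
```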
Uploaded by: jefdecuyper (Vrije Universiteit Brussel)