Summary FODS - Exam Questions

Pages: 22
Uploaded on: 03-06-2024
Written in: 2023/2024

A collection of questions from the slides, previous exam questions and questions found online for 'Fundamentals of Data Science'. Used as preparation for oral exam.

Exam questions + Questions from slides, online …

Recap: Pre-processing
Q1: What is the importance of pre-processing?

Importance of Pre-processing:
- Data Quality Improvement: Pre-processing ensures the data is clean and free
from errors or inconsistencies (e.g., removing duplicates, handling missing
values).
- Data Consistency: Standardizes data to ensure consistency, making it suitable
for analysis.
- Feature Engineering: Transforms raw data into meaningful features that
enhance the predictive power of models.
- Algorithm Compatibility: Prepares data to meet the requirements of specific
algorithms, such as encoding categorical variables for models that only handle
numerical data.
- Enhanced Performance: Improves the efficiency and accuracy of models by
ensuring that the input data is appropriately formatted and scaled.
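
The points above can be sketched as a small pipeline. This is a minimal, dependency-free illustration rather than a recommended implementation: the records in `raw` and the helper names (`drop_duplicates`, `impute_mean`, `standardize`) are hypothetical, and mean imputation is only one of several ways to handle missing values.

```python
from statistics import mean, pstdev

# Hypothetical raw records: one exact duplicate and one missing income.
raw = [
    {"age": 25, "income": 30000.0},
    {"age": 25, "income": 30000.0},  # duplicate row
    {"age": 40, "income": None},     # missing value
    {"age": 55, "income": 50000.0},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_mean(rows, column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def standardize(rows, column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, column: (r[column] - mu) / sigma} for r in rows]

clean = drop_duplicates(raw)          # data quality
clean = impute_mean(clean, "income")  # handling missing values
for col in ("age", "income"):         # scaling
    clean = standardize(clean, col)
```

After these steps `clean` holds three records whose `age` and `income` columns are on a comparable scale.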


Q2: True or false? Explain. Pre-processing is a standardized procedure that is
independent of the model that will be used afterwards.

False.
Explanation: Pre-processing is not entirely standardized and often depends on
the specific requirements of the model to be used. Different models have
different requirements; for example:
- Decision Trees: May not require normalization or scaling of features.
- Linear Models and Neural Networks: Often require features to be
normalized or standardized.
- Algorithms Handling Categorical Data: Some models (e.g., certain tree-based
implementations) can handle categorical variables directly, while others (e.g.,
linear regression, SVM) require these variables to be encoded (e.g., one-hot
encoding).
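
The decision-tree point can be illustrated with a simplified one-level split. This is a sketch under assumptions: the toy data (`incomes`, `labels`) are made up, and `best_split` is a hypothetical helper that only tries "predict 0 on the left, 1 on the right", which is far simpler than a real tree learner. It shows that standardizing the feature does not change which examples end up on each side of the best split.

```python
from statistics import mean, pstdev

# Hypothetical toy data: one feature and a binary label.
incomes = [20000, 25000, 31000, 48000, 52000, 60000]
labels = [0, 0, 0, 1, 1, 1]

def best_split(values, targets):
    """Return the indices sent to the left branch by the best threshold split."""
    best_errors, best_left = None, None
    for threshold in sorted(set(values)):
        left = frozenset(i for i, v in enumerate(values) if v <= threshold)
        # Predict 0 on the left and 1 on the right; count misclassifications.
        errors = sum(targets[i] for i in left) + sum(
            1 - targets[i] for i in range(len(values)) if i not in left
        )
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, left
    return best_left

# Standardize the feature (zero mean, unit standard deviation).
mu, sigma = mean(incomes), pstdev(incomes)
scaled = [(v - mu) / sigma for v in incomes]

# The threshold value changes, but the resulting partition does not.
same_partition = best_split(incomes, labels) == best_split(scaled, labels)
```

Because standardization preserves the ordering of the values, a threshold split partitions the data identically before and after scaling. Models that work with weighted sums or distances do not have this property.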

Q3: True or false? Explain. One-hot encoding a categorical feature with
originally 3 separate categories results in 3 new columns.

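True, for plain one-hot encoding.
Explanation: One-hot encoding creates one binary column per category, so a
feature with 3 categories yields 3 new columns. Each new column represents a
distinct category, with a 1 indicating the presence of the category and 0
indicating absence. For linear models, one column is often dropped (dummy
encoding), so n categories become n-1 columns; this avoids perfect
multicollinearity, because the dropped column can be inferred from the others.
Under that convention the answer would be 2 columns.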
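
A minimal sketch of the two conventions, full one-hot encoding (one column per category) and the reduced n-1 variant often used for linear models. The function `one_hot`, its `drop_first` flag, and the `colours` data are hypothetical names chosen for illustration.

```python
def one_hot(values, drop_first=False):
    """Encode a list of category labels as rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    if drop_first:
        # Drop one category; it is implied when all remaining columns are 0.
        categories = categories[1:]
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

colours = ["red", "green", "blue", "green"]  # 3 distinct categories

full = one_hot(colours)                      # 3 columns per row
reduced = one_hot(colours, drop_first=True)  # 2 columns per row
```

In `reduced`, the dropped category ("blue", the first in sorted order) is represented by a row in which every remaining indicator is 0.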



Q4: When one-hot encoding, what happens to the original categorical feature?
Why?

When one-hot encoding, the original categorical feature is replaced by the new
binary columns.

Reason:
- The original categorical feature is transformed into a set of binary (0 or 1)
columns, each representing a unique category. This transformation allows
algorithms that require numerical input to process the categorical data
effectively.
- Removing the original categorical feature helps prevent redundancy and
multicollinearity (when one predictor variable in a model can be linearly
predicted from the others with a substantial degree of accuracy), which can
negatively affect model performance and interpretability in linear models.
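
As an illustration of both reasons, the sketch below replaces a categorical column with indicator columns and then checks the linear dependence among them. The `customers` records and the helper `replace_with_one_hot` are hypothetical.

```python
def replace_with_one_hot(rows, column):
    """Return new rows in which `column` is replaced by 0/1 indicator columns."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        # Copy every field except the original categorical column.
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = int(r[column] == c)
        encoded.append(new_row)
    return encoded

customers = [
    {"age": 31, "city": "Brussels"},
    {"age": 45, "city": "Ghent"},
    {"age": 27, "city": "Antwerp"},
]

encoded = replace_with_one_hot(customers, "city")

# With all categories kept, the indicators of each row sum to exactly 1,
# so any one of them can be computed from the others.
city_columns = [k for k in encoded[0] if k.startswith("city_")]
row_sums = [sum(row[k] for k in city_columns) for row in encoded]
```

The constant row sum is the redundancy that causes multicollinearity in linear models with an intercept, which is why one indicator column is often dropped there.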

Q5: Campaign Example:

Consider a company that wants to use data science to improve its targeting of
costly personally targeted advertisements. The company runs a test campaign,
targeting those who are most likely to respond according to their expert. As a
campaign progresses, more and more data arrive on people who make
purchases after having seen the ad versus those who do not. These data can be
used to build models to discriminate between those to whom we should and
should not advertise. Examples can be put aside to evaluate how accurate the
models are in predicting whether consumers will respond to the ad.

When the resulting models are put into production, targeting their full
customer base “in the wild,” the company is surprised that the models do not
work as well as they did in the lab. Why does it not work?

Scenario: A company uses data science to target ads, builds models based on
test campaign data, but finds the models underperform in production. Why
does it not work?
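• Sampling Bias: The test campaign only targeted people the expert judged
likely to respond, so the training data may not be representative of the entire
customer base.
Solution: Use a more representative sample for training, e.g., include
randomly selected customers in the test campaign.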
• Data Drift: Customer behaviour changes over time, making the model
outdated.
Solution: Continuously update models with new data.
• Overfitting: Models fit too closely to the training data and fail to generalize.
Solution: Apply regularization, cross-validation, and simpler models.
• Feature Mismatch: Features available in the lab might differ from those in
production.
Solution: Ensure consistency in feature availability and quality.
• Environmental Differences: Differences in operational environments
between lab and production.
Solution: Test models in environments that mimic production setups.
• Evaluation Metrics: Metrics used in the lab may not align with business
objectives.
Solution: Align model evaluation metrics with business goals and test
accordingly.
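
The sampling-bias explanation can be made concrete with a small synthetic example. Everything here is an assumption for illustration: the customers are generated by a made-up rule, the "expert" is assumed to target only high-income customers, and the model is a single age threshold. It shows how a model can look perfect on held-out campaign data and still do worse on the full customer base.

```python
def make_customer(age, high_income):
    # Assumed ground truth: only young, high-income customers respond.
    responded = high_income and age < 40
    return {"age": age, "high_income": high_income, "responded": responded}

# Full customer base: ages 20..59, both income groups.
population = [make_customer(age, hi) for age in range(20, 60) for hi in (True, False)]

# The expert only targets high-income customers in the test campaign.
campaign = [c for c in population if c["high_income"]]

# Put examples aside: even ages for training, odd ages for lab evaluation.
train = [c for c in campaign if c["age"] % 2 == 0]
holdout = [c for c in campaign if c["age"] % 2 == 1]

def accuracy(rows, threshold):
    """Share of rows where 'age < threshold' matches the actual response."""
    hits = sum((r["age"] < threshold) == r["responded"] for r in rows)
    return hits / len(rows)

def fit_age_threshold(rows):
    """Pick the age cut-off with the best accuracy on the given rows."""
    candidates = sorted({r["age"] for r in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))

threshold = fit_age_threshold(train)
lab_accuracy = accuracy(holdout, threshold)      # evaluated on campaign data
wild_accuracy = accuracy(population, threshold)  # evaluated on everyone
```

Here `lab_accuracy` is 1.0 while `wild_accuracy` is 0.75: the model never saw low-income customers, so it wrongly predicts that the young ones among them will respond.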
Seller: jefdecuyper (Vrije Universiteit Brussel)