Class notes

DATA QUALITY AND TRANSFORMATION

Pages
61
Uploaded on
08-11-2025
Written in
2025/2026

Short notes on ensuring data accuracy and consistency, and transforming data into suitable formats for analysis and modeling.


Document information

Uploaded on
November 8, 2025
Number of pages
61
Written in
2025/2026
Type
Class notes
Professor(s)
Abirami
Contains
All classes


SCSB1231 DATA AND INFORMATION SCIENCE




UNIT 3 DATA QUALITY AND TRANSFORMATION



Data Imputation – Data Transformation (min-max, log transform, z-score transform, etc.) –
Binning, Classing and Standardization – Outlier/Noise & Anomalies.

Data Imputation:

Data imputation is the process of replacing missing or incomplete data points in a
dataset with estimated or substituted values. These estimated values are typically derived
from the available data, statistical methods, or machine learning algorithms.

Data imputation fills missing values in datasets, preserving data completeness and quality. It
keeps analyses, models, and visualizations usable by preventing data loss and maintaining
sample size. When the method suits the missingness pattern, imputation can reduce bias,
preserve relationships between variables, and allow statistical techniques that require
complete data, enabling better decision-making and insights from incomplete data.




Importance of Data Imputation in Analysis


Data imputation is crucial in data analysis as it addresses missing or incomplete data,
ensuring the integrity of analyses. Imputed data enables the use of various statistical methods
and machine learning algorithms, improving model accuracy and predictive power.
Without imputation, valuable information may be lost, leading to biased or less reliable
results. It helps maintain sample size, reduces bias, and enhances the overall quality and
reliability of data-driven insights.

Types of Missing Data
The three types of missing data are as follows:

1. Missing Completely at Random (MCAR)
In this type, the probability of data being missing is unrelated to both observed and
unobserved data. In other words, missingness is purely random and occurs by chance. MCAR
implies that the missing data is not systematically related to any variables in the dataset. For
example, a sensor failure that results in sporadic missing temperature readings can be
considered MCAR.

2. Missing at Random (MAR)

Missing data is considered MAR when the probability of data being missing is related to
observed data but not directly to unobserved data. In other words, missingness is dependent
on some observed variables. For instance, in a medical study, men might be less likely to
report certain health conditions than women, creating missing data related to the gender
variable. MAR is a more general and common type of missing data than MCAR.

3. Missing Not at Random (MNAR)
MNAR occurs when the probability of data being missing is related to unobserved data or the
missing values themselves. This type of missing data can introduce bias into analyses because
the missingness is related to the missing values. An example of MNAR could be patients with
severe symptoms avoiding follow-up appointments, resulting in missing data related to the
severity of their condition.
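
The difference between the three mechanisms can be sketched with a small simulation. This is only an illustration under assumed names: the functions mask_mcar, mask_mar and mask_mnar, the sample values, and the probabilities are all hypothetical and do not come from the notes. Missing entries are represented by None.

```python
import random

def mask_mcar(values, p, rng):
    """MCAR: every value goes missing with the same probability p."""
    return [None if rng.random() < p else v for v in values]

def mask_mar(values, groups, p_by_group, rng):
    """MAR: missingness depends on an observed variable (the group),
    not on the value that goes missing."""
    return [None if rng.random() < p_by_group[g] else v
            for v, g in zip(values, groups)]

def mask_mnar(values, threshold, p_high, rng):
    """MNAR: missingness depends on the value itself; here values above
    the threshold go missing with probability p_high."""
    return [None if (v > threshold and rng.random() < p_high) else v
            for v in values]

rng = random.Random(0)                    # fixed seed so runs are repeatable
severity = [2, 9, 4, 8, 3, 7]             # hypothetical symptom scores
gender = ["M", "F", "M", "F", "M", "F"]   # hypothetical observed variable

print(mask_mcar(severity, 0.3, rng))
print(mask_mar(severity, gender, {"M": 0.5, "F": 0.1}, rng))
print(mask_mnar(severity, 6, 0.8, rng))
```

In the MNAR case the gaps fall mostly on high scores, so any statistic computed from the remaining values will tend to understate severity.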



Data Imputation Techniques:


There are several methods and techniques for data imputation, each with its own strengths
and each suited to particular kinds of data and analysis goals. Some commonly used data
imputation techniques are discussed below:

1. Mean/Median/Mode Imputation

• Mean Imputation: Replace missing values in numerical variables with the average of
the observed values for that variable.
• Median Imputation: Replace missing values in numerical variables with the middle
value of the observed values for that variable.
• Mode Imputation: Replace missing values in categorical variables with the most
frequent category among the observed values for that variable.
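
The three rules above can be sketched with Python's standard statistics module. The function name impute, the strategy labels, and the sample data are assumptions made for illustration; missing entries are represented by None.

```python
from statistics import mean, median, mode

def impute(values, strategy):
    """Replace None entries with the mean, median, or mode of the observed values.

    Raises statistics.StatisticsError if there are no observed values.
    """
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]          # hypothetical numerical variable
colors = ["red", "blue", None, "red"]    # hypothetical categorical variable

print(impute(ages, "mean"))
print(impute(ages, "median"))
print(impute(colors, "mode"))
```

Every missing entry in a column receives the same fill value, which is why this approach is simple but shrinks the variable's spread.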




Advantages

• Simplicity
• Preserves Data Structure
• Applicability

Disadvantages and Considerations

• Ignores Data Relationships
• May Distort Data
• Inappropriate for Non-Random Missing Data Patterns (MAR, MNAR)



When to Use:

• Use mean imputation for numerical variables when data is missing completely at
random (MCAR) and the variable has a relatively normal distribution.
• Use median imputation when the data is skewed or contains outliers, as it is less
sensitive to extreme values.
• Use mode imputation for categorical variables when you have missing values that can
be reasonably replaced with the most frequent category.
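
A small worked example shows why the median is preferred for skewed data. The income figures are hypothetical and chosen so that one value is an obvious outlier.

```python
from statistics import mean, median

# Hypothetical observed incomes (in thousands); 500 is an outlier.
incomes = [30, 32, 35, 38, 500]

mean_fill = mean(incomes)      # pulled upward by the outlier
median_fill = median(incomes)  # stays near the typical values

print(mean_fill)    # 127
print(median_fill)  # 35
```

A missing income filled with 127 would be larger than four of the five observed values, while 35 sits in the middle of the typical range.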

2. Forward Fill and Backward Fill

• Forward Fill: In forward fill imputation, missing values are replaced with the most
recent observed value in the sequence. It propagates the last known value forward
until a new observation is encountered.
• Backward Fill: In backward fill imputation, missing values are replaced with the
next observed value in the sequence. It propagates the next known value backward
until an earlier observation is encountered.




For forward fill, replace each missing value with the most recent observed value that
precedes it in time. For backward fill, replace each missing value with the next
observed value that follows it in time.
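
Both fills can be sketched in a few lines of plain Python. The function names forward_fill and backward_fill and the sample readings are assumptions for illustration; values are assumed to be ordered in time, with None marking a missing entry.

```python
def forward_fill(values):
    """Replace each None with the most recent observed value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v          # remember the latest observation
        filled.append(last)   # stays None until a first observation is seen
    return filled

def backward_fill(values):
    """Replace each None with the next observed value after it."""
    # A backward fill is a forward fill applied to the reversed sequence.
    return forward_fill(values[::-1])[::-1]

readings = [None, 21.5, None, None, 23.0, None]  # hypothetical sensor data

print(forward_fill(readings))   # [None, 21.5, 21.5, 21.5, 23.0, 23.0]
print(backward_fill(readings))  # [21.5, 21.5, 23.0, 23.0, 23.0, None]
```

Note that forward fill cannot fill a gap at the start of the sequence, and backward fill cannot fill a gap at the end, because no observed value exists on the required side.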
lsharan Sathyabama institute of science and technology