IS424 Data Mining and Business Analytics (G2)
Project Title:
Exposing Fake News using Classification Models
Table of Contents
Problem Statement
Motivation
Dataset
Exploratory Data Analysis
Tools and Resources
Python Packages
Word Clouds
Methodology
Data Cleaning and Transformation
Classification Models
Sentiment Analysis
Topic Modelling
Results and Discussion
Model Comparison
Bias of the Dataset
Conclusion and Future Work
References
Appendix
Problem Statement
Fake news has become an increasingly prominent problem as fabricated stories spread
online. Misleading or unreliable information in the form of videos, posts, articles, and
URLs is extensively disseminated through popular social media platforms such as
Facebook, Twitter, and WhatsApp. Most people believe that the information they receive
from various social media sites is reliable and true; that is, people are inherently truth-biased.
During India’s national election in 2019, for example, more than 900,000 WhatsApp groups
were created to disseminate fake information regarding India’s ruling party. During the US
presidential election, most of the widely spread articles on social media were fake (Kaur et
al., 2019).
Motivation
The demand for accurate fake news detection has produced novel techniques that aim to
detect fake news by exploring its spread on social networks. However, we seek to detect fake
news at its inception, when it has just been published by news agencies and has not yet spread
on social media. Hence there is a need for a model to predict fake news by focusing on news
content rather than on news propagation patterns (Zhou et al., 2020).
The ability to distinguish between reliable and fabricated information received through a
variety of media channels has become extremely rare today. For example, 82% of middle
schoolers could not distinguish between an ad labelled as “sponsored content” and a real
news story on a website (Donald, 2016).
Online fake news is widespread and has the potential for destructive consequences. Media
literacy therefore requires urgent attention today.
By comparing the performance of various classification models in detecting fake news and
conducting further analysis of news content, we intend to gain a better understanding of fake
news and identify which models perform best.