Summary

Summary Big data Description assignment

Uploaded on

14-08-2023

Written in

2023/2024

Summary of 20 pages for the course CTEC2921_2223_502 Big Data and Machine Learning at DMU (N/a)

Written for

Institution: De Montfort University (DMU)
Study: Unknown
Module: CTEC2921_2223_502 Big Data and Machine Learning (CTEC2921)

Document information

Uploaded on: August 14, 2023
Number of pages: 20
Written in: 2023/2024
Type: Summary

Subjects

big data

Content preview

Table of Contents
INTRODUCTION
Details of the approach:
Preparing Data
Data Visualization:
Selection:
Random Forest:
Results Analysis:
Discussion and conclusions:

INTRODUCTION
The sinking of the Titanic, which cost many lives, is a widely recognized historical event.
This project applies classification techniques to the Titanic dataset to predict whether
passengers survived. To achieve this goal, we divide the work into smaller tasks: data
pre-processing, cleaning, normalization, visualization, and feature extraction/selection.
Finally, we use classification models to predict survival.

Details of the approach:
Our project uses the Python programming language and established libraries such as
pandas, NumPy, Matplotlib, and scikit-learn. First, we import the Titanic dataset and
pre-process and clean it to remove duplicates and handle missing data. The data is then
normalized so that each feature has a comparable scale, which can improve the performance
of scale-sensitive classification models. Next, we analyze and visualize the data to gain
insight and identify patterns that might help predict survival. After that, we apply feature
extraction and selection to identify the characteristics most strongly correlated with the
target variable.
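
The steps described above can be sketched on a small table. This is a minimal illustration, not the report's actual code: the toy rows, the choice of median imputation, and the choice of min-max scaling are all assumptions, since the text only says that duplicates and missing data are handled and that the data is normalized.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic table.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 53.10],
    "Survived": [0, 1, 1, 1, 1],
})

# 1. Drop exact duplicate rows (the last row repeats the fourth).
cleaned = toy.drop_duplicates().copy()

# 2. Fill the missing Age with the median (assumed strategy; dropping
#    the row with dropna() would be the alternative).
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# 3. Min-max scale the feature columns to the [0, 1] range (assumed method).
features = ["Age", "Fare"]
col_min = cleaned[features].min()
col_max = cleaned[features].max()
cleaned[features] = (cleaned[features] - col_min) / (col_max - col_min)

# 4. Rank features by absolute correlation with the target.
relevance = (
    cleaned[features]
    .corrwith(cleaned["Survived"])
    .abs()
    .sort_values(ascending=False)
)

print(cleaned)
print(relevance)
```

scikit-learn's MinMaxScaler performs the same rescaling as step 3 and is the more usual choice when the scaler has to be fitted on a training split and reused on a test split.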

Preparing Data
import pandas as pd

df = pd.read_csv("/kaggle/input/titanicdata/TitanicData.csv")

The code above reads the CSV file at "/kaggle/input/titanicdata/TitanicData.csv" using
the pandas library and stores its contents in a pandas DataFrame named df. The CSV
file likely contains the training set of the Titanic dataset, with information on the
passengers aboard the Titanic.
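
A quick way to confirm that a file loaded as expected is to inspect its shape, column types, and first rows. The sketch below reads a few made-up rows from an in-memory string, because the Kaggle path only exists inside that environment; the column names are assumptions, as the source does not list the file's columns.

```python
import io
import pandas as pd

# Hypothetical rows; the real file's columns are not shown in the source.
csv_text = """PassengerId,Survived,Pclass,Age,Fare
1,0,3,22,7.25
2,1,1,38,71.28
3,1,3,,7.92
"""

# read_csv accepts a file-like object as well as a path.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # the empty Age cell makes that column float
print(sample.head())  # first rows of the table
```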

df.describe()

This snippet calls the describe() method on the DataFrame df. The method generates
summary statistics for the numerical columns: count, mean, standard deviation, minimum,
quartiles (25%, 50%, 75%), and maximum. The output is a table with one column for each
numerical column of df and one row for each statistic.

Together, these two steps read and summarize the Titanic dataset, providing statistical
information on the numerical characteristics of the dataset's training set.
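
To make the layout of the describe() output concrete, here is a small sketch on made-up data. The column names and values are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical data: two numerical columns and one text column.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Name": ["A", "B", "C", "D"],
})

summary = toy.describe()

print(summary)
print(list(summary.index))    # one row per statistic
print(list(summary.columns))  # numerical columns only; "Name" is skipped
```

By default the text column is left out. Passing include="all" to describe() adds non-numerical columns, with statistics such as the number of unique values.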

df.isnull()

This code returns a boolean DataFrame with the same shape as df. A True value indicates
that the corresponding cell is missing (NaN or None), and a False value indicates that
the cell contains a value.
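
The boolean mask is rarely read cell by cell; it is more often used to select the affected rows. A minimal sketch, with made-up column names and values:

```python
import pandas as pd

# Hypothetical data with gaps in both columns.
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0],
    "Cabin": ["C85", None, None],
})

# Same shape as toy; True marks a missing cell.
mask = toy.isnull()

# Keep only the rows that have at least one missing cell.
rows_with_gaps = toy[mask.any(axis=1)]

print(mask)
print(rows_with_gaps)
```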

df.isnull().sum()

This code counts the null values in each column of the DataFrame. The result shows
which columns have missing values and how many each contains.
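
Alongside the raw counts, the share of missing values per column is often easier to compare, because it does not depend on the number of rows. The sketch below uses made-up data; taking the mean of the boolean mask gives the fraction of True values in each column.

```python
import pandas as pd

# Hypothetical data: one gap in Age, three in Cabin, none in Fare.
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, 35.0],
    "Cabin": ["C85", None, None, None],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

missing_count = toy.isnull().sum()   # number of missing cells per column
missing_share = toy.isnull().mean()  # fraction of missing cells per column

# List the columns with the most gaps first.
print(missing_count.sort_values(ascending=False))
print(missing_share.sort_values(ascending=False))
```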

Seller: hasibabid29, De Montfort University