Summary Big data Description assignment

Uploaded on: 14-08-2023

Written in: 2023/2024

Summary of 20 pages for the course CTEC2921_2223_502 Big Data and Machine Learning at DMU (N/a)

Table of Contents
INTRODUCTION
Details of the approach:
Preparing Data
Data Visualization:
Selection:
Random Forest:
Results Analysis:
Discussion and conclusions:

INTRODUCTION
The Titanic disaster, in which many lives were lost, is a widely recognized historical
event. This project aims to apply classification techniques to the Titanic dataset to
predict whether each passenger survived. To achieve this goal, we will divide the work into
smaller tasks: data pre-processing and cleaning, normalization, visualization, and feature
extraction/selection. Finally, we will use classification models to predict survival.

Details of the approach:
Our project will use the Python programming language and established libraries such as
pandas, NumPy, Matplotlib, and scikit-learn. First, we will import the Titanic dataset and
pre-process and clean it, removing duplicate rows and dealing with missing data. The data
will then be normalized so that every feature has a comparable scale, which can improve the
performance of scale-sensitive classification models. Next, we will analyze and visualize the
data to gain insight into it and to identify patterns that might help predict survival.
Following that, we will use feature extraction and selection to identify the features most
strongly correlated with the target variable.
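The cleaning, normalization and correlation-based selection steps described above can be sketched with pandas alone. This is a minimal illustration under stated assumptions, not the project's actual code: the column names (`Pclass`, `Age`, `Fare`, `Survived`) and the toy values are hypothetical stand-ins for the real dataset, filling missing ages with the median is just one possible strategy, and min-max scaling is used as one example of normalization.

```python
import pandas as pd

# Hypothetical toy stand-in for the Titanic data (columns and values are assumptions).
# Rows 0 and 5 are exact duplicates on purpose.
df = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 3, 3],
    "Age":      [22.0, 38.0, 26.0, 35.0, None, 22.0],
    "Fare":     [7.25, 71.28, 7.92, 53.10, 8.05, 7.25],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# 1. Cleaning: drop exact duplicate rows, then fill missing ages with the median age
df = df.drop_duplicates()
df["Age"] = df["Age"].fillna(df["Age"].median())

# 2. Min-max normalization: rescale each feature column to the range [0, 1]
features = ["Pclass", "Age", "Fare"]
scaled = (df[features] - df[features].min()) / (df[features].max() - df[features].min())

# 3. Rank features by the absolute Pearson correlation with the target column
ranking = scaled.corrwith(df["Survived"]).abs().sort_values(ascending=False)
print(ranking)
```

Ranking by correlation is only one simple way to carry out the selection step; it captures linear association with the target and is not necessarily the selection method used later in the report.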

Preparing Data
import pandas as pd
df = pd.read_csv("/kaggle/input/titanicdata/TitanicData.csv")

This code uses the pandas library to read the CSV file at
"/kaggle/input/titanicdata/TitanicData.csv" and stores its contents in a pandas DataFrame
named df. The CSV file most likely contains the training set of the Titanic dataset, which
holds information on the passengers aboard the Titanic.
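The Kaggle path above only exists inside a Kaggle notebook. To try the same call elsewhere, a small CSV can be read from an in-memory buffer instead. In the sketch below, the three rows and the column names (`PassengerId`, `Survived`, `Pclass`, `Name`, `Age`) are hypothetical; the source does not show the actual schema of `TitanicData.csv`.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for TitanicData.csv;
# the column names and values are assumptions, not taken from the source file.
csv_text = """PassengerId,Survived,Pclass,Name,Age
1,0,3,"Smith, Mr. John",22
2,1,1,"Jones, Mrs. Mary",38
3,1,3,"Brown, Miss. Anna",26
"""

# read_csv accepts any file-like object, so an in-memory buffer
# can stand in for the Kaggle file path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (3, 5)
print(df.head())
```

Quoted fields such as `"Smith, Mr. John"` keep their embedded comma, so each name stays in a single column.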

df.describe()

This calls the describe() method on df. The method computes summary statistics for the
numerical columns of the DataFrame: count, mean, standard deviation, minimum, the 25th,
50th (median) and 75th percentiles, and maximum. The result is a table with one column
per numerical feature and one row per statistic.

This code reads and summarizes the Titanic dataset, providing statistical information on the
numerical characteristics of the dataset's training set.
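A small self-contained illustration of describe(): on a DataFrame that mixes numeric and text columns, pandas summarizes only the numeric columns by default, and the count row excludes missing values. The `Name`, `Age` and `Fare` columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; not taken from the Titanic dataset
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],          # text column: skipped by default
    "Age":  [22.0, 38.0, None, 35.0],      # one missing value
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

summary = df.describe()
print(summary)

# 'count' ignores missing values, so Age has 3 observations, Fare has 4
print(summary.loc["count", "Age"])  # 3.0
```

A count lower than the number of rows is a quick first hint that a column has missing data.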

df.isnull()

This code returns a boolean DataFrame with the same shape as df, in which True marks a
missing cell (NaN, None or NaT) and False marks a cell that contains a value.
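As a quick illustration, the sketch below builds a tiny hypothetical DataFrame (the columns `Age`, `Cabin` and `Embarked` and their values are assumptions) and shows that isnull() flags both `NaN` in a numeric column and `None` in a text column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in a numeric and a text column
df = pd.DataFrame({
    "Age":      [22.0, np.nan, 26.0],
    "Cabin":    [None, "C85", None],
    "Embarked": ["S", "C", "S"],
})

mask = df.isnull()
print(mask)

# The mask has the same shape as df; True wherever a value is missing
print(mask.shape == df.shape)  # True
```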

df.isnull().sum()

This code counts the missing values in each column of df. Because True is treated as 1 and
False as 0 when summed, adding up the boolean mask column by column gives the number of
null cells per column, which shows which columns contain missing data and how much.
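A minimal sketch of the same count on a hypothetical DataFrame (column names and values assumed for illustration), together with the share of missing values per column, obtained by taking the mean of the boolean mask instead of its sum:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from the Titanic dataset
df = pd.DataFrame({
    "Age":      [22.0, np.nan, 26.0, np.nan],
    "Cabin":    [None, "C85", None, None],
    "Embarked": ["S", "C", "S", "Q"],
})

# True counts as 1, so the column-wise sum is the number of missing cells
missing_counts = df.isnull().sum()

# The column-wise mean is the fraction of missing cells; scale to a percentage
missing_pct = df.isnull().mean() * 100

print(missing_counts.sort_values(ascending=False))
print(missing_pct.round(1))
```

Columns with a large share of gaps, like the hypothetical `Cabin` here, are often candidates for dropping, while columns with only a few gaps can usually be imputed.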

Written for

Institution: De Montfort University (DMU)
Programme: Unknown
Course: CTEC2921_2223_502 Big Data and Machine Learning (CTEC2921)

Seller: hasibabid29 (De Montfort University)