Summary

Summary Python Data Operations 1: Data frames

Pages

117

Uploaded on

09-12-2022

Written in

2022/2023

Notes on pandas data operations covered in the Principles of Programming course, part of the Computer Science and AI bachelor's degree. The notes were originally written in Jupyter Notebook. They contain practical examples of data operations in Python, plus images that explain the structures and processes. This first notebook contains:
- Introduction to pandas and DataFrames
- The structure of a DataFrame
- Selecting values
- Select cells
- Select rows
- Select columns
- Slicing
- Inserting/updating elements
- Insert/update values
- Insert/update rows
- Insert/update columns
- Renaming rows/columns


Linked book

John M. Zelle, Python Programming

Published: 2004
ISBN: 9781887902991
Edition: Unknown

Written for

Institution: IE University
Programme: Computer Science
Course: Programming


Document information

Whole book summarized?: No
Which part of the book is summarized?: Data wrangling
Uploaded on: 9 December 2022
Number of pages: 117
Written in: 2022/2023
Type: Summary

Topics

dataframe
data frame
data analytics
data science
business analytics
python
pandas
principles of programming
programming
reference
cheat sheet
overview
test
select command
study guide
summary
notes
exam

Preview of the content

2022-05-15 22:28 S1 _solved

In [1]:
import pandas as pd
import numpy as np

Pandas - What is a DataFrame?
DataFrame is a data type provided by the pandas library.
In Python, it is the most relevant data type for working with tables and data.
Lists are also important for working with data, as you already know, but here we
focus on DataFrames (df).
Think of a DataFrame as a table made up of rows and columns:
- Each row and each column is an object of type pandas.Series.
- A pandas.Series is a vector (similar to a list) in which each element carries a label.
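To make the row/column claim concrete, here is a minimal sketch. The two-row table and the names `col_b` and `row_1` are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration.
df = pd.DataFrame(
    {'A': ['A3', 'B1'], 'B': [0, 1]},
    index=['row1', 'row2'],
)

col_b = df['B']          # a single column
row_1 = df.loc['row1']   # a single row

# Both are pandas.Series objects.
print(type(col_b).__name__)
print(type(row_1).__name__)

# Each element of a Series carries a label:
# a column is labelled by the row index, a row by the column names.
print(list(col_b.index))
print(list(row_1.index))
```

A column Series reuses the row labels of the DataFrame, while a row Series reuses its column names.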
Create a DataFrame
The main ways to do it:
- Entering the data manually
  - Lists of lists
  - Nested dictionaries
- Reading the data from a .csv file
  - Using the function pd.read_csv(); the path of the file is mandatory.
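The nested-dictionary and pd.read_csv() routes listed above can be sketched as follows. The data values and the in-memory CSV text are assumptions made for illustration. With a real file you would pass its path to pd.read_csv() instead of the `io.StringIO` buffer.

```python
import io

import pandas as pd

# Nested dictionary: outer keys become column names,
# inner keys become row labels.
data_dict = {
    'A': {'row1': 'A3', 'row2': 'B1'},
    'B': {'row1': 0, 'row2': 1},
}
df_from_dict = pd.DataFrame(data_dict)
print(df_from_dict)

# pd.read_csv() accepts a file path or a file-like object.
# An in-memory buffer stands in for a .csv file here (illustrative data).
csv_text = "A,B\nA3,0\nB1,1\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)
```

The first line of the CSV text is read as the column names, and the rows get a default positional index because the file supplies no labels.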
In [1]:
data_lst = [
    ['A3', 0, -1, 0, 'si'],
    ['B1', 1, None, 0, 'no'],
    ['B3', 4, None, 0, 'no'],
    ['B3', 5, 1, 0, 'si'],
    ['A1', 4, 0, None, None],
    ['A3', 1, 2, 1, 'si'],
    ['C2', 4, 1, 1, 'no']
]

data_lst

Out[1]:
[['A3', 0, -1, 0, 'si'],
 ['B1', 1, None, 0, 'no'],
 ['B3', 4, None, 0, 'no'],
 ['B3', 5, 1, 0, 'si'],
 ['A1', 4, 0, None, None],
 ['A3', 1, 2, 1, 'si'],
 ['C2', 4, 1, 1, 'no']]

In [12]:
# collect the first element of every row, i.e. the first column
col0 = []
for row in data_lst:
    col0.append(row[0])

col0

Out[12]:
['A3', 'B1', 'B3', 'B3', 'A1', 'A3', 'C2']

Test
In [7]:
test_df = pd.DataFrame(
    data_lst
)
test_df

Out[7]:
    0  1    2    3     4
0  A3  0 -1.0  0.0    si
1  B1  1  NaN  0.0    no
2  B3  4  NaN  0.0    no
3  B3  5  1.0  0.0    si
4  A1  4  0.0  NaN  None
5  A3  1  2.0  1.0    si
6  C2  4  1.0  1.0    no

In [10]:
test_df = pd.DataFrame(
    data_lst,
    columns=['A', 'B', 'C', 'D', 'E'],
    index=[f'row{i}' for i in range(1, 8)]
)
test_df

Out[10]:
       A  B    C    D     E
row1  A3  0 -1.0  0.0    si
row2  B1  1  NaN  0.0    no
row3  B3  4  NaN  0.0    no
row4  B3  5  1.0  0.0    si
row5  A1  4  0.0  NaN  None
row6  A3  1  2.0  1.0    si
row7  C2  4  1.0  1.0    no

DataFrame structure
In [9]:
# both .index and .columns are iterable objects
print('ROWS:')
for index in test_df.index:
    print(index)

print()
print('COLUMNS:')
for col in test_df.columns:
    print(col)

ROWS:
row1
row2
row3
row4
row5
row6
row7

COLUMNS:
A
B
C
D
E
A DataFrame can be understood as a matrix of values with one index for the rows and one index for
the columns.
Any two-dimensional subset is itself a DataFrame, and any one-dimensional subset (a single row or a
single column) is a Series.
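As a small sketch of this rule, the example below takes several subsets of a made-up three-row table and checks their types. The data and variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical table, used only for illustration.
df = pd.DataFrame(
    [['A3', 0, 'si'], ['B1', 1, 'no'], ['B3', 4, 'no']],
    columns=['A', 'B', 'E'],
    index=['row1', 'row2', 'row3'],
)

# Two rows by two columns: still two-dimensional, so a DataFrame.
block = df.loc[['row1', 'row2'], ['A', 'B']]

# A list with one column label keeps both dimensions: a DataFrame.
one_col_frame = df[['B']]

# A single label drops one dimension: a Series.
one_col = df['B']
one_row = df.loc['row3']

for name, obj in [('block', block), ('one_col_frame', one_col_frame),
                  ('one_col', one_col), ('one_row', one_row)]:
    print(name, type(obj).__name__, obj.shape)
```

Note the difference between `df[['B']]` and `df['B']`: passing a list of labels keeps the result two-dimensional even when the list has a single element.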

Although the DataFrame has explicit indices (labels) for rows and columns, both DataFrame
and Series still have a positional ("hidden") index.
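The positional index can be seen in a short sketch like the one below. The values are made up for illustration. `Index.get_loc` is used here only to show which position sits behind a label.

```python
import pandas as pd

# Hypothetical labelled Series, used only for illustration.
s = pd.Series([10, 20, 30], index=['row1', 'row2', 'row3'])

first_by_label = s.loc['row1']    # explicit index (label)
first_by_position = s.iloc[0]     # positional ("hidden") index
print(first_by_label, first_by_position)

# The same holds for a DataFrame with labelled rows.
df = pd.DataFrame({'A': ['A3', 'B1', 'B3']}, index=['row1', 'row2', 'row3'])

# Position hidden behind the label 'row2' (labels are unique here).
pos = df.index.get_loc('row2')
same_row = df.iloc[pos]
print(pos, same_row['A'])
```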


In [29]:
# first 3 rows
test_df.head(3)

Out[29]:
    A  B    C    D   E
0  A3  0 -1.0  0.0  si
1  B1  1  NaN  0.0  no
2  B3  4  NaN  0.0  no

In [30]:
# last 2 rows
test_df.tail(2)

Out[30]:
    A  B    C    D   E
5  A3  1  2.0  1.0  si
6  C2  4  1.0  1.0  no

Select values
.iloc vs .loc
One of the advantages of a DataFrame is that it lets us access its elements (rows,
columns, cells...) in two ways:
1. by position (numerical index), e.g. the first row, the eighth column, etc.
2. by label, e.g. the column named "name", the row with index "FHX129M", etc.
To make clear which type of index we want to use, there are two indexers: .iloc
and .loc .
1. .iloc is used to access elements via (numeric) positions.
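A minimal sketch of the two access styles, assuming a made-up three-row table with row labels (the data and variable names are illustrative, not taken from the notebook):

```python
import pandas as pd

# Hypothetical table, used only for illustration.
df = pd.DataFrame(
    [['A3', 0, 'si'], ['B1', 1, 'no'], ['B3', 4, 'no']],
    columns=['A', 'B', 'E'],
    index=['row1', 'row2', 'row3'],
)

# Same cell, reached by position and by label.
cell_by_position = df.iloc[1, 0]       # second row, first column
cell_by_label = df.loc['row2', 'A']    # row labelled 'row2', column 'A'
print(cell_by_position, cell_by_label)

# Slices behave differently: .iloc excludes the end position,
# while .loc includes the end label.
rows_by_position = df.iloc[0:2]
rows_by_label = df.loc['row1':'row2']
print(rows_by_position.index.tolist())
print(rows_by_label.index.tolist())
```

Both slices return the first two rows, but for different reasons: position 2 is excluded by .iloc, whereas the label 'row2' is included by .loc.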
Seller

beatricemossberg
