Summary

Summary Python Data Operations 1: Data frames

Pages: 117

Uploaded on: 09-12-2022

Written in: 2022/2023
Notes on pandas data operations covered in the Principles of Programming course, part of the Computer Science and AI bachelor's degree. The notes were originally written in Jupyter Notebook. They contain practical examples of data operations in Python and images that explain the structures and processes. This first notebook contains:
- Introduction to Pandas and dataframes
- The structure of a dataframe
- Selecting values
- Select cells
- Select rows
- Select columns
- Slicing
- Inserting/Updating elements
- Insert/update values
- Insert/update rows
- Insert/update columns
- Renaming rows/columns



School, study and subject

Institution: IE University
Study: Computer Science
Course: Programming

Document information

A book?: No
Which chapters are summarized?: Data wrangling
Uploaded on: December 9, 2022
Number of pages: 117
Written in: 2022/2023
Type: Summary

Topics

dataframe
data frame
data analytics
data science
business analytics
python
pandas
principles of programming
programming
reference
cheat sheet
overview
test
select command
study guide
summary
notes
exam

Content preview

2022-05-15 22:28 S1 _solved

In [1]:
import pandas as pd
import numpy as np

Pandas
What is a DataFrame?
DataFrame is a data type provided by the pandas library.
In Python, it is the most relevant data type for working with tables and data.
Lists are also important for working with data, as you already know, but here we
focus on DataFrames (usually abbreviated df).
Think of a DataFrame as a table made up of rows and columns:
Each row and each column is an object of type pandas.Series.
A pandas.Series is a one-dimensional vector (similar to a list) in which each element has a label.
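The claim that rows and columns are both Series can be checked directly. The following is a minimal sketch; the small table and its names (name, age, r1, r2) are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration
df = pd.DataFrame({"name": ["Ana", "Luis"], "age": [31, 27]}, index=["r1", "r2"])

column = df["age"]    # one column
row = df.loc["r1"]    # one row

print(type(column).__name__)  # Series
print(type(row).__name__)     # Series

# The labels of a column are the row labels; the labels of a row are the column names
print(list(column.index))     # ['r1', 'r2']
print(list(row.index))        # ['name', 'age']
```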
Create a DataFrame
The main ways to do it:
- Entering the data manually
  - Lists of lists
  - Nested dictionaries
- Reading the data from a .csv file
  - Using the function pd.read_csv(); the path of the file is mandatory.
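The notebook goes on to build a DataFrame from a list of lists. As a rough sketch of the other two options, the snippet below builds one from a nested dictionary and one with pd.read_csv(). The sample values are invented, and an in-memory io.StringIO object stands in for a real file path so that the example runs without a file on disk.

```python
import io
import pandas as pd

# Nested dictionary: outer keys become columns, inner keys become row labels
nested = {
    "A": {"row1": "A3", "row2": "B1"},
    "B": {"row1": 0, "row2": 1},
}
df_from_dict = pd.DataFrame(nested)
print(df_from_dict)

# pd.read_csv() takes a path or a file-like object as its first argument.
# StringIO is used here only as a stand-in for a .csv file.
csv_text = "A,B\nA3,0\nB1,1\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)
```

With a real file, the call would take the path as a string instead of the StringIO object.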
In [1]:
data_lst = [
    ['A3', 0, -1, 0, 'si'],
    ['B1', 1, None, 0, 'no'],
    ['B3', 4, None, 0, 'no'],
    ['B3', 5, 1, 0, 'si'],
    ['A1', 4, 0, None, None],
    ['A3', 1, 2, 1, 'si'],
    ['C2', 4, 1, 1, 'no']
]

data_lst

Out[1]:
[['A3', 0, -1, 0, 'si'],
['B1', 1, None, 0, 'no'],
['B3', 4, None, 0, 'no'],
['B3', 5, 1, 0, 'si'],
['A1', 4, 0, None, None],
['A3', 1, 2, 1, 'si'],
['C2', 4, 1, 1, 'no']]

In [12]:
col0 = []
for row in data_lst:
    col0.append(row[0])

col0

Out[12]:
['A3', 'B1', 'B3', 'B3', 'A1', 'A3', 'C2']
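For comparison, here is a short sketch of the same column extraction written three ways: the explicit loop, a list comprehension, and a pandas column lookup. The three-row list is a shortened, illustrative stand-in for data_lst.

```python
import pandas as pd

# Shortened stand-in for data_lst (illustrative values)
data = [['A3', 0], ['B1', 1], ['B3', 4]]

# Explicit loop over the rows
col0_loop = []
for row in data:
    col0_loop.append(row[0])

# Equivalent list comprehension
col0_comprehension = [row[0] for row in data]

# With a DataFrame, the first column is available under its default label 0
col0_pandas = pd.DataFrame(data)[0].tolist()

print(col0_loop == col0_comprehension == col0_pandas)  # True
```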
file:///Users/bestricemossberg/Downloads/S1 _solved.html 1/14


Test
In [7]:
test_df = pd.DataFrame(
    data_lst
)
test_df

Out[7]:
    0  1    2    3     4
0  A3  0 -1.0  0.0    si
1  B1  1  NaN  0.0    no
2  B3  4  NaN  0.0    no
3  B3  5  1.0  0.0    si
4  A1  4  0.0  NaN  None
5  A3  1  2.0  1.0    si
6  C2  4  1.0  1.0    no

In [10]:
test_df = pd.DataFrame(
    data_lst,
    columns=['A', 'B', 'C', 'D', 'E'],
    index=[f'row{i}' for i in range(1, 8)]
)
test_df

Out[10]:
       A  B    C    D     E
row1  A3  0 -1.0  0.0    si
row2  B1  1  NaN  0.0    no
row3  B3  4  NaN  0.0    no
row4  B3  5  1.0  0.0    si
row5  A1  4  0.0  NaN  None
row6  A3  1  2.0  1.0    si
row7  C2  4  1.0  1.0    no
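In the output above, columns C and D are shown as decimals (-1.0, 0.0) even though the input held integers. A likely explanation is that a None in a numeric column is stored as NaN, which is a float, so the whole column becomes float. The sketch below checks this on a three-row excerpt of the same data.

```python
import pandas as pd

# Three rows taken from data_lst, enough to include the missing values
rows = [
    ['A3', 0, -1, 0, 'si'],
    ['B1', 1, None, 0, 'no'],
    ['A1', 4, 0, None, None],
]
df = pd.DataFrame(rows, columns=['A', 'B', 'C', 'D', 'E'])

# B has no missing values and stays integer; C and D contain None and become float
print(df.dtypes)

# Number of missing values per column
print(df.isna().sum())
```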

DataFrame structure
In [9]:
# both .index and .columns are iterable objects
print('ROWS:')
for index in test_df.index:
    print(index)

print()
print('COLUMNS:')
for col in test_df.columns:
    print(col)

ROWS:
row1
row2
row3
row4
row5
row6
row7

COLUMNS:
A
B
C
D
E
A DataFrame can be understood as a matrix of values with an index for rows and an index for columns.
Any two-dimensional subset is returned as a DataFrame, and any one-dimensional subset is
returned as a Series.
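The rule about subset types can be verified with type checks. This is a minimal sketch on a small made-up frame, not the notebook's test_df.

```python
import pandas as pd

# Small illustrative frame
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=['A', 'B', 'C'],
    index=['row1', 'row2'],
)

two_columns = df[['A', 'B']]      # list of columns -> DataFrame
one_column_as_df = df[['A']]      # a list with one column is still a DataFrame
one_column = df['A']              # single column -> Series
one_row = df.loc['row1']          # single row -> Series

for name, obj in [('two_columns', two_columns), ('one_column_as_df', one_column_as_df),
                  ('one_column', one_column), ('one_row', one_row)]:
    print(name, type(obj).__name__)
```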

Although the DataFrame has explicit indices (labels) for rows and columns, both DataFrame
and Series still have a positional ("hidden") index.
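As a small illustration of the positional index, the sketch below reads the same row once by position and once by label. The frame is a made-up excerpt, and Index.get_loc is used only to show how a label maps to a position.

```python
import pandas as pd

# Illustrative frame with explicit row labels
df = pd.DataFrame({'A': ['A3', 'B1', 'B3']}, index=['row1', 'row2', 'row3'])

by_position = df.iloc[0]       # first row, via the positional index
by_label = df.loc['row1']      # same row, via its label
print(by_position.equals(by_label))  # True

# Index.get_loc translates a label into its position
print(df.index.get_loc('row2'))      # 1

# A Series keeps a positional index as well
print(df['A'].iloc[2])               # B3
```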


In [29]:
# first 3 rows
test_df.head(3)

Out[29]:
    A  B    C    D   E
0  A3  0 -1.0  0.0  si
1  B1  1  NaN  0.0  no
2  B3  4  NaN  0.0  no

In [30]:
# last 2 rows
test_df.tail(2)

Out[30]:
    A  B    C    D   E
5  A3  1  2.0  1.0  si
6  C2  4  1.0  1.0  no
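A short sketch of how .head() and .tail() behave on a frame that has row labels: the selection is by position, and the labels of the selected rows are kept. The single-column frame here is made up for illustration.

```python
import pandas as pd

# Illustrative seven-row frame labelled row1..row7
df = pd.DataFrame({'A': list(range(7))}, index=[f'row{i}' for i in range(1, 8)])

first_three = df.head(3)   # first 3 rows by position
last_two = df.tail(2)      # last 2 rows by position
default_head = df.head()   # with no argument, head() returns 5 rows

print(list(first_three.index))  # ['row1', 'row2', 'row3']
print(list(last_two.index))     # ['row6', 'row7']
print(len(default_head))        # 5
```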

Select values
.iloc vs .loc
One of the advantages of a DataFrame is that it lets us access its elements (rows,
columns, cells...) in two ways:
1. by position (numerical index), e.g. first row, eighth column, etc.
2. by label, e.g. the column named "name", the row with index "FHX129M", etc.
To make clear which type of index we want to use, there are two indexers: .iloc
and .loc.
1. .iloc is used to access elements via (numeric) positions.
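The preview ends here. As a hedged sketch of the distinction just described, the example below reads the same cell and the same rows with both indexers. The frame is a shortened, illustrative version of test_df.

```python
import pandas as pd

# Shortened, illustrative version of test_df
df = pd.DataFrame(
    [['A3', 0], ['B1', 1], ['B3', 4]],
    columns=['A', 'B'],
    index=['row1', 'row2', 'row3'],
)

# Same cell, addressed by position and by label
cell_by_position = df.iloc[1, 0]       # second row, first column
cell_by_label = df.loc['row2', 'A']
print(cell_by_position, cell_by_label)  # B1 B1

# Slices: .iloc excludes the end position, .loc includes the end label
first_two_by_position = df.iloc[0:2]
first_two_by_label = df.loc['row1':'row2']
print(first_two_by_position.equals(first_two_by_label))  # True
```

The difference in slice endpoints is a common source of off-by-one mistakes when switching between the two indexers.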
Seller: beatricemossberg
