Summary

Summary Codes for Data Analysts--Time Saving

Uploaded on: 27-08-2025

Written in: 2025/2026

1. Pandas – Data Structures & Manipulation
Series: 1D labeled array (like a column).
DataFrame: 2D labeled data (like a spreadsheet/SQL table).
Handling Missing Data: dropna(), fillna().
Selection/Filtering: By column, row index, boolean conditions.
Sorting: sort_values(), sort_index().
Merging/Joining: concat(), merge().
Transformation: apply(), replace().
Loading Data: From CSV, JSON, Excel, HTML, etc.
Cleaning: Handle duplicates, missing values, outliers.

2. Pandas – Exploration & Summarization
Descriptive Statistics: describe(), mean(), median(), mode().
GroupBy & Aggregations: Split–apply–combine using groupby().
Pivot Tables: pivot_table() for summarizing data.

3. Pandas – Visualization
Built-in plotting with Matplotlib backend:
Line Plot: plot(kind="line")
Bar Plot: value_counts().plot(kind="bar")
Scatter Plot: plot(kind="scatter")
Histogram: plot(kind="hist")

4. NumPy – Arrays & Operations
ndarray: Core n-dimensional array object.
Array Creation: (), zeros(), ones(), eye(), ().
Manipulation: reshape(), concatenate(), split().
Broadcasting: Operations across arrays of different shapes.
Math (UFuncs): sqrt(), exp(), custom ufuncs via frompyfunc.
Linear Algebra: (), eig().
Statistics: mean(), median(), std().
Random Numbers: rand(), randn(), randint().

5. Matplotlib – Visualization Basics
Line, Bar, Scatter, Histogram plots.
Figure & Subplots: figure(), subplots().
Annotations: text(), annotate().
Customization: Titles, labels, gridlines, ().
Object-Oriented Interface: More control with fig, ax.

6. Seaborn – Statistical Visualization
High-Level Plots: lineplot(), scatterplot(), pairplot().
Distribution Plots: KDE, histograms, violin plots.
Categorical Plots: barplot(), boxplot(), swarmplot().
Heatmaps: heatmap((), annot=True).
Faceting: FacetGrid for multi-plot comparisons.
Customization: set_style(), set_palette(), set_context() for themes and scaling.
This cheat sheet is essentially a hands-on reference for data analysis programming using Pandas, NumPy, Matplotlib, and Seaborn — covering data structures, cleaning, transformation, exploration, and visualization.
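
The outline lists NumPy broadcasting only as "operations across arrays of different shapes". A minimal sketch of what that can look like follows; the array values and variable names are made up for illustration and do not come from the cheat sheet itself.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) x 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Column means have shape (2,); broadcasting stretches them across all 3 rows.
col_means = data.mean(axis=0)
centered = data - col_means

# A (3, 1) column combined with a (1, 2) row broadcasts to a (3, 2) grid.
grid = np.arange(3).reshape(3, 1) + np.array([[10, 20]])

print(centered)
print(grid)
```

Shapes are compared from the trailing dimension backwards, and a dimension of size 1 (or a missing one) is stretched to match, which is why no explicit loop or tiling is needed here.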


Institution: Programming for Python Language
Course: Programming for Python Language



Document information

Number of pages: 4
Type: Summary

Topics

data analyst
data science
python
programming

Content preview

Data Analyst Programming Cheat Sheet

Pandas - Data Structures

Series
The pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is essentially a column in an Excel sheet.
import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
Here, the output will be a Series object that includes both the data and an associated array of data labels, called the index.

DataFrame
The DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of Series objects.
Below is an example of creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)

Pandas - Data Manipulation

Loading Data
Data can be loaded into DataFrame and Series structures from many different file formats. For example, you can load data from a CSV file like this:
df = pd.read_csv('data.csv')
Similarly, pandas provides read_json, read_html, read_excel, etc. for loading data from different file formats.

Data Cleaning
Data cleaning is a key part of the data analysis process, where you'll need to handle missing values, duplicate data, and outliers.
# Drop rows with missing values
df_cleaned = df.dropna()
# Fill missing values with the column mean (numeric columns only)
df_filled = df.fillna(df.mean(numeric_only=True))

Filtering and Selection
Pandas provides various ways to select and filter data. You can select data by row numbers, column names, or through boolean indexing.
# Selecting by column names
df['A']
# Selecting by row numbers
df[0:3]  # selects the first three rows
# Boolean indexing
df[df['A'] > 0]  # selects rows where the value of 'A' is greater than zero

Sorting and Ranking
Pandas allows sorting by values and by index.
# Sort by values
df.sort_values(by='B')
# Sort by index (axis=1 sorts the column labels, here in descending order)
df.sort_index(axis=1, ascending=False)

Merging and Joining
Pandas has full-featured, high-performance in-memory join operations idiomatically similar to relational databases like SQL.
# Concatenating two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']},
                   index=[0, 1, 2])
df2 = pd.DataFrame({'A': ['A2', 'A3', 'A4'],
                    'B': ['B2', 'B3', 'B4']},
                   index=[2, 3, 4])
result = pd.concat([df1, df2])
In this code, pd.concat([df1, df2]) concatenates df1 and df2 along the row axis.

Data Transformation
Pandas DataFrame allows you to perform various operations like applying a function across the axis, replacing values, etc.
# Apply function numpy.cumsum to each column
df.apply(np.cumsum)
# Replace all occurrences of a string in DataFrame
df.replace("test", "replace_test")

Pandas - Data Exploration

Descriptive Statistics
Descriptive statistics can give you great insight into the shape of each attribute. Pandas provides a suite of functions for generating descriptive statistics and exploring the structure of your data.
# summary statistics
df.describe()
# to calculate the mean
df.mean()
# to calculate the median
df.median()
# to calculate the mode
df.mode()

Aggregations and Grouping
In pandas, the groupby function is used to split the data into groups based on some criteria.
# groupby 'A' and calculate mean of 'B' and 'C'
# (select several columns with a list; the tuple form ['B','C'] is no longer accepted)
df.groupby('A')[['B', 'C']].mean()
In this code, df.groupby('A')[['B', 'C']].mean() groups the DataFrame by column 'A' and calculates the mean of 'B' and 'C' for each group.

Pivot Tables
A pivot table is a way of summarizing data in a DataFrame for a particular purpose. It relies on an aggregation function, which is the mean by default.
# create a pivot table
df.pivot_table(values='D', index=['A', 'B'], columns=['C'])
In this code, df.pivot_table(values='D', index=['A', 'B'], columns=['C']) creates a pivot table that groups by 'A' and 'B', spreads the distinct values of 'C' across the columns, and fills each cell with the mean of 'D'.
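
The grouping and pivot-table calls described above can be tried on a small table. The sketch below assumes a made-up sales DataFrame; the column names region, product and units are hypothetical and are not part of the cheat sheet.

```python
import pandas as pd

# Hypothetical sales records; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["x", "y", "x", "x", "y"],
    "units": [10, 20, 30, 50, 40],
})

# Split-apply-combine: mean units per region.
mean_units = df.groupby("region")["units"].mean()

# The same data as a pivot table: regions as rows, products as columns.
# aggfunc defaults to the mean; it is spelled out here for clarity.
pivot = df.pivot_table(values="units", index="region",
                       columns="product", aggfunc="mean")

print(mean_units)
print(pivot)
```

The groupby result is a Series indexed by region, while the pivot table is a DataFrame with one row per region and one column per product, which tends to be easier to read when two grouping keys are involved.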


Seller: c.7, Icahn School of Medicine at Mount Sinai
