Summary

Summary Codes for Data Analysts--Time Saving

Uploaded on

27-08-2025

Written in

2025/2026

1. Pandas – Data Structures & Manipulation
Series: 1D labeled array (like a column).
DataFrame: 2D labeled data (like a spreadsheet/SQL table).
Handling Missing Data: dropna(), fillna().
Selection/Filtering: By column, row index, boolean conditions.
Sorting: sort_values(), sort_index().
Merging/Joining: concat(), merge().
Transformation: apply(), replace().
Loading Data: From CSV, JSON, Excel, HTML, etc.
Cleaning: Handle duplicates, missing values, outliers.

2. Pandas – Exploration & Summarization
Descriptive Statistics: describe(), mean(), median(), mode().
GroupBy & Aggregations: Split–apply–combine using groupby().
Pivot Tables: pivot_table() for summarizing data.

3. Pandas – Visualization
Built-in plotting with Matplotlib backend:
Line Plot: (kind="line")
Bar Plot: _counts().plot(kind="bar")
Scatter Plot: (kind="scatter")
Histogram: (kind="hist")

4. NumPy – Arrays & Operations
ndarray: Core n-dimensional array object.
Array Creation: (), zeros(), ones(), eye(), ().
Manipulation: reshape(), concatenate(), split().
Broadcasting: Operations across arrays of different shapes.
Math (UFuncs): sqrt(), exp(), custom ufuncs via frompyfunc.
Linear Algebra: (), eig().
Statistics: mean(), median(), std().
Random Numbers: rand(), randn(), randint().

5. Matplotlib – Visualization Basics
Line, Bar, Scatter, Histogram plots.
Figure & Subplots: figure(), subplots().
Annotations: text(), annotate().
Customization: Titles, labels, gridlines, ().
Object-Oriented Interface: More control with fig, ax.

6. Seaborn – Statistical Visualization
High-Level Plots: lineplot(), scatterplot(), pairplot().
Distribution Plots: KDE, histograms, violin plots.
Categorical Plots: barplot(), boxplot(), swarmplot().
Heatmaps: heatmap((), annot=True).
Faceting: FacetGrid for multi-plot comparisons.
Customization: set_style(), set_palette(), set_context() for themes and scaling.

This cheat sheet is a hands-on reference for data analysis programming with Pandas, NumPy, Matplotlib, and Seaborn, covering data structures, cleaning, transformation, exploration, and visualization.
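The summary lists NumPy broadcasting as "operations across arrays of different shapes". A minimal sketch of what that means in practice follows; the array values and the names `prices`, `store_factor`, and `row_offset` are made up for illustration and do not come from the cheat sheet itself.

```python
import numpy as np

# Hypothetical data: 3 rows by 2 columns.
prices = np.array([[10.0, 12.0],
                   [20.0, 22.0],
                   [30.0, 33.0]])

# One multiplier per column, shape (2,).
# Broadcasting stretches it across all 3 rows.
store_factor = np.array([1.0, 0.5])
scaled = prices * store_factor       # result has shape (3, 2)

# One offset per row, shape (3, 1).
# Broadcasting stretches it across both columns.
row_offset = np.array([[1.0], [2.0], [3.0]])
shifted = prices + row_offset        # result has shape (3, 2)

print(scaled)
print(shifted)
```

In both cases the smaller array is never copied by hand: NumPy lines up the trailing dimensions and repeats the array along any dimension of size 1.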

Institution

Programming For Python Language..

Course

Programming for python language..

Document information

Uploaded on: August 27, 2025
Number of pages: 4
Written in: 2025/2026
Type: Summary

Content preview

Data Analyst Programming Cheat Sheet

Pandas - Data Structures

Series
The pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is essentially a column in an Excel sheet.

    import numpy as np
    import pandas as pd

    s = pd.Series([1, 3, 5, np.nan, 6, 8])
    print(s)

Here, the output will be a Series object that includes both the data and an associated array of data labels, called the index.

DataFrame
The DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of Series objects.
Below is an example of creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

    # Six consecutive dates starting 2013-01-01, used as the row index
    dates = pd.date_range('20130101', periods=6)
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
    print(df)

Pandas - Data Manipulation

Loading Data
Data can be loaded into DataFrame and Series structures from many different file formats. For example, you can load data from a CSV file like this:

    df = pd.read_csv('data.csv')

Similarly, pandas provides read_json, read_html, read_excel, etc. for loading data from different file formats.

Data Cleaning
Data cleaning is a key part of the data analysis process, where you'll need to handle missing values, duplicate data, and outliers.

    # Drop rows with missing values
    df_cleaned = df.dropna()
    # Fill missing values with the column mean
    df_filled = df.fillna(df.mean())

Filtering and Selection
Pandas provides various ways to select and filter data. You can select data by row numbers, column names, or through boolean indexing.

    # Selecting by column name
    df['A']
    # Selecting by row numbers
    df[0:3]            # selects the first three rows
    # Boolean indexing
    df[df['A'] > 0]    # selects rows where the value of 'A' is greater than zero

Sorting and Ranking
Pandas allows sorting by values and by index.

    # Sort rows by the values in column 'B'
    df.sort_values(by='B')
    # Sort by index (axis=1 sorts the column labels, here in descending order)
    df.sort_index(axis=1, ascending=False)

Merging and Joining
Pandas has full-featured, high-performance in-memory join operations idiomatically similar to relational databases like SQL.

    # Concatenating two DataFrames
    df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                        'B': ['B0', 'B1', 'B2']},
                       index=[0, 1, 2])
    df2 = pd.DataFrame({'A': ['A2', 'A3', 'A4'],
                        'B': ['B2', 'B3', 'B4']},
                       index=[2, 3, 4])
    result = pd.concat([df1, df2])

In this code, pd.concat([df1, df2]) concatenates df1 and df2 along the row axis.

Data Transformation
Pandas DataFrame allows you to perform various operations like applying a function across an axis, replacing values, etc.

    # Apply function numpy.cumsum to each column
    df.apply(np.cumsum)
    # Replace all occurrences of a string in the DataFrame
    df.replace("test", "replace_test")

Pandas - Data Exploration

Descriptive Statistics
Descriptive statistics can give you great insight into the shape of each attribute. Pandas provides a suite of functions for generating descriptive statistics and exploring the structure of your data.

    # Summary statistics
    df.describe()
    # Calculate the mean
    df.mean()
    # Calculate the median
    df.median()
    # Calculate the mode
    df.mode()

Aggregations and Grouping
In pandas, the groupby function is used to split the data into groups based on some criteria.

    # Group by 'A' and calculate the mean of 'B' and 'C'
    df.groupby('A')[['B', 'C']].mean()

In this code, df.groupby('A')[['B', 'C']].mean() groups the DataFrame by column 'A' and calculates the mean of 'B' and 'C' for each group. The double brackets pass a list of column names; pandas 2.0 and later reject the older ['B', 'C'] form.

Pivot Tables
A pivot table is a way of summarizing data in a DataFrame for a particular purpose. It relies heavily on an aggregation function.

    # Create a pivot table
    df.pivot_table(values='D', index=['A', 'B'], columns=['C'])

In this code, df.pivot_table(values='D', index=['A', 'B'], columns=['C']) creates a pivot table that groups by 'A' and 'B', and columns are set as 'C'.
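The grouping and pivot-table calls above are easier to follow on data small enough to check by hand. The sketch below is one possible illustration, not part of the original cheat sheet: the frames `scores` and `sales` and all their values are invented, and only the column names 'A' to 'D' mirror the text. It relies on the fact that pivot_table aggregates with the mean when no aggfunc is given.

```python
import pandas as pd

# Hypothetical frame for groupby: 'A' is the grouping key, 'B' and 'C' are numeric.
scores = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": [1.0, 3.0, 5.0, 7.0],
    "C": [10.0, 20.0, 30.0, 50.0],
})

# Split by 'A', then average 'B' and 'C' within each group.
group_means = scores.groupby("A")[["B", "C"]].mean()
print(group_means)

# Hypothetical frame for pivot_table: 'A', 'B', 'C' are categories, 'D' is numeric.
# The first two rows share the same (A, B, C) combination, so they get aggregated.
sales = pd.DataFrame({
    "A": ["x", "x", "x", "x", "y"],
    "B": ["p", "p", "p", "q", "p"],
    "C": ["c1", "c1", "c2", "c1", "c1"],
    "D": [1.0, 5.0, 2.0, 3.0, 4.0],
})

# Rows are (A, B) pairs, columns are the distinct values of 'C',
# and each cell is the mean of 'D' (the default aggregation).
pivot = sales.pivot_table(values="D", index=["A", "B"], columns=["C"])
print(pivot)
```

Combinations that never occur in the data, such as ('x', 'q') with 'c2' here, show up as NaN in the pivot table.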
