
Summary Codes for Data Analysts--Time Saving

Pages
4
Uploaded on
27-08-2025
Written in
2025/2026

1. Pandas – Data Structures & Manipulation
Series: 1D labeled array (like a column).
DataFrame: 2D labeled data (like a spreadsheet/SQL table).
Handling Missing Data: dropna(), fillna().
Selection/Filtering: By column, row index, boolean conditions.
Sorting: sort_values(), sort_index().
Merging/Joining: concat(), merge().
Transformation: apply(), replace().
Loading Data: From CSV, JSON, Excel, HTML, etc.
Cleaning: Handle duplicates, missing values, outliers.

2. Pandas – Exploration & Summarization
Descriptive Statistics: describe(), mean(), median(), mode().
GroupBy & Aggregations: Split–apply–combine using groupby().
Pivot Tables: pivot_table() for summarizing data.

3. Pandas – Visualization
Built-in plotting with Matplotlib backend:
Line Plot: (kind="line")
Bar Plot: _counts().plot(kind="bar")
Scatter Plot: (kind="scatter")
Histogram: (kind="hist")

4. NumPy – Arrays & Operations
ndarray: Core n-dimensional array object.
Array Creation: (), zeros(), ones(), eye(), ().
Manipulation: reshape(), concatenate(), split().
Broadcasting: Operations across arrays of different shapes.
Math (UFuncs): sqrt(), exp(), custom ufuncs via frompyfunc.
Linear Algebra: (), eig().
Statistics: mean(), median(), std().
Random Numbers: rand(), randn(), randint().

5. Matplotlib – Visualization Basics
Line, Bar, Scatter, Histogram plots.
Figure & Subplots: figure(), subplots().
Annotations: text(), annotate().
Customization: Titles, labels, gridlines, ().
Object-Oriented Interface: More control with fig, ax.

6. Seaborn – Statistical Visualization
High-Level Plots: lineplot(), scatterplot(), pairplot().
Distribution Plots: KDE, histograms, violin plots.
Categorical Plots: barplot(), boxplot(), swarmplot().
Heatmaps: heatmap((), annot=True).
Faceting: FacetGrid for multi-plot comparisons.
Customization: set_style(), set_palette(), set_context() for themes and scaling.
This cheat sheet is a hands-on reference for data analysis programming with Pandas, NumPy, Matplotlib, and Seaborn. It covers data structures, cleaning, transformation, exploration, and visualization.
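
The outline's broadcasting entry (operations across arrays of different shapes) can be illustrated with a minimal sketch. The array values and variable names below are hypothetical and are not taken from the cheat sheet itself.

```python
import numpy as np

# Hypothetical data: 3 rows (samples) by 2 columns (features).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Column means have shape (2,); broadcasting stretches them across all 3 rows.
col_means = data.mean(axis=0)
centered = data - col_means

# A (3, 1) column combined with a (4,) row broadcasts to a (3, 4) grid.
col = np.arange(3).reshape(3, 1)
row = np.arange(4)
grid = col * 10 + row

print(centered)
print(grid.shape)
```

No explicit loop or copy of the smaller array is needed in either case; NumPy aligns the shapes from the trailing dimension and repeats the size-1 axes.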

Institution
Programming For Python Language..
Course
Programming for python language..
Content preview

Data Analyst Programming Cheat Sheet

Pandas - Data Structures

Series
The pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is essentially a column in an Excel sheet.

    import numpy as np
    import pandas as pd
    s = pd.Series([1, 3, 5, np.nan, 6, 8])
    print(s)

Here, the output will be a Series object that includes both the data and an associated array of data labels, called the index.

DataFrame
The DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of Series objects.
Below is an example of creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

    dates = pd.date_range('20130101', periods=6)
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
    print(df)

Pandas - Data Manipulation

Loading Data
Data can be loaded into DataFrame and Series structures from many different file formats. For example, you can load data from a CSV file like this:

    df = pd.read_csv('data.csv')

Similarly, pandas provides read_json, read_html, read_excel, etc. for loading data from different file formats.

Data Cleaning
Data cleaning is a key part of the data analysis process, where you'll need to handle missing values, duplicate data, and outliers.

    # Drop rows with missing values
    df_cleaned = df.dropna()
    # Fill missing values with the column mean
    df_filled = df.fillna(df.mean())

Filtering and Selection
Pandas provides various ways to select and filter data. You can select data by row numbers, column names, or through boolean indexing.

    # Selecting by column name
    df['A']
    # Selecting by row numbers
    df[0:3]  # selects the first three rows
    # Boolean indexing
    df[df['A'] > 0]  # selects rows where the value of 'A' is greater than zero

Sorting and Ranking
Pandas allows sorting by values and by index.

    # Sort by values
    df.sort_values(by='B')
    # Sort by index (axis=1 sorts the column labels)
    df.sort_index(axis=1, ascending=False)

Merging and Joining
Pandas has full-featured, high-performance in-memory join operations idiomatically similar to relational databases like SQL.

    # Concatenating two DataFrames
    df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                        'B': ['B0', 'B1', 'B2']},
                       index=[0, 1, 2])
    df2 = pd.DataFrame({'A': ['A2', 'A3', 'A4'],
                        'B': ['B2', 'B3', 'B4']},
                       index=[2, 3, 4])
    result = pd.concat([df1, df2])

In this code, pd.concat([df1, df2]) concatenates df1 and df2 along the row axis.

Data Transformation
Pandas DataFrame allows you to perform various operations like applying a function across the axis, replacing values, etc.

    # Apply function numpy.cumsum to each column
    df.apply(np.cumsum)
    # Replace all occurrences of a string in DataFrame
    df.replace("test", "replace_test")

Pandas - Data Exploration

Descriptive Statistics
Descriptive statistics can give you great insight into the shape of each attribute. Pandas provides a suite of functions for generating descriptive statistics and exploring the structure of your data.

    # summary statistics
    df.describe()
    # to calculate the mean
    df.mean()
    # to calculate the median
    df.median()
    # to calculate the mode
    df.mode()

Aggregations and Grouping
In pandas, the groupby function is used to split the data into groups based on some criteria.

    # groupby 'A' and calculate mean of 'B' and 'C'
    df.groupby('A')[['B', 'C']].mean()

In this code, df.groupby('A')[['B', 'C']].mean() groups the DataFrame by column 'A' and calculates the mean of 'B' and 'C' for each group. Note the double brackets: selecting several columns after groupby requires a list.

Pivot Tables
A pivot table is a way of summarizing data in a DataFrame for a particular purpose. It relies on an aggregation function, which is the mean by default.

    # create a pivot table
    df.pivot_table(values='D', index=['A', 'B'], columns=['C'])

In this code, df.pivot_table(values='D', index=['A', 'B'], columns=['C']) creates a pivot table that groups by 'A' and 'B', uses the distinct values of 'C' as columns, and fills each cell with the aggregated value of 'D'.
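
A small deterministic example may make the pivot table description easier to check by hand. This is only a sketch: the data values are hypothetical, and 'B' and 'C' are treated as categorical keys here (as in the pivot table call), while the column names A, B, C, D simply mirror the cheat sheet's examples.

```python
import pandas as pd

# Hypothetical data; values chosen so the means are easy to verify by hand.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y", "y", "y"],
    "B": ["p", "p", "q", "p", "q", "q"],
    "C": ["left", "left", "right", "left", "right", "right"],
    "D": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
})

# Mean of D per value of A (split-apply-combine).
mean_by_a = df.groupby("A")["D"].mean()

# Pivot table: rows keyed by (A, B), one column per value of C,
# cells hold the mean of D (pivot_table aggregates with the mean by default).
pivot = df.pivot_table(values="D", index=["A", "B"], columns=["C"])

# The same table built explicitly with groupby followed by unstack.
via_groupby = df.groupby(["A", "B", "C"])["D"].mean().unstack("C")

print(mean_by_a)
print(pivot)
print(via_groupby)
```

Combinations that never occur in the data, such as ('x', 'p') with 'right', appear as NaN in the resulting table.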

Get to know the seller

c.7 Icahn School of Medicine at Mount Sinai