Summary

Summary Codes for Data Analysts--Time Saving

Rating
-
Sold
-
Pages
4
Uploaded on
27-08-2025
Written in
2025/2026

1. Pandas – Data Structures & Manipulation
Series: 1D labeled array (like a column).
DataFrame: 2D labeled data (like a spreadsheet/SQL table).
Handling Missing Data: dropna(), fillna().
Selection/Filtering: By column, row index, boolean conditions.
Sorting: sort_values(), sort_index().
Merging/Joining: concat(), merge().
Transformation: apply(), replace().
Loading Data: From CSV, JSON, Excel, HTML, etc.
Cleaning: Handle duplicates, missing values, outliers.

2. Pandas – Exploration & Summarization
Descriptive Statistics: describe(), mean(), median(), mode().
GroupBy & Aggregations: Split–apply–combine using groupby().
Pivot Tables: pivot_table() for summarizing data.

3. Pandas – Visualization
Built-in plotting with Matplotlib backend:
Line Plot: (kind="line")
Bar Plot: _counts().plot(kind="bar")
Scatter Plot: (kind="scatter")
Histogram: (kind="hist")

4. NumPy – Arrays & Operations
ndarray: Core n-dimensional array object.
Array Creation: (), zeros(), ones(), eye(), ().
Manipulation: reshape(), concatenate(), split().
Broadcasting: Operations across arrays of different shapes.
Math (UFuncs): sqrt(), exp(), custom ufuncs via frompyfunc.
Linear Algebra: (), eig().
Statistics: mean(), median(), std().
Random Numbers: rand(), randn(), randint().

5. Matplotlib – Visualization Basics
Line, Bar, Scatter, Histogram plots.
Figure & Subplots: figure(), subplots().
Annotations: text(), annotate().
Customization: Titles, labels, gridlines, ().
Object-Oriented Interface: More control with fig, ax.

6. Seaborn – Statistical Visualization
High-Level Plots: lineplot(), scatterplot(), pairplot().
Distribution Plots: KDE, histograms, violin plots.
Categorical Plots: barplot(), boxplot(), swarmplot().
Heatmaps: heatmap((), annot=True).
Faceting: FacetGrid for multi-plot comparisons.
Customization: set_style(), set_palette(), set_context() for themes and scaling.
This cheat sheet is a hands-on reference for data analysis programming with Pandas, NumPy, Matplotlib, and Seaborn. It covers data structures, cleaning, transformation, exploration, and visualization.
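
The broadcasting item in the NumPy section can be illustrated with a short sketch. The array contents and variable names below are hypothetical and chosen only for illustration; the point is that arrays with compatible but different shapes can be combined without explicit loops.

```python
import numpy as np

# Hypothetical data: 3 rows (samples) x 4 columns (features).
data = np.arange(12, dtype=float).reshape(3, 4)

# Column means have shape (4,); broadcasting stretches them across all 3 rows.
col_means = data.mean(axis=0)
centered = data - col_means

# Row totals reshaped to (3, 1) broadcast across the 4 columns instead.
row_totals = data.sum(axis=1).reshape(-1, 1)
row_shares = data / row_totals

print(centered.shape)          # shape is unchanged by broadcasting
print(row_shares.sum(axis=1))  # each row of shares sums to 1
```

Reshaping the row totals to a column vector is what tells NumPy to align them with rows rather than columns.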

Institution
Programming For Python Language..
Course
Programming for python language..

Preview of the content

Data Analyst Programming Cheat Sheet

Pandas - Data Structures

Series
The pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is essentially a column in an Excel sheet.

import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)

Here, the output will be a Series object that includes both the data and an associated array of data labels, called the index.

DataFrame
The DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of Series objects.
Below is an example of creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)

Pandas - Data Manipulation

Loading Data
Data can be loaded into DataFrame and Series structures from many different file formats. For example, you can load data from a CSV file like this:

df = pd.read_csv('data.csv')

Similarly, pandas provides read_json, read_html, read_excel, etc. for loading data from different file formats.

Data Cleaning
Data cleaning is a key part of the data analysis process, where you'll need to handle missing values, duplicate data, and outliers.

# Drop rows with missing values
df_cleaned = df.dropna()
# Fill missing values with mean
df_filled = df.fillna(df.mean())

Filtering and Selection
Pandas provides various ways to select and filter data. You can select data by row numbers, column names, or through boolean indexing.

# Selecting by column names
df['A']
# Selecting by row numbers
df[0:3]  # selects the first three rows.
# Boolean indexing
df[df['A'] > 0]  # selects rows where the value of 'A' is greater than zero.

Sorting and Ranking
Pandas allows sorting by values and by index.

# Sort by values
df.sort_values(by='B')
# Sort by index (axis=1 sorts the column labels, here in descending order)
df.sort_index(axis=1, ascending=False)

Merging and Joining
Pandas has full-featured, high-performance in-memory join operations idiomatically similar to relational databases like SQL.

# Concatenating two data frames
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2']},
                   index=[0, 1, 2])
df2 = pd.DataFrame({'A': ['A2', 'A3', 'A4'],
                    'B': ['B2', 'B3', 'B4']},
                   index=[2, 3, 4])
result = pd.concat([df1, df2])

In this code, pd.concat([df1, df2]) concatenates df1 and df2 along the row axis.

Data Transformation
Pandas DataFrame allows you to perform various operations like applying a function across the axis, replacing values, etc.

# Apply function numpy.cumsum to each column
df.apply(np.cumsum)
# Replace all occurrences of a string in DataFrame
df.replace('test', 'replace_test')

Pandas - Data Exploration

Descriptive Statistics
Descriptive statistics can give you great insight into the shape of each attribute. Pandas provides a suite of functions for generating descriptive statistics and exploring the structure of your data.

# summary statistics
df.describe()
# to calculate the mean
df.mean()
# to calculate the median
df.median()
# to calculate the mode
df.mode()

Aggregations and Grouping
In pandas, the groupby function is used to split the data into groups based on some criteria.

# groupby 'A' and calculate mean of 'B' and 'C'
df.groupby('A')[['B', 'C']].mean()

In this code, df.groupby('A')[['B', 'C']].mean() groups the DataFrame by column 'A' and calculates the mean of 'B' and 'C' for each group. The columns must be passed as a list (double brackets); recent pandas versions reject the older ['B', 'C'] form.

Pivot Tables
A pivot table is a way of summarizing data in a DataFrame for a particular purpose. It relies on an aggregation function (the mean by default).

# create a pivot table
df.pivot_table(values='D', index=['A', 'B'], columns=['C'])

In this code, df.pivot_table(values='D', index=['A', 'B'], columns=['C']) creates a pivot table that groups rows by 'A' and 'B', and uses the distinct values of 'C' as columns.
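
The cleaning, filtering, sorting, grouping, and pivot snippets above can be combined into one small runnable sketch. The `sales` frame and its column names are hypothetical and exist only for illustration; they are not part of the cheat sheet.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; column names are illustrative only.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["pen", "ink", "pen", "ink", "pen"],
    "units": [10, np.nan, 7, 3, 5],
    "price": [1.5, 4.0, 1.5, 4.0, 1.5],
})

# Data cleaning: fill the missing unit count with the column mean.
sales["units"] = sales["units"].fillna(sales["units"].mean())

# Boolean indexing: keep rows with more than 4 units.
large = sales[sales["units"] > 4]

# Sorting: largest unit counts first.
ranked = sales.sort_values(by="units", ascending=False)

# Grouping: mean of two columns per region (note the list inside [[...]]).
region_means = sales.groupby("region")[["units", "price"]].mean()

# Pivot table: total units per region (rows) and product (columns).
units_pivot = sales.pivot_table(
    values="units", index="region", columns="product", aggfunc="sum"
)

print(region_means)
print(units_pivot)
```

Passing `aggfunc="sum"` overrides the default mean, which is usually what is wanted for counts such as units sold.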

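The Merging and Joining section shows `pd.concat`, which stacks rows. A key-based join uses `pd.merge` instead. The sketch below reuses the same two frames to contrast the two; the `suffixes` values are arbitrary labels chosen for this example.

```python
import pandas as pd

# Same frames as in the concat example.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]}, index=[0, 1, 2])
df2 = pd.DataFrame({"A": ["A2", "A3", "A4"], "B": ["B2", "B3", "B4"]}, index=[2, 3, 4])

# concat stacks rows: 3 + 3 = 6 rows, and index label 2 appears twice.
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows 0..5 instead.
renumbered = pd.concat([df1, df2], ignore_index=True)

# merge matches rows on a key column, like a SQL join.
# Only "A2" occurs in both frames, so the inner join has one row.
inner = pd.merge(df1, df2, on="A", how="inner", suffixes=("_left", "_right"))

# An outer join keeps every key and fills the gaps with NaN.
outer = pd.merge(df1, df2, on="A", how="outer", suffixes=("_left", "_right"))

print(stacked)
print(inner)
print(outer)
```

As a rule of thumb, `concat` suits frames that share the same columns, while `merge` suits frames that share a key.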



Seller
c.7, Icahn School of Medicine at Mount Sinai
