Summary

Summary data science

Rating

Sold

Pages

Uploaded on

17-01-2025

Written in

2024/2025

An SQL guide is a comprehensive resource designed to teach users how to use SQL (Structured Query Language) for managing and manipulating relational databases. It typically covers the syntax, commands, and best practices for querying, updating, inserting, and deleting data in a database. Key topics included in an SQL guide often include: Basic SQL Commands: Introduction to fundamental SQL commands like SELECT, INSERT, UPDATE, and DELETE. Filtering and Sorting: Techniques for filtering data using WHERE conditions and sorting results using ORDER BY. Joins and Relationships: Explanation of different types of joins (INNER JOIN, LEFT JOIN, RIGHT JOIN, etc.) to combine data from multiple tables based on relationships. Aggregating Data: Use of aggregate functions such as COUNT(), SUM(), AVG(), MAX(), and MIN() to summarize data. Subqueries: Writing nested queries to perform more complex operations. Grouping Data: Using GROUP BY to group rows based on shared column values and applying aggregate functions to groups. Database Design: Overview of how to create, modify, and delete database tables, as well as setting primary and foreign keys to ensure data integrity. Data Constraints: Explanation of constraints such as NOT NULL, UNIQUE, CHECK, and FOREIGN KEY for ensuring data accuracy.

Show more Read less

Institution

Course

Whoops! We can’t load your doc right now. Try again or contact support.

Report Copyright Violation

Written for

Institution: Cheyney University
Course: 000 - P1

All documents for this subject (1)

Document information

Uploaded on: January 17, 2025
Number of pages: 5
Written in: 2024/2025
Type: Summary

Subjects

data definition language
data manipulation language
query filtering and sorting

Content preview

Data Science
15.003 Software Tools — Data Science Afshine Amidi & Shervine Amidi

Study Guide: Data Retrieval with SQL Category Operator Command
Equality / non-equality = / !=, <>
Inequalities >=, >, <, <=
Afshine Amidi and Shervine Amidi Belonging IN (val_1, ..., val_n)
General
And / or AND / OR
August 21, 2020 Check for missing value IS NULL
Between bounds BETWEEN val_1 AND val_2

e
Strings Pattern matching LIKE ’%val%’
General concepts
r Structured Query Language – Structured Query Language, abbreviated as SQL, is a

id
language that is largely used in the industry to query data from databases. r Joins – Two tables table_1 and table_2 can be joined in the following way:

r Query structure – Queries are usually structured as follows: SQL
...
SQL FROM table_1 t1

gu
-- Select fields.....................mandatory type_of_join table_2 t2
SELECT ..ON (t2.key = t1.key)
....col_1,
....col_2, ...
........ ,
....col_n where the different type_of_join commands are summarized in the table below:
-- Source of data....................mandatory
FROM table t Type of join Illustration

n
-- Gather info from other sources....optional
JOIN other_table ot INNER JOIN
..ON (t.key = ot.key)

-- Conditions........................optional
WHERE some_condition(s)
-- Aggregating.......................optional
GROUP BY column_group_list
io LEFT JOIN
is
-- Sorting values....................optional
ORDER BY column_order_list
RIGHT JOIN
-- Restricting aggregated values.....optional
HAVING some_condition(s)
v

-- Limiting number of rows...........optional
LIMIT some_value FULL JOIN
re

Remark: the SELECT DISTINCT command can be used to ensure not having duplicate rows.

r Condition – A condition is of the following format: Remark: joining every row of table 1 with every row of table 2 can be done with the CROSS JOIN
command, and is commonly known as the cartesian product.
SQL
some_col some_operator some_col_or_value
Aggregations
where some_operator can be among the following common operations: r Grouping data – Aggregate metrics are computed on grouped data in the following way:

Massachusetts Institute of Technology 1 https://www.mit.edu/~amidi

, 15.003 Software Tools — Data Science Afshine Amidi & Shervine Amidi

WHERE HAVING
- Filter condition applies to individual rows - Filter condition applies to aggregates
- Statement placed right after FROM - Statement placed right after GROUP BY

Remark: if WHERE and HAVING are both in the same query, WHERE will be executed first.

The SQL command is as follows:
Window functions
SQL
r Definition – A window function computes a metric over groups and has the following struc-

e
SELECT ture:
....col_1,
....agg_function(col_2)
FROM table

id
GROUP BY col_1

r Grouping sets – The GROUPING SETS command is useful when there is a need to compute
aggregations across different dimensions at a time. Below is an example of how all aggregations

gu
across two dimensions are computed:
The SQL command is as follows:
SQL
SQL
SELECT
....col_1, some_window_function() OVER(PARTITION BY some_col ORDER BY another_col)
....col_2,
....agg_function(col_3) Remark: window functions are only allowed in the SELECT clause.
FROM table
GROUP BY (

n
r Row numbering – The table below summarizes the main commands that rank each row
..GROUPING SETS across specified groups, ordered by a specific column:
....(col_1),
....(col_2),
....(col_1, col_2)
)
io
r Aggregation functions – The table below summarizes the main aggregate functions that
can be used in an aggregation query:
Command
ROW_NUMBER()
RANK()
DENSE_RANK()
Description
Ties are given different ranks
Ties are given same rank and skip numbers
Ties are given same rank and don’t skip numbers
Example
1, 2, 3, 4
1, 2, 2, 4
1, 2, 2, 3
is
Category Operation Command
r Values – The following window functions allow to keep track of specific types of values with
Mean AVG(col) respect to the partition:
v

Percentile PERCENTILE_APPROX(col, p)
Command Description
Values Sum / # of instances SUM(col) / COUNT(col)
Takes the first value of the column
re

FIRST_VALUE(col)
Max / min MAX(col) / MIN(col)
LAST_VALUE(col) Takes the last value of the column
Variance / standard deviation VAR(col) / STDEV(col)
LAG(col, n) Takes the nth previous value of the column
Arrays Concatenate into array collect_list(col)
LEAD(col, n) Takes the nth following value of the column
Remark: the median can be computed using the PERCENTILE_APPROX function with p equal to 0.5. NTH_VALUE(col, n) Takes the nth value of the column
r Filtering – The table below highlights the differences between the WHERE and HAVING com-
mands:

Massachusetts Institute of Technology 2 https://www.mit.edu/~amidi

$9.89

Get access to the full document:

100% satisfaction guarantee

Immediately available after payment

Both online and in PDF

No strings attached

Get to know the seller

gracemmuiruri

Get to know the seller

gracemmuiruri Teachme2-tutor

View profile

Sold

Member since

11 months

Number of followers

Documents

Last sold

0.0

0 reviews

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their exams and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can immediately select a different document that better matches what you need.

Pay how you prefer, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card or EFT and download your PDF document instantly.

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying this summary from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller gracemmuiruri. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy this summary for $9.89. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews) 45804 documents were sold in the last 30 days Founded in 2010, the go-to place to buy summaries for 15 years now

Summary data science

Written for

Document information

Subjects

Content preview

Get to know the seller

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Didn't get what you expected? Choose another document

Pay how you prefer, start learning right away

Frequently asked questions

What do I get when I buy this document?

Satisfaction guarantee: how does it work?

Who am I buying this summary from?

Will I be stuck with a subscription?

Can Stuvia be trusted?