100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached 4.2 TrustPilot
logo-home
Summary

Summary data science

Rating
-
Sold
-
Pages
5
Uploaded on
17-01-2025
Written in
2024/2025

An SQL guide is a comprehensive resource designed to teach users how to use SQL (Structured Query Language) for managing and manipulating relational databases. It typically covers the syntax, commands, and best practices for querying, updating, inserting, and deleting data in a database. Key topics included in an SQL guide often include: Basic SQL Commands: Introduction to fundamental SQL commands like SELECT, INSERT, UPDATE, and DELETE. Filtering and Sorting: Techniques for filtering data using WHERE conditions and sorting results using ORDER BY. Joins and Relationships: Explanation of different types of joins (INNER JOIN, LEFT JOIN, RIGHT JOIN, etc.) to combine data from multiple tables based on relationships. Aggregating Data: Use of aggregate functions such as COUNT(), SUM(), AVG(), MAX(), and MIN() to summarize data. Subqueries: Writing nested queries to perform more complex operations. Grouping Data: Using GROUP BY to group rows based on shared column values and applying aggregate functions to groups. Database Design: Overview of how to create, modify, and delete database tables, as well as setting primary and foreign keys to ensure data integrity. Data Constraints: Explanation of constraints such as NOT NULL, UNIQUE, CHECK, and FOREIGN KEY for ensuring data accuracy.

Show more Read less









Whoops! We can’t load your doc right now. Try again or contact support.

Document information

Uploaded on
January 17, 2025
Number of pages
5
Written in
2024/2025
Type
Summary

Subjects

Content preview

Data Science
15.003 Software Tools — Data Science Afshine Amidi & Shervine Amidi



Study Guide: Data Retrieval with SQL Category Operator Command
Equality / non-equality = / !=, <>
Inequalities >=, >, <, <=
Afshine Amidi and Shervine Amidi Belonging IN (val_1, ..., val_n)
General
And / or AND / OR
August 21, 2020 Check for missing value IS NULL
Between bounds BETWEEN val_1 AND val_2




e
Strings Pattern matching LIKE ’%val%’
General concepts
r Structured Query Language – Structured Query Language, abbreviated as SQL, is a




id
language that is largely used in the industry to query data from databases. r Joins – Two tables table_1 and table_2 can be joined in the following way:

r Query structure – Queries are usually structured as follows: SQL
...
SQL FROM table_1 t1




gu
-- Select fields.....................mandatory type_of_join table_2 t2
SELECT ..ON (t2.key = t1.key)
....col_1,
....col_2, ...
........ ,
....col_n where the different type_of_join commands are summarized in the table below:
-- Source of data....................mandatory
FROM table t Type of join Illustration




n
-- Gather info from other sources....optional
JOIN other_table ot INNER JOIN
..ON (t.key = ot.key)

-- Conditions........................optional
WHERE some_condition(s)
-- Aggregating.......................optional
GROUP BY column_group_list
io LEFT JOIN
is
-- Sorting values....................optional
ORDER BY column_order_list
RIGHT JOIN
-- Restricting aggregated values.....optional
HAVING some_condition(s)
v

-- Limiting number of rows...........optional
LIMIT some_value FULL JOIN
re


Remark: the SELECT DISTINCT command can be used to ensure not having duplicate rows.

r Condition – A condition is of the following format: Remark: joining every row of table 1 with every row of table 2 can be done with the CROSS JOIN
command, and is commonly known as the cartesian product.
SQL
some_col some_operator some_col_or_value
Aggregations
where some_operator can be among the following common operations: r Grouping data – Aggregate metrics are computed on grouped data in the following way:


Massachusetts Institute of Technology 1 https://www.mit.edu/~amidi

, 15.003 Software Tools — Data Science Afshine Amidi & Shervine Amidi


WHERE HAVING
- Filter condition applies to individual rows - Filter condition applies to aggregates
- Statement placed right after FROM - Statement placed right after GROUP BY

Remark: if WHERE and HAVING are both in the same query, WHERE will be executed first.

The SQL command is as follows:
Window functions
SQL
r Definition – A window function computes a metric over groups and has the following struc-




e
SELECT ture:
....col_1,
....agg_function(col_2)
FROM table




id
GROUP BY col_1


r Grouping sets – The GROUPING SETS command is useful when there is a need to compute
aggregations across different dimensions at a time. Below is an example of how all aggregations




gu
across two dimensions are computed:
The SQL command is as follows:
SQL
SQL
SELECT
....col_1, some_window_function() OVER(PARTITION BY some_col ORDER BY another_col)
....col_2,
....agg_function(col_3) Remark: window functions are only allowed in the SELECT clause.
FROM table
GROUP BY (




n
r Row numbering – The table below summarizes the main commands that rank each row
..GROUPING SETS across specified groups, ordered by a specific column:
....(col_1),
....(col_2),
....(col_1, col_2)
)
io
r Aggregation functions – The table below summarizes the main aggregate functions that
can be used in an aggregation query:
Command
ROW_NUMBER()
RANK()
DENSE_RANK()
Description
Ties are given different ranks
Ties are given same rank and skip numbers
Ties are given same rank and don’t skip numbers
Example
1, 2, 3, 4
1, 2, 2, 4
1, 2, 2, 3
is
Category Operation Command
r Values – The following window functions allow to keep track of specific types of values with
Mean AVG(col) respect to the partition:
v

Percentile PERCENTILE_APPROX(col, p)
Command Description
Values Sum / # of instances SUM(col) / COUNT(col)
Takes the first value of the column
re

FIRST_VALUE(col)
Max / min MAX(col) / MIN(col)
LAST_VALUE(col) Takes the last value of the column
Variance / standard deviation VAR(col) / STDEV(col)
LAG(col, n) Takes the nth previous value of the column
Arrays Concatenate into array collect_list(col)
LEAD(col, n) Takes the nth following value of the column
Remark: the median can be computed using the PERCENTILE_APPROX function with p equal to 0.5. NTH_VALUE(col, n) Takes the nth value of the column
r Filtering – The table below highlights the differences between the WHERE and HAVING com-
mands:


Massachusetts Institute of Technology 2 https://www.mit.edu/~amidi
$9.89
Get access to the full document:

100% satisfaction guarantee
Immediately available after payment
Both online and in PDF
No strings attached

Get to know the seller
Seller avatar
gracemmuiruri

Get to know the seller

Seller avatar
gracemmuiruri Teachme2-tutor
View profile
Follow You need to be logged in order to follow users or courses
Sold
0
Member since
11 months
Number of followers
0
Documents
2
Last sold
-

0.0

0 reviews

5
0
4
0
3
0
2
0
1
0

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their exams and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can immediately select a different document that better matches what you need.

Pay how you prefer, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card or EFT and download your PDF document instantly.

Student with book image

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions