Exam (elaborations)

ASU CSE 578 Midterm and Final Prep & Subjects from 2021

Pages: 16
Grade: A
Uploaded on: 12-01-2022
Written in: 2020/2021

The document contains preparation material for the CSE 578 Data Visualization midterm and final exams. It covers the main definitions, formulas and plots needed for the exams. It also contains the exam subjects (questions) given in the Data Visualization midterm and final in the 2020 and 2021 academic years, about 80% of them with answers. My personal score was 90% on the midterm and 97.5% on the final.


ASU CSE 578 Data Visualization


Midterm Exam Preparation, Topics and Subjects from previous terms 2020/2021




Final Exam Preparation, Topics and Subjects from previous terms 2020/2021





CSE 578 Data Visualization: Midterm Recap, Topics and Subjects
I. Intro To Data Visualization (Week 1)

1) What is visualization
1. What is visualization?
2. Key purposes of visualization

2) Data Processing vs. Querying vs. Exploration
Def. awareness
Def. understanding
Difference between the following: Data Processing ->, Querying ->, Navigation ->, Exploration ->
Exploratory search steps: Analysis, Comparison, Aggregation, Transformation, Visualization

3) Intro To Data Exploration
Data challenges: Imprecision, Noise, Sparsity (too little data) = INS; the 3Vs: Volume, Velocity (speed), Variety (heterogeneity); HMLE: High-dimensional, Multi-modal, inter-Linked, Evolving
- Data management/mining techniques are needed to support scalable, real-time analysis & exploration
- Most data in the world are imprecise, multi-modal, subjective
- Data exploration systems need to support both effective data manipulation (filtering, integration/join, set operations: union & intersection) and effective data analysis/retrieval (feature extraction, similarity search, clustering (partitioning), aggregation, classification, preference-driven retrieval)
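To make the manipulation operations concrete, here is a minimal sketch in plain Python of filtering, integration via a join, and set operations (union, intersection). The tables, field names and values are hypothetical, and the naive nested-loop join is only for illustration, not how a DBMS would evaluate it.

```python
# Hypothetical toy tables; names and values are made up for illustration.
papers = [
    {"id": 1, "venue": "VIS", "year": 2020},
    {"id": 2, "venue": "CHI", "year": 2021},
    {"id": 3, "venue": "VIS", "year": 2021},
]
authors = [
    {"paper_id": 1, "name": "Ada"},
    {"paper_id": 2, "name": "Bob"},
    {"paper_id": 3, "name": "Ada"},
]


def filter_rows(rows, predicate):
    """Filtering: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def equi_join(left, right, left_key, right_key):
    """Integration: naive nested-loop equi-join that merges matching rows."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if lrow[left_key] == rrow[right_key]
    ]


vis_papers = filter_rows(papers, lambda row: row["venue"] == "VIS")
vis_with_authors = equi_join(vis_papers, authors, "id", "paper_id")

# Set operations on paper ids.
ids_2021 = {row["id"] for row in papers if row["year"] == 2021}
ids_vis = {row["id"] for row in vis_papers}
union_ids = ids_2021 | ids_vis
common_ids = ids_2021 & ids_vis
print(union_ids, common_ids)
```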

II. Intro To Data Exploration Components (Week 2)

1) Data Organization

What is a database? -> a collection of data organized in some fashion
What is a data model? -> a formalism used to describe how data is organized (hierarchical, relational, OO, spatial, fuzzy) and to express the constraints that describe properties of the data
What is a data schema? -> a set of constraints that describe properties of the data and its structure; a schema enables validation & efficient storage of data, and enables querying & retrieval of data (comparison, indexing, query optimization, query processing)
Levels of data organization -> structured data/databases, semi-structured data/databases,
unstructured data/databases
Structured data/databases: the data are well-structured & organized (the schema describes this structure, the DBMS enforces it)
▪ Advantages: data organization is predictable, so the data are easier to query, optimize and explore
▪ Ex. relational data model: data is organized in tabular form; the schema of each table consists of attributes and functional dependencies, which describe the relationships among the attributes in the schema (e.g., a key uniquely identifies a tuple)
Semi-structured data: the constraints that reflect the structure of the data are flexible, and the data is self-describing: each item in the database describes its own schema
▪ Advantages: data organization is flexible/malleable, easier to integrate & exchange
▪ Ex. books from different sources that do not share the same structure can still be stored together as semi-structured data
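A small sketch contrasting the two levels, assuming made-up book records: in the structured case every tuple follows one fixed schema and a key check enforces uniqueness; in the semi-structured case each record carries its own field names (JSON-style), so books with different structures can sit side by side. All names and values here are hypothetical.

```python
import json

# Structured: one fixed schema for every tuple; "isbn" plays the role of the key.
schema = ("isbn", "title", "year")
rows = [
    ("isbn-001", "Book A", 2019),
    ("isbn-002", "Book B", 2021),
]


def key_is_unique(rows, key_index=0):
    """A key must uniquely identify each tuple."""
    keys = [row[key_index] for row in rows]
    return len(keys) == len(set(keys))


# Semi-structured: each record is self-describing (it carries its own field names),
# so records from different sources do not need to share one structure.
books = [
    {"title": "Book A", "year": 2019, "authors": ["X", "Y"]},
    {"title": "Book C", "publisher": {"name": "P", "city": "Tempe"}},
]
print(json.dumps(books, indent=2))
fields_per_record = [sorted(book.keys()) for book in books]
```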

2) Vector Data

- images, videos, social networks, books, sensor readings

3 & 4) Vector spaces

- What are good features to use as basis vectors, and how many of them should we use?
- Curse of dimensionality: the higher the dimensionality, the harder the data is to analyze and the less efficient & effective search and analysis become
- efficiency: tree-based search data structures are not efficient at high dimensions
- effectiveness: the more dimensions we have, the more data we need to discover patterns (and to prevent overfitting)
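One way to see the curse of dimensionality is a small experiment (not from these notes): draw random points in the unit cube and compare the farthest and nearest distances from a random query. The relative contrast (max - min) / min shrinks as the dimension grows, which is one reason distance-based pruning in index trees weakens. The point count and seed are arbitrary choices; `math.dist` needs Python 3.8+.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min Euclidean distance from a random query to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    # The contrast drops sharply: nearest and farthest neighbors look alike in high dimensions.
    print(d, round(relative_contrast(d), 3))
```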

Def. linear independence & basis:
a) The vectors in V = {v1, v2, v3, …, vn} are linearly independent if they are non-redundant, i.e., no vector in V can be written as a linear combination of the others
b) The linearly independent set V is said to be a basis for a space S if every vector u in S can be written as a linear combination of the vectors in V; this property is called completeness
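Both conditions can be checked through the rank of the vectors: they are independent when the rank equals their count, and (taking S to be R^dim, an assumption for this sketch) they are complete when the rank equals dim. The helper names below are hypothetical; exact `Fraction` arithmetic avoids floating-point round-off in Gaussian elimination.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a set of vectors (as rows) via Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """Non-redundant iff no vector is a combination of the others, i.e. rank == count."""
    return rank(vectors) == len(vectors)


def is_basis(vectors, dim):
    """Basis of R^dim: linearly independent and complete (rank equals dim)."""
    return linearly_independent(vectors) and rank(vectors) == dim
```

For example, `[[1, 2], [2, 4]]` is redundant (the second vector is twice the first), while `[[1, 0, 0], [0, 1, 0]]` is independent but not complete in R^3.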

5) Norms

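- The most used family of length measurements = p-norms: $L_p(\vec a, \vec b) = \left(\sum_{i=1}^{d} |a_i - b_i|^p\right)^{1/p}$; for 2-D points $(x_1, y_1)$ and $(x_2, y_2)$:
- 1-norm: Manhattan distance, city-block distance, L1 distance: $|x_1 - x_2| + |y_1 - y_2|$
- 2-norm: Euclidean distance, L2 distance: $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
- ∞-norm, L∞ distance: $\max(|x_1 - x_2|, |y_1 - y_2|)$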
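A minimal sketch of the p-norm family as one function; the name `p_distance` is hypothetical, and passing `math.inf` selects the L∞ (max) case.

```python
import math


def p_distance(a, b, p):
    """L_p distance between equal-length vectors; p = math.inf gives the L-infinity (max) distance."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


a, b = (1, 2), (4, 6)
print(p_distance(a, b, 1))         # Manhattan: 3 + 4 -> 7.0
print(p_distance(a, b, 2))         # Euclidean: sqrt(9 + 16) -> 5.0
print(p_distance(a, b, math.inf))  # L-infinity: max(3, 4) -> 4
```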

6) Vector distance measures

- A metric distance must satisfy self-minimality, minimality, symmetry and the triangle inequality (the triangle inequality enables effective pruning of the search space during retrieval)
- p-norm distances (p ≥ 1) are metrics
- Intersection similarity:
$$sim_{int}(\vec a, \vec b) = \frac{\sum_{i=1}^{d} \min(a_i, b_i)}{\sum_{i=1}^{d} \max(a_i, b_i)}$$
o sim ≈ 1 if a, b are similar
o sim ≈ 0 if a, b are not similar

- Angle-based measures:
o cosine similarity: $\cos(\vec a, \vec b) = \dfrac{\vec a \cdot \vec b}{|\vec a| \, |\vec b|}$
o dot product similarity: $\vec a \cdot \vec b = \sum_{i=1}^{d} a_i b_i$

- Cosine and dot product are the same if vectors are unit length
- Other:
Pearson's correlation (similarity measure): linear correlation among the corresponding components of 2 vectors; KL divergence (distance measure): how one (distribution) vector diverges from the other, and it is not symmetric; earth mover's distance: the minimum cost of transforming one (distribution) vector into the other
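The measures above can be sketched in a few lines of Python. The function names are hypothetical; `intersection_similarity` assumes non-negative components with at least one non-zero value (e.g., histograms), and `kl_divergence` assumes discrete distributions with q_i > 0 wherever p_i > 0. Pearson's correlation is computed as the cosine similarity of the mean-centered vectors, which is an equivalent formulation.

```python
import math


def intersection_similarity(a, b):
    """sum_i min(a_i, b_i) / sum_i max(a_i, b_i) for non-negative vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den


def dot(a, b):
    """Dot product similarity: sum_i a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def pearson(a, b):
    """Pearson correlation = cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


u, v = [0.6, 0.8], [0.8, 0.6]  # both have unit length
print(cosine_similarity(u, v), dot(u, v))  # equal up to float round-off
print(kl_divergence([0.9, 0.1], [0.5, 0.5]), kl_divergence([0.5, 0.5], [0.9, 0.1]))  # not symmetric
```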

7) Strings & sequences

Edit distance between 2 sequences: the minimum number of edit operations (insertions, deletions, replacements) needed to convert one sequence into the other.
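The standard way to compute this minimum is dynamic programming over prefixes (the Levenshtein recurrence with unit costs). The sketch below keeps only one previous row, and works on strings or any other sequences; the function name is hypothetical.

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions and replacements turning s into t."""
    prev = list(range(len(t) + 1))  # distance from the empty prefix of s to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty sequence takes i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # replace (free if the elements match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```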
- Time series matching with synchronous (non-elastic) distance & similarity measures: Euclidean distance
- Asynchrony in time series requires elastic distance & similarity measures: Edit Distance, Dynamic Time Warping, feature-based alignment
- Motifs: frequently repeating patterns in a time series; they can also occur in multivariate time series
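The difference between synchronous and elastic matching shows up on two series with the same shape shifted by one step: Euclidean distance compares point i only with point i and reports a mismatch, while dynamic time warping can stretch the time axis and align them. This sketch uses |a - b| as the local cost and no warping window; both are choices for illustration (DTW variants also use squared differences or band constraints).

```python
import math


def euclidean(x, y):
    """Synchronous (non-elastic) comparison: point i is matched only with point i."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Dynamic time warping distance with |a - b| local cost and no warping window."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: repeat y[j-1], repeat x[i-1], or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]  # same shape, one step earlier
print(euclidean(x, y), dtw(x, y))  # 2.0 0.0
```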

III. Exploratory Querying & Visual Variables used in Data Exploration & Visualization (Week 3)

1) Data Processing vs Querying vs Exploration

• Similarity queries/ranked queries
• Drill-down/Roll-up
• Summarization/aggregation (e.g., MAX)
• Aggregate/iceberg queries
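A toy sketch of these query types over a hypothetical sales table (all records made up): roll-up sums amounts at the region level, drill-down goes back to (region, city), MAX summarizes, an iceberg query keeps only the groups whose count reaches a threshold (like `GROUP BY ... HAVING COUNT(*) >= k` in SQL), and a ranked similarity query returns the k records closest to a target value. A real system would run these in a database or OLAP engine; this only illustrates the semantics.

```python
from collections import Counter, defaultdict

# Hypothetical records: (region, city, product, amount)
sales = [
    ("West", "Phoenix", "A", 10),
    ("West", "Phoenix", "B", 5),
    ("West", "Tempe", "A", 7),
    ("East", "Boston", "A", 3),
    ("East", "Boston", "C", 8),
]


def roll_up(records, level):
    """SUM of amounts grouped by the first level + 1 fields (0 = region, 1 = region + city)."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[: level + 1]] += rec[3]
    return dict(totals)


def iceberg(records, min_count):
    """Iceberg query: products whose number of records reaches the threshold."""
    counts = Counter(rec[2] for rec in records)
    return {product: c for product, c in counts.items() if c >= min_count}


def top_k_similar(records, target_amount, k):
    """Ranked similarity query: the k records whose amount is closest to the target."""
    return sorted(records, key=lambda rec: abs(rec[3] - target_amount))[:k]


region_totals = roll_up(sales, 0)    # roll-up to regions
city_totals = roll_up(sales, 1)      # drill-down to (region, city)
max_sale = max(rec[3] for rec in sales)  # summarization with MAX
print(region_totals, iceberg(sales, 2), top_k_similar(sales, 6, 2), max_sale)
```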
