Answers With Latest Updates
Why is data visualization helpful? ANS 1. amplifies cognition
2. expands working memory
3. reduces search time
4. improves pattern detection
5. controls attention
Describe the difference between data processing and querying ANS In both instances, the user knows what
they want. The difference is that with querying they can only describe it, whereas with data processing
they actually have a way to compute it.
Describe the difference between data exploration and navigation ANS In exploration, the user does not
know what they want but wants to get an idea about the data. In navigation, the user DOES know what they
want but does not know how to describe/locate it.
What do these acronyms describe? INS, 3Vs, HMLE ANS Data challenges
What does INS stand for and mean? ANS INS is a list of data challenges: Imprecision, Noise, Sparsity
What does 3Vs stand for and mean? ANS 3Vs is a list of data challenges: Volume, Velocity, Variety
What does HMLE stand for and mean? ANS HMLE is a list of data challenges: High-dimensional, Multi-
modal, Inter-Linked, Evolving
What is a data schema? ANS A set of constraints that...
1. describe the "properties" of data
2. describe the structure of data
3. enable validation & efficient storage of data
4. enable querying and retrieval of data
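Sketch (Python): one way to picture a schema as a set of constraints used for validation. The field names, the SCHEMA layout and the validate helper are hypothetical, chosen only for illustration; real database schemas are richer than this.

```python
# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "id": (int, True),
    "name": (str, True),
    "age": (int, False),
}

def validate(record, schema=SCHEMA):
    """Return a list of constraint violations; an empty list means the record is valid."""
    errors = []
    for field, (expected_type, required) in schema.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    # Fields the schema does not describe are also reported.
    for field in record:
        if field not in schema:
            errors.append(f"unknown field: {field}")
    return errors

print(validate({"id": 1, "name": "Ada"}))
print(validate({"id": "1", "name": "Ada"}))
```

Because the structure is known in advance, a record can be checked before it is stored, and queries can rely on every stored record having the declared fields.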
Advantages of a structured database? ANS 1. Easier to query
2. Easier to optimize
3. Easier to explore
Advantages of semi-structured database? ANS Data organization is flexible/malleable (easier to integrate
and exchange).
Describe the curse of dimensionality ANS The more dimensions we have, the more data we need to
discover patterns and to avoid overfitting.
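Sketch (Python): a rough way to see why data needs grow with dimensions. Assume, purely for illustration, that each axis is split into 10 intervals and we want at least one data point per grid cell; cells_needed is a hypothetical helper.

```python
def cells_needed(bins_per_axis, dimensions):
    """Number of grid cells when every axis is split into bins_per_axis intervals."""
    return bins_per_axis ** dimensions

# The cell count (and so the data needed to cover the space) grows exponentially.
for d in (1, 2, 3, 10):
    print(d, cells_needed(10, d))
```

With a fixed amount of data, most cells stay empty as dimensions are added, which is why patterns become harder to find.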
True or False: The distance between two points is equal to the length of the distance vector. ANS True
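Sketch (Python): a small check of this statement, assuming Euclidean distance. The points p and q are made-up example values.

```python
import math

def distance(p, q):
    """Euclidean distance computed as the length of the difference vector q - p."""
    diff = [b - a for a, b in zip(p, q)]       # the distance vector
    return math.sqrt(sum(c * c for c in diff))  # its length (Euclidean norm)

p, q = (1.0, 2.0), (4.0, 6.0)
print(distance(p, q))   # difference vector is (3, 4), so this prints 5.0
print(math.dist(p, q))  # standard-library distance (Python 3.8+) for comparison
```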
Give an example of data transformation ANS A single gender column holding "M" or "F" is transformed
into two columns (one for M and one for F) holding 0s and 1s that indicate which value applies.
Which aspects of data should be handled by a scalable data exploratory system? ANS 1. The amount of
data
2. The diversity of the data types
3. The speed at which new data is generated
Give an example of prefix search ANS Find all strings that start with "tab"
• "table"; "tabular"; "tablet";...
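Sketch (Python): one possible way to run a prefix search, assuming the strings are kept in a sorted list. The prefix_search helper and the extra words are made up for illustration; a trie or a database index would be other options.

```python
from bisect import bisect_left

def prefix_search(sorted_words, prefix):
    """Return every word in a sorted list that starts with prefix."""
    # In sorted order, all words sharing a prefix sit next to each other,
    # starting at the first position where the prefix itself could be inserted.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches

words = sorted(["table", "tabular", "tablet", "tag", "stab", "tab"])
print(prefix_search(words, "tab"))
```

Note that "stab" contains "tab" but does not start with it, so it is not a match.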
Give an example of subsequence search ANS Find all strings that contain the subsequence "ark"
• "marketing"; "spark"; "quark";...
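Sketch (Python): in the examples above "ark" appears as a contiguous run of characters, which can be tested with Python's in operator. Under a stricter reading, a subsequence only has to keep its characters in order, with gaps allowed; the second helper shows that variant. Both helper names and the extra words are hypothetical.

```python
def contains_search(words, pattern):
    """Words that contain pattern as a contiguous run of characters."""
    return [w for w in words if pattern in w]

def is_subsequence(pattern, text):
    """True if the characters of pattern occur in text in order, gaps allowed."""
    remaining = iter(text)
    # Each `ch in remaining` consumes the iterator up to the first match,
    # so later characters must be found after earlier ones.
    return all(ch in remaining for ch in pattern)

words = ["marketing", "spark", "quark", "table", "karma"]
print(contains_search(words, "ark"))
print(is_subsequence("ark", "anorak"), "ark" in "anorak")
```

"anorak" illustrates the difference: a, r, k occur in order, but not next to each other.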
Give an example of subsequence match ANS - Find the longest matching subsequence between "plasticity"
and "scholastic"
- Find the most frequently repeating 3 character subsequence
• "abcbbbaabbaabcbbbaaabbc"
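Sketch (Python): both tasks, assuming "matching subsequence" means a contiguous run of characters as in the search examples above. The function names are hypothetical. The first uses standard dynamic programming for the longest common substring; the second counts every 3-character window.

```python
from collections import Counter

def longest_common_substring(a, b):
    """Longest contiguous run of characters that appears in both a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

def most_frequent_ngrams(text, n=3):
    """All length-n substrings that share the highest count, plus that count."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    if not counts:
        return set(), 0
    top = max(counts.values())
    return {gram for gram, c in counts.items() if c == top}, top

print(longest_common_substring("plasticity", "scholastic"))
print(most_frequent_ngrams("abcbbbaabbaabcbbbaaabbc"))
```

The second function returns a set because several substrings can tie for the highest count.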