Exam Questions and CORRECT Answers
Data Transformation - CORRECT ANSWER - Data mapping: converting data from one
format to another.
Data deduplication: eliminating repeated or redundant data.
Derived variables: creating new variables from existing ones.
Data sorting or ordering: arranging data in a specific sequence.
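The four transformation types above can be sketched on a tiny dataset. This is a minimal illustration in plain Python; the records, field names, and the BMI derived variable are assumptions made for the example, not part of the exam material.

```python
# Hypothetical records; names and fields are illustrative only.
records = [
    {"name": "Ana", "height_cm": 170, "weight_kg": 65},
    {"name": "Ben", "height_cm": 180, "weight_kg": 90},
    {"name": "Ana", "height_cm": 170, "weight_kg": 65},  # exact duplicate
]

# Data mapping: convert each dict into a (name, height_m, weight_kg) tuple.
mapped = [(r["name"], r["height_cm"] / 100, r["weight_kg"]) for r in records]

# Data deduplication: keep only the first occurrence of each tuple.
deduped = list(dict.fromkeys(mapped))

# Derived variable: BMI computed from the existing weight and height columns.
with_bmi = [(n, h, w, round(w / h ** 2, 1)) for n, h, w in deduped]

# Data sorting: order rows by BMI, highest first.
ordered = sorted(with_bmi, key=lambda row: row[3], reverse=True)

for row in ordered:
    print(row)
```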
Data Transformation occurs in the ______ _____________ phase, the role this applies to is the
_____ A___ - CORRECT ANSWER - Preparation phase, Data Analyst
____________ is great at combining unstructured data feeds from multiple sources. - CORRECT
ANSWER - Hadoop
Examples of when to use _____: stream processing, fraud detection and prevention, content
management, risk management. - CORRECT ANSWER - Hadoop
Setting up a sandbox, extracting and transforming data, conditioning data, and exploring it
visually occur in the ______ ____________ phase - CORRECT ANSWER - Data Preparation
When you convert a Microsoft Word file to a PDF, for example, you are ________ data -
CORRECT ANSWER - transforming
Running a Linux virtual machine on a Windows operating system is an example of ---------- -
CORRECT ANSWER - sandboxing
Some key features of an Analytical Sandbox may include tools and features for c---------- and
s---ing work with colleagues, flexibility to allow analysts to try out different analytical
approaches and techniques, and clear documentation and support resources to help analysts get
up to speed quickly. - CORRECT ANSWER - collaboration, sharing
Why is it important to collect data in a certain time frame? - CORRECT ANSWER -
It gives more precise findings than working with an open-ended timeframe.
___ testing works by randomly showing two versions of the same asset (ad, website, pop-up,
offer, etc.) to different users - CORRECT ANSWER - A/B
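The random assignment step described above can be sketched as follows. This is only an illustration: the user IDs, the fixed seed, and the 50/50 split are assumptions, and a real test would also record outcomes for each variant before comparing them.

```python
import random

# Fixed seed so the sketch is reproducible; a live system would not fix it.
rng = random.Random(42)

# Hypothetical visitors.
users = [f"user{i}" for i in range(1000)]

# Each user is randomly shown version A or version B of the same asset.
assignment = {user: rng.choice(["A", "B"]) for user in users}

# Count how many users saw each version.
counts = {"A": 0, "B": 0}
for variant in assignment.values():
    counts[variant] += 1

print(counts)
```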
What does it mean for a dependent variable to be binary?
(this always applies in logistic regression) - CORRECT ANSWER - A binary variable is
a categorical variable that can take only one of two values, usually represented as a Boolean
(True or False) or an integer (0 or 1). Examples: yes or no, sick or not sick, obese or
not obese. Which value it takes depends on the independent variable.
Use ______ Analysis when you're looking to segment or categorize a dataset into groups based on
similarities, but aren't sure what those groups should be. - CORRECT ANSWER - Cluster
Analysis
Preprocessing (of data) - CORRECT ANSWER - the process of transforming raw data into
an understandable format
Bounce Rate - CORRECT ANSWER - the percentage of visitors to a particular website
who navigate away from the site after viewing only one page.
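The bounce rate definition above translates directly into a calculation. A minimal sketch, assuming each session is summarised by the number of pages viewed; the function name and input format are hypothetical.

```python
def bounce_rate(pages_per_session):
    """Return the percentage of sessions that viewed exactly one page."""
    if not pages_per_session:
        raise ValueError("at least one session is required")
    bounces = sum(1 for pages in pages_per_session if pages == 1)
    return 100 * bounces / len(pages_per_session)


# Four hypothetical sessions, two of which left after a single page.
print(bounce_rate([1, 3, 1, 5]))
```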
Logistic Regression - CORRECT ANSWER - A statistical analysis which determines an
individual's risk of the outcome as a function of a risk factor. The outcome of interest has two
categories (yes or no, obese or not obese, at risk of cancer or not at risk of cancer, happens or
does not happen, etc.).
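To make "risk of the outcome as a function of a risk factor" concrete, the sketch below evaluates the logistic (sigmoid) function for one risk factor. The intercept and slope values are made-up assumptions for illustration; in practice they would be estimated from data.

```python
import math


def predicted_risk(x, intercept, slope):
    """Probability of the 'yes' outcome for risk-factor value x."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))


# Hypothetical coefficients: risk rises as the risk factor increases.
for x in (0, 1, 2, 3):
    print(x, round(predicted_risk(x, intercept=-2.0, slope=1.0), 3))
```

The output is always between 0 and 1, which is why this model suits an outcome with two categories.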
K-means clustering - CORRECT ANSWER - Informally, the goal is to find groups of points
that are close to each other but far from points in other groups.
• Each cluster is defined entirely and only by its centre, or mean value µ_k
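The idea that each cluster is defined only by its centre can be sketched with a few lines of plain Python. This is a simplified version of the usual assign-then-update loop, assuming 2-D points, hand-picked starting centres, and a fixed number of iterations; the data are invented for the example.

```python
def kmeans(points, centres, iterations=10):
    """Alternate between assigning points to the nearest centre and
    moving each centre to the mean of its assigned points."""
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            # Index of the centre with the smallest squared distance to p.
            nearest = min(
                range(len(centres)),
                key=lambda k: (p[0] - centres[k][0]) ** 2
                + (p[1] - centres[k][1]) ** 2,
            )
            clusters[nearest].append(p)
        # New centre = mean of the cluster; keep the old one if it is empty.
        centres = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c
            else centres[k]
            for k, c in enumerate(clusters)
        ]
    return centres, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centres, clusters = kmeans(points, centres=[(0, 0), (10, 10)])
print(centres)
```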
Random Forest - CORRECT ANSWER - An algorithm used for regression or
classification that uses a collection of decision trees; the trees "vote" on the best prediction.
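The voting step alone can be sketched as a majority vote over the labels predicted by individual trees. This is not a full random forest (no trees are trained here); the function name and the per-tree predictions are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from three trees for one patient.
print(majority_vote(["malaria", "fever", "fever"]))
```

For regression, the usual counterpart is to average the trees' numeric predictions instead of counting votes.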
Examples of when to use Random Forest - CORRECT ANSWER - In healthcare: to identify the
correct combination of components in medicine and to analyze a patient's medical history to
identify diseases (for example, predicting whether a person's symptoms are more closely tied
to malaria or a simple fever, or to a cold or a sinus infection).
Centroid Clustering - CORRECT ANSWER - Clusters are represented by their centroids:
hierarchical clustering with cluster distance defined by a centroid (assigned center).
___ has many applications in diverse fields such as face recognition, computer vision, image
compression, bioinformatics, and fraud detection. - CORRECT ANSWER - PCA
Density Clustering - CORRECT ANSWER - detecting areas where points are concentrated
and where they are separated by areas that are empty or sparse.
Data Wrangling is - CORRECT ANSWER - the process of removing errors and
combining complex data sets to make them more accessible and easier to analyze.
Data Wrangling Examples - CORRECT ANSWER - Merging several data sources into
one data-set for analysis.
Identifying gaps or empty cells in data and either filling or removing them.
Deleting irrelevant or unnecessary data.
Identifying severe outliers in data and either explaining the inconsistencies or deleting them to
facilitate analysis.
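The wrangling steps listed above can be sketched end to end on two small sources. Everything here is an assumption made for illustration: the records, the choice to fill gaps with the median, and the 0 to 120 plausibility range used to flag severe outliers.

```python
from statistics import median

# Two hypothetical data sources with the same columns.
source_a = [{"id": 1, "age": 34, "note": "x"}, {"id": 2, "age": None, "note": "y"}]
source_b = [{"id": 3, "age": 29, "note": "z"}, {"id": 4, "age": 410, "note": "w"}]

# Merge several data sources into one data set.
merged = source_a + source_b

# Delete irrelevant data: drop the free-text "note" column.
trimmed = [{k: v for k, v in row.items() if k != "note"} for row in merged]

# Identify gaps and fill them (here, with the median of the known ages).
known_ages = [row["age"] for row in trimmed if row["age"] is not None]
fill_value = median(known_ages)
filled = [
    {**row, "age": fill_value if row["age"] is None else row["age"]}
    for row in trimmed
]

# Identify severe outliers and remove them (assumed plausible range).
cleaned = [row for row in filled if 0 <= row["age"] <= 120]

print(cleaned)
```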
Maintaining Databases examples/purpose - CORRECT ANSWER - Routines meant to
help performance, free up disk space, check for data errors, check for hardware faults, update
internal statistics, and many other obscure (but important) things.
Maintaining DB2® and Oracle databases involves updating statistics, monitoring database,
server, and space utilization, and planning backup and recovery strategies. - an example of ? -
CORRECT ANSWER - Maintaining databases
Project initiation falls under what role? - CORRECT ANSWER - Project Sponsor
How to find p-value? - CORRECT ANSWER - - Look at the alternative hypothesis.
- If it is "less than", find the area to the left of the z value.
- If it is "greater than", find the area to the right of the z value.
- If it is "not equal to": for a positive z value, find the area to the right and double it;
for a negative z value, find the area to the left and double it.
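The rules above can be sketched with the standard normal distribution from Python's standard library. The function name and the labels "less", "greater", and "two-sided" are assumptions chosen for this example.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value for a z statistic under the standard normal distribution."""
    cdf = NormalDist().cdf  # area to the left of a z value
    if alternative == "less":
        return cdf(z)
    if alternative == "greater":
        return 1 - cdf(z)
    if alternative == "two-sided":
        # By symmetry, doubling the tail beyond |z| covers both the
        # positive case (right tail) and the negative case (left tail).
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


print(round(p_value(1.96, "two-sided"), 3))
```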
Creating tables and establishing relationships between those tables according to rules designed
both to protect the data and to make the database more flexible by eliminating redundancy and
inconsistent dependency. - CORRECT ANSWER - normalization
OpenLayers - CORRECT ANSWER -
MATLAB - CORRECT ANSWER - The Matrix Laboratory language is used for a variety
of mathematical calculations and tasks
Ruby - CORRECT ANSWER -