SOLUTIONS!!
What are the 3 types of big data? correct answers structured, unstructured, semi-structured
What percent of data is structured? correct answers 5-10%
What percent of data is unstructured? correct answers 80%
What percent of data is semi-structured? correct answers 5-10%
What type of data is email? correct answers unstructured
What is structured data? correct answers Data that can be stored and processed in a fixed format.
Highly organized information that is readily accessible
An Excel spreadsheet would be which type of data? correct answers Structured, due to organized
rows and columns
What are the 7 steps of the Data Science Lifecycle? correct answers Business Understanding,
Data Mining, Data Cleaning, Data Exploration, Feature Engineering, Predictive Modeling, Data
Visualization
What is Business Understanding? correct answers Asking relevant questions and defining
objectives for the problem that needs to be solved.
What is Data Mining? correct answers "mining the data"
-gathering and scoping the data necessary for the project
What is Data Cleaning? correct answers "cleaning the data"
-fixing the inconsistencies in the data and handling missing values
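Example (a minimal sketch, not part of the original answers): the two cleaning tasks named above, fixing inconsistencies and handling missing values, could look like this in plain Python. The records, the field names, and the choice to fill a missing value with the mean are all assumptions made for illustration.

```python
# Hypothetical raw records: inconsistent casing/whitespace and one missing value.
raw_rows = [
    {"city": " Boston ", "temp_f": "71"},
    {"city": "boston", "temp_f": None},
    {"city": "CHICAGO", "temp_f": "65"},
]


def clean_rows(rows):
    """Normalize city names and fill missing temperatures with the mean."""
    known = [float(r["temp_f"]) for r in rows if r["temp_f"] is not None]
    mean_temp = sum(known) / len(known)
    cleaned = []
    for r in rows:
        # Fix inconsistencies: trim whitespace and use one casing convention.
        city = r["city"].strip().title()
        # Handle missing values: fall back to the mean of the known values.
        temp = float(r["temp_f"]) if r["temp_f"] is not None else mean_temp
        cleaned.append({"city": city, "temp_f": temp})
    return cleaned


cleaned_rows = clean_rows(raw_rows)
```

Filling with the mean is only one option; dropping the row or flagging it for review may suit other data better.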
What is Data Exploration? correct answers "exploring the data"
-forming a hypothesis about your defined problem by visually analyzing the data
What is Feature Engineering? correct answers Selecting important features & constructing more
meaningful ones using the raw data you have.
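Example (a minimal sketch, not part of the original answers): constructing more meaningful features from raw data might mean deriving an order total and a weekend flag from raw order fields. The order records and the feature names `total` and `is_weekend` are hypothetical.

```python
from datetime import date

# Hypothetical raw orders with only the fields that were collected.
raw_orders = [
    {"order_date": date(2024, 3, 2), "price": 20.0, "quantity": 3},
    {"order_date": date(2024, 3, 6), "price": 5.0, "quantity": 10},
]


def add_features(order):
    """Return a copy of the order with two constructed features added."""
    features = dict(order)
    # Combine two raw columns into one that is more meaningful.
    features["total"] = order["price"] * order["quantity"]
    # weekday() returns 0 for Monday through 6 for Sunday.
    features["is_weekend"] = order["order_date"].weekday() >= 5
    return features


engineered = [add_features(o) for o in raw_orders]
```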
What is Predictive Modeling? correct answers -"using models to make predictions"
-train machine learning models, evaluate their performance and use them to make predictions.
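Example (a minimal sketch, not part of the original answers): the train, evaluate, predict sequence can be illustrated with a one-variable least-squares line in plain Python. The data points are invented, and a straight-line fit stands in for whatever machine learning model a real project would use.

```python
def fit_line(xs, ys):
    """Train: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict: apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Evaluate: average absolute gap between predictions and true values."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data, split into a training set and a held-out test set.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
test_x, test_y = [5.0, 6.0], [11.0, 13.0]

model = fit_line(train_x, train_y)
test_error = mean_absolute_error(model, test_x, test_y)
forecast = predict(model, 10.0)
```

Evaluating on data the model did not see during training is what makes the error estimate meaningful.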
What is Data Visualization? correct answers "visualizing the data"
-communicate the findings to key stakeholders using plots and interactive visualizations.
What is data science? correct answers The practice of mining large sets of raw data to
identify patterns and get insight from them.
-raw data is both structured and unstructured
What is AI? correct answers Getting a computer to mimic human behavior in some way.
What is machine learning? correct answers subset of AI
-giving the computer a brain to learn from data
-helps computers figure things out from data and deliver AI applications
What is one of the most common tools data scientists use? correct answers open source
notebooks: web applications for writing & running code and visualizing data all in one place
ex: Jupyter, RStudio, & Zeppelin
Who oversees the data science process? correct answers Business managers, IT managers, and Data
Science managers
What is the contribution of the business manager? correct answers develop the problem and
develop a strategy of analysis
What is the contribution of the IT manager? correct answers may help build and update IT
environments for data science teams
-monitor operations and resource usage
What is the contribution of the data science manager? correct answers oversees the data science team
-team builders who balance team development with project planning and monitoring
Who is the most important role in the data science process? correct answers The data scientist.
What is secondary use? correct answers Information collected for one purpose used for another
How is Google's Personalized Search an example of secondary use? correct answers The
information Google collects about your search queries and visited Web pages can be used by
companies/retailers for direct marketing.
What is collaborative filtering? correct answers Analyzing information about preferences of
large numbers of people to predict what one person may prefer.
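Example (a minimal sketch, not part of the original answers): one common way to do this is user-based collaborative filtering, where a person's unknown rating is predicted as a similarity-weighted average of other people's ratings. The users, movie titles, ratings, and the choice of cosine similarity are all assumptions for illustration.

```python
from math import sqrt

# Hypothetical star ratings (1-5); "ana" has not rated "Dune".
ratings = {
    "ana": {"Alien": 5, "Brazil": 4, "Clue": 1},
    "ben": {"Alien": 5, "Brazil": 5, "Clue": 1, "Dune": 5},
    "cara": {"Alien": 1, "Brazil": 2, "Clue": 5, "Dune": 1},
}


def similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item, all_ratings):
    """Weight each other user's rating of the item by similarity to `user`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in all_ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = similarity(all_ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[item]
        weight_total += weight
    # No usable neighbors means no prediction can be made.
    return weighted_sum / weight_total if weight_total else None


predicted = predict_rating("ana", "Dune", ratings)
```

Because ana's tastes track ben's more closely than cara's, the prediction lands nearer to ben's rating of Dune than to cara's.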
What are the explicit and implicit methods of collaborative filtering? correct answers Explicit
filtering- asking people to rank their preferences
ex: asking them to give stars to a movie
Implicit filtering- keeping track of purchases made by consumers, without asking them directly
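Example (a minimal sketch, not part of the original answers): the difference between the two methods is where the preference signal comes from. Here explicit data is a star rating the person entered, while implicit data is inferred by counting purchases. The names, items, and the use of purchase counts as a preference score are assumptions.

```python
from collections import Counter

# Explicit: the person was asked and answered directly (hypothetical stars).
explicit_ratings = {("ana", "Alien"): 5, ("ana", "Clue"): 2}

# Implicit: nobody was asked; behavior is simply recorded (hypothetical log).
purchase_log = [("ana", "Dune"), ("ana", "Dune"), ("ana", "Brazil")]


def implicit_scores(log):
    """Treat the number of purchases of an item as a preference score."""
    return Counter(log)


def favorite(user, scores):
    """Return the item with the highest score for this user, if any."""
    own = {item: s for (u, item), s in scores.items() if u == user}
    return max(own, key=own.get) if own else None


implicit = implicit_scores(purchase_log)
explicit_favorite = favorite("ana", explicit_ratings)
implicit_favorite = favorite("ana", implicit)
```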