and Answers
Statistical Model - Correct Answers: A statistical model is a class of mathematical model, which
embodies a set of assumptions concerning the generation of some sample data, and similar data from a
larger population. A statistical model represents, often in considerably idealized form, the data-
generating process.
The assumptions embodied by a statistical model describe a set of probability distributions, some of
which are assumed to adequately approximate the distribution from which a particular data set is
sampled. The probability distributions inherent in statistical models are what distinguishes statistical
models from other, non-statistical, mathematical models.
A statistical model is usually specified by mathematical equations that relate one or more random
variables and possibly other non-random variables. As such, "a model is a formal representation of a
theory".
All statistical hypothesis tests and all statistical estimators are derived from statistical models. More
generally, statistical models are part of the foundation of statistical inference.
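As a rough illustration of a model "specified by mathematical equations that relate one or more random
variables and possibly other non-random variables", the sketch below assumes a simple linear model,
y = intercept + slope * x + noise, with normally distributed noise. The function names (simulate, fit_line)
and all parameter values are hypothetical and chosen only for this example. The code draws data from the
assumed model and then recovers the parameters by ordinary least squares.

```python
import random


def simulate(n, intercept, slope, sigma, seed=0):
    """Draw n points from y = intercept + slope * x + noise, noise ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]  # non-random inputs, here just sampled once
    ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]  # random response
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares estimates of (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Assumed "true" parameters for the illustration only.
xs, ys = simulate(n=500, intercept=2.0, slope=0.5, sigma=1.0)
est_intercept, est_slope = fit_line(xs, ys)
print(f"estimated intercept={est_intercept:.2f}, slope={est_slope:.2f}")
```

The estimates should land close to the assumed values, but not exactly on them: the gap between the two
is the sampling variability that the model's probability distribution describes.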
Data Science - Correct Answers: Data science is an interdisciplinary field concerned with processes and
systems for extracting knowledge or insights from data in various forms, either structured or
unstructured.[1][2] It is a continuation of data analysis fields such as statistics, machine learning,
data mining, and predictive analytics,[3] and is similar to Knowledge Discovery in Databases (KDD).
Data science employs techniques and theories drawn from many fields within the broad areas of
mathematics, statistics, operations research,[4] information science, and computer science, including
signal processing, probability models, machine learning, statistical learning, data mining, databases, data
engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modeling,
data warehousing, data compression, computer programming, artificial intelligence, and high
performance computing. Methods that scale to big data are of particular interest in data science,
although the discipline is not generally considered to be restricted to such big data, and big data
technologies are often focused on organizing and preprocessing the data instead of analysis. The
development of machine learning has enhanced the growth and importance of data science.
Data science affects academic and applied research in many domains, including machine translation,
speech recognition, robotics, search engines, and the digital economy, as well as the biological sciences, medical
informatics, health care, social sciences and the humanities. It heavily influences economics, business
and finance. From the business perspective, data science is an integral part of competitive intelligence, a
newly emerging field that encompasses a number of activities, such as data mining and data analysis.[5]
Data Scientist - Correct Answers: Data scientists use their data and analytical ability to find and interpret
rich data sources; manage large amounts of data despite hardware, software, and bandwidth
constraints; merge data sources; ensure consistency of datasets; create visualizations to aid in
understanding data; build mathematical models using the data; and present and communicate the data
insights/findings. They are often expected to produce answers in days rather than months, to work by
exploratory analysis and rapid iteration, and to present results with dashboards (displays of
current values) rather than the papers/reports that statisticians normally produce.[6]
Data Visualization - Correct Answers: Data visualization or data visualisation is viewed by many
disciplines as a modern equivalent of visual communication. It involves the creation and study of the
visual representation of data, meaning "information that has been abstracted in some schematic form,
including attributes or variables for the units of information".[1]
A primary goal of data visualization is to communicate information clearly and efficiently via statistical
graphics, plots and information graphics. Numerical data may be encoded using dots, lines, or bars, to
visually communicate a quantitative message.[2] Effective visualization helps users analyze and reason
about data and evidence. It makes complex data more accessible, understandable and usable. Users
may have particular analytical tasks, such as making comparisons or understanding causality, and the
design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables
are generally used where users will look up a specific measurement, while charts of various types are
used to show patterns or relationships in the data for one or more variables.
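To make the idea of encoding numerical data as bars concrete, here is a minimal sketch that renders a
text bar chart. The function name render_bars and the sample figures are hypothetical, and the sketch
assumes all values are positive. Each bar's length is proportional to its value, so comparisons can be
read at a glance, while the number printed after the bar still supports table-style lookup.

```python
def render_bars(values, width=20, mark="#"):
    """Return one text line per label, with bar length proportional to the value.

    Assumes `values` is a non-empty dict of label -> positive number.
    """
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = mark * round(width * value / largest)  # scale to the longest bar
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return lines


# Made-up sample data for the illustration.
monthly_sales = {"Jan": 12, "Feb": 30, "Mar": 21}
for line in render_bars(monthly_sales):
    print(line)
```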
Data visualization is both an art and a science. It is viewed as a branch of descriptive statistics by some,
but also as a grounded theory development tool by others. The rate at which data is generated has
increased. Data created by internet activity and an expanding number of sensors in the environment,
such as satellites, are referred to as "Big Data". Processing, analyzing and communicating this data
present a variety of ethical and analytical challenges for data visualization. The field of data science and
practitioners called data scientists have emerged to help address these challenges.[3]
Exploratory Data Analysis - Correct Answers: In statistics, exploratory data analysis (EDA) is an approach
to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical
model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal
modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to
encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new
data collection and experiments. EDA is different from initial data analysis (IDA),[1] which focuses more
narrowly on checking assumptions required for model fitting and hypothesis testing, and handling
missing values and making transformations of variables as needed. EDA encompasses IDA.
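One common first step in summarizing a data set's main characteristics is a five-number summary with a
simple outlier check. The sketch below is only an illustration: the function name five_number_summary
and the sample values are hypothetical, the 1.5 x IQR rule is one common convention for flagging
outliers, and quartile definitions vary between tools (this one uses the "inclusive" method of
Python's statistics.quantiles).

```python
import statistics


def five_number_summary(values):
    """Return min, quartiles, max, and values outside the 1.5 * IQR fences."""
    data = sorted(values)
    # Quartiles via the standard library; other tools may use slightly different rules.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up sample with one unusually large value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A flagged value is a prompt to look more closely at the data, not proof of an error; deciding what to do
with it is part of the exploration.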
Big Data - Correct Answers: Big data is a term for data sets that are so large or complex that traditional
data processing applications are inadequate. Challenges include analysis, capture, data curation, search,
sharing, storage, transfer, visualization, querying, updating and information privacy. The term often
refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data
analytics methods that extract value from data, and seldom to a particular size of data set.[2] Accuracy
in big data may lead to more confident decision making, and better decisions can result in greater
operational efficiency, cost reduction and reduced risk.
Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime
and so on."[3] Scientists, business executives, practitioners of medicine, advertising and governments
alike regularly meet difficulties with large data sets in areas including Internet search, finance, urban
informatics, and business informatics. Scientists encounter limitations in e-Science work, including
meteorology, genomics,[4] connectomics, complex physics simulations, biology and environmental
research.[5]
Data Mining - Correct Answers: Data mining is an interdisciplinary subfield of computer science.[1][2][3]
It is the computational process of discovering patterns in large data sets involving methods at the
intersection of artificial intelligence, machine learning, statistics, and database systems.[1] The overall
goal of the data mining process is to extract information from a data set and transform it into an
understandable structure for further use.[1] Aside from the raw analysis step, it involves database and
data management aspects, data pre-processing, model and inference considerations, interestingness
metrics, complexity considerations, post-processing of discovered structures, visualization, and online
updating.[1] Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.[4]
The term is a misnomer, because the goal is the extraction of patterns and knowledge from large
amounts of data, not the extraction (mining) of data itself.[5] It also is a buzzword[6] and is frequently
applied to any form of large-scale data or information processing (collection, extraction, warehousing,
analysis, and statistics) as well as any application of computer decision support systems, including
artificial intelligence, machine learning, and business intelligence. The book Data mining: Practical
machine learning tools and techniques with Java[7] (which covers mostly machine learning material)
was originally to be named just Practical machine learning, and the term data mining was only added for
marketing reasons.[8] Often the more general terms (large scale) data analysis and analytics - or, when
referring to actual methods, artificial intelligence and machine learning - are more appropriate.
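To illustrate what "extraction of patterns" from a data set can look like in practice, the sketch below
counts how often pairs of items appear together across a set of transactions and keeps the pairs that
meet a minimum count. This is a deliberately simplified stand-in for frequent-itemset mining, not a
full algorithm; the function name frequent_pairs, the min_support parameter, and the basket data are
all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Count co-occurring item pairs and keep those seen at least min_support times."""
    counts = Counter()
    for items in transactions:
        # Sort so each pair has one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up shopping baskets for the illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]
print(frequent_pairs(baskets, min_support=3))
```

Note that the output is a pattern (which items tend to occur together), not the data itself, which is
the distinction the entry draws when it calls the term a misnomer.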