CHAPTER 1: DEFINING AND COLLECTING DATA
Statistics = the collection of methods that allow one to work with data
effectively
Statistics is a tool to obtain information from data
Provides us with a formal basis to:
● Summarise + visualise data
● Reach conclusions about the data
● Make reliable predictions about business activities
● Improve the business process
To minimise errors → use the DCOVA framework
● Define the data that you want to study to meet an objective
● Collect the data from appropriate sources
● Organise the data collected by developing tables
● Visualise the data by developing charts
● Analyse the data collected, reach conclusions, and present results
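A minimal sketch of the five DCOVA steps in Python, using only the standard library. The variable (books sold per day) and the numbers are made up for illustration and are not from the notes; a real study would collect its data from an appropriate source.

```python
from collections import Counter
from statistics import mean

# Define: the variable to study is "books sold per day" (hypothetical objective).
# Collect: made-up illustrative values standing in for data from a real source.
books_sold = [3, 5, 3, 4, 5, 5, 2]

# Organise: build a frequency table (books sold -> number of days).
frequency_table = dict(sorted(Counter(books_sold).items()))

# Visualise: a simple text bar chart, one '#' per day.
for value, days in frequency_table.items():
    print(f"{value} books | {'#' * days}")

# Analyse: summarise the data with the mean and present the result.
average_sold = mean(books_sold)
print(f"Average books sold per day: {average_sold:.2f}")
```

The text chart is only a stand-in for a proper chart; the point is that each step feeds the next one.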
BUSINESS ANALYTICS
Combine statistical methods with management science and information systems →
to form an interdisciplinary tool that supports fact-based decision making
This includes:
● Statistical methods to analyse & explore data that can uncover previously
unknown / unforeseen relationships
● Information systems methods to collect & process datasets of all sizes
○ Including very large datasets that would otherwise be hard to use
effectively
● Management science methods to develop optimisation models that support all
levels of management
○ From strategic management → daily operations
DATA SCIENCE
Data science = the field of study that combines domain expertise, programming
skills, and knowledge of mathematics and statistics to extract meaningful insights
from data.
Data science practitioners use their methods to:
● Use a wide range of tools + techniques for evaluating and preparing data
● Extract insights from data → using predictive analytics and artificial
intelligence
○ Including machine learning and deep learning models
● Write applications that automate data processing calculations
● Tell and illustrate stories that clearly convey the meaning of results to
decision-makers and stakeholders at every level of technical knowledge and
understanding
● Explain how these results can be used to solve business problems
BIG DATA
(E.G. tickets being sold for concerts with 3000 people)
Big data = a collection of data that cannot be easily browsed or analysed using
traditional methods
● Big data = collected in massive volumes, at very fast rates, and in a variety of
forms
● Might refer to large databases of structured data stored in files / worksheets
● Big data might be unstructured such that the data have an irregular pattern
and contain values that are not comprehensible without further interpretation.
○ Unstructured data could be: text, pictures, videos or audio
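A small sketch of the structured vs unstructured distinction, assuming a ticket-sales setting. The column names, the customer names, the message text, and the extraction rule are all hypothetical and chosen only for illustration.

```python
import csv
import io
import re

# Structured: every value sits in a named column, so it can be read directly.
structured_source = io.StringIO("customer,tickets\nAda,2\nGrace,4\n")
rows = list(csv.DictReader(structured_source))
total_tickets = sum(int(row["tickets"]) for row in rows)

# Unstructured: free text has no columns, so a rule must interpret it first.
message = "Hi, I'd like 3 tickets for Friday's concert please."
match = re.search(r"(\d+)\s+tickets", message)
tickets_requested = int(match.group(1)) if match else None

print(total_tickets, tickets_requested)
```

The regular expression is one simple interpretation rule; real unstructured data (pictures, video, audio) usually needs far more involved processing.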
CLASSIFYING VARIABLES BY TYPE
CATEGORICAL (qualitative) VARIABLES → take categories as their values
(e.g. “yes” , “no” , or “blue” , “brown” , “green” )
NUMERICAL (quantitative) VARIABLES → Have values that represent a counted /
measured quantity
● DISCRETE VARIABLES arise from a counting process
○ Values = countable over a finite range
○ A number of something
■ (E.G. No. of books sold)
● CONTINUOUS VARIABLES arise from a measuring process
○ Values = uncountable over a finite range
○ Takes any value in an interval
■ (E.G. how long you waited in line)
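The classification above can be sketched as a rough heuristic in Python. The function name `classify_variable` and the sample values are assumptions for illustration. The rule looks only at the Python type of each value, so it can be wrong in practice (e.g. counts stored as 3.0 would be labelled continuous, and numeric codes for categories would be labelled discrete).

```python
def classify_variable(values):
    """Guess a variable's type from the Python types of its values (a heuristic)."""
    if not values:
        return "unknown"
    # Text labels and True/False flags name categories rather than quantities.
    if all(isinstance(v, (str, bool)) for v in values):
        return "categorical"
    # Whole numbers are assumed to come from a counting process.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"
    # Any other all-numeric data is assumed to come from a measuring process.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "unknown"


eye_colour = ["blue", "brown", "green"]
books_sold = [3, 5, 2]
waiting_minutes = [4.5, 12.25, 7.0]

print(classify_variable(eye_colour))
print(classify_variable(books_sold))
print(classify_variable(waiting_minutes))
```

The type of the stored value is only a hint; the real test is whether the variable arises from naming categories, counting, or measuring.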