1. Chapter 1
1.1 Why is Data Analytics Important for Accountants?
Data Analytics: the process of transforming and evaluating data with the purpose of drawing
conclusions to address business questions. Effective data analytics provides a way to
search through large amounts of structured and unstructured data to identify unknown patterns or
relationships.
Big Data refers to datasets that are too large and complex to be analysed with traditional tools and techniques.
The 4 V’s of Big Data:
- Volume refers to the size of the dataset.
- Velocity refers to the speed at which data is generated and processed.
- Variety refers to the different types of data.
- Veracity refers to the quality of the data.
Uses of data analytics in business:
- The audit process is evolving toward a more automated one which enhances quality,
enables more services and adds value for clients.
- Data analytics enhances cost analysis, leading to better decision-making, forecasting,
budgeting, production, and sales.
- Accountants can make better estimates of collectability, write-downs, etc.
1.2 Introduction to the IMPACT Model
The IMPACT model:
- Identify the questions: in this step we try to understand the business problem(s) at
hand. Some things we may need to consider are:
- What data do we need to answer the question?
- Who is the audience that will use the results?
- Is the scope of the question too narrow or too broad?
- How will the results be used?
- Master the data: in this step we make sure the data aligns with our question(s)
from step 1. We therefore need a clear understanding of the data; key elements to consider include:
- Know what data is available and how it relates to the problem.
- Data available in internal systems.
- Data available in external networks and data warehouses.
- Data dictionaries.
- ETL - Extraction, transformation, and loading.
- Data validation and completeness.
- Data normalization.
- Data preparation and scrubbing.
- Perform the test plan: in this step we choose the analytical method we will use on
the data and execute it. We try to identify a relationship between
the response variable and the items that affect the response. Generally, we build
a model, a simplified representation of reality, for this purpose (a small regression sketch
follows after this list). Provost and Fawcett identify 8 key methods.
- Address and refine results: identify possible issues with the analyses and
refine the model. We could ask additional questions, further explore the data or rerun
the analyses.
- Communicate insights: here we tailor our results to the type of person we are
communicating with (management vs. another data analyst, for example). Try to use
clear language and visualisations such as dashboards, static reports and summaries.
- Track outcomes: after all this we will follow up on the results of the analysis. How
frequently should the analysis be performed? Has anything changed? => we will start
the cycle again when necessary.
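As a minimal sketch of what step 3 can look like in practice (the figures below are invented, and regression is just one of the methods referred to above), a simple model relating a response variable to one explanatory variable can be fitted in Python as follows:

```python
# Minimal sketch of step 3: fit a simple regression model that relates a
# response variable (sales) to one explanatory variable (advertising spend).
# All figures are invented for illustration.
import numpy as np

ad_spend = np.array([10, 12, 15, 18, 20, 22, 25, 30])       # hypothetical, kEUR
sales = np.array([110, 118, 131, 140, 148, 155, 162, 181])  # hypothetical, kEUR

# Fit a straight line: sales = intercept + slope * ad_spend.
slope, intercept = np.polyfit(ad_spend, sales, 1)
print(f"fitted model: sales = {intercept:.1f} + {slope:.2f} * ad_spend")

# The model is a simplified representation of reality that we can use to
# predict the response for a new value of the predictor.
print(f"predicted sales at 28 kEUR of ad spend: {intercept + slope * 28:.1f} kEUR")
```

Step 4 (address and refine results) would then ask whether this simple line is adequate, for example by examining the residuals or adding further explanatory variables.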
Modern accountants must possess certain analytical skills:
- Articulate business problems.
- Communicate with data scientists.
- Draw appropriate conclusions.
- Present results in an accessible manner.
- Develop an analytics mindset.
They must thus develop an analytics mindset covering 7 important areas:
- Know when and how data analytics can address business questions.
- Data scrubbing and data preparation
- Data quality
- Descriptive data analysis
- Data analysis through data manipulation
- Statistical data analysis competency
- Data visualization and data reporting
Here is a short overview of the tools we will be using and their main uses:
1.3 IMPACT Example and Conclusion
See powerpoint slides for example of the previously discussed methodology.
2. Chapter 2
2.1 Master The Data: How is data organized and stored?
In this chapter we will analyse step 2 of the IMPACT-cycle in more detail.
One way to start understanding data is by looking at how it is organised. Data can be
found throughout various systems: internal vs. external, flat files vs. relational databases
(RDBMS). In most cases, you need to know which tables and attributes contain the relevant
data. Looking at how the data is stored can already give us an idea of what we are dealing with,
even before any analysis.
One way to visualise and thus understand a database is the Unified Modeling Language
(UML): recall how UML works from the BIS course.
Relational databases: these databases ensure that data is complete (all data is stored) and
not redundant (so it doesn't take up unnecessary space). The database also enforces
business rules, supports internal controls, and aids communication and
integration of business processes. In a relational database there are four types of attributes:
- Primary keys: unique identifiers (it is common practice for the primary key to be the first column in
a table, though this is not mandatory).
- Foreign keys: attributes that point to a primary key in another table.
- Composite keys: a primary key made up of two or more columns (often two foreign keys),
used to uniquely identify a record, for example a line item in an order. It is used when a
single column is not sufficient to ensure uniqueness.
- Descriptive attributes: everything else.
=> These attributes are defined in a data dictionary, which describes for each attribute what type of
key it is, whether data is required, what data can be stored in it and how much data can be stored.
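To make these four attribute types concrete, here is a minimal sketch (not taken from the course material; the table and column names are hypothetical) that uses Python's built-in sqlite3 module to build a small sales schema:

```python
# Minimal sketch of the four attribute types in a small, hypothetical sales
# schema, built with Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,   -- primary key
    CustomerName TEXT                   -- descriptive attribute
);
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,    -- primary key
    Description TEXT                    -- descriptive attribute
);
CREATE TABLE SalesOrder (
    OrderID    INTEGER PRIMARY KEY,                       -- primary key
    CustomerID INTEGER REFERENCES Customer(CustomerID),   -- foreign key
    OrderDate  TEXT                                       -- descriptive attribute
);
CREATE TABLE SalesOrderLine (
    OrderID   INTEGER REFERENCES SalesOrder(OrderID),     -- foreign key
    ProductID INTEGER REFERENCES Product(ProductID),      -- foreign key
    Quantity  INTEGER,                                    -- descriptive attribute
    PRIMARY KEY (OrderID, ProductID)  -- composite key: the two foreign keys
                                      -- together identify one order line
);
""")
conn.close()
```

The line-item table needs both OrderID and ProductID to identify one row, which is exactly the situation a composite key is meant for.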
2.2 Master The Data: ETL
ETL (Extract, Transform and Load): a process to request and create usable data. It involves
5 steps:
- Step 1: Determine the purpose and scope of the data request. This step is the same
as the first step in the IMPACT-cycle.
- Step 2: Obtain the data. We need to know: what data we need, from whom we need
to obtain it, how we will obtain it, and which tools we will use to analyse it (to
ensure it is in a usable format). This could, for example, be a request to the
IT department or an SQL query. We want to request/extract only the data
that is relevant to our question(s), to avoid bloated files and slow analyses.
- Step 3: Validate the data for completeness and integrity.
- Step 4: Clean the data. We’ll have to pay attention to inconsistencies, for example
dates (US vs. EU format), written vs. numerical numbers (“seven” vs. 7),
abbreviations, encoded data and human error. Other necessary steps could include
removing headings or subtotals, cleaning leading zeros and non-printable characters,
and correcting the formatting of negative numbers.
- Step 5: Load the data for data analysis. Finally, import the data into the tool of choice
and check that its functions work properly on the data (a small sketch of steps 2-5 follows below).
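As a rough illustration of steps 2-5 (not prescribed by the course; the file, column names and cleaning rules are invented), a small extract-validate-clean-load flow in Python with pandas could look like this:

```python
# Rough sketch of ETL steps 2-5 on a tiny, invented invoice extract.
import io
import pandas as pd

# Step 2 - Obtain the data: here a small in-memory CSV stands in for a file
# received from the IT department or the result of an SQL query.
raw = io.StringIO(
    "InvoiceID,InvoiceDate,Amount\n"
    '00042,03/12/2024,"(1,250.00)"\n'
    "00043,05/12/2024,980.50\n"
)
df = pd.read_csv(raw, dtype={"InvoiceID": str})  # keep leading zeros for now

# Step 3 - Validate: e.g. compare the record count with the source system.
assert len(df) == 2, "record count does not match the requested extract"

# Step 4 - Clean: strip leading zeros, parse EU-style (day-first) dates, and
# convert accounting-style "(1,250.00)" negatives into numeric values.
df["InvoiceID"] = df["InvoiceID"].str.lstrip("0")
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], dayfirst=True)
df["Amount"] = (
    df["Amount"].astype(str)
    .str.replace(",", "", regex=False)
    .str.replace("(", "-", regex=False)
    .str.replace(")", "", regex=False)
    .astype(float)
)

# Step 5 - Load: write the cleaned data for use in the analysis tool of choice.
df.to_csv("invoices_clean.csv", index=False)
print(df)
```

In practice the load target is whatever analysis tool was chosen in step 2; the order of the steps stays the same.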
However, data storage and use can raise ethical issues:
- How does the company use data, and to what extent is it integrated into firm
strategy?
- Does the company send a privacy notice to individuals when their personal data are
collected?
- Does the company assess the risks linked to the specific type of data the company
uses?
- Does the company have safeguards in place to mitigate the risks of data misuse?
- Does the company have the appropriate tools to manage the risks of data misuse?
- Does the company conduct appropriate due diligence when sharing data with or acquiring
data from third parties?