Summary

Business Intelligence and Data Management - Summary Book

Pages: 57
Uploaded on: 22-03-2018
Written in: 2017/2018

This is a summary of Chapters 1, 2, 5, 6, 7, 8, 9, 14 and 15 of "Data Mining for Business Analytics" (Shmueli, Bruce & Patel, 3rd edition). These are the relevant chapters for the exam of the course Business Intelligence and Data Management, which is compulsory for MSc Information Management and an elective for other MSc programs. The summary is written in English.


Document information

Whole book summarized? No
Chapters summarized: Chapters 1, 2, 5, 6, 7, 8, 9, 14 and 15
Type: Summary

Preview of the content

Chapter 1 Introduction

1.1 What is Business Analytics?

Business analytics (BA) is the practice of bringing quantitative data to bear on decision-making. It
includes a range of data analysis methods.

Business intelligence (BI) refers to data visualization and reporting for understanding the past and the
present. It has evolved into effective tools and practices, such as creating interactive dashboards that
allow the user not only to access real-time data but also to directly interact with it.

Business analytics now typically includes BI as well as sophisticated data analysis methods used for
exploring relationships between measurements, predicting new records, and forecasting future
values.

1.2 What is Data Mining?

Data mining refers to business analytics methods, both statistical and machine-learning, that
inform decision making, often in an automated fashion. Prediction is typically an
important component, often at the individual level.

1.3 Data Mining and Related Terms

Data mining stands at the confluence of the fields of statistics and machine learning (also known as
artificial intelligence). However, in classical statistics computing is difficult and data are scarce,
whereas in data mining applications both data and computing power are plentiful.

Another major difference is the focus in statistics on inference from a sample to the population. In
contrast, the focus in machine learning is on predicting individual records.

Data mining is vulnerable to the danger of overfitting, where a model is fit so closely to the available
sample of data that it describes not merely structural characteristics of the data but random
peculiarities as well.
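
The effect of overfitting can be sketched with a small toy example. The data below are made up for illustration: the structural pattern is assumed to be y = 2x, the training outcomes carry noise of +1 or -1, and the holdout records are placed exactly on the pattern for simplicity. A straight line is compared with a polynomial that passes through every training point.

```python
# Toy data (hypothetical): structural pattern y = 2x, training noise of +1 / -1.
train = [(0.0, 1.0), (1.0, 1.0), (2.0, 5.0), (3.0, 5.0), (4.0, 9.0)]
holdout = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]


def fit_line(points):
    """Least-squares straight line: captures only the global structure."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolating_polynomial(points):
    """Lagrange polynomial through every training point: it also fits the noise."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            weight = 1.0
            for j, (xj, _) in enumerate(points):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, points):
    """Mean squared error of a model on a set of (x, y) records."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


line = fit_line(train)
poly = fit_interpolating_polynomial(train)
for name, model in [("line", line), ("polynomial", poly)]:
    print(name, "train MSE:", mse(model, train), "holdout MSE:", mse(model, holdout))
```

The polynomial reproduces the training sample perfectly but does worse than the line on the holdout records, because it has modeled the random peculiarities of the sample rather than only its structure.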

We use the term machine learning to refer to algorithms that learn directly from data, especially
local patterns, often in layered or iterative fashion. In contrast, we use statistical models to refer to
methods that apply global structure to the data.

1.4 Big Data

Big data is a relative term that refers to the amount of data by reference to the past, and to the
methods and devices available to deal with them. The challenge big data presents is often
characterized by the four V’s:
- Volume refers to the amount of data.
- Velocity refers to the speed at which it is being generated and changed.
- Variety refers to the different types of data being generated.
- Veracity refers to the fact that data is being generated by organic distributed processes and not
subject to the controls or quality checks that apply to data collected for a study.

Most large organizations face both the challenge and the opportunity of big data because most
routine data processes now generate data that can be stored and, possibly, analyzed.

1.5 Data Science

Data science is a mix of skills in the areas of statistics, machine learning, math, programming,
business, and IT. However, it is a rare individual who combines deep skills in all the constituent areas.

This book focuses on developing the statistical and machine learning models that will eventually be
plugged into a deployed system.

1.6 Why Are There So Many Different Methods?

The usefulness of a method can depend on factors such as the size of the dataset, the types of
patterns that exist in the data, whether the data meet some underlying assumptions of the method,
how noisy the data are, and the particular goal of the analysis.

Different methods can lead to different results, and their performance can vary. It is therefore
customary in data mining to apply several different methods and select the one that appears most
useful for the goal at hand.

Chapter 2 Overview of the Data Mining Process

2.1 Introduction

This book focuses on predictive analytics, the tasks of classification and prediction as well as pattern
discovery. Not covered are OLAP (online analytical processing) and SQL (structured query language).
They do not involve statistical modeling or automated algorithmic methods.

2.2 Core Ideas in Data Mining

Classification

A common task in data mining is to predict the value of a categorical variable (e.g. the recipient of an
offer can respond or not respond). Similar data where the classification is known are used to develop
rules, which are then applied to the data with the unknown classification.
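
As a minimal sketch of this idea, the example below learns one simple rule per customer segment from records with a known response and applies those rules to new recipients. The segments, the records, and the majority-vote rule are all assumptions made for illustration; they are not taken from the book.

```python
from collections import Counter, defaultdict

# Hypothetical labeled records: (segment, known response to a past offer).
labeled = [
    ("student", "respond"), ("student", "respond"), ("student", "no response"),
    ("retired", "no response"), ("retired", "no response"),
    ("employed", "respond"), ("employed", "no response"), ("employed", "respond"),
]


def learn_rules(records):
    """For each segment, remember the most common known outcome."""
    outcomes_by_segment = defaultdict(Counter)
    for segment, outcome in records:
        outcomes_by_segment[segment][outcome] += 1
    return {
        segment: counts.most_common(1)[0][0]
        for segment, counts in outcomes_by_segment.items()
    }


rules = learn_rules(labeled)

# Apply the learned rules to recipients whose response is still unknown.
new_recipients = ["retired", "student"]
predictions = [rules[segment] for segment in new_recipients]
print(predictions)  # ['no response', 'respond']
```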

Prediction

Here, we are trying to predict the value of a numerical variable (e.g. amount of purchase).

Association Rules and Recommendation Systems

Association rules, or affinity analysis, are designed to find general association patterns between items
in large databases (“what goes with what”). For example, grocery stores can use such information for
bundling products, and in a medical setting such rules can help predict future symptoms for returning patients.
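
To make “what goes with what” concrete, the sketch below scores one candidate rule on a handful of invented grocery baskets. Support and confidence are the usual measures for association rules, but this excerpt does not define them, so treat the function names and the data as illustrative assumptions.

```python
# Hypothetical grocery baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count_containing(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


def support(itemset):
    """Share of all transactions that contain the itemset."""
    return count_containing(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Among baskets with the antecedent, the share that also has the consequent."""
    return count_containing(antecedent | consequent) / count_containing(antecedent)


# Rule under test: "if bread, then milk".
print(support({"bread", "milk"}))       # 0.6
print(confidence({"bread"}, {"milk"}))  # 0.75
```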

Online recommendation systems (e.g. Netflix) use collaborative filtering, which is a method that
generates “what goes with what” at the individual user level. Recommendation systems aim to
deliver personalized recommendations to users with a wide range of preferences.
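
A very small sketch of user-level collaborative filtering follows. It assumes each user is described only by the set of titles they liked, and it uses Jaccard similarity to find the most similar other user; both choices, and all the names and titles, are hypothetical simplifications rather than the method of any real recommendation system.

```python
# Hypothetical viewing data: the titles each user liked.
likes = {
    "ann": {"Drama A", "Thriller B", "Comedy C"},
    "bob": {"Drama A", "Thriller B", "Documentary D"},
    "cat": {"Comedy C", "Cartoon E"},
}


def jaccard(a, b):
    """Overlap between two sets of liked titles, from 0 (none) to 1 (identical)."""
    return len(a & b) / len(a | b)


def recommend(user):
    """Suggest titles liked by the most similar other user but not yet by this user."""
    others = [name for name in likes if name != user]
    nearest = max(others, key=lambda name: jaccard(likes[user], likes[name]))
    return sorted(likes[nearest] - likes[user])


print(recommend("ann"))  # ['Documentary D']
print(recommend("cat"))  # ['Drama A', 'Thriller B']
```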

Predictive Analytics

Classification, prediction, and association rules and collaborative filtering constitute the analytical
methods employed in predictive analytics.

Data Reduction and Dimension Reduction

The performance of data mining algorithms is often improved when the number of variables is
limited, and when large numbers of records can be grouped into homogeneous groups. The process
of consolidating a large number of records into a smaller set is termed data reduction. Methods for
reducing the number of cases are often called clustering.

Reducing the number of variables is typically called dimension reduction, which improves predictive
power, manageability, and interpretability.
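
Grouping records into homogeneous groups can be sketched with a bare-bones k-means procedure on a single variable. k-means is only one possible clustering method and is not named in this excerpt; the spending values and the starting centers below are made up.

```python
def k_means_1d(values, centers, iterations=10):
    """Repeatedly assign each value to its nearest center, then recompute the centers."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # An empty group keeps its previous center.
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical monthly spend for six customers; start from the lowest and highest value.
spend = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
centers, groups = k_means_1d(spend, centers=[1.0, 13.0])
print(centers)  # [2.0, 12.0]
print(groups)   # [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
```

Six records are summarized by two group centers, which is the sense in which clustering reduces the number of cases.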

Data Exploration and Visualization

Exploration is used for data cleaning and manipulation as well as for visual discovery and hypothesis
generation.

Exploration by creating charts and dashboards is called data visualization or visual analytics. For
numerical variables we use histograms and boxplots to learn about the distribution of their values, to
detect outliers, and to find other information that is relevant to the analysis. Similarly, for categorical
variables we use bar charts.
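
The quantities behind a boxplot and a bar chart can be computed without any plotting library. The sketch below uses invented purchase amounts and payment methods, and it applies the common 1.5 × IQR rule for flagging outliers, which is an assumption on top of the text.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical numerical variable: purchase amounts, one of them suspicious.
purchase_amounts = [12, 15, 14, 13, 16, 15, 14, 95]

# Quartiles are what the box of a boxplot shows.
q1, q2, q3 = quantiles(purchase_amounts, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in purchase_amounts if v < lower_fence or v > upper_fence]
print("quartiles:", q1, q2, q3)
print("outliers:", outliers)  # outliers: [95]

# Hypothetical categorical variable: a text bar chart of category counts.
payment_methods = ["card", "cash", "card", "card", "voucher", "cash"]
counts = Counter(payment_methods)
for category, count in counts.most_common():
    print(f"{category:<8}{'#' * count}")
```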

Supervised and Unsupervised Learning

For supervised learning algorithms we must have data available in which the value of the outcome of
interest is known. These training data are the data from which the algorithm “learns” or is “trained”
about the relationship between predictor variables and the outcome variable. The algorithm is then
applied to the validation data where the outcome is known, to see how well it does in comparison to
other models. It is prudent to save a third sample which also includes known outcomes (the test
data) to use with the model finally selected to predict how well it will do. The model can then be
used to classify or predict the outcome of interest in new cases where the outcome is unknown.
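
Partitioning the data into training, validation, and test samples might look like the sketch below. The 50/30/20 split, the seed, and the function name `partition` are illustrative assumptions; the text does not prescribe particular proportions.

```python
import random


def partition(records, train_pct=50, valid_pct=30, seed=1):
    """Randomly split records into training, validation, and test samples."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    n_train = len(shuffled) * train_pct // 100
    n_valid = len(shuffled) * valid_pct // 100
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return training, validation, test


record_ids = list(range(100))  # stand-ins for 100 records with known outcomes
training, validation, test = partition(record_ids)
print(len(training), len(validation), len(test))  # 50 30 20
```

Every record lands in exactly one sample, so the test data stay untouched until the final model has been selected.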

Unsupervised learning algorithms are those used where there is no outcome variable to predict or
classify. Association rules, dimension reduction methods, and clustering techniques are all
unsupervised learning methods.

2.3 The Steps in Data Mining

Here is a list of steps to be taken in a typical data mining effort:
1. Develop an understanding of the purpose of the data-mining project. The most serious errors in
analytics projects result from a poor understanding of the problem.

2. Obtain the dataset to be used in the analysis. This often involves random sampling from a large
database. It may also involve pulling together data from different databases or sources. The
databases could be internal (e.g. past purchases made by customers) or external (e.g. credit
ratings).

3. Explore, clean, and preprocess the data. This step involves verifying that the data are in
reasonable condition: checking for missing data, values outside a reasonable range, outliers, and
inconsistent definitions of fields.

4. Reduce the data dimension, if necessary. Dimension reduction can involve operations such as
eliminating unneeded variables, transforming variables, and creating new variables.

5. Determine the data-mining task. This involves translating the general question or problem into a
more specific data-mining question.

Seller: brendastevense, Tilburg University