
Summary Methodology for Premasters DSS, 800884-B-6

Pages: 46
Uploaded on: 21-06-2022
Written in: 2021/2022

Detailed summary of all lectures, with additional notes, explanations, and examples, for the course "Methodology for Premasters DSS" at Tilburg University, which is part of the Pre-Master Data Science and Society. The course was taught by B. Nicenboim and G. Saygili in the first semester of the academic year 2021/2022 (August to December 2021).

Tilburg University
Study Program: Pre-Master Data Science and Society
Academic Year 2021/2022, Semester 1 (August to December 2021)


Course: Methodology for Premasters DSS, 800884-B-6
Lecturers: B. Nicenboim and G. Saygili

2. The data science process


Data Science Introduction
• Data Science is the art of turning data into actions
• the term has existed since the 1960s
• the term data science in its current sense was introduced in the 1990s in the
statistics and data mining communities
• it was first named as an independent discipline in 2001
• data consists of
o 1) structured data → stored in traditional relational database management systems
o 2) unstructured data
o 3) semi-structured data
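
As a rough illustration of the three data types listed above, the following Python sketch shows made-up customer information in each form. The field names and values are invented for this example and do not come from the lecture.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table (here a CSV string).
structured_csv = "customer_id,name,age\n1,Ana,34\n2,Ben,41\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but fields may vary per record (JSON).
semi_structured_json = '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Ben"}]'
records = json.loads(semi_structured_json)

# Unstructured: free text with no predefined schema.
unstructured_text = "Ana called on Monday and asked about her last invoice."

print(rows[0]["name"])              # every row has the same columns
print(records[1].get("tags", []))   # a field may be absent in some records
print(len(unstructured_text.split()))  # only generic operations, e.g. word count
```

The structured rows can be queried by column name directly, while the semi-structured records need a fallback for missing fields and the free text needs further processing before it can be analyzed.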


Data Science in Organizations
• organizations use data to maintain competitiveness
o a 10 % increase in data usability leads to
→ 17 - 49 % increase in their productivity
→ 11 - 42 % return on assets
→ 5 - 6 % performance improvement via data driven decision making
• Big Data Opportunities
o Top 3: Increasing Operational efficiency (51%), Informing Strategic Direction,
Better Customer Service (27%)
• Four key activities of data science in organizations
o acquire: obtaining the needed data
o prepare: preprocessing operations on the data
o analyze: analyzing the data and interpreting the results
o act: taking actions based on the results
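
The four activities above can be sketched as a small pipeline. This is only a minimal illustration: the sales figures, the function names, and the decision threshold are all hypothetical and not part of the course material.

```python
from statistics import mean

def acquire():
    # Acquire: hypothetical raw daily sales figures; None marks a missing entry.
    return [120, 135, None, 128, 140]

def prepare(raw):
    # Prepare: a simple preprocessing step that drops missing entries.
    return [x for x in raw if x is not None]

def analyze(clean):
    # Analyze: summarize the prepared data with its mean.
    return mean(clean)

def act(average, threshold=130):
    # Act: a made-up decision rule based on the analysis result.
    return "increase stock" if average > threshold else "keep stock"

raw = acquire()
clean = prepare(raw)
average = analyze(clean)
decision = act(average)
print(average, decision)
```

Keeping each activity in its own function makes it easy to rerun a single step, for example when new data is acquired.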


Data Mining Process
• Data Mining is the process of extracting previously unknown and potentially useful
information from the data using mathematical, statistical and machine learning
methods
• CRISP-DM: Cross-Industry Standard Process for Data Mining (late 1990s)
o guideline for a structured approach to execute a data mining process
o six phases (the whole process can restart several times)

o updated versions of CRISP-DM based on new demands
▪ IBM: ASUM-DM (Analytics Solutions Unified Method for Data Mining)
▪ SAS: SEMMA (Sampling, Exploring, Modifying, Modeling, Assessing)
▪ Microsoft: TDSP (Team Data Science Process)
• Phases of CRISP-DM
o 1) Business Understanding
▪ Understanding the business goal
▪ Situation assessment
▪ Translating the business goal to a data mining objective
▪ Development of a project plan
o 2) Data Understanding
▪ Considering data requirements
▪ Initial data collection, exploration, and quality assessment
o 3) Data Preparation
▪ Selection of required data
▪ Data cleaning
▪ Data transformation and enrichment
o 4) Modeling
▪ Selection of the appropriate modeling technique
▪ Training and test set creation for evaluation
▪ Development and examination of alternative modeling algorithms
▪ Fine tuning the model parameters
o 5) Model Evaluation
▪ Evaluation of the model in the context of the business success criteria
▪ Model approval
o 6) Deployment
▪ Reporting of the findings
▪ Planning and development of deployment procedure
▪ Deployment of the model
▪ Development of a maintenance or update plan
▪ Review of the project and planning the next steps
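
To make the Modeling and Model Evaluation phases more concrete, here is a minimal sketch using only the Python standard library. The data set (hours studied versus passing), the 75/25 split, and both candidate models are assumptions made for this example; a real project would typically use a dedicated machine learning library.

```python
import random

# Hypothetical labelled data: (hours_studied, passed) pairs, invented for illustration.
data = [(h, h >= 5) for h in range(1, 21)]

# Training and test set creation: shuffle with a fixed seed, then split 75/25.
rng = random.Random(0)
shuffled = data[:]
rng.shuffle(shuffled)
split = int(0.75 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]

def accuracy(model, rows):
    # Share of rows where the model's prediction matches the label.
    return sum(model(x) == y for x, y in rows) / len(rows)

# Alternative model 1: always predict the majority class of the training set.
majority = sum(y for _, y in train) >= len(train) / 2
def baseline(x):
    return majority

# Alternative model 2: a threshold rule; the threshold is the parameter being tuned
# by picking the value with the best accuracy on the training set.
best_t = max(range(1, 21), key=lambda t: accuracy(lambda x: x >= t, train))
def threshold_model(x):
    return x >= best_t

# Evaluation on the held-out test set, which was not used for tuning.
print("baseline:", accuracy(baseline, test))
print("threshold:", accuracy(threshold_model, test))
```

The test set is kept apart during tuning so that the reported accuracy reflects how the model behaves on data it has not seen.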

• Team Data Science Process (TDSP)
o TDSP is Microsoft’s new version of CRISP-DM
o Key components:
▪ A data science lifecycle definition
▪ A standardized project structure
▪ Infrastructure and resources recommended for data science projects
▪ Tools and utilities recommended for project execution.




o TDSP Infrastructure and Resources for Data Science Projects: TDSP provides
recommendations for managing shared analytics and storage infrastructure
such as:
▪ Cloud file systems for storing datasets
▪ Databases
▪ Big Data (SQL or Spark) clusters
▪ Machine learning service
o TDSP provides recommendations for R and Python
• Data Science Trajectory / Process
o Raw Data → Process Data → Clean Data (suppress noise, add missing data)
o from exploratory data analysis you can go back to get more data or proceed
o data product can be used in real world

• A Data Scientist’s Role in This Process
o Initial step: What data is needed?
o Data can come from different fields
o raw data can be of different types
o process and clean data to suppress noise and discard outliers
o data scientists formulate a hypothesis and a research question to be
answered within a study → we need to know where we are going
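
The cleaning steps mentioned in these notes (adding missing data, discarding outliers) can be sketched as follows. The readings are invented, and median imputation and the 1.5 * IQR rule are just one common choice among several, not necessarily the method used in the course.

```python
from statistics import median, quantiles

# Hypothetical sensor readings; None marks a missing value, 250.0 is an obvious outlier.
raw = [10.2, 9.8, None, 10.5, 250.0, 10.1, None, 9.9]

# Add missing data: replace None with the median of the observed values.
observed = [x for x in raw if x is not None]
fill = median(observed)
filled = [fill if x is None else x for x in raw]

# Discard outliers: keep only values within 1.5 * IQR of the quartiles.
q1, _, q3 = quantiles(filled, n=4)
iqr = q3 - q1
clean = [x for x in filled if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
print(clean)
```

The median is used for imputation here because, unlike the mean, it is barely affected by the extreme value 250.0.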




Big Data
• big data combines different sources of data (e.g., social media, transactions,
enterprise data…)
• Big data can be defined as a collection of diverse and large amounts of data that is
hard to process with conventional data processing platforms.
→ Doug Laney’s explanation: big data = 3 V’s (volume, velocity, variety) plus value
→ big data = high volume, high velocity, and high variety of information
o “Data are becoming the new raw material of business: Economic input is
almost equivalent to capital and labor” (Economist, 2010)
o “Information will be the ‘21st Century Oil’” (Gartner, 2019)
Uploaded by: hannahgruber, Tilburg University