Summary

Summary Data wrangling and data analysis

Pages

102

Uploaded on

18-11-2021

Written in

2021/2022

Applied Data Science Utrecht University (UU): Data handling and preparation, supervised & non-supervised machine learning, using SQL, Python, and R.

Written for

Institution: Universiteit Utrecht (UU)
Study: Master Applied Data Science
Course: Data Wrangling And Data Analysis (INFOMDWR)

Subjects

sql
python
data extraction
data integration
data visualisation
machine learning
regression
classification
data preparation
missing data
data imputation
time series
text mining
r

Content preview

Silberschatz et al. 2019 – Database System Concepts
1.1. Database-System Applications
Database-management system (DBMS): a collection of interrelated data and a set of
programs to access those data. The goal of a DBMS is information storage and manipulation.

- Back-office: database internal to an organisation
- End-users: interaction between users and the database within an organisation

Two modes of databases usage:

- Online transaction processing: a large number of users use the database, with
each user retrieving relatively small amounts of data and performing small updates
- Data analytics: the processing of data to draw conclusions, and infer rules or decision
procedures, which are then used to drive business decisions
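The difference between the two modes can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original notes: the `orders` table and its contents are hypothetical, and a real analytics workload would cover far more data.

```python
import sqlite3

# Hypothetical in-memory table, used only to contrast the two usage modes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 15.0)],
)

# Transaction-processing style: one user updates and reads a single row.
conn.execute("UPDATE orders SET amount = 25.0 WHERE id = 1")
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = 1"
).fetchone()

# Analytics style: one query scans all rows to produce a summary.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(one_order)
print(totals)
```

The first query touches one row by its key, while the second aggregates over the whole table, which is the kind of summary that feeds business decisions.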

The field of data mining combines knowledge-discovery techniques invented by artificial
intelligence researchers and statistical analysts with efficient implementation techniques that
enable them to be used on extremely large databases.

1.2. Purpose of Database Systems
File-processing system: stores permanent records in various files, and needs different
application programs to extract records from, and add records to, the appropriate files.
Disadvantages of keeping organisational information in a file-processing system:

- Data redundancy and inconsistency: different programmers / structures /
programming languages, or the same data stored twice for one identifier across different groups
o Redundancy leads to higher storage and costs
o Inconsistency leads to disagreement of data
- Difficulty in accessing data: conventional file-processing environments do not
allow needed data to be retrieved in a convenient and efficient manner. More
responsive data-retrieval systems are required for general use
- Data isolation: because data is scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult
- Integrity problems: data values stored in the database must satisfy certain types of
consistency constraints; these are hard to enforce when they are coded separately into dissimilar application programs
- Atomicity problems: a computer system is subject to failure; a data transfer must be
atomic: it must happen in its entirety or not at all
- Concurrent access anomalies: systems must allow multiple users to update data
simultaneously. The system must maintain some form of supervision
- Security problems: not every user of the database system should be able to access
all the data
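The atomicity requirement in the list above can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch under assumptions: the `account` table and the `transfer` helper are hypothetical names, and the failure is simulated by a CHECK constraint rather than by a real system crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # both updates succeed
failed = transfer(conn, "A", "B", 500)  # debit violates CHECK, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to B has already been applied when the debit fails, so without the rollback the money would exist twice; the transaction undoes the partial work.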

1.3. View of Data
The data models can be classified into four different categories:

- Relational model: collection of tables to represent both data and the relationships
among those data (record-based model; matrix / excel sheet)
- Entity-relationship (E-R) model: collection of basic objects, called entities, and
relationships among these objects
- Semi-structured data model: permit the specification of data where individual
data items of the same type may have different sets of attributes (JSON / XML)
- Object-based data model: database systems allow procedures to be stored in the
database system and executed by the database system (Java, C++, or C#)
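As a small illustration of the first and third categories, the sketch below contrasts relational rows with semi-structured JSON records in Python. The column names and records are made up for the example and do not come from the original notes.

```python
import json

# Relational model: every row has exactly the same attributes (columns).
columns = ("id", "name", "dept")
rows = [
    (1, "Ann", "Physics"),
    (2, "Bob", "History"),
]

# Semi-structured model: items of the same type may carry different attributes.
records = json.loads(
    '[{"id": 1, "name": "Ann", "dept": "Physics"},'
    ' {"id": 2, "name": "Bob", "email": "bob@example.org"}]'
)

# Collect the attribute names of each record to show that they differ.
attribute_sets = [sorted(record) for record in records]

print(all(len(row) == len(columns) for row in rows))
print(attribute_sets)
```

The relational rows only make sense together with the fixed column list, whereas each JSON record carries its own attribute names.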

Because many database-system users are not computer trained, developers hide the complexity from users
through several levels of data abstraction, to simplify users’ interactions with the system:

- Physical level: the lowest level of abstraction; it describes how the data are stored. The
physical level describes complex low-level data structures in detail
- Logical level: the next-higher level of abstraction; it describes what data are stored in the
database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple structures
- View level: the highest level of abstraction; it describes only part of the entire database.
Even though the logical level uses simpler structures, complexity remains because of
the variety of information stored in a large database

Instance: collection of information stored in the database at a particular moment

Schema: overall design of the database (physical; logical schema; view level subschema)
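The distinction between schema, instance, and view level can be sketched with Python's built-in sqlite3 module. The `instructor` table and the `instructor_public` view are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical schema: the design of the table, fixed before any data exists.
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
# View-level subschema: exposes only part of the database (no salary).
conn.execute("CREATE VIEW instructor_public AS SELECT id, name FROM instructor")

# The schema objects are listed in SQLite's catalog table sqlite_master.
schema = [row[0] for row in conn.execute("SELECT name FROM sqlite_master ORDER BY name")]

# Instances: the stored content at two different moments.
conn.execute("INSERT INTO instructor VALUES (1, 'Ann', 90000)")
instance_t1 = conn.execute("SELECT * FROM instructor").fetchall()
conn.execute("INSERT INTO instructor VALUES (2, 'Bob', 70000)")
instance_t2 = conn.execute("SELECT * FROM instructor").fetchall()

view_rows = conn.execute("SELECT * FROM instructor_public ORDER BY id").fetchall()

print(schema)
print(len(instance_t1), len(instance_t2))
print(view_rows)
```

The schema stays the same while the instance grows from one row to two, and the view shows users only the columns they are meant to see.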

1.4. Database Languages
Database systems provide a data-definition language (DDL) to specify database schema
and a data-manipulation language (DML) to express database queries and updates (SQL)

Database systems implement only integrity constraints testable with minimal overhead:

- Domain constraints: domain of possible values must be associated with every
attribute (for example, integer types, character types, date/time types)
- Referential integrity: ensure that a value that appears in one relation for a given set
of attributes also appears in a certain set of attributes in another relation
- Authorisation: differentiate among users as far as type of access they are permitted
on various data values in the database; read / insert / update / delete authorisation

Data-definition language: SQL provides a rich DDL that allows one to define tables with
data types and integrity constraints
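A minimal sketch of DDL and DML with integrity constraints follows, using Python's built-in sqlite3 module. The tables and the `rejected` helper are hypothetical. Two SQLite-specific assumptions apply: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, and SQLite has no GRANT statement, so authorisation is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# DDL: define tables with data types and integrity constraints.
conn.executescript(
    """
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        budget    REAL CHECK (budget >= 0)          -- restricts allowed values
    );
    CREATE TABLE instructor (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        dept_name TEXT REFERENCES department(dept_name)  -- referential integrity
    );
    """
)

# DML: insert and query data.
conn.execute("INSERT INTO department VALUES ('Physics', 100000)")
conn.execute("INSERT INTO instructor VALUES (1, 'Ann', 'Physics')")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


bad_domain = rejected("INSERT INTO department VALUES ('History', -5)")
bad_reference = rejected("INSERT INTO instructor VALUES (2, 'Bob', 'Music')")
names = conn.execute("SELECT name FROM instructor").fetchall()

print(bad_domain, bad_reference, names)
```

Both invalid inserts are refused by the database itself, so the checks do not have to be repeated in every application program.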

Seller: Samme (Universiteit Utrecht)
