Data Science and Society Midterm Summary

Pages: 48
Uploaded on: 02-10-2019
Written in: 2019/2020

These notes are a summary of all the lectures, literature and assignments needed for the UU Data Science and Society midterm exam.


Data Science & Society
Notes of the Lectures, Assignments, and Literature

Does not cover all of the material
I left out material I considered obvious knowledge
Some parts are copied from other summaries
Skipped some lecture material since it’s already covered in the literature




Content
Week 1 – Stair, Reynolds – IS
Lecture 1 – Catching up with SQL
Assignment 1: Bash Fundamentals
Lecture 2 – Applied Data Science for Student Empowerment
Lecture 3 – The Knowledge Discovery Process for Societal Impact
Week 2 – Chapman – CRISP-DM 1.0
Week 2 – Davenport – Data Scientist: The Sexiest Job of the 21st Century
Week 2 – Chang, Grady – NIST Big Data Interoperability Framework
Week 2 – Spruit, Lytras – Applied Data Science
Week 2 – Braschler – ADS
Assignment 2 – Methods & Statistics in R
Lecture 4 – Hadoop & MapReduce
Lecture 5 – Methodology, Statistics and Pitfalls
Week 3 – Lazer – Google Flu: Big Data Traps
Week 3 – Broniatowski, Lazer – Twitter: Big Data Opportunities
Week 4 – Dean, Ghemawat – MapReduce
Week 4 – Chambers, Zaharia – Spark Guide [Chapters 1-3]
Week 4 – Ambrose – Big Data in historical perspective
Assignment 3 – MapReduce in Hadoop & Spark
Lecture 6 – NoSQL, Spark & Big Data
Lecture 7 – Statistics 2





Week 1 – Stair, Reynolds – IS
Covers quite basic Information Systems concepts. I recommend skimming through the pages and
reading the bold definitions and their meanings.

Lecture 1 – Catching up with SQL
• Some definitions
o Create: Creation of database objects
o Alter: Modify the structure and/or the characteristics of database objects
o Drop: Deletion of database objects
o Truncate: Deletion of data in tables without altering the structure
• The core parts of SQL:
o Data definition language (DDL): Used to define database structures
▪ CREATE TABLE, DROP TABLE
o Data manipulation language (DML): Used to add, update, delete and request data (queries)
▪ INSERT (add a row), UPDATE (modify values in an existing row/collection of
rows), DELETE (delete a row/collection of rows), SELECT (select rows),
DISTINCT (addition to SELECT that prevents duplicate rows from being shown),
WHERE (addition for criteria records should meet), AND/OR/NOT
(addition for combining multiple criteria), BETWEEN (value lies within an inclusive range),
ORDER BY (to sort results), GROUP BY (to group rows with equal values, e.g. for subtotals),
HAVING (to filter the groups produced by GROUP BY, as WHERE does for individual rows)
▪ Built-in SQL functions
• COUNT: the number of rows that match the criteria
• MIN: minimum value in a certain column
• MAX: maximum value in a certain column
• SUM: sum of the values in a certain column
• AVG: average of the values in a certain column
▪ A query retrieves data from one or more tables and creates a new
(temporary) table
▪ Subqueries are queries that are used as input for another query
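To see the DDL and DML parts side by side, here is a minimal sketch using Python's built-in `sqlite3` module against a throwaway in-memory database. The `students` table, its columns and its rows are made-up examples, not course data. One assumption to flag: SQLite has no TRUNCATE statement, so the sketch uses `DELETE` without a `WHERE` clause to empty the table while keeping its structure.

```python
import sqlite3

# Throwaway in-memory database; the "students" table and its rows are made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and then alter the structure
cur.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, programme TEXT, grade REAL)"
)
cur.execute("ALTER TABLE students ADD COLUMN year INTEGER")

# DML: INSERT, UPDATE and DELETE rows
cur.executemany(
    "INSERT INTO students (name, programme, grade, year) VALUES (?, ?, ?, ?)",
    [
        ("Ann", "DSS", 8.0, 2019),
        ("Bob", "DSS", 6.5, 2019),
        ("Cas", "CS", 7.0, 2019),
        ("Dee", "CS", 5.5, 2019),
        ("Eve", "AI", 9.0, 2019),
    ],
)
cur.execute("UPDATE students SET grade = 7.5 WHERE name = 'Bob'")
cur.execute("DELETE FROM students WHERE name = 'Dee'")

# SELECT DISTINCT ... ORDER BY: each programme once, sorted
programmes = [row[0] for row in cur.execute(
    "SELECT DISTINCT programme FROM students ORDER BY programme")]

# WHERE with BETWEEN (inclusive on both ends) combined with AND
in_range = [row[0] for row in cur.execute(
    "SELECT name FROM students "
    "WHERE grade BETWEEN 7 AND 9 AND programme <> 'AI' ORDER BY name")]

# Built-in aggregate functions over the whole table
stats = cur.execute(
    "SELECT COUNT(*), MIN(grade), MAX(grade), SUM(grade), AVG(grade) FROM students"
).fetchone()

# GROUP BY makes one group per programme; HAVING then filters those groups
big_groups = cur.execute(
    "SELECT programme, COUNT(*) FROM students GROUP BY programme HAVING COUNT(*) >= 2"
).fetchall()

# Subquery: the inner SELECT's result is the input for the outer WHERE
above_avg = [row[0] for row in cur.execute(
    "SELECT name FROM students "
    "WHERE grade > (SELECT AVG(grade) FROM students) ORDER BY name")]

# No TRUNCATE in SQLite: DELETE without WHERE empties the table but keeps its structure
cur.execute("DELETE FROM students")
remaining = cur.execute("SELECT COUNT(*) FROM students").fetchone()[0]

# DROP removes the table itself
cur.execute("DROP TABLE students")
conn.close()

print(programmes, in_range, stats, big_groups, above_avg, remaining)
```

After the UPDATE and DELETE, four rows remain (Ann 8.0, Bob 7.5, Cas 7.0, Eve 9.0), so HAVING keeps only the DSS group with two members, and the subquery's average of 7.875 selects Ann and Eve.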





Assignment 1: Bash Fundamentals

Command    Action
cat        Display the contents of a file
cd         Change directory
cd ..      Go up one directory
chmod      Change read, write and execute permissions (octal digits for owner, group and others: 000 = none, 777 = all)
cp         Copy a file to a given directory
echo       Print the given text (the functionality of echo can vary between shells; use ‘-e’ to interpret escapes in your string, such as ‘\n’)
grep       Filter input, keeping only the lines that match a pattern
mkdir      Create a directory
mv         Move a file to another directory (also used to rename a file)
ls         List the names of files and directories in the current directory (use ‘-l’ for additional information)
paste      Merge the lines of files side by side, column by column (use ‘>’ to save the result in a new file)
pwd        Print the full path of the current directory
rm         Remove a file (use ‘rm -r’ for an entire directory including all its contents)
sort       Sort input; ‘-r’ to reverse and ‘-n’ for numeric order
|          Use the output of the command before ‘|’ as input for the command after ‘|’
>          Write the output of a command to a file (overwriting the file)
>>         Append the output of a command to a file
*          Wildcard matching any sequence of characters
~          Home directory
#          Indicates a comment (everything on the same line after ‘#’ is ignored)
\n \t      Escape sequences for a newline and a tab
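The filtering, sorting, merging and redirection commands in the table compose through `|`, `>` and `>>`. As a rough illustration of what they do to lines of text, here is a small Python sketch. The helper names (`grep`, `sort_lines`, `paste`, `write_lines`) are hypothetical analogues written for this example, not how the shell implements these commands, and they simplify some details (for instance, tie-breaking in numeric sort).

```python
import os
import re
import tempfile
from itertools import zip_longest

# Hypothetical helpers; each roughly mimics one command from the table above.

def grep(pattern, lines):
    """Like `grep PATTERN`: keep only lines that contain a regex match."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

_LEADING_NUMBER = re.compile(r"^\s*(-?\d+(?:\.\d+)?)")

def sort_lines(lines, numeric=False, reverse=False):
    """Like `sort`, with numeric=True for -n and reverse=True for -r.
    Simplification: ties keep their input order (GNU sort falls back to
    comparing whole lines)."""
    def numeric_key(line):
        match = _LEADING_NUMBER.match(line)
        return float(match.group(1)) if match else 0.0
    return sorted(lines, key=numeric_key if numeric else None, reverse=reverse)

def paste(left, right, sep="\t"):
    """Like `paste a b`: join corresponding lines, padding the shorter input."""
    return [a + sep + b for a, b in zip_longest(left, right, fillvalue="")]

def write_lines(path, lines, append=False):
    """Like `> file` (overwrite) or, with append=True, `>> file`."""
    with open(path, "a" if append else "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")

lines = ["12 apples", "3 pears", "7 apple pie", "25 bananas"]

lexical = sort_lines(lines)                 # sort     (character by character)
numeric = sort_lines(lines, numeric=True)   # sort -n  (by leading number)

# grep apple | sort -n -r : the output of grep becomes the input of sort
result = sort_lines(grep("apple", lines), numeric=True, reverse=True)

pasted = paste(["a", "b", "c"], ["1", "2"])

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.txt")
    write_lines(out, result)                  # ... > out.txt
    write_lines(out, ["done"], append=True)   # echo done >> out.txt
    with open(out, encoding="utf-8") as handle:
        saved = handle.read().splitlines()

print(lexical, numeric, result, pasted, saved, sep="\n")
```

The two sorts show why ‘-n’ matters: compared character by character, "12 apples" comes before "3 pears" because ‘1’ sorts before ‘3’, while numeric sorting orders the lines by the values 3, 7, 12, 25.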

Bash scripts
You can create a bash script to bundle a number of commands. This is just a text file with a
.sh extension. You can execute the script by typing its path in the command line (e.g. ./script.sh)
after making it executable with chmod +x, or run it with bash script.sh.





Lecture 2 – Applied Data Science for Student Empowerment
Applied Data Science is where Analytical Applications are combined with Data Science.

Data Science:

- Theoretical
- Algorithms

Applied Data Science:

- Solution-oriented
- Meta-Algorithmic Models

Citizen Data Science:

- Applied
- Automated Software Tools

Self-Service Capability: “To empower non-data scientists with automated software tools and meta-algorithmic models to self-service their own data analyses on their own data sources in a reliable, usable, and transparent manner.”



