Class notes

Applied Data Science

Pages: 141
Uploaded on: October 9, 2021
Written in: 2021/2022

What is data science? With the major technological advances of the last two decades, coupled in part with the internet explosion, a new breed of analyst has emerged. The exact role, background, and skill set of a data scientist are still in the process of being defined, and it is likely that by the time you read this some of what we say will seem archaic. In very general terms, we view a data scientist as an individual who uses current computational techniques to analyze data. Now you might observe that there is nothing particularly novel in this, and subsequently ask what has forced the definition. After all, statisticians, physicists, biologists, finance quants, etc. have been looking at data since their respective fields emerged. One short answer is that the data sphere has changed and, hence, a new set of skills is required to navigate it effectively. The exponential increase in computational power has provided new means to investigate the ever-growing amount of data being collected every second of the day. This implies that any modern data analyst will have to invest the time to learn the computational techniques necessary to deal with the volume and complexity of today's data. Like the skills of mathematics and statistics, these software skills are domain transferable, so it makes sense to create a job title that is also transferable. We could also point to the “data hype” created in the industry as a culprit for the term data science, with the word science creating an aura of validity and facilitating LinkedIn headhunting.


Applied Data Science




Ian Langmore, Daniel Krasner


Contents

I Programming Prerequisites

1 Unix
1.1 History and Culture
1.2 The Shell
1.3 Streams
1.3.1 Standard streams
1.3.2 Pipes
1.4 Text
1.5 Philosophy
1.5.1 In a nutshell
1.5.2 More nuts and bolts
1.6 End Notes

2 Version Control with Git
2.1 Background
2.2 What is Git
2.3 Setting Up
2.4 Online Materials
2.5 Basic Git Concepts
2.6 Common Git Workflows
2.6.1 Linear Move from Working to Remote
2.6.2 Discarding changes in your working copy
2.6.3 Erasing changes
2.6.4 Remotes
2.6.5 Merge conflicts

3 Building a Data Cleaning Pipeline with Python
3.1 Simple Shell Scripts
3.2 Template for a Python CLI Utility

II The Classic Regression Models

4 Notation
4.1 Notation for Structured Data

5 Linear Regression
5.1 Introduction
5.2 Coefficient Estimation: Bayesian Formulation
5.2.1 Generic setup
5.2.2 Ideal Gaussian World
5.3 Coefficient Estimation: Optimization Formulation
5.3.1 The least squares problem and the singular value decomposition
5.3.2 Overfitting examples
5.3.3 L2 regularization
5.3.4 Choosing the regularization parameter
5.3.5 Numerical techniques
5.4 Variable Scaling and Transformations
5.4.1 Simple variable scaling
5.4.2 Linear transformations of variables
5.4.3 Nonlinear transformations and segmentation
5.5 Error Metrics
5.6 End Notes

6 Logistic Regression
6.1 Formulation
6.1.1 Presenter’s viewpoint
6.1.2 Classical viewpoint
6.1.3 Data generating viewpoint
6.2 Determining the regression coefficient w
6.3 Multinomial logistic regression
6.4 Logistic regression for classification
6.5 L1 regularization
6.6 Numerical solution
6.6.1 Gradient descent
6.6.2 Newton’s method
6.6.3 Solving the L1 regularized problem
6.6.4 Common numerical issues
6.7 Model evaluation
6.8 End Notes


Document information

Professor(s): Daniel Krasner
Contains: All classes
