Exam (elaborations)

DATA SCIEN Predictive Modeling Project.html.

Uploaded on

16-03-2023

Written in

2022/2023

In [205]: from ets import load_boston
          import pandas as pd
          import numpy as np
          import seaborn as sns
          import t as plt
          import as sm
          from _selection import train_test_split
          from r_model import LinearRegression
          from er import KMeans
          from cs import mean_squared_error
          from ers_influence import variance_inflation_factor
          import math

1.1. Read the data and do exploratory data analysis. Describe the data briefly. (Check the null values, Data types, shape, EDA). Perform Univariate and Bivariate Analysis.

In [88]: df = _csv("cubic_")
         ()

Out[88]:
   Unnamed: 0  carat  cut        color  clarity  depth  table  x     y     z     price
0  1           0.30   Ideal      E      SI1      62.1   58.0   4.27  4.29  2.66  499
1  2           0.33   Premium    G      IF       60.8   58.0   4.42  4.46  2.70  984
2  3           0.90   Very Good  E      VVS2     62.2   60.0   6.04  6.12  3.78  6289
3  4           0.42   Ideal      F      VS1      61.6   56.0   4.82  4.80  2.96  1082
4  5           0.31   Ideal      F      VVS1     60.4   59.0   4.35  4.43  2.65  779

In [89]: ()

Out[89]:
   Unnamed: 0  carat  cut        color  clarity  depth  table  x     y     z     price
               .11    Premium    G      SI1      62.3   58.0   6.61  6.52  4.09  5408
               .33    Ideal      H      IF       61.9   55.0   4.44  4.42  2.74  1114
               .51    Premium    E      VS2      61.7   58.0   5.12  5.15  3.17  1656
               .27    Very Good  F      VVS2     61.8   56.0   4.19  4.20  2.60  682
               .25    Premium    J      SI1      62.0   58.0   6.90  6.88  4.27  5166

In [90]:

Out[90]: (26967, 11)

In [91]: ibe(include="all").transpose()

Out[91]:
            count  unique  top    freq   mean     std      min   25%   50%   75%   max
Unnamed: 0  26967  NaN     NaN    NaN    .85 1 6742..5 26967
carat       26967  NaN     NaN    NaN    0.       0.       0.2   0.4   0.7   1.05  4.5
cut         26967  5       Ideal  10816  NaN      NaN      NaN   NaN   NaN   NaN   NaN
color       26967  7       G      5661   NaN      NaN      NaN   NaN   NaN   NaN   NaN
clarity     26967  8       SI1    6571   NaN      NaN      NaN   NaN   NaN   NaN   NaN
depth       26270  NaN     NaN    NaN    61.7451  1.41286  50.8  61    61.8  62.5  73.6
table       26967  NaN     NaN    NaN    57.4561 2. 59 79
x           26967  NaN     NaN    NaN    5.72985  1.12852  0     4.71  5.69  6.55  10.23
y           26967  NaN     NaN    NaN    5.73357  1.16606  0     4.71  5.71  6.54  58.9
z           26967  NaN     NaN    NaN    3.53806  0.       0     2.9   3.52  4.04  31.8
price       26967  NaN     NaN    NaN    3939.52 4024. 18818

In [92]: ()

<class '.DataFrame'>
RangeIndex: 26967 entries, 0 to 26966
Data columns (total 11 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   Unnamed: 0  26967 non-null   int64
 1   carat       26967 non-null   float64
 2   cut         26967 non-null   object
 3   color       26967 non-null   object
 4   clarity     26967 non-null   object
 5   depth       26270 non-null   float64
 6   table       26967 non-null   float64
 7   x           26967 non-null   float64
 8   y           26967 non-null   float64
 9   z           26967 non-null   float64
 10  price       26967 non-null   int64
dtypes: float64(6), int64(2), object(3)
memory usage: 2.3+ MB

In [93]: l().sum()

Out[93]:
Unnamed: 0    0
carat         0
cut           0
color         0
clarity       0
depth         697
table         0
x             0
y             0
z             0
price         0
dtype: int64

In [94]: dups = cated()
         print('Number of duplicate rows = %d' % (()))

Number of duplicate rows = 0

In [95]: for column in ns:
             if df[column].dtype == 'object':
                 print((),': ',df[column].nunique())
                 print(df[column].value_counts().sort_values())
                 print('n')

CUT : 5
Fair         781
Good         2441
Very Good    6030
Premium      6899
Ideal        10816
Name: cut, dtype: int64

COLOR : 7
J    1443
I    2771
D    3344
H    4102
F    4729
E    4917
G    5661
Name: color, dtype: int64

CLARITY : 8
I1      365
IF      894
VVS1    1839
VVS2    2531
VS1     4093
SI2     4575
VS2     6099
SI1     6571
Name: clarity, dtype: int64

In [96]: lot(df['carat'],color='black',rug=True)

Out[96]: <._subplots.AxesSubplot at 0xca0>

In [97]: lot(df['depth'],color='black',rug=True)

Out[97]: <._subplots.AxesSubplot at 0x70>

In [98]: lot(df['table'],color='black',rug=True)

Out[98]: <._subplots.AxesSubplot at 0x16416b42340>

In [99]: lot(df['price'],color='black',rug=True)

Out[99]: <._subplots.AxesSubplot at 0x50>

In [100]: lot(df['x'],color='black',rug=True)

In [101]: lot(df['y'],color='black',rug=True)

Out[100]: <matplotl
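
The checks in cells 88 to 95 (shape, null counts, duplicate rows, levels of the categorical columns) can be sketched as follows. This is a minimal illustration, not the notebook's own code: the truncated `_csv("cubic_")` call looks like pandas `read_csv`, but the file name is cut off in the preview, so the sketch reads the five rows shown in Out[88] from an in-memory string instead. The names `CSV_TEXT` and `summarize` are hypothetical, and non-numeric columns are selected with `select_dtypes` rather than the `dtype == 'object'` test used in cell 95.

```python
import io

import pandas as pd

# The five rows shown in Out[88], written as CSV text. The first header
# field is left empty, which is how pandas produces an "Unnamed: 0" column.
CSV_TEXT = """,carat,cut,color,clarity,depth,table,x,y,z,price
1,0.30,Ideal,E,SI1,62.1,58.0,4.27,4.29,2.66,499
2,0.33,Premium,G,IF,60.8,58.0,4.42,4.46,2.70,984
3,0.90,Very Good,E,VVS2,62.2,60.0,6.04,6.12,3.78,6289
4,0.42,Ideal,F,VS1,61.6,56.0,4.82,4.80,2.96,1082
5,0.31,Ideal,F,VVS1,60.4,59.0,4.35,4.43,2.65,779
"""


def summarize(df):
    """Collect the basic EDA checks in one dictionary (hypothetical helper)."""
    # Treat every non-numeric column as categorical.
    categorical = df.select_dtypes(exclude="number").columns
    return {
        "shape": df.shape,
        "null_counts": df.isnull().sum(),
        "duplicate_rows": int(df.duplicated().sum()),
        "levels": {
            col: df[col].value_counts().sort_values() for col in categorical
        },
    }


df = pd.read_csv(io.StringIO(CSV_TEXT))
summary = summarize(df)

print(summary["shape"])
print(summary["null_counts"])
print("Number of duplicate rows = %d" % summary["duplicate_rows"])
for col, counts in summary["levels"].items():
    print(col.upper(), ": ", counts.size)
    print(counts)
    print()
```

On the full 26967-row file the same checks would report the 697 missing `depth` values shown in Out[93]; the five-row sample has none.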


Written for

Institution: Great Lakes Maritime Academy
Course: DATA SCIEN


Document information

Uploaded on: March 16, 2023
Number of pages: 82
Written in: 2022/2023
Type: Exam (elaborations)
Contains: Unknown

Subjects

data scien
predictive modeling project


Get to know the seller

Studygreatsolutions

