Exam (elaborations)

DATA SCIEN Predictive Modeling Project.html.

Pages
82
Uploaded on
16-03-2023
Written in
2022/2023

In [205]:
    from sklearn.datasets import load_boston
    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.cluster import KMeans
    from sklearn.metrics import mean_squared_error
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    import math

1.1. Read the data and do exploratory data analysis. Describe the data briefly. (Check the null values, Data types, shape, EDA). Perform Univariate and Bivariate Analysis.

In [88]:
    df = pd.read_csv("cubic_")
    df.head()

Out[88]:
       Unnamed: 0  carat        cut color clarity  depth  table     x     y     z  price
    0           1   0.30      Ideal     E     SI1   62.1   58.0  4.27  4.29  2.66    499
    1           2   0.33    Premium     G      IF   60.8   58.0  4.42  4.46  2.70    984
    2           3   0.90  Very Good     E    VVS2   62.2   60.0  6.04  6.12  3.78   6289
    3           4   0.42      Ideal     F     VS1   61.6   56.0  4.82  4.80  2.96   1082
    4           5   0.31      Ideal     F    VVS1   60.4   59.0  4.35  4.43  2.65    779

In [89]:
    df.tail()

Out[89]:
       Unnamed: 0  carat        cut color clarity  depth  table     x     y     z  price
                     .11    Premium     G     SI1   62.3   58.0  6.61  6.52  4.09   5408
                     .33      Ideal     H      IF   61.9   55.0  4.44  4.42  2.74   1114
                     .51    Premium     E     VS2   61.7   58.0  5.12  5.15  3.17   1656
                     .27  Very Good     F    VVS2   61.8   56.0  4.19  4.20  2.60    682
                     .25    Premium     J     SI1   62.0   58.0  6.90  6.88  4.27   5166

In [90]:
    df.shape

Out[90]:
    (26967, 11)

In [91]:
    df.describe(include="all").transpose()

Out[91]:
                count  unique  top    freq   mean  std  min  25%  50%  75%  max
    Unnamed: 0  26967  NaN     NaN    NaN    .85 1 6742..5 26967
    carat       26967  NaN     NaN    NaN    0. 0. 0.2 0.4 0.7 1.05 4.5
    cut         26967  5       Ideal  10816  NaN NaN NaN NaN NaN NaN NaN
    color       26967  7       G      5661   NaN NaN NaN NaN NaN NaN NaN
    clarity     26967  8       SI1    6571   NaN NaN NaN NaN NaN NaN NaN
    depth       26270  NaN     NaN    NaN    61.7451 1.41286 50.8 61 61.8 62.5 73.6
    table       26967  NaN     NaN    NaN    57.4561 2. 59 79
    x           26967  NaN     NaN    NaN    5.72985 1.12852 0 4.71 5.69 6.55 10.23
    y           26967  NaN     NaN    NaN    5.73357 1.16606 0 4.71 5.71 6.54 58.9
    z           26967  NaN     NaN    NaN    3.53806 0. 0 2.9 3.52 4.04 31.8
    price       26967  NaN     NaN    NaN    3939.52 4024. 18818

In [92]:
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 26967 entries, 0 to 26966
    Data columns (total 11 columns):
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   Unnamed: 0  26967 non-null  int64
     1   carat       26967 non-null  float64
     2   cut         26967 non-null  object
     3   color       26967 non-null  object
     4   clarity     26967 non-null  object
     5   depth       26270 non-null  float64
     6   table       26967 non-null  float64
     7   x           26967 non-null  float64
     8   y           26967 non-null  float64
     9   z           26967 non-null  float64
     10  price       26967 non-null  int64
    dtypes: float64(6), int64(2), object(3)
    memory usage: 2.3+ MB

In [93]:
    df.isnull().sum()

Out[93]:
    Unnamed: 0      0
    carat           0
    cut             0
    color           0
    clarity         0
    depth         697
    table           0
    x               0
    y               0
    z               0
    price           0
    dtype: int64

In [94]:
    dups = df.duplicated()
    print('Number of duplicate rows = %d' % (dups.sum()))

    Number of duplicate rows = 0

In [95]:
    for column in df.columns:
        if df[column].dtype == 'object':
            print(column.upper(),': ',df[column].nunique())
            print(df[column].value_counts().sort_values())
            print('\n')

    CUT :  5
    Fair           781
    Good          2441
    Very Good     6030
    Premium       6899
    Ideal        10816
    Name: cut, dtype: int64

    COLOR :  7
    J    1443
    I    2771
    D    3344
    H    4102
    F    4729
    E    4917
    G    5661
    Name: color, dtype: int64

    CLARITY :  8
    I1       365
    IF       894
    VVS1    1839
    VVS2    2531
    VS1     4093
    SI2     4575
    VS2     6099
    SI1     6571
    Name: clarity, dtype: int64

In [96]:
    sns.distplot(df['carat'],color='black',rug=True)

Out[96]:
    <matplotlib.axes._subplots.AxesSubplot at 0xca0>

In [97]:
    sns.distplot(df['depth'],color='black',rug=True)

Out[97]:
    <matplotlib.axes._subplots.AxesSubplot at 0x70>

In [98]:
    sns.distplot(df['table'],color='black',rug=True)

Out[98]:
    <matplotlib.axes._subplots.AxesSubplot at 0x16416b42340>

In [99]:
    sns.distplot(df['price'],color='black',rug=True)

Out[99]:
    <matplotlib.axes._subplots.AxesSubplot at 0x50>

In [100]:
    sns.distplot(df['x'],color='black',rug=True)

Out[100]:
    <matplotl

In [101]:
    sns.distplot(df['y'],color='black',rug=True)
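The checks in this part of the notebook (shape, null counts, duplicate rows, number of levels per categorical column) can be gathered into a single helper. The following is only a minimal sketch under stated assumptions: the function name `summarize` and the four-row `toy` frame are hypothetical and are not taken from the notebook or its data file.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect basic EDA checks into one dictionary (hypothetical helper)."""
    # Treat every non-numeric column as categorical.
    categorical = [c for c in df.columns
                   if not pd.api.types.is_numeric_dtype(df[c])]
    return {
        "shape": df.shape,
        "null_counts": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "category_levels": {c: int(df[c].nunique()) for c in categorical},
    }


# Hypothetical toy frame for illustration only; it is not the real dataset.
toy = pd.DataFrame({
    "carat": [0.30, 0.33, 0.90, 0.30],
    "cut": ["Ideal", "Premium", "Very Good", "Ideal"],
    "depth": [62.1, np.nan, 62.2, 62.1],
    "price": [499, 984, 6289, 499],
})

summary = summarize(toy)
print(summary)
```

On the real frame, `summarize(df)` would be expected to report the same figures the individual cells print, for example the 697 missing values in `depth`.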


Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
Studygreatsolutions Yale University