Summary Health Data Analysis using python

Uploaded on

25-08-2023

Written in

2023/2024

Get a deeper understanding of analysing data using Python. This is a follow-along project that includes the source code. Feel free to message me about the concepts covered in the book before purchasing it.

Written for

Module: Computer information systems


Document information

Uploaded on: August 25, 2023
Number of pages: 12
Written in: 2023/2024
Type: Summary

Subjects

python
analysis
libraries
machinelearning
scikitlearn
coding
data
science

Content preview

Health Insurance Data Analysis
This notebook provides a structured analysis of the health insurance dataset. We'll cover Exploratory Data
Analysis (EDA), Regression Analysis, and Classification Analysis.

Introduction:
The health insurance industry plays a pivotal role in the healthcare ecosystem, acting as an intermediary
between healthcare providers and patients. Accurately predicting insurance claims can be of paramount
importance to such companies, allowing them to set premiums appropriately, manage risks, and maintain
profitability. The dataset under scrutiny offers a snapshot of various factors that could influence insurance
claims, including age, BMI, number of children, smoking habits, and regular exercise routines. Through this
analysis, we aim to uncover patterns, relationships, and insights that can guide insurance companies in their
decision-making processes.

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_squared_log_er  # line cut off in the preview
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, accuracy_  # line cut off in the preview
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
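
The preview ends before the regression section, so the author's modelling code is not shown. As a rough sketch of how several of the imports above are typically combined, the following fits a `LinearRegression` on synthetic data. The feature, target, and noise level are made up for illustration and are not taken from the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): one feature with a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(18, 65, size=(200, 1))
y = 250.0 * X[:, 0] + 1000.0 + rng.normal(0, 500.0, size=200)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training split and predict on the held-out split.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.1f}, R^2: {r2:.3f}")
```

Evaluating on a held-out split, rather than on the training rows, gives a less optimistic estimate of how the model would behave on new data.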

Exploratory Data Analysis (EDA)
In this section, we'll explore the dataset's distributions, relationships, and potential outliers.
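
One common way to flag potential outliers is the 1.5 × IQR rule, which is also what seaborn and matplotlib boxplots use by default to place their whiskers. A minimal sketch on made-up values (the numbers below are illustrative, not from the dataset):

```python
import pandas as pd

# Made-up claim amounts (assumption); the last value is deliberately extreme.
claims = pd.Series(
    [1200.0, 1500.0, 1800.0, 2100.0, 2400.0, 2700.0, 3000.0, 60000.0]
)

# Interquartile range: spread of the middle 50% of the values.
q1 = claims.quantile(0.25)
q3 = claims.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = claims[(claims < lower) | (claims > upper)]
print(outliers.tolist())
```

Whether a flagged value is an error or a genuine large claim is a separate judgement; the rule only marks candidates for inspection.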

In [2]: # Load the data
data = pd.read_csv('health_insurance.csv')
data.head()

Out[2]:
   age  sex     weight  bmi   hereditary_diseases  no_of_dependents  smoker  city         diabetes  regular_ex
0  60   male    64      24.3  NoDisease            1                 0       NewYork      0         0
1  49   female  75      22.6  NoDisease            1                 0       Boston       1         1
2  32   female  64      17.8  Epilepsy             2                 1       Phildelphia  1         1           A
3  61   female  53      36.4  NoDisease            1                 1       Pittsburg    1         0
4  19   female  50      20.6  NoDisease            0                 0       Buffalo      1         0           H
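
Beyond `head()`, a quick check of column types and missing values is a common next step before plotting. A minimal sketch on a small hand-built frame: the column names follow the preview, but one `bmi` value is set to NaN purely for illustration and does not reflect the real data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumption): a few columns named as in the preview,
# with one bmi value set to NaN to show what a missing entry looks like.
sample = pd.DataFrame({
    "age": [60, 49, 32, 61, 19],
    "sex": ["male", "female", "female", "female", "female"],
    "bmi": [24.3, 22.6, np.nan, 36.4, 20.6],
    "smoker": [0, 0, 1, 1, 0],
})

# Column types: numeric columns vs. string columns.
print(sample.dtypes)

# Count of missing values per column.
missing = sample.isnull().sum()
print(missing)

# Summary statistics for the numeric columns only.
print(sample.describe())
```

On the real dataset the same three calls would be applied to `data` directly.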

In [3]: # Distribution of key variables
fig, ax = plt.subplots(1, 3, figsize=(20, 5))

sns.histplot(data['claim'], ax=ax[0], kde=True)
ax[0].set_title('Distribution of Claim Amounts')

sns.histplot(data['bmi'], ax=ax[1], kde=True)
ax[1].set_title('Distribution of BMI')

sns.histplot(data['age'], ax=ax[2], kde=True)
ax[2].set_title('Distribution of Age')

plt.tight_layout()
plt.show()

# Correlation heatmap
correlation_matrix = data.corr()
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

# Boxplots for key variables
fig, ax = plt.subplots(1, 3, figsize=(20, 5))

sns.boxplot(y=data['claim'], ax=ax[0])
ax[0].set_title('Boxplot of Claim Amounts')

sns.boxplot(y=data['bmi'], ax=ax[1])
ax[1].set_title('Boxplot of BMI')

sns.boxplot(y=data['age'], ax=ax[2])
ax[2].set_title('Boxplot of Age')

plt.tight_layout()
plt.show()

C:\Users\ttgmo\AppData\Local\Temp\ipykernel_9896\3467862294.py:17: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  correlation_matrix = data.corr()
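
The warning above is raised because `data.corr()` is called on a frame that still contains string columns such as `sex` and `city`. In newer pandas releases, where `numeric_only` defaults to `False`, the same call can raise an error instead of warning. One way to avoid both, sketched here on a small hand-built frame rather than the real dataset, is to pass `numeric_only=True` or to select the numeric columns first:

```python
import pandas as pd

# Hand-built stand-in (assumption) with both numeric and string columns.
sample = pd.DataFrame({
    "age": [60, 49, 32, 61, 19],
    "bmi": [24.3, 22.6, 17.8, 36.4, 20.6],
    "sex": ["male", "female", "female", "female", "female"],
    "city": ["NewYork", "Boston", "Phildelphia", "Pittsburg", "Buffalo"],
})

# Restrict the correlation to numeric columns explicitly (pandas >= 1.5).
correlation_matrix = sample.corr(numeric_only=True)

# Alternative with the same result: select the numeric columns first.
correlation_matrix_alt = sample.select_dtypes(include="number").corr()

print(correlation_matrix)
```

The resulting matrix can be passed to `sns.heatmap` exactly as in the cell above.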

Get to know the seller

DataCraftsmanML
