Final Exam HAP 671
Please use this document to provide your answers to the questions. Copy
and paste any graphs, results or your code into this document and upload
this document. Format your responses to look good (to the extent possible
same font size and type, no colors unless needed to make a point, etc.). I
grade both the accuracy of response and the clarity of the presentation.
In all questions provide all code which you used to compute your
results.
PART ONE:
USING DATA FROM final.bak, ANSWER THE QUESTIONS:
1. Download the file final.bak and restore the file to your MS SQL
Management Studio. Clean your data. Explain the logic you used for
data cleaning. Which errors did you find while cleaning the data?
I started by removing any rows with an icd9 value of null. This excludes
records that do not include a diagnosis. The following step removes
rows that do not have a date of diagnosis. This deletes records that do
not have a diagnosis date. The following step removes rows with a
death date earlier than the date of diagnosis. That scenario should not
be possible in the real world. These procedures would ensure that the
data is clean.
use final
-- Cleaning data --
--1.Removing nulls
select *from dbo.Admission where icd9 is null
---- transfering the data to another table
select * into #clean1 from dbo.Admission where icd9 is not null
--2.Check nulls in AgeatDX
select * into #clean2 from #clean1 where AgeAtDX is not null
---3. Removing Zombies
select distinct id into #cleaneddata from #clean2 where AgeAtDeath < AgeAtDX
, select * into dbo.datacleanAdmissions from #clean2 where id not in (select * from #cleaneddata)
select top 10 * from dbo.datacleanAdmissions
Please use this document to provide your answers to the questions. Copy
and paste any graphs, results or your code into this document and upload
this document. Format your responses to look good (to the extent possible
same font size and type, no colors unless needed to make a point, etc.). I
grade both the accuracy of response and the clarity of the presentation.
In all questions provide all code which you used to compute your
results.
PART ONE:
USING DATA FROM final.bak, ANSWER THE QUESTIONS:
1. Download the file final.bak and restore the file to your MS SQL
Management Studio. Clean your data. Explain the logic you used for
data cleaning. Which errors did you find while cleaning the data?
I started by removing any rows with an icd9 value of null. This excludes
records that do not include a diagnosis. The following step removes
rows that do not have a date of diagnosis. This deletes records that do
not have a diagnosis date. The following step removes rows with a
death date earlier than the date of diagnosis. That scenario should not
be possible in the real world. These procedures would ensure that the
data is clean.
use final
-- Cleaning data --
--1.Removing nulls
select *from dbo.Admission where icd9 is null
---- transfering the data to another table
select * into #clean1 from dbo.Admission where icd9 is not null
--2.Check nulls in AgeatDX
select * into #clean2 from #clean1 where AgeAtDX is not null
---3. Removing Zombies
select distinct id into #cleaneddata from #clean2 where AgeAtDeath < AgeAtDX
, select * into dbo.datacleanAdmissions from #clean2 where id not in (select * from #cleaneddata)
select top 10 * from dbo.datacleanAdmissions