EXPLORATION
Predicting medical diagnosis using the principles of Bayes’ Theorem
Lakshmi Sameeksha Dumpala
Grade 11
Rationale:
I have chosen to center this exploration around predicting a medical diagnosis from pre-existing records, that is, from the symptoms of earlier patients who had already been diagnosed.
Take the case of the flu. Records of previous patients who were diagnosed either with or without the flu were collected, and their symptoms were studied and noted. Using this data, I will calculate the conditional probability that a patient has the flu given a certain set of symptoms. This can be done using what is known as Bayes' theorem.
Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events. In this exploration I wanted to tap into this branch of study, in particular the use of Bayes' theorem. My mother is a doctor, and I have grown up around hospitals my whole life. In many of them I have seen patients come in with all sorts of symptoms, and the diagnosis can take a long time, because there are a great many possible conditions and many ways to treat each of them. Perhaps this diagnostic process could be sped up by predicting a few likely diagnoses, along with the likelihood of each, from hundreds of pre-existing records that store patients' symptoms and diagnoses, so that patients can begin receiving treatment sooner. In this exploration I have restricted the prediction to one of two outcomes (flu or no flu) based on different combinations of symptoms.
This exploration could be extended using more advanced code and algorithms to predict more than two possible diagnoses, along with the likelihood that the patient is suffering from each. Bayes' theorem is not limited to the medical field. It is the basis of many other applications, such as the prompts in web browsers and the occurrence of certain traits in genetics, and it has even been used in arguments about the existence of God. It is not just a mathematical concept; it is a way of thinking.
What is Bayes' Theorem?
It is a theorem describing how the conditional probability of each of a set of possible causes for a given observed outcome can be computed from knowledge of the probability of each cause and the conditional probability of the outcome given each cause.
Bayes' theorem is stated mathematically as the following equation:[2]
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where A and B are events and P(B) ≠ 0.
• P(A) and P(B) are the probabilities of A and B without regard to each other. P(A) is the prior probability of A, and P(B) is the marginal probability of B, also called the evidence.
• P(A | B), a conditional probability, is the probability of observing event A given that B is
true. It is also called the posterior probability.
• P(B | A) is the probability of observing event B given that A is true, i.e. the likelihood.
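As a small illustration of the formula, the Python sketch below applies Bayes' theorem to a single symptom. The probabilities are made-up values chosen for the example, not figures from any patient records, and the function name `bayes_posterior` is hypothetical. Because there are only two possible causes here (flu or no flu), P(B) is expanded as P(B | A)P(A) + P(B | not A)P(not A).

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) for a yes/no event A using Bayes' theorem.

    The evidence P(B) is expanded with the law of total probability:
    P(B) = P(B | A) P(A) + P(B | not A) P(not A).
    """
    p_not_a = 1.0 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, not taken from any real records:
# A = "patient has the flu", B = "patient has a fever".
p_flu = 0.2                 # prior P(A)
p_fever_given_flu = 0.9     # likelihood P(B | A)
p_fever_given_no_flu = 0.1  # P(B | not A)

print(round(bayes_posterior(p_fever_given_flu, p_flu, p_fever_given_no_flu), 4))
```

With these assumed numbers the program prints 0.6923: observing a fever raises the probability of flu from the prior of 0.2 to about 0.69.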
Application in Research Topic
Take the case of the flu diagnosis. To determine the patient's diagnosis, we must calculate the probability of the patient having the flu and of the patient not having the flu, given that all the specified symptoms are present. This is done by calculating the posterior probability of each scenario. To do so we must first find the likelihoods, that is, the probability of each symptom occurring if the patient has the flu and if the patient does not.
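We assume that all the symptoms are conditionally independent of each other given the diagnosis. This means that, once we know whether or not the patient has the flu, the presence of one symptom tells us nothing more about the presence of another. Under this assumption, the joint probability of the symptoms is simply the product of the individual conditional probabilities. (This simplification is what gives the "naive Bayes" method its name.)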
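Meaning:
$$P(B_1, B_2, B_3, B_4 \mid A) = P(B_1 \mid A)\,P(B_2 \mid A)\,P(B_3 \mid A)\,P(B_4 \mid A)$$
$$P(A \mid B_1, B_2, B_3, B_4) \propto P(B_1, B_2, B_3, B_4 \mid A)\,P(A)$$
$$P(A \mid B_1, B_2, B_3, B_4) \propto P(B_1 \mid A)\,P(B_2 \mid A)\,P(B_3 \mid A)\,P(B_4 \mid A)\,P(A)$$
(the ∝ sign means that the denominator P(B1, B2, B3, B4) from Bayes' theorem has been left out)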
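The last equation can be sketched in Python as the product of a prior and a list of per-symptom probabilities. The function name `naive_bayes_score` and the numbers are hypothetical; the result is an unnormalised score rather than a probability, because the denominator has been dropped.

```python
from math import prod
from typing import Sequence


def naive_bayes_score(prior: float, symptom_likelihoods: Sequence[float]) -> float:
    """Unnormalised posterior P(A) * P(B1|A) * ... * P(Bn|A).

    Only valid under the conditional-independence assumption: given A,
    the probability of each symptom does not depend on the others.
    """
    return prior * prod(symptom_likelihoods)


# Hypothetical values: prior P(flu) and P(Bi | flu) for four symptoms B1..B4.
score_flu = naive_bayes_score(0.2, [0.9, 0.8, 0.7, 0.5])
print(round(score_flu, 4))
```

For these values the score is 0.2 × 0.9 × 0.8 × 0.7 × 0.5 = 0.0504, which the program prints after rounding away floating-point noise.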
where A is the event that the patient has the flu and B1, B2, B3 and B4 are the observed symptoms. We calculate the probability of each symptom occurring among all the cases where the patient was diagnosed with the flu. Multiplying these together according to the equation above gives the likelihood needed to calculate the posterior probability. In the same way, we calculate the probability of each symptom occurring among the patients who were not diagnosed with the flu.
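One way to obtain these per-symptom probabilities is to count relative frequencies in the records. The Python sketch below assumes, purely for illustration, that each record is a pair (set of symptoms present, diagnosed with flu?); the record format, the function name `symptom_likelihoods` and the four sample records are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record format: (set of symptoms present, diagnosed with flu?)
Record = Tuple[Set[str], bool]


def symptom_likelihoods(records: List[Record], has_flu: bool) -> Dict[str, float]:
    """Estimate P(symptom | diagnosis) as a relative frequency.

    Among the records whose diagnosis equals `has_flu`, count how often
    each symptom appears and divide by the number of such records.
    """
    group = [symptoms for symptoms, flu in records if flu == has_flu]
    if not group:
        raise ValueError("no records with this diagnosis")
    counts: Dict[str, int] = {}
    for symptoms in group:
        for s in symptoms:
            counts[s] = counts.get(s, 0) + 1
    return {s: c / len(group) for s, c in counts.items()}


# Made-up records for illustration only.
records: List[Record] = [
    ({"fever", "cough"}, True),
    ({"fever", "aches"}, True),
    ({"cough"}, False),
    ({"headache"}, False),
]
print(symptom_likelihoods(records, has_flu=True))
```

For these made-up records, fever appears in both flu cases (probability 1.0) while cough and aches each appear in one of the two (0.5). A symptom never seen in a group is simply absent from the result, which amounts to a probability of 0 and would make the whole product zero; with a small set of records this is worth keeping in mind.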
From all the records, we also need the probability of any patient having the flu, i.e. the ratio of the number of patients diagnosed with the flu to the total number of patients. This prior probability is the second element needed to calculate the posterior probability. The same is done for the patients who did not have the flu.
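The prior is again a relative frequency. In this hypothetical sketch the past diagnoses are stored as a list of booleans (True meaning flu); the helper name `diagnosis_priors` and the 3-in-10 split are invented for the example.

```python
from typing import List, Tuple


def diagnosis_priors(diagnoses: List[bool]) -> Tuple[float, float]:
    """Return (P(flu), P(no flu)) as relative frequencies of past diagnoses."""
    if not diagnoses:
        raise ValueError("need at least one record")
    n_flu = sum(diagnoses)  # each True counts as 1
    p_flu = n_flu / len(diagnoses)
    return p_flu, 1.0 - p_flu


# Hypothetical: 3 of 10 past patients were diagnosed with the flu.
past = [True] * 3 + [False] * 7
print(diagnosis_priors(past))
```

Here P(flu) = 3/10 = 0.3 and P(no flu) = 0.7, so the program prints (0.3, 0.7).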
In this situation we calculate the probability of the patient having the flu and of the patient not having the flu, and compare them; whichever is greater is the more likely diagnosis given the set of symptoms. In both cases the denominator of Bayes' theorem is the same quantity, P(B1, B2, B3, B4), the probability of observing this particular set of symptoms. Dividing both values by the same positive number cannot change which one is larger, so the denominators can be omitted when only the comparison is needed.
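To see why dropping the denominator is safe, the sketch below takes two unnormalised scores, picks the larger one, and then recovers actual posterior probabilities by dividing each score by their sum (for two outcomes, that sum is exactly the omitted P(B1, ..., Bn) under the model). The score values and the function names `posterior_from_scores` and `diagnose` are hypothetical.

```python
from typing import Tuple


def posterior_from_scores(score_flu: float, score_no_flu: float) -> Tuple[float, float]:
    """Turn the two unnormalised scores into probabilities that sum to 1.

    With only two outcomes, the dropped denominator P(B1, ..., Bn) equals
    score_flu + score_no_flu (law of total probability).
    """
    total = score_flu + score_no_flu
    if total == 0:
        raise ValueError("both scores are zero")
    return score_flu / total, score_no_flu / total


def diagnose(score_flu: float, score_no_flu: float) -> str:
    # Dividing both scores by the same positive total cannot change which
    # one is larger, so the raw scores suffice. Ties go to "no flu".
    return "flu" if score_flu > score_no_flu else "no flu"


# Hypothetical unnormalised scores for one patient.
s_flu, s_no_flu = 0.0504, 0.0126
p_flu, p_no_flu = posterior_from_scores(s_flu, s_no_flu)
print(diagnose(s_flu, s_no_flu), round(p_flu, 2), round(p_no_flu, 2))
```

With these assumed scores the program prints `flu 0.8 0.2`: the decision made from the raw scores matches the one made from the normalised probabilities.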
Finally, to score one of the two outcomes for a completely new set of symptoms, we take the product of the prior probability of a diagnosis being flu and the probability of each individual symptom occurring in patients who were diagnosed with the flu. The same product is formed for the no-flu outcome, and the two scores are compared.
This can be expressed through the following algorithm: