Applied Math Essay 2
The p-value, a common index for strength of evidence, represents the probability of obtaining results at least as extreme as those observed when chance alone is operating, and it was long considered a reliable way to establish the significance of results. Recent research, however, says otherwise, and the measure's reputation has suffered accordingly. Results are deemed significant if the p-value is less than 0.05, since the probability that such results are products of chance is then less than 5%. One reason the p-value is no longer considered reputable is p-hacking: the manipulation of an experiment so that a desired result is produced. A study is conducted so that certain hypotheses or theories may be validated by experimental research; when a study is manipulated through p-hacking, an experimenter may produce seemingly successful but faulty findings, and treating those findings as true can have very negative impacts.
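To make the 0.05 decision rule concrete, the short Python sketch below compares two samples with a t-test and declares the result significant only when p < 0.05. The data, variable names, and use of scipy.stats are illustrative assumptions of mine, not material from the article.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.0, scale=1.0, size=30)    # group with no treatment
treatment = rng.normal(loc=0.0, scale=1.0, size=30)  # drawn from the same distribution

# Two-sample t-test: p_value estimates how often a difference this large
# would arise by chance alone if the groups truly did not differ.
t_stat, p_value = stats.ttest_ind(treatment, control)

# The conventional rule: call the result significant when p < 0.05.
verdict = "significant" if p_value < 0.05 else "not significant"
print(f"p = {p_value:.3f} -> {verdict}")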
Four researcher degrees of freedom were tested in the SN&S computer simulations to see how such flexibility affects the chance of a false positive, that is, a Type I error. P-values were recorded after each of the four situations was run through the simulation individually, and then again when the situations were run in combination. In combination, the false-positive rate increased to 61% (an estimate the authors describe as conservative, since many other common degrees of freedom were not considered). Through p-hacking, an experimenter can generate exactly these false positives by reporting only results that are significant enough for their purposes, for example by selecting the variables that keep the p-value low or by varying which degrees of freedom are exercised. The problem lies in how results are determined to be significant, not in the p-value index itself; that is the issue that must be tackled.
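The mechanism is easy to demonstrate. The toy Python simulation below is my own rough reconstruction, not the SN&S code, and its particular degrees of freedom (a choice between two outcome measures, an option to collect more data, and reporting only the smallest p-value) are illustrative assumptions. Even though every null hypothesis in it is true, the false-positive rate it prints lands well above the nominal 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 10_000
alpha = 0.05
false_positives = 0

for _ in range(n_experiments):
    # Two groups with NO true difference, each measured on two outcomes.
    a = rng.normal(size=(20, 2))
    b = rng.normal(size=(20, 2))

    # Degree of freedom 1: try each dependent variable and keep both p-values.
    p_values = [stats.ttest_ind(a[:, dv], b[:, dv]).pvalue for dv in range(2)]

    # Degree of freedom 2: if nothing "worked", collect 10 more per group and retest.
    if min(p_values) >= alpha:
        a = np.vstack([a, rng.normal(size=(10, 2))])
        b = np.vstack([b, rng.normal(size=(10, 2))])
        p_values += [stats.ttest_ind(a[:, dv], b[:, dv]).pvalue for dv in range(2)]

    # The p-hacker reports only the smallest p-value obtained.
    if min(p_values) < alpha:
        false_positives += 1

print(f"false-positive rate: {false_positives / n_experiments:.1%}")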
The fifth solution has the most potential for reducing p-hacking. The article previously stated that it is common for researchers to report only the data that “worked” in their experiments. This form of p-hacking is the hardest to prohibit because the researchers themselves are the only people who handle the data before it is finalized, not the people who read the final report of the findings. Simply telling researchers to include all data will not fix the problem, as it would still be easy to exclude key recordings. Under the fifth solution, authors who eliminate observations must also report the statistical results with those observations included, which reveals whether the exclusions changed the answers to the research questions the experimental data were meant to address, as the short sketch below illustrates.
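The sketch shows the shape of a report that complies with the fifth solution; the outlier rule and every name in it are hypothetical. The same test is reported twice, once as the researcher chose to run it and once with every eliminated observation restored, so a reader can see whether the exclusions drove the significance.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(loc=0.0, size=25)
b = rng.normal(loc=0.3, size=25)

def report(x, y, label):
    t, p = stats.ttest_ind(x, y)
    print(f"{label}: n = {len(x)} + {len(y)}, t = {t:.2f}, p = {p:.3f}")

# The analysis as the researcher chose to run it: drop |z| > 2 "outliers".
keep_a = a[np.abs((a - a.mean()) / a.std()) <= 2]
keep_b = b[np.abs((b - b.mean()) / b.std()) <= 2]
report(keep_a, keep_b, "outliers excluded")

# The fifth solution: also report the same test with all observations included.
report(a, b, "all observations")

A great deal of work remains to be done to prevent p-hacking and the use of unreliable data.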