CITI - Data storage, collection and sharing
Replicability: a researcher’s ability “to duplicate the results of a prior study if the same
procedures are followed but new data are collected. That is, a failure to replicate a scientific
finding is commonly thought to occur when one study documents relations between two or more
variables and a subsequent attempt to implement the same operations fails to yield the same
relations with the new data” (Bollen et al. 2015).
Reproducibility: a researcher’s ability “to duplicate the results of a prior study using the same
materials and procedures as were used by the original investigator. So, in an attempt to
reproduce a published statistical analysis, a second researcher might use the same raw data to
build the same analysis files and implement the same statistical analysis to determine whether
they yield the same results” (Bollen et al. 2015).
Scientific data: “the recorded factual material commonly accepted in the scientific community
as necessary to validate and replicate research findings, regardless of whether the data are used
to support scholarly publications. Scientific data do not include laboratory notebooks,
preliminary analyses, completed case report forms, drafts of scientific papers, plans for future
research, peer reviews, communications with colleagues, or physical objects, such as laboratory
specimens” (NIH 2020).
Data by discipline:
Data: never neutral → product of decisions made throughout collecting, processing, and analyzing
phenomena, in how information is visualized, and in where and how information can be accessed
→ can carry bias
Data management: the process of validating, organizing, protecting, maintaining, and
processing scientific data to ensure the accessibility, reliability, and quality of the scientific data
→ DMP: data management plan
Data lifecycle: “a high level overview of the stages involved in successful management and
preservation of data for use and reuse” (DataONE n.d.). Multiple versions of a data lifecycle
exist, with differences attributable to variations in practices across domains or communities.
Research planning: may include a Data Management Plan (DMP), a data sharing plan, provisions
for free access to underlying data, or a Data Management and Sharing Plan (DMSP) describing
how researchers will manage and share data.
DMP: “a formal document that outlines what you will do with your data during and after a
research project. Most researchers collect data with some form of plan in mind, but [it is] often
inadequately documented and incompletely thought out. Many data management issues can be
handled easily or avoided entirely by planning ahead. With the right process and framework, it
[does not] take too long and can pay off enormously in the long run”
NIH Data Management and Sharing Plan (DMSP)
→ stakeholders available to help write a DMP include:
- Sponsored programs
- Research offices
- Library
- Information technology
- Institutional Review Board
Case:
- Project generates original data while also using existing data → must ensure the existing
data are valid and reliable for the new use; carefully evaluate the methods and procedures
used to produce the existing data
- Project should be checked for legal and methodological issues