"the recorded information (regardless of the form or the media in which it may exist) necessary to support or validate a research project’s observations, findings or outputs." - Research Data Management Policy
There are many different definitions of research data. The examples below show that there is no universal consensus on a definition, although the definitions share many similarities.
"Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical)." - Concordat on Open Research Data
"... are research materials resulting from primary data collection or generation, or derived from existing sources intended to be analysed in the course of a research project." - ESRC research data policy
"representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship." - Digital Curation Centre
Research data can be primary data, created by a researcher from first-hand sources, or secondary data, generated by someone else.
Not all research data is digital. It also includes non-digital formats, such as handwritten field notes and diaries, medical samples and artefacts.
The types of research data you work with will affect the way you manage your data. For example, irreplaceable research data requires good backup procedures to reduce the risk of data loss, and combining data points from different sources requires good management practices to prevent data corruption.
In this video, researchers from different disciplines talk about what research data means to them and the types of data they work with.
Research data can also be grouped into five main types based on the method of collection:
Observational data: captured in real time through observation of a behaviour or activity, and typically irreplaceable. For example: remote sensing data, experience and opinion surveys, archaeological samples, abundance data from monitoring threatened species
Experimental data: captured from lab equipment or generated in controlled environments; often reproducible, although reproducing it can be expensive. For example: gene sequences, chromatograms, magnetic field data, clinical trial data
Simulation data: generated from test models, where the model and its metadata (about the model, code, computing environment and input conditions) are more important than the output data. For example: climate models, economic models
Derived data: resulting from processing or combining existing data points, often from different data sources. For example: compiled databases, aggregated census data, corpora collected by text mining
Reference or canonical data: a static or organic conglomeration or collection of datasets, most likely published and curated. For example: gene banks, an archive of historical images, national statistics archives, chemical structures
Software and code can fall under the definition of research data, but often they simply support the research data.
The deciding factor is whether the software or code is necessary to enable others to verify your findings or reproduce your methodology. If a publication does not include enough detail for others to validate the work unambiguously, you should consider archiving and sharing the software or code (with full documentation of any dependencies) as part of the research data.
Guidance, alongside worked examples and good practice recommendations, is provided in the EPSRC guide "How does software fit into EPSRC's research data policy?"
You can find further guidance on the software and computer code page.
Research activity also generates data relating to research, including:
supporting records: e.g. applications, proposals, consents, ethical approvals, contracts, licences, data management plans, communications with data suppliers/funders/partners, governing policies and protocols, metadata
published and unpublished outputs: e.g. publications, theses, reports, presentations, dissemination materials, end products/deliverables (tools, web content, e-resources etc.)
reporting: e.g. external reports (to REF, HESA, funders etc.), audits, documenting research impact (metrics, feedback, reviews, bibliometrics, publicity, impact case studies etc.)
Data relating to research contains recorded information, or evidence, about research activity. It is therefore important to manage these research records both during and beyond the life of a project.
The records management web pages, together with the University's Information and Records Management Policy, provide guidance. The corporate retention schedule defines how long records should be kept before being deleted or destroyed, reviewed, or transferred to the University Archive.