Subject Guides: Research data management: a practical guide: What is research data

The University defines research data as...

"the recorded information (regardless of the form or the media in which it may exist) necessary to support or validate a research project’s observations, findings or outputs." - Research Data Management Policy

There are many different definitions of research data. The following examples illustrate that there is no universal consensus on a definition, although the definitions share many similarities.

"Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical)." - Concordat on Open Research Data

"... are research materials resulting from primary data collection or generation, or derived from existing sources intended to be analysed in the course of a research project." - ESRC research data policy

"representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship." - Digital Curation Centre

Types of research data

Research data can be primary data, created by a researcher from first-hand sources, or secondary data, generated by someone else.

Not all research data is digital. Research data also includes non-digital formats, e.g. handwritten field notes and diaries, medical samples and artefacts.

Research data can take many forms, including:

text documents
spreadsheets
questionnaires, transcripts of interviews
audio and video recordings
photographs, film
collections of digital objects
artwork, an interpretation
bibliographies, annotations, archival notes
primary and secondary sources
stemmas, finding aids, text corpora, critical apparatus
thematic research collections, critical editions
laboratory notebooks, log books
field notebooks, diaries, focus group notes
test and survey responses
slides, artefacts, specimens, samples
models, algorithms, scripts

The types of research data you work with will affect the way you manage your data. For example, irreplaceable research data requires good backup procedures to reduce the risk of data loss, while combining data points from different sources requires good management practices to prevent data corruption.

Odum Institute: What are data

In this video, researchers from different disciplines talk about what research data means to them and the types of data they work with.

Classification of research data

Research data can also be grouped into five main types based on how the data are collected or generated:

Observational data: captured in real time through observation of a behaviour or activity, and typically irreplaceable. For example: remote sensing data, experience and opinion surveys, archaeological samples, abundance data from monitoring threatened species.

Experimental data: captured from lab equipment or generated in controlled environments; often reproducible, although reproducing it can be expensive. For example: gene sequences, chromatograms, magnetic field data, clinical trial data.

Simulation data: generated from test models, where the model and its metadata (about the model, code, computing environment and input conditions) are more important than the output data generated. For example: climate models, economic models.

Derived data: resulting from processing or combining existing data points, often from different data sources. For example: compiled databases, aggregated census data, corpora collected by text mining.

Reference or canonical data: a static or organic collection of datasets, typically published and curated. For example: gene banks, an archive of historical images, national statistics archives, chemical structures.

Software and computer code

Software and code can fall under the definition of research data, but often they simply support the research data.

The deciding factor is whether the software or code is necessary to enable others to verify your findings or reproduce your methodology. If a publication does not include sufficient detail for others to validate the work unambiguously, you should consider archiving and sharing the software or code (with full documentation of any dependencies) as part of the research data.

Guidance, alongside worked examples and good practice recommendations, is provided in the EPSRC guide "How does software fit into EPSRC's research data policy?"

You can find further guidance on the Research software page.

Data relating to research

Research data is not:

supporting records: e.g. applications, proposals, consents, ethical approvals, contracts, licences, data management plans, communications with data suppliers/funders/partners, governing policies and protocols, metadata
published and unpublished outputs: e.g. publications, theses, reports, presentations, dissemination, end products/deliverables (tools, web content, e-resources etc.)
reporting: e.g. external reports (to REF, HESA, funders etc.), audits, documenting research impact (metrics, feedback, reviews, bibliometrics, publicity, impact case studies etc.)

Although these records are not research data, they contain recorded information (evidence) about research activity. It is therefore important to manage these research records both during and beyond the life of a project.

The records management web pages, together with the University's Information and Records Management Policy, provide guidance. The corporate retention schedule defines how long records should be kept before being deleted or destroyed, reviewed, or transferred to the University Archive.


In summary, research data is the raw material collected, processed and studied as a direct result of doing research.