University of York Library

Research data management: a practical guide

Citing data

A practical guide to help you manage your research data well, covering best practice for the successful organisation, storage, documentation, archiving and sharing of research data.

Citing

Data citation aids data discovery, enables data reuse, recognises and can reward data creators, and allows the impact of data to be tracked.

There are two elements to data citation. If you are publishing a dataset as part of your research output you will be expected to provide an accessibility statement, often referred to as a data access statement, in your publications. If you are reusing third party data as part of your research you will be expected to cite the dataset.

Data access statements

Data access statements are used in publications to describe where supporting data can be found and under what conditions they can be accessed.

Data access statements are required for publications arising from publicly-funded research. They are a requirement of many funders' data policies, of the University's Research Data Management Policy, and of the UKRI open access policy, which states:

"UKRI requires in-scope research articles to include a Data Access Statement, even where there are no data associated with the article or the data are inaccessible."

A requirement of publishers

Data access statements are also required by many publishers. Some journals now provide a separate section in papers for the data access statement. If no such section exists, you can place your statement in the acknowledgements section.

What to include in your statement

One objective of the data access statement is to aid data discovery. Accordingly, data access statements should include a persistent identifier (e.g. a Digital Object Identifier (DOI)) which links directly to the dataset or to supporting documentation that describes the data in detail.

  • If data is openly available, the name of the data repository should be provided along with any persistent identifier.
     
  • If the data is not openly available, the data access statement should direct users to a permanent record that describes any access constraints or conditions that must be satisfied for access to be granted.
     
  • If there are legal or ethical reasons why data cannot be made available, describe them in your statement.
     
  • If you did not collect or create the research data yourself but instead used existing data obtained from another source, this source should be cited.

A simple 'contact the author' instruction to share data is no longer considered sufficient; access to data can become impossible when contact details change.
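
The points above can be sketched as a small helper that assembles a statement from its parts. This is an illustrative sketch only: the function name, its parameters, the wording of the sentences it produces, and the example repository and DOI are all hypothetical, and your funder or journal may require different wording.

```python
def data_access_statement(repository=None, identifier=None, reason=None, record=None):
    """Compose an illustrative data access statement.

    The wording is hypothetical; check funder and journal requirements.
    """
    if reason is None:
        # Openly available data: name the repository and give the identifier.
        if not (repository and identifier):
            raise ValueError("open data needs a repository name and a persistent identifier")
        return (
            f"The data supporting this article are openly available "
            f"from {repository} at {identifier}."
        )
    # Restricted data: state the reason and point to a permanent record
    # that describes the access conditions.
    if not record:
        raise ValueError("restricted data needs a permanent record describing access conditions")
    return (
        f"The data supporting this article are not openly available because {reason}. "
        f"Access conditions are described at {record}."
    )


# Hypothetical repository name and DOI, for illustration only.
print(data_access_statement(
    repository="Example Repository",
    identifier="https://doi.org/10.1234/example",
))
```

Note that the helper refuses to produce a statement for restricted data unless a permanent record is supplied, which mirrors the point that a 'contact the author' instruction alone is not sufficient.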

DOIs

A DOI is a persistent, unique identifier for digital objects such as journal articles or datasets. Using a DOI in a data access statement or data citation enables users to find and cite the data, even if its online location is moved.
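
A DOI is usually written in a statement or citation as a link through the https://doi.org/ resolver, which redirects to the current location of the object. As a minimal sketch, the hypothetical helper below turns the common ways a DOI is written into such a link. The function name and the example DOI are assumptions for illustration, and the check is a basic sanity test rather than full validation.

```python
def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI string."""
    cleaned = doi.strip()
    # Accept common ways a DOI is written and reduce them to the bare DOI.
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # A DOI starts with "10." and has a prefix and suffix separated by "/".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
```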

How to get a DOI

If you archive your data with a data repository: You should ask the repository for a persistent identifier (such as a DOI) to cite within your published papers.

If you archive your data with Research Data York: A DOI is minted when a PURE dataset record is validated by library staff.


A data access statement should be included in submitted papers, even if a persistent identifier has not yet been issued. The statement should be updated to include any persistent identifiers or accession numbers as they become available, typically when the manuscript is accepted for publication.

Example data access statements

An example data access statement for data that cannot be archived and shared but where a PURE dataset record is created:

Cite the dataset as suggested by the data owner/publisher, or explain where the data can be accessed.

Data citation

Citing third party data

The same principles of attribution and credit apply to research data as they do to other research outputs such as journal articles or books.

Good data citation will acknowledge the original author/producer and will provide the information necessary for others to identify and locate the data you have reused.

The format and placement of a data citation may vary by journal or publisher but the core elements to include are:

  • Creator(s) of the Data (Publication Date): Title. Publisher. Persistent Identifier.

Where appropriate you may also wish to include information about the Version and Resource Type.
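
The core elements above can be assembled programmatically. The sketch below is one possible arrangement, not a required format: the function name, the separator between creators, the placement and bracketing of the optional Version and Resource Type, and all example values are assumptions. Always check the style required by your journal or by the data owner.

```python
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build a citation: Creator(s) (Date): Title. Publisher. Identifier."""
    authors = "; ".join(creators)
    parts = [f"{authors} ({year}): {title}."]
    # Optional elements; their position here is an assumption.
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    if resource_type:
        parts.append(f"[{resource_type}].")
    parts.append(identifier)
    return " ".join(parts)


# Hypothetical dataset details, for illustration only.
print(format_data_citation(
    creators=["Smith, J.", "Jones, A."],
    year=2020,
    title="Example survey data",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
))
```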

More information on citing data can be found in the Digital Curation Centre guide "How to cite datasets and link to publications". Also useful is the ESRC guidance "Data citation: what you need to know".