Research data management: a practical guide: Describing your data

Why you should document your data

Good documentation:

helps you to understand your own data when you need to come back to it.
enables you to find and use your own data quickly and easily.
helps you share data with others: data repositories ask for documentation and/or metadata when data are deposited.
provides context to minimise the risk of misunderstanding or misuse.
is essential to the longer-term preservation of data, as a record of provenance, licensing and access arrangements.
is required to make data FAIR:
- Findable: keywords, DOIs, controlled vocabularies, repository metadata standards, metadata exchange.
- Accessible: metadata is openly available (wherever possible) even if the data isn't.
- Interoperable: standards and controlled vocabularies.
- Reusable: describing data (What? Why? When? Where?) so that others can understand it.

When

When should you create documentation?

You need to think about the documentation your data might need at an early stage of your project, and continue to do so while data collection and analysis are being carried out. Relevant information about your dataset is much easier to record while it is fresh in your memory, which helps ensure that key details are not forgotten.

If you plan to deposit/share your data with a data repository, you should contact the repository in the early stages of your project to discuss their metadata and/or documentation requirements.

What

What should you document?

There are no hard and fast rules. You should aim to provide enough information so that someone working in your research field can understand and reuse the data without having to contact you. At a minimum, your documentation should include information on who created the data, how the data was gathered and used, and for what purpose.

Try to look at your data with a fresh pair of eyes and imagine trying to reuse it in five or ten years' time. Think about the level of information you would need in order to fully understand it.

The level and type of documentation you will need will depend on a number of factors:

Standards within your discipline or subject area. Identify metadata standards for your discipline using:
- Research Data Alliance Metadata standards catalog
- FAIRsharing.org
The nature of your research and the types of data that you are creating or collecting.
The reuse potential of your data. If you envisage your data being widely reused beyond the period of your research, you may need a higher level of documentation to enable this.
The retention period for your research data. If your data is to be kept (and remain usable) for a long period, more documentation is likely to be needed to make sense of it as time progresses. Knowledge that we take for granted today may not be so obvious in twenty years' time.

You understand your research data better than anyone else, so you are best placed to make the final decision about the level of documentation needed.

The UK Data Service's 'Document your data' web pages give more detail on documentation and metadata.

Where

Where should you record your documentation?

Documentation can be in any form that is appropriate to your research and the dataset it describes.

Documentation can be recorded:

in a readme file.
Include a readme file with your data: a plain text file named 'readme', which encourages users to read it before looking at the content.

Cornell University provides a Guide to writing "readme" style metadata. We provide a readme file template, a generic template that you can download and use to document your data.

within the data file.
Recording information within the file itself can be a good solution when that information is needed to interpret a file or database correctly. Some file formats allow information to be recorded in addition to the data content.

Examples: comments in code; a separate sheet explaining all variables in an Excel file; metadata tags in Waveform Audio (.wav) files; metadata embedded using the XMP format.

in the folder where the dataset is stored.
Store all documentation (e.g. electronic lab notebooks, protocols, methodology reports, an .xml file extracted from an instrument, annotations exported from NVivo) in the folder for the dataset. To aid navigation, use a logical folder structure and meaningful file names.

in a separate metadata file.
In some disciplines, special file formats or data structures have been developed to record supporting information.

For example, MIDAS Heritage, the UK Historic Environment Data Standard, is a cultural heritage standard for recording information on heritage assets (e.g. buildings, archaeological sites, artefacts).

Whatever form your documentation takes, you need to ensure that it is accessible alongside your data in case anyone needs it to interpret your results. For example, a readme file is typically located at the root of your dataset.

Utrecht University: The ins and outs of metadata and data documentation

A video about metadata and documentation, with advice on integrating metadata into spreadsheets and an overview of the types of metadata.

Metadata (or data about data) is a structured and machine-readable description of data. Metadata often follows international standards that serve specific research disciplines, data types or purposes.

Good examples of metadata are the bibliographic records on YorSearch, the University Library catalogue.

Data repositories expose descriptive metadata (title, creator, organisation, abstract, keywords, etc.). For example, the University exposes descriptive metadata about York datasets through the York Research Database.
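To make "structured and machine-readable" more concrete, the sketch below gathers the descriptive fields listed above (title, creator, organisation, abstract, keywords) into a record and serialises it as JSON using Python's standard library. The function name `build_descriptive_metadata`, the field names and the example values are all hypothetical; a repository will normally expect its own metadata schema, so check its deposit guidance rather than copying these names.

```python
import json


def build_descriptive_metadata(title, creators, organisation, abstract, keywords):
    """Return a simple descriptive metadata record as a dict.

    Hypothetical field names for illustration only; real repositories
    define their own schemas.
    """
    if not title.strip():
        raise ValueError("A title is required")
    return {
        "title": title,
        "creators": list(creators),
        "organisation": organisation,
        "abstract": abstract,
        # Remove duplicate keywords while keeping their original order
        "keywords": list(dict.fromkeys(keywords)),
    }


# Hypothetical example values
record = build_descriptive_metadata(
    title="Example survey dataset",
    creators=["A. Researcher"],
    organisation="University of York",
    abstract="Hypothetical survey responses used to illustrate metadata.",
    keywords=["survey", "wellbeing", "survey"],
)

# Serialise to JSON: structured text that both people and software can read
print(json.dumps(record, indent=2))
```

Because the record is structured, software can validate it, search it or convert it to another format without a person having to interpret free text.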

Data can be described at different levels

Project-level documentation provides high-level information about the aims of the study, what the research questions were, methods of data collection, instruments used, how the data were processed, who collected the data and when, and how the data can be accessed.
File-level documentation provides descriptions of the contents of a folder or dataset, including details of data types, file formats used, and relations between files contained in the folder or dataset. A README.txt file is a form of documentation commonly used for this purpose.
Variable-level documentation provides definitions and explanations of variables, values, units of measurement, missing values and any other codes or abbreviations used. This information can be embedded within a data file, documented separately as a data dictionary or codebook, or included within a README file.
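A data dictionary is often kept as a simple table. The sketch below assumes a hypothetical dataset with the columns `participant_id`, `age` and `height`. It writes a data dictionary (variable name, description, units, missing-value code) as CSV using Python's standard `csv` module, then checks whether any column in the data file has no entry. The column layout and the function names `write_data_dictionary` and `undocumented_variables` are illustrative only and do not follow any formal codebook standard.

```python
import csv
import io

# Hypothetical variables for an illustrative dataset; replace with your own.
DATA_DICTIONARY = [
    {"variable": "participant_id", "description": "Anonymised participant identifier",
     "units": "", "missing_code": ""},
    {"variable": "age", "description": "Age at time of interview",
     "units": "years", "missing_code": "-99"},
    {"variable": "height", "description": "Measured standing height",
     "units": "cm", "missing_code": "-99"},
]

FIELDS = ["variable", "description", "units", "missing_code"]


def write_data_dictionary(rows, stream):
    """Write the data dictionary as CSV so it can be stored alongside the data."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def undocumented_variables(data_header, rows):
    """Return column names in the data file that have no dictionary entry."""
    documented = {row["variable"] for row in rows}
    return [name for name in data_header if name not in documented]


# Write to an in-memory buffer so the example runs without creating files
buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())

# Compare a hypothetical data file header against the dictionary
print(undocumented_variables(["participant_id", "age", "weight"], DATA_DICTIONARY))
# prints ['weight']
```

To save the dictionary next to your data, pass a file opened with `open("data_dictionary.csv", "w", newline="")` instead of the in-memory buffer; the file name here is only an example. Running the undocumented-variable check before deposit is one simple way to catch columns added late in a project.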