Writing documentation to accompany your research data will help you understand your data and how you derived it. It will also help other researchers to interpret and reuse your data.
Data documentation is the contextual and descriptive information required to make data discoverable, understandable, accessible and reusable. Without it, future users, including yourself, may not be able to interpret your data.
Examples include a description of methodology, dataset structure, database schema, data dictionaries, user guides, laboratory notebooks, protocols, a codebook (with information about the study, files, and variables), questionnaires, interview schedules, equipment settings, instrument calibration, and references to related data/sources or to software/computer code.
You need to think about the documentation you might need for your data at an early stage of your project and while data collection and analysis are being carried out. It is much easier to record relevant information about your dataset while it is fresh in your memory, which helps ensure that key details are not forgotten.
If you plan to deposit/share your data with a data repository, you should contact the repository in the early stages of your project to discuss their metadata and/or documentation requirements.
There are no hard and fast rules. You should aim to provide enough information so that someone working in your research field can understand and reuse the data without having to contact you. At a minimum, your documentation should include information on who created the data, how the data was gathered and used, and for what purpose.
Try to look at your data with a fresh pair of eyes and imagine trying to reuse it in five or ten years' time. Think about the level of information you might need in order to fully understand it.
The level and type of documentation you will need will depend on a number of factors:
You understand your research data better than anyone else, so you are best placed to make the final decision about the level of documentation needed.
The UK Data Service "Document your data" web pages give more detail on documentation and metadata.
Documentation can be in any form that is appropriate to your research and the dataset it describes.
Documentation can be recorded:
Whatever form your documentation takes, you need to ensure that it is accessible alongside your data if anyone needs to use it to interpret your results. For example, a readme file is typically located at the root of your dataset.
Metadata (or data about data) is a structured and machine-readable description of data. Metadata often follows international standards that serve specific research disciplines, data types or purposes.
Good examples of metadata are the bibliographic records on YorSearch, the University Library catalogue.
Data repositories expose descriptive metadata (title, creator, organisation, abstract, keywords, etc.). For example, the University exposes descriptive metadata about York datasets through the York Research Database.
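To illustrate what "structured and machine-readable" can mean in practice, here is a minimal sketch of a descriptive metadata record written out as JSON. The dataset, the values and the field names are all hypothetical; the fields simply mirror the elements listed above and do not follow any particular metadata standard, so check what your repository actually requires.

```python
import json

# Hypothetical descriptive metadata for an imaginary dataset. The field names
# mirror the elements mentioned in the text and are NOT taken from a standard.
record = {
    "title": "Example river temperature survey",
    "creator": "A. Researcher",
    "organisation": "Example University",
    "abstract": "Hourly water temperature readings from three sites.",
    "keywords": ["hydrology", "temperature", "time series"],
}


def to_json(metadata: dict) -> str:
    """Serialise a metadata record as indented JSON that software can parse."""
    return json.dumps(metadata, indent=2, ensure_ascii=False)


metadata_json = to_json(record)
print(metadata_json)
```

Because the record is structured, a catalogue or script can read individual fields (for example, the keywords) without a person having to interpret free text.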
File-level documentation provides descriptions of the contents of a folder or dataset including details of data types, file formats used, and relations between files contained in the folder or dataset. A README.txt file is a form of documentation commonly used for this purpose.
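As one possible starting point, the sketch below drafts the file listing for a README.txt from the contents of a dataset folder. The function name build_readme, the layout of the text and the "DESCRIPTION NEEDED" placeholder are assumptions made for illustration; it only lists files and formats, so relations between files still need to be described by hand.

```python
from pathlib import Path


def build_readme(folder: Path, title: str, descriptions: dict[str, str]) -> str:
    """Return README text listing each file in `folder` with its format.

    `descriptions` maps a file's path (relative to `folder`) to a short
    human-written description; files without one are flagged for follow-up.
    """
    lines = [title, "=" * len(title), "", "Files in this dataset:"]
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        rel = path.relative_to(folder).as_posix()
        if rel == "README.txt":
            continue  # do not list the README inside itself
        # Use the file extension as a rough indication of the file format.
        fmt = path.suffix.lstrip(".").upper() or "unknown"
        desc = descriptions.get(rel, "DESCRIPTION NEEDED")
        lines.append(f"- {rel} ({fmt}): {desc}")
    return "\n".join(lines) + "\n"
```

The returned text could then be saved at the root of the dataset, for example with (folder / "README.txt").write_text(text), and edited by hand to add context that cannot be generated automatically.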
Variable-level documentation provides definitions and explanations of variables, values, units of measurement, missing values and any other codes or abbreviations used. This information can be embedded within a data file or documented separately as a data dictionary or codebook or included within a README file.
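A data dictionary kept as a separate file might look something like the following sketch, which writes one row per variable to CSV and checks a data file's column names against it. The column headings, the example variables and the -999 missing-value code are hypothetical and chosen only for illustration.

```python
import csv
import io

# Assumed column headings for the data dictionary (not a formal standard).
FIELDS = ["variable", "definition", "unit", "missing_value_code"]

# Hypothetical entries describing two variables in an imaginary data file.
entries = [
    {"variable": "site_id", "definition": "Identifier of the sampling site",
     "unit": "", "missing_value_code": ""},
    {"variable": "temp_c", "definition": "Water temperature",
     "unit": "degrees Celsius", "missing_value_code": "-999"},
]


def data_dictionary_csv(rows: list[dict]) -> str:
    """Return the data dictionary as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented(columns: list[str], rows: list[dict]) -> list[str]:
    """Return the data-file columns that have no data dictionary entry."""
    documented = {row["variable"] for row in rows}
    return [column for column in columns if column not in documented]


print(data_dictionary_csv(entries))
```

Running undocumented over the column names of a data file before depositing it is one simple way to spot variables that still need a definition.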