University of York Library
Copyright: a Practical Guide

Copyright and Generative AI

Guidance to provide students and staff with a better understanding of copyright implications when using generative AI tools in their work

Further support

  • The home page for this Practical Guide provides contact information for further support from teams across the Library and Archives
  • General guidance on generative AI including training sessions from the Library can be found in Generative AI: a Practical Guide
  • IT Services provide broader guidance on which generative AI tools are recommended at the University, and policies relating to their use

At a glance

Lower risk

✔️ Uploading a relevant proportion of copyrighted material into one of the University’s recommended generative AI tools, with assurances that your activity is covered by a legal exception (e.g. for non-commercial research, private study, teaching or computational analysis/text and data mining) and with full acknowledgement provided;

✔️ Using openly-licensed material (e.g. under a Creative Commons licence), as long as you follow the specific requirements of the licence such as providing full acknowledgement;

✔️ Using material with specific assurances/permissions obtained from the rights holder, or material which is not subject to copyright protection (public domain).

Higher risk

❌ Uploading copyrighted material to a generative AI tool which does not provide assurances regarding data privacy and use of prompts and uploads for training purposes;

❌ Using material which may be sensitive, confidential, personal or subject to specific licensing agreements;

❌ Using material which has not been obtained legally.

Introduction

Generative AI tools analyse text, images, code and other material for machine-learning purposes, retaining information and creating new content in response to prompts provided by users.

Material may have become part of the AI model’s training dataset as a result of web crawling, licensing agreements with content providers such as publishers, or user-uploaded files and interactions (prompts). This raises new and complex considerations in terms of how the tools we use intersect with the rights of authors and creators, and how AI can be used safely and ethically in the context of teaching, learning and research activities. Considerations will depend on the tool(s) and material used, and the specific activity and context in each case.

The University’s recommended AI tools are Google Gemini, which can be used to generate, analyse and summarise text as well as help you find information, and NotebookLM, an AI-powered assistant that helps with research and note-taking. Note that these tools function differently in terms of how they interact with copyright material: Gemini works with a range of sources in its training dataset which may or may not be cited, whereas NotebookLM is designed for users to provide uploaded or linked sources which it then works with.

There are still many contested and unresolved challenges and questions, so current University guidance advises caution when it comes to using copyright material in the context of generative AI tools. This is an area of rapid change, and it’s beneficial for users to stay informed about developments around how these tools work and emerging issues in relation to copyright. We will endeavour to keep this guidance up to date, but it is not comprehensive in scope and should not be interpreted as legal advice. 

There are also important, related concerns around using AI tools with data which is sensitive, confidential, personal or subject to specific licensing agreements. These considerations are not addressed specifically in this Practical Guide, but are covered by the University guidance linked below. 

University guidance

It is important to be aware of the following University guidance, all of which emphasises the risks involved when using generative AI tools and copyright material.

You should read the guidance relevant to your own work and context for examples of acceptable and unacceptable uses of generative AI:

The University has also signed up to the Russell Group principles on generative AI in education (Feb 2024), which outline our role in supporting students and staff to become AI-literate. This includes understanding the opportunities, limitations and ethical issues around plagiarism and copyright infringement.

Key issues

Jisc has issued guidance on Generative AI and copyright law and practice in education (Mar 2024) which raises some 'challenging questions' as a starting point for navigating copyright issues:

▷ If I use generative AI tools to generate content, am I at risk of infringing the rights of someone whose work was used to train the model?

  • Mainstream generative AI tools including Google Gemini and NotebookLM claim to be designed to avoid reproducing copyrighted content exactly, or at length (see How Gemini for Google Cloud works). However, copyright material technically does not need to be reproduced verbatim in order for an infringement to arise, which raises the possibility of unintentional or unknowing infringement by users of these tools.
  • Another unresolved issue is the lack of transparency around the sources used to train these models, and whether this has been done with the appropriate authorisation or knowledge of rights holders. There are various ongoing and high-profile court cases in this area between authors, publishers and developers.
  • Attribution remains a key issue with AI tools. Authors have a moral right to be recognised as the creator of their original work. Gemini and NotebookLM will strive to provide citations for any quotations they reproduce, but it is arguably impossible for them to fully attribute all the information sources they have been trained on. The related issue of generative AI tools ‘hallucinating’, fabricating or misrepresenting citations is well known, which is why users are advised to double-check AI responses against the original sources.

▷ Will I as a user of the tool own the work that is created by my prompts? Or do authors whose works are used to train AI systems have an ownership claim? Or is the AI tool the author?

  • Google does not claim ownership over original content generated by its AI tools (see Google Terms of Service), but the copyright status of such works is disputed.
  • Internationally, the consensus seems to be that AI-generated works belong to the public domain and are therefore free of authorship, ownership and copyright protection. However, under UK law computer-generated works without a human author (CDPA S.178) may be subject to copyright for a period of 50 years from the point of their creation (CDPA S.12(7)). The owner in this case is “the person by whom the arrangements necessary for the creation of the work are undertaken” (CDPA S.9(3)), but it’s unclear who this should be in the context of AI-generated outputs (the user, the developer of the tool, the creators of the training material, or all of the above?).
  • Referencing styles: a Practical Guide has a section on acknowledging and referencing generative AI which is related to these ideas around ownership of generated material.

▷ Does the material that I input to a generative AI tool become part of the Generative AI tool database for others to use?

  • As the University is a Google Workspace for Education institution, Gemini and NotebookLM do not use your uploads and prompts to train their models if you access their tools using your York login credentials.
  • Unsupported tools, especially those which are free to use, may not carry these same assurances regarding user data or may require you to opt out of training. Unless you have been assured otherwise, you should assume that the prompts and uploads you provide to a generative AI tool, and the responses it offers, will be retained as part of the training database and shared, directly or indirectly, with developers and other users.
  • Related to this question, a number of large publishers have engaged in licensing their scholarly content directly to AI developers for use as training data, or have asked their authors to sign contractual addendums to opt in to potential future licensing agreements.

▷ If I input someone else’s content into a generative AI tool and content is generated could this be an infringement of their copyright?

  • The current consensus (in the UK at least) seems to be that the process of uploading content into a generative AI tool for training purposes is considered a ‘restricted act’ under copyright law (CDPA S.16).
  • For this reason it is normally expected that rights holder permission is needed before content can be inputted, and some publishers and content providers are including restrictive AI clauses in their licences. However, the data privacy terms that come with the University’s supported tools offer some assurances when assessing the risk of infringement.
  • Legal exceptions and open licences should also be taken into consideration when assessing the risk of infringement, as discussed below.

Legal considerations

There is currently no legal framework in the UK around copyright and generative AI specifically. However, there are broad legal exceptions which may be applied when using copyright material in the context of private study, non-commercial research or as part of an assessment or teaching activity.

These exceptions are explored in further detail elsewhere in this Practical Guide (see Copyright law explained) and depend on the following considerations:

  • You must only copy material which is relevant to your research, study or teaching activity (consider using a short quotation or extract rather than a full article, for example);
  • The material must not be used to train a publicly-accessible AI tool or transferred to any other users, including contacts at other institutions or the general public;
  • You must have lawful access to the material used (e.g. through a Library subscription), and must not circumvent any technical protection measures or restrictions put in place by a rights holder or content provider (this is specifically in relation to text and data mining processes, which may require users to obtain a licence or access to an API or platform from the provider);
  • In the case of research, this must be for a non-commercial purpose (be mindful of activities which take place in collaboration with commercial partners, or with commercialisation in mind);
  • Sufficient acknowledgement must be provided, where possible and necessary.

If these assurances cannot be met, then you should avoid uploading copyright material unless you have specific permission to do so from the relevant rights holder. 

Note that UK legislation is more restrictive in terms of ‘fair dealing’ exceptions than countries such as the US, where concepts such as ‘transformative use’ may be applied to a broader set of purposes. This is significant in terms of the global scale of generative AI development, which is the basis upon which regulatory changes have been proposed to encourage innovation whilst protecting the interests of rights holders (see Gov.uk: Copyright and Artificial Intelligence consultation (Dec 2024)). These proposals have proved controversial, however, with concerns summarised by the likes of CREATe (the Centre for Regulation of the Creative Economy) in their consultation response working paper (Feb 2025). 

Any eventual changes or new guidance issued by the Government will hopefully provide clarity in this area, and this Practical Guide will be updated as and when appropriate. 

Open licensing and public domain material

Generally speaking, the risk of infringement will be reduced when using openly-licensed materials such as open access journal articles and publications which have been made available under a Creative Commons licence with reduced restrictions on sharing and reuse.

It is important to note that open licences operate within the legal framework of copyright and their own terms of use. For example, a basic requirement in all Creative Commons licences is that attribution must be provided, and in some cases there are restrictions on commercial use or onward sharing of derivative (modified) versions of the material, which could include AI-generated content.

There is also a lower risk when using public domain materials, that is, materials for which copyright protection has expired or has been waived by the rights holder. It is still considered good practice to attribute the authors of public domain materials, where possible.

Generative AI tools are emerging which have been developed entirely using openly-licensed and public domain materials (e.g. Common Pile v0.1), and initiatives are taking place to increase the availability of out-of-copyright works for use as training material (e.g. Harvard Institutional Data Initiative). 

For further guidance on Creative Commons licensing and public domain tools, see Creative Commons for Researchers: a Practical Guide and the For Research page (Using copyright material in your research).

The copyright guidance presented here is for general information only and does not constitute legal advice.

The University accepts no liability for any errors, omissions, or misleading statements in these pages, or for any loss which may arise from reliance on materials contained in these pages.