Subject Guides: Copyright: a Skills Guide: Copyright and Generative AI

At a glance

Acceptable uses:
✔️ Uploading a relevant proportion of lawfully obtained copyrighted material into one of the University recommended generative AI tools using your York login credentials, with assurances that your activity is covered by a legal exception (e.g. for non-commercial research, private study or teaching), and with full acknowledgement provided.
✔️ Uploading whole copyright materials into University recommended generative AI tools for text and data analysis in the context of non-commercial research (see Text and data mining: a practical guide).
✔️ Using collaborative features to share material with colleagues within the University, for the purposes mentioned above.
✔️ Using openly-licensed material (e.g. under a Creative Commons licence), as long as you follow the specific requirements of the licence such as providing full acknowledgement.
✔️ Using material with specific assurances/permissions obtained from the rights holder, or material which is not subject to copyright protection (public domain).

Unacceptable uses (you must not do the following):
❌ Uploading copyrighted material to a generative AI tool which does not provide assurances regarding data privacy and use of prompts and uploads for training purposes.
❌ Using copyrighted material with a generative AI tool for the purposes of an activity not covered by a legal exception (e.g. commercial research purposes).
❌ Using generative AI tools to transfer source material or generated outputs to users outside the University.
❌ Using material which may be sensitive, confidential, personal or subject to specific licensing agreements.
❌ Using material which has not been obtained legally.

Introduction

Generative AI tools analyse text, images, code and other material for machine-learning purposes, retaining information and creating new content in response to prompts provided by users.

Material may have become part of the AI model’s training dataset as a result of web crawling, licensing agreements with content providers such as publishers, or user-upload files and interactions (prompts). This raises new and complex considerations in terms of how the tools we use intersect with the rights of authors and creators, and how generative AI can be used safely and ethically in the context of teaching, learning and research activities. Considerations will depend on the tool(s) and material used, and the specific activity and context in each case.

The University’s recommended generative AI tools include Google Gemini, which can be used to generate, analyse and summarise text as well as help you find information, and NotebookLM, an AI-powered assistant that helps with research and note-taking. Note that these tools function differently in terms of how they interact with copyright material: Gemini works with a range of sources in its training dataset which may or may not be cited, whereas NotebookLM is designed for users to provide uploaded or linked sources which it then works with.

There are still many contested and unresolved challenges and questions, so current University guidance advises caution when it comes to using copyright material in the context of generative AI tools. This is an area of rapid change, and it’s beneficial for users to stay informed about developments around how these tools work and emerging issues in relation to copyright. We will endeavour to keep this guidance up to date, but it is not comprehensive in scope and should not be interpreted as legal advice.

There are also important, related concerns around using generative AI tools with data which is sensitive, confidential, personal or subject to specific licensing agreements. These considerations are not addressed specifically in this Practical Guide, but are covered by the University guidance linked below.

University guidance

It is important to be aware of the following University guidance, all of which emphasises the risks involved when using generative AI tools and copyright material.

You should read the guidance relevant to your own work and context for examples of acceptable and unacceptable uses of generative AI:

IT Services general guidance on generative AI tools (Dec 2024): “You must not upload any copyrighted content, including journals, to any [generative AI] tools unless you have confirmed this is acceptable with the publisher.”
Generative AI - Staff guidance and Taught students guidance (2024): “Avoid uploading personal/sensitive/copyrighted material to generative AI.”
Artificial Intelligence use in assessment - Student guidance (Nov 2023): “Warnings: Data security, intellectual property and ethics: providing your own work to an individual or software has a degree of risk. Once you have shared your work, you cannot guarantee how that will then be used. The emergence of generative AI, for example, has raised serious concerns about how data is processed and used by the companies and software.”
Guidance on the use of generative AI in PGR programmes (2024): “Your research must meet legal and University expectations in terms of… intellectual property rights (including copyright)”; “Generative AI… may produce outputs that ignore intellectual property rights, for example by producing content that does not include appropriate acknowledgement or breaches copyright.”
Responsible AI Use in Research: Policy & Best Practice Document for Researchers (May 2025): “Research, especially unpublished work and third-party content, must not be shared with AI systems without assurances regarding data protection, copyright, intellectual property and without anonymisation.”

The University has also signed up to the Russell Group principles on generative AI in education (Feb 2024), which outline our role in supporting students and staff to become AI-literate. This includes understanding the opportunities, limitations and ethical issues around plagiarism and copyright infringement.

Key issues

Jisc has issued guidance on Generative AI and copyright law and practice in education (Mar 2024) which raises some 'challenging questions' as a starting point for navigating copyright issues:

▷ If I use generative AI tools to generate content, am I at risk of infringing the rights of someone whose work was used to train the model?

Mainstream generative AI tools including Google Gemini and NotebookLM claim to be designed to avoid reproducing copyrighted content exactly, or at length (see How Gemini for Google Cloud works). However, copyright material technically does not need to be reproduced verbatim in order for an infringement to arise, which raises the possibility of unintentional or unknowing infringement by users of these tools. This reinforces the idea that generative AI outputs always need to be checked - not just for errors or bias but also for copyright infringement.
Another unresolved issue is the lack of transparency around the sources used to train these models, and whether this has been done with the appropriate authorisation or knowledge of rights holders. There are various ongoing and high-profile court cases in this area between authors, publishers and developers.
Attribution remains a key issue with generative AI tools. Authors have a moral right to be recognised as the creator of their original work. Gemini and NotebookLM will strive to provide citations for any quotations they reproduce, but it is arguably impossible for them to fully attribute all the information sources they have been trained on. The related issue of generative AI tools ‘hallucinating’, fabricating or misrepresenting citations is well-known, which is why users are advised to double-check the tools’ responses against the original sources.

▷ Will I as a user of the tool own the work that is created by my prompts? Or do authors whose works are used to train AI systems have an ownership claim? Or is the generative AI tool the 'author'?

Google does not claim ownership over original content generated by its generative AI tools (see Google Terms of Service), but the copyright status of such works is disputed.
Internationally, the consensus seems to be that AI-generated works belong to the public domain and are therefore free of authorship, ownership and copyright protection. However, under UK law computer-generated works without a human author may be subject to copyright for a period of 50 years from the end of the calendar year in which they were made (CDPA S.12(7); see S.178 for the definition of ‘computer-generated’). The author in this case is taken to be “the person by whom the arrangements necessary for the creation of the work are undertaken” (CDPA S.9(3)), but it’s unclear who this should be in the context of AI-generated outputs (the user, the developer of the tool, the creators of the training material, or all of the above?). Note that if a generated output closely resembles the source material used to train the tool, then it may be considered an infringing copy and should not be disseminated.
Referencing styles: a Practical Guide has a section on acknowledging and referencing generative AI which is related to these ideas around ownership of generated material.

▷ Does the material that I input to a generative AI tool become part of the Generative AI tool database for others to use?

As the University is a Google Workspace for Education institution, Gemini and NotebookLM do not use your uploads and prompts to train their models if you access their tools using your York login credentials.
Unsupported tools, especially those which are free to use, may not carry these same assurances regarding user data or may require you to opt out of training. Unless you have been assured otherwise, you should assume that the prompts and uploads you provide to a generative AI tool, and the responses it offers, will be retained as part of the training database and shared, directly or indirectly, with developers and other users.
Related to this question, a number of large publishers have engaged in licensing their scholarly content directly to AI developers for use as training data, or have asked their authors to sign contractual addendums to opt in to potential future licensing agreements.

▷ If I input someone else’s content into a generative AI tool and content is generated could this be an infringement of their copyright?

The current consensus (in the UK at least) seems to be that the process of uploading content into a generative AI tool for training purposes is considered a ‘restricted act’ under copyright law (CDPA S.16).
For this reason it is normally expected that rights holder permission is needed before content can be inputted, and some publishers and content providers are including restrictive AI clauses in their licences. However, the data privacy terms that come with the University’s supported tools offer some assurances when assessing the risk of infringement.
Legal exceptions and open licences should also be taken into consideration when assessing the risk of infringement, as discussed below.

Legal considerations

There is currently no legal framework in the UK around copyright and generative AI specifically. However, there are broad legal exceptions which may be applied when using copyright material in the context of private study, non-commercial research or as part of an assessment or teaching activity.

These exceptions are explored in further detail elsewhere in this Practical Guide (see Copyright law explained) and depend on whether one of the University-recommended generative AI tools is being used (i.e. Google Workspace for Education Gemini and NotebookLM), as well as the following considerations:

- You must either:
  - only copy material which is relevant to your research, study or teaching activity (consider using a short quotation or extract rather than a full article, for example); or
  - upload the entire copyright material only if it is for text and data analysis in the context of non-commercial research (see Text and data mining: a practical guide).
- The material must not be used to train a publicly-accessible AI tool or transferred to any other users, including contacts at other institutions or the general public.
- You must have lawful access to the material used (e.g. through a Library subscription), and must not circumvent any technical protection measures or restrictions put in place by a rights holder or content provider (this is specifically in relation to text and data mining processes, which may require users to obtain a licence or access to an API or platform from the provider).
- In the case of research, this must be for a non-commercial purpose (be mindful of activities which take place in collaboration with commercial partners, or with commercialisation in mind).
- The copyright material, and in some cases the output, should not be transferred to third parties outside of the University.
- Sufficient acknowledgement must be provided, where possible and necessary.

If these assurances cannot be met, then you should avoid uploading copyright material unless you have specific permission to do so from the relevant rights holder.

Note that UK legislation is more restrictive in terms of ‘fair dealing’ exceptions than countries such as the US, where concepts such as ‘transformative use’ may be applied to a broader set of purposes. This is significant in terms of the global scale of generative AI development, which is the basis upon which regulatory changes have been proposed to encourage innovation whilst protecting the interests of rights holders (see Gov.uk: Copyright and Artificial Intelligence consultation (Dec 2024)). These proposals have proved controversial, however, with concerns summarised by the likes of CREATe (the Centre for Regulation of the Creative Economy) in their consultation response working paper (Feb 2025).

Any eventual changes or new guidance issued by the Government will hopefully provide clarity in this area, and this Practical Guide will be updated as and when appropriate.

Open licensing and public domain material

Generally speaking, the risk of infringement will be reduced when using openly-licensed materials such as open access journal articles and publications which have been made available under a Creative Commons licence with reduced restrictions on sharing and reuse.

It is important to note that open licences operate within the legal framework of copyright and their own terms of use. For example, a basic requirement in all Creative Commons licences is that attribution must be provided, and in some cases there are restrictions on commercial use or onward sharing of derivative (modified) versions of the material, which could include AI-generated content.

There is also a lower risk when using public domain materials, which is where copyright protection has expired or has been waived by the rights holder. It is still considered good practice to attribute the authors of public domain materials, where possible.

Generative AI tools are emerging which have been developed entirely using openly-licensed and public domain materials (e.g. Common Pile v0.1), and initiatives are taking place to increase the availability of out-of-copyright works for use as training material (e.g. Harvard Institutional Data Initiative).

For further guidance on Creative Commons licensing and public domain tools, see Creative Commons for Researchers: a Practical Guide and the For Research page (Using copyright material in your research).
