Since the topic of artificial intelligence (AI) skyrocketed in early 2023 and entered mainstream awareness, the average programmer has been bombarded with articles and social media posts about using AI as a tool to write code. Supposedly, AI-generated code significantly expedites software development and improves efficiency.
AI can autonomously produce code based on vast datasets and preexisting codebases, blurring the lines between human and machine creativity. However, the advent of AI code-generation tools comes with certain caveats. The fundamental question is: who can be considered the creator of AI-generated code and, thus, the holder of its copyright?
Copyright of AI-generated code
Software developers have long relied on copyright protection to safeguard their creations. The source code of a program is protected under the Copyright Act (CA) of Estonia as a work of authorship, akin to literary works. Copyright grants exclusive rights to the creators, including the right to reproduce, distribute, and display the work, as well as the right to create derivative works. Copyright thus enables the author of a computer program to prevent the unauthorised use, replication, or distribution of its source code by other persons.
If a person writes a line of code, they channel their creative and intellectual freedom into the process, creating a computer program that is presumably protected by copyright. Under existing copyright laws in most jurisdictions, copyright protection extends only to human authors, not to machines or AI systems. This stems from the presumption that only humans can be creative, while AI operates by algorithms and cannot exercise creativity. As a result, code generated solely by AI does not qualify for copyright protection, in contrast to human-created code.
The lack of human involvement in the creative process makes attributing authorship to AI-generated code challenging. This is especially relevant with tools like OpenAI’s GPT and GitHub’s Copilot, where the user enters a prompt and receives a code snippet in response. During this process, the user only provides a general description or idea of the expected result as the prompt. Ideas, however, are abstract and not copyrightable under the CA. For this reason, the code snippet the user receives is not copyrightable, as the user holds no control over the resulting source code.
The user only expresses their idea in the prompt, receiving in return a code snippet that presumably corresponds to the prompt. The resulting snippet is generated automatically, governed by the parameters of the AI model and based on its training data. Neither the user nor OpenAI or GitHub has direct creative control over the code that the AI generates; the code is thus without an author and ultimately not copyrightable.
Including AI-generated code in existing projects
While a code snippet generated with AI tools is not copyrightable in and of itself, it can be, and usually is, incorporated into a larger project. By incorporating AI-generated code into a larger codebase in which the developer exercises creative freedom, the source code of the program as a whole becomes copyright-protected under the CA. The author thus holds exclusive rights over the source code as a whole and may prohibit other persons from exploiting it.
Can AI-generated code match already existing code?
Users of AI code generators must tread carefully, though, because using the code snippets does not entirely shield them from committing copyright infringement. In a nutshell, copyright gives the author exclusive rights over their work, including the right to prohibit other people from using it. A person who unlawfully uses copyright-protected works infringes the author’s rights.
As AI models are trained on vast datasets that include copyrighted materials, there is a risk that AI-generated code may reproduce or closely resemble copyrighted code without the author’s explicit authorisation. This opens up the possibility of copyright infringement claims by the original creators against users of AI-generated code.
In some cases, source code authors have uploaded their code to GitHub and attached a GPLv3 or similar strong copyleft license to it. While a copyleft license permits the creation of derivative works, the license is ‘viral’, meaning that any program built on top of the licensed software must itself be distributed under the same GPLv3 license.
Some AI models are trained on GitHub or similar code repositories, so there remains a risk that the AI generator outputs code identical to existing code released under a copyleft license. Although such code is open source and may be copied, the GPLv3 license requires that the resulting program also be published under the GPLv3. A user therefore risks a copyright claim if they distribute the program under proprietary terms instead of attaching the compulsory GPLv3 license.
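In practice, attaching the compulsory license is often signalled with a license header in each affected source file. A minimal sketch, using the standard SPDX identifier syntax (the file and its comments are hypothetical, not legal advice):

```
# SPDX-License-Identifier: GPL-3.0-or-later
#
# This file incorporates code derived from a GPLv3-licensed repository;
# the combined work is therefore distributed under GPL-3.0-or-later as well.
```

The full license text is typically shipped alongside the program in a COPYING or LICENSE file.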
How to mitigate the risk of infringing copyrights
To mitigate potential copyright issues, developers must be vigilant about the code they use in their projects, avoiding copyrighted materials without proper permissions or licenses. Unfortunately, most AI generator service providers have not yet implemented robust mechanisms to identify and exclude copyrighted content during code generation.
One approach is to manually check AI-generated code against public repositories and identify any overlaps. Some tools make this easier: Copilot, for instance, offers a filter for suggestions matching public code. With the filter enabled, Copilot does not offer the user code that matches publicly available source code. Users should switch this filter on to minimise the risk of copyright infringement.
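Where no built-in filter is available, the manual overlap check described above can be partially automated. The sketch below is illustrative only: it fingerprints short runs of whitespace-normalized lines and flags overlap against a locally assembled corpus of known open-source snippets (the corpus, the helper names, and the window size are all assumptions, not an established tool):

```python
import hashlib

def normalize(code: str) -> list[str]:
    """Collapse whitespace and drop blank lines so trivial
    formatting differences do not hide a match."""
    return [" ".join(line.split()) for line in code.splitlines() if line.strip()]

def fingerprints(code: str, window: int = 3) -> set[str]:
    """Hash every run of `window` consecutive normalized lines."""
    lines = normalize(code)
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(max(len(lines) - window + 1, 1))
    }

def overlaps(generated: str, known_corpus: list[str], window: int = 3) -> bool:
    """True if the generated snippet shares any windowed
    fingerprint with a snippet in the known corpus."""
    gen = fingerprints(generated, window)
    return any(gen & fingerprints(known, window) for known in known_corpus)
```

A real workflow would compare fingerprints against indexed public repositories rather than an in-memory list, but the principle of flagging verbatim reuse before it enters a project is the same.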
Privacy concerns about using AI-generated code
Transmitting personal or confidential data to AI generator service providers may introduce privacy and security concerns. When developers utilise third-party AI services, they often share substantial amounts of sensitive information in the form of prompts, such as proprietary algorithms or user data. This practice poses the risk of data breaches or unauthorised access, potentially compromising the security of the entire software development process.
To address these concerns, developers should carefully evaluate the AI service providers they engage with, ensuring that adequate data protection measures are in place. Contracts and agreements with AI service providers should clearly outline data usage and security protocols to safeguard confidential information. As an example, OpenAI may by default use ChatGPT prompts to train its models. Opting out of this setting in the application helps protect privacy, since sensitive data is then not used to train AI models, reducing the risk of its disclosure to other users of the service.
For some projects, the use of GPT or Copilot may be prohibited altogether because of strict confidentiality requirements. In these instances, developers should refrain from using AI code generation, or seek out private LLM solutions, such as running the model on-premises or on a dedicated private server. This mitigates the risks even further, ensuring that confidential or sensitive data is not compromised.
Conclusion
The rise of AI-generated code presents intriguing copyright implications for software developers. As AI increasingly becomes an integral part of the development process, clarifying the copyright status of AI-generated code becomes vital. While AI-generated code lacks copyright protection, developers must navigate the legal landscape carefully and proactively to avoid potential infringement issues. Moreover, the responsible use and transmission of personal or confidential data to AI service providers must be managed with utmost care to protect both the developers’ interests and the security of the data itself.
We understand that this can be a complex terrain to navigate. So, we are here to help. Contact us today.