Copyright and privacy implications of using artificial intelligence to generate code

Since artificial intelligence (AI) surged into public attention in early 2023, the average programmer has probably been bombarded with articles and social media posts about using AI as a tool to write code. Proponents claim that using AI to generate code significantly speeds up software development and improves efficiency.

However, the advent of AI code-generation tools comes with certain caveats. AI can autonomously produce code based on vast datasets and preexisting codebases, blurring the line between human and machine creativity. This raises a fundamental question: who can be considered the creator of AI-generated code and, therefore, the holder of its copyright?

Software developers have long relied on copyright protection to safeguard their creations. The source code of a program is protected under the Copyright Act (CA) of Estonia as a work of authorship, akin to literary works. Copyright grants exclusive rights to the creators, including the right to reproduce, distribute, and display the work, as well as the right to create derivative works. Copyright enables the author of a computer program to prevent the unauthorised use, replication, or distribution of its source code by other persons. 

Copyright of AI-generated code

When a person writes a line of code, they exercise their creative and intellectual freedom to produce coherent, usable source code. Under the copyright laws of most jurisdictions, however, protection extends only to human authors, not to machines or AI systems. This rests on the premise that only humans can be creative, whereas AI operates according to algorithms and cannot exercise creativity. As a result, AI-generated code does not qualify for copyright protection in the way human-written code does: the lack of human involvement in the creative process makes it difficult to attribute authorship to AI-generated code.

Authorship and using AI tools to generate code

This is especially relevant to tools such as OpenAI’s GPT and GitHub’s Copilot, where the user enters a prompt and receives a piece of source code in return. The prompt usually contains only a general description or idea of the expected result. Ideas, unlike works, are abstract and not copyrightable under the CA, so the prompt the user enters is generally regarded as not copyrightable. Nor is the resulting source code, because the user does not control how that code is expressed; they only express their idea. The output is instead governed by how the AI model has been trained. Neither the user nor OpenAI or GitHub has direct creative control over the code the AI generates, so the code has no author and is not copyrightable.

Including AI-generated code in existing projects

If AI-generated code is integrated into a larger codebase in which the creator of the project exercises creative freedom, the source code is protected under the CA as a whole. Users of AI code generators should still tread carefully, however, because using generated snippets does not rule out copyright infringement. In some cases, the code produced by the AI generator is already protected by someone else’s copyright.

Can AI-generated code match already existing code?

Copyright gives authors an exclusive right over their creations, allowing them to prevent other people from using the work. A person who uses copyright-protected works unlawfully infringes the author’s rights. Because AI models are trained on vast datasets that include copyrighted material, AI-generated code may reproduce or closely resemble copyrighted code without the rights holder’s authorisation. This exposes users of AI-generated code to possible copyright infringement claims by the original creators.

Using AI-generated code and licensing issues

Some authors upload their source code to GitHub under the GPLv3 licence, and some AI models are trained on GitHub repositories. As a result, an AI generator may output code identical to code in such a repository. Although that code is open source and may be copied, the GPLv3 requires anyone who distributes a program incorporating GPLv3-licensed code to license the whole program under the GPLv3 and make its source code available. A user who distributes such a program without complying with the GPLv3 therefore risks a copyright claim.

How to mitigate the risk of infringing copyrights

To mitigate potential copyright issues, developers must be vigilant about the data used to train AI models and ensure that training datasets are carefully curated, excluding copyrighted material for which there are no proper permissions or licences. Unfortunately, AI code generator providers have not yet implemented robust mechanisms to identify and exclude copyrighted content during code generation. Some tools offer a partial safeguard: GitHub Copilot, for example, has a setting that checks its suggestions against code found in public repositories and, when enabled, does not offer code that matches code in an existing public repository. Users should switch this setting on to minimise the risk of infringement.
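GitHub does not publish the internals of Copilot’s public-code filter, so the following is only a minimal Python sketch of the general idea behind such checks. It assumes the developer keeps a local collection of reference code (for example, GPLv3-licensed projects they want to avoid copying from) and measures how many token sequences of a generated snippet also appear in that collection. The function names, the n-gram length and the threshold are hypothetical choices for illustration, not Copilot’s actual method, and a low score does not prove that a snippet is free of third-party rights.

```python
import re


def tokenize(code):
    """Split source code into identifier, number and single-symbol tokens; whitespace is ignored."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def ngrams(tokens, n):
    """Return the set of consecutive n-token sequences (empty if there are fewer than n tokens)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(snippet, reference, n=8):
    """Share of the snippet's token n-grams that also occur in the reference code (0.0 to 1.0)."""
    snippet_grams = ngrams(tokenize(snippet), n)
    if not snippet_grams:
        return 0.0
    return len(snippet_grams & ngrams(tokenize(reference), n)) / len(snippet_grams)


def flag_matches(snippet, corpus, threshold=0.5, n=8):
    """Hypothetical helper: names of reference files whose overlap with the snippet meets the threshold."""
    return [name for name, ref in corpus.items() if overlap_ratio(snippet, ref, n) >= threshold]
```

Because tokens ignore whitespace, a snippet that differs from a reference file only in formatting still scores as a full match, which is the kind of near-verbatim copying the article warns about. A developer could run `flag_matches(generated_code, local_corpus)` before committing AI output and review any flagged files manually.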

Privacy concerns about using AI-generated code

Even with diligent efforts, transmitting personal or confidential data to AI service providers may introduce privacy and security concerns. When developers use third-party AI services, they often share substantial amounts of sensitive information, such as proprietary algorithms or user data. This practice creates a risk of data breaches or unauthorised access, potentially compromising the security of the entire software development process.

To address these concerns, developers should carefully evaluate the AI service providers they engage with, ensuring that adequate data protection measures are in place. Contracts and agreements with AI service providers should clearly outline data usage and security protocols to safeguard confidential information. 
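On the technical side, one practical complement to contractual safeguards is to strip obvious secrets and personal data from code before it is sent to an AI service. The Python sketch below assumes a hypothetical `redact` helper with two illustrative patterns (email addresses, and quoted values assigned to names containing words such as “password” or “token”). Real detection needs far broader coverage, so this reduces rather than removes the risk.

```python
import re

# Illustrative patterns only (hypothetical); real secret and personal-data detection needs far broader coverage.
REDACTION_PATTERNS = [
    # Email addresses, a common form of personal data in code and logs.
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    # Quoted values assigned to names that suggest credentials, e.g. api_key = "abc123".
    # Group 1 keeps the name, operator and spacing; group 2 is the opening quote character.
    (
        re.compile(r"(?i)(\b\w*(?:password|secret|token|api_key)\w*\s*[=:]\s*)(['\"]).*?\2"),
        r"\1\2<REDACTED>\2",
    ),
]


def redact(text):
    """Apply each pattern in turn so matched values never leave the developer's machine."""
    for pattern, replacement in REDACTION_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact('api_key = "sk-123"')` keeps the variable name but replaces the value with `<REDACTED>`, so the AI service still sees the structure of the code without the credential itself.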

Conclusion

The rise of AI-generated code presents intriguing copyright implications for software developers. As AI increasingly becomes an integral part of the development process, clarifying the copyright status of AI-generated code becomes vital. While AI-generated code on its own lacks copyright protection, developers must navigate the legal landscape carefully and proactively to avoid potential infringement issues. Moreover, the transmission of personal or confidential data to AI service providers must be managed with the utmost care to protect both the developers’ interests and the security of the data itself.

We understand that this can be a complex terrain to navigate. So, we are here to help. Contact us today.

Hedman