OUT-LAW NEWS 4 min. read

German legal opinion questions applicability of TDM exception for training of AI

09 Oct 2024, 9:37 am

A legal opinion published in Germany examines the extent to which the ‘text and data mining’ (TDM) exception to copyright set out in Article 4 of the DSM Copyright Directive allows data sets to be compiled and used to train AI systems.

The legal opinion was commissioned by a group called the Authors’ Rights Initiative (“Initiative Urheberrecht”). It was written by Sebastian Stober, professor of computer science at the University of Magdeburg, and Tim W. Dornis, professor of civil law and intellectual property at the University of Hannover, and was published on 30 September.

Training AI systems requires gathering and deploying large amounts of data, some of which may be copyright-protected works. Obtaining individual authorisation from all relevant authors is hardly possible in practice. This is why exceptions and limitations to copyright have become a focus for those developing and testing the algorithms behind complex AI systems. Scraping and copying data to compile the required training sets has become a business of its own.

Article 3 of the DSM Copyright Directive provides a text and data mining exception for reproduction and extraction made by research organisations and cultural institutions, and Article 4 provides a TDM exception for everyone, notably commercial businesses, although with an opt-out option for rightsholders.
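Article 4(3) of the Directive allows rightsholders to reserve their rights "in an appropriate manner", such as by machine-readable means for content made publicly available online. As a purely illustrative sketch, the following Python snippet shows how a crawler could check for such a reservation before collecting a page. It assumes, hypothetically, that the rightsholder expresses the opt-out in a robots.txt file. Whether robots.txt satisfies Article 4(3) is not addressed in this article, and the bot name `ExampleAIBot`, the URL and the helper `may_mine` are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rightsholder. Treating it as an
# Article 4(3) reservation is an assumption made for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/article.html"  # placeholder URL
    for bot in ("ExampleAIBot", "OtherBot"):
        print(bot, may_mine(ROBOTS_TXT, bot, page))
```

In this sketch the rightsholder blocks one named crawler and leaves the site open to all others. A real compliance check would also have to consider other ways of expressing a reservation, such as terms of use or metadata.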

After the launch of ChatGPT, debate began over whether Article 4 of the DSM Copyright Directive covers the training of AI models with works from the internet, as the works are always reproduced at least temporarily. Views on that question were, and still are, highly diverse. The legal opinion now published runs to 217 pages and combines a technical perspective with a legal assessment of the matter. According to the authors, gathering data and using it to train AI systems typically involves acts that constitute a reproduction of copyright-protected works within the meaning of Article 2 of the Copyright Directive.

The legal opinion also highlights that, during training, additional copyright-relevant acts of reproduction occur "inside" the respective AI model. It states that although there are no dedicated memory mechanisms in place, all current generative AI models memorise their training data to some degree. Moreover, the authors argue that using such models involves the reproduction and reworking of the works used to train the underlying AI model.

Dornis and Stober also take the position that, by offering services that build on trained AI models, service providers are making copyright-protected works available to the public within the meaning of EU copyright law. They argue that the current spectrum of limitations and exceptions to copyright covers and justifies the interferences associated with training generative AI models only in very few and “practically irrelevant” scenarios. Most importantly, the authors take the position that the exception for text and data mining (TDM) does not apply to the training of generative AI models.

“Artificial Intelligence and data always come as a pair”, Nils Rauer, a copyright law expert at Pinsent Masons, said. “Without robust training sets no AI can be developed to a stage that the system can be put on the market. Equally, if no constant data feedback is safeguarded, the application cannot learn. Data is therefore crucial. However, most of the data available is subject to proprietary rights. This can cause difficulties. The numerous class action proceedings currently pending in the U.S. show the problem quite well. Copyright as well as data privacy are at issue.”

In Rauer’s experience, the core of the dispute is always the same: may the developer rely upon limitations and exceptions to copyright, or is the prior consent of the right holder required? “In the U.S., the limitation of fair use is under debate. In Europe, we have more bespoke limitations instead of one general principle such as fair use. Amongst the most relevant exceptions is the text and data mining exception in Articles 3 and 4 DSM Copyright Directive.”

“The legal opinion takes a deep dive into both the technical and the legal side of the topic. It is true that the TDM exception was not originally designed with a focus on training AI models. However, the exception is drafted in a way that it is open to ‘move with the technical development and needs’”, said Anna-Lena Kempf, a Frankfurt-based copyright law expert at Pinsent Masons.

Kempf and Rauer question whether a copy of a protected work can actually be found ‘inside’ the AI model, as Dornis and Stober believe. “With ChatGPT, for instance, you have very small tokens of only four characters that are put together into sentences when a user enters a prompt into the system. The tokens as such can hardly be seen as copyright-protected items, and the overall corpus of tokens does not form a work either.”

“Also, memorising information is not the same as creating a copy of a specific work. If you reach the point that no specific work is stored ‘inside’ the model, then neither the provider offering the tool nor the internet user using the tool performs acts relevant under copyright law.”

Rauer also argues that the definition of text and data mining in the DSM Directive is much broader than the study suggests: “The TDM exception is not limited to extracting semantic information. So, whilst it is yet to be clarified if the TDM exception can be relied upon in each individual case of training AI models, the conclusions drawn by the authors of the current study must be called into question. Notably, the European legislator clearly deems Art. 4 DSM Copyright Directive applicable: Art. 53(1) lit. c) of the AI Regulation (EU) 2024/1689 makes explicit reference to Art. 4(3) DSM Copyright Directive. Having said this, the TDM exception as well as the U.S. principle of fair use are no ‘carte blanche’ for developers of AI models. One has to look at the specifics of the case and cannot rule out the applicability of the TDM exception completely.”

Kempf also points to a recent decision of the Regional Court of Hamburg: “The judges rightly rule in favour of the TDM exception being applicable. The focus is on whether the criteria of the exception are met in the specific case or not.”