Out-Law News
19 Dec 2024, 1:13 pm
Policymakers should consider how data protection law in Europe can be adapted to better support AI development in light of a new opinion issued by the European Data Protection Board (EDPB), experts have said.
Andreas Carney, Kathryn Wynn, and Nils Rauer of Pinsent Masons were commenting after the EDPB expressed views on how the General Data Protection Regulation (GDPR) applies to the processing of data in the AI context.
Part of the opinion addressed whether data “‘absorbed’ in the parameters of the model” constitutes ‘personal data’ – a question central to whether the processing of that data is subject to the GDPR’s strict rules. The EDPB said it often will be.
“Even when an AI model has not been intentionally designed to produce information relating to an identified or identifiable natural person from the training data, information from the training dataset, including personal data, may still remain ‘absorbed’ in the parameters of the model, namely represented through mathematical objects,” the EDPB said. “They may differ from the original training data points, but may still retain the original information of those data, which may ultimately be extractable or otherwise obtained, directly or indirectly, from the model.”
“Whenever information relating to identified or identifiable individuals whose personal data was used to train the model may be obtained from an AI model with means reasonably likely to be used, it may be concluded that such a model is not anonymous,” it said.
The EDPB went on to outline what AI developers would need to do to demonstrate that their models are anonymous – and therefore outside of the GDPR’s scope.
In this regard, it said that both the likelihood of data about a person used in training the model being extracted from the model, and the likelihood of obtaining that data by running queries through the model, would need to be “insignificant for any data subject”. That likelihood must be determined “using reasonable means”.
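The EDPB does not prescribe a testing method, but extraction probes documented in the machine-learning literature give a sense of what such an evaluation can look like in practice. The Python sketch below is a minimal illustration only: the `generate` function is a stub standing in for the model under test, and the record format is invented for the example.

```python
"""Minimal sketch of a verbatim-extraction probe against a text model.

`generate` is a stand-in for the model under test; a real evaluation
would call the deployed model or its API and sample far more records.
"""

def generate(prompt: str) -> str:
    # Stub standing in for the model under test.
    return ""

# (prompt prefix, sensitive suffix) pairs drawn from the training set.
# Both records here are invented for illustration.
TRAINING_RECORDS = [
    ("Patient name: Jane Doe, diagnosis:", "type 2 diabetes"),
    ("Account holder: John Smith, phone:", "+44 20 7946 0958"),
]

def extraction_rate(records) -> float:
    """Fraction of probed records whose sensitive suffix the model repeats."""
    hits = sum(
        1 for prefix, suffix in records
        if suffix.lower() in generate(prefix).lower()
    )
    return hits / len(records)

if __name__ == "__main__":
    print(f"Verbatim extraction rate: {extraction_rate(TRAINING_RECORDS):.1%}")
    # Under the EDPB's framing, this likelihood would need to be
    # "insignificant for any data subject", not merely low on average.
```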
The EDPB added that “a thorough evaluation of the likelihood of identification” is very likely to be needed before a conclusion on the anonymous nature of AI models can be reached. That evaluation, it said, should factor in, on an objective basis, “all the means reasonably likely to be used” by the controller or another person to identify individuals, and further consider any unintended use, reuse or disclosure of the model.
Carney, Wynn and Rauer said the EDPB’s views on the broad scope of the definition of ‘personal data’ and around anonymisation of AI models may have profound implications for AI development in Europe.
Rauer said: “It is important to understand that AI systems generally do not rely on the raw training data being stored in the system itself. This is also true for personal data. The information AI systems build on – notably gen-AI – is stored in the form of meta data from which the answer to a prompted query is generated anew. The output might qualify as personal data, but it does not come ‘from the shelf’ but is rather generated by the algorithm.”
“The mere fact that an AI system is involved in processing personal data does not change the overall principles of data privacy. As underlined by the EDPB, the same underlying principles must be applied. What might change is the level of risk, and the difficulty in being transparent about the processing. AI is about learning and the exchange of data to allow for a learning curve, thus things get more complex,” he said.
Carney said: “The reasons for the EDPB’s opinion are understandable – it is applying established principles of data protection law to new technology. This is not surprising, but the conclusions it has reached risk adding to concerns AI developers have expressed recently about the European approach to data protection compliance potentially hampering AI innovation.”
“A more adaptable approach – one that evolves as understanding of the way AI works improves – would be welcome. Developments in AI are moving fast, so this would need to happen quite quickly. Without it, there is a risk that investment in AI in Europe will fall and the associated benefits of the technology – such as its potential to boost productivity and deliver improved health and social outcomes – will not be felt by businesses and people in Europe to the extent that they could be,” he said.
Wynn said the EDPB’s views around the status of data ‘absorbed’ into the AI model will not be welcomed by organisations, particularly when considered alongside the practical challenges the EDPB has said developers would face to show that the data is anonymous.
“Truly anonymising data is challenging – the threshold for anonymisation is essentially a moving target, as the likelihood of re-identification increases as technology develops,” Wynn said. “In practice, someone’s data used to train an AI model is not necessarily used to make or inform decisions about them. Rather, it can be used, for example, to make decisions about similar data subjects, where they share a similar risk or consumer profile.”
“The potential harm to the privacy of the person whose data is being used to train the AI model could, depending on the circumstances, be relatively minimal and may be further reduced through security and pseudonymisation measures. However, the way in which the EDPB is interpreting the law would require organisations to meet burdensome, and in some cases impractical, compliance obligations around purpose limitation and transparency, in particular,” she said.
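To make the pseudonymisation point concrete, one widely used measure is to replace direct identifiers with a keyed hash before data enters a training pipeline. The Python sketch below is illustrative only; the field names are invented, and, as the comments note, pseudonymised data generally remains personal data under the GDPR because the key holder can re-link it.

```python
import hashlib
import hmac

# Key held separately from the training pipeline. Whoever holds it can
# re-link pseudonyms to individuals, which is why pseudonymised data
# generally remains personal data under the GDPR.
SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

# Invented example record.
raw = {"name": "Jane Doe", "postcode": "EC1A 1BB", "diagnosis": "asthma"}
training_record = {
    "subject_id": pseudonymise(raw["name"]),  # stable, but not reversible without the key
    "postcode": raw["postcode"][:3],          # coarsen quasi-identifiers as well
    "diagnosis": raw["diagnosis"],
}
print(training_record)
```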
“One option policymakers could consider to address this problem, which is a real threat to European competitiveness in AI development, is for a new derogation from the GDPR regime to be created, to reduce the compliance obligations pertaining to the data that sits in AI models – the data that is essentially converted into code or meta data – so that even if it does constitute personal data there are fewer barriers to its processing,” Wynn added.
Rauer said that obligations facing developers and users of AI under the EU AI Act are also relevant considerations for businesses.
He said: “AI systems involve a certain level of so-called ‘black box’ obscurity, particularly if a deployer of an AI system has licensed the AI algorithm from a third-party provider. Given that privacy laws rest on transparency and adequate information being provided to the data subject, these two concepts may run counter to one another. So, it is a question of balance – allowing AI to do its job while providing adequate transparency and control over what it does. In that respect, the transparency provisions within the AI Act on the one hand and in the GDPR on the other hand need to be interpreted and applied in a consistent manner.”
The EDPB’s opinion was issued after the UK Information Commissioner’s Office (ICO) published its own response to a series of consultations it ran on gen-AI and data protection issues earlier this year.
In its report, the ICO highlighted what it described as “misconceptions” in the context of gen-AI use and data protection compliance. One of those misconceptions concerns the type of data that compliance efforts are focused on.
The ICO said: “Many organisations focus their generative AI compliance efforts around PII (personally identifiable information). However, to ensure compliance in the UK they should be considering processing of any ‘personal data’ (which is a broader and legally defined concept in the UK). Organisations must not undertake compliance based on a fundamental misunderstanding or miscommunicate their processing operations.”
Wynn said that, in different ways, both the EDPB and ICO have flagged the challenges businesses face in demonstrating that AI models are anonymous.
“In EU and UK data protection law, the concept of a ‘motivated intruder’ is relevant to determining whether data input to, or output from, an AI model constitutes personal data. It requires developers to consider whether a motivated intruder – whether a malicious hacker or a normal user acting with benign intentions – would be able to make connections from the data that would enable them to identify individuals from that data, even if the data in isolation is not attributable to an individual.”
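As a concrete illustration of that exercise, the short Python sketch below runs a naive linkage test: it checks whether records stripped of direct identifiers can still be matched uniquely against an auxiliary dataset an intruder might plausibly hold. All records and field names are invented for the example.

```python
# Naive linkage test in the spirit of the "motivated intruder" exercise.
# All records and field names below are invented for illustration.

# Output with direct identifiers removed - superficially "anonymous".
model_output = [
    {"age": 47, "postcode_area": "EC1", "condition": "rare condition X"},
]

# Auxiliary data an intruder might hold (e.g. a public register).
public_data = [
    {"name": "Jane Doe", "age": 47, "postcode_area": "EC1"},
    {"name": "John Smith", "age": 33, "postcode_area": "SW1"},
]

QUASI_IDENTIFIERS = ("age", "postcode_area")

for row in model_output:
    matches = [
        p for p in public_data
        if all(p[k] == row[k] for k in QUASI_IDENTIFIERS)
    ]
    if len(matches) == 1:
        # A unique match attributes the record to a named individual:
        # the data was identifiable after all.
        print(f"Re-identified: {matches[0]['name']} -> {row['condition']}")
```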
The EDPB opinion and ICO response address a wide range of other data protection issues in the AI context, including how businesses can lawfully process personal data when developing or using AI.
According to the ICO, based on current practices, businesses can only scrape personal data from the internet to train gen-AI models if they have a valid ‘legitimate interest’ in that activity.
The GDPR provides that personal data processing can be lawful if it is necessary for the purposes of the legitimate interests pursued by the controller or by a third party. However, the ‘legitimate interests’ ground can only be relied upon if the interests cited by the controller are not “overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data […]”.
The EDPB’s opinion focused, in part, on how businesses can demonstrate the appropriateness of legitimate interest as a legal basis in the AI development and deployment phases. In doing so, it did not expressly rule out the possibility that other lawful bases for processing personal data in an AI context might be relied upon by businesses – including in the context of web scraping.
Malcolm Dowden of Pinsent Masons said the question of what constitutes a ‘legitimate interest’ has been the subject of debate and litigation for years.
“The extent to which AI-related innovation can, in and of itself, be considered to be a valid legitimate interest that businesses can cite in the context of AI-related personal data processing, has been the subject of discussion by UK law makers recently as part of their scrutiny of the Data (Use and Access) Bill before the UK parliament,” Dowden said.
“Advocates of AI suggest that data processing in the AI context drives innovation and brings inherent social good and benefits that constitute a ‘legitimate interest’, for data protection law purposes. Opponents believe that view does not account for AI-related risks, such as to privacy, discrimination or from the potential dissemination of ‘deep fakes’ or disinformation,” he said.
“Where bodies like the ICO and EDPB land on this issue is important, because it has the potential to remove barriers some see to AI development in Europe – notwithstanding that processing based on any legitimate interest acknowledged in this regard would still need to be shown to be necessary and not overridden by the rights and freedoms of data subjects,” he said.