Out-Law News
09 May 2024, 9:38 am
Recently issued guidance from the Dutch data protection authority on web-scraping may impact the business model of information services providers and cause concern for those who rely on those services for a wide range of purposes, an expert has said.
The Dutch data protection authority, Autoriteit Persoonsgegevens (AP), issued the statement on 1 May, warning that the practice of web-scraping to acquire personal data for purposes such as training artificial intelligence (AI) models is “almost always a violation of the General Data Protection Regulation (GDPR)”.
Wouter Seinen, cyber, data and technology expert at Pinsent Masons, said: “The most outstanding point to me is that the AP decided to zoom out and look at data harvesting practices as a general theme, to the conclusion that many practices that are currently quite commonplace may actually be violating GDPR. This may impact businesses and make companies nervous who rely on services varying from anti-money laundering to know your customer controls to direct marketing.”
Web-scraping relies on using a computer agent or web-crawling bot to automatically locate and collect data from the internet – for example, by scanning social media. Scraping almost always includes the collection of personal data and, because of its automated nature, can rapidly obtain personal data from many people.
The guidance aims to clarify when web-scraping can be used in line with GDPR rules, with firms required to show a “legitimate interest” in processing personal data even where the information scraped is publicly available online. The AP’s position relies heavily on the view that “legitimate interests” can only be interests that are specifically protected by law and that “purely commercial interests” will not qualify. However, this position is currently under review.
The guidelines set out the factors businesses should take into account on a case-by-case basis to determine whether they have a legitimate interest in the data being processed via web-scraping. These include the way in which data is processed; the purpose of processing the personal data; and any safeguards to protect the interests of the data subjects.
“The takeaway is that businesses that are involved in collecting vast sets of data from the public internet or even ‘enriching’ their data using software tools are at risk of being put on the spot by supervisory authorities, so it would be commendable to check whether an up-to-date legitimate interest assessment is in place or other documentation evidencing how the use case was assessed for GDPR compliance,” said Seinen.
The AP’s statement identified “widespread misunderstanding” that scraping is allowed simply because the information scraped is publicly available on the open web. It said that, just because information is public, it does not automatically mean that scraping should be allowed. For instance, if someone posts on social media that they have recently won the lottery or had an operation, this does not give permission for that data to be scraped and processed by an organisation.
Instead, individuals must be informed of the legitimate interest claimed by the person or organisation collecting and processing their data and given the right to object, in line with the GDPR. However, “the automated and large-scale nature of web-scraping makes this unlikely in practice”, said Malcolm Dowden, data protection law expert at Pinsent Masons.
The guidance recognises exceptions to the scope of the GDPR. For example, data scraping to train an algorithm that allows users outside the EU to generate images or computer code is out of scope if the controller is established outside the EU and does not provide goods or services to individuals in the EU.
However, the AP said that data scraping almost always scrapes personal data, meaning businesses are required to be compliant with Article 5(1) of the GDPR. Article 5(1)(a) requires personal data to be processed lawfully, fairly and in a transparent manner in relation to the data subject.
The latest Dutch guidance follows recent investigations carried out by the AP. These include an investigation into the credit score report service Experian, as part of the authority’s enforcement priorities.
EU and UK data protection authorities have repeatedly expressed concerns about the risk that web-scraping poses to the rights and freedoms of individuals. This includes guidance from the UK Information Commissioner’s Office (ICO) highlighting hurdles faced by AI developers, and the UK data law compliance requirements for businesses using web-scraping.
“The increase in web-scraping monitoring, both in the Netherlands and beyond, will require firms to be more vigilant as to the way they read and use personal data online,” said Seinen.
“Close scrutiny by data protection authorities of web-scraping activities used to obtain training data for AI models is likely to fuel greater use of ‘synthetic data’, generated by an algorithm originally trained on ‘real’ data to produce augmented or replacement data. It may also fuel greater use of data licensing deals,” Dowden said.