As more companies begin to adopt artificial intelligence (AI) tools such as the chatbot ChatGPT into their business applications, data privacy risks are coming to the fore.

ChatGPT is a form of generative AI: a chatbot that uses natural language processing to produce human-like conversation, trained with human feedback and reward models through reinforcement learning.

Risks of adopting AI

AI systems generally require large amounts of data to simulate human behaviour, and this data can be collected in different ways. For example, companies active on multiple online platforms can obtain data from a large number of sources. Some data is provided directly by users, such as contact information or purchase history. Other data is collected 'behind the scenes', for example via cookies and other tracking technologies.

Turning to ChatGPT specifically, the method of collecting the data on which ChatGPT is based has not yet been disclosed, but data protection experts warn that it is not legal to obtain training data by simply trawling the internet.

A common concern about this process is that AI may 'learn' from people's prompts and provide this information, which may or may not include personal data, to others who enquire about the matter in question.

Currently, ChatGPT does not automatically add information from a query to its model for other users to query. In other words, data entered in the query box will not be incorporated into the language model for others to use. However, the query will be visible to the organisation that provides the language model.

These queries are stored and will be used to develop the next version of the language model. This may mean that the developer or provider, or its partners or contractors, are able to read the queries and possibly incorporate them into future versions in some way. Therefore, the terms of use and privacy policy need to be thoroughly understood before users ask sensitive questions.

A question may be sensitive because of the data contained in the query, or because of who is asking the question and when. For example, a manager might be caught asking "how best to fire an employee?", or someone might ask a revealing question about their health or a relationship. It is important to note that information from multiple queries made under the same login will be aggregated.

As more organisations produce language models, cybersecurity becomes another risk: queries stored online may be hacked, corrupted or, more likely, accidentally made publicly accessible. This could include potentially user-identifiable information.

Jennifer Wu

Partner

Data accuracy and data security are the key risks for companies adopting this technology. Companies should carry out an internal assessment of whether, and to what extent, AI will be adopted.

Key data privacy concerns

The terms of use and privacy policy are key. Providers need to ensure that data collection, processing and storage comply with data protection laws.

Providers should ask themselves the following questions:

  • consent or purpose of data collection: is there a new purpose for the collection of data?
  • data usage: how will the data be used?
  • data sharing: is data shared in isolation or in aggregation with other organisations? Is it available to the vendor’s researchers or partners?
  • data security: what protections and measures have been adopted? Is encryption used?
  • cyberbreach: do you have a crisis management plan?
  • data accuracy: how can outdated or inaccurate information be corrected?
  • anonymisation: can personal data be anonymised?
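On the anonymisation question above, one practical measure is to redact obvious personal data from prompts before they leave the organisation. The following is a minimal, illustrative sketch only: the regex patterns and the `redact` function are assumptions for demonstration, and robust anonymisation in practice would require far more sophisticated detection (for example, named-entity recognition tools).

```python
import re

# Illustrative patterns for common categories of personal data.
# Real-world anonymisation needs broader, more reliable detection.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(prompt: str) -> str:
    """Replace matched personal data with placeholder tokens
    before the prompt is sent to an external AI service."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt
```

For example, `redact("Email jane.doe@example.com about the dismissal")` would return `"Email [EMAIL] about the dismissal"`, so the sensitive identifier never reaches the provider's query logs.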

Depending on how AI is adopted into your business, data privacy notices will need to be updated.
