As more companies adopt artificial intelligence (AI) tools such as the chatbot ChatGPT into their business applications, data privacy risks are coming to the fore.
ChatGPT is a form of generative AI: a chatbot that uses natural language processing to hold human-like conversations, trained using reinforcement learning from human feedback, in which reward models built from that feedback guide the training.
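To make the role of a reward model concrete, the toy Python sketch below ranks candidate replies with a hand-written scoring function. In real reinforcement learning from human feedback the score comes from a neural reward model trained on human preference rankings, not from rules like these; everything here is illustrative and does not reflect any provider's actual training code.

```python
# Toy stand-in for a reward model: higher scores mean "more preferred".
# Real RLHF learns this function from human preference data and then
# optimises the chat model against it with reinforcement learning.

def toy_reward(response: str) -> float:
    score = 0.0
    if response.endswith((".", "!", "?")):      # complete sentences preferred
        score += 1.0
    score -= 0.01 * abs(len(response) - 80)     # penalise very short/long replies
    return score

candidates = [
    "Yes",
    "Yes, our refund policy allows returns within 30 days of purchase.",
]

# During training, the reward signal would push the model toward
# generating replies like the higher-scoring candidate.
print(max(candidates, key=toy_reward))
```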
AI systems generally require large amounts of data to simulate human behaviour, and this data can be collected in different ways. For example, companies active on multiple online platforms can obtain data from a large number of sources. Some data is provided directly by users, such as contact information or purchase history; other data is collected 'behind the scenes', for example via cookies and other tracking technologies.
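As a concrete illustration of 'behind the scenes' collection, the sketch below uses only Python's standard library to show how a site might assign a browser a persistent tracking identifier and recognise it on later visits. The cookie name and lifetime are hypothetical.

```python
# Minimal sketch of cookie-based tracking (hypothetical cookie name).
from http.cookies import SimpleCookie
import uuid

# First visit: the site assigns the browser a persistent identifier.
cookie = SimpleCookie()
cookie["visitor_id"] = str(uuid.uuid4())
cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # persists for a year
print(cookie.output())  # sent to the browser as a Set-Cookie header

# Later visits: the browser sends the identifier back, letting the site
# link page views, searches and purchases to the same person.
returned = SimpleCookie(f"visitor_id={cookie['visitor_id'].value}")
print(returned["visitor_id"].value)
```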
Turning to ChatGPT specifically, the method used to collect the data on which it was trained has not been disclosed, but data protection experts warn that obtaining training data by simply trawling the internet is not lawful.
A common concern about this process is that the AI may 'learn' from users' prompts and later provide that information, which may or may not include personal data, to other users who enquire about the same matter.
Currently, ChatGPT and similar language models do not automatically add information from a query to the model for other users to query. In other words, information entered in the query box will not be incorporated into the language model for others to use. However, the query will be visible to the organisation that provides the language model.
These queries are stored and will be used to develop the next version of the language model. This may mean that the developer or provider, or its partners or contractors, are able to read the queries and possibly incorporate them into future versions in some way. The terms of use and privacy policy therefore need to be thoroughly understood before users ask sensitive questions.
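One practical precaution, sketched below, is to redact obvious personal data from a prompt before it leaves the organisation and is stored by the provider. The regex patterns are illustrative placeholders only and catch far less than a proper data loss prevention tool would.

```python
# Sketch of client-side redaction before a prompt is submitted.
# Patterns are illustrative, not exhaustive.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s-]{7,}\d"), "[PHONE]"),
]

def redact(prompt: str) -> str:
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

print(redact("Draft a warning letter to jane.doe@example.com, tel +44 20 7946 0958"))
# -> Draft a warning letter to [EMAIL], tel [PHONE]
```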
A question may be sensitive because of the data it contains, or because of who is asking it and when: for example, a manager caught asking "how best to fire an employee?", or someone asking a revealing question about their health or a relationship. It is also important to note that information from multiple queries made under the same login will be aggregated.
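The toy sketch below shows why that aggregation matters: queries that look innocuous on their own can, once grouped under one login, reveal something sensitive. The log entries and usernames are invented for illustration.

```python
# Sketch of per-account aggregation of queries (invented log entries).
from collections import defaultdict

query_log = [
    ("m.chan", "What notice period applies to dismissal in Hong Kong?"),
    ("a.li",   "Best team lunch spots near Central?"),
    ("m.chan", "How to document poor performance before termination?"),
    ("m.chan", "Severance payment calculation for 8 years of service"),
]

by_user = defaultdict(list)
for user, query in query_log:
    by_user[user].append(query)

# m.chan's combined history strongly suggests a planned dismissal,
# even though no single query is conclusive on its own.
for query in by_user["m.chan"]:
    print(query)
```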
As more organisations produce language models, cybersecurity becomes another risk: queries stored online may be hacked, corrupted or, more likely, accidentally made publicly accessible, and this could include potentially identifiable user information.
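If query logs must be kept, one storage-side mitigation, sketched below with a hypothetical secret key, is to pseudonymise account names with a keyed hash so that a leaked log is harder to tie back to an individual. Key management and rotation are out of scope here.

```python
# Sketch of pseudonymising account names before queries are logged.
# LOG_KEY is a hypothetical secret held by the log owner.
import hashlib
import hmac

LOG_KEY = b"rotate-me"

def pseudonym(user_id: str) -> str:
    # Keyed hash: stable per user, but meaningless without the key.
    return hmac.new(LOG_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

print(pseudonym("m.chan"))
```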
Jennifer Wu
Partner
Accuracy of data and data security are the key risks for companies adopting this technology. Companies need to carry out an internal assessment of whether, and to what extent, AI should be adopted.
The terms of use and privacy policy are key. Providers need to ensure that data collection, processing and storage comply with data protection laws.
Providers should ask themselves the following questions:
Data privacy notices will also need to be updated, depending on how AI is adopted into your business.