Europe: Chatbots and the GDPR – is the tension irreconcilable?
The last year has seen significant advances in the use of chatbots, exemplified by the ChatGPT service developed by OpenAI. The service is underpinned by both a language model and a knowledge base. The language model allows it to generate text by predicting which string of words is most likely to follow on from a user prompt. Peter Church, Counsel at Linklaters LLP, discusses how chatbots can be used, their relationship with legislation, and the issues both developers and users may encounter.
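The idea of predicting which words are most likely to follow a prompt can be illustrated with a deliberately tiny sketch. It is not how ChatGPT or any production language model is built; it only counts which word follows which in a short made-up text, and the names `train_bigram_model`, `predict_next` and `corpus` are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text: str) -> dict:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def predict_next(model: dict, word: str):
    """Return the word seen most often after `word`, or None if `word` is unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, purely for illustration.
corpus = (
    "the controller must appoint a data protection officer . "
    "the controller must keep records ."
)
model = train_bigram_model(corpus)
print(predict_next(model, "controller"))  # must
```

Even this toy shows why output can be fluent without being true: the prediction reflects what is statistically likely in the training text, not what has been checked against any source.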
Use in practice
Many businesses are now either investigating or actively deploying this kind of technology. For example, an energy company is using chatbots to answer customer emails. The technology is said to be doing the work of 250 employees and achieving an 80% satisfaction rate, higher than the 65% achieved by human workers.
These talents extend to legal advice. We tested a chatbot on 50 hard questions on the General Data Protection Regulation (GDPR) and English contract law. The results ranged from surprisingly good to terribly wrong. On the surprisingly good side, the chatbot correctly identified that a public hospital would need to appoint a data protection officer (DPO) because it processes health data on a large scale and is a public authority.
On the terribly wrong side, the chatbot sometimes makes things up. This is best exemplified by the real case of a New York lawyer who used a chatbot to write a brief in his client's personal injury case. The problem was that the brief contained a range of entirely fictional case references. The statistical predictions from the language model suggested likely names for cases supporting his arguments, but those cases just didn't exist.
The general view within the industry is that it will take some time to fully understand the range of potential use cases for the large language models on the market now, without factoring in the potential future improvements to that technology.
The GDPR as the 'law of everything'
There is currently little specific regulation of chatbots or artificial intelligence (AI). This will change in the EU when the AI Act is adopted (with similar specific AI laws expected in China and the US). However, it will be a number of years before the AI Act applies and other jurisdictions (such as the UK) do not intend to pass specific AI regulation.
This means that the GDPR – in its role as the 'law of everything' – is currently the primary tool to regulate the use of chatbots in the UK and the EU [1]. The issues it raises depend on whether you are a developer or a user of this technology. We take each in turn.
GDPR issues for developers of chatbots
Chatbot developers will have to comply with the full range of GDPR obligations, from the need to provide privacy notices, to the appointment of DPOs, to the need to comply with the rules on transborder data flows.
However, there are three areas of concern that raise potentially fundamental compliance questions.
- Legal basis for training the AI: The AI chatbots are trained on gigantic datasets in order to 'learn' how language works (i.e. to build a statistical model of which series of words is most likely) and to ingest knowledge. Given the size of these datasets, the only legal basis capable of justifying the training of this type of chatbot is the legitimate interests test (Article 6(1)(f) of the GDPR). The data protection authorities' views on this highly subjective test will be interesting. The question of whether the interests of the developer in creating the chatbot are outweighed by the interests of the individual is likely to be highly fact specific. It will depend, for example, on what measures are used to police the quality and sensitivity of the data in the datasets and what rights individuals have. Having said that, if a data protection regulator were to conclude that a chatbot developer was not able to rely on the legitimate interests test to train its chatbot, that would raise serious doubts about the legality of training AI models more generally in the EU.
- Accuracy: The magic of these AI chatbots is their ability to not just regurgitate knowledge but to use the language model to apply that knowledge in new and interesting ways. Unfortunately, this means they sometimes just make stuff up, as the New York lawyer discussed at the start of the article discovered to his peril. The question is how these 'hallucinations' can be reconciled with the requirement in the GDPR that personal data must be accurate (Article 5(1)(d) of the GDPR). The answer is likely to be that the output should probably be viewed in context – i.e., be seen as fantasy and not fact. The English courts have already accepted that the accuracy principle must be applied in context and does not always mean the personal data must be literally true, e.g., see AB v Chief Constable of British Transport Police  EWHC 2749 in which the High Court decided that a policeman's notebook needed to accurately record what he was told about an incident, but did not have to accurately reflect what actually happened. Regulatory views on this principle and its application to chatbot 'hallucinations' will be very important to the development of this technology.
- Individual rights: Individuals have a range of rights, including rights to erasure and rights to object (Articles 17 and 21 of the GDPR respectively). Those rights apply to training data and AI, just as much as they apply to any other system. However, applying those rights in this context raises two problems. The first is that the training datasets are massive. Searching terabytes of data to identify all the personal data about a particular individual and deleting it is not a trivial undertaking. The second is that the chatbot will likely need to be retrained once the training dataset has been cleansed. This again may not be trivial.
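To make the individual rights point above more concrete, the following is a minimal sketch of the first step in handling an erasure request: scanning a training dataset for records that mention a named individual and removing them. It assumes the dataset is a simple list of text records and that the individual can be found by exact, case-insensitive matching on known identifiers. The function names and the sample data are hypothetical.

```python
import re


def find_matching_records(records, identifiers):
    """Return the indices of records that mention any identifier (case-insensitive)."""
    patterns = [re.compile(re.escape(i), re.IGNORECASE) for i in identifiers]
    return [
        idx
        for idx, text in enumerate(records)
        if any(p.search(text) for p in patterns)
    ]


def erase_records(records, identifiers):
    """Return the cleansed records and the number of records removed."""
    matched = set(find_matching_records(records, identifiers))
    kept = [text for idx, text in enumerate(records) if idx not in matched]
    return kept, len(matched)


# Hypothetical training records, purely for illustration.
training_records = [
    "Jane Example wrote a review of the local bakery.",
    "The weather in Milan was mild in April.",
    "Contact jane.example@example.com for details.",
]
cleansed, removed = erase_records(
    training_records, ["Jane Example", "jane.example@example.com"]
)
print(removed)  # 2
```

In practice this is the easy part. Exact matching will miss misspellings, nicknames and indirect references, and a model already trained on the original records is unaffected until it is retrained on the cleansed dataset.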
These are all live issues. The Italian data protection authority ('Garante') initially banned ChatGPT from processing personal data about Italian citizens based on (amongst other things) the concerns set out above. That ban was reversed a month later following amendments to the service. However, investigations by other data protection regulators are ongoing and they might well result in a further ban or a demand for significant changes to the service.
Alongside the issues under the GDPR, chatbot developers will have to consider a host of other legal issues under intellectual property (IP) law, competition law, and the like. However, those issues are outside the scope of this note.
Issues for users of chatbots
The issues faced by users of chatbots are slightly different. Businesses looking to deploy AI chatbots should consider the issues set out below. These issues are drawn more widely to consider those arising under the GDPR and other key areas of law:
- Don't trust it; don't rely on it: As set out above, the output of these AI chatbots can be half fact, half fiction. Added to that, the answers they produce are often very convincing and so tempting to rely on. In practice, where the chatbot is interfacing directly with customers (such as a chatbot on a website) its operation should be tightly scoped and controlled. It is also important to keep it closely supervised (e.g., continually reviewing or sampling its output) to make sure it continues to answer in an appropriate manner and that there is no 'model drift.' Where chatbots are being used within a business (e.g., as 'AI assistants'), their output should be properly checked by someone with sufficient intelligence and expertise to confirm whether it is right or not. Again, because these chatbots produce such convincing answers, this is not always easy to work out. One option is to deploy the chatbot so that it only refers to a tightly defined and trusted knowledge base and provides links back to the underlying material, allowing you to check its conclusions. For example, you might create a knowledge base made up of up-to-date privacy laws, regulatory guidance, and case law, and require the large language model to only consider that material when answering questions.
- Don't give it your secrets: You should carefully consider the risks of inputting confidential information or personal data into a chatbot (including chatbots interfacing directly with customers). For example, a lawyer inputting details of their clients into a chatbot might breach client confidentiality, data protection laws, and professional ethics. This is particularly the case given the well-known risk that data used to train AI systems may reach a wide range of recipients.
- Be alert to IP risks: There are a number of unresolved IP issues raised by generative AI. Some relate to whether developers can use copyright works to train their models. This is largely a concern for the developers. However, it is possible that the output from chatbots could also infringe copyright, for example, when it reproduces an existing copyright work. Due to the way chatbots work, this risk is not currently thought to be significant (because the output text is likely to be combined from multiple sources) but should be considered, particularly where the output is published.
- Be careful about bias and discrimination: You should consider carefully the risk that the chatbot exhibits biased or discriminatory behavior that has been inherited from the data it was trained on. This might include gender bias by, for example, assuming references to nurses are to female nurses, and references to software engineers are to male software engineers.
- Computer says no: The GDPR contains restrictions on automated decisions that produce legal effects on an individual or otherwise significantly affect them. These restrictions are getting increasing attention (including in a number of pending cases in the Court of Justice of the European Union [2]) so they need to be addressed carefully.
- Contracting for AI – controller/processor: All of these issues need to be factored into any contract with the AI provider. That contract will need to address a wide range of issues, but from a GDPR perspective, the key issue will be whether the provider of the AI systems acts as a controller or processor. If the provider is a processor, the contract will need appropriate processor language under Article 28 of the GDPR. Similarly, if the project involves the transfer of personal data to a non-adequate third country then it is likely that Standard Contractual Clauses and a Transfer Impact Assessment will be needed.
- Impact assessment: Finally, it is important to properly document your compliance. From a GDPR perspective, this primarily means completing a Data Protection Impact Assessment (and a legitimate interests assessment). Regulators will very much expect that these impact assessments will be carried out and kept up to date as the projects progress.
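The suggestion in the first point above, that a chatbot should only refer to a tightly defined and trusted knowledge base and link back to it, can be sketched as follows. This is an illustration under stated assumptions rather than a recommended design: relevance is judged by crude keyword overlap, the two passages are paraphrased summaries rather than quotations, and every name (`Passage`, `retrieve`, `build_prompt`, `min_overlap`) is hypothetical. The call to the language model itself is omitted because no particular provider's API is assumed.

```python
from dataclasses import dataclass

# Common words ignored when judging relevance (a small illustrative list).
STOPWORDS = {"a", "an", "the", "of", "to", "is", "in", "and", "or", "be",
             "by", "must", "what", "does", "do", "where", "that", "for"}


@dataclass
class Passage:
    source: str  # citation or link back to the trusted document
    text: str


def tokenize(text: str) -> set:
    """Lower-case the text, strip punctuation and drop stopwords."""
    words = {w.strip(".,;:?!()'\"").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def retrieve(question: str, knowledge_base: list, min_overlap: int = 2) -> list:
    """Return passages sharing at least `min_overlap` keywords with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p.text)), p) for p in knowledge_base]
    relevant = [(score, p) for score, p in scored if score >= min_overlap]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in relevant]


def build_prompt(question: str, knowledge_base: list):
    """Build a prompt limited to retrieved passages, or None if nothing is relevant."""
    passages = retrieve(question, knowledge_base)
    if not passages:
        return None  # decline rather than let the model answer unsupported
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        "Answer using only the material below and cite the source in brackets.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Paraphrased summaries, purely for illustration.
kb = [
    Passage("Art. 37(1)(a) GDPR (paraphrased)",
            "A data protection officer must be designated where processing "
            "is carried out by a public authority or body."),
    Passage("Art. 28(3) GDPR (paraphrased)",
            "Processing by a processor must be governed by a contract that "
            "binds the processor to the controller."),
]
print(build_prompt("Does a public hospital need a data protection officer?", kb))
```

Because each passage carries its source, a reviewer can trace an answer back to the underlying material, and a question with no relevant passage produces no prompt at all instead of an unsupported answer.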
While this Insight article provides a high-level overview of the GDPR issues, the interaction between AI and the law is complex and subtle.
Peter Church [3], Counsel
Linklaters LLP, London
1. In the UK, the UK GDPR applies. However, it doesn't materially diverge from the EU GDPR on these issues. Therefore, in this Insight article, the term 'GDPR' is intended to encompass both the GDPR and the UK GDPR.
2. See for example the pending judgment in SCHUFA Holding and Others (C-634/21).
3. The author contributed the data protection chapter to the book Artificial Intelligence: Law and Regulation, published by Edward Elgar, which provides a deeper dive into these issues.