Hong Kong: The privacy and ethical risks of generative AI cannot be ignored
The emergence of artificial intelligence (AI), particularly with the introduction of powerful generative AI-powered chatbots like OpenAI's ChatGPT, Google LLC's Bard, Microsoft Corporation's Bing Chat, Baidu, Inc's ERNIE Bot, and Alibaba's Tongyi Qianwen, has captured considerable attention this year. These powerful language tools are revolutionizing human-technology interactions due to their increasing ability to generate text indistinguishable from that written by humans. Generative AI is also being used to generate other content such as images, videos, and computer code. That said, various experts have warned that advancing the development of AI technologies without appropriate safeguards could have detrimental effects on humanity. In fact, in July 2023, seven tech companies jointly expressed their voluntary commitment to developing AI responsibly according to the principles of safety, security, and trust1. Ada Chung Lai-Ling, Privacy Commissioner for Personal Data, Hong Kong, China, discusses the considerations and risks regarding the use of generative AI, as well as the ever-evolving regulatory landscape.
Generative AI fever
'Generative AI' can generally be defined as 'algorithms that can be used to create new content, including audio, code, images, text, simulations, and videos.' Owing to its 'magical' capabilities to respond to a vast range of requests and generate new and convincingly human content based on prompts, as well as its accessibility in the form of chatbots, search engines, and image-generating online platforms, generative AI has exploded in popularity.
Generative AI has the potential to transform, if not reshape, various sectors. In June 2023, a global management consulting firm estimated that generative AI could add the equivalent of $2.6 trillion to $4.4 trillion annually to the global economy2. Tech giants are already exploring the use of generative AI technologies for their productivity software, which could provide a productivity boost for countless businesses downstream. Large language models (LLMs), a popular type of generative AI model upon which AI chatbots are based, can increase efficiency by assisting with drafting documents, crafting personalized content and business ideas, responding to customer inquiries, and more. In the legal industry, some law firms have started to use generative AI tools to automate and enhance various aspects of legal work, such as contract analysis, due diligence, litigation, and regulatory compliance. In the education sector, teachers using the right prompts may enlist the assistance of generative AI to develop lesson plans, and students may use it to help them understand new material.
Privacy and ethical risks
Along with its growth and potential benefits, generative AI also presents a myriad of privacy and ethical risks. AI chatbots based on LLMs leverage deep learning technology to analyze and learn from massive amounts of unstructured data without supervision. This data often comprises publicly available text scraped from the internet, which may include personal data. For instance, reports have indicated that as many as 300 billion words may have been scraped from the internet to train AI chatbots. Under data protection laws, personal data is typically required to be collected from data subjects on an informed basis and through fair means. As many AI developers rarely disclose details about their training datasets, it remains to be seen whether such requirements are in fact complied with.
Apart from input data, the outputs of AI chatbots may also pose privacy concerns. Depending on whether an opt-out option is available, user conversations may become new training data for the LLMs behind an AI chatbot. If users inadvertently feed personal data to an AI chatbot, the data is susceptible to misuse beyond the original purpose without consent, thereby contravening the principle that limits the use of personal data. Indeed, an AI chatbot may produce an output response containing personal data that has been removed from its original context.
In addition, generative AI developers may run into challenges concerning the rights of data subjects to access and correct their personal data, as well as the retention of such personal data. Where outdated and/or inaccurate personal data has been stored in the LLM, many developers of AI chatbots have already admitted that accessing, correcting, and deleting such data could be difficult, if not impossible.
Furthermore, storing large amounts of data in an AI chatbot's model and database carries apparent data security risks, from both malicious external threats and accidental leakage. In March 2023, an AI chatbot suffered a major data breach, exposing the titles of conversation histories as well as the names, email addresses, and last four digits of the credit card numbers of some of its users3. Then in June 2023, over 100,000 account credentials from the same AI chatbot were reportedly leaked and placed on the dark web for sale. In this regard, data protection laws generally require developers and operators of AI chatbots who are data users to ensure that the personal data that they hold is protected against unauthorized or accidental access, processing, erasure, loss, or use.
To further complicate the picture, the copyright issues raised by generative AI, along with the ethical risks of inaccurate content and discriminatory or biased output, cannot be overlooked. As the data scraped from the internet for training generative AI models may be copyrighted, concerns have been emerging about whether the use of content generated by these AI models may result in a breach of such rights. For example, various media associations in Japan issued a joint statement in August 2023 to express concerns regarding how the use of copyrighted material for machine learning purposes may 'unreasonably prejudice the interests of the copyright owner.' Moreover, the 'garbage in, garbage out' problem has always been an issue for AI models. Owing to inaccurate training data, AI chatbots may provide incorrect yet seemingly plausible information, which is often referred to as 'hallucination.' If the training data for AI models has embedded elements of bias and prejudice (such as those relating to racial, gender, and age discrimination), the AI may in turn generate discriminatory content or biased output. Besides, generative AI models can be easily exploited by bad actors. A case in point is the 'deepfake,' where fake audio, images, or videos are synthesized and deployed to spread fake news or harmful propaganda. AI chatbots could also be asked to generate code for malware.
Regulatory landscape of AI
While there remains a lack of consensus regarding how AI should be best regulated, various measures have been adopted or proposed by governments or regulators around the world to address the risks involved.
As part of their initial response, some governments and regulators have been issuing guidelines on AI and recommending that organizations deploying generative AI in their operations pay heed to AI governance frameworks and ethical principles. In Hong Kong, for example, the Office of the Privacy Commissioner for Personal Data issued the 'Guidance on the Ethical Development and Use of Artificial Intelligence'4 in 2021 to help organizations develop and use AI systems in a privacy-friendly and ethical manner and in compliance with the local privacy law. The Guidance recommends internationally recognized ethical AI principles covering accountability, human oversight, transparency and interpretability, fairness, data privacy, beneficial AI, and reliability, robustness, and security. In September 2021, the Government of Mainland China also issued the 'Guidance on the Ethics of the New Generation AI'5, which adopts similar principles such as enhancing human well-being, promoting fairness and justice, and protecting privacy and safety.
Going a step further, Mainland China seeks to directly regulate generative AI through the 'Interim Measures for the Management of Generative Artificial Intelligence Services'6 issued by the Cyberspace Administration of China (CAC), which came into effect on August 15, 2023. Apart from expressly requiring providers of generative AI products and services to comply with the Personal Information Protection Law (PIPL), the Interim Measures also prohibit the generation of illegal and harmful content and require providers of certain generative AI products and services to submit a security assessment to the CAC before public launch. In addition, comprehensive AI legislation has been included in the State Council's 2023 legislative plan.
Elsewhere, we note that the EU is close to passing the AI Act7, which takes a prescriptive risk-based approach to regulating all AI systems (including generative AI) and bans certain AI systems deemed to pose unacceptable risk, while Canada is considering a similar law, the Artificial Intelligence and Data Act8, which is going through the legislative process. The current US administration is also developing an executive order to promote 'responsible innovation' in relation to AI in the US9.
With AI advancing at breakneck speed, the importance of robust AI governance that ensures the ethical and responsible development and use of AI simply cannot be overstated. We call on all stakeholders, including regulators and tech companies, to join hands in ensuring that applicable laws are complied with and that core ethical principles such as fairness, transparency, and security are embedded in the development and use of AI. While we are keeping a close eye on global developments in the regulation of new technology, we would remind tech companies that, whatever regulations or standards are put in place, they bear the primary responsibility for ensuring the lawful and ethical development and use of AI so that the new technology is used for human good.
Ada Chung Lai-Ling, Privacy Commissioner for Personal Data, Hong Kong, China
1. See: https://www.whitehouse.gov/wp-content/uploads/2023/07/Ensuring-Safe-Secure-and-Trustworthy-AI.pdf
2. See: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier#key-insights
3. See: https://openai.com/blog/march-20-chatgpt-outage
4. See: https://www.pcpd.org.hk/english/resources_centre/publications/files/guidance_ethical_e.pdf
5. See: https://www.most.gov.cn/kjbgz/202109/t20210926_177063.html
6. See: http://www.cac.gov.cn/2023-07/13/c_1690898327029107.htm
7. See: https://www.europarl.europa.eu/doceo/document/TA-9-2023-0236_EN.pdf
8. See: https://ised-isde.canada.ca/site/innovation-better-canada/en/artificial-intelligence-and-data-act
9. See: https://www.whitehouse.gov/briefing-room/statements-releases/2023/03/02/fact-sheet-biden-harris-administration-announces-national-cybersecurity-strategy/