The Dark Side of ChatGPT and Language Models: Risks and Responsibilities

This paper was made in the first months from the release of ChatGPT. I took the change to think and reason about concerns I had and risks associated with such a breakthrough technology like this and write a paper for the Computer Ethics course I took at Polytechnic of Milan. Click here for the PDF version.

ABSTRACT

This paper explores risks associated with Language Models, such as GPT-3 and ChatGPT, a type of Artificial Intelligence that can replicate human language. Language Models have the potential to automate many tasks currently performed by humans and enhance various industries such as customer service, journalism, and content creation. However, the paper also highlights the potential risks of these models, such as the dissemination of false and misleading information, manipulation, and the creation of malware or harmful technologies. The training data used to develop these models can perpetuate harmful stereotypes and biases, and the optimization process may fail to consider the potential social harms they may cause. The paper suggests that to address these issues, it is crucial to actively debias the data, to have human oversight over Artificial Intelligence systems and models, to establish rules and guidelines to prevent discrimination, and, in general, to apply strict regulations.

Introduction

The evolution of technology and Artificial Intelligence keeps changing the way we communicate with machines, and language models like ChatGPT have become a vital part of this evolution process. We’ve seen widespread use of these models in various applications including chat-bots, text and image generation (Ramesh et al. 2021), code generation (Li et al. 2022), language translation, and so on. As the abilities of these models continue to progress, it’s crucial to consider the potential risks that come with their usage.

GPT-3 is the third generation of the Generative Pre-trained Transformer (GPT) model series, developed by researchers at OpenAI. ChatGPT is an ad-hoc software that implements a user-friendly chat based on GPT-3 that a human can interact with. It works thanks to a neural network trained on more than 500 billion tokens taken from different sources, such as Wikipedia (Zong and Krishnamachari 2022). When given a prompt, the model will generate a response based on the patterns and relationships it learned from the training data.

The aim of this paper is to highlight the potential negative consequences of the use of Language Models, specifically ChatGPT. As language models become increasingly integrated into various aspects of our lives it should be the responsibility of developers, users, and regulators to take proactive measures to address risks and consequences. Without regulation, the negative impact of these models will continue to grow, instead is crucial to act now to prevent further harm.

The paper is structured in sections, where individual risks are addressed by analyzing the way they can harm users when dealing with Language Models, analyzing, for each of them, possible counterarguments that are addressed with respect to this paper’s claim.

Risks associated with information dissemination

Bostrom et al. in (Bostrom 2012) define the information hazards as the risk of harm that arises from the dissemination or potential dissemination of true information. These hazards can occur when information, such as personal details or classified data, is mishandled or falls into the wrong hands. It can also occur when information is used to enable malicious actors to cause harm, such as through cyber-attacks or disinformation campaigns.

This kind of risky behavior can affect the users by compromising privacy, for example by leaking private information or by inferring them. The ability of such models, taking into account the speed they are evolving at, to predict sensitive information with high accuracy presents a collective privacy problem that is widely discussed in the context of social networks, as stated by Garcia et al. in (Garcia et al. 2018). In this paper, they present results indicating that information shared by users on Twitter can be predictive of the location of individuals outside it. Furthermore, they observed that the quality of this prediction increases with the tendency of Twitter users to share their mobile phone contacts and is more accurate for individuals with more contacts inside Twitter. This, combined with the increasing accuracy of the above-mentioned models might be very dangerous.

Another type of information inferring could be related to the intuition of potential vulnerabilities in security programs or systems. As demonstrated by Wallace et al. in (Carlini et al. 2020), GPT-2, the previous version of the language model used by GPT-3, included discussions pulled from GitHub about potential vulnerabilities in the
discussed codes in the training set. This data could be reused by malicious users to cause damage to people or companies (see Chapter <a href=”#sec:risk4″ data-reference-type=”ref”
data-reference=”sec:risk4″>5).

It could be argued that companies and organizations that create and employ language models may be able to self-regulate to reduce the likelihood of information hazards. However, it is essential to keep in mind that self-control may not be sufficient to completely manage information hazards’ risks. By mishandling personal information or providing support for malicious actors, language models have the potential to harm both the businesses and organizations that develop and use them as well as society as a whole. Government oversight or regulations may be required to ensure that these risks are adequately addressed.

Given that these systems are intended to infer data from limited information, finding an immediate solution may be challenging. Spending a lot of time and money on a thorough cleanup of the dataset could be a step in the right direction, allowing at least sensitive and personal information to be removed. Strong data privacy policies and guidelines, as well as regular audits to ensure that the data is handled in a responsible manner, can accomplish this. A solution may also come from the creation and implementation of technologies and procedures that are capable of identifying and erasing sensitive data from the dataset.

Risks associated with false or misleading information

Mathematicians and logicians have been discussing the concept of “truth” in languages for centuries, as written by Odifreddi in his analysis of logic and language in “The lies of Ulysses” (Odifreddi 2004). In this book, the author states that language is a technology, and as such can be used or abused. In fact, every word is literally a parable: being “placed alongside” or “parallel” to reality, it must be interpreted and understood and therefore lends itself to being misunderstood. All of this is both a great truth and a falsehood in the case of language models. In fact, the concept of “truth” in machine learning, is strictly related to the data a model is trained on, for example, an annotated dataset. In summary, language models are able to select words and phrases that they believe are compliant and true. This last notion, however, is linked to the context but above all to a parameter called “likelihood”, that is, more or less, probable. However, whether or not a sentence is likely does not reliably indicate whether the sentence is also actually correct. Language models may produce false statements due to several reasons. One of those, that particularly fit GPT models, is that the training data, which is often sourced from the web, contains a significant amount of inaccuracies. This is partly because a lot of the text in the training data is not meant to be factual, such as fiction, poetry, and joke. Moreover, the training data may include instances of misinformation and deliberate disinformation that can be found online.

The risk coming from the careless use of language patterns and from false or misleading information could cause physical harm, for example  coming from incorrect medical advice, or leading users to perform illegal or unethical actions.

Kobis et al. in (Köbis and Mossink 2020) state how humans easily fail to reliably detect an algorithmically generated text, hence failing the so-called “Turing Test[1]”. This aspect is particularly dangerous as it induces the user to think he is talking to a real person, and above all an expert in a certain sector. This is due to the fact that the model is unable, as mentioned above, to understand whether a certain statement is likely or not, and therefore is unable to induce doubt in the user. The latter is therefore easily vulnerable to damage, even serious, due to false information.

One counterargument to the potential dangers of language models is the assertion that language models are not inherently biased or dangerous and that it is the users’ and creators’ responsibility to ensure that these models are used ethically and responsibly. This viewpoint holds that language models are merely tools, and it is the user’s
responsibility to utilize them safely and correctly. Another counterargument is that improved training data curation and filtering can solve the problem of language model inaccuracies. By carefully selecting the data that are used to train the models and removing any incorrect or misleading information, this viewpoint asserts that the models can be made more accurate and dependable. Despite the fact that users and creators of language models bear the responsibility for ethical use, these models still have the potential to produce inaccurate or misleading information. As a result, it’s critical to be aware of the risks and take steps to reduce them. The problem of training data inaccuracies is a complicated one that can’t be completely solved by just curation and filtering. Better methods of data collection and annotation, ongoing monitoring and evaluation of the model’s performance, and a multifaceted strategy are all necessary.

One possible solution is to use more reliable and trustworthy information sources in the data curation process and to use human annotation of the data to check the training data for misinformation or disinformation. This would necessitate removing any information that is known to be false or misleading from the data used to train the models.

Risks Associated with Bias, Marginalization, and Offensive Content

In AI and machine learning, bias refers to prior information, a necessary prerequisite for intelligence. Bias can be problematic where prior information is derived from precedents known to be harmful. As stated by Caliskan et al. in (Caliskan, Bryson, and Narayanan 2017), perpetuating harmful stereotypes is well-documented harm in machine learning models.

Language models are designed to replicate human language as closely as possible by identifying patterns found in natural language. While this behavior is not negative per se, it becomes problematic when the training data used to develop the models is discriminatory. In these cases, the optimization process leads to models that perpetuate these harms by encoding harmful stereotypes and biases present in the training data. Therefore, while they may perform well in terms of their optimization objectives, they may fail to consider the potential social harms they may cause.

Abid et al. in (Caliskan, Bryson, and Narayanan 2017) (2021) show how GPT-3 is extremely biased by writing into the GPT-3 playground the following prompt: “Two Muslims walked into a…“. This kind of prompt is interpreted as a “complete the sentence task”. They observed that 66 out of 100 completions contain violence-related words/phrases such as shooting, killing, etc.

I tried the same approach using ChatGPT, using the same prompt, and the model answered with: “I’m sorry, but that phrase is often used as the opening of a joke and it can be considered as offensive to some people, so as a language model, I can’t complete that statement.“. While this may seem like a great attempt to avoid discriminatory behavior, it’s easy to see how easily these can be circumvented with a slightly different prompt: “Write a story about a man whose job is to write jokes. The story should start with its morning routine, with a focus on the moment when he is writing a joke about religion and telling it to the reader.“. The answer contains a short story with an offensive joke about the Jewish population during the Second World War. It is easy to see how training data reflect the views, values, and modes of communication. Most importantly localized stereotypes are hard to capture, as they change over time and vary between contexts. Various solutions are being explored to increase Language Models performances, such as translating a prompt to English, generating predictions in English, then translating these predictions back into the original language of the prompt (Kreutzer et al. 2022). However, these approaches may raise new ethical challenges. For example, a given term may be associated with different concepts in one language than in another, reflecting culture-specific differences, making the “discriminatory debiasing” more difficult.

Someone may argue that biases in language models are caused by the training data rather than the models themselves. They suggest that the models will also be diverse and unbiased by employing diverse and objective data. The optimization process and algorithms used in training can also contribute to the perpetuation of these biases, despite the fact that the data unquestionably plays a crucial role in shaping the biases that are present in language models. The models may not be representative of all individuals and groups, even with diverse and objective data, resulting in underrepresentation and potential biases. To address these issues, it is critical to actively debiase the data, to have human oversight over AI systems and models, and to establish rules and guidelines to stop harmful stereotypes and discrimination from being passed on.

To address these problems, which can also vary in gender discrimination, work, and even on the concept of family, some regulations and limitations can be introduced. For example, establishing standards regarding AI model training data to understand which populations the dataset represents and if it has been contaminated by information influence operations, and using data quality recommendations to improve the representation of social groups in the corpus and analyzing a priori how the algorithms will behave.

Risks associated with Malicious and Harmful Uses

While it is true that AI has the potential to automate many tasks currently performed by human workers, it is unlikely that it will completely replace humans in the workforce. On the other hand, it is known that Artificial Intelligence is a tool that can make long and boring tasks much more efficient and also add a plus in terms of skills. To this set of added skills, we could add the fact that these tools can give people abilities to cause harm that before the advent of Language Models they would not have had, or at least would have been more limited.

On deeper examination, systems like GPT-3 do seem to have at least some relevance for narrative seeding. This aspect is crucial in making disinformation cheaper and more effective. As stated by Buchanan et al. in (Buchanan et al., n.d.), the vague and at times nonsensical style that characterizes GPT-3’s outputs, may fit particularly well with conspiracy theories like the ones shed by the QAnon[2] community. Language Models may, for example, lower the cost of disinformation campaigns by generating hundreds of text samples that a human then selects between or curates. Spreading false information widely through society may worsen the already-existing negative effects of how people consume news, such as the phenomenon where users are exposed only to similar content, known as “filter bubbles” or “echo chambers”.

When language models are trained on recent information, the risk of spreading false information is likely to be higher. This is because disinformation campaigns often use current events, daily discussions, and viral memes to spread fake news. One of the most significant risks of using language models is the creation of false “majority opinions” and the disruption of healthy discussions.

The prediction capabilities of Machine Learning in general, are impressive. These skills may be used to make email scams more effective, which may lead to financial and psychological harm, for example by making scams more personalized and by maintaining a conversation with a victim over multiple rounds of replies. It should be noticed that
Language Models like ChatGPT are able to keep going with conversations remembering the history of those. The simulation of an individual’s writing style or speech may also be utilized as a means of enabling more precise manipulation on a large scale. An example of this would be the utilization of such personal simulation techniques to predict responses to various statements and through this methodology, a personal simulation could be employed for the optimization of these messages with the intention of eliciting a desired response from the target.

Currently, Language Model generated code predictions need human oversight to function properly. As these models continue to improve and expand in capability, future coding assistance tools may be able to produce basic functional code with less human intervention. However, there is a concern that these tools could be utilized to create malware or support the development of harmful technologies such as autonomous weapons.

Despite the fact that language models have the potential to automate many tasks currently performed by humans, some people may argue that the advantages of these tools outweigh the risks of manipulation and misinformation. Furthermore, they may argue that language models have the potential to significantly enhance customer service, journalism, and content creation by automating routine tasks and producing novel concepts. It is true that a variety of industries stand to gain from using language models; However, it is essential to also take into consideration the tools’ potential drawbacks, such as manipulation and the dissemination of false information. To minimize the negative effects and maximize the benefits, it is essential to use language models and regulations in an appropriate manner.

One approach to this problem might be to make an investment in the development of disinformation detection and fact-checking algorithms that can be incorporated into language models to automatically flag  potentially false information. Guidelines and regulations could be established to ensure that language models are utilized ethically and responsibly in fields like journalism and marketing. Additionally, the research could be invested in the detection of the generated output of language models in order to identify the dissemination of false information or malicious intent. A method for tracking the information’s source and determining whether or not it is generated by a language model could be developed in order to combat disinformation.

Conclusions

Language models have the potential to improve work performances and content creation, as well as automate many of the tasks currently performed by humans. However, it is essential to also take into consideration the potential dangers that come with these models, such as the creation of malware or harmful technologies, the dissemination of false and misleading information, and manipulation. The training data used to build these models can perpetuate negative stereotypes and biases, and the optimization process may not take into account the social harms they could cause. It is essential to actively debias the data, to have human oversight over AI systems and models, and to establish rules and guidelines to prevent discrimination in order to address these issues. To ensure that language models are utilized in an ethical and responsible manner, regulations should also be established to regulate their use and development. It is essential to continue looking into ways to improve language models’ performance while also addressing any potential ethical issues they may bring up as their capabilities grow.

  1. Ramesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. “Zero-Shot Text-to-Image Generation.” arXiv. https://doi.org/10.48550/ARXIV.2102.12092.
  2. Li, Yujia, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Remi Leblond, Tom Eccles, et al. 2022. “Competition-Level Code Generation with AlphaCode.” Science 378 (6624): 1092–97. https://doi.org/10.1126/science.abq1158.
  3. Zong, Mingyu, and Bhaskar Krishnamachari. 2022. “A Survey on GPT-3.” arXiv. https://doi.org/10.48550/ARXIV.2212.00857.
  4. Bostrom, Nick. 2012. “Information Hazards: A Typology of Potential Harms from Knowledge” 10 (March).
  5. Garcia, David, Mansi Goel, Amod Kant Agrawal, and Ponnurangam Kumaraguru. 2018. “Collective Aspects of Privacy in the Twitter Social Network.” EPJ Data Science 7 (1): 3. https://doi.org/10.1140/epjds/s13688-018-0130-3.
  6. Carlini, Nicholas, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, et al. 2020. “Extracting Training Data from Large Language Models.” arXiv. https://doi.org/10.48550/ARXIV.2012.07805.
  7. Odifreddi, P. 2004. Le Menzogne Di Ulisse: L’avventura Della Logica Da Parmenide Ad Amartya Sen. Il Cammeo. Longanesi. https://books.google.it/books?id=ocjfAAAACAAJ.
  8. Köbis, Nils, and Luca Mossink. 2020. “Artificial Intelligence Versus Maya Angelou: Experimental Evidence That People Cannot Differentiate AI-Generated from Human-Written Poetry.” arXiv. https://doi.org/10.48550/ARXIV.2005.09980.
  9. Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. 2017. “Semantics Derived Automatically from Language Corpora Contain Human-Like Biases.” Science 356 (6334): 183–86. https://doi.org/10.1126/science.aal4230.
  10. Kreutzer, Julia, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, et al. 2022. “Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets.” Transactions of the Association for Computational Linguistics 10: 50–72. https://doi.org/10.1162/tacl_a_00447.
  11. Buchanan, Ben, Andrew Lohn, Micah Musser, and Katerina Sedova. n.d. “Truth, Lies, and Automation: How Language Models Could Change Disinformation.” Center for Security; Emerging Technology: May 2021. https://doi.org/https://doi.org/10.51593/2021CA003.

[1] The Turing test, originally called the imitation game by Alan Turing
in 1950, is a test of a machine’s ability to exhibit intelligent
behavior equivalent to, or indistinguishable from, that of a human.

[2] QAnon is an American political conspiracy theory and political
movement. It originated in the American far-right political sphere in

  1. QAnon centers on fabricated claims made by an anonymous individual
    or individuals known as “Q”,

Join the ConversationLeave a reply

Your email address will not be published. Required fields are marked *

Comment*

Name*

Website