AI Hallucination


In artificial intelligence (AI), a hallucination or artificial hallucination (also occasionally called delusion) is a confident response by an AI that does not seem to be justified by its training data. For example, a hallucinating chatbot with no knowledge of Tesla’s revenue might internally pick a random number (such as $13.6 billion) that the chatbot deems plausible, and then go on to falsely and repeatedly insist that Tesla’s revenue is $13.6 billion, with no sign of internal awareness that the figure was a product of its own imagination.

Users complained that such bots often seemed to ‘sociopathically’ and pointlessly embed plausible-sounding random falsehoods within their generated content. Another example of hallucination in artificial intelligence is when an AI or chatbot forgets that it is an AI and claims to be human.

Such phenomena are termed ‘hallucinations,’ in analogy with the phenomenon of hallucination in human psychology. While a human hallucination is when a person sees or feels something that doesn’t match up with what’s actually happening around them, an AI hallucination is instead a confident response by an AI that cannot be grounded in any of its training data. AI hallucination gained prominence around 2022 alongside the rollout of certain large language models (LLMs) such as ChatGPT.

Various researchers have classified adversarial hallucinations as a high-dimensional statistical phenomenon, or have attributed hallucinations to insufficient training data. Some generative systems are adversarial in that they are trained as two neural networks, a generator and a discriminator. The generator creates new data samples, while the discriminator tries to differentiate between real and generated samples. These networks work in opposition, continually improving their performance: the generator gets better at producing realistic samples, while the discriminator gets better at identifying them.
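The generator-versus-discriminator loop can be made concrete with a deliberately tiny sketch. Everything in it is an assumption chosen for illustration: the ‘real’ data are numbers drawn around 4.0, the generator is a single shift parameter theta, the discriminator is a one-input logistic model, and the function name train_toy_gan is hypothetical. It is not how any production system is trained, and such simple setups can oscillate rather than settle.

```python
import math
import random


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps: int = 200, lr: float = 0.05,
                  real_mean: float = 4.0, seed: int = 0):
    """Alternate one discriminator step and one generator step per iteration.

    Generator:      G(z) = z + theta          (theta is its only parameter)
    Discriminator:  D(x) = sigmoid(a * x + b) (estimated probability x is real)
    """
    rng = random.Random(seed)
    theta = 0.0
    a, b = 0.0, 0.0
    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)
        x_fake = rng.gauss(0.0, 1.0) + theta

        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(a * x_real + b)
        d_fake = sigmoid(a * x_fake + b)
        grad_a = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_b = -(1.0 - d_real) + d_fake
        a -= lr * grad_a
        b -= lr * grad_b

        # Generator step: minimize -log D(G(z)), i.e. try to fool the discriminator.
        d_fake = sigmoid(a * x_fake + b)
        grad_theta = -(1.0 - d_fake) * a
        theta -= lr * grad_theta
    return theta, a, b


theta, a, b = train_toy_gan()
print(f"generator shift theta = {theta:.2f} (real data are centred on 4.0)")
```

In this toy, the discriminator first learns that larger numbers look ‘real’, which in turn pushes the generator to shift its samples upward, toward the real data.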

Some researchers believe that certain ‘incorrect’ AI responses classified by humans as ‘hallucinations’ in the case of object detection may in fact be justified by the training data, or even that an AI may be giving the ‘correct’ answer that the human reviewers are failing to see. For example, an adversarial image that looks, to a human, like an ordinary image of a dog, may in fact be seen by the AI to contain tiny patterns that (in authentic images) would only appear when viewing a cat. The AI is detecting real-world visual patterns that humans are insensitive to. However, these findings have been challenged by other researchers. For example, it was objected that the models can be biased towards superficial statistics, so that adversarial training may not be robust in real-world scenarios.
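To see how tiny input changes can flip a label, here is a minimal sketch on a toy linear classifier. The four ‘pixels’, the weights and the function names are all hypothetical, and the perturbation is a fast-gradient-sign-style step (each pixel is nudged by at most epsilon in the direction that raises the ‘cat’ score), swapped in here for illustration; real object detectors are far larger and nonlinear.

```python
from typing import List

# Hypothetical weights of a toy linear "cat vs. dog" scorer over four pixels.
WEIGHTS: List[float] = [0.5, -0.25, 0.75, -1.0]


def score(pixels: List[float]) -> float:
    """Positive score means 'cat', otherwise 'dog'."""
    return sum(w * p for w, p in zip(WEIGHTS, pixels))


def label(pixels: List[float]) -> str:
    return "cat" if score(pixels) > 0 else "dog"


def perturb(pixels: List[float], epsilon: float) -> List[float]:
    """Shift every pixel by epsilon in the direction that increases the score.

    For a linear model the gradient of the score with respect to the input
    is just the weight vector, so the sign of each weight gives the direction.
    """
    return [p + epsilon * (1.0 if w > 0 else -1.0)
            for w, p in zip(WEIGHTS, pixels)]


image = [0.2, 0.4, 0.1, 0.3]           # made-up pixel intensities
adversarial = perturb(image, epsilon=0.1)
print(label(image), "->", label(adversarial))  # prints: dog -> cat
```

No pixel moves by more than 0.1, yet the label changes, because many small nudges that all point the same way add up across the input dimensions.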

In natural language processing, a hallucination is often defined as ‘generated content that is nonsensical or unfaithful to the provided source content.’ Depending on whether or not the output contradicts the prompt, hallucinations can be divided into closed-domain and open-domain, respectively. Errors in encoding and decoding between text and representations can cause hallucinations. Training an AI to produce diverse responses can also lead to hallucination. Hallucinations can also occur when the AI is trained on a dataset wherein labeled summaries, despite being factually accurate, are not directly grounded in the labeled data purportedly being ‘summarized.’ Larger datasets can create a problem of parametric knowledge (knowledge that is hard-wired in learned system parameters), creating hallucinations if the system is overconfident in its hard-wired knowledge. In systems such as GPT-3, an AI generates each next word based on a sequence of previous words (including the words it has itself previously generated in the current response), which can cause a cascade of hallucination as the response grows longer.
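The cascading effect of word-by-word generation can be illustrated with a hand-written toy. The table below is a hypothetical next-word model that looks only at the previous word and always picks the most likely continuation; it is an assumption made for illustration and bears no resemblance to the scale or architecture of GPT-3.

```python
from typing import Dict, List

# Hypothetical next-word probabilities, keyed by the previous word.
BIGRAMS: Dict[str, Dict[str, float]] = {
    "the": {"company": 0.6, "moon": 0.4},
    "company": {"reported": 1.0},
    "reported": {"revenue": 1.0},
    "moon": {"is": 1.0},
    "is": {"cheese": 1.0},
}


def generate(prefix: List[str], max_new_words: int = 3) -> List[str]:
    """Greedily extend the prefix, conditioning each step on the model's own output."""
    words = list(prefix)
    for _ in range(max_new_words):
        choices = BIGRAMS.get(words[-1])
        if not choices:
            break  # no known continuation for this word
        words.append(max(choices, key=choices.get))
    return words


print(" ".join(generate(["the"])))          # the company reported revenue
print(" ".join(generate(["the", "moon"])))  # the moon is cheese
```

Once the less likely word ‘moon’ is in the sequence, every later word is conditioned on it, so a single early slip steers the rest of the response.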

Natural language models can hallucinate for many possible reasons. Hallucination from data arises from divergences in the source content, which often occur in large training datasets. Hallucination from training occurs even when there is little divergence in the dataset; in that case, it derives from the way the model is trained. Many factors can contribute to this type of hallucination, such as erroneous decoding by the transformer, a bias from the historical sequences that the model previously generated, or a bias arising from the way the model encodes its knowledge in its parameters.

OpenAI’s ChatGPT, released in beta version to the public on November 30, 2022, is based on the GPT-3.5 family of large language models. Professor Ethan Mollick of Wharton has called ChatGPT an ‘omniscient, eager-to-please intern who sometimes lies to you.’ Data scientist Teresa Kubacka has recounted deliberately making up the phrase ‘cycloidal inverted electromagnon’ and testing ChatGPT by asking it about the (nonexistent) phenomenon. ChatGPT invented a plausible-sounding answer backed with plausible-looking citations that compelled her to double-check whether she had accidentally typed in the name of a real phenomenon.

When ‘CNBC’ asked ChatGPT for the lyrics to ‘The Ballad of Dwight Fry,’ ChatGPT supplied invented lyrics rather than the actual lyrics. Asked questions about New Brunswick, ChatGPT got many answers right but incorrectly classified Samantha Bee as a ‘person from New Brunswick.’ Asked about astrophysical magnetic fields, ChatGPT incorrectly volunteered that ‘(strong) magnetic fields of black holes are generated by the extremely strong gravitational forces in their vicinity.’ ‘Fast Company’ asked ChatGPT to generate a news article on Tesla’s last financial quarter; ChatGPT created a coherent article, but made up the financial numbers contained within.

Other examples involve baiting ChatGPT with a false premise to see if it embellishes upon the premise. When asked about ‘Harold Coward’s idea of dynamic canonicity,’ ChatGPT fabricated that Coward wrote a book titled ‘Dynamic Canonicity: A Model for Biblical and Theological Interpretation,’ arguing that religious principles are actually in a constant state of change. When pressed, ChatGPT continued to insist that the book was real. Asked for proof that dinosaurs built a civilization, ChatGPT claimed there were fossil remains of dinosaur tools and stated ‘Some species of dinosaurs even developed primitive forms of art, such as engravings on stones.’ When prompted that ‘Scientists have recently discovered churros, the delicious fried-dough pastries… (are) ideal tools for home surgery,’ ChatGPT claimed that a ‘study published in the journal Science’ found that the dough is pliable enough to form into surgical instruments that can get into hard-to-reach places, and that the flavor has a calming effect on patients.
