Large language models (LLMs) do not process full sentences the way humans do — text must be broken into smaller, consistent chunks called tokens so the model can handle any kind of input systematically and learn patterns that let it predict what comes next.
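The idea of splitting text into known chunks can be sketched with a toy greedy tokenizer. This is only an illustration — real LLM tokenizers use learned methods such as byte-pair encoding, and the vocabulary below is invented for the example:

```python
# Toy greedy longest-match tokenizer over a made-up vocabulary.
# Real tokenizers (e.g. byte-pair encoding) learn their vocabulary from data;
# this only demonstrates the idea of segmenting text into known chunks.

TOY_VOCAB = {"The", " Nitro", "me", "TheNitrome", " game", " studio"}

def toy_tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown pieces fall back to single characters.
            tokens.append(text[i])
            i += 1
    return tokens

print(toy_tokenize("TheNitrome"))   # a single token
print(toy_tokenize("The Nitrome"))  # several tokens
```

Note how adding one space changes the segmentation entirely — the same surface-level sensitivity that glitch tokens exploit.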
A glitch token is a token that causes unexpected or anomalous output when included in a prompt. Such output may include the model misinterpreting words, refusing to respond, or generating repetitive or unrelated text. Prompts that trigger this behaviour can otherwise look completely or mostly normal.
In OpenAI’s text-davinci-003, one example of a glitch token is ‘TheNitrome’. In a 2024 study, the model was asked ‘What do we know about TheNitrome?’ It responded with ‘Curry is a type of dish …’ When the authors asked the same question with ‘The Nitrome’ (with a space added), the model gave a more expected answer: ‘The Nitrome is an independent game development studio …’
A 2024 study identified several common types of unexpected behaviours, or ‘symptoms’, caused by glitch tokens. These include:
Minor spelling errors: When asked to repeat a word, the LLM may change its spelling. For example, when Llama-2-13b-chat was asked to repeat the word ‘wurden,’ it output ‘werden.’
Hallucination: When the authors asked Text-Davinci-003 to repeat the word ‘SolidGoldMagikarp,’ it replied with ‘Distribute.’
Question repetition: Despite being asked not to, the LLM may repeat the question being asked. For example, when Text-Davinci-003 was asked to repeat ‘Assuming,’ it output ‘You are asking me to repeat the string.’
Random characters: When asked to repeat ‘}}^’, Mistral-7b-Instruct responded with the random characters ‘^^^^’. This behaviour occurred with glitch tokens consisting solely of non-alphabetic characters. Separately, when asked to repeat the word ‘PsyNetMessage,’ the LLM referenced the unrelated word ‘volunte.’
With the temperature set to 0, when the authors asked Text-Davinci-003 to repeat the phrase ‘?????-?????-,’ the model responded with ‘You’re a fucking idiot.’ The authors cited derogatory responses like this as a reason to find glitch tokens, in order to prevent harm to users of LLMs.
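The repetition-based symptoms above suggest a simple detection idea: ask the model to repeat a token verbatim and flag it if the reply differs. A minimal sketch, assuming a hypothetical `query_model()` call (here stubbed to reproduce the ‘wurden’ → ‘werden’ example; a real implementation would call an actual LLM API):

```python
# Sketch of a "repetition test" for spotting glitch tokens.
# query_model() is a hypothetical stand-in for a real LLM call; this stub
# echoes the quoted word except for one hard-coded misbehaving example.
def query_model(prompt: str) -> str:
    word = prompt.split("'")[1]  # extract the quoted token from the prompt
    return "werden" if word == "wurden" else word

def is_glitchy(token: str) -> bool:
    """Flag a token if the model fails to repeat it exactly."""
    reply = query_model(f"Please repeat the string '{token}' exactly.")
    return reply.strip() != token

print(is_glitchy("hello"))   # False: repeated correctly
print(is_glitchy("wurden"))  # True: stub outputs 'werden' instead
```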
A 2024 study identified several types of glitch tokens:
Words joined together, such as ‘ByPrimaryKey’ in GPT-4.
Extra letters, such as ‘davidjl’ having an extra ‘jl’ at the end in Llama2-13b-chat.
Only non-letter characters, which do not seem to mean anything, such as ‘ ” }}””>” ‘ in GPT-3.5 Turbo.
Non-ASCII characters, such as ‘réalis’ in Vicuna-13b, which contains the non-ASCII ‘é’.

The first work on glitch tokens appeared on the LessWrong community blog. Several methods have since been employed to detect these tokens. Research has also found that small differences in prompts containing glitch tokens can greatly alter the output of the LLM.
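Some of the token types listed above can be recognized from surface features alone. An illustrative classifier — the category labels are mine, not a standard taxonomy:

```python
# Illustrative surface-level classifier for two of the glitch-token
# categories described above. Purely heuristic; real detection requires
# probing the model itself.
def classify_token(tok: str) -> str:
    if not tok.isascii():
        return "non-ASCII"                 # e.g. 'réalis'
    if not any(c.isalpha() for c in tok):
        return "non-letter only"           # e.g. '"}}">'
    return "other"

print(classify_token("réalis"))   # non-ASCII
print(classify_token('"}}">'))    # non-letter only
print(classify_token("davidjl"))  # other
```

Joined words and extra-letter tokens cannot be caught this way, since they look like ordinary alphabetic strings.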