In corpus linguistics, a hapax [hah-paks] legomenon [luh-gaa-muh-naan] (plural hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text.

The term is sometimes incorrectly used to describe a word that occurs in just one of an author’s works but more than once in that particular work. Hapax legomenon is a transliteration of the Greek ἅπαξ λεγόμενον, meaning ‘being said once.’

The related terms ‘dis legomenon,’ ‘tris legomenon,’ and ‘tetrakis legomenon’ refer to words occurring twice, three times, and four times respectively, but they are far less commonly used. Hapax legomena are quite common, as predicted by Zipf’s law, which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the distinct words (word types) are hapax legomena, and another 10% to 15% are dis legomena. For example, in the ‘Brown Corpus of American English,’ about half of the 50,000 distinct words are hapax legomena within that corpus.
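These proportions are measured over distinct words, not over running words. The following Python sketch counts how many word types in a text occur exactly once (hapax legomena) or exactly twice (dis legomena) and lists the hapaxes. The function names are hypothetical. The tokenizer is a deliberate simplification that treats lowercased runs of ASCII letters as words, whereas real corpus work would use a proper tokenizer.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: a "word" is a lowercased run of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def hapaxes(text: str) -> list[str]:
    """Alphabetical list of word types that occur exactly once in the text."""
    freqs = Counter(tokenize(text))
    return sorted(word for word, count in freqs.items() if count == 1)


def legomena_profile(text: str) -> dict[str, float]:
    """Token and type counts, plus the share of types seen once or twice."""
    freqs = Counter(tokenize(text))
    # Frequency spectrum: maps k to the number of types occurring exactly k times.
    spectrum = Counter(freqs.values())
    types = len(freqs)
    return {
        "tokens": sum(freqs.values()),
        "types": types,
        "hapax": spectrum[1],
        "dis": spectrum[2],
        "hapax_share": spectrum[1] / types if types else 0.0,
        "dis_share": spectrum[2] / types if types else 0.0,
    }
```

In the sentence ‘the cat sat on the mat and the dog sat too,’ six of the eight distinct words are hapax legomena and one (‘sat’) is a dis legomenon. A sample this small says nothing about Zipfian proportions, which only emerge in large corpora.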

Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to its origin or to its prevalence in speech. It thus differs from a nonce word (also called an occasionalism), a lexeme created for a single occasion to solve an immediate problem of communication. A nonce word often appears several times in the work that coins it, in which case it is not a hapax legomenon of that work.

Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer a word’s meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and hapax legomena in the Hebrew Bible sometimes pose problems for translators. Hapax legomena also pose challenges in natural language processing (computer interpretation of text), where a statistical model has only a single example from which to learn each one.

Some scholars consider hapax legomena useful in determining the authorship of written works. P. N. Harrison, in ‘The Problem of the Pastoral Epistles’ (1921), made hapax legomena popular among Bible scholars when he argued that there are considerably more of them in the three Pastoral Epistles than in the other Pauline Epistles. He contended that the number of hapax legomena in a putative author’s corpus reflects his or her vocabulary and is characteristic of the author as an individual.
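Comparisons of this kind depend on normalizing for length, because a longer work has more opportunities to contain rare words. The following Python sketch shows the general idea. It is not a reconstruction of Harrison’s actual procedure. It assumes the corpus is a dictionary that maps hypothetical document names to their text, finds the words that occur exactly once in the whole corpus, and reports how many of them each document contains per 1,000 tokens. The function name and the crude tokenizer are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: a "word" is a lowercased run of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def hapax_rates(corpus: dict[str, str], per: int = 1000) -> dict[str, float]:
    """For each document, the corpus-level hapax legomena it contains per `per` tokens."""
    tokenized = {name: tokenize(text) for name, text in corpus.items()}
    # Frequencies are counted across the whole corpus, not within each document.
    totals: Counter = Counter()
    for tokens in tokenized.values():
        totals.update(tokens)
    corpus_hapaxes = {word for word, count in totals.items() if count == 1}
    rates = {}
    for name, tokens in tokenized.items():
        n_hapax = sum(1 for token in tokens if token in corpus_hapaxes)
        # Normalize by document length so short and long works are comparable.
        rates[name] = per * n_hapax / len(tokens) if tokens else 0.0
    return rates
```

Counting hapaxes relative to the whole corpus, rather than within each document, means that a word used once in each of two documents is not counted for either.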

Harrison’s theory has faded in significance due to a number of problems raised by other scholars. For example, as early as 1896, W. P. Workman had counted the hapax legomena per page of Greek text in each Pauline Epistle. He found that the higher rates in the Pastoral Epistles were moderate compared with the variation among the other Epistles. This conclusion was reinforced when Workman examined several plays by Shakespeare, which showed similar variation.

There are also subjective questions over whether two forms, such as a singular and its plural, amount to ‘the same word.’ The ‘Jewish Encyclopedia’ points out that, although there are 1,500 hapaxes in the Hebrew Bible, only about 400 are not obviously related to other attested word forms. Classical Chinese and Japanese literature contains many Chinese characters that feature only once in the corpus, and their meanings and pronunciations have often been lost. In Japanese these are known as ‘kogo,’ literally ‘lonely characters.’
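Whether a plural and its singular count as one word directly changes a hapax count. The following Python sketch counts hapax legomena twice: once treating every surface form as distinct, and once after passing each word through `strip_plural`. This function is a hypothetical and deliberately crude stand-in for real lemmatization, which would also have to handle irregular forms.

```python
import re
from collections import Counter
from typing import Callable


def count_hapaxes(text: str, normalize: Callable[[str], str] = lambda w: w) -> int:
    """Number of word types occurring exactly once after applying `normalize`."""
    words = [normalize(w) for w in re.findall(r"[a-z]+", text.lower())]
    return sum(1 for count in Counter(words).values() if count == 1)


def strip_plural(word: str) -> str:
    # Hypothetical rule: drop one trailing "s" from words longer than three letters.
    # It misses irregular plurals such as "mice" and wrongly turns "glass" into "glas".
    return word[:-1] if len(word) > 3 and word.endswith("s") else word
```

For ‘the dog chased dogs while a cat watched,’ the count drops from 8 to 6 once ‘dogs’ is merged with ‘dog,’ because the merged form now occurs twice.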

Notable English literary hapaxes include ‘flother’ (a synonym for snowflake, from a thirteenth-century English manuscript entitled ‘The XI Pains of Hell’), ‘hebenon’ (a poison named only once, in Shakespeare’s ‘Hamlet’), ‘indexy’ (used in Bram Stoker’s ‘Dracula’ as an adjective to describe an odd situational state), ‘nortelrye’ (a word for ‘education’ that occurs only once in Chaucer’s work), and ‘sassigassity’ (a word for a type of audacity that occurs only once, in Dickens’s short story ‘A Christmas Tree’).

