Scunthorpe Problem

Dirty Words

The Scunthorpe problem is the unintentional blocking of websites, e-mails, forum posts, or search results by a spam filter or search engine because their text contains a string of letters that appears to have an obscene or otherwise unacceptable meaning. Names, abbreviations, and technical terms are most often cited as being affected by the issue.

The problem arises because computers can easily identify strings of text within a document, but deciding whether such a string is actually offensive requires interpreting a wide range of contexts, possibly across many cultures, which is an extremely difficult task. As a result, broad blocking rules tend to produce false positives that affect innocent phrases.
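The mechanism can be illustrated with a minimal sketch of a substring-based filter. The blocklist and the function name naive_is_blocked are hypothetical, chosen only for illustration; no particular vendor's filter is claimed to work exactly this way.

```python
# Hypothetical blocklist for illustration only; real filters use far larger lists.
BLOCKLIST = ("sex", "cialis")


def naive_is_blocked(text: str) -> bool:
    """Flag text if any blocked term occurs anywhere in it as a substring."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


# The first two samples are innocent; only the third looks like spam.
for sample in ("RomansInSussex.co.uk", "software specialist", "buy cialis now"):
    print(sample, "->", naive_is_blocked(sample))
```

All three samples are flagged, because the check looks only at the letters and has no notion of where a word begins or ends, let alone what it means in context.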

The problem was named after an incident in 1996 in which AOL’s profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England, from creating accounts with AOL, because the town’s name contains the substring ‘cunt.’ Years later, Google’s opt-in SafeSearch filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names. In the months leading up to the game in January 1996, some web searches for ‘Super Bowl XXX’ were being filtered, because the Roman numeral for the game and the site (XXX) is also used to identify pornography.

In 1998, Jeff Gold attempted to register the domain name shitakemushrooms.com, but he was blocked by an InterNIC filter prohibiting the ‘seven dirty words,’ which was active between 1996 and the transfer of control to ICANN in 1998. In 2000, a Canadian television news story on web filtering software found that the website for the Montreal Urban Community (Communauté urbaine de Montréal, in French) was entirely blocked because its domain name was its French acronym, CUM (www.cum.qc.ca); ‘cum’ (among other meanings) is English-language slang for semen.

In 2010, it was reported that terms such as ‘lolita’ (as part of ‘Nabokov lolita,’ a search term for Vladimir Nabokov’s novel ‘Lolita’), ‘lolicon,’ ‘incest,’ and ‘whorehouse’ (as part of the title of the musical ‘The Best Little Whorehouse in Texas’) were hashed out (censored) on Google’s voice search feature. By contrast, ‘bestiality’ and ‘Lady Chatterley’s Lover’ were reportedly left uncensored.

Gareth Roelofse, the web designer for RomansInSussex.co.uk, noted in 2004, ‘We found many library Net stations, school networks and Internet cafes block sites with the word “sex” in the domain name. This was a challenge for RomansInSussex.co.uk because its target audience is school children.’ Also in 2004, it was reported that the Horniman Museum in London was failing to receive some of its e-mail because filters mistakenly treated its name as a version of the words ‘horny man.’ That year, Craig Cockburn reported that he was unable to use his surname (pronounced ‘Coburn’) with Hotmail. He was told by Hotmail to spell his name C0ckburn (with a zero instead of the letter ‘o’); Hotmail later reversed the ban. Separately, he had problems with his workplace e-mail because his job title, software specialist, contains the substring ‘cialis,’ the name of a pharmaceutical that often appeared in the subject lines of spam and scam e-mails. In 2010, he had a similar problem registering on the BBC site, where the first four characters of his surname again triggered the content filter.

In 2006, Linda Callahan, a resident of Ashfield, Massachusetts, was initially prevented from registering her name with Yahoo! as an e-mail address because it contained the substring ‘Allah.’ Yahoo! later reversed the ban. In 2008, Dr. Herman I. Libshitz could not register an e-mail address containing his name with Verizon because his surname contained the substring ‘shit,’ and Verizon initially rejected his request for an exception. In a subsequent statement, a Verizon spokeswoman apologized for not approving his desired e-mail address.

In 2008, a news site run by the American Family Association filtered an Associated Press article on sprinter Tyson Gay, replacing instances of ‘gay’ with ‘homosexual,’ thus rendering his name as ‘Tyson Homosexual.’

In 2018, Natalie Weiner reported on social media that she was unable to create an account for herself on a website because her last name is also slang for penis. It was reported that ‘hundreds’ of people replied saying that they were affected as well. Those replying included Ben Schmuck (whose last name is a Yiddish word for ‘penis’) and Arun Dikshit (whose last name is Sanskrit for one who teaches or provides knowledge, and contains the substring ‘shit’).

Problems can occur with the words ‘socialism,’ ‘socialist,’ and ‘specialist’ because they contain the substring ‘cialis’; Cialis is the brand name of an erectile dysfunction medication commonly advertised in spam e-mails. Blocking the word ‘specialist’ is liable to block e-mailed résumés and curricula vitae, as well as other material that includes job descriptions.
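One common way to avoid this particular class of false positive is to match blocked terms only as whole words. The sketch below assumes a hypothetical one-entry term list and a hypothetical function name, whole_word_match, and uses the \b word-boundary anchor from Python's standard re module. It is an illustration, not a description of how any real spam filter is implemented.

```python
import re

# Hypothetical spam-term list for illustration only.
SPAM_TERMS = ("cialis",)

# \b matches only at a word boundary, so a term embedded inside a longer
# word (as in "specialist") does not match.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in SPAM_TERMS) + r")\b",
    re.IGNORECASE,
)


def whole_word_match(text: str) -> bool:
    """Return True only if a listed term appears as a standalone word."""
    return WORD_PATTERN.search(text) is not None


for sample in ("software specialist", "socialism", "Cheap Cialis!"):
    print(sample, "->", whole_word_match(sample))
```

Here only the third sample is flagged. Whole-word matching does not help when the innocent text is itself a complete word, such as a surname or an acronym, so it reduces the problem rather than solving it.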
