Spamdexing

In computing, spamdexing (also known as search spam or search engine poisoning) is the deliberate manipulation of search engine indexes. It involves a number of methods, such as repeating unrelated phrases, to manipulate the relevance or prominence of indexed resources in a manner inconsistent with the purpose of the indexing system. The earliest known reference to the term is by Eric Convey in a 1996 article for The Boston Herald, ‘Porn sneaks way back on Web.’

Common spamdexing techniques can be classified into two broad classes: content spam (or term spam) and link spam. Content spam methods include keyword stuffing, hidden or invisible text, meta-tag stuffing, doorway pages, scraper sites, and article spinning. Link spamming methods include link farms, hidden links, Sybil attacks, spam blogs, page hijacking, buying lapsed domains (cybersquatting), and cookie stuffing.

Keyword stuffing involves the calculated placement of keywords within a page to raise the keyword count, variety, and density of the page. This makes a page appear relevant to a web crawler, so that it is more likely to be found. For example, a promoter of a Ponzi scheme wants to attract web surfers to a site where he advertises his scam. He places hidden text appropriate for a fan page of a popular music group on his page, hoping that the page will be listed as a fan site and receive many visits from music lovers. Older versions of indexing programs simply counted how often a keyword appeared, and used that count to determine relevance. Most modern search engines can analyze a page for keyword stuffing and determine whether the frequency is consistent with other sites created specifically to attract search engine traffic. Also, large webpages are truncated, so that massive dictionary lists cannot be indexed on a single webpage.
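
The keyword-counting idea can be illustrated with a short Python sketch. It is only a toy: the function names, the sample strings, and the 20% threshold are assumptions made for illustration, not a description of how any real search engine scores pages.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def looks_stuffed(text, threshold=0.2):
    """Flag text in which one word exceeds the (hypothetical) density threshold."""
    top = keyword_density(text, top_n=1)
    return bool(top) and top[0][1] > threshold


# Illustrative samples, not taken from any real page.
natural = "The band released a new album last year and toured across Europe."
stuffed = "cheap tickets cheap tickets buy cheap tickets cheap cheap tickets now"

print(looks_stuffed(natural))  # no single word dominates
print(looks_stuffed(stuffed))  # "cheap" makes up 5 of 11 words
```

A realistic check would at least ignore common stop words and compare against typical densities for similar pages, as the paragraph above suggests.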

Unrelated hidden text is disguised by making it the same color as the background, using a tiny font size, or hiding it within HTML code. People screening websites for a search-engine company might temporarily or permanently block an entire website for having invisible text on some of its pages. However, hidden text is not always spamdexing: it can also be used to enhance accessibility. Meta-tag stuffing involves repeating keywords in the meta tags, and using meta keywords that are unrelated to the site’s content. This tactic has been ineffective since 2005. Gateway or doorway pages are low-quality web pages that have very little content and are instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results, but serve no purpose to visitors looking for information. A doorway page will generally have ‘click here to enter’ on the page.
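
As a rough illustration of the hiding tricks listed above, the following Python sketch scans inline `style` attributes for text whose color matches its background, whose font size is tiny, or which is not displayed at all. The class and function names, the 1px cutoff, and the assumed white page background are all hypothetical. The sketch ignores external stylesheets and does not normalize color notations, and because hidden text can be legitimate (for accessibility), a hit is only a candidate for review.

```python
from html.parser import HTMLParser


def parse_style(style):
    """Turn an inline style string into a {property: value} dict."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def is_suspicious(props, page_background="#ffffff"):
    """Apply three simple heuristics to one element's inline style."""
    color = props.get("color")
    background = props.get("background-color", page_background)
    if color is not None and color == background:
        return True  # text color equals background color
    size = props.get("font-size", "")
    if size.endswith("px"):
        try:
            if float(size[:-2]) <= 1:
                return True  # tiny font
        except ValueError:
            pass
    return props.get("display") == "none" or props.get("visibility") == "hidden"


class HiddenTextFinder(HTMLParser):
    """Collect text that sits inside an element flagged by is_suspicious."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.stack = []        # one flag per open element: hidden, or inside hidden?
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        style = dict(attrs).get("style") or ""
        inherited = self.stack[-1] if self.stack else False
        self.stack.append(inherited or is_suspicious(parse_style(style)))

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


# Illustrative page fragment, not taken from a real site.
sample = (
    '<p>Welcome</p>'
    '<div style="color:#fff; background-color:#fff">band fan page</div>'
    '<span style="font-size: 1px">lyrics</span>'
)
finder = HiddenTextFinder()
finder.feed(sample)
print(finder.hidden_text)
```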

Scraper sites are created using various programs designed to ‘scrape’ search-engine results pages or other sources of content and create ‘content’ for a website. The specific presentation of content on these sites is unique, but the content itself is merely an amalgamation of material taken from other sources, often without permission. Such websites are generally full of advertising (such as pay-per-click ads), or they redirect the user to other sites. It is even feasible for scraper sites to outrank original websites for their own information and organization names. Article spinning involves rewriting existing articles, as opposed to merely scraping content from other sites, to avoid penalties imposed by search engines for duplicate content. This process is undertaken by hired writers or automated using a thesaurus database or a neural network.
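
To see why spinning can defeat a naive duplicate check, consider the sketch below. It compares two texts by the overlap of their three-word sequences ("shingles") using Jaccard similarity. This is a common textbook technique chosen here as an assumption; the text above does not say which duplicate-detection method any search engine actually uses, and the sample sentences are invented.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences occurring in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    set_a, set_b = shingles(a, k), shingles(b, k)
    union = set_a | set_b
    if not union:
        return 0.0  # both texts are shorter than k words
    return len(set_a & set_b) / len(union)


# Illustrative sentences: a verbatim copy and a synonym-swapped "spun" copy.
original = "search engines penalize pages that copy content from other sites"
scraped = "search engines penalize pages that copy content from other sites"
spun = "search engines punish pages that duplicate material from other websites"

print(jaccard(original, scraped))  # verbatim copy: identical shingle sets
print(jaccard(original, spun))     # a few swapped words break every shingle
```

Swapping one word in every three is enough to remove all shared shingles here, which is why a verbatim scrape is easy to spot with this measure while a spun article is not.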

Link spam is defined as links between pages that are present for reasons other than merit. Link spam takes advantage of link-based ranking algorithms, which give a website a higher ranking the more other highly ranked websites link to it. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm. Link farms are tightly knit communities of pages referencing each other, also known humorously as mutual admiration societies.
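
The effect of a link farm on a link-based ranking can be sketched with a small PageRank-style power iteration. PageRank is used here as a familiar example of such an algorithm; the graph, the page names, and the parameters (damping factor 0.85, 50 iterations) are assumptions for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a graph given as {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# A tiny hypothetical web: a, b and c link in a ring; nobody links to "target".
honest_web = {"a": ["b"], "b": ["c"], "c": ["a"], "target": ["a"]}

# The same web after a spammer adds five farm pages that link to "target",
# with "target" linking back to them (a "mutual admiration society").
farm = ["farm1", "farm2", "farm3", "farm4", "farm5"]
spammed_web = dict(honest_web)
spammed_web["target"] = list(farm)
for farm_page in farm:
    spammed_web[farm_page] = ["target"]

before = pagerank(honest_web)
after = pagerank(spammed_web)
print(before["target"] < before["a"])  # target ranks below an honest page
print(after["target"] > after["a"])    # the farm lifts target above it
```

In this toy graph the farm pages have no merit of their own, yet their mutual links are enough to move the target above every honest page.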

A Sybil attack is the forging of multiple identities for malicious intent, named after the famous multiple personality disorder patient ‘Sybil’ (Shirley Ardell Mason). A spammer may create multiple web sites at different domain names that all link to each other, such as fake blogs (known as spam blogs). Spam blogs are blogs created solely for commercial promotion and the passage of link authority to target sites. Often these “splogs” are designed to give the impression of a legitimate website, but on close inspection their content turns out to be produced with spinning software or to be very poorly written and barely readable. They are similar in nature to link farms.

Page hijacking is achieved by creating a rogue copy of a popular website which shows contents similar to the original to a web crawler but redirects web surfers to unrelated or malicious websites. Some link spammers monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their pages. However, Google resets the link data on expired domains. Some of these techniques may be applied for creating a Google bomb, that is, cooperating with other users to boost the ranking of a particular page for a particular query.

Cookie stuffing involves placing an affiliate tracking cookie on a website visitor’s computer without their knowledge, which will then generate revenue for the person doing the cookie stuffing. This not only generates fraudulent affiliate sales, but also has the potential to overwrite other affiliates’ cookies, essentially stealing their legitimately earned commissions. Web sites that can be edited by users can be used by spamdexers to insert links to spam sites if the appropriate anti-spam measures are not taken. Automated spambots can rapidly make the user-editable portion of a site unusable. Programmers have developed a variety of automated spam prevention techniques to block or at least slow down spambots.
