Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a web search engine’s unpaid results, often referred to as ‘natural,’ ‘organic,’ or ‘earned’ results. In general, the earlier (or higher ranked on the search results page) and the more frequently a site appears in the search results list, the more visitors it will receive from the search engine’s users, and these visitors can be converted into customers. SEO may target different kinds of searches, including searches for images, places, videos, scholarly articles, and news stories, as well as industry-specific vertical search engines (specialty or topical search engines, such as Yelp for local business reviews and Zillow for real estate listings).
As an Internet marketing strategy, SEO considers how search engines work, what people search for, the actual search terms or keywords typed into search engines, and which search engines are preferred by the targeted audience. Optimizing a website may involve editing its content, HTML, and associated coding both to increase its relevance to specific keywords and to remove barriers to the indexing activities of search engines. Promoting a site to increase the number of backlinks, or inbound links, is another SEO tactic.
Webmasters and content providers began optimizing sites for search engines in the mid-1990s, as the first search engines were cataloging the early Web. Initially, all webmasters needed to do was submit the address of a page, or URL, to the various engines, which would send a ‘spider’ to ‘crawl’ that page, extract links to other pages from it, and return information found on the page to be indexed. The process involves a search engine spider downloading a page and storing it on the search engine’s own server. A second program, known as an ‘indexer,’ then extracts information about the page, such as the words it contains, where they are located, and any weight for specific words, as well as all the links the page contains; these links are placed into a scheduler for crawling at a later date.
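As a rough illustration of that cycle, the Python sketch below (standard library only) downloads one page, records each word and its position in an in-memory index, and queues the extracted links for a later crawl. The seed URL, the index dictionary, and the deque used as a scheduler are simplified stand-ins for what a real engine keeps in persistent storage.

    # A minimal sketch of the crawl-and-index cycle described above.
    # The seed URL, the in-memory index, and the deque "scheduler" are
    # illustrative stand-ins for a real engine's persistent storage.
    import re
    import urllib.request
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
            self.text = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)

        def handle_data(self, data):
            self.text.append(data)

    index = {}                                   # word -> list of (url, position)
    scheduler = deque(["https://example.com/"])  # pages queued for a later crawl

    def crawl_one(url):
        # the 'spider' downloads the page
        html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
        parser = LinkExtractor()
        parser.feed(html)
        # the 'indexer' records each word and where it appears
        words = re.findall(r"\w+", " ".join(parser.text).lower())
        for position, word in enumerate(words):
            index.setdefault(word, []).append((url, position))
        # extracted links go back into the scheduler for a later crawl
        for link in parser.links:
            scheduler.append(urljoin(url, link))

    crawl_one(scheduler.popleft())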
Site owners started to recognize the value of having their sites highly ranked and visible in search engine results, creating an opportunity for both white hat and less scrupulous black hat SEO practitioners. According to industry analyst Danny Sullivan, the phrase ‘search engine optimization’ probably came into use in 1997. Sullivan credits digital marketer Bruce Clay as being one of the first people to popularize the term. In 2007, Jason Gambert tried and failed to trademark the term ‘SEO,’ claiming it was a ‘process’ involving manipulation of keywords, and not a ‘marketing service.’
Early versions of search algorithms relied on webmaster-provided information such as the keyword meta tag, or index files in engines like ALIWEB. Meta tags provide a guide to each page’s content. Using metadata to index pages was found to be less than reliable, however, because the webmaster’s choice of keywords in the meta tag could potentially be an inaccurate representation of the site’s actual content. Inaccurate, incomplete, and inconsistent data in meta tags caused pages to rank for irrelevant searches.
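To illustrate why this was fragile, the sketch below reads the keywords meta tag exactly as supplied by the webmaster; the HTML document is a made-up example, and nothing in it forces the declared keywords to match the page’s actual content.

    # Sketch: an early engine trusting the webmaster-supplied keywords meta tag.
    # The HTML document is a made-up example; its keywords are free to
    # misrepresent what the page is actually about.
    from html.parser import HTMLParser

    PAGE = """<html><head>
    <meta name="keywords" content="cheap flights, hotels, car rental">
    <title>Example page</title>
    </head><body>A page that may be about something else entirely.</body></html>"""

    class MetaKeywordReader(HTMLParser):
        def __init__(self):
            super().__init__()
            self.keywords = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name", "").lower() == "keywords":
                self.keywords = [k.strip() for k in attrs.get("content", "").split(",")]

    reader = MetaKeywordReader()
    reader.feed(PAGE)
    print(reader.keywords)   # ['cheap flights', 'hotels', 'car rental']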
By relying so heavily on factors such as keyword density, which were exclusively within a webmaster’s control, early search engines suffered from abuse and ranking manipulation (such as large strings of invisible text at the bottom of a webpage). To provide better results to their users, search engines had to adapt to ensure their results pages showed the most relevant search results, rather than unrelated pages stuffed with numerous keywords by webmasters. Since the success and popularity of a search engine are determined by its ability to produce the most relevant results for any given search, poor-quality or irrelevant search results could lead users to find other search sources. Search engines responded by developing more complex ranking algorithms, taking into account additional factors that were more difficult for webmasters to manipulate.
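Keyword density is simply the share of a page’s words taken up by a given term, which is why it was so easy to inflate. A minimal sketch of the calculation follows, treating the keyword as a single word and using a made-up example sentence.

    # Sketch: keyword density as the fraction of a page's words matching a term.
    # Because the page author controls both numerator and denominator, the metric
    # is trivially manipulable, e.g. by appending invisible repetitions of the term.
    import re

    def keyword_density(text, keyword):
        words = re.findall(r"\w+", text.lower())
        if not words:
            return 0.0
        return words.count(keyword.lower()) / len(words)

    print(keyword_density("Buy widgets. Our widgets are the best widgets.", "widgets"))  # 0.375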
Keyword manipulation was widespread by 1997. In 2005, an annual conference, AIRWeb (Adversarial Information Retrieval on the Web), was created to bring together practitioners and researchers concerned with search engine optimization and related topics. Adversarial information retrieval (adversarial IR) is a topic in information retrieval (a field of computer science that studies how data can be obtained from a collection of resources) concerned with strategies for working with a data source some portion of which has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving, and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.
On the Web, the predominant form of such manipulation is ‘search engine spamming’ (also known as ‘spamdexing’), which involves employing various techniques to disrupt the activity of web search engines, usually for financial gain. Examples of spamdexing include link bombing (also known as ‘Google bombing’ and ‘Googlewashing’), which links heavily to irrelevant results (famously, the top result for ‘miserable failure’ pointed to George W. Bush for a time); comment spam, in which spambots or spammers abuse web-based forms to post unsolicited advertisements as comments on forums, blogs, wikis, and online guestbooks; referrer spam, which makes repeated web site requests using a fake referrer URL pointing to the site the spammer wishes to advertise, so that sites publishing their access logs, including referrer statistics, inadvertently link back to the spammer’s site; spam blogs (‘splogs’); and malicious tagging. Another black hat method, known as ‘cloaking,’ serves a different page depending on whether it is being requested by a human visitor or a search engine.
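A crude way to probe for cloaking is to request the same URL with a browser-like User-Agent and a crawler-like one and compare the responses. The sketch below does only that; differing responses are merely a hint, since many pages legitimately vary between requests, and the example.com URL is a placeholder.

    # Heuristic sketch: fetch the same URL as a browser and as a crawler and
    # compare the responses. Differences do not prove cloaking on their own,
    # since many pages legitimately vary between requests.
    import urllib.request

    def fetch_as(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        return urllib.request.urlopen(req).read()

    def looks_cloaked(url):
        as_browser = fetch_as(url, "Mozilla/5.0")
        as_crawler = fetch_as(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
        return as_browser != as_crawler

    print(looks_cloaked("https://example.com/"))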
Activities intended to poison the supply of useful data make search engines less useful for users, and if search engines become more exclusionary they risk becoming more like directories and less dynamic. On the other side of the relationship, companies that employ overly aggressive techniques can get their client websites banned from the search results. In 2005, the ‘Wall Street Journal’ reported on a company, Traffic Power, which allegedly used high-risk techniques and failed to disclose those risks to its clients until after it faced a ban by Google. ‘Wired’ magazine reported that the same company sued blogger and SEO Aaron Wall for writing about the ban. Google’s Matt Cutts later confirmed that Google did in fact ban Traffic Power and some of its clients. In 2006, Google removed both BMW Germany and Ricoh Germany for use of deceptive practices. Both companies, however, quickly apologized, fixed the offending pages, and were restored to Google’s list.
Some search engines have also reached out to the SEO industry and are frequent sponsors and guests at SEO conferences, chats, and seminars. Major search engines provide information and guidelines to help with site optimization. Google has a Sitemaps program to help webmasters learn if Google is having any problems indexing their website and also provides data on Google traffic to the website. Bing Webmaster Tools provides a way for webmasters to submit a sitemap and web feeds, allows users to determine the crawl rate, and tracks the index status of web pages.
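A sitemap itself is just an XML file listing the URLs a webmaster wants considered for crawling. The sketch below writes a minimal one with the Python standard library; the example.com URLs are placeholders.

    # Sketch: writing a minimal XML sitemap for submission to a search engine's
    # webmaster tools. The URLs are placeholders.
    import xml.etree.ElementTree as ET

    urls = [
        "https://example.com/",
        "https://example.com/about",
        "https://example.com/products",
    ]

    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url

    # writes <urlset><url><loc>...</loc></url>...</urlset> to sitemap.xml
    ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)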
In 1998, graduate students at Stanford University, Larry Page and Sergey Brin, developed ‘Backrub,’ a search engine that relied on a mathematical algorithm to rate the prominence of web pages. The number calculated by the algorithm, ‘PageRank,’ is a function of the quantity and strength of inbound links. PageRank estimates the likelihood that a given page will be reached by a web user who randomly surfs the web and follows links from one page to another. In effect, this means that some links are stronger than others, as a higher-PageRank page is more likely to be reached by the random surfer.
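The random-surfer model can be approximated by simple iteration: with damping factor d and N pages, a page’s rank is the probability of arriving by a random jump, (1 - d)/N, plus d times the rank passed along by each page linking to it, divided by that page’s number of outgoing links. The sketch below runs this iteration on a made-up four-page link graph and ignores refinements such as dangling pages; it is a toy version of the calculation, not Google’s implementation.

    # Toy random-surfer calculation: iterate
    #   PR(p) = (1 - d)/N + d * sum(PR(q) / outdegree(q) for each q linking to p)
    # until the ranks settle. The four-page link graph is a made-up example.
    def pagerank(links, d=0.85, iterations=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {}
            for p in pages:
                incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
                new_rank[p] = (1 - d) / n + d * incoming
            rank = new_rank
        return rank

    links = {            # page -> pages it links to
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],
    }
    print(pagerank(links))   # C accumulates the highest rank in this toy graph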
Google considered off-page factors (such as PageRank and hyperlink analysis) as well as on-page factors (such as keyword frequency, meta tags, headings, links, and site structure), enabling it to avoid the kind of manipulation seen in search engines that only considered on-page factors for their rankings. Although PageRank was more difficult to game, webmasters had already developed link building tools and schemes to influence the Inktomi search engine, and these methods proved similarly applicable to PageRank. Many sites focused on exchanging, buying, and selling links, often on a massive scale. Some of these schemes, or ‘link farms,’ involved the creation of thousands of sites for the sole purpose of link spamming.
By 2004, search engines had incorporated a wide range of undisclosed factors in their ranking algorithms to reduce the impact of link manipulation. In June 2007, Saul Hansell of the ‘New York Times’ stated that Google ranks sites using more than 200 different signals. The leading search engines, Google, Bing, and Yahoo, do not disclose the algorithms they use to rank pages. In 2005, Google began personalizing search results for each user: depending on their history of previous searches, Google crafted results for logged-in users. In 2008, Bruce Clay said that ‘ranking is dead’ because of personalized search. He opined that it would become meaningless to discuss how a website ranked, because its rank would potentially be different for each user and each search.
In 2010, Google announced a new web indexing system called ‘Google Caffeine,’ designed to allow users to find news results, forum posts, and other content much sooner after publication than before. According to Google software engineer Carrie Grimes, ‘Caffeine provides 50 percent fresher results for web searches than our last index…’ ‘Google Instant,’ real-time search, was introduced in late 2010 in an attempt to make search results even more timely and relevant. Historically, site administrators have spent months or even years optimizing a website to increase search rankings. With the growth in popularity of social media sites and blogs, the leading engines made changes to their algorithms to allow fresh content to rank quickly within the search results.
In February 2011, Google announced the ‘Panda’ update, which penalizes websites containing content duplicated from other websites and sources. Historically, websites had copied content from one another and benefited in search engine rankings by engaging in this practice; however, Google implemented a new system that punishes sites whose content is not unique. The 2013 ‘Hummingbird’ update featured an algorithm change designed to improve Google’s natural language processing and semantic understanding of web pages.
Search engine crawlers may look at a number of different factors when crawling a site, and not every page is indexed by the search engines. The distance of pages from the root directory of a site may also be a factor in whether or not pages get crawled. To keep undesirable content out of the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine’s database by using a meta tag specific to robots. When a search engine visits a site, the robots.txt located in the root directory is the first file crawled. The robots.txt file is then parsed and instructs the robot as to which pages are not to be crawled. Because a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled. Pages typically prevented from being crawled include login-specific pages such as shopping carts and user-specific content such as search results from internal searches.
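The Python standard library can read a site’s robots.txt and answer whether a particular crawler may fetch a particular URL; the sketch below shows that check (the example.com URLs are placeholders), and the per-page exclusion mentioned above is the robots meta tag shown in the comment.

    # Sketch: asking a site's robots.txt whether a URL may be crawled.
    # Per-page exclusion uses a meta tag in the page itself instead, e.g.
    #   <meta name="robots" content="noindex, nofollow">
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")   # placeholder site
    rp.read()                                      # fetch and parse the file
    print(rp.can_fetch("Googlebot", "https://example.com/cart"))  # False if disallowed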
A variety of methods can increase the prominence of a webpage within the search results. Cross-linking between pages of the same website to provide more links to important pages may improve its visibility. Writing content that includes frequently searched keyword phrases, so as to be relevant to a wide variety of search queries, will tend to increase traffic. Updating content so as to keep search engines crawling back frequently can give additional weight to a site. Adding relevant keywords to a web page’s metadata, including the title tag and meta description, will tend to improve the relevancy of a site’s search listings, thus increasing traffic. URL normalization of web pages accessible via multiple URLs, using the canonical link element or 301 redirects, can help make sure links to different versions of the URL all count towards the page’s link popularity score.
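As a sketch of URL normalization, the toy server below answers requests for non-canonical paths with a 301 redirect to the canonical one. The hostname, port, and path mapping are made-up examples; pages that must remain reachable at several URLs can instead declare a canonical link element in their head, as noted in the comment.

    # Sketch: consolidating duplicate URLs. Requests for non-canonical paths get
    # a 301 (permanent) redirect to the canonical one; pages that must stay
    # reachable at several URLs can instead declare
    #   <link rel="canonical" href="https://example.com/products">
    # in their <head>. The path mapping below is a made-up example.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANONICAL = {
        "/index.html": "/",          # serve the homepage at a single URL
        "/Products/": "/products",   # collapse case and trailing-slash variants
    }

    class CanonicalRedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = CANONICAL.get(self.path)
            if target:
                self.send_response(301)              # permanent redirect
                self.send_header("Location", target)
                self.end_headers()
            else:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"canonical page")

    HTTPServer(("localhost", 8000), CanonicalRedirectHandler).serve_forever()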
SEO is not an appropriate strategy for every website, and other Internet marketing strategies, such as paid advertising through pay-per-click (PPC) campaigns, can be more effective, depending on the site operator’s goals. A successful Internet marketing campaign may also depend upon building high-quality web pages to engage and persuade, setting up analytics programs to enable site owners to measure results, and improving a site’s conversion rate.
SEO may generate an adequate return on investment. However, search engines are not paid for organic search traffic, their algorithms change, and there are no guarantees of continued referrals. Due to this lack of guarantees and certainty, a business that relies heavily on search engine traffic can suffer major losses if the search engines stop sending visitors. According to former Google CEO Eric Schmidt, Google made over 500 algorithm changes in 2010, almost 1.5 per day. It is considered wise business practice for website operators to liberate themselves from dependence on search engine traffic.
Optimization techniques are highly tuned to the dominant search engines in the target market. The search engines’ market shares vary from market to market, as does competition. There are only a few large markets where Google is not the leading search engine. In most cases, when Google is not leading in a given market, it is lagging behind a local player, such as Baidu in China, Yahoo in Japan, and Yandex in Russia.