The deep Web (also called the Invisible Net, the Undernet, or the hidden Web) refers to World Wide Web content that is not part of the surface Web, the portion that standard search engines can index. It should not be confused with the ‘dark Internet,’ the computers that can no longer be reached via the Internet, or with the distributed filesharing network ‘Darknet,’ which could be classified as a smaller part of the deep Web.
Mike Bergman, founder of ‘BrightPlanet’ and credited with coining the phrase, has said that searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that lies deeper and is therefore missed. Most of the Web’s information is buried far down on dynamically generated sites, and standard search engines do not find it. Traditional search engines cannot ‘see’ or retrieve content in the deep Web, because those pages do not exist until they are created dynamically as the result of a specific search. The deep Web is several orders of magnitude larger than the surface Web.
To discover content on the Web, search engines use web crawlers that follow hyperlinks. This technique is ideal for discovering resources on the surface Web but is often ineffective at finding deep Web resources, because a page generated in response to a query may have no hyperlink pointing to it. Researchers have been exploring how the deep Web can be crawled automatically, and commercial search engines have also begun exploring alternative methods. The Sitemap Protocol (first developed by Google) and mod_oai are mechanisms that allow search engines and other interested parties to discover deep Web resources on particular Web servers.
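The difference between link-following discovery and sitemap-based discovery can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `example.org` URLs, the in-memory `PAGES` dictionary standing in for a Web server, and the helper names `crawl` and `sitemap_urls`. It is not how any particular search engine works; it only shows why a page that nothing links to stays invisible to a crawler until the server lists it in a sitemap.

```python
from collections import deque
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical in-memory site: URL -> HTML body. The search-result page is
# produced only in response to a query, so no other page links to it.
PAGES = {
    "http://example.org/": '<a href="http://example.org/about">About</a>',
    "http://example.org/about": '<a href="http://example.org/">Home</a>',
    "http://example.org/search?q=whales": "<p>Results for whales</p>",
}

# Hypothetical sitemap in the Sitemap Protocol's XML format, in which the
# server itself advertises URLs, including the unlinked one.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/</loc></url>
  <url><loc>http://example.org/search?q=whales</loc></url>
</urlset>"""


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Breadth-first crawl that discovers pages only by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", ns)}


crawled = crawl("http://example.org/", PAGES)
only_in_sitemap = sitemap_urls(SITEMAP) - crawled
print(sorted(crawled))
print(sorted(only_in_sitemap))
```

In this toy setup the crawl reaches only the home and about pages, while the set difference against the sitemap surfaces the query-generated page. A real crawler would also need to fetch pages over HTTP, resolve relative links, and respect robots.txt, all of which are left out here.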
The lines between search engine content and the deep Web have begun to blur, as search services start to provide access to part or all of once-restricted content. An increasing amount of deep Web content is opening up to free search as publishers and libraries make agreements with large search engines. In the future, deep Web content may be defined less by whether it can be searched than by access fees or authentication requirements.