Deep Web

The Deep Web (also called the Invisible Net, Undernet, or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, the portion of the Web that standard search engines can index. It should not be confused with the ‘dark Internet,’ meaning computers that can no longer be reached via the Internet, or with the distributed file-sharing network ‘Darknet,’ which could be classified as a smaller part of the Deep Web. Mike Bergman, founder of ‘BrightPlanet’ and credited with coining the phrase, has compared searching on the Internet today to dragging a net across the surface of the ocean: a great deal may be caught in the net, but a wealth of information lies deeper and is therefore missed. Most of the Web’s information is buried far down on dynamically generated sites, where standard search engines do not find it. Traditional search engines cannot ‘see’ or retrieve this content because those pages do not exist until they are created dynamically as the result of a specific search. The deep Web has been estimated to be several orders of magnitude larger than the surface Web.
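To make the dynamic-content point concrete, here is a minimal Python sketch, assuming a hypothetical toy site held in memory. The page paths, `STATIC_LINKS`, `RECORDS`, `render_results`, and `crawl` are all illustrative names, not any real search engine's code. A crawler that discovers pages only by following hyperlinks reaches the search form but never a results page, because a results page exists only after someone submits a query.

```python
from collections import deque

# Hypothetical toy site: each static page and the hyperlinks it contains.
STATIC_LINKS = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a search form: it links to no result pages
}

# Hypothetical back-end database that the search form queries.
RECORDS = {
    "deep": "Record about the deep Web",
    "crawler": "Record about crawlers",
}


def render_results(query: str) -> str:
    """Generate a results page on demand; it exists only once a query is made."""
    hits = [text for key, text in RECORDS.items() if query in key]
    return f"/search?q={query}: " + "; ".join(hits)


def crawl(start: str) -> set[str]:
    """Breadth-first crawl that discovers pages only by following hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


if __name__ == "__main__":
    print(sorted(crawl("/")))  # ['/', '/about', '/search']
    print(render_results("deep"))  # content the crawl above never reached
```

The crawl lists only `/`, `/about`, and `/search`; the record text returned by `render_results("deep")` never appears in it. Real crawlers fetch pages over HTTP and parse HTML for links, but the limitation is the same.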

Deep Web resources may be classified into one or more of the following categories: Dynamic content; Unlinked content; Private Web (password-protected resources); Contextual Web (pages whose content varies with the access context, e.g., IP address or previous navigation sequence); Limited-access content (sites that technically restrict access to their pages, e.g., with a CAPTCHA); Scripted content (JavaScript, Flash, and some Ajax solutions); Non-HTML/text content (textual content encoded in images and video); and Non-HTTP content (e.g., the FTP and Gopher protocols).

To discover content on the Web, search engines use web crawlers that follow hyperlinks. This technique is well suited to discovering resources on the surface Web but is often ineffective at finding deep Web resources. Researchers have been exploring how the deep Web can be crawled automatically, and commercial search engines have also begun exploring alternative methods. The Sitemap Protocol (first developed by Google) and mod_oai are mechanisms that allow search engines and other interested parties to discover deep Web resources on particular Web servers. The lines between search engine content and the deep Web have begun to blur as search services start to provide access to part or all of once-restricted content, and an increasing amount of deep Web content is opening up to free search as publishers and libraries make agreements with large search engines. In the future, deep Web content may be defined less by opportunity for search than by access fees or other types of authentication.
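The Sitemap Protocol works by letting a server list its URLs in an XML file, so a crawler can find pages that no hyperlink points to. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the function name `build_sitemap`, the example URL, and the date are hypothetical, and only the required `<loc>` element plus the optional `<lastmod>` element are shown (the protocol also defines optional `<changefreq>` and `<priority>` elements).

```python
import xml.etree.ElementTree as ET

# Namespace required by the Sitemap Protocol (sitemaps.org, version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Build a sitemap <urlset> document from (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & as &amp;
        ET.SubElement(url, "lastmod").text = lastmod  # W3C Datetime, e.g. YYYY-MM-DD
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical dynamically generated page that no hyperlink points to.
    pages = [("https://www.example.com/search?q=deep&page=2", "2014-01-01")]
    print(build_sitemap(pages))
```

The resulting file would typically be placed on the server and announced to search engines, for instance through a `Sitemap:` line in `robots.txt`. The serializer escapes the `&` in the query string, which the protocol requires of URLs inside the XML.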

One Comment to “Deep Web”

  1. The Deep Web is largely boring for most people because it’s a vast database store but if you know where to look and what to look for it is truly amazing. More amazing still are the hidden networks. I knew diddly-squat until someone posted a link to the ebook Deep Web Secrecy and Security which shows you in very simple terms how to use these hidden networks to fend off corporate intrusion and unwanted attention, not to mention how to see banned websites and communicate securely.
