The private neighborhoods of the Invisible Web
By Pita Enriquez Harris
It turns out that as far as Web pages go, there's no such thing as equality; on the contrary, there's a strictly observed apartheid, just as for whites and coloureds in your laundry.
There are the egalitarian pages of the so-called 'Visible Web' (aka 'surface Web') - the pages that are open to all comers, site visitors and search engines alike. Then there are the pages that live in private, deluxe neighbourhoods, behind security guards who keep out the riff-raff site-visiting robots of the global search engines - the so-called 'Invisible Web' (aka 'deep' or 'hidden' Web).
On the Web, however, it turns out that most of its byte-encoded inhabitants live in the metaphorical lap of luxury. Whereas most of the world's city-bound population lives in the open neighbourhoods with only the wealthy affording the closed neighbourhoods, according to a new study by BrightPlanet, on the Web the pages of the 'Invisible Web' may outnumber those of the Visible Web by around 500 to 1!
The implications of this are twofold. Firstly, when you search the Web, even with a combination of the largest of the well-known Web search engines, you are searching only a tiny fraction of the content accessible via the Web. Secondly, if you've paid money to have your company details advertised on the Web, make sure they don't end up in the Invisible Web, out of reach of all but those who Know Where To Look.
An example of a Visible Web site is InPharm.com. All of the content in InPharm is accessible to search engines, including the Flexipages directory. Not all Web site owners are so generous, however, locking up their information in databases which are not crawled by the search engine robot programs. Sometimes, for example with the owners of premium content databases such as Kompass or Dun and Bradstreet, this is because the publishers only want that information accessed directly on their own Web sites. In other cases it happens involuntarily, where site content is organised in a database, not always compatible with Web indexing technology.
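To make the distinction concrete, here is a minimal sketch in Python of why database-backed content slips past a link-following robot. Everything in it is an assumption made for illustration: the toy site, the page paths and the `crawl` function are hypothetical and do not describe InPharm.com or any real search engine.

```python
from collections import deque

# Hypothetical toy site: each static page maps to the links it contains.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a search form: no plain links to the records behind it
}

# Hypothetical records that are only returned in response to a typed query.
DATABASE = {
    "aspirin": "/record?id=1",
    "ibuprofen": "/record?id=2",
}


def crawl(start):
    """Follow ordinary links breadth-first, as a simple robot might."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


visible = crawl("/")
invisible = set(DATABASE.values()) - visible

print("Reached by the robot:", sorted(visible))
print("Never reached:", sorted(invisible))
```

The robot reaches the search form itself but none of the records behind it, because nothing links to them; they only appear once somebody types a query.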
Some key findings from the BrightPlanet study (http://www.completeplanet.com/Tutorials/DeepWeb/summary03.asp) are as follows:
- "The deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web"
- "The deep Web is the largest growing category of new information on the Internet"
- "Total quality content of the deep Web is at least 1,000 to 2,000 times greater than that of the surface Web"
- "Deep Web content is highly relevant to every information need, market and domain"
- "More than half of the deep Web content resides in topic specific databases"
- "A full 95% of the deep Web is publicly accessible information – not subject to fees or subscriptions." (from which we learn that it mostly isn't due to the likes of Dun and Bradstreet!)
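The arithmetic behind these figures is easy to check. The short Python sketch below uses only the numbers quoted above (550 billion deep Web documents, 1 billion surface Web documents, 95% publicly accessible); the variable names are my own, and the results are only as good as the study's estimates.

```python
# Document counts as quoted from the BrightPlanet study.
deep_docs = 550_000_000_000
surface_docs = 1_000_000_000

# How many deep Web documents there are for each surface Web document.
ratio = deep_docs // surface_docs

# The study says 95% of the deep Web is publicly accessible.
public_deep_docs = deep_docs * 95 // 100

print(f"Deep to surface ratio: {ratio} to 1")
print(f"Publicly accessible deep Web documents: {public_deep_docs:,}")
```

On the quoted counts the ratio works out at 550 to 1, which squares with the 'around 500 to 1' figure mentioned earlier.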
Major search engine companies differ in their approach to the Invisible Web - Altavista and Google apparently have no plans to index the Invisible Web, whilst Matthew Hall, VP of engineering for Inktomi (the engine behind HotBot, MSN and others) believes it to be the future of search.
Meanwhile companies like Intelliseek and BrightPlanet are putting a major patch on the problem by providing directories of the search engines that search the Invisible Web, and software that allows you to tap into hundreds of search engines, right from the desktop. Invisibleweb.com (www.invisibleweb.com) from Intelliseek lists over 10,000 searchable archives or databases, whilst CompletePlanet from BrightPlanet similarly lists databases within the 'deep Web'.
With software like BullsEye, Citeline Professional, Reference Manager and Lexibot you can search many of these databases directly from your own computer, downloading results locally. The great power of such software is not just in the additional sources that it makes available to a searcher, but also in the more sophisticated analysis and manipulation of results once they have been returned.
Such applications are in many ways the software equivalent of the Web-based 'vertical portals' ('vortals'), 'vertical search' ('verti-search'), 'vertical industry directories' ('vectories') - portals dedicated to assisting navigation in one particular industry. InPharm.com has its very own 'vectory' - the InPharm Knowledge Database, which lists and indexes pages from around 550 useful industry sites, hand-picked for the InPharm.com audience.
This year has seen the rise of the vortals and vectories, with companies like EoExchange and Business.com developing large collections of industry-specific sites and applying serious indexing power to them - the result being an industry-targeted search engine/directory with some of the muscle of the large Web search engines.
Moreover, the practice of putting a good quality directory of Web links on a site has grown in popularity. Since it turns out that search engines rank Web results partly based on algorithms that analyse linking patterns, it makes good Web marketing sense to equip your site with rich link content. In a recent survey of 100 b2b portal sites, researchers at The Oxford Knowledge Company found that 39% of such b2b sites did in fact have some sort of organised Web links directory. Since that survey VerticalNet, purveyors of 50-odd industry marketplaces, have added EoEnabled vertical search from EoExchange to all of their portals.
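How might an algorithm 'analyse linking patterns'? One published approach is the hubs-and-authorities idea (often called HITS), in which a page that links to many well-regarded pages earns a high 'hub' score, and a page linked to by good hubs earns a high 'authority' score. The Python sketch below is an illustration only: the link graph is invented, and I am not suggesting that any particular search engine ranks pages this way.

```python
import math


def normalise(scores):
    """Scale scores so that the sum of their squares is 1."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hubs_and_authorities(links, iterations=20):
    """Return (hubs, authorities) for a dict of page -> pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {page: 1.0 for page in pages}
    auths = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auths = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auths[target] += hubs[source]
        auths = normalise(auths)

        # A page's hub score is the sum of the authority scores of its targets.
        hubs = normalise(
            {page: sum(auths[t] for t in links.get(page, [])) for page in pages}
        )
    return hubs, auths


# Hypothetical link graph: a links directory, a small blog, three industry sites.
links = {
    "directory": ["site_a", "site_b", "site_c"],
    "blog": ["site_a"],
}

hubs, auths = hubs_and_authorities(links)
best_hub = max(hubs, key=hubs.get)
best_authority = max(auths, key=auths.get)
print("Best hub:", best_hub)
print("Best authority:", best_authority)
```

In this toy graph the directory page comes out as the strongest hub because it links to all three industry sites, which is the intuition behind equipping a site with a good links directory.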
One has the sneaking feeling, however, of being deceived. Question: Weren't search engines supposed to solve all our problems of information access? Answer: Yes, but in fact they never did, and if you thought they did then your only mistake was to believe the hype.
The answer, for now (because on matters of the Web, answers are only good 'for now' - for tomorrow, hey, ask someone else - we might be in a totally different business by then) is that as always, it's not just How You Search but Where and Knowing Where to Look.
Knowledge and education would seem to triumph yet again over the quick fix; like all seekers of the universal panacea, I too am thwarted, forced to acknowledge once again that there's no substitute for practice and doing your homework.