"In Search of the Rosetta Stone" By Pita Enriquez Harris
From: Online and CD-ROM Review August 1999
The Library of Babel
Whilst assisting on the Free Pint stand during Online Information 98, another Free Pint writer, Pablo Dubois (of the International Coffee Organisation), and I agreed that a short story by Jorge Luis Borges ought to be required reading for all information professionals. One of his most celebrated stories, "The Library of Babel", offers a uniquely Borgesian perspective on information and the library, and was the inspiration for the Aedificium, scene of the dramatic conclusion of Umberto Eco’s "The Name of the Rose".
Its relevance for today’s information specialist lies in its eerie, almost prescient vision of the World Wide Web. The Library of Babel is a potentially infinite library, in which all knowledge is believed to reside, yet disorganised, labyrinthine, web-like in its interconnectivity and ultimately, seductively distracting: a place where people become (literally) lost for ever.
"The Library exists ab eterno. This truth, whose immediate corollary is the future eternity of the world, cannot be placed in doubt by any reasonable mind."
Borges conceived of the Library of Babel as a physical, yet infinite structure. Writing the story as part of a volume of fictions ("Ficciones", 1956), Borges, a blind librarian, had yet to encounter the world of screen-accessible electronic information. It may not have occurred to him that his library would soon exist in reality – in the ether of a world of interconnected computers.
It is a fable, a story imbued with magical qualities rather than a literally translated idea of a library which holds all human knowledge. For anyone reading it now, with experience as a searcher of the World Wide Web, there will be inevitable, wry appreciations of the problems shared by the searchers (called inquisitors) of the Library of Babel, and also of the abiding myth – the index of indexes, the catalogue of catalogues.
"We also know of another superstition of that time: that of the Man of the Book. On some shelf in some hexagon (men reasoned) there must exist a book which is the formula and perfect compendium of all the rest: some librarian has gone through it and he is analogous to a god. In the language of this zone vestiges of this remote functionary’s cult still persist. Many wandered in search of Him. For centuries they exhausted in vain the most varied areas. How could one locate the venerated and secret hexagon which housed Him?"
A New Age of Babel
We now live in a new Age of Babel. We have tried to build the perfect information structure, like the Library of Babel, "unlimited and cyclical". We planned to use the perfect tools to navigate this library, tools with the power to interpret the secrets of every fragment of written human knowledge.
And we have suffered the same fate as the builders of the Tower of Babel. Our one language has splintered into thousands. Now, using a single language will help, provided that the information you seek resides in the appropriate neighbourhood. But universality, if it ever really existed, has disappeared.
In the infancy of the WWW, one or maybe two languages sufficed to interrogate the sprawling database; Altavista’s crawling robot provided the pathway to documents deposited anywhere in the Web; Yahoo grouped information by a logical system of classification.
Librarians, experts for years in systems of classifying information, were not overly consulted, and an early opportunity to agree on universal rules of information classification was missed. Instead, evolutionary forces have imposed themselves as information struggles for supremacy. In truth, it is hard to envisage how it could have been otherwise. Written information, as a human construct, is likely to be subject to many of the organising principles of other human constructs, such as society. The day that information came to have currency (e.g. advertising) attached to it was the day that evolutionary forces came into play in the world of the Web.
Where once there were a handful of search engines, we now have thousands. The major crawling engines (Altavista, Northern Light, Hotbot, Excite, Infoseek, Lycos) no longer offer, even between all six, the chance to search the whole Web. Indeed, it was recently reported in Nature that none of these engines searches more than 16% of the Web, with most indexing less than 10%. Perhaps more worrying is the speed at which this has happened. One year previously, the largest search engines could boast coverage of over a third of the Web – an accomplishment that has been halved in just twelve months. People who hoped to use the Web as a serious and reliable source of information must now ask the question: of how much use will it be to search the Web, in the uncomfortable realisation that we search only in one tiny corner of it?
The large search engine companies remain openly unabashed. It isn’t about quantity, they tell us, it is about quality. 200 million pages is a sufficient resource so long as we are indexing quality information.
They overlook the symbolic importance of the mere existence of a Catalogue of Catalogues. To know that all human knowledge could eventually be tamed – that was the initial appeal of Altavista and its ilk. To know that their ambitions stop at merely indexing a random percentage of this knowledge is worse than a declaration of failure – it is a declaration of planned mediocrity.
Never mind that the task probably always was impossible. That information is likely subject to rules more commonly sought in the complexities of social science and evolutionary biology. Borges isn’t the only writer to imagine a way of holding all our knowledge in one place. It is an appealing idea, because if it could be done, anyone could theoretically have equal access to education, and the human race might finally hope to escape eventual total oblivion. If our knowledge could be safely held somewhere, whatever the fate of our planet-bound bodies, our memories and knowledge would survive as our legacy to the universe.
The Glorious Tyranny of the Search Engines
Search engines are simultaneously the saviours and tyrants of the Web. Without them we would either drown in the unstructured maze of information, or else remain rather provincial, staying in our own neighbourhood of interests. We would be almost totally reliant on our virtual communities to provide a guide to Internet resources, knowing only the same group of sites, only very slowly branching out to find new information.
Search engines allowed us to discover, suddenly, sites about anything for which we were able to construct a query.
But in a Web becoming populated with increasing millions of documents, search engines are doomed to fall behind the task – it has been estimated that even now less than a third of the documents are indexed by any one search engine.
Faced with such an overwhelming number of references, all that is possible is what the semiotician and author Umberto Eco has called "the art of decimation" – killing one person in ten or, more accurately here, killing nine hundred and ninety-nine thousand, nine hundred and ninety (999,990) search results in one million.
On our behalf, search engines conduct this decimation: this is the source of their tyranny.
Searching for the Rosetta Stone
Egyptologists, like contemporary scholars of Mayan hieroglyphics, struggled to make sense of an impenetrable code. Then, by immense good fortune, they happened upon the discovery of a tablet – the Rosetta Stone – in which the ancient Egyptian glyphs appeared alongside a Greek translation of the text. Ancient Greek was a language with which any educated person was familiar and so this opened a whole world of language to modern archaeology.
In the new Age of Babel, the most sought-after tool is the Universal Interpreter – a Rosetta Stone for the search engines. Such an application might live on every information seeker’s desktop and be capable of interrogating not just all search engines, but a uniquely configurable combination of any search engines. Moreover, all newly discovered search engines would be easily cracked, submitting to its universal rules of interpretation.
Such a tool would truly put all the Web’s information riches at the disposal of its user – and would never need to expand to become an enormous, memory crunching application. Instead, it would remain light, simply carrying the information which would crack open any other Web search engine.
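To make the idea concrete, here is a minimal sketch in Python of how such a lightweight desktop tool might combine a configurable set of engines. Everything in it is an assumption made for illustration: the engines are in-memory stand-ins rather than real Web services, the names `metasearch`, `engine_a` and `engine_b` are hypothetical, and the reciprocal-rank scoring is just one simple way of merging result lists, not a description of how any existing product works.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical engine interface: takes a query, returns a ranked list of URLs.
Engine = Callable[[str], List[str]]


def metasearch(query: str, engines: Dict[str, Engine]) -> List[str]:
    """Send the query to every configured engine and merge the results.

    Each engine contributes 1/rank to a URL's score, so a page that
    several engines rank highly rises to the top of the merged list.
    Ties are broken alphabetically to keep the output deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for engine in engines.values():
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))


# Two stand-in engines returning fixed results (illustrative data only).
def engine_a(query: str) -> List[str]:
    return ["http://example.org/borges", "http://example.org/eco"]


def engine_b(query: str) -> List[str]:
    return ["http://example.org/eco", "http://example.org/rosetta"]


if __name__ == "__main__":
    # The user's own combination of engines; adding a newly discovered
    # engine means adding one more entry to this dictionary.
    my_engines = {"a": engine_a, "b": engine_b}
    for url in metasearch("library of babel", my_engines):
        print(url)
```

The tool itself stays small because it holds only a short description of each engine; the indexes remain with the engines.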
Holding the information on the desktop would mean that such information could then be subject to further analysis – without a significant loss of access speed. There is no doubt that our current methods of automatic information analysis are crude compared to what will be possible in the future. We don’t even need to postulate great advances in artificial intelligence – the capability exists right now to mimic intelligence by using human-configured definitions in concept-mapping. Thus, all documents about something as specialised as gene regulation could be automatically classified, simply by having access to a dictionary, compiled by experts, in which such esoteric terms as "promoter-binding", "CAAT box", "TFB", would automatically imply gene regulation. This is the principle behind Verity Topics®, part of the Verity Information Server, which allows customisable rules of evidence to link documents to concepts.
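The dictionary approach can be sketched in a few lines of Python. This is only an illustration of the general principle and not Verity's product or its rule syntax: the names `CONCEPT_TERMS` and `classify`, and the `min_hits` threshold, are assumptions introduced here, and a real expert-compiled dictionary would be far larger and would probably weight its terms.

```python
import re
from typing import Dict, List, Set

# Hypothetical expert-compiled dictionary: concept -> terms that imply it.
# The three terms are the ones quoted in the text above.
CONCEPT_TERMS: Dict[str, List[str]] = {
    "gene regulation": ["promoter-binding", "CAAT box", "TFB"],
}


def classify(text: str,
             concept_terms: Dict[str, List[str]] = CONCEPT_TERMS,
             min_hits: int = 1) -> Set[str]:
    """Return every concept for which at least `min_hits` of its terms
    occur in the text as whole words or phrases (case-insensitive)."""
    concepts: Set[str] = set()
    for concept, terms in concept_terms.items():
        hits = 0
        for term in terms:
            # Lookarounds stop a term matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                hits += 1
        if hits >= min_hits:
            concepts.add(concept)
    return concepts


if __name__ == "__main__":
    doc = "The CAAT box lies upstream of the promoter-binding site."
    print(classify(doc))
```

Raising `min_hits` is a crude stand-in for stricter rules of evidence: a document must then mention several of the expert's terms before it is filed under the concept.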
The closest we have so far to such a Rosetta Stone are the desktop metasearch tools: Copernic, BullsEye, Mata Hari. The best of these is BullsEye Pro, which resembles an early version of the ideal application as described above. Of all the metasearch tools, it has the existing capability to search across the largest collection (over 450) of Web-based search engines. Moreover it has the capability to add more search engines to that arsenal. Currently presided over by high priest-like figures at the headquarters of the company which developed BullsEye, Intelliseek, it is this feature which can turn BullsEye into the Rosetta Stone of the Web search engines. Petitions to add additional, specified search engines can be made via the Intelliseek Web site, and eventually it will be possible for users to control this feature themselves.
In the ideal Rosetta Stone, not only can any search engine be added to the collection, but the syntax of every engine can be equally efficiently understood. For example, complex Boolean syntax suffers badly in translation when applied to the large Web search engines, which are statistical in their assessment of relevance. BullsEye, to a greater extent than many of its competitors, already has the capability to translate a query so that a specific search engine understands it.
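As a rough illustration of what translating a query might involve, the Python sketch below keeps one neutral form of the query and renders it in two different syntaxes. The two styles ("boolean" and "plus_minus"), the `Query` class and the `translate` function are hypothetical and chosen for the example; they are not taken from BullsEye or from the documented syntax of any particular search engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Query:
    """Engine-neutral query: terms that must and must not appear."""
    required: List[str]
    excluded: List[str] = field(default_factory=list)


def _quote(term: str) -> str:
    # Multi-word phrases are wrapped in double quotes.
    return f'"{term}"' if " " in term else term


def to_boolean(q: Query) -> str:
    """Render as explicit Boolean operators: a AND b AND NOT c."""
    parts = [_quote(t) for t in q.required]
    parts += [f"NOT {_quote(t)}" for t in q.excluded]
    return " AND ".join(parts)


def to_plus_minus(q: Query) -> str:
    """Render as prefix operators: +a +b -c."""
    parts = [f"+{_quote(t)}" for t in q.required]
    parts += [f"-{_quote(t)}" for t in q.excluded]
    return " ".join(parts)


# One translator per syntax style; supporting a new engine means
# registering the style it understands here.
TRANSLATORS: Dict[str, Callable[[Query], str]] = {
    "boolean": to_boolean,
    "plus_minus": to_plus_minus,
}


def translate(q: Query, style: str) -> str:
    return TRANSLATORS[style](q)


if __name__ == "__main__":
    q = Query(required=["gene regulation", "promoter"], excluded=["yeast"])
    for style in TRANSLATORS:
        print(style, "->", translate(q, style))
```

Keeping the query in a neutral form means the searcher states the requirement once, and only the final rendering step differs from engine to engine.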
Lawrence, one of the co-authors of the Nature paper, estimates that all existing written information in the world might be available on the Web and indexed within 20 years. Even if this were achieved, how would we cope with 2,000 search results that actually were relevant? More useful will be the specialist search engines. A molecular biologist, for example, might find that just one biology search engine suffices for now. And when there’s too much written information about biology, this search engine would splinter into perhaps 20 more specialised engines, which are quick, relatively small and understand how to service the information requirements of a specialist.
The superstore has not yet abolished the greengrocer – and probably never will. Both have their place in the economy.
Yet our only hope of being able to access rapidly any combination of specialist search engines, and then to deal with all the results according to one set of analytical and reporting rules, lies in a Rosetta Stone. Intelliseek have stolen a march on the competition here, with BullsEye Pro the leader of the pack – watch out for future developments there.
References
Borges, Jorge Luis (1956) "The Library of Babel" in "Labyrinths" English translation by James E. Irby, published by Picador ISBN 0140029818
Coppock, Patrick (1995) in "'A Conversation on Information', an interview with Umberto Eco by Patrick Coppock, February 1995" for Multimedia World http://www.cudenver.edu/~mryder/itc_data/eco/eco.html
Lawrence, S. and Giles, C.L. (1999) "Accessibility of information on the web" Nature vol 400 p107-109
BullsEye information and orders at Intelliseek www.intelliseek.com
and www.oxford-knowledge.co.uk/bullseye.htm
Free Pint, published by Willco, is a free email newsletter (20,000 subscribers) which aims to help people locate information on the Internet. www.freepint.co.uk