Locating information on the Internet seems to be so easy. Every publication on the network has its proper address, and all we need to do in order to retrieve it is to point our Web browser to that address.
Librarians know from their daily work that the reality is different. In paper-based publications, we have come across many strange or wrong citations over the years. If this happens in the relatively standardized and agreed-upon print environment, the situation is unlikely to improve for electronic publications. The current naming system for networked documents relies on a chain of words, letters, or digits, separated by commas, dots, slashes, hyphens, or other characters. These so-called Uniform Resource Locators (URLs) are extremely error-prone: a single lower-case letter in place of a capital one can result in an error message.
Librarians and publishers are experimenting with better naming
systems. Uniform Resource Locators are already evolving into Uniform
Resource Names (URNs), which assign a publication a unique name that
does not change even if the document is moved to another computer. A
name resolver keeps track of the actual location of the requested file
and translates the name into the correct URL. The American Astronomical
Society (AAS) uses this approach in its electronic journals (Warnock and
Fullton, 1996). OCLC (Online Computer Library Center, Inc.) has been
testing Persistent Uniform Resource Locators (PURLs), which also point
to an intermediate resolution service instead of directly to the
location of an Internet resource.
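
The resolution idea described above can be illustrated with a minimal sketch in Python. It assumes a hypothetical in-memory resolver: the class name `NameResolver`, the sample name `urn:example:journal:article-42`, and the example.org addresses are illustrative only and are not taken from any actual URN or PURL service.

```python
class NameResolver:
    """Toy resolver that maps persistent names to current locations.

    Hypothetical illustration only; real URN and PURL services are
    networked systems, not a dictionary in memory.
    """

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Record (or update) the current URL for a persistent name."""
        self._locations[name] = url

    def resolve(self, name):
        """Translate a persistent name into its current URL."""
        try:
            return self._locations[name]
        except KeyError:
            raise LookupError(f"unknown name: {name}") from None


resolver = NameResolver()
NAME = "urn:example:journal:article-42"  # made-up persistent name

# The publisher registers the document's first location.
resolver.register(NAME, "http://old-server.example.org/articles/42.html")
first_location = resolver.resolve(NAME)

# The document moves to another computer: only the resolver entry changes,
# while the name that readers cite stays the same.
resolver.register(NAME, "http://new-server.example.org/archive/42.html")
second_location = resolver.resolve(NAME)

print(first_location)
print(second_location)
```

The point of the design is that citations carry only the name; the mapping to a location is maintained in one place and can be updated without touching any citing document.
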
These mechanisms are not yet commonly used, and librarians have to be imaginative in order to locate requested documents when users present them with strange network addresses and odd names of electronic publications. Our profession demands that we always try more than one approach in order to locate information. If the most obvious way does not lead to satisfactory results, we think of alternatives. This attitude is being extended to Internet resources, where several search strategies are usually possible.
For many years, librarians have been the experts within their institutes
with regard to searching commercial databases. They often even held a
monopoly, because they were the only ones who knew which retrieval
language had to be used for which database and how required
information could be obtained most cost-effectively.
It is obvious that searching the seemingly chaotic Internet is
completely different from searching well-organized bibliographic or
full-text databases and therefore requires new methods.
The most commonly used resource discovery tools are search engines
that search the Internet for words or phrases.
The drawback is that queries typically return a huge number of
documents, with extremely high noise and little precision. Projects have
been set up in order to improve search results. What is needed is
information about information, so-called metadata. OCLC and NCSA
(National Center for Supercomputing Applications) are two of the main
initiators who proposed the Dublin Core Metadata Element Set, which is
``intended to describe the essential features of electronic documents
that support resource discovery'' (Weibel et al., 1996). The 13 Dublin
Core elements include, for instance, information about the subject,
title, authors, and form, and can be generated by
authors of Web documents without extensive training. ``Indexing the
Internet'' sounds like an impossible mission, but the achievements
so far are very promising. If some remaining problems can be
solved, it is quite possible that the Dublin Core will be implemented in
one of the future versions of HTML. A mapping between MARC and the
Dublin Core is also in preparation, so that data interchange between
catalog records and SGML documents will be possible
(Dempsey and Weibel, 1996; Weibel, 1995).
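
To make the idea of author-generated metadata more concrete, the following Python sketch turns a small record into HTML `<meta>` lines. The element names (title, author, subject, form) follow the examples mentioned above; the `DC.` prefix, the function name `dublin_core_meta`, and all sample values are assumptions made for illustration and are not quoted from the Dublin Core proposal.

```python
from html import escape


def dublin_core_meta(record):
    """Render a dict of element -> value as HTML <meta> lines.

    The "DC." naming prefix is an illustrative assumption, not a
    statement of how the cited proposal embeds metadata in HTML.
    """
    lines = []
    for element, value in record.items():
        name = escape(f"DC.{element}", quote=True)
        content = escape(value, quote=True)  # protects quotes and ampersands
        lines.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(lines)


# Made-up sample record that an author might fill in by hand.
record = {
    "title": "Sample Observing Report",
    "author": "A. Librarian",
    "subject": "astronomy; library services",
    "form": "text/html",
}

meta_block = dublin_core_meta(record)
print(meta_block)
```

A search tool that reads such fields could match a query against the subject or author directly, rather than against every word in the document body.
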
This approach to information retrieval still involves human
mediation. In a recent article, Bruce Schatz (1997) explains how new
concepts may allow users to search through distributed repositories
across the net. He foresees that automatic indexing and ``vocabulary
switching'' (i.e., terminology being automatically ``translated'' into
the appropriate vocabulary of different subject domains) will
eventually allow us to ``effectively utilize the whole of scientific
information.'' Although such systems are still under development, one
of the first applications can already be seen in the Illinois Digital
Library Initiative, in which the AAS is also about to participate.
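
As a rough illustration of what ``vocabulary switching'' means at its simplest, the Python sketch below translates a search term from one domain's vocabulary into another's by means of a small hand-built concept table. This is only a toy under stated assumptions: the table `CONCEPTS`, the domain labels, and the function `switch_vocabulary` are hypothetical, and the systems Schatz describes are intended to derive such mappings automatically rather than from a fixed list.

```python
# Hypothetical concept table: each entry lists the preferred term for the
# same concept in two vocabularies (everyday language and medicine).
CONCEPTS = [
    {"general": "heart attack", "medicine": "myocardial infarction"},
    {"general": "tumor", "medicine": "neoplasm"},
]


def switch_vocabulary(term, source, target):
    """Return the target-domain term for a source-domain term.

    Falls back to the original term when no mapping is known.
    """
    wanted = term.lower()
    for concept in CONCEPTS:
        if concept.get(source, "").lower() == wanted:
            return concept.get(target, term)
    return term


translated = switch_vocabulary("Heart attack", "general", "medicine")
unchanged = switch_vocabulary("quasar", "general", "medicine")
print(translated)
print(unchanged)
```
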