
Indexing and Retrieving Electronic Publications

Locating information on the Internet seems to be so easy. Every publication on the network has its proper address, and all we need to do in order to retrieve it is to point our Web browser to that address.

From their daily work, librarians know that the reality is different. Over the years, we have come across many strange or wrong citations to paper-based publications. If this happens in the relatively standardized and agreed-upon print environment, the situation is unlikely to improve when it comes to electronic publications. The current naming system for networked documents relies on a chain of words, letters, or figures, subdivided by commas, dots, slashes, hyphens, or other characters. These so-called Uniform Resource Locators (URLs) are highly error-prone: a single lower-case letter in place of a capital one can be enough to produce an error message.
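The sensitivity to capitalization can be illustrated with a short Python sketch. It assumes the usual convention that the scheme and host name of a URL are case-insensitive while the path is not; the address used here is hypothetical, and the helper name `normalize` is our own.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Lower-case only the parts of a URL that are case-insensitive.

    Assumption: the URL has no user:password part in front of the host.
    """
    parts = urlsplit(url)
    # Scheme and host name may be written in any case; the path may not.
    return parts._replace(
        scheme=parts.scheme.lower(),
        netloc=parts.netloc.lower(),
    ).geturl()


# Hypothetical addresses that differ only in capitalization.
a = normalize("HTTP://WWW.Example.ORG/Library/Index.html")
b = normalize("http://www.example.org/library/index.html")

print(a)       # the path keeps its capital letters
print(a == b)  # False: on many servers these are two different documents
```

Whether the second address actually fails depends on the server, but a citation that silently changes the case of a path cannot be relied upon to work.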

Librarians and publishers are experimenting with better naming systems. Uniform Resource Locators are already developing into Uniform Resource Names (URNs), which allow a unique name to be assigned to a publication that will not change even if the document is moved to another computer. A name resolver keeps track of the actual location of the requested file and translates the name into the correct URL. The American Astronomical Society uses this approach in its electronic journals (Warnock and Fullton, 1996). OCLC (Online Computer Library Center, Inc.) has been testing Persistent Uniform Resource Locators (PURLs), which also point to an intermediate resolution service instead of directly to the location of an Internet resource.
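The principle of a name resolver can be sketched in a few lines of Python. This is only an illustration of the idea, not the mechanism used by the AAS or by OCLC; the class `NameResolver`, the name `urn:example:journal:1996:42`, and both host addresses are hypothetical.

```python
class NameResolver:
    """Toy resolver that maps persistent names to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        # Called when a document is first published and again whenever it moves.
        self._locations[name] = url

    def resolve(self, name):
        # Translate the persistent name into the URL currently on record.
        try:
            return self._locations[name]
        except KeyError:
            raise LookupError(f"unknown name: {name}") from None


resolver = NameResolver()
name = "urn:example:journal:1996:42"  # hypothetical persistent name

resolver.register(name, "http://host-a.example.org/papers/42.html")
first = resolver.resolve(name)

# The document moves to another computer; only the resolver entry changes.
resolver.register(name, "http://host-b.example.org/archive/42.html")
second = resolver.resolve(name)

print(first)
print(second)
```

A citation that quotes the name rather than the URL stays valid after the move, because only the resolver's record has to be updated.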

These mechanisms are not yet commonly used, and librarians have to be imaginative in order to locate requested documents when users present them with strange network addresses and odd names of electronic publications. Our profession demands that we always try more than one approach to locating information. If the most obvious way does not lead to satisfactory results, we think of alternatives. This attitude is being extended to Internet resources, where several search strategies are usually possible.

For many years, librarians have been the experts within their institutes with regard to searching commercial databases. They often even held a monopoly, because they were the only ones who knew which retrieval language had to be used for which database and how the required information could be obtained most cost-effectively. Searching the seemingly chaotic Internet is obviously completely different from searching well-organized bibliographic or full-text databases, and therefore requires new methods. The most commonly used resource discovery tools are search engines that allow searching the Internet for words or phrases. The drawback is that enquiries typically return a huge number of documents, with extremely high noise and little precision.

Projects have been set up in order to improve search results. What is needed is information about information, so-called metadata. OCLC and NCSA (National Center for Supercomputing Applications) are two of the main initiators who proposed the Dublin Core Metadata Element Set, which is "intended to describe the essential features of electronic documents that support resource discovery" (Weibel et al., 1996). The 13 Dublin Core elements include, for instance, information about the subject, title, authors, and form, and can be generated by authors of Web documents without extensive training. "Indexing the Internet" sounds like a mission impossible, but the achievements so far are very promising. If some remaining problems can be solved, it is quite possible that the Dublin Core will be implemented in one of the future versions of HTML. A mapping between MARC and the Dublin Core is also in preparation, so that data interchange between catalog records and SGML documents will be possible (Dempsey and Weibel, 1996; Weibel, 1995).
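How an author might generate such metadata for a Web document can be sketched as follows. The sketch uses only the four elements named above (title, subject, author, form). Writing them as HTML `meta` tags with a `DC.` prefix is an assumption made for illustration, not something the cited proposals prescribe, and the author name and subject terms in the record are hypothetical.

```python
from html import escape


def dublin_core_meta(record):
    """Render a dict of metadata elements as HTML meta tags.

    Assumption: elements are embedded as <meta name="DC.element" ...>.
    """
    lines = []
    for element, value in record.items():
        # Escape quotes and ampersands so the attribute value stays valid HTML.
        lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


# Hypothetical record for a Web document.
record = {
    "title": "Indexing and Retrieving Electronic Publications",
    "subject": "electronic publishing; information retrieval",
    "author": "A. Librarian",
    "form": "text/html",
}

html_out = dublin_core_meta(record)
print(html_out)
```

A search service that reads these fields can match an enquiry against the title or subject alone instead of every word in the text, which is where the hoped-for gain in precision comes from.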

This approach to information retrieval still involves human mediation. In a recent article, Bruce Schatz (1997) explains how new concepts may allow users to search through distributed repositories across the net. He foresees that automatic indexing and "vocabulary switching" (i.e., terminology being automatically "translated" into the appropriate vocabulary of different subject domains) will eventually allow us to "effectively utilize the whole of scientific information." Although such systems are still under development, one of the first applications can already be seen in the Illinois Digital Library Initiative, in which the AAS is also about to participate.


ESO Garching Librarian
Wed Feb 11 12:10:59 MET 1998