[A lecture delivered to a meeting of librarians and information scientists at the University of Bournemouth in 1987.]

Windows on Knowledge

[A conservative view of the traditional library and the transformations of electronic information. What should the library of the future be like? What will the library of the future be like?]

Notwithstanding the superb facilities provided in the new building at St Pancras, many will remember with affection the powerhouse of scholarship that was the North Library.

***

It used to be the case that those responsible for libraries and the services they provide shared with their readers a common understanding of the nature of research and the way the curious mind accumulates and sifts evidence. In recent years that has changed, and the ubiquitous adoption of computers in all the activities which are necessary to the functioning of a library, largely driven by the supposed efficiencies thereby gained, has resulted in the belief that access to knowledge depends less on the traditional skills that used to be taught in schools of librarianship than on skills which are associated with what is called information management. This is true; but implicit in this truth are far-reaching consequences for all who depend upon libraries for the successful execution of their chosen topic of enquiry, whether casual or exhaustive.

If those using libraries in the past experienced difficulties in finding what they wanted, there was usually a helpful reference librarian who could provide the nudge necessary to get the reader on the right road. Starting with a handful of relevant references, and equipped with a knowledge of the various reference tools at their disposal, the reader would find the enquiry gradually evolving to the point where many a researcher has felt overwhelmed by the sources over which control was necessary. The computer, however, is a tool radically different from books on a shelf or cards in a drawer, and those who would discover the resources available electronically, whether in their home library or a library in a different continent, must now understand a great deal more about how information is stored and how it can be retrieved than before. What follows may seem to be more relevant to those who manage libraries than to those who use them: but I am convinced that both have a part to play in shaping the library of the future. The advancement of knowledge since the fifteenth century has depended upon the fruitful collaboration between the writers of books, those who publish and distribute them, those who collect and make them available for general use, and those who study them and create new syntheses or suggest new understandings of history or reality. As technology develops it is imperative that all who play their part in the information cycle understand the issues and ensure that we gain more than we might lose.

In 1990 students and researchers in the humanities in British universities could not expect to find much material useful for their research available via JANET - the Joint Academic Network linking institutions of higher learning and research institutes throughout the British Isles. A few universities had, by then, made their catalogues available on the network, but for the most part what was available were records for modern works. The keyboarding of the British Library's General Catalogue, after two abortive attempts at conversion, had still only proceeded as far as the letter O. The pre-1920 catalogue of the Bodleian, though in machine-readable form according to a plan devised by John Jolliffe in the 1970's, was still not available online, though the cumbersome guard books which contained the original catalogue slips had been replaced by a computer printout. Libraries in Europe were accessible only if they used OCLC as a shared cataloguing service, and EPIC had only recently been made available in Europe on a trial basis. Libraries in North America had, by 1990, begun to make their online catalogues available on the Internet, but for the most part these were of marginal interest because they contained almost exclusively modern works appropriate to undergraduate studies. The only guides to the Internet available were Ed Krol's The Hitchhiker's Guide to the Internet (a slim pamphlet of 23 leaves published by the University of Illinois), and a six-page pamphlet by Marilee Birchfield, Casting a New Net: Searching Library Catalogues via the Internet. There have been over 500 books on, and guides to, the Internet published since 1990.

When I started the Research Seminars for users of the Reading Room in the British Library in March 1990 the principal databases which I used to assist readers in their research were: the incomplete file of the General Catalogue; OCLC and RLIN. Discounting duplicates (a phenomenon found in all collaborative databases not subject to editorial control) readers discovered quickly that computer access to some sixty million records could provide them with sources which would have taken years to discover. In the four years over which the Research Seminars extended the sources available for online searching had multiplied to the point where it had become difficult to adopt practicable search strategies. By January 1995 the number of online catalogues exceeded two thousand covering libraries throughout the world. The countries still represented very patchily are those in Central and South America, Japan, China, India, Africa, Eastern Europe, and Russia. Even within Europe the number of catalogues available from France, Italy, and Spain is still very limited.

In spite of the fact that many libraries in the United States have received special funding to catalogue in machine-readable form particular collections of importance for historical research, it remains a fact that most of the literature printed before 1900 is still unavailable remotely. For manuscripts and archives it is going to be a long time before we have access electronically to more than a fraction of what exists.

In the 1850's the Society of Arts tried to inaugurate a project to produce a universal catalogue: the Prince Consort was to be the patron and his numerous royal relations throughout Europe were supposed to play a part in convincing governments of the manifold benefits which would derive from such an imaginative project. It came to nothing, of course, and there were those who argued that if only the British Museum were to produce a printed catalogue most demands would be satisfied. A printed catalogue was produced between 1881 and 1905, but we now know that the total collections of printed materials in the British Library, though enormous and of the first importance for historical research in almost every discipline, nevertheless account for less than 20% of all books, periodicals, newspapers, and ephemera printed before 1990.

Although research in the humanities depends upon access to historical materials (printed and manuscript), there is also a pressing need to have access to works published in the past twenty (or so) years. Because research libraries have, during this period, suffered very real cuts in acquisition budgets, discovering what has been published on any given topic requires access to large collaborative databases such as OCLC or RLIN. The growth in academic publishing since 1970 has necessitated a much more rigorously applied selection policy for even the largest university libraries.

Edited Databases

ESTC, the first edited catalogue of historical materials in machine-readable form, was started at the British Library in 1976 following an international conference jointly sponsored by the National Endowment for the Humanities in Washington and the British Library, and held in London during June of that year. From the outset, ESTC was conceived as an Anglo-American project with the British Library's holdings (the largest in the world) properly judged to be the core of what would become a union catalogue. During the first phase of the project the aim was to produce a definitive catalogue of the collections of the British Library. For a variety of reasons, this aim has still to be achieved after nearly twenty years of effort. As far as the total universe of individual titles printed in the British Isles, North America, India, and the Caribbean, and books in English printed anywhere, is concerned the file is probably 80% complete, though the record of editions and copies in the world's libraries still has a long way to go. The problem with any collaborative enterprise is ensuring uniform practices in cataloguing and accuracy, and the imperative of good housekeeping routines can hardly be over-emphasised.

On a smaller scale, and conceived as a finding aid rather than a bibliographical catalogue, is the ISTC (Incunabula Short Title Catalogue) begun shortly after ESTC. Given the attention devoted to the description of incunabula by libraries which own them it was sensible not to follow the Gesamtkatalog in describing books printed before 1501 but rather to adopt the principle of normalised entries. The normalisation follows, essentially, the forms of names (personal and geographic) and titles adopted by Frederick Goff in his Census. By adding to the record for each item abbreviated references to the works describing copies of it the labour of duplicating information available elsewhere is obviated. If ISTC has a weakness it is that of generally omitting notes on the peculiarities or provenance of copies, thereby making it necessary to consult the appropriate reference if one is in search of, say, copies with manuscript annotations or interesting provenances.

Unedited Databases

The 1980's witnessed the development of numerous cooperative schemes to rival the huge success of OCLC which started in the 1970's, under the leadership of Fred Kilgour, as a union catalogue resource for the State of Ohio, and then based in Columbus. The Research Libraries Group, which had its origin in a collaboration between Harvard, Yale, Columbia and the New York Public Library, expanded in the 1980's to include about thirty major university research libraries. The network which links member institutions is RLIN, and is based in California. One of the recent additions to membership is the British Library. Although RLIN makes available a number of specialised databases (including ESTC) the principal database used by members of RLG is the collaborative monograph file which contains some 20,000,000+ records and is seen by the academic community as a rival to OCLC, with 32,000,000+ records. In the publicity war which is relentlessly waged by both parties statistics should not be taken at face value. Both databases suffer, as one might expect, from similar defects. Conspicuous amongst these is the high incidence of duplicate records. OCLC has, it should be noted, made some effort since 1993 to reduce these. It has also, I note with approval, done something to reduce the number of variant headings for well known authors. There used to be no fewer than twenty-three for T.S. Eliot! There are now (as of writing) just thirteen.
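The problem of variant headings can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample headings are invented, and the matching rule (surname plus first forename initial) is a deliberately crude stand-in for the editorial judgement that authority control actually requires. It is not a description of how OCLC or RLIN reconcile headings.

```python
import re
from collections import defaultdict

# Hypothetical variant headings of the kind a collaborative file accumulates.
SAMPLE_HEADINGS = [
    "Eliot, T. S.",
    "Eliot, T.S. (Thomas Stearns), 1888-1965",
    "Eliot, Thomas Stearns",
    "ELIOT, T. S. (Thomas Stearns)",
    "Eliot, George, 1819-1880",
]

def heading_key(heading):
    """Reduce a heading to (surname, first forename initial), lower-cased."""
    surname, _, rest = heading.partition(",")
    # Keep only letters from the forename part; dates and brackets drop out.
    words = re.sub(r"[^A-Za-z ]", " ", rest).split()
    initial = words[0][0].lower() if words else ""
    return surname.strip().lower(), initial

def group_headings(headings):
    """Collect headings that share the same crude key."""
    groups = defaultdict(list)
    for heading in headings:
        groups[heading_key(heading)].append(heading)
    return dict(groups)

GROUPS = group_headings(SAMPLE_HEADINGS)
for key, variants in GROUPS.items():
    print(key, len(variants), "variant(s)")
```

A rule this blunt would also merge two different authors who happen to share a surname and initial, which is precisely why the reduction of variant headings depends on editorial control rather than on the machine alone.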

The Research Libraries Group in North America provided Britain with a model, and the 1980's saw the creation of a comparable consortium in Britain: the Consortium of University Research Libraries (CURL). Starting as a group of seven it now numbers twelve libraries (including Trinity College, Dublin) and plans to expand in the future to include most of the British university libraries.

More recently, a group of European research libraries (twelve of them British) have formed a consortium (CERL) which, if it manages to avoid the muddle associated with so many European initiatives, holds the promise of making available electronically the important historical collections held in Europe's major national and university libraries. Often overlooked by those in search of English books printed before 1900 the research libraries in Europe have significant collections which deserve to be better known, so CERL will, if it succeeds, be of benefit to those concerned with the English as well as the European printed heritage.

In America, where the consortium principle has been extensively developed in recent years, there are now computer systems which enable researchers to interrogate collections as widely separated as the libraries within the University of California system. These networks are generally developed to serve a state (such as those for Colorado or Arizona), or a city (such as Boston or Washington, D.C.).

Online Library Catalogues

The last decade has seen a concerted effort by libraries in North America to convert their catalogues to machine-readable form. The process is an evolving one, of course, and there are still a large number of important research collections which await conversion, but there is every likelihood that by the beginning of the next century most will have been converted and the familiar card catalogue will be no more than a memory for those whose active research life extends back to the 1970's. Converted catalogues, while they undoubtedly save the space formerly allocated to 5 x 3 cards, are not always the blessing they are made out to be by library administrators. Much depends on the manner in which the conversion is done and the specification drawn up to guide the bureau undertaking the task. Converting a card catalogue into an accurately coded MARC structure is no trivial task, and it can be done in a variety of ways.

OCLC has been undertaking the conversion of large library catalogues for some time now, and these include collections as large as Harvard and the Bodleian. The conversion of Bodley's post-1920 holdings by OCLC was announced in July, 1994, and it is estimated that the project will take four years. At Harvard, the conversion of the Yenching Library (60,000+ Chinese and Korean titles) by OCLC will prove of immense value for students of East Asian cultures. Similar conversion projects for oriental collections are planned at the University of California, Los Angeles, the University of Washington, Seattle, and the San Francisco Public Library.

Derived Records

Large cooperative systems, such as OCLC, provide participating libraries with the opportunity to derive records from those already on the database for an agreed fee. It frequently happens that a library bases its record on one which is manifestly incorrect; or, where bibliographical niceties regarding issue or edition are involved, attaches a new location to the wrong record. If this can happen in a relatively controlled database like ESTC it is hardly surprising that it happens with such frequency in one which does not benefit from strict editorial control. When using very large collaborative databases such as OCLC or RLIN it is essential that users whose needs depend on accurate bibliographical information are aware of the limitations inherent in them because of the manner in which they are created. The problems are, in general, similar to those which researchers have found in the National Union Catalogue of pre-1956 Imprints.

When the colossal card catalogue at the Library of Congress was pre-edited for the lithographic reprint undertaken by Mansell, the matching of cards for identical or similar items was undertaken by inexpert staff, with the inevitable result that location symbols were added to a "clean" card for items which were actually different. In many cases cards which were typewritten, but containing collations and elaborate copy notes, were discarded in favour of a printed one as the "master". Since the entire card catalogue was destroyed when the printed volumes appeared a vast storehouse of invaluable bibliographical data was lost for ever.

While many library catalogues have interfaces developed locally most libraries use one of the numerous commercially developed OPACs. These interfaces have been developed to simplify the process of searching a machine-readable MARC catalogue: instead of having to learn a search language the user finds what he wants by using simple on-screen menus.

Surfing the Internet

The number and range of research libraries now available on the Internet presents one over-riding problem: while there are guides like Hytelnet which facilitate connection, there is still no useful guide to what you will find if you choose to visit a library's catalogue. Libraries seem unwilling to indicate on their welcome screens exactly what proportion of their collections has been catalogued or even from which year of publication. Much of the available network resource is wasted by fruitless searches for material that has not yet been catalogued, and there appears to be no impetus to establish a mechanism for research libraries to inform the scholarly community of progress in converting research collections. Many of the libraries which use OCLC receive grants for converting special collections which researchers need to know about and there should be some easily accessible listserv developed for just this purpose. An even better solution would be to update Hytelnet files with this information so that before considering a connection a remote user can ascertain whether or not it will prove positively helpful. Such a scheme would certainly require resources to maintain but the benefits for all who depend on the Internet would be incalculable. In the age of print we all came to depend on guides to the research collections held in the world's libraries: the need is now all the greater with so much electronic information readily available.

OPACs

The simplicity which most OPAC interfaces lend to the searching process is historically due to the fact that they have been designed with relatively simple objectives. In most university library systems they are designed to serve undergraduate needs. They are seldom useful as browsing tools. For the most part bibliographical records can be searched by author, title, subject, and classification (Dewey or Universal). Some, like the one still being developed at the British Library, provide readers with access to date of publication (a recent improvement) and shelfmark. There are none I know of which can provide a researcher with the flexibility to search across fields, though CDROM search software typically makes this possible. Yet the whole point of having an electronic catalogue is to enable readers to ask questions which conventional manual systems render impossible. The huge investment in automation which libraries have made in the past ten years can only be justified if the tools available enable us to conduct quickly and efficiently enquiries never before possible. Trying to second-guess the uses to which an intelligently constructed database served by imaginative software can be put is a pointless exercise, since no one can predict the sorts of questions in which researchers are interested. Suppose that I am curious to discover how many sermons published in cities other than London, Edinburgh or Glasgow between 1750 and 1800 were printed in octavo format rather than quarto: there is only one system I know of which will enable me to discover the answer, the ESTC file on BLAISE-LINE. In order to test the thesis that universities in the United States are purchasing fewer books in the major European languages I set about trying to establish statistics using the OPACs available on the Internet for holdings of books in German, French, Italian and Spanish for every year since 1980. It cannot be done, since very few systems provide searching on language and date of publication. A good online system should be able to satisfy such an enquiry as a matter of routine; yet even RLIN failed to help since date of publication is not indexed.
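The sermon enquiry described above amounts to combining conditions on several fields at once. The following Python sketch shows the shape of such a query under stated assumptions: the records, the field names (`place`, `year`, `fmt`, `subject`) and the helper functions are all invented for illustration and do not represent the structure of ESTC or the BLAISE-LINE search language.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    title: str
    place: str    # place of publication
    year: int     # date of publication
    fmt: str      # bibliographical format, e.g. "4to" or "8vo"
    subject: str

# Invented records, for illustration only.
RECORDS = [
    Record("A sermon preached at York", "York", 1762, "8vo", "sermons"),
    Record("A sermon preached before the Lord Mayor", "London", 1755, "4to", "sermons"),
    Record("Two sermons", "Bristol", 1788, "4to", "sermons"),
    Record("A discourse on trade", "Dublin", 1770, "8vo", "economics"),
    Record("A farewell sermon", "Newcastle", 1799, "8vo", "sermons"),
    Record("A sermon on charity", "Bath", 1810, "8vo", "sermons"),
]

EXCLUDED_PLACES = {"London", "Edinburgh", "Glasgow"}

def provincial_sermons(records, start=1750, end=1800):
    """Sermons published outside the excluded cities within the date range."""
    return [
        r for r in records
        if r.subject == "sermons"
        and r.place not in EXCLUDED_PLACES
        and start <= r.year <= end
    ]

def count_by_format(records):
    """Tally the selected records by bibliographical format."""
    return dict(Counter(r.fmt for r in records))

SELECTED = provincial_sermons(RECORDS)
print(count_by_format(SELECTED))
```

The query is only answerable because place, date, format and subject are each held in a field of their own; a system that does not index one of them, as with date of publication in the case noted above, cannot answer it at all.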

Researchers have, in the past few years, begun to appreciate the very real advantages which automation can bring and it is ironic that some of the more imaginative approaches to novel ways of manipulating information are coming from academic departments rather than from libraries. This is an altogether healthy development, of course, but it serves to underscore the extent to which libraries and their users are drifting apart. In spite of all the protestations we have heard in the past decade about the manifold benefits which discarding the old card catalogues would bring, they have been replaced with tools unworthy of the effort and expense it has taken to bring about the transformation.

What Remains?

In spite of the countless millions of bibliographic records maintained on the computer systems of the world's research libraries we have still only succeeded in bringing under control a small proportion of the world's manuscript and printed heritage. If your research is concerned principally with events since the death of Queen Victoria then the networks and the electronic databases can serve you quite well. If your interests lie in the history of Europe in the sixteenth century they will not be so well served. Even for those concerned with twentieth century literature the vast manuscript collections at the Humanities Research Center in Austin are still inaccessible electronically. While university libraries have found the resources to convert their catalogues (in part) what of research libraries such as the Folger Shakespeare Library in Washington; or the Huntington in San Marino; or the Pierpont Morgan in New York? The nature of such libraries, and the way in which they are funded, means that we may have to wait some time before their catalogues are available on the Internet.

The average IBM-compatible personal computer today has a line editor supplied as part of MS-DOS, a role formerly filled by the dreadful Edlin: that primitive piece of software took twelve years to replace. For nearly as long we had to put up with a character set constrained by 128 codes: now we have 256, which just about copes with most European languages. There are, it is true, word processors which can handle just about any language; but when will we have the ability to compose a letter or a contribution to a debate in any language using a standard chipset and send it anywhere by email?
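The limits of 128 and 256 codes can be made concrete with a short Python sketch. It assumes, for the sake of illustration, that the 128-code set is ASCII and that the 256-code set is ISO 8859-1 (Latin-1); personal computers have used other 256-code sets, so the choice of Latin-1 is an assumption and not a claim about any particular machine. The sample words are likewise chosen only as examples.

```python
def fits(text, codec):
    """Return True if every character of text can be stored in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

# The word for "library" in several languages (illustrative samples).
SAMPLES = {
    "English": "library",
    "French": "bibliothèque",
    "German": "Bücherei",
    "Russian": "библиотека",
    "Greek": "βιβλιοθήκη",
}

def coverage(samples):
    """Map each language to (fits in 128 codes, fits in 256 codes)."""
    return {
        language: (fits(word, "ascii"), fits(word, "latin-1"))
        for language, word in samples.items()
    }

REPORT = coverage(SAMPLES)
for language, (in_128, in_256) in REPORT.items():
    print(f"{language:8} 128 codes: {in_128!s:5} 256 codes: {in_256}")
```

Under these assumptions the Western European words fit once 256 codes are available, while the Russian and Greek words fit in neither set, which is the gap the question about composing in any language points to.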

Libraries in the Future

There can be only a few concerned with research libraries and their evolution who do not consider what they might be like in the next century. That they will contain a great deal of electronic gadgetry no one seriously doubts. But will they have changed out of all recognition from the institutions we now use? Given the enormous investment in automation during the past twenty years, are we on the threshold of a new age in which acquiring knowledge will be easier than ever before, or have we merely substituted one form of access to that knowledge for another? Are we, indeed, about to substitute for the book (in all its varied manifestations) information exclusively in electronic form? Has civilisation so burdened our librarians with the responsibilities of acquiring, accommodating, preserving, and making available print on paper that we must, of necessity, alter our habits? And even if we are persuaded that the future of information really is only possible electronically, what are we proposing to do with all that has been accumulated from the past? Even if we succeed in converting it all into digital form, are we going to throw away the originals when so much research still needs to be done that can only be done with originals? One of the lessons we are learning from enterprises like ESTC, global in its excavations of the printed record of the English-speaking eighteenth century, is how much we do not know. Sound enumerative bibliography always paves the way for new interpretations of the past and this is now becoming possible because of what ESTC has unearthed in the world's libraries. Like its antecedent, Pollard and Redgrave's STC and the many correlative works it spawned which nourished much historical research for half a century, it will surely take a long time for scholarship to take advantage of the riches uncovered by ESTC. And I have no doubt that many of its records will require adjustment as scholars probe more deeply into the history of particular texts. But they will only be able to do so if the books themselves are permitted to survive. Much useful investigation can be done with photographic/digital surrogates; but some questions can only be answered when the original is available.

It is a matter of fact that libraries are fashioned as much by those who use them as by those who govern them. The best developments in library management occur when there is a consensus between the two, but that happens rarely today because we no longer entrust our manuscript and printed heritage to those who understand how scholarship functions. In time, that may change; but it is certainly true that the "managers" we have exchanged for "librarians" are unable to tell us with any certainty what libraries are going to be like in ten years. They simply do not know. What do those who engage in historical research think? The users of research libraries, in my experience, constitute a docile community that asks little more than that a library should make it easy to discover what books it possesses, keep them in serviceable condition, and produce them quickly on demand. Some managers of libraries regard this modesty of expectation as beneath contempt, knowing (as they claim to know) what is good for research better than those who carry it out.

One hears, occasionally, wild speculation about how the future will transform our entire world heritage on paper into more convenient, instantly available, digital form. Given that we have not as yet succeeded in cataloguing and describing that heritage in its inconvenient form, how long will it take for the transformation and the necessary means of locating it digitally? The medium in which a piece of information is stored is irrelevant: it must still be described and located before it can be used.

The statistics published by those organisations concerned with inter-library loans and document supply suggest that, at the moment, the demand for eye-readable information far outstrips that for electronic. Electronic mail is, of course, a fact of life, and, in spite of the primitive facilities provided on most mainframe computers for producing accurate, well considered prose, widely used in preference to conventional "snailmail". Then there are the facilities for distributing electronic conversations (listservs using robot computers) to societies of persons interested in some aspect of knowledge. How much of this traffic is worth preserving I cannot say, but very likely only a tiny portion. The problem is, which portion? Should we contrive to keep it all, or jettison the lot? And if we decide to keep the huge quantity of electronic information that daily stretches the networks to their limits where shall we keep it? Who will index it? The prospect of gigantic electronic parking lots full of inaccessible, ill-conceived and illiterate meanderings is too absurd to be taken seriously. There are occasions when someone contributes a carefully argued and documented piece of prose to a listserv debate. Instinctively, I print it out, perhaps for the benefit of my students. But what if I wish to cite it at some later date? Where is it by then?

Survival

Those engaged in exercising their curiosity to know more about some aspect of the past or present have, historically, shown both tenacity and ingenuity in overcoming the obstacles presented by the dispersion of primary sources in the world's libraries and archives. There are some areas of research where it is perfectly possible to obtain access to everything one needs in one place, but for most kinds of historical investigation the sources are dispersed and must be identified before they can be evaluated. Identifying sources is what bibliographers and archivists have been doing for centuries. Their task is never complete, however, because books and documents are essentially fragile and vulnerable: they may simply deteriorate, or they may be destroyed, both by accident and by design. In the case of books survival has, in large measure, been due to the existence of libraries, private and public. But bringing books together to form what we term a "collection" is no guarantee of survival, for a single act of deliberate or accidental destruction can result in huge losses. For the entire output of printing in Europe between 1450 and the year 1700 fewer than five copies (on average) survive for every known edition. In many thousands of cases just one copy survives. It is probably true to say that every library in the world contains some unique items, and uniqueness extends beyond the artifact: marginalia and indications of provenance can be materially significant, which is why bibliographers go to such extraordinary lengths not simply to identify a work but to identify copies of it. They also understand the fact that very few books fail to benefit from what is observed by a different pair of eyes.

Computers have contributed nothing to the task of identifying the world's printed and manuscript heritage: that can only be achieved by people. What they have contributed to the process is the means of storing bibliographical information in a medium which permits infinite addition and correction. There is a price to pay, of course, for this benefit which a computer database has over a printed bibliography. The price is order.

Every bibliography or catalogue is constructed according to some pre-conceived order, which may be alphabetical, by subject (or topic), chronological, or (as is common for books printed before 1501) by country, subdivided by place of printing and printer. Other arrangements are possible but rarely found. A computer database, on the other hand, is constructed on no order other than time of entry. It follows, therefore, that unless each entry carries with it the information needed to enable the computer to output entries according to some coherent and meaningful sequence additional to that imposed by the author's name, the wording of the title, the place and date of printing, the format, and any of the other characteristics which bibliographers use to determine a sequence of editions, computer output will follow an invariable principle: last in, first out. Computers can sort entries in a database (OCLC permits a chronological or other sort providing the number of selected records does not exceed 500), but there must be instructions (codes) which they can be programmed to follow. This is why bibliographical databases have automatically generated indexes: authors' names, titles, place of publication, printer or publisher (seldom both), date of publication, subject, etc. So it is possible to search the millions of records on OCLC and retrieve all books by, or about, T.S. Eliot - but only if every such book carries the identical form of his name; which is, in fact, not the case. Similarly, I can retrieve all the books printed in 1501 - but only those where the cataloguer either entered the date correctly or guessed that 1501 was the correct date (i.e. the imprint has "1401" or no date at all). What cannot be done (and this applies to most large databases) is to request all the books by T. S. Eliot in chronological order. While it is possible to add filing fields in a database (such as ESTC) these have no significance online and have effect only when the file is output to hard-copy or microfiche. When dealing with computer databases it is an unpleasant fact that the responsibility for arranging a meaningful sequence falls to the user. The British Museum General Catalogue is ordered according to principles which were, broadly speaking, laid down in the nineteenth century. It is a simple matter to turn to the relevant pages and follow the printing history of Dombey and Son. This is impossible on the computer version of the catalogue. There are sound reasons why card and printed catalogues are governed by filing rules, and the sequences they adopt are generally those understood to be meaningful and helpful to those consulting them. There are, sadly, few such helpful principles at work when searching computerised catalogues. There are, however, stratagems which can assist the user facing an inchoate body of records displayed in no order other than reverse order of input. But these stratagems will make demands on users of libraries that they may well find disagreeable.
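One such stratagem, arranging a retrieved set chronologically once it has been downloaded, can be sketched in a few lines of Python. The record layout, the form of the date strings and the rule for undated items are assumptions made for this illustration; the undated entry is invented, and nothing here describes the behaviour of any particular catalogue system.

```python
import re

# Records as they might arrive: no order other than reverse order of input.
# The date strings imitate common cataloguing forms; the last record is
# a hypothetical undated entry.
RETRIEVED = [
    {"title": "Four Quartets", "date": "1943"},
    {"title": "The Waste Land", "date": "[1922]"},
    {"title": "Murder in the Cathedral", "date": "1935."},
    {"title": "Prufrock and Other Observations", "date": "1917"},
    {"title": "Poems", "date": "[n.d.]"},
]

def year_of(date_field):
    """Extract the first four-digit year from a date field, or None."""
    match = re.search(r"\d{4}", date_field)
    return int(match.group()) if match else None

def chronological(records):
    """Sort by year, then title; undated records are placed at the end."""
    def filing_key(record):
        year = year_of(record["date"])
        return (year is None, year or 0, record["title"].lower())
    return sorted(records, key=filing_key)

ORDERED = chronological(RETRIEVED)
for record in ORDERED:
    print(record["date"], record["title"])
```

The sketch sorts on whatever year the cataloguer recorded, so a conjectural or mistaken date is filed exactly as entered: the burden of judging the sequence still falls to the user.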

It has never been accepted that in order to carry out research in a library one must become a librarian and learn the intricate rules governing cataloguing, classification, or subject indexing. Most researchers assume, sometimes wrongly as it turns out, that the principles underlying a library catalogue must be so straightforward that a little experience will make all clear. They will have learned something about the arrangement of a card catalogue as undergraduates, and postgraduate studies, especially if conducted with some elementary training in research methodology, will certainly have contributed much to their understanding. In Europe experience in dealing with the peculiarities of library catalogues is commoner than in North America where the 5 x 3 card catalogue has been the norm since the second half of the nineteenth century, and with Library of Congress cataloguing practice being standard since the development of the printed card service established in 1901.

I remember well the puzzlement expressed by fellow readers in the British Museum in the 1950s and 1960s at the peculiar rules governing the placement of anonymous titles and of publications which the rules dictated should be placed under large arrangements like ENGLAND, LONDON, FRANCE, etc. To this day readers experience difficulties with periodicals which are entered under place of publication rather than title. The same confusion existed for the publications of learned societies until the old heading ACADEMIES was abandoned - though the shelfmark Ac[ademies]. testifies to its former existence. The rules for anonymous books have given rise since the publication of Donald Wing's Short Title Catalogue (1641-1700) to numerous problems for researchers: Wing entered (as most American libraries do) anonymous works under the first significant word other than the definite article (whatever the language), but since his titles are brief it frequently requires ingenuity to discover the item in the General Catalogue. That problem is one which automation has, mercifully, relieved since works can be discovered by searching on the title field.

But automation brings its own problems for users of catalogues, especially those which, like the British Library's General Catalogue, provide sequences which the computer is unable to follow. Further problems arise from the fact that when the General Catalogue was converted it was decided that sequences of editions under a heading distinguished from each other only by the statement in square brackets [Another edition] or [Another copy] or a note indicating that this particular item is a variant of the one adjacent to it should be keyed as one record with repeat values attached to the different entries. This mistake, which confounds so many apparently reasonable searches, was not made when the Bodleian catalogue of pre-1920 imprints was edited prior to release on CD-ROM. In this manifestation of the old guard-book catalogues (which owed much to British Museum practice) the principle of the unit record - the only kind a computer database can successfully deal with - was adopted: at considerable expense, I have no doubt, but of significant benefit to users. The OPAC developed for use in the British Library's reading rooms has, quite recently, smoothed out some of the difficulties, but the online file, which permits quite sophisticated searches to be performed, still produces an unintelligible sequence for works frequently reprinted.

Given the nature of data which a computer is expected to process it follows that unless researchers take the trouble to acquaint themselves with the rules governing data entry their searches are likely to be frustrated. The detailed level for which this applies concerns almost every field in a MARC record - the standard universally accepted. MARC is a language, like Esperanto, developed to be a universal benefit for international communication. Unfortunately, it has already subdivided itself (as far as the rules for applying it are concerned) into numerous, rigorously defended dialects. Like systems of divinity, rules are subject to interpretation, and so it is not surprising that they are interpreted differently in different libraries by cataloguers with different perceptions and understanding. This can be demonstrated in any database containing records created by different libraries. Only when the basis for the rules is understood can a researcher perceive ways of "working around" inconsistencies and errors. A failure to understand this explains why so many researchers complain that automated library catalogues seldom reveal books the existence of which they do not already know. To find a book in a library the existence of which you know should now be trivially simple. To discover a book, the existence of which you never suspected, is what a good catalogue should make possible, and many scholars have testified to the fact that the British Museum General Catalogue, whatever its idiosyncrasies, makes this more likely than a merely alphabetical arrangement. It is also the reason why generations of scholars have found dictionary catalogues, such as that for the New York Public Library (now available in 800 printed volumes), so valuable and revealing. The subject catalogues found in German university and French public libraries are, likewise, a revelation to scan.
The greatest subject catalogue I know is that started at the University of Göttingen in the eighteenth century and now housed in hundreds of volumes in the church which forms part of the Old Library. It is generally agreed that subject cataloguing represents the most expensive part of a modern machine-readable bibliographical record, whether in the form of verbal phrases (such as Library of Congress headings) or numbers (such as the Dewey or Universal Decimal Systems). The advantage of numbers over words in a computerised environment is obvious: numbers can be truncated, in order to generate larger subsets, whereas phrases cannot.
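
The effect of truncation may be illustrated with a small sketch in Python. The class numbers are decimal in form but the records attached to them are placeholders invented for the example, and the function name is my own; nothing here represents the working of any actual catalogue system.

```python
# Hypothetical holdings: (class number, record label). The labels are
# placeholders; only the shape of the class numbers matters here.
holdings = [
    ("821.912", "record A"),
    ("821.8", "record B"),
    ("822.33", "record C"),
    ("823.8", "record D"),
    ("940.53", "record E"),
]

def by_class_prefix(items, prefix):
    """Return the labels of all records whose class number begins with prefix."""
    return [label for number, label in items if number.startswith(prefix)]

narrow = by_class_prefix(holdings, "821.9")   # the most specific request
wider = by_class_prefix(holdings, "821")      # one step of truncation
widest = by_class_prefix(holdings, "82")      # a further step
print(narrow, wider, widest)
```

Each shortening of the number yields a larger subset which contains the smaller one. A verbal heading shortened in the same way carries no comparable guarantee, since the words of a phrase do not encode a hierarchy.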

What should now be clear is that research using computer databases to lead our curiosity fruitfully in the direction of books which might otherwise escape our attention provides opportunities while at the same time presenting difficulties which traditional catalogues did not. Because the transformation of research libraries in recent years has been effected by the introduction of computers it seems clear that users of libraries must, if the promise of greater flexibility of access is to be realised, come to terms with the nature of that transformation and understand both the gains and the losses. They must also understand the processes involved in transforming a conventional catalogue from paper to electronic format. These processes are no longer the exclusive concern of librarians, as any scholar who has tried to compile a bibliography using one of the many available database systems developed for microcomputers will testify. The construction of a MARC record demands absolute accuracy and consistency in following established rules for coding and tagging: data incorrectly entered is as good as lost. Users of libraries often show a touching faith in the integrity of the electronic catalogues at their disposal and conclude that if they cannot find a particular book it is simply not there. In a drawer of cards filed under the name of ELIOT, Thomas Stearns mistakes in the spelling of his name may be non-significant: in an electronic file any variation in spelling will result in that record being indexed so that it does not appear in the correct place. The superiority of electronic catalogues over manual ones is obvious: but only if the catalogue data is correctly and consistently entered. The complexity of an electronic record is such that errors are both frequent and difficult to identify. For reasons which should now be obvious we have made gains, but there have been significant losses.
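
The fate of a misspelt or variant heading can be shown in a minimal sketch, again in Python. It assumes, for the sake of the example, an index keyed on the exact string of the heading; the record numbers and the function are invented, and real systems may apply normalisation of which this sketch takes no account.

```python
from collections import defaultdict

def build_index(records):
    """Index record numbers under the exact form of the heading as keyed."""
    index = defaultdict(list)
    for record_id, heading in records:
        index[heading].append(record_id)
    return index

# Hypothetical records: (record number, author heading as entered).
records = [
    (1, "Eliot, Thomas Stearns"),
    (2, "Eliot, Thomas Stearns"),
    (3, "Elliot, Thomas Stearns"),  # misspelt at data entry
    (4, "Eliot, T. S."),            # a legitimate but different form
]

index = build_index(records)

# A search on the established form finds only the records keyed identically.
found = index.get("Eliot, Thomas Stearns", [])
missed = [record_id for record_id, _ in records if record_id not in found]
print(found, missed)
```

In a drawer of cards the third and fourth entries would lie within a few inches of the others and be seen at a glance; in the index they are filed elsewhere, and the reader who searches only on the established form has no reason to suspect their existence.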

It is, for example, quite possible to scan a drawer of 1,000 5 x 3 cards in less than an hour; it is possible to read the entire ENGLAND heading in the printed General Catalogue of the British Library in three days: scanning the 143,000+ records on the OPAC would take weeks. It is a fact with which most scholars I know would concur, that scanning a printed catalogue does, with a little practice, yield results in far less time than retrieving records on a screen, especially if the former is following a meaningful arrangement and the latter not. This situation need not, however, have been so, but for a number of circumstances in the history of the British Museum catalogue since the end of the Second World War.

When the revision of the General Catalogue (undertaken in 1929) was abandoned in 1954 after publication of volume 50 which covers the alphabet from DEO to DEZ, the decision to embark on a printed version which would combine the revision (known as GK2) with the information in the laid-down duplicate set of the volumes in the Reading Room was begun in association with the firm of Balding and Mansell (Mansell was later to produce and market the Library of Congress National Union Catalog). The result, known as GK3, became the familiar folio blue bound volumes to be found in most of the research libraries of the world, but it suffered from a number of errors and omissions. After the completion of GK3 in 1966 it became necessary for the British Museum to issue a series of printed Supplements (1956-65 = 50 volumes; 1966-70 = 26 volumes; 1971-75 = 13 volumes). In 1979 the firm of G.K. Saur undertook to publish an amalgamation of GK3 and its three supplements. This version of the catalogue became known, inevitably, as GK4, and was completed in 360 volumes in 1987: this was the exemplar used for the Saztech conversion. When the catalogue was keyed by Saztech it was part of the specification that each record should have a control number which would identify its position by volume, column and line number. If GK4 in 360 volumes had not, because of the haste with which it was produced, omitted many thousands of entries, and if Saztech had keyed the catalogue without omissions, then the control numbers would, at a stroke, have enabled meaningful output, since the entire database could have been inverted on this control number. But there had to be a series of six supplementary GK4 volumes (numbered 361-366) to make good some of the omissions; and these do not carry control numbers which would ensure their being filed in the proper place: records for the six supplementary volumes have control numbers starting with the three digits: 361xxxxxxxx through 366xxxxxxxx. Consequently, while most of the records for Andreas Aubert begin with 013 one of them (Det nye Norges malerkunst, 1904) begins with 361.
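
The consequence can be made concrete in a short Python sketch. The text tells us only that a control number records volume, column and line, and that the supplementary records begin with 361 to 366; the division of the remaining eight digits into a four-digit column and a four-digit line is my assumption, as are the sample entries and their positions.

```python
def control_number(volume, column, line):
    """Build an eleven-digit control number.

    Assumed layout (not documented in the text): three digits for the
    volume, four for the column, four for the line.
    """
    return f"{volume:03d}{column:04d}{line:04d}"

# Hypothetical entries under one author heading, in order of input.
records = [
    ("Aubert entry c", control_number(13, 512, 40)),
    ("Aubert entry from supplement", control_number(361, 20, 5)),
    ("Aubert entry a", control_number(13, 512, 10)),
    ("Aubert entry b", control_number(13, 512, 25)),
]

# Sorting on the control number restores the printed sequence for the
# main volumes, but the supplementary entry files after everything else.
ordered = [label for label, number in sorted(records, key=lambda r: r[1])]
print(ordered)
```

Entries a, b and c recover their printed order, while the entry from the supplementary volume, which in a single alphabet would stand among them, is carried to the far end of the file.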

Images

In the last two years libraries have begun to experiment with image databases based on digital scanning. It is argued that digital libraries represent the way ahead since a digitised image can be economically stored, electronically refreshed when required, and transmitted anywhere over the world's evolving telecommunication networks. The technology lends itself, theoretically at least, to many of the services currently offered by libraries: notably document supply and interlibrary loan. There are, as might be expected, a number of problems presented by remote access to the kinds of material most frequently used by researchers in the humanities. These problems are technical and legal.

One of the most difficult problems which the proponents of digital libraries must address concerns intellectual property. Copyright law is constantly undergoing revision, but the rapidity of technological developments far outruns the machinery of legislation and it is not easy to predict how the agencies which form part of the legal protection of intellectual property will adjust to electronic publishing. Historically, libraries have played a crucial role in controlling the distributive processes to which both manuscript and print can be put: microreproduction and photocopying. The laws governing these ubiquitous processes, found in libraries world-wide, are quite straightforward and publishers have come to accept the principle that distributing a library's resources carries a fee element comparable to the royalties paid to an author. Agreements between libraries and publishers generally have clauses which restrict third-party exploitation of both microforms and photocopies derived from them. Such agreements are not difficult to monitor since microforms are principally acquired by libraries which are able to enforce adherence to them. But electronic data is less easy to monitor, so it is hardly surprising that publishers and libraries are nervous at the prospect of their materials entering an environment which is effectively uncontrolled.

A huge quantity of electronic information is currently available to anyone with legitimate access to the Internet: some of it is subject to control, some not. A university may, for example, negotiate with Encyclopedia Britannica to make the electronic version available to faculty and students, but not to remote users, access being controlled by the user's logon identification. Remote access to files held on computers via anonymous ftp requires the user to enter a valid Internet email address, a mechanism designed to control misuse of valuable information. A number of agreements are currently being negotiated between university libraries and publishers which grant site-licences for unlimited distribution of published texts on local university networks. How widespread this innovative approach will become remains to be seen. Certainly the notion of partnership between the parties which produce and acquire scholarly materials is preferable to the situation we have seen in recent years where diminishing funds for acquiring books have led librarians to regard publishers as the enemy. The spiralling costs of conventional publishing and the means of distribution have had serious consequences for all whose research demands access to scholarly monographs and journals and a solution must be found.

If the concept of the digital library has a future then it can only be achieved on the basis of cooperation and the creation of a universally available database of accurate bibliographical records for every digitised item: it would be an irresponsible waste of precious resources to create redundant electronic versions of the same item as we have, over the past fifty years, created redundant microform versions of identical items. Preservation microfilming has been a conspicuous activity in research libraries during the last decade, but there are few mechanisms in place to minimise redundancy. Given the shrinking budgets which most libraries face it is difficult to see how resources will be found to both digitise and catalogue accurately our printed and manuscript heritage. And we might consider the quite sensible proposition articulated in the 1970s that the advent of the electronic record of a book would reduce significantly the insupportable burden of redundant cataloguing: a promise which still awaits fulfilment!

As we approach the twenty-first century it is hardly surprising that librarians throughout the world are looking with interest (and perhaps not a little anxiety) at the two colossal libraries undergoing construction in London and Paris: they are, after all, the two most important research libraries in Europe, and it is fitting that two such institutions enjoying the benefits of vast resourcing should show the rest the way. Whether either will remains to be seen.

