An Overview
With Special Reference to
English Studies
By
R.C. Alston
October 2002
The summation of human experience is
being expanded at a prodigious rate, and the means we use for threading through
the consequent maze to the momentarily important items are almost the same as
in the days of square-rigged ships. We are being buried in our own product.
Tons of printed material are dumped out every week. In this are thoughts,
certainly not often as great as Mendel’s, but important to our progress. Many
of them become lost; many others are repeated over and over. [Vannevar Bush,
Science is not enough, New York, 1967.]
* * *
Ever since the great library at
There is no need to reiterate here the reasons why libraries
with comprehensive collections on the scale of the
There are persuasive arguments favouring the creation of digital libraries, and some of them can be traced back to Vannevar Bush who, in his landmark essay on the future “As We May Think”, published in the Atlantic Monthly in 1945, said that extracting information from the great research libraries was like “a stone adze in the hands of a cabinet maker”. Bush dreamed of a device (which he called Memex) that would make all knowledge readily available when and where one needed it. In some respects, the World Wide Web is a gigantic Memex, with information pouring into it from millions of sources and consulted, at any time of the day, by countless millions of people in search of something. Given the apparently inexorable reduction, in real terms, of library budgets in the past ten years, it has seemed to some that a collaboratively constructed network of digital versions of what research libraries possess on paper or parchment might well provide future scholars with more information, readily available from the office or home, than has been possible for researchers using the traditional means of gathering information. This is undoubtedly possible, but whether it is practical, or affordable, is a different matter. Take some of the problems which must be overcome before there is any likelihood of there being a practical, affordable alternative to things as they stand.
1. Until every document/text has been accurately converted to digital form it will be necessary to continue maintaining the present stock of all such libraries and archives which possess materials germane to historical research or enquiry. In other words, even if the most optimistic forecasts come true, we have probably a century of effort and funding before the digital libraries of the world can supply the needs of those engaged in research of any kind. It is simply not possible to conjecture the burden of cost this will place on the largest of our research libraries.[2]
2. Conversion to digital form must be accompanied by adequate cataloguing, preferably to internationally agreed standards. This will be no trivial task, since cataloguing an electronic source is just as demanding as cataloguing a book, a microfilm, a film, or a video. While MARC standards exist for virtually all library and archive materials, there is not, as yet, any agreed standard for digital conversions. Such a standard would seek to provide information comparable with a library catalogue, but with additional technical data: type of conversion (bit-mapped/ASCII), resolution (dpi), type of file (binary/ASCII), source, &c. For materials printed before ca. 1850 it will be necessary to specify the precise copy used in the conversion, since copies of hand-printed books are seldom identical. A serious problem may prove to be one of the characteristics of the Web: the instability of URLs.[3]
3. A question not yet resolved concerns the identification of what has been achieved. This is difficult enough at the moment, with a relatively trivial number of conversions.[4] Will national libraries be expected to be responsible for the national digital output?
4. Problems associated with procedure are, of course, capable of solution: what is not at all clear is where the money is going to be found to continue funding libraries and archives as we do in the year 2000 with the additional burden, for perhaps a century, of converting the world’s research materials to digital form.[5]
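The additional technical data described in point 2 above can be pictured as a small record attached to each conversion. What follows is only a minimal sketch in Python, under stated assumptions: the class name, the field names and the permitted values are hypothetical illustrations taken from the wording of point 2, and are not drawn from MARC or from any agreed standard.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabularies based on the wording of point 2; not a standard.
CONVERSION_TYPES = {"bit-mapped", "ASCII"}
FILE_TYPES = {"binary", "ASCII"}


@dataclass
class ConversionRecord:
    """Illustrative record for one digital conversion (all names are assumptions)."""
    title: str
    source: str                           # institution holding the original
    copy_identifier: str                  # the precise copy used in the conversion
    conversion_type: str                  # "bit-mapped" or "ASCII"
    file_type: str                        # "binary" or "ASCII"
    resolution_dpi: Optional[int] = None  # only meaningful for bit-mapped images
    url: Optional[str] = None             # may change, since URLs are unstable

    def problems(self) -> List[str]:
        """Return a list of omissions that would make the record unreliable."""
        found = []
        if self.conversion_type not in CONVERSION_TYPES:
            found.append("unknown type of conversion")
        if self.file_type not in FILE_TYPES:
            found.append("unknown type of file")
        if self.conversion_type == "bit-mapped" and not self.resolution_dpi:
            found.append("bit-mapped conversion lacks a resolution (dpi)")
        if not self.copy_identifier.strip():
            found.append("the precise copy converted is not specified")
        return found


# Placeholder values only; the library name and shelfmark are invented.
record = ConversionRecord(
    title="Hamlet",
    source="Example Library",
    copy_identifier="EXAMPLE-SHELFMARK",
    conversion_type="bit-mapped",
    file_type="binary",
    resolution_dpi=400,
)
print(record.problems())
```

A record of this kind keeps the copy-specific information alongside the technical data, so that a reader can tell which physical copy lies behind a given set of images even if the address of the file later changes.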
* * *
In the early days of the Internet – before the World Wide
Web became universally adopted by academic institutions – the predominant users
belonged to scientific and medical disciplines. By the early 1990s university
libraries in the
* * *
It was during the summer of 1995 that Michael Lesk, then
Head of Computer Research at Bellcore in
In its latest report, published this summer, the
Commission on Preservation and Access addresses "the feasibility of a
project to study the means, costs and benefits of converting large quantities
of preserved library materials from microfilm to digital images". This
report, by Donald Waters at Yale, follows on from an earlier report (July 1990)
in which Michael Lesk observed that image digitisation would soon be both
relatively cheap and available. Lesk and I have been talking about ways of
digitising film since 1987, and I am pleased to report that it has taken only
four years to get this idea accepted as a possibility. Since then, the Mekel
400M and 400F cameras for converting film and fiche to compressed digitised
form have appeared and four 400M cameras are being used in the
The point of departure, quite properly understood at
Yale, is concern for access and two major investigations of the methods used by
scholars in the humanities and sciences (the Research Library Group's Program
for Research Information Management and the Faxon Institute's report presented
at its 1991 conference at Reston, Virginia in April this year) confirm that scholars
highly value and tend to favor information that is readily at hand and ... a
critical measure of success for libraries charting a course into the future is
how readily they steer information into the hands of their clients. ... The
mission of the access-oriented library is to generate, preserve, and improve
access to collections of recorded knowledge. This mission guides the
fundamental relationship between the access services and the library
collections.
When flatbed scanners were first introduced four years
ago they aroused considerable interest in
If I am right - more important if
Lesk agreed to write a monograph on the practicality of developing digital libraries, and within a year I had received a draft of what became Practical Digital Libraries: Books, Bytes, & Bucks.[8] Lesk’s book is, as far as I am aware, the only serious study of the manifold issues raised by the concept of the digital library, though there are a number of notes of general guidance available on the Web – these are discussed in greater detail below. Encouraged by Lesk I persuaded the British Library to acquire a Mekel 400 microfilm scanner, and began testing various types of negative microfilm from the vast stock in the collections. After about three months, just as I was about to report to the Director General on what I considered to be valuable experience, clearly demonstrating the potential of microfilm scanning for creating substantial digital collections of research material, the project was absorbed into the Preservation Service. One product of the British Library’s not very successful entry into the digital library arena was the publication in 1998 of Towards the Digital Library: the British Library’s Initiatives for Access Programme.[9] This is really a collection of essays on various experimental sub-projects and contributions by various authors on subjects pertinent to the concept of a digital library. With the exception of the Electronic Beowulf project none of the initiatives for access has borne fruit of any consequence. The Digital Library Programme was put out to tender in 1998; by the middle of 1999 it was abandoned: neither the consortia interested nor the British Library could agree on objectives or terms and conditions. Remarkably enough Lesk’s book is nowhere mentioned.
Quite apart from its tentative and ambiguous title, Towards the Digital Library is a strange compilation, part concerned with document supply, part with the problems of transferring rare research materials into digital form. There can be no doubt that the latter is what will interest most scholars in this century; yet we seem no nearer to realizing the dream of Universal Availability of Publications (UAP) than we were all those years ago when it became a core activity of IFLA. Even today, as I shall show, much of what can be regarded as contributing to the concept of the digital library remains tentative, exploratory and limited. The reasons for this are not actually all that difficult to understand.
The first book which attempted to come to terms with digitization emerged from the 1996 celebration of the founding of the New York Public Library: Books, Bricks & Bytes: Libraries in the Twenty-First Century, edited by Stephen R. Graubard (Emeritus Professor of History, Brown University) and Paul LeClerc (President of the New York Public Library). This is a disappointing book, since only the first essay, “What is a Digital Library” by Peter Lyman, addresses the issues implicit in the title. Most of the essays are concerned with traditional librarianship. It is not easy to grasp from most of the contributions what sorts of libraries we will have by the middle of the twenty-first century.
For at least ten years libraries have been faced with the
conflicting demands of those requiring remote access to research materials
(monograph/serial, printed/manuscript) and the requirement to preserve and make
accessible original materials acquired at vast expense, against a drift towards
shrinking budgets. If a librarian found juggling resources difficult in 1980,
when the first storm clouds began to appear, by 1990 not even those who
approached budgeting with imagination could balance the books. The concept of
the digital library was born in desperation. For Michael Lesk “The answer
should not be despair but organization. A digital library, a collection of
information that is both digitized and organized, gives us powers we never had
with traditional libraries.” This, I believe, can be shown to be true, but only
if the material digitized is processed with accuracy and fidelity to the
original. These two conditions seem to me to be of primary importance if
digital libraries are to reduce the increasingly costly and inconvenient
business of travelling from
That the technology exists to convert print of every kind and manuscript [text] to digital form is not in doubt. I have three concerns: (1) the choice of what [text] is converted; (2) the manner in which it [text] is converted; (3) the means by which it [text] is accessible. Each of these raises considerable associated problems.
* * *
During the heyday of the facsimile reprint [1960-75]
publishers frequently reprinted texts known to be needed by research libraries
from editions that had little or no textual authority, and did so for no other
reason than that the correct edition was probably quite rare and would have
necessitated negotiating with a library which would want a share of the profit.
Better, therefore, to acquire a copy of any edition available through the book
trade; disbind it; photograph it; rebind it with some of the profit from the
facsimile; re-sell the original through the trade. Such a procedure works quite
well with a late edition of Tom Jones,
but is pointless if what is needed is the first quarto edition of Hamlet. The world’s research libraries
have hundreds of expensive facsimile reprints of dubious pedigree: we should
endeavour, with the new technology, not to repeat the mistakes of the past. In
selecting what books to convert publishers must graduate from the mindset which
reproduces every item in STC, or Wing, or ESTC, or NSTC. Sir Charles
Chadwyck-Healey did departments of English a valuable service by reprinting all
the English poetry in the
The method used to convert research materials is now being seen as crucial. The casual and uncritical methods adopted at an early stage by Project Gutenberg are clearly of little use for scholarly purposes: the electronic texts for the first few years of this extraordinary enterprise are textually inaccurate and frequently based on editions of doubtful value, yet Michael Hart (the creator of Project Gutenberg) has certainly created a digital library of sorts in the 1000+ texts available for FTP transfer at a number of sites maintained by American universities.[15] On the other hand, careful scrutiny shows that texts available from academic sites are not always dependable: of thirty electronic texts (mainly literary and historical) that I checked against original texts in the British Library, 75% were found to contain errors of substance. In 1992 I checked available electronic versions of Lewis Carroll’s celebrated poem Jabberwocky, and found no single version that corresponded with the first edition: the situation is no better today!
That there are model websites with electronic texts as good
as anything available in print is not in doubt, and I would cite the electronic
projects at the
While most academic
sites are conscientious about securing permission from copyright holders, I
have encountered numerous instances of flagrant breaches for writers still in
copyright (e.g. Yeats, Eliot, Spender). The law is clear and is available free
at the U.S. Copyright Office at the Library of Congress: http://lcweb.loc.gov/copyright/. A
detailed paper on “Copyright Law in the Electronic Environment” by Georgia
Harper is at: http://www.utsystem.edu/OGC/IntellectualProperty/faculty.htm.
While fair use still operates in an electronic
environment (as it has done for photocopying) establishing fairness can present
some nice difficulties. Since it is now accepted practice for academics to put
texts they require their students to read on university-maintained sites the
danger exists that remote users (students in another university) will simply plunder
such texts for their own purposes. Mechanisms do exist to restrict use of such
texts to the host university for databases marketed on a licensed basis, but
few academics like such restrictions, preferring to let their colleagues in
other universities take note of their effort. Given the fact that university
and college libraries are unwilling to allocate funds to providing multiple
copies of works used in freshman and sophomore courses the electronic solution
has seemed both practical and economic. Furthermore, teachers are no longer
dependent on publishers of student editions for texts: a collection of texts
for minor seventeenth century poetry can be created quite simply. Such
collections form the majority of literary and historical texts available on the
Web: I know of at least two hundred such, mostly at
If the use of texts based on an edition without any authority is to be avoided, so must the integrity of what is made available electronically be zealously safeguarded. As anyone who has edited a text knows only too well, accuracy is difficult to achieve, but unless users can depend on the integrity of an electronic text then it is virtually useless. As Thomas Tanselle has reminded us:
Microfilms and other reproductions can be helpful to
scholarship if their proper use is recognized: but equating them with originals
undermines scholarship by allowing precision to be replaced with approximation
and secondary evidence to be confused with primary. The texts of many documents
that once existed are now lost forever, and the texts of others are known only
in copies. We use whatever there is; but when there are originals, we must not
let substitutes supplant them as the best evidence we can have for recovering
statements from the past.[24]
The quest for fidelity can yield interesting results: nowhere is this better seen than in the vastly expensive, yet exciting, forensic digital photography of the Beowulf manuscript in the British Library.[25] What we have now is evidently an improvement on the original, for detail not visible to the eye has been captured, enabling doubtful readings to be correctly interpreted. However, at a cost of over £1000 per page this is unlikely to be seen as a methodology for building a digital research library! There are numerous sites at which students can consult pages from medieval and renaissance manuscripts: sometimes the reproductions are complete, more often they consist of selected pages.[26]
It is not easy to see how international control can be brought to bear on which texts are digitized, any more than it has proven possible to control preservation microfilming. One has only to consult the National Register of Microfilm Masters to realize that different copies of the same text have been filmed by different institutions at different times. As far as I am aware, there exists no mechanism for ensuring that different copies of the same text are not digitized more than once. And it must not be forgotten that, although a MARC standard was developed for cataloguing microfilms, no such standard exists for cataloguing digital texts.[27] I list below several websites which attempt to bring some measure of control to the prevailing anarchy, but it seems not to have been appreciated that cataloguing what is available electronically is as important as carrying out the conversion to digital form.
For some sixty years now commercial microfilm companies and libraries have been filming rare books, manuscripts, and newspapers for two principal reasons: to preserve the texts they contain and to make them more easily available to scholars and students. Many of these microfilms have turned out to be useless, because they have deteriorated and can no longer be read.[28] In all, the efforts of publishers and libraries have made available approximately 1% of the world’s stock of printed research materials; materials in manuscript probably amount to less than 1%. Given that published output in the developed world is actually rising, it is difficult to see how it might be possible to alter this small proportion.
The concept of the digital library of the future remains a huge challenge, but before it is considered either as desirable, or practicable, we must decide what we propose to do with the world’s existing stock of print. In spite of vast sums of money spent on devising techniques for the mass de-acidification of printed books, each year that passes sees the disappearance of books printed between 1820 and 1900. Stemming the seemingly inevitable deterioration in the world’s book stock has led to further large sums being spent on microfilming collections at risk, notably by the Andrew W. Mellon Foundation, which has also supported JSTOR [Journal Storage]. The Chadwyck-Healey Nineteenth Century Microfiche Project, of which I am Editorial Director, has, in a decade, filmed over 12,000 nineteenth-century texts and provided them with full MARC records; but this represents a very small percentage of the texts at risk because of acidic paper.[29] It seems as if librarians must find truly staggering sums of money to pay for keeping their books and manuscripts as artifacts, and at the same time making them remotely available via the Internet. There seems to me no way in which these competing demands can be met without drastically altering library budgets.
Lesk’s Practical Digital Libraries concludes that the dream of Vannevar Bush “is about to be realized”:
More and more, it will be the digital version that is
used. Just as the reading of old newspapers moved from paper to microfilm, and
music moved from performance to recordings, and the theatre from stage to
cinema, we can expect a major shift toward digital reading. Just as in these
other examples, the old will survive, but the new will be dominant.
In December 1995 Wired magazine solicited the opinions of several American experts on “The Future of Libraries”. The questions were: (1) by when will half of the Library of Congress collections be digitized? (2) when will we see the first “Virtual Large Library”? The answers were as follows: (1) some time between 2020 and 2065; (2) 2005-2030.[30] Lesk is optimistic that
Vannevar Bush’s dream is going to be achieved, and in
one lifetime. Seventy years from 1945, when he wrote his paper, it will be
2015, and it is clear that before then we will have the equivalent of a major
research library on each desk. And it will have searching capabilities beyond
those Bush imagined. … We still lack a clear picture of how we’re going to pay
for all of this, but the explosion of the Web cannot be turned back. Whatever
combination of greed and fear winds up supporting millions of Web sites, we
will find some solution.[31]
I find this depressing: not only will it spell doom for those who believe (as I do) that the world’s literature still needs careful scrutiny, but it will widen the gap between the “have” nations and the “have nots”. Only an engineer, it seems to me, could have reached such a conclusion; or, at least, someone who does not understand the distinction between information and knowledge.
Although no library can continue to seek universal coverage
in its collection policy,
The following sites were visited for a final check
during the last two weeks of September 2002. Brief mission statements are reproduced
verbatim; others are given in summary form. Addresses are correct at the time of
going to press. Where available, the date of last update is given. The
following list does not cover resources for science, technology and medicine,
and lists sources for subjects in the Humanities and Social Sciences, primarily
in English. It does not attempt to cover sources for
http://www.libraries.rutgers.edu/rulib/abtlib/easlib/ealib.htm
or the Tamil Electronic Library
www.geocities.com/Athens/5180/index.html
It excludes CD-ROM databases and resources.[32]
Useful guides available online are:
Peter Graham, Bibliography on Electronic Library/Digital Library Issues:
http://aultnis.rutgers.edu/texts/ElectLibBib.html;
Ben Gross, Digital Library Related Information and Resources:
http://interspace.grainger.uiuc.edu/~bgross/digital-libraries.html;
Jann Lynn-George, Digitization: Technical Processes, Applications and Issues:
http://www.library.ualberta.ca/library_html/libraries/law/digit2.html.
For references to printed works on digitization see
below: http://www.lita.org/ital/1603_klemperer.htm.
A very useful bibliography – “Scholarly Electronic Publishing” – is maintained
by Charles W. Bailey at the
The following are the only books on digital libraries of particular interest:
Lesk, M. Practical Digital Libraries: Books, Bytes, and Bucks. Morgan Kaufmann, 1997.
LeClerc, P. & S.R. Graubard. Books, Bricks and Bytes: Libraries in the Twenty-First Century. Transaction Publishers, 1997.
Lee, S.H. Collection Development in a Digital
Environment.
Janes, J. [et al]. The Internet Public Library Handbook: a Guide for Building and Monitoring Virtual Libraries. Neal-Schuman, 1999.
Stielow, F.J. Creating a Virtual Library. Neal-Schuman, 1999.
Hunter, G.S. Preserving Digital Information. Neal-Schuman, 1999.
Dewitt, D.L. Going Digital: Strategies for
Access, Preservation, and Conversion of Collections to a Digital Format.
Kessler, J. Internet Digital Libraries. Artech House, 1996.
Pastine, M. Collection Development: Access in
the Virtual Library.
Stern, D. Digital Libraries: Philosophies,
Technical Design Considerations.
Rothenberg, J. Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Council on Library & Information Resources, 1999.
“The Text Encoding Initiative (TEI) is an
international project to develop guidelines for the preparation and interchange
of electronic texts for scholarly research, and to satisfy a broad range of
uses by the language industries more generally.” Maintained by the
Part of UNESCO’s Memory of the World Programme and Bibliotheca Universalis (see below). A valuable text is available on image compression.
http://www.nkp.cz/altnkeng.htm
http://www.nkp.cz/start/knihcin/
A useful collection of references to books on digitization; unfortunately the list has not been updated since September 1997!
http://www.unesco.org/webworld/memory/basictexts.htm
There are now literally hundreds of gateway sites, some of
the best ones being those maintained by universities. For a survey of British
sites note Brian Kelly in issue 22 of Ariadne published in December 1999 – available
online at www.ariadne.ac.uk/issue22/
A model academic gateway is that maintained at
A novel approach to providing access to a multitude of Web
resources, using the Dewey decimal scheme. It is an electronic version of David
A. Mundie’s book CyberDewey: a Catalogue of the World Wide Web.
http://users.telerama.com/~mundie/CyberDewey/
An alternative to CyberDewey, CyberStacks “is a centralized, integrated, and unified collection of significant World Wide Web (WWW) and other Internet resources categorized using the Library of Congress classification scheme. … All of the selected resources in CyberStacks(sm) are full-text, hypertext, or, hypermedia, and of a research or scholarly nature.”
http://www.public.iastate.edu/~CYBERSTACKS/
A gateway resource for Art, Design, Architecture and Media “a searchable catalogue of 2546 Internet resources that have been carefully selected and catalogued by professional librarians for the benefit of the UK Higher Education community.”
Covers a wide spectrum: Business & Management, Economics, Education, Environmental Sciences, Ethnology, Geography, Government, Law, Philosophy, Politics, Psychology, Social Science, Social Welfare, Sociology, Statistics, Women’s Studies.
Begun at the
The HUManities BULletin Board was started ca.1985, and
administered by the Office for Humanities Computing at
http://users.ox.ac.uk/~humbul/
National Information Services and Systems. A comprehensive
resource for
The UK Office for Library and Information Networking based
at the
A site for English Studies world-wide, with links to other
sites in
http://www.uni-duesseldorf.de/WWW/ulb/ang.html
There are numerous sites which encompass the concept of
the digital library. A useful essay and links to the referenced sites is at: www.lita.org/ital/1602_klemperer.htm,
compiled by Katharina Klemperer (a consultant) and Stephen Chapman (
http://robotics.stanford.edu/users/ketchpel/annbib.html
http://aultnis.rutgers.edu/texts/ElectLibBib.html
http://alexandria.sdc.ucsb.edu/public-documents/bibliography
http://www.nlc-bnc.ca/ifla/II/diglib.htm
http://www.ifla.org/ifla/II/diglib.htm
http://scils.rutgers.edu/~woernerc/biblio.html
http://sunsite.berkeley.edu/CurrentCites
A critical list of books on cyberculture in its early years (1990-95) is by David Silver: http://www.otal.umd.edu/~rccs/biblio.html. A larger (annotated) 22 page list of links is maintained by digitalLibrary.net at: http://digitallibrary.net/resources/. Michael Lesk’s various papers on digital libraries are at: http://www.lesk.com/mlesk/diglib.html.
The Digital Library Federation maintains a site that is part of the Council on Library and Information Resources, with pages providing “a working definition of digital library” and “Institutional goals for digital libraries” – www.clir.org/diglib/. A substantial list of links to digital resources on the Web (“Digital Librarian”) is maintained by Margaret Vail Anderson at http://www.servtech.com/public/mvail/. The fifteen-page listing includes many minor, though interesting, digital projects currently underway in the U.S. A useful list of “Selected References on the Virtual Library” is maintained by the Special Libraries Association at http://www.sla.org/. The most comprehensive bibliography is that maintained on the IFLA site at http://www.ifla.org/II/diglib.htm.
A site that has grown remarkably in the past two years and
now provides a wide spectrum of information on all aspects of digital
libraries, including links to active projects in Europe and the
A monthly electronic journal, based at the Corporation for National Research Initiatives and sponsored by DARPA on behalf of the Digital Libraries Initiative. Contains numerous important contributions to the theory and practice of digital libraries. “The emergence of the networked information system environment has allowed us to envision digital library systems that transcend the limits of individual collections to embrace collections and services that are independent of both location and format.” D-Lib has published papers from the Xerox Palo Alto Research Center (PARC) on digital libraries, including “The Changing Social Roles of Documents” by Marti A. Hearst, May, 1996.
A partnership project of the national libraries of
http://www.konbib.nl/gabriel/bibliotheca-uiversalis/bibuniv.htm
Started in 1993. Except for the Beowulf project – published on CD-ROM in March 2000 – most of the original twenty projects covered by the Strategic Objectives (published in 1993) have lapsed. Part of Bibliotheca Universalis (see above).
http://portico.bl.uk/services/ric/diglib/digilib.html
Part of the Digital Libraries Initiative – Phase II,
sponsored by the National Science Foundation and other
http://memory.loc.gov/ammem/dli2/index.html
“Working on behalf of the academic community to collect,
catalogue, manage, preserve and promote the re-use of scholarly digital
resources.” Funded by the Joint Information Systems Committee (JISC) of the
http://ahds.ac.uk/bkgd/what.html
“An evolving digital library.” An exemplary site, rich in
style and substance – as it should be given the funding it receives from
several national agencies and commercial organizations. Based in the Department
of Classics at
Maintained by the
http://alexandria.sdc.ucsb.edu/
Established at
Maintained by the
http://surya.grainger.uiuc.edu/dli
“The New York Public Library Digital Library Collections represent a growing body of primary source materials created for the Web. The Digital Library Collections provide the public with digital versions of books, manuscripts, photographs, engravings, and other items as well as tools to browse, search, and analyze these materials remotely via the Internet.” For printed books the Schomburg African American Writers of the 19th Century reproduces the text of some 52 works previously published as the Schomburg Library in 1988. For historical photographs there is the Dennis collection of Small-Town America, and Images of African Americans from the 19th Century. A useful text is “Planning Digital Projects for Historical Collections”, available online.
Maintained by
Maintained by the
Maintained by the
http://http2.sils.umich.edu/UMDL/HomePage.html
Established in 1993 and now perhaps the most comprehensive
index (with links) to electronic books on the Web. Started by John M.
Ockerbloom at
http://digital.library.upenn.edu/
Links to all the major sites “dealing especially with English and American literature”. Maintained by Jack Lynch.
http://andromeda.rutgers.edu/~jlynch/
Hosted by
Compiled by Mary Mallery. Includes links to 19 U.S. Electronic Text Centers, as well as some Digital Library Projects. Not very current!
http://scc01.rutgers.edu/ceth/infosrv/ectrdir.html.
Provides links to Chapter One files in various
A list of major online text collections with links. Includes Manuscript Studies, Medieval Studies, Philosophy, Religion, Women and Gender Studies.
http://www.columbia.edu/cu/libraries/indiv/ets/offsite.subject.html
“Dscriptorium is devoted to collecting, storing and
distributing digital images of medieval manuscripts”. Compiled by Jesse D.
Hurlbut,
http://www.byu.edu/~hurlbut/dscriptorium/
A “collection of digital documents collected in the
subject areas of English literature, American literature, and Western
philosophy.” Started as a gopher service in July 1994 by Hunter Monroe at
Radcliffe Science Library in
http://sunsite.berkeley.edu/alex/
Includes guides to Arts & Humanities, Business & Employment, Communication, Computers & Information Technology, Education, Engineering, Environment, Government & Law, Health & Medicine, Places & Peoples, Recreation, Science & Mathematics, Social Sciences & Social Issues. The section on Arts & Humanities is wide-ranging. Links to a large number of sites.
Compiled by Patrick Leary at
http://www.indiana.edu/~victoria
“The purpose of the Model Editions Partnership is to
explore ways of creating editions of historical documents which meet the
standards scholars traditionally use in preparing printed editions. Equally
important is our goal of making these materials more widely available via the
Web. Currently provides exemplary editions of
“A Prototype Image Database & Visual Union Catalog of
Medieval and Renaissance Manuscripts.” “The Digital Scriptorium was conceived
as an image database of dated and datable medieval and renaissance manuscripts,
intended to unite scattered resources into an international tool for teaching
and scholarly research. It has evolved into a general union catalog designed
for the use of paleographers, codicologists, art historians, textual scholars
to verify with their own eyes cataloguing information about places and dates of
origin, scripts, artistic styles, and quality.” Started in 1996 at the Bancroft
Library (Berkeley) with funding from the Andrew W. Mellon Foundation. Currently
holds some 8,500 colour images based on the collections at
http://sunsite.berkeley.edu/Scriptorium/
Created by Douglas B. Killings. Links to several other websites concerned with medieval/classical studies. Provided with a good search engine.
http://sunsite.berkeley.edu/OMACL/
Renaissance English literature. Compiled by Anniina
Jokinen,
http://www.luminarium.org/renlit/
Electronic books for Windows. Some erotica (Sir Richard Burton’s translations, Fanny Hill, &c.). A commercial enterprise.
http://www.exemplary.net/publishing/index.html
“A subject guide to selected resources on the Internet.” Maintained by Nottingham Trent University Department of Library & Information Services. Resources include: Art & Design, Built Environment, Business & Management, Computing, Education, Engineering, Health & Human Services, Humanities, Law, Science & Mathematics, Social Sciences, Sport, Statistical Information, U.K. Official Information. Includes a large collection of on- and off-line resources, mainly European in origin.
http://www.ntu.ac.uk/lis/elr.htm
Maintained at
http://scc01.rutgers.edu/ceth/
Alphabetical list of image databases maintained by the
Library at
http://dizzy.library.arizona.edu/cgi-bin/clearinghouse-to-html.pl?/
Maintained by the National Library of
http://www.nlc-bnc.ca/initiatives/erella.htm
Directory of digitised collections. The aim is “to compile a comprehensive listing of digitised documents held by libraries worldwide.” Begun in 1998.
http://www.ifla.org/VI/2/p1/desc.htm
A personal site maintained by a librarian at
Electronic Text Centres are facilities provided by a
number of universities in the
“The Computers in Teaching Initiative (CTI) was funded
from 1989-1999 by the
http://ota.ahds.ac.uk/main_ie4.html
Maintained at the
http://www.lib.virginia.edu/uvaonline.html
http://etext.lib.virginia.edu/
Sponsored by the
http://chaucer.library.emory.edu/
Largely the creation of Michael Neuman, and one of the
first such facilities to be created. Includes electronic standard critical
editions in philosophy, including Hegel and Feuerbach, as well as specialist
resources for Medieval Studies, American Studies, Politics in the
Maintained by the University Library, “established to
foster the creation and use of digitized collections and resources of interest
to the
Started in 1994 by the
An archive of Irish texts maintained by Michael
Sundermeier at
http://mockingbird.creighton.edu/english/micsun/index.htm
Maintained at
http://www.ucc.ie/celt/about.html
A list of sources covering archaeology, architecture,
archives & libraries, bibliographies, geography, history, Irish language,
literature, newspapers, music, and theatre. An ambitious undertaking, maintained
by Susan Schreibman, New Jersey Institute of Technology for
Maintained at the
http://metalab.unc.edu/docsouth/aboutdas.html
“Our goal is to create the largest, most diverse, and most
user-friendly public library of poetic works ever assembled. The materials on
display are selected from an inventory of thousands of works by hundreds of
authors, transcribed and gathered here by the Editors and by many volunteer contributors
from around the world.” Currently 5715 works by 656 poets. Maintained by Bob
Blair (
http://www.geocities.com/~spanoudi/poems/index.html
A poetry archive, currently with 3636 poems by 137 poets. A commercial venture.
The following are just a few of the large number of digital
resources for the study of individual authors. Included are a few of the very
best, a few rather less ambitious, and some that illustrate the way in which
virtually every library and archive in the
Created by Jerome J. McGann. A model of its kind. The
archive has been under construction since 1993. Of considerable importance is
McGann’s often-cited essay, “The Rationale of Hypertext” (1995). It is planned
to issue the archive on CD-ROM by the
http://jefferson.village.virginia.edu/rossetti/tour/index.html
http://jefferson.village.virginia.edu/public/jjm2f/rationale.html
“A hypermedia archive sponsored by the Library of Congress
and supported by the Getty Grant Program, the Institute for Advanced Technology
in the Humanities at the
http://jefferson.village.virginia.edu/blake/
Started in 1997 by Thomas Luxon at
http://www.dartmouth.edu/~milton/reading_room/contents/
The project, based at De Montfort University, is the
brainchild of Peter Robinson. It aims to “establish a system of transcription
for all the manuscripts and early printed books of the Canterbury Tales into computer-readable form. Transcribe the
manuscripts using this system. Compare all the manuscripts, creating a record
of their agreements and disagreements with a computer collation program. Use
computer-based methods to help reconstruct the history of the text from this
record of agreements and disagreements. Publish all the materials, the results
of our analysis, and the tools which we use in electronic form.” The tangible
fruit of this project, begun some years ago when Robinson was at
http://www.cta.dmu.ac.uk/projects/ctp/
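The comparison step described in the project's aims (recording the agreements and disagreements between transcribed witnesses) can be illustrated with a very small sketch. It is not the project's own collation program, about which nothing is assumed here; it merely uses the `difflib` module from Python's standard library to align two transcriptions word by word. The function names and the two sample readings are hypothetical, invented for illustration, and are not taken from any manuscript transcription.

```python
from difflib import SequenceMatcher


def collate(base, witness):
    """Align two transcriptions word by word.

    Returns a list of (tag, base_words, witness_words) records, where tag
    is one of 'equal', 'replace', 'delete' or 'insert'.
    """
    a, b = base.split(), witness.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def agreement_ratio(base, witness):
    """Crude measure of agreement between two witnesses (0.0 to 1.0)."""
    a, b = base.split(), witness.split()
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    # Two invented spellings of the same line, for illustration only.
    reading_a = "Whan that Aprill with his shoures soote"
    reading_b = "Whan that Averyll with his shoures sote"
    for tag, left, right in collate(reading_a, reading_b):
        if tag != "equal":
            print(tag, left, right)
    print(round(agreement_ratio(reading_a, reading_b), 3))
```

A real collation would need to handle spelling normalisation, abbreviations, and many witnesses at once; the sketch shows only the basic record of agreement and disagreement between a pair of texts.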
An electronic version of Aberdeen University Library MS. 24, an illuminated bestiary dating from about 1200. The electronic facsimile is accompanied by commentaries, a transcription and translation of the original Latin.
http://www.clues.abdn.ac.uk/besttest/alt/comment/best_toc.html
MS. Cotton Vitellius A. xv. Published by the British
Library and the
http://www.uky.edu/%7Ekiernan/eBeowulf/content.htm
http://www.uky.edu/ArtsSciences/English/Beowulf/eBeowulf/guide.htm
Edited by Kenneth M. Price and Ed Folsom at the
http://jefferson.village.virginia.edu/whitman/
An
Includes original documents for the following political events in Truman’s period of office: “The Decision to Drop the Atomic Bomb”, “The Recognition of Israel”, “The Marshall Plan”, “Desegregation of the Armed Forces”, “The Korean War”, “Berlin Airlift”, “North Atlantic Treaty Organization”. Within the constraints of national security one would hope for similar sites based on Presidential libraries!
“The letters of Calvin Shedd, edited and reproduced here
for the first time, tell a story of personal integrity and sacrifice in the words
of a simple man who lived in a turbulent, complicated time. The Shedd letters
add another fascinating source to our national reservoir of primary source
materials relating to the Civil War.” Reproduced from originals in the
http://www.library.miami.edu/archives/shedd/
“The mission … is to produce advanced electronic texts to be used by students, scholars, and admirers of literature around the world. Our goal is to provide free access to a variety of texts from world literature available in several languages and/or editions, with forums for communication regarding these works, for all types of readers.” Includes works by Dante, Jane Austen, the Brontes, Chaucer, Hardy, Milton, Poe, Shakespeare, Mary Shelley, Verne, and Voltaire.
A site of primary importance for the subject. New texts regularly added. “The goal of the Victorian Women Writers Project is to produce highly accurate transcriptions of works by British women writers of the 19th century, encoded using the Standard Generalized Markup Language (SGML).” Includes a list of works in preparation. Edited by Perry Willett.
http://www.indiana.edu/~letrs/vwwp/
A comprehensive site for students and researchers in
English literature. Started in 1991 in the English Department at
http://english-server.hss.cmu.edu/18th/
Electronic texts by English and American writers.
An irreverent site with excerpts from modern American writers, as well as complete texts. The links to other sites badly need revision as they are mostly wrong! Presumably a casualty?
http://www.c3f.com/alivfree.html
An offbeat site with extensive links (over 315,000) to many subjects, including Literature.
“A collection of electronic texts written by US authors or
widely read by Americans in the Gilded Age (loosely defined here as 1866-1901).
I assign these as primary sources for my William and Mary students doing
projects in my postbellum
http://www.wm.edu/~srnels/giltext.html
Mostly concerned with “freethought” literature. Published by the Internet Infidels, “an educational nonprofit organization of unpaid volunteers dedicated to the growth and maintenance of the most comprehensive freethought web site on the Internet. Our mission is to defend and promote metaphysical naturalism, the view that our natural world is all that there is, a closed system in no need of an explanation and sufficient unto itself. … With over 6,000 documents and more than 190,000 unique visitors per month, the Secular Web is the largest and most heavily visited nontheistic web site on the Internet.”
http://www.infidels.org/library/
“A digital library of primary sources in American social
history from the antebellum period through reconstruction. The collection is particularly
strong in the subject areas of education, psychology, American history,
sociology, religion, and science and technology. The collection currently
contains approximately 1,600 books and 50,000 journal articles with 19th
century imprints.” Maintained at the
A project based at MIT. Currently some 400 works of classical literature by 59 authors. All texts in English. Compiled by Daniel C. Stevenson. Claims copyright.
A project which had its roots in pre-Web days when academic exchange was usually by ftp and gopher. The brainchild of Don Mabry. A site rich in resources for the study of history.
http://www.geocities.com/Athens/Forum/9061/
Published by the
http://www.library.utoronto.ca/utel/rp/
“A project to provide enhanced access to
“Classic Christian books in electronic format.” With an online author index, and a search engine. A CD-ROM was published in February, 2000.
An ambitious site maintained by the
“Great Books Online.” Begun in 1993 by Steven H. van Leeuwen with an electronic version of Whitman’s Leaves of Grass. Reference works and major English & American authors, including several still in copyright (Eliot, Yeats). It is not entirely clear whether the electronic versions of authors still in copyright have the permission of the copyright holders. The texts are, for the most part, accurate and the choice of editions appropriate. The Waste Land, for example, is from the first Boni & Liveright edition of 1922. Bartleby’s bibliographical records for the texts published have ISBNs, and copyright is claimed on the electronic edition, which I find bizarre. There is a six-page user’s agreement.
Links to other electronic sites, with a few texts found only on the Web.
http://www.geocities.com/SoHo/7786/books.htm
“Home to electronic texts of all kinds, from the sacred to
the profane, from the political to the personal.” Started in 1992 by Paul
Southworth at the
“The next generation library. NetLibrary offers the world’s largest library of eBooks. Our “electronic books” give you the power to read and research. From anywhere. At any time of the day.” New York Times, January 18, 2000: “netLibrary will enter the market for supplying digitized books to public libraries with a program aimed at allowing the libraries to sample e-books for six months without charge.” Library Journal, January 17, 2000: “netLibrary donating 150,000 eBooks to 100 top Public Libraries.” Browsable index of eBooks by category. Language & Linguistics: 157 titles; American Literature: 333 titles; Classical Literature: 39 titles; European Literature: 234 titles.
Fiction; Children; Humor; Non fiction; Poetry; Mystery;
Plays; Horror/Sci-Fi; Medical/Psychology. Based in
“As the site grows in leaps and bounds, it’s impossible to put all what’s happening on the main page …” Sells ebooks from 25 electronic publishers.
Founded by David Rothman “national coordinator of TeleRead and a long-time advocate of electronic books and a well-stocked national digital library.” “Teleread is a nonpartisan plan to get electronic books into American homes – through a national digital library and small sharp-screened computers – in an era of declining literacy.” With links to other sites with comparable aims, including Project Gutenberg, Project Bartleby, American Literary Classics, The On-Line Books Page, The Electric Book, The English Server, Online Originals, Boson Books, ReadToMe, &c.
A site devoted to Napoleonic studies.
http://napoleonic-literature.simplenet.com/
A Professor in the English Department at the
http://www.english.upenn.edu/~mgamer/
While most major publishers that have e-texts on their lists provide for their purchase online, the number of small enterprises (almost the electronic counterpart to the private press) is growing weekly. Every conceivable patronage is catered for, including pornography, fiction, romance, as well as non-fiction for most hobbies and activities. I list just a representative sample here.
A small player so far with a limited list. “Instead of
just uploading ASCII text, we’ve given you full color scans of these books, so
that you can enjoy the weathered character that makes them so fascinating.” The
site is administered by Antique Books Inc. in
http://www.antiquebooks.net/readpage.html
A division of Mozena Multimedia Publishing. “Pay for information, not paper and binding!” Currently 188 “Eclassics”, mostly derived from Project Gutenberg and MIT. “Why pay for pulic [!] domain classic [!] when you can download them for free from the internet?”
http://www.etext.net/cgi-bin/Eclassics.cgi
“Where all things are possible!” Started by Michele J. Johnson, a spiritual healer. There are free and priced ebooks on this site, fiction and nonfiction.
The Simonsays site gives links to various vendors of ebooks and ebook readers.
http://www.simonsays.com/ebooks
Launched the Rocket-Library on February 4, 2000. Compiled by volunteers, with 3,000 titles on offer – “the second largest repository of free eBooks on the Web”. The site features online discussion forums of current issues in electronic publishing.
Almost entirely fiction written for online sale. Claims copyright.
http://www.immortalpublications.com/
Kinetic and hypertext poetry. Quite substantial list.
Maintained by SUNY
“Out of copyright books beautifully enhanced with electornic [!] media, these ebooks include both classic works and little known writings from the past.”
http://www.teleport.com/~writers/SERIALS/oldtime/mybooks.html
Publishes fiction in conventional format. Specimen chapters (normally no. 1) available online.
“A new art form is about to be born – interactive, electronic works that are about people, story, and theme. It will do for computer entertainment what Henry Fielding’s Tom Jones did for novels and D.W. Griffith’s “The Birth of a Nation” did for movies.” Developed by Chris Crawford, doyen of computer games. Crawford’s authoring software is known as the “Erasmatron” – available from the author.
http://www.thuntek.net/~scg/sts/efiction.htm
“The Open eBook Forum (OEBF) is an association of hardware
and software companies, publishers and users of electronic books and related
organizations whose goals are to establish common specifications for electronic
book systems.” A commercial version of AHDS (above). Started in October 1998 at
the first electronic book conference sponsored by the National Institute of Standards
and Technology at
Founded by Jim Sachs and Tom Pomeroy “to redefine the way
large volumes of information are distributed and read. … Softbook Press has
developed patented technologies that are transforming the way organizations of
all sizes distribute, access, and use reading material. The company is currently
building strategic partnerships with leading publishers and content providers
to make available an ever-expanding library of online material. Through the
SoftBook system, SoftBook Press provides a cost-effective way for any
organization to distribute and manage information – without paper.” Bought by
Gemstar International in January, 2000. Based in
[1] The website devoted to this extraordinary venture is still under construction (20-04-00), but the home page states clearly: “1) The Revival of the Ancient Library of Alexandria Project aims at building a universal modern public research library to be a center of culture, science and academic research. 2) The Library is to provide both the national and international communities of scholars and researchers with unique collections and facilities focusing on Alexandrian, Egyptian, ancient and medieval civilizations as well as on contemporary disciplines. … 3) The Bibliotheca Alexandrina shall sponsor intensive studies on the historical and contemporary cultural heritage of the region.” http://www.greece.org/alexandria/library/ and http://www.bibalex.gov.eg and http://www.unesco.org/webworld/alex/
[2] It is probably sensible to regard “major” collections as those with more than one million items.
[3] Uniform Resource Locators: the Web address used to locate the conversion. Web sites often migrate from one location to another, and this is likely to prove problematical for any cataloguing agency to deal with.
[4] I
estimate that the present total for electronic texts is of the order of 50,000.
This is less than one percent of the total holdings of the principal university
libraries in the
[5] Lesk,
Michael (1997) Practical Digital Libraries.
[6] SGML is often referred to as the parent language, with HTML [HyperText Markup Language] and now XML [Extensible Markup Language] as special derivatives. XML is not a fixed format, and is designed to enable SGML on the Web. A helpful paper, “Frequently Asked Questions about the Extensible Markup Language”, is available at: http://www.ucc.ie/xml/. For a similar paper on SGML note: http://www.infosys.utas.edu.au/info/sgmlfaq.txt.
[7] Alston, R.C. (1992) ‘Preserving the Record’. Archives, xx(88), October 1992.
[8]
[9] Edited
by Leona Carpenter, Simon Shaw and Andrew Prescott. [
[10] The collection of tracts on health and sanitation (inter alia) collected by Edwin Chadwick in the British Library, bound in 535 volumes (CT1-535).
[11] Johns,
[12] An idea I first suggested to him in 1986, but which seemed impossibly ambitious at that time; as did my suggestion that he publish in machine-readable form Migne’s Patrologia.
[13] Every week I receive information of library proposals to digitize collections of materials of minor importance.
[14] One reason for this is the fact that it is not always understood that the metadata which form part of a website, in which the materials are classified in such a way as to ensure their capture by the major search engines, are crucially important. The most effective metadata protocol is what is known as “Dublin Core”, developed by OCLC. See: “Dublin Core Metadata Initiative”, http://purl.oclc.org/dc/.
[15] Project
Gutenberg was, from the start, designed as a supplier of etexts via ftp [File
Transfer Protocol]. These have been either keyed (early days) or scanned using
OCR software, proofed and sent to the project’s base at
[16] There is the Glassbook Reader, the Glassbook Plus Reader, and other software products for book publishers, distributors and booksellers, as well as libraries. There is even Glassbook Kiosk – “An e-book purchasing station for bookstores, coffee shops, or anywhere book lovers gather.” http://www.glassbook.com/.
[17] NetLibrary for PCs – http://www.netlibrary.com.
[18] Rocket eBook for palm devices and PDAs – http://www.rocket-ebook.com.
[19] Peanut Press for PDAs – http://peanutPress.com.
[20] Softbook for PDAs – http://www.softbook.com.
[21] Softlock for PCs – http://www.softlock.com.
[23] These
sites are, of course, for students whose principal language is English; there
are hundreds more for universities in
[24]
Tanselle, G. Thomas (1998) ‘Reproductions and Scholarship’. Literature and
Artifacts.
[25] http://www.uky.edu/ArtsSciences/English/Beowulf/eBeowulf/guide.htm. This is the online Guide to the CDROM edition of the text and full supporting documentation, edited by Kevin Kiernan. The Electronic Beowulf proved an expensive project because of the number of high quality colour images. The camera used was a Kontron, now marketed as the JenOptik ProgRes 3008. The typical file size for a JPEG colour page is 20MB. The CDROM images are severely compressed.
[26] For example, the Cambridge University Library site (http://www.lib.cam.ac.uk/Digital/) makes available MS. Ee.3.59, an Anglo-Norman verse life of Edward the Confessor, written ca. 1235. Several illustrations from manuscripts in the Vatican Library are available at the Library of Congress site “Rome Reborn” (http://lcweb.loc.gov/exhibits/vatican/toc.html). Unfortunately, no site I am aware of provides technical information on the conversions.
[27] The best standard is Dublin Core: see note 14.
[28] My own sampling of films made of items in STC (1475-1640) and Wing (1641-1700) by University Microfilms suggests that about 50% of the films made before 1980 are difficult to read, and certainly could not be used for conversion to digital form using any of the currently available microfilm scanners.
[29] A serious problem has affected the choice of texts for this project: namely, the preservation treatment of books at risk, making them impossible to film, especially if lamination has been prescribed. The massive use of lamination in the British Library makes it unlikely that books treated this way will ever be digitized. It is ironic that literally thousands of items which might have been ideal candidates for a digital library project must now be preserved as they are for ever.
[30] “Reality Check – The Future of Libraries.” The experts were: Ken Dowlin, former librarian of the San Francisco Public Library; Hector Garcia-Molina, Professor at Stanford University, and Principal Investigator for the Stanford Digital Library Project; Clifford Lynch, Director of Library Automation, University of California; Ellen Poisson, Assistant Director, New York Public Library; Robert Zich, Director of Electronic Programs, Library of Congress. http://www.wired.com/wired/archive/3.12/reality_check.html.
[31] Lesk, Michael (1997) Practical Digital Libraries, p. 270. Bush’s landmark essay “As We May Think” was published in The Atlantic Monthly, July, 1945. In 1967 Vannevar Bush published Science is not Enough. An essay in that work is entitled “Memex Revisited”, in which he, like Lesk, believes that a radical alteration of the ways in which we learn and think is just round the corner.
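As an illustration of the Dublin Core metadata mentioned in the notes above, the following is a minimal sketch of how descriptive elements might be embedded in the head of a web page so that search engines can capture them. It assumes the widely used convention of HTML `meta` tags named `DC.<element>`, together with a `link` declaring the Dublin Core element set; the function name `dublin_core_meta` and the sample record are hypothetical and describe no actual collection.

```python
from html import escape

# The fifteen elements of the simple Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        # Escape the value so that characters such as & and " are safe in HTML.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


if __name__ == "__main__":
    # A made-up record, for illustration only.
    sample = {
        "title": "Tracts on Health & Sanitation",
        "creator": "Example Library",
        "date": "2000",
        "language": "en",
    }
    print(dublin_core_meta(sample))
```

The resulting lines would be placed inside the `head` element of each page of a digitised collection. Whether a given search engine actually makes use of them is a separate question.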