Non-fiction E-books 3: CD sales and ‘elibraries’

Online Currents 2003 – 18(6): 23-25

In two previous articles I discussed the current state of eBook development and looked at online sources of downloadable eBooks. Here I want to describe two other distribution models – the supply of eBooks in bulk on CDs, and the use of an ‘elibrary’ model to permit limited-time access to eBooks while online.

All businesses mentioned here can be found online at www.nameofcompany.com; e.g. ‘www.samizdat.com’, etc.

 

CD Sales

There are two major sources of free out-of-copyright eBooks on CD: Samizdat and BlackMask Online. Both appear to be run as one-man businesses: B & R Samizdat Express by Richard Seltzer, and BlackMask by David Moynihan. Both carry out their marketing through websites; BlackMask also provides its texts for free download. Both derive most of their material from Project Gutenberg, which was discussed in the previous article. Both attempt to provide answers to the question ‘Why should I pay for something I can get online for free?’. The differences between them are mainly a matter of price, classification and emphasis.

Samizdat

‘Samizdat’ was the term used in Soviet Russia for underground copies of ‘subversive’ literature, usually run off on photocopiers and passed from hand to hand among dissidents. Richard Seltzer’s Samizdat site attempts to maintain this tradition by making large numbers of texts available world-wide on CDs. One CD in particular is classified as ‘non-fiction’, although non-fiction items appear on some of the other disks as well. Australian librarians will also be interested in a special ‘Australia’ CD, and I discuss both below.

The Samizdat non-fiction CD contains the CIA Factbook on Intelligence, an Icelandic Primer from 1886 in .PDF format, a few Web pages providing access to the books and then the books themselves – 959 plain text files organized into about 140 folders and sub-folders. The CDs can be purchased from http://store.yahoo.com/samizdat.

As I mentioned in the previous articles, there is no length requirement established for an eBook; the shortest file in the collection (a statement on Hemispheric Defence issued by Canada and the US in 1940) contains only 169 words and the longest (a collection of Presidential State of the Union Addresses extending over 212 years) contains 1,613,500 words, the equivalent of a printed A4 document 2,947 pages long. (This is more than twice the length of the next biggest document in the collection, Plutarch’s Lives.) In total, 428 Mb of disk space is given over to text files, mostly relating to the humanities and mainly published before 1920; perhaps half of these are of interest to non-historians.

The Australian CD has a much smaller collection of 34 text files making up 14 Mb in total and ranging in size from ‘An Address to the Inhabitants of the Colonies’ by the Reverend Richard Johnson in 1792 (12,795 words) to Edward Eyre’s Journals of Expeditions (253,254 words). They include some Henry Lawson stories and the Poems of Henry Kendall. Over half of them – i.e. all the non-fiction documents – can be found on the Non-Fiction CD as well.

Access to the files has been improved over the Gutenberg system by expanding on their names: thus the file stored as ‘gbnlw10.txt’ on the Gutenberg site has been renamed ‘my life and writings.txt’ for Samizdat. However, authors’ names are not included in the file names (including them would be a sensible move, and is common practice on bootleg sites), making it impossible to search the file name list for particular authors. As mentioned above, files have also been sorted into directories and subdirectories, but the sorting is not always consistent or useful. There is no cross-referencing between folders, though this has been possible in theory since Windows 95 permitted the use of ‘shortcuts’.

The files themselves are in the format favoured by Project Gutenberg – plain text with hard carriage return characters at the end of each line and double returns at the end of each paragraph. This makes them very irritating to read, especially on a smaller screen, and most readers will want to convert them – for instance, with a Word macro – before reading at any length. Since the CD is not rewritable this conversion will have to be done anew each time the file is loaded, unless the converted copies are saved elsewhere.
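The conversion does not have to be done in Word. As a rough sketch only, the following Python function joins the hard-wrapped lines of a plain text file into flowing paragraphs, treating a blank line (the double return) as the paragraph break. The function name is my own invention, and the sketch assumes that every paragraph really is separated by a blank line; verse, tables and indented quotations would be flattened along with everything else and would need separate handling.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, keeping blank-line paragraph breaks."""
    # Normalise Windows (CR LF) and old Mac (CR) line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines mark a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, collapse each run of whitespace
    # (including the hard returns) to a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    sample = "Call me\r\nIshmael. Some\r\nyears ago\r\n\r\nNever mind\r\nhow long."
    print(unwrap_paragraphs(sample))
```

The converted text can then be saved to the hard disk, so that the work is done once rather than every time the file is opened from the CD.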

‘Official’ access to the files is through an HTML file in the root directory of the CD. This is a long Web page beginning with a list of directories and subdirectories and then showing the files in each subdirectory, with a hyperlink to each. Titles and authors are generally listed in full, making it possible to search this page for specific works or authors; but there is no alphabetical listing of either.

As a bonus, a demo version of the text-reading program ReadPlease is included on the CDs. This can be installed and used to read the books aloud for 30 days, after which it is limited to smaller chunks of text. I tried the program and was pleasantly surprised by the quality of the reading, although it does require more concentration to follow the text than if it were read by a human.

The HTML file lists the other CDs available from Samizdat: they include American Literature, British Literature, Children’s Books, World Literature and Religion. Each disk is available for $US29 plus postage and handling – about $AU50 at current exchange rates.

BlackMask

BlackMask is largely the work of David Moynihan, whose heroic efforts in making eBooks available may be second only to those of Professor Michael Hart, founder of Project Gutenberg. Like Hart, Moynihan takes the view that quantity comes first, quality second. His books are largely taken from the Project Gutenberg site, but in bringing them across to BlackMask Moynihan converts them into several more useful file formats and provides a brief extract from the book to act as an indication of its contents. He lists new books in a daily newsletter – and there are usually a dozen or more every day – and adds them to a growing collection of CDs.

I purchased a collection of four CDs containing BlackMask eBooks in a zipped HTML format. This actually provides five files for each eBook: the text itself, a table of contents, the BlackMask banner and graphic, and a frames page to display them together on the screen. Although the table of contents was useful at times, I found that having it on the screen all the time was a nuisance, and ultimately preferred to read the books in an unframed display. Using a zipped format makes it possible to fit more on a CD: the 636 Mb of material on the first disk probably represents over a gigabyte of text. The site claims that there are 10,000 ‘books’ on the four CDs, but I could only (only!) identify 5,982 unique titles, and some of these were overlapping collections, e.g. the Koran as a unit plus each of its books as a separate file.

Perhaps 20% of the books are non-fiction; some of these are classics, but many others would be of limited interest to anyone but an antiquarian. Each CD has two sets of HTML files in the root directory; the first provides hyperlinks to books on particular topics, and the second an alphabetical list of books by hyperlinked title. Unfortunately the division into topics is not particularly logical, and on the first CD neither set of files included authors, making it virtually impossible to locate books by a particular author. Things had improved by the fourth CD, which includes authors in its hyperlinked title lists, but by this time I had become so frustrated at searching through the files that I decided to compile my own complete database of titles and authors (available from me on request – jonjermey@optusnet.com.au). In doing so I found several errors in file names and in the attribution of works to particular authors.

Until I had done this it was often faster and easier to search for and download a book directly from the BlackMask website than to try and establish whether I already had it on CD.
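For readers who want to attempt something similar, a first pass at a combined title list can be automated. The sketch below is not the method I used; it simply assumes that each index page is ordinary HTML in which every book appears as a hyperlink whose text is the title, and it merges the links from several pages into one alphabetical list. The class and function names are my own, and the file names in the example are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (text, href) pairs
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Tidy stray line breaks and double spaces in the link text.
            title = " ".join("".join(self._text).split())
            self.links.append((title, self._href))
            self._href = None


def build_index(html_pages):
    """Merge links from several index pages into one sorted, de-duplicated list."""
    seen = set()
    for page in html_pages:
        parser = LinkCollector()
        parser.feed(page)
        parser.close()
        seen.update(parser.links)
    # Sort by title, ignoring case; the href breaks ties.
    return sorted(seen, key=lambda pair: (pair[0].casefold(), pair[1]))


if __name__ == "__main__":
    page_a = '<A HREF="koran.zip">The Koran</A><br><a href="walden.zip">Walden</a>'
    page_b = '<a href="walden.zip">Walden</a> <a href="emma.zip">Emma</a>'
    for title, href in build_index([page_a, page_b]):
        print(title, "->", href)
```

A list built this way only removes exact duplicates; misattributed authors, misspelt file names and overlapping collections still have to be found by eye.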

The BlackMask CDs were $US25 for four plus postage and handling; a grand total of about $AU50 for 6,000 books, or less than one cent each.

Conclusion

Buying these CDs is like walking into a huge disorganized second-hand bookstore. There is certainly a lot of valuable material here, but establishing exactly what it is, and then finding it, is a job in its own right. The Samizdat material is better organized, but the BlackMask collection is bigger and the HTML file format more user-friendly than plain text. Both would benefit enormously from the attention of an information scientist, but given the current drop in interest in eBooks this is unlikely to happen any time soon. Both principals deserve enormous praise and encouragement for their pioneering efforts. And my thanks to Richard Seltzer for providing a review copy of the Samizdat CDs.

Since the price is minimal and there is not a great deal of overlap apparent between them, a librarian may want to acquire both sets – but be prepared for extensive work in making the material acceptable to users and clients.

eLibrary systems

The essence of a library system as applied to eBooks is the concept of limited time. That is, the user pays to have access to the eBook for a fixed period, which may be defined in terms of contact time (e.g. 10 hours with the book open on the user’s PC screen) or in terms of elapsed time (e.g. a month during which the user can access the book at any time). To maintain the viability of the library some form of copy-protection must be exercised; and to encourage ongoing commitment, users must expect the library to be around for the foreseeable future.

Paper-book libraries are not money-making propositions. Paid lending libraries, once common, have virtually vanished, and nearly all libraries are either funded by the public sector or cross-subsidised by other business activities. It is interesting, then, that both the library-based models for eBook access which I will discuss here have arisen within the private sector. (Of course, there are also many existing libraries – including some in Australia – which have begun to purchase eBooks and incorporate them into their collections in the same way as other media, but I will not discuss these here.)

Questia

Questia describes itself as ‘the world’s largest online library’. It is targeted at students, with quotes on the home page like ‘Join now and get a better grade on your next paper’, and tips on writing research reports. It claims to have ‘over 45,000 books’ and ‘over 25,000 journal articles’. (For comparison, a large university library would have over a million books, though many of these would be duplicates.) There is a search option on the home page, which suffers from the usual limitations of text searches, and a ‘power search’ which allows users to search the text of books as well as titles and authors.

A search for books with “Australia” in the title brought 52 results. There is no indication of how these are sequenced, but the first five were:

  • Surrender Australia? Essays in the Study and Use of History: Geoffrey Blainey and Asian Immigration by Andrew Markus and M. C. Ricklefs; 1985
  • Sociological Theory and Educational Reality: Education and Society in Australia Since 1949 by Alan Barcan; 1993
  • Race, Colour, and Identity in Australia and New Zealand by John Docker; 2000
  • Australia’s Outlook on Asia by Werner Levi; 1958
  • Australia: Aboriginal Paintings, Arnhem Land by Herbert Read; 1954

A similar search for “New Zealand” titles yielded 25 results ranging in date from 1923 to 2000. Questia users can also search by topic: a ‘browse by subject’ list contains an expanding set of categories and subcategories, but only down as far as the level of, say, ‘Australian and Oceania History’, which has 151 items. It would have been nice to take this down at least one level further to reduce the number of items the user has to browse through.

Once an item has been found, non-subscribers can read the first page; subscribers can read the entire book, bookmark pages, highlight and copy passages, create citations and references, look up words and add the book to their own bookshelf. There are also a few free books for non-subscribers to try. Books contain hyperlinked tables of contents, and may include hyperlinked indexes and footnotes.

It is the reading itself that is the least satisfying aspect of Questia. The document appears in a framed view, with an optional hyperlinked table of contents to the left, and a Questia banner at the top which cannot be hidden. To read a book, the user must be online and sitting at a computer; there is no option of copying the material to a laptop or PDA. Obviously to read and work on a book of any length would require a continuous connection to the web for many hours. Subscribers can apparently choose their own font size and style, although non-subscribers cannot, and for me to read a full page required scrolling up and down. In fact, the retention of the ‘page by page’ model, although it allows tables of contents and indexes to work with minimal modification, is a major irritation.

There are several options for subscribing to Questia, ranging from a flat monthly fee of $US19.95 to an annual subscription of $US129.95. A free time trial was mentioned at one location on the site but I was unable to find out any more about this. While I remain a sceptic in general about charging for content, I have to concede that Questia has put together an impressive package: and if they could find some way to get their books to users in a PDA-readable format I would be very tempted to subscribe.

netLibrary

netLibrary began as a service directed towards the general public, but after some financial vicissitudes, it has re-established itself as a partner for existing libraries. A library can set up an account with netLibrary and then buy netLibrary eBooks for its subscribers. eBooks are ‘purchased’ by the library from netLibrary in the same way as paper books are bought from a publisher, and for similar prices, plus a service fee for netLibrary themselves.

Subscribers can then arrange to ‘borrow’ the book from the library, either by making a physical visit or by phone or email, and then read the book at home while online. Only one subscriber can read a particular book purchased by a library at a time. A quick look at the netLibrary titles available through the Hong Kong Public Library (http://www.hkpl.gov.hk/01resources/ebooks_subject/sub_education.html) showed them to be more up-to-date than those at Questia, but that may merely reflect the librarians’ choices.

The netLibrary site claims that more than 5,500 libraries and organisations are using this service, including Los Angeles Public Library, Hong Kong Public Library and the National Library Board of Singapore, and that it has ‘more than 40,000’ eBooks available. Yarra Plenty is an Australian regional public library which has adopted the netLibrary system; their largely positive experience is described at http://www.alia.org.au/conferences/online2003/conferencepapers/lewis_saunders.htm

Without access via a subscribing library, I was unable to view a book on the screen and assess the reading experience, but presumably the difficulties of reading while online on a PC screen will apply to netLibrary as much as to Questia.

If netLibrary succeeds for Australian public libraries it will represent a major boost out of public funds for netLibrary itself and the publishers it represents. Should this be allowed to happen without any scrutiny of the costs involved in supplying eBooks, and how these compare with costs for paper books? Or, to put it bluntly, why should one eBook from netLibrary cost more than 6,000 from BlackMask?

Conclusion

These paid elibraries have managed to collect in a few years more than ten times the number of eBooks which Project Gutenberg has accumulated since 1975, and considerably more than any distributor of downloadable eBooks. Whether they are approaching ‘critical mass’ and whether the economic climate will support their further growth remains to be seen, but perhaps the library model does have a future. If broadband becomes commonplace and these ‘elibraries’ can cater for PDA-based readers then they have a chance of achieving the eBook Holy Grail by satisfying both readers and publishers. We live in interesting times.