New technologies, old books

by Admin on May 24, 2013 · Statistical analysis, The Book Trade

Rare Books Digest occasionally hosts opinions and views of international book trade professionals such as this week’s contributor, technology engineer, book collector, and book dealer – Jim Sekkes (

Since the invention of the printing press in the middle of the fifteenth century, an estimated 130 million books have been written and published. It took Google eight years to scan over 25 million of them, or roughly 20% of the available books. The book-scanning project was recently extended to include Optical Character Recognition (OCR), which converts scanned images into text. Once digitized, the books may ultimately be indexed and analyzed, a process referred to as the datafication of contents.

At the same time, Amazon is leading a similar initiative, digitizing millions of new books through an arrangement with hundreds of publishers for its Kindle e-reader. These are not simple scans of pages but active digital content, consumed by applications that present it intelligently and offer controls to customize, extend and integrate it with other devices or web apps.

The hardware side of technological innovation is also expanding rapidly. Newly developed, super-fast book-scanning robots allow “book flipping scanning” and digitize at speeds above 200 pages per minute. Scanner automation offers multiple modes of operation to accommodate high-resolution image capture for different types of books.

But what does all this gee-whiz technology mean to the rare book trade? High speed or not, complete book scanning has the potential to revolutionize the functions of cataloging, searching, comparing and trading rare books. In theory, if every available copy of a particular rare book were digitized (not just a single copy, as in Google’s initiative), then all kinds of trade professionals could conduct the majority of their functions electronically. Beyond buying and selling, a seller would be able to offer a close examination of the whole book to anyone worldwide, without the risk of damage to or loss of the book. A buyer on the other end should be able to compare all copies available for sale, as well as those held by institutions and not offered for sale.

In addition to the usefulness of OCR software, the book trade may also benefit from software that performs fast pixel-to-pixel comparisons. By comparing images, a buyer or researcher can spot both minor and major differences between two copies of the same publication. Image comparison shows those areas where pixels do not match, using a “deep” mode that detects variations in text or illustrations, or a “surface” mode that detects page loss or any added content. The level of comparison can be adjusted between these two modes by setting the level of transparency, which in essence leaves mismatched pixels undetected up to the desired level. The ideal software is also equipped with zooming capabilities, for a closer look at the differences identified.
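To make the idea concrete, here is a minimal sketch of a pixel-to-pixel comparison in Python. It is an illustration only, not a description of any particular product: the function names, the tiny hand-made "pages" and the `tolerance` parameter are all assumptions. In particular, `tolerance` is one possible reading of the adjustable level described above, where small differences are ignored and only larger ones are reported. Real scans would also need to be aligned and scaled to the same dimensions first, which this sketch does not attempt.

```python
from typing import List, Optional, Tuple

# A scan is modeled as rows of grayscale pixels, 0 (black) to 255 (white).
Image = List[List[int]]
Point = Tuple[int, int]


def mismatched_pixels(a: Image, b: Image, tolerance: int = 0) -> List[Point]:
    """Return (row, col) positions where two scans differ by more than `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("scans must have the same dimensions")
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(a, b))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if abs(pa - pb) > tolerance
    ]


def bounding_box(points: List[Point]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest (top, left, bottom, right) box around the points, e.g. for zooming in."""
    if not points:
        return None
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return (min(rows), min(cols), max(rows), max(cols))


# Two hypothetical 3x4 "pages": copy_b has faint discoloration at (0, 1)
# and a dark ink mark at (2, 3) that copy_a lacks.
copy_a = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
]
copy_b = [
    [255, 240, 255, 255],
    [255,   0,   0, 255],
    [255, 255, 255,  10],
]

strict = mismatched_pixels(copy_a, copy_b, tolerance=0)    # every difference
lenient = mismatched_pixels(copy_a, copy_b, tolerance=32)  # ignores faint changes
print(strict, lenient, bounding_box(strict))
```

With a tolerance of zero, both the faint discoloration and the ink mark are reported. Raising the tolerance hides the discoloration and leaves only the ink mark, which is the kind of adjustment a researcher might make to separate paper aging from genuine differences in content.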

The bibliophile equipped with such tools can easily get answers to questions such as: How many copies survive today? Is the published content identical? Are there any pages or fragments of pages missing? Are there any handwritten annotations, signatures, stains or discoloration? And so forth, allowing the entire process of rare book research to become much more precise.

In the antiquarian book world, added remarks, inscriptions or signatures are not necessarily either a flaw or a desirable trait. Unlike the digital version, the embedded handwritten notes, sketches, bookplates, corrections, inscriptions and other marks represent text or images that have a life in the physical world. The historical reconstruction of how previous readers, some of whom were important during their lifetime, thought about or perceived a particular work can be extremely valuable to a collector.

Just this week, Sotheby’s held an auction in London titled “First Editions Second Thoughts,” consisting of fifty contemporary first edition books annotated by their authors. The total sale reached £439,000 ($665,410). A unique copy of “Harry Potter and the Philosopher’s Stone,” annotated by J.K. Rowling with 43 pages of “second thoughts” commentary, alone sold for a record £150,000 ($227,421).

Such unique features carry important information, yet they are extremely difficult to analyze or compare electronically without the help of human intelligence. While new tools provide drastically improved capabilities to conduct visual inspections, comparisons and statistical analysis, there is still a great deal of research that can only be conducted by an expert human being, through a good old physical examination. Technology is nowhere close to detecting the signals perceived by the human nose or sense of touch. There is still a need to touch and smell the book in order to get an intelligent read on its composition, texture, stability and smell (good or bad).


Alan Culpin May 27, 2013 at 4:43 pm

Enough of the digital book already!
Yes, they may be a huge boon to researchers, but to those of us who love real books, this is of little interest. The publicity garnered by the Kindle crowd has served to wash coverage of real books into the backwater of the book world.
Fortunately there are enough people who also love real books to allow those of us who purvey rare and op books to the discriminating public to eke out a meagre living. As always.

