Book Liberator in the News

The Book Liberator got a writeup in Good magazine! I sent in hundreds of rambling words about the project, and Theo distilled them into a few pithy quotes. Thanks, Theo, for making me seem clever!

Posted in Uncategorized | Comments closed

Prototypes ahoy!

Last week, Ian and Winnie got all heroic with some tools, wood and plexi. The result is a couple of sweet prototypes, which we’ll be sending to the Decapod folks so they can hack software to process BookLib images.

In other news, I put a prototype design of the camera mount on Thingiverse. Ian’s original washer-and-bolt design was a little janky, and once we get the parameters right, we should be able to print the mounts quite cheaply.

We’re moving quite quickly towards a shippable kit. The cradle design is stable. We have dimensions for the plexi and the cube. We’re down to exploring two basic design paths (bent plexi vs. two flat sheets). Everything else about the prototype is in the detail stage.

There are photos of the wood hackery around, and I’ll try to post some soon.

Posted in Uncategorized | Comments closed

The people’s words

One of the biggest problems for groups like Project Gutenberg, who want to digitize and share our culture’s public domain works, is tracking down and confirming that a work is no longer under copyright. Gutenberg is not alone. Towards the end of last month I ran into an opinion piece on TeleRead arguing that Amazon is right to keep away from public domain books for this same reason.

In the United States we have a resource with authoritative records about which works are covered by copyright and which ones are in the public domain: the Library of Congress. Not only does the Library of Congress have authoritative records but, as the largest library in the world, it has physical copies of more works than any other institution. Unfortunately, the Library of Congress has no plans to digitize its collection. For those of us involved with book digitization, this is something of a sore topic.

So it was a great moment for me this morning to read that the Japanese National Diet Library, a close equivalent of our Library of Congress, is digitizing all its out-of-copyright works. Not only is the Diet Library digitizing and distributing the out-of-copyright works, it is also beginning a process of digitizing the portions of its collection still under copyright in order to preserve those works more easily against physical destruction.

Of course, if preservation is our goal, the true solution is obvious and has been known in this country since its founding:

"""
[T]he lost cannot be recovered; but let us save what remains: not by vaults and locks which fence them from the public eye and use, in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.
"""
-Thomas Jefferson
(Boyd, ‘These Precious Monuments of…Our History,’ pp.175-6)

Whatever the reason, it is great to see leading institutions take steps to share the public domain with the public.

Posted in In the News | Comments closed

Pushing Paper

Great piece up today by Paul Graham called Post-Medium Publishing:

Almost every form of publishing has been organized as if the medium was what they were selling, and the content was irrelevant. Book publishers, for example, set prices based on the cost of producing and distributing books. They treat the words printed in the book the same way a textile manufacturer treats the patterns printed on its fabrics.

Economically, the print media are in the business of marking up paper. We can all imagine an old-style editor getting a scoop and saying “this will sell a lot of papers!” Cross out that final S and you’re describing their business model. The reason they make less money now is that people don’t need as much paper.

Your “content” is ripe for unpulping.

Posted in Uncategorized | Comments closed

A Bit about Book Ripping

Digitizing your own books

The Book Ripper community, bkrpr.org, came together to take the difficulty out of digitizing books. Unlike music, movies, or even loose paper, books have proved surprisingly difficult to break out of their analog format. Very complicated robotic scanners, costing tens of thousands of dollars, have been built to address this problem, but their size and cost make them practical only for large institutions, leaving individuals who want digital books at the mercy of book publishers.

As it turns out, digitizing your books is not hard. The advances in small cameras make it possible to achieve high quality results cheaply and at a rate of 600-900 pages per hour. That is what we do at bkrpr.org and there are a number of advantages compared to getting your ebooks from publishers.
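To put that rate in concrete terms, here is a rough sketch of how long one book might take. The 600-900 pages per hour range is the figure quoted above; the 300-page book and the function names are made up for illustration, and the estimate ignores setup and post-processing time.

```python
def scan_minutes(pages: int, pages_per_hour: float) -> float:
    """Minutes needed to photograph `pages` pages at a steady rate."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return pages * 60 / pages_per_hour


def scan_time_range(pages: int, slow_rate: float = 600, fast_rate: float = 900):
    """Return (best case, worst case) minutes for the quoted rate range."""
    return scan_minutes(pages, fast_rate), scan_minutes(pages, slow_rate)


if __name__ == "__main__":
    # Hypothetical 300-page paperback, chosen only as an example.
    best, worst = scan_time_range(300)
    print(f"A 300-page book takes roughly {best:.0f} to {worst:.0f} minutes")
```

Under those assumptions, a single evening is enough for several books.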

Cost

The most impressive advantage is cost. For people who own books already, getting digital copies of those books from publishers is an expensive prospect. Commercial ebooks have no commodity price and prices can vary wildly by publishing outlet, but let’s assume a $10 price for each ebook. The book ripper design we use costs around $250, which includes the price of two small point-and-shoot cameras. If you own more than 25 books, building a scanner will be cheaper than buying electronic editions. For those of us who own hundreds or thousands of books, the math becomes obvious.
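The break-even arithmetic can be sketched in a few lines of Python. The $250 scanner cost is the figure from this post and the $10 ebook price is the assumption stated above; the function names and the 500-book library are hypothetical, and the sketch ignores the value of your time.

```python
import math


def break_even_books(scanner_cost: float, ebook_price: float) -> int:
    """Number of books at which the scanner has paid for itself."""
    if ebook_price <= 0:
        raise ValueError("ebook_price must be positive")
    return math.ceil(scanner_cost / ebook_price)


def savings(num_books: int, scanner_cost: float, ebook_price: float) -> float:
    """Money saved by scanning instead of buying (negative if buying is cheaper)."""
    return num_books * ebook_price - scanner_cost


if __name__ == "__main__":
    SCANNER_COST = 250.0  # from the post: includes two point-and-shoot cameras
    EBOOK_PRICE = 10.0    # assumed flat price per commercial ebook

    print("Break-even at", break_even_books(SCANNER_COST, EBOOK_PRICE), "books")
    # Hypothetical library of 500 books.
    print("Savings on 500 books:", savings(500, SCANNER_COST, EBOOK_PRICE))
```

Every book past the break-even point saves the full ebook price, which is why the gap widens so quickly for large libraries.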

Control

In the wake of Amazon’s memory-holing of George Orwell’s works, its retroactive disabling of the text-to-speech capabilities on new readers, and the continuing industry-wide obsession with DRM, control over your ebooks has been gaining visibility as an issue in the digitization of our vast printed catalogue. With publisher-made ebooks, the publisher controls which devices can read them, what software can do to them, where they can be stored, how many times you can download them, and how long you have access to them. Doubts about whether the closed formats publishers sell will even remain readable run so strong that some people suggest insurance as a way to cover your losses when your digital copies disappear.

The books that you convert, you control.

Authority

Of course, the illegal distribution channels release everything in free formats, and release it all for free, so they would seem to be ahead of publishers on both fronts while taking much less effort than home book ripping. Where the illegal copy market falls short, besides the obvious issue of copyright infringement, is in the reliability of its versions.

Illegal copies are known for typos and OCR errors, lack of text and page formatting, and spotty availability of works. Unfortunately, legal ebooks are known for these same things. Neither can be relied upon as an authoritative representation of the author’s work and neither offers any way to verify or improve the accuracy of the digital work other than by reference to the printed one.

In contrast, when you scan the books yourself, you retain high quality images of every page. Viewers and other tools will let you jump back and forth from the text to the image versions. OCR can be corrected over time or re-run with better software and formatting can be added or corrected, but only if you have the page images.

Until digital distribution becomes the original and authoritative method of book publishing, as it has for the web, having the page images will remain the only way to guarantee or improve the accuracy of your digital books.

Because you love your books

If you love your books, if you care enough about them that you need every word to be right and you want the digital copy to be as beautiful as the paper one, you should scan them yourself. If you don’t care that much about a book, the publishers’ copy or the illegal copy may be all you need, or you might be better off cutting the spines off your existing books and feeding them through a high speed USB scanner. You can always recycle the pages afterwards.

If nothing but the best will do, or no other options are available, come on over to bkrpr.org and see how easy it is.

Posted in Uncategorized | Comments closed

The paper analog

Yesterday’s EFF deeplinks blog picked up our DRM exploit post from March and linked it with a paper from Microsoft security engineers arguing that DRM is doomed to failure.

Now might be a good time to take another look at just how big this analog hole is, since, however big it is, Amazon’s new 9.7″ screen is about to make it a lot bigger.

Posted in In the News | Comments closed

Building in Parallel

We’ve talked about other efforts to digitize books before, but now things are getting a lot closer to home.

Yesterday, a group of three grad students posted an instructable on how to build a book scanning device using a similar V-shaped cradle, the same camera model, and for about the same price as our design. They even wrote free software to do the image processing! It looks like our two efforts have a lot to offer each other and there is some great interest on both sides in pooling our knowledge to come up with a best of both worlds design, so there should be some great new stuff once the relevant grad schools are done with finals.

Posted in Uncategorized | Comments closed

6 Ways to Save Publishing

Michael Tamblyn, the CEO of BookNet Canada, presents 6 ways technology can improve the world of paper book publishing. For me, the most interesting tidbit in that video is that 99.5% of Canadian book sales are paper books (I suppose this is as opposed to digital or audio books). And if you’re interested in how book publishers survive as that percentage decreases, there are some good thoughts in this presentation.

Here are the 6 ideas:

  1. The ability to pull metadata from the cloud
  2. More and better markup. “An XML workflow that doesn’t suck”
  3. Sexy mobile reading devices
  4. Smarter, individualized comprehensive sales information in catalogs
  5. Online browsing that makes you want to buy
  6. Embrace startup culture

Via Boing2

Posted in Uncategorized | Comments closed

Grouping

We’ve got a Google group!

Now that a number of people are in the process of building their own book rippers, we need a place where we can pull together the multiple discussions we’ve been having over email and in other forums. The Google group should do that very nicely. Come on by.

For those of you keeping track at home, we went with a Google group rather than the mailing list capabilities built into Launchpad, our code host, for a number of reasons. Mostly, everything on Launchpad is impossible to find, from the mailing list itself to whether or how you can use OpenID accounts (as our wiki does). Google makes the important things the easiest to find.

Posted in Uncategorized | Comments closed

Universal e-book DRM exploit discovered!

Sort of.

There is a common joke in e-book and book digitization circles that paper is the original, and most effective, DRM available for books. This is mostly true, though anyone with a sheet-fed scanner quickly learns that it is a book’s binding that stops it from being easily digitized, not its paper. Yet, wherever things currently stand in the race between publishers releasing things in restricted formats and the public breaking those formats open to get at the tasty text inside, we can be sure of this: getting around the DRM on an e-book will never be more difficult than treating it like paper and photographing it normally.

When I tried just that using my book ripping camera mount, I found that the OCR rates were almost identical to those for the matching paper book, the page turning effort was reduced to almost nothing, and the speed was a comfortable ~750 pages/hour, where a “page” is the screen size of your e-reader.

So to anyone out there working on e-book DRM: I’ve got enough paper books (with bindings!) to scan already. Why not spend your effort on something more productive? Because this paper-DRM is only going to get even worse.

Posted in Uncategorized | Comments closed