Books have the same enemies as people: fire, humidity, animals, weather, big business and their own content.
April 23, 2009 7:27 AM

Build a DIY non-destructive book scanner for under $300. An open source OCR package. A gratis ebook creation tool. An open source ebook library management tool and reader. An open-source Linux distribution for eink-based devices. And many, many ebook readers.
posted by bigmusic (84 comments total) 72 users marked this as a favorite

 
The DIY book scanner is by "metafilter's own" fake.
posted by bigmusic at 7:29 AM on April 23, 2009


This is the second text related item of note I've seen today. Awesome.

The first was this.
posted by oddman at 7:49 AM on April 23, 2009 [1 favorite]


The DIY book scanner is by "metafilter's own" fake.
for real? in that case, my friend, the scanner should be marketed under the name genuine faux.
posted by the aloha at 8:27 AM on April 23, 2009


But you have to turn the pages by hand? Aww man...
posted by Eideteker at 8:28 AM on April 23, 2009


This post is incomplete without quality text-to-speech.
posted by DU at 8:29 AM on April 23, 2009


It's a nifty system, the book scanner -- but I want to see the homebrew robotic version. According to its inventor, automatic page turning is not significantly faster (vid) than manually turning pages. Perhaps true, but the automation part means you don't need to stand over the thing flipping pages until your entire book is scanned.
posted by ubermuffin at 8:31 AM on April 23, 2009


In related book-meets-technology news:
Finnish book rental service Bookabooka is being threatened by national copyright lobby organization TTVK for running a service the lobby group calls "Pirate Bay for textbooks".

Bookabooka doesn't host any e-books on its site, but instead allows students to rent their textbooks to their peers. Renting is conducted via traditional "snailmail" (i.e. postal service) and it is mandatory that the textbooks are originals (not xeroxed copies). Bookabooka acts only as an intermediate, connecting the students together and doesn't handle the shipping or returns of the textbooks.
I hope they don't decide to target BookMooch next. I'd rather my $5/mo donation not be gobbled up by a legal defense fund.
posted by odinsdream at 8:40 AM on April 23, 2009


I'm more interested in seeing a working version of the utterly destructive book scanner depicted in Rainbow's End.
posted by CaseyB at 8:42 AM on April 23, 2009


I recently experimented with kindle-izing some antique juvenile fiction books by just putting them on the kitchen floor, holding the pages open with one hand, and camera in the other hand, on min aperture, with flash. OCR software today can do amazing things.
posted by nomisxid at 8:55 AM on April 23, 2009


Or you could get one of these and save $50 and a lot of time.
posted by Toekneesan at 9:00 AM on April 23, 2009


Toekneesan, that product would involve actual scanning and waiting for the scan. By the time you'd scan 10 pages with that machine, the DIY scanner would have done probably 5x that many.
posted by bigmusic at 9:05 AM on April 23, 2009


Thanks for the mention, bigmusic. It is cool to see this on the front page, and your post title is absolutely killer.

I'd like to make an offer -- if any MetaFilter users have a rare, interesting, public domain or out-of-copyright book they'd like scanned and shared back here -- send me a message.

Jonson and I did that once (but he scanned the book), and it resulted in a lot of great discussion. I wouldn't mind doing it again with my new system.

If anybody has any questions about the scanner or plans to build one, please let me know. This is actually the second one I've built, and I just delivered it to Aaron, the guy who wrote the PDF software for it. Seeing it set up and running in his apartment was just cool. Several other people are building them now and we're rushing to improve the software.

Re:OpticBook, I have used one, and have several friends with them. They do not compare in speed, and the software totally sucks. If you think standing in front of this thing for 10-25 minutes is bad, well, try 3 hours bent over that thing. You can build my scanner in a few evenings, especially if you're not totally broke (as I am) and you don't need to hunt for materials.

I'm looking forward to a future full of free books. My first big project is scanning books that help you make book scanners -- first some books on basic hand tools, next up will be power tools, and then basic electronics and cameras... I'm halfway through a stack of ten.
posted by fake at 9:12 AM on April 23, 2009 [3 favorites]


PS. Is that title yours, or did it come from somewhere? I think I want it on a banner over my scanner. ;)
posted by fake at 9:13 AM on April 23, 2009 [1 favorite]


Thank you bigmusic, this is an awesome post. I've only recently gotten into ebooks and have been looking for tools to make my own. Very timely!
posted by gofargogo at 9:16 AM on April 23, 2009


It's a corrupted quote.
posted by bigmusic at 9:17 AM on April 23, 2009


I forwarded the DIY scanner link to my library director only half jokingly saying that it would make a good summer project here at the library. We're going Kindle and eReader crazy here after all, so...
posted by robocop is bleeding at 9:23 AM on April 23, 2009


How in the Hell could an ebook reader NOT support .PDF in 2009?
posted by ZenMasterThis at 9:26 AM on April 23, 2009


Can we rebind some of those ebook readers in something other than black, white, or gray plastic?
posted by gyusan at 9:38 AM on April 23, 2009


Re: PDFs - the problem with ereaders and PDF support is that PDFs were intended to display things as they would be printed on a page. With ebook readers, the text needs to be reflowable (which was recently implemented by Adobe for PDFs) so that the text can be resized easily in the readers. PDFs generally have each line of text in a "div" so that it is its own entity, which makes "reflowing" a PDF problematic. There are a few ebook readers out now that do have PDF support, but it's still problematic on most of them because margins are maintained, and those that implement PDF support in ebook readers haven't seemed to figure out that margins should automatically be really tiny when reading with an ebook reader - or they haven't figured out how to do it easily.
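
To make "reflowing" concrete, here is a toy sketch in Python. It is not how Adobe or any reader actually does it, and it ignores hyphenation, columns, and paragraph detection. It assumes the page's lines have already been extracted as plain strings; `reflow` and `page_lines` are made-up names for illustration.

```python
import textwrap

def reflow(fixed_lines, width):
    """Join print-layout lines into one paragraph, then re-wrap to `width` characters."""
    words = []
    for line in fixed_lines:
        # Each fixed line is its own unit; throw away its line break.
        words.extend(line.split())
    return textwrap.wrap(" ".join(words), width=width)

# Lines as they might sit on a wide printed page (illustrative text only).
page_lines = [
    "Books have the same enemies as people:",
    "fire, humidity, animals, weather, big",
    "business and their own content.",
]

# Re-wrap for a hypothetical narrow screen of 24 characters.
for line in reflow(page_lines, width=24):
    print(line)
```

The hard part in a real PDF is everything this skips: deciding which lines belong to the same paragraph in the first place.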
posted by bigmusic at 9:41 AM on April 23, 2009


You don't necessarily want PDFs on an e-book reader. Most PDF files are not reflowable, which makes it hard to display them on the small screens available on readers.

Wonderful post! I've got a Sony PRS-505 and I was just wishing
posted by rdr at 9:42 AM on April 23, 2009



How in the Hell could an ebook reader NOT support .PDF in 2009?

It's a SONY...
posted by yoyo_nyc at 9:45 AM on April 23, 2009


So maybe I'm just being stupid, but where's the software that post-processes the images from the cameras? Is that one of the things eCub does? It seems to just go from XHTML to an ebook format, and Tesseract is an OCR engine.

The output from the scanner is going to be two digital cameras' worth of files; one with all the left pages and one with all the right. So at some point you need to take those files and get them into the correct order, and then assemble them into an image-PDF or send them for OCRing. While I guess that's not really hard (you could do it with some shellscripts easily) I'm curious how it was accomplished.
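
A rough sketch of that reordering step in Python, not the project's actual software. It assumes each camera names its files in shooting order (so sorting by name gives shooting order) and that the left folder's first shot comes before the right folder's first shot in reading order. The function names and the `IMG_0001.JPG` style filenames are hypothetical.

```python
from itertools import zip_longest

def interleave_pages(left_files, right_files):
    """Return files in reading order: left 1, right 1, left 2, right 2, ..."""
    ordered = []
    for left, right in zip_longest(sorted(left_files), sorted(right_files)):
        # zip_longest pads the shorter list with None if one camera missed a shot.
        if left is not None:
            ordered.append(left)
        if right is not None:
            ordered.append(right)
    return ordered

def numbered_names(ordered, ext=".jpg"):
    """Pair each source file with a sequential page name such as page_0001.jpg."""
    return [(src, f"page_{i:04d}{ext}") for i, src in enumerate(ordered, start=1)]

left = ["left/IMG_0002.JPG", "left/IMG_0001.JPG"]
right = ["right/IMG_0001.JPG", "right/IMG_0002.JPG"]
for src, dst in numbered_names(interleave_pages(left, right)):
    print(src, "->", dst)
```

Once the files carry sequential names, building an image PDF or handing them to an OCR engine is a matter of processing them in name order.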

Anyway, really awesome project. I thought the way the bookholder slides horizontally, with the platen sort of pushing it and keeping the open pages in the same position, was very clever. (Is that a novel solution or is it how the commercial scanners work? I've never seen it done that way before.)
posted by Kadin2048 at 9:54 AM on April 23, 2009


Yeah, I sold my PRS-500 because the software was so bad. I am still hoping someone will recognize the huge academic market for an e-reader that can display A4/letter paper and has a stylus. I ended up with an old Toshiba M200 tablet. Not perfect, but usable until epaper grows up.
posted by fake at 9:55 AM on April 23, 2009


Hey Kadin2048, it's on step 78. Aaron wrote it in Matlab. You'll find it at the end of the instructable, in part because it's alpha-quality right now, and in part because it's really only useful if you have built one of these things. We include source code. We need help if you're good with that kind of thing.

If anyone wants to read the whole thing, I highly recommend getting an instructables account and downloading the PDF, it just makes things so much easier.

I'm not sure how other scanners handle the center shifting problem, but it wouldn't surprise me at all if this wasn't a new idea.
posted by fake at 9:59 AM on April 23, 2009


Great post.

The Sony PRS-505 does PDFs fine, albeit with somewhat limited screen resolution and slow page redraws. I use mine with ReaderPlates, which I think is a great application, because the paper equivalent would be much more expensive and massive, and the page doesn't need to be redrawn often.
posted by exogenous at 10:58 AM on April 23, 2009


I'd really like to do a non-destructive scan of one of my favorite old book finds: Obstetrics and Womanly Beauty (blue binding, if you're curious), a late-1800's book covering marital advice, home childbirth instructions, use of cosmetics, and related subjects. I'm not sure what the relevant copyright expiry is, but it can't be far off if it's 150 years or less.

I've considered using a service (I probably first saw on the blue) that offers non-destructive scanning and conversion to ebook, but it ran around $250-$300 IIRC, and I have yet to have that kind of spare money available for a pet project.
posted by notashroom at 11:03 AM on April 23, 2009


Awesome post. I'll post a fledgling project for interested parties that may not have seen it yet:

Bkrpr
It's in python, so it's cross-platform (although very early alpha now), using Tesseract (and maybe Ocropus in the future).
It's just getting off the ground now, and I'll hopefully be contributing some code to it in the future, but they have a very cool little take on the hardware aspect of scanning. I've recently built my own scanner similar to fake's, after my Opticbook crapped out on me. (Do not buy an Opticbook. The software is terrible and the bulb burnt out on mine after 1.5 years. You can buy two new 8MP point-and-shoots for less than the Opticbook and you'll be much, much faster.) I'm now scanning books at ~150 pages an hour, which is a lot faster than the Opticbook.
posted by i_am_a_Jedi at 11:14 AM on April 23, 2009


This post is incomplete without quality text-to-speech.

There's a wiki page for that. Want to know more of the long history of speech synthesis? There's a wiki page for that. More interested in screen readers? There's a wiki page for that, too.

Take THAT, iPhone application ads!
posted by filthy light thief at 11:21 AM on April 23, 2009


notashroom, if you will trust me with your book, I will scan it for you at no cost, as long as you don't mind posting it here. ;)

give us a month or two to get the software perfected.

i_am_a_Jedi, do you have any pics of your device? I'd love to see your innovations. Thanks for mentioning the bkrpr guys, they seem a great bunch and have done some nice work.
posted by fake at 11:59 AM on April 23, 2009


> I am still hoping someone will recognize the huge academic market for an e-reader that can display A4/letter paper and has a stylus.

PlasticLogic have a roughly A4-sized reader coming out in "late 2009". It was originally scheduled for "early 2009", so beware slippage. It's targeted at the business market - promised support for .pdf, .doc and many other shiny features.

The Irex Iliad has quite a big (touch)screen and, I've heard, handles complex .pdfs well. It's pretty expensive though - around £400 in the UK, equivalent to a cheap or midrange laptop.

You're right about the demand - I've skimmed through the mobileread fora a few times and there's a constant trickle of people popping up to ask about support for image- and table-heavy A4 pdfs. I'm writing my thesis at the moment, and the number of papers I've printed out to take home, to conferences or just to take a rest from the screen is comical.
posted by metaBugs at 12:01 PM on April 23, 2009


fake, I definitely intend to share the book on the blue (as well as the internet as a whole), as long as I'm not subject to some sort of copyright violation lawsuit. I appreciate your generous offer, and have sent you a MeMail. Thanks.
posted by notashroom at 1:41 PM on April 23, 2009


i_am_a_Jedi, do you have any pics of your device?

Not with me, but if you look at the bkrpr guy's implementation (the 90 degree angle sheet of plexi with the cameras mounted to it), mine is similar. I made an inverted V-shaped frame from wood, mounted two sheets of lexan (polycarb) to the underneath, and then mounted the cameras onto the wooden frame (similar to the bkrpr guys). Of course, there is an overhead light like in your implementation. I've just been using plain lightbulbs, but I'm going to have to investigate using halogen bulbs (thanks for the tips!). I like mounting the cameras to the moving frame, because it ensures that the distance to the page is the same as you traverse the book. And also I have a small 90 deg book stand that the book sits on during scanning. I just take pictures right onto SD cards in the cameras and dump those to the computer to do OCR.
posted by i_am_a_Jedi at 1:49 PM on April 23, 2009


The instructable for fake's app leaves out any discussion of it, but I'm gathering that after all the shooting there's some process whereby the images are taken out of the cameras and put onto a PC.

I dunno if it matters to you, fake, but you might consider using gphoto2 to fire off the cameras. You'd remove the need to build a triggering gadget (and presumably the need to use Canons and that specific BIOS replacement) if you just had a little script that fired the two cameras directly. gphoto2 has a pretty large quantity of supported cameras too.

The command I use in my photobooth project builds a filename that uses the date [sprintf("%02d%02d%04d%02d%02d%02d", localtime->mon+1, localtime->mday-1, localtime->year+1900, localtime->hour, localtime->min, localtime->sec);], but you could just as easily increment a name.

gphoto2 --capture-image --frames=1 --interval=1 --filename filenamehere.jpg

(on the ver of gphoto2 I am using there's a bug that if you don't specify a frame and interval it won't let you specify a filename)

Alternately gphoto2 will work in tethered shooting mode, so you could trigger the cameras another way and get the images auto-downloaded but you couldn't use the usb trigger then...
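
For anyone who'd rather script this than type it, here is a small Python sketch of the two naming schemes (date-stamped and incrementing) wrapped around the same gphoto2 flags quoted above. It only builds and prints the command lines and does not run them. The flags are copied from this comment rather than checked against any particular gphoto2 version; the function names and the `usb:001,004` / `usb:001,005` port strings are placeholders for illustration, and picking a camera with `--port` is an assumption about how a two-camera setup would be addressed.

```python
from datetime import datetime

def timestamp_name(now, ext=".jpg"):
    """Month, day, year, hour, minute, second, e.g. 04232009142605.jpg."""
    return now.strftime("%m%d%Y%H%M%S") + ext

def counter_name(side, index, ext=".jpg"):
    """Incrementing alternative, e.g. left_0007.jpg."""
    return f"{side}_{index:04d}{ext}"

def capture_command(filename, port=None):
    """Build (but do not run) a gphoto2 command line as a list of arguments."""
    cmd = ["gphoto2"]
    if port is not None:
        # Placeholder way of choosing one of two attached cameras.
        cmd += ["--port", port]
    # Flags as quoted in the comment, including the frames/interval workaround.
    cmd += ["--capture-image", "--frames=1", "--interval=1", "--filename", filename]
    return cmd

# Hypothetical port strings; real values depend on the machine and cameras.
for side, port in (("left", "usb:001,004"), ("right", "usb:001,005")):
    print(" ".join(capture_command(counter_name(side, 1), port=port)))
print(" ".join(capture_command(timestamp_name(datetime.now()))))
```

Each argument list could be passed to `subprocess.run` once the flags are confirmed against the installed gphoto2.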
posted by phearlez at 2:26 PM on April 23, 2009


I've just been using plain lightbulbs, but I'm going to have to investigate using halogen bulbs (thanks for the tips!).

Home Depot sells some insanely cheap halogen worklights for $5. They're 250 watt and about as big as a box of Pop-tarts. It's pretty simple to remove the wire anti-flaming-death guards if they throw shadows.
posted by phearlez at 2:29 PM on April 23, 2009


phearlez, at the moment I simply copy all the images from the left camera into a folder called "left" and all the images from the right camera into a folder called "right". I think I explained it in the Page Builder video, but that video desperately needs to be done again. I was having real problems with my screen capture software and I was rushing to meet the contest deadlines. please vote!

I'm on the gphoto mailinglist for just this sort of thing. For the moment, I like the ease (and speed) of stereodatamaker+a trigger, but I can see the real elegance in your approach (other than that it requires adding a PC to the whole mechanism). The speed of transfer over USB does worry me, but I might be wrong about that. I'll take it into account for the next version of the scanner.

Re:halogens, get the outdoor ones with the little bulbous lenses. They make a great diffuse lighting pattern in a short distance. Four of them would be ideal, but two are not bad with a little photometric correction (invert a blank page and add it back to a normal page, and you'll remove any lighting effects -- we do it in our little proggy). We'll be correcting lens distortion later.
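
Taken literally, "invert a blank page and add it back" looks something like the sketch below. This is a plain-Python illustration on tiny grayscale grids (0 = black, 255 = white), not the Matlab code from the instructable, which may do it differently. `correct_lighting` and the sample values are invented for the example.

```python
def correct_lighting(page, blank, white=255):
    """Add the inverted blank-page image to the page image, clipping at white.

    `page` and `blank` are same-sized grids (lists of rows) of grayscale values.
    Wherever the blank page came out darker than white, the page is brightened
    by the same amount, which evens out the lighting falloff.
    """
    corrected = []
    for page_row, blank_row in zip(page, blank):
        corrected.append(
            [min(white, p + (white - b)) for p, b in zip(page_row, blank_row)]
        )
    return corrected

# A blank sheet photographed under uneven light: darker toward the right/bottom.
blank = [[250, 200],
         [240, 180]]
# The same spot with a printed page: one dark "ink" pixel at top right.
page = [[250, 60],
        [240, 180]]

print(correct_lighting(page, blank))
```

The bare paper pixels all come out at full white regardless of where the light fell off, while the ink pixel stays clearly darker than the paper.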
posted by fake at 2:44 PM on April 23, 2009


& I do really love the idea of not relying on Canon hardware...
posted by fake at 2:48 PM on April 23, 2009


> I am still hoping someone will recognize the huge academic market for an e-reader that can display A4/letter paper and has a stylus.

The Irex Digital Reader is pretty close. Not cheap though. 10.2" screen (compared to 6" on most other e-ink devices) and a stylus.
posted by markr at 5:11 PM on April 23, 2009


Since my children's welfare is somewhat dependent on publishing, I'm not sure I think a machine whose purpose is large-scale pirating is such a cool thing. Certainly an admirable engineering challenge well conquered, but good for the culture or my kids?

That said, I'm also not sure how practical this is. If you want a book digitally and want to make your own copy, do you really need to take up so much space, time, and effort? Wouldn't the Opticbook I linked to above do the trick? Sure the software it comes with is a little hinky, but there are other options for the output. Low cost and Open Source options. If you know a little scripting, you could train a monkey to do it. And at least you get to sit down when you use it.

Anyway, sorry I can't jump on the "awesome" bandwagon. While it is really neat, why don't you donate it to a library or not-for-profit publisher who could then use it to help get their own content on the market or into circulation digitally? Remember, tons of books aren't available, not because publishers are greedy idiots, but because they were produced in an era that composed pages -- edited, designed, typeset, and proofed -- using actual paper. Most American book content isn't owned by its publisher in a digital format. Google owns a lot of it, but the publishers themselves have only been creating books digitally for about 15 years. Everything before that going back to Gutenberg is only available if the markets and the rights owners, including the third-party rights owners, properly align.

Sure, I understand the impulse to digitize this content, but remember these actions have consequences. And creating the means to do it on a massive scale is revolutionary, but revolutions, like any war, always have greater consequences than intended.
posted by Toekneesan at 7:05 PM on April 23, 2009


Umm, did you read my Instructable or watch the video? My point is exactly to move old, unprofitable, public domain, and not-commercially viable books into the digital domain. Not to mention personal "books" like journals, photo albums, and family recipe books. There exists an incredible mass of books that are pre-digital and pre-copyright (or never were copyrighted, as in the case of the books I offered in my instructable), and they are exactly our cultural heritage.

Maybe you're comfortable with Google being in charge of all that, but I'm not.

As for donating it to a library, that's just asinine. I'm a graduate student. $300 bucks is about a third of my take-home. I provided complete instructions for anyone to build one, and free software (currently being updated) to process the output. If that's not enough for you, well, we have different ideas.

If you're writing books now, which I assume is why you keep mentioning kids, your books have nothing to do with my scanner. They are already electronic and it's up to you to release them electronically. I could care less about that kind of book -- I'm interested in the vast spread of knowledge that's already been created and is just waiting to be set free. Commercial desktop scanners suck for that purpose, and you'd know this if you'd ever tried to use one, especially the one you linked.
posted by fake at 10:51 PM on April 23, 2009 [1 favorite]


I work for a publisher. We publish scholarship. What your video suggests, and what the equipment might be used for aren't going to be the same thing and it's disingenuous to suggest it will only be used for "good". Your call to fix publishing's broken business model seems to imply something very different. You can't stop folks from using it on commercially viable content. The first book of ours I found on Scribd was our best selling title. It makes sense that those will be pirated first. The reason folks think this is so cool isn't because they have a desire to digitize mass quantities of family albums and recipe books.

No, I don't want Google to have this content exclusively. Which is why I'd love to see this machine in my university's library. I don't think that's asinine. I think that's an appropriate application. Why didn't you build this for the North Dakota State library? They might have then legally been able to digitize those damaged books you found in a dumpster. Sure, it wouldn't have been on Reddit, Digg, BoingBoing, or Metafilter, but it would have been much better for the future of the books you're concerned about.
posted by Toekneesan at 3:24 AM on April 24, 2009


I gave them, and everyone, including you, the exact plans to build their own. If they want one, $300-500 is nothing for them. They spent that much money just putting books in the dumpster.

I built a scanner for myself (and one for a friend), and I am entitled to it, and your suggestion that I give it away and not post the plans online is weird. These plans are for everyone, and I'm glad and humbled that they were posted in those places, so lots of people become aware that they can build and use them.

Of course there are infringing uses for my scanner. There are also infringing uses for your computer, and for the scanner you linked. Powerful tools can be used for good or for awesome. I am sorry to hear your book was copied, but I doubt it was with a scanner.

My lab pays per-page prices to publish in Open Access journals to keep our research/academic output free and open. Sounds like we're on opposite sides about a lot of things.
posted by fake at 5:04 AM on April 24, 2009 [3 favorites]


Give a library a scanner, and one library will have a scanner. Post scanner-making instructions, and libraries everywhere can make them?

Anyway, any scanner can be used for good or evil, whether it's commercial or DIY. And many people who scan public domain books privately do then post them online for public accessibility.

Regarding piracy, I wonder what would happen if you posted one of your less popular books on scribd. Would you find an effect on sales, and would it be positive or negative? This isn't to snark - I am honestly curious. (I would be surprised if many publishers carried out that experiment, though.)

fake, thanks for the instructions! I thought they could have been laid out more clearly: all the information was there, but spread out over many pages and so harder to follow. (I'm only commenting because you seem to care about presentation. I did read the PDF.) Cool project.
posted by mail at 5:15 AM on April 24, 2009


That'll teach me to preview. The first part of my comment was to Toekneesan (nice name, by the way).
posted by mail at 5:17 AM on April 24, 2009


Give a library a scanner, and one library will have a scanner. Post scanner-making instructions, and libraries everywhere can make them?

:D

Mail, your criticism is spot-on. The Instructables interface leaves a lot to be desired. Enough people have commented on it that I'm seriously thinking about making my own page with full-size images (they were authored at 1200x800), broken down in easy to read sections instead of picture-by-picture bits.

Think it's worth it? I won't have time to do it until mid-May (when the semester ends), but I hate leaving things in such an ugly state.
posted by fake at 5:30 AM on April 24, 2009


We do offer some of our books on the Web, totally for free. This series on Romance Language studies is available for download using an Open Access platform that we're working with Cornell to develop. You can buy a physical copy if you find that easier to read. I also offer at least one free chapter for about 400 of our books on our Web site. Again, I don't think I have a problem with the existence of the machine or the plans, I am concerned about your call to use it to fix publishing's broken business model. I'm pretty certain your call to action won't benefit books in the long run.

I also think it's not really practical to think that a librarian has the same skill sets you do, or the desire to dumpster dive for parts. I don't think these plans were really designed with libraries or librarians in mind. I think it's pretty clear they were created with pirating in mind.
posted by Toekneesan at 6:32 AM on April 24, 2009


I don't think these plans were really designed with libraries or librarians in mind. I think it's pretty clear they were created with pirating in mind.

I think it's pretty clear these were designed with preservation of old, private, and non-copyrighted materials in mind. I saw where fake said something along those lines at least 3 or 4 times in here, and altogether missed the part where he cackled in glee about his plan to undermine the entire publishing industry with his DIY scanner plans.

I know there are people out there who scan copyrighted materials and make them available to others, usually online, and that the publishing industry's position is that downloaded copies represent lost revenue.

That does not mean that fake's scanner is any more particularly suited to that end than any other scanner, such as the one you linked, which is utterly useless for projects such as mine when the materials are delicate and can't be simply plopped on a flatbed. Flatbed scanners are much more practical for scanning mass-market, recently-published material (in which bent spines, removed covers, bent pages are at worst inconvenient) -- the exact material most publishers are most concerned with fighting piracy of.
posted by notashroom at 6:57 AM on April 24, 2009 [1 favorite]


Fast forward to the end of the video where he advocates people putting their books on the internet or envisions a world where publishers are forced to rethink "their obviously broken business model".
posted by Toekneesan at 7:11 AM on April 24, 2009


"I also think it's not really practical to think that a librarian has the same skill sets you do, or the desire to dumpster dive for parts. I don't think these plans were really designed with libraries or librarians in mind. "

They were designed with everyone in mind. While a specific librarian may not have the skills I'd bet many do. And those that don't could hire someone. Does North Dakota State not have an engineering department? Universities and colleges tend to have excellent shop facilities and lots of students hanging around eager for extra credit.

"Fast forward to the end of the video where he advocates people putting their books on the internet or envisions a world where publishers are forced to rethink 'their obviously broken business model'."

This shouldn't be a newsflash but the current business publishing model of book distribution _is_ broken. As much as the telegraph.
posted by Mitheral at 8:47 AM on April 24, 2009


"their obviously broken business model".

I don't know if you remember, but a few years ago digital audio started coming on the scene and people were sharing their ripped CD's on P2P networks. So the music companies said "Oh, no" and when they started selling digital audio they loaded it up with DRM, rights restrictions, and licensing agreements. And people didn't like that, so not very much music was sold that way. But the pioneers - emusic and many many independent record companies and little sites like bleep.com - they knew what people really wanted - they wanted to be able to listen to their music on any device they wanted - that meant no DRM. And they sold music, and they did wonderfully. And now, every other place that sells digital audio online is getting it too. They are finally selling what the customer wants.

This is happening in publishing too, you know those pioneers? We have those too in the publishing realm - O'Reilly is a big one. But there are many others who sell ebooks without DRM. Many authors are publishing their own books and selling on their own website sans publisher. So is the model broken? Certainly. Companies are still selling things that people DON'T WANT. And in Amazon's case, by and large they are only selling ebooks in kindle format. What about everyone else's ereader? Screw em. How's that for a business model? A little broken?

Projects like this, they are going to only hurry us along the path to ebooks free of DRM. And that, my friend, is a very good thing.
posted by bigmusic at 8:52 AM on April 24, 2009


Why on earth would someone use this scanner on a physical version of a book that's already available in e-form? It's trivial to crack the encryption on .mobi/.azw files.
posted by nomisxid at 8:53 AM on April 24, 2009


You can't buy ebooks from amazon unless you have a kindle.
posted by bigmusic at 9:00 AM on April 24, 2009


Companies are still selling things that people DON'T WANT.

People usually don't buy things they don't want. If companies are indeed selling them, they are probably selling them to people who want them.

Look, you're barking up the wrong tree. I don't need a lecture about the music industry. I've been selling books for over thirty years and I'm familiar with the issues. I write a blog about them. I just find this presentation of plans for a book scanner for the proletariat a bit disingenuous. Gestures are made about this being for legal purposes, and then there's a sly turn of the head and a not so subtle wink. The video showed a dumpster full of scholarly monographs, and then made the claim that you could do something virtuous by digitizing them. Scholarly monographs are what I make and sell for a living. Many of the books in that dumpster were probably in copyright and published by a non-profit university press. I don't think you are doing scholarship any favors making your case for digitizing books this way. If you're not disseminating scholarship in a sustainable way, you are hurting the ability of non-profit publishers of scholarship to do their work. It might seem like a good idea, but it's not.

Did you know that the North Dakota State U. library could (and may have already) digitize those books completely legally and could then print replacement copies or even share them digitally with your university community? Copyright law actually contains such a provision. But that same right doesn't apply to you as an individual. If there is a fatal flaw in our business model, it's not greed, or ignorance of what happened to music, it's that we are working in a system that assumes everyone will obey the law and that everyone will respect the work of others. Besides anarchy, can you suggest a way to produce future scholarship that allows for Open Access? A system that includes vetting, editing, design, and distribution?
posted by Toekneesan at 10:39 AM on April 24, 2009


People usually don't buy things they don't want. If companies are indeed selling them, they are probably selling them to people who want them.

Complete bull. I don't have a choice about the textbooks I must buy (and do buy). Your argument ignores the relationship between academic publishers and university bookstores.

And sales aren't even relevant to my scanner. I'm talking about books the publishing industry won't even consider reprinting.

Did you know that the North Dakota State U. library could (and may have already) digitize those books completely legally and could then print replacement copies or even share them digitally with your university community?

I was on a committee that discussed this. They're not doing it because commercial scanners cost in excess of 10k, and that's before you pay "a monkey" (here I can only assume you mean a student) to operate them.

It's that we are working in a system that assumes everyone will obey the law


No, it is clear from your arguments and the way your industry treats people -- especially students -- that you expect them to act like criminals. You are not assuming the best.

You outright refuse all legitimate uses of my book scanner in the face of evidence. I've already provided public domain books with it. You expect that I am a criminal outright, and that, too, is weird and disingenuous.
posted by fake at 11:21 AM on April 24, 2009


the end of the video also pictures me working (with others) in Valley City, ND to save an entire basement full of books from an oncoming flood.
posted by fake at 11:23 AM on April 24, 2009


I just find this presentation of plans for a book scanner for the proletariat a bit disingenuous. Gestures are made about this being for legal purposes, and then there's a sly turn of the head and a not so subtle wink.

So in your opinion, if I scanned my thousands of books for personal use you would have problems with that?
posted by bigmusic at 11:24 AM on April 24, 2009


fake: "Complete bull. I don't have a choice about the textbooks I must buy (and do buy). Your argument ignores the relationship between academic publishers and university bookstores.

Doesn't the library offer course reserves? Are there no opportunities to buy or trade used? I agree there are problems with textbooks, but wouldn't scanning them be illegal?

And sales aren't even relevant to my scanner. I'm talking about books the publishing industry won't even consider reprinting.

How do you know why a book hasn't been reprinted? Is it possible that there are other reasons, like third party rights, that prevent it? You ascribe these things to intentional choices the publisher is making, but there could be reasons you are not aware of. If you want to help me bring the 300 or so art books of ours that are out of print and in copyright back into print, work to reform fair use. Those books are OP because the folks who own the rights to the images in the book either won't give us digital or reprint rights, or the cost would make the book so expensive no one could afford it. If there was a clear exemption for the use of images in published scholarship from non-profits, those books would be available again.

I was on a committee that discussed this. They're not doing it because commercial scanners cost in excess of 10k, and that's before you pay "a monkey" (here I can only assume you mean a student) to operate them.

Actually, the monkey is me. I own an OpticBook. And if you were on the committee, why didn't you suggest they roll their own scanner?

You outright refuse all legitimate uses of my book scanner in the face of evidence. I've already provided public domain books with it. You expect that I am a criminal outright, and that, too, is weird and disingenuous."

I don't think I'm refusing to believe there are legitimate uses, I think I'm trying to point out that the rhetoric in your movie is attractive to sites like BoingBoing and Metafilter because it implicitly encourages book pirating.

bigmusic: "So in your opinion, if I scanned my thousands of books for personal use you would have problems with that?"

It doesn't matter if I personally do or don't have problems with it. My business models are based on the parameters of the law. While scanning your own books seems to be free of ethical issues, in most cases, it won't be legal. Electronic rights belong to the author or publisher. The law does not transfer them to you with the acquisition of a physical book.
posted by Toekneesan at 12:46 PM on April 24, 2009


I own an OpticBook.

A scanner deliberately designed to... scan books. Imagine that.

Why didn't you suggest they roll their own scanner?

My book scanner did not exist last semester.

My business models are based on the parameters of the law.

The law is bent to the parameters of your business model. The law does not make right, only legal and illegal.
posted by fake at 1:13 PM on April 24, 2009 [1 favorite]


While scanning your own books seems to be free of ethical issues, in most cases, it won't be legal.

From my understanding, it is perfectly legal as it is fair use.
posted by bigmusic at 1:20 PM on April 24, 2009


bigmusic: "From my understanding, it is perfectly legal as it is fair use."

You're probably mistaken. Fair use is one of those legal concepts where it's hard to tell until you're in front of a judge, but I'm pretty sure scanning every page of all of your books, without having a disability, or being the author of all those books, would be considered copyright infringement on a massive scale.

Take a look at that "c." factor in part three of the EFF's guide. Still think it's fair use?

fake: "A scanner deliberately designed to... scan books. Imagine that."

Yes, I've scanned from books, and I've scanned whole books, but I did it mindful of the law and mindful of the work of others.

Not all publishers are alike. Some aren't all that concerned about the role books play in developing culture and advancing knowledge, and that's very unfortunate. But you can't equate books to music, and labels to publishers. We need to navigate this together. Part of that means keeping the rhetoric in check. That's been my point. The rhetoric in that video assumes all publishers are SONY/DISNEY and that ripping a book is the equivalent to ripping a CD. It's not. There's a bit more at stake.
posted by Toekneesan at 2:34 PM on April 24, 2009


Take a look at that "c." factor in part three of the EFF's guide. Still think it's fair use?

Yep, I do. Look at #4.
posted by bigmusic at 2:41 PM on April 24, 2009


Also, fair use is evolving. In #3 of the EFF guide they mention whether the use is commercial, and whether the fair user takes a substantial portion of the work, as factors in deciding whether something is fair use. On these notes, the EFF guide is out of date (just by a week =) Have a look at the Turn It In case.
posted by bigmusic at 2:49 PM on April 24, 2009


It seems there is rhetoric on both sides -- I see loaded words like "my children's welfare", "book scanner for the proletariat", and "anarchy". Looks like it's personal for the both of us.
posted by fake at 3:48 PM on April 24, 2009


Agreed.
posted by Toekneesan at 3:55 PM on April 24, 2009


"I'm pretty sure scanning every page of all of your books, without having a disability, or being the author of all those books, would be considered copyright infringement on a massive scale."

Didn't the Betamax case pretty well determine that time shifting of material you own for your own use is A-OK? And then the Rio case supported the thought that this also applied to media shifting?
posted by Mitheral at 3:57 PM on April 24, 2009


Look bigmusic, anybody, do us all a favor and copy your entire library on to your computer, and tell the AAP and the Author's Guild about it. Then, unlike Google, see it through the courts. There's a lot of precedent against what you describe being fair use. With Google and Turnitin, there's a pretty legitimate argument for their copying being transformative. That's not what you are describing. If you really think it is, see it through and I promise I will change my model immediately and accordingly. Ask the EFF for help, I'd bet they'd handle the case pro bono. If you do that, a big part of the confusion about the future of publishing will be resolved. Until that's resolved, I'm not sure what to do.
posted by Toekneesan at 4:15 PM on April 24, 2009


As for media or space shifting, is there a place or time you can read a computer that you can't read a book? My guess is that will be the court's test.
posted by Toekneesan at 4:33 PM on April 24, 2009


"As for media or space shifting, is there a place or time you can read a computer that you can't read a book?"

You can't grep a dead tree.

Be nice to be able to take my library on vacation.
posted by Mitheral at 5:21 PM on April 24, 2009


Our books come with indexes.
posted by Toekneesan at 5:31 PM on April 24, 2009


and tell the AAP and the Author's Guild about it.

The same paragons of fairness that won't let a blind man listen to his Kindle books?

The right to read what you buy is a default, not a privilege someone grants you. The right to discriminate is what is being granted here, by law and by big business. That will change.

Publishers will eat themselves with or without my book scanner.
posted by fake at 6:01 PM on April 24, 2009 [1 favorite]


Isn't most of the market for scholarly monographs 99% library? I find it unlikely that major universities are going to eschew legit copies, in favor of scans.

Now, if professors stop having to worry about producing papers/books that university presses can sell without going bankrupt, and instead are allowed to do their publish-or-perish in e-form...yeah, your bacon is fried. And rightly so.

Just like Mr. Burns's investment in opera hat manufacturing, your time has passed, and the world is not worse for it, just different. Arguably better, since lower cost to market means more people get published, and have to make fewer compromises for commercial appeal.
posted by nomisxid at 6:48 PM on April 24, 2009


fake, you can live in a civil society or you can choose to live outside of it, but you probably shouldn't pretend you live in it but work against it. That's really of no use to you or the civil society. The right to sustain yourself by your ideas is what's being debated, not the right to discriminate.
posted by Toekneesan at 7:58 PM on April 24, 2009


This thread has taken an interesting turn, with fake held up as a piracy guy and Toekneesan as a total DRM guy. It's kind of strange. Toekneesan, I've read your blog and it seems that you care deeply about books, that you have a very sane approach to the business of publishing and selling books, that you take both writers and readers into account, and that you're quite willing to experiment with publishing models. That you have a project to post books online (for free, even) is another example. (What was the effect on sales?)

Meanwhile, fake as far as I can tell also cares a good deal about books, is committed to preserving them, is himself a "content producer," and is conscientious about how that content is published. He also seems committed to making public domain information publicly available (see above, for example, where he offers to scan a book for notashroom on the condition that the scans be posted online). Unrelatedly, he appears to be seriously into recycling and reducing environmental abuse. He's conflated these two concerns in his scanner instructions, but they could certainly stand alone without the environmental part.

Overall, fake's views on publishing and copyright might be more on the "information should be free" end of things than Toekneesan, but you two seem to have way more in common than this thread suggests. Apparently fake's instructions can be read as recklessly pro-piracy and anti-social. But I think that's a misreading, as he's clarified here multiple times. I do agree about being careful with rhetoric.


Not sure why I cared enough to write this.

Anyway, fake, about your question: I think it's definitely worth doing. For now, I'd add a page at the beginning with two things: a full list of parts and tools needed, and a table of contents with page numbers. That way there's a birds-eye view of what to expect.

To reduce extreme misreadings by reasonable people, maybe separating the build instructions from the salvaging instructions would be a good idea. (Personally I didn't find it offputting, but since I physically can't dumpster-dive I did find it distracting.) What do you think about a Part A with straight instructions on how to put the parts together, and a Part B with environment- and pocket-friendly ways of gathering parts in the first place?
posted by mail at 11:15 PM on April 24, 2009


Umm, there is a TOC. Ignore me, I'm tired.
posted by mail at 11:17 PM on April 24, 2009


I think you're right, what it's really missing is a big-picture view. I mean, first I show you the machine working, and then I head straight for the nearest trash bin. ;)

You can kind of see me fighting their interface on the Instructable -- I name each step according to the piece being made or the action being done:

Bookholder:Second Strip.
Bookholder:Screw It.

This was just my best attempt at making sections. They seriously don't let you have more than one image per page that has text under it -- all the others get jammed under the first.

A parts/tools list, PROPER table of contents, and a separate guide to salvaging are all great ideas. Salvaging is thrown in with the rest because of the environmental focus of the site, but I agree completely that for a separate, proper PDF that should be another manual. I'll get to it.
posted by fake at 3:28 AM on April 25, 2009


Just to clarify, I'm not pro-DRM. I am for keeping the 25 people I work with employed. I think what they do is very important to all of us. Editors, designers, book developers, the review process, this is what I think is at stake and worth considering before we advocate pirating books. But I'd like to persuade people against it on moral grounds. Respect the work of others.

The law is a mess here. We absolutely agree. But I think the appropriate response is to change the law, or go to court and help better define it. I don't think flouting the law in this instance is a healthy option for a culture.
posted by Toekneesan at 5:16 AM on April 25, 2009


And I'm not advocating piracy. Be fair.
posted by fake at 6:07 AM on April 25, 2009


And by the way, I've spent some time checking out your blog. I think you've got a lot of things right -- especially "My biggest concern about the future of the publishing industry is pirates. The easiest way to compete with free is to offer a reasonable alternative.". You're right, there needs to be a reasonable alternative. And for most of us, and for much of what we want, there isn't. And there never will be unless we take the responsibility ourselves.
posted by fake at 6:24 AM on April 25, 2009


mail said it best --

We're even both bakers.
posted by fake at 6:30 AM on April 25, 2009


The speed of transfer over USB does worry me, but I might be wrong about that. I'll take it into account for the next version of the scanner.

I don't think you'd have an issue, though in fairness my app doesn't trigger 2 cameras at once. However I am firing a Nikon D70 and the resulting images are 900k. Transfer takes less than 1 second.

What you MIGHT have an issue with, were you to code it up lazy the way I did (I simply use a system call to fire the gphoto2 executable) is that even if you pass it the params to just tell it to fire it still does some initial communication and setup related to the camera's local storage.

I didn't notice this till I hooked it back up for a test with a lot of images stored on the camera's CF card. Suddenly my .5 second initialization was taking more like 6 seconds. Once I formatted the card we were back to quick shots.

If I didn't have to get this working before my wedding next Saturday I'd hack out something that uses libgphoto2 that was less swiss-army-knife than gphoto2. It's probably less than a day's work for someone who has worked in C more recently than I have.

Alternately using the tethered mode would overcome that initial setup delay. Just the transfer alone would take less time than lifting your platen and flipping the page, I am certain of it.
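
For anyone who wants to measure that startup overhead on their own rig, here is a rough Python sketch of the "lazy" approach described above: shell out to the gphoto2 executable once per shot and time the whole call. The helper names (build_capture_command, timed_run) and the filename are made up for illustration, and this is not the code from my app. It assumes gphoto2 is installed and on your PATH with a supported camera attached.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def build_capture_command(filename: str) -> List[str]:
    """Build a gphoto2 command line that fires the shutter and downloads the image.

    The helper itself is hypothetical; the flags are standard gphoto2 options.
    """
    return ["gphoto2", "--capture-image-and-download", "--filename", filename]


def timed_run(
    command: Sequence[str],
    runner: Optional[Callable[[Sequence[str]], object]] = None,
) -> float:
    """Run the command and return the elapsed wall-clock time in seconds.

    By default the command is executed with subprocess.run, so the timing
    includes gphoto2's own startup and camera setup, not just the transfer.
    A different runner can be passed in to try the timing logic without a camera.
    """
    if runner is None:
        def runner(cmd: Sequence[str]) -> object:
            return subprocess.run(list(cmd), check=True)
    start = time.perf_counter()
    runner(command)
    return time.perf_counter() - start
```

With a camera plugged in, timed_run(build_capture_command("page_0001.jpg")) gives the per-shot cost of one system call. Running it once with a full card and once with a freshly formatted card should show whether the storage scan is what is slowing you down.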
posted by phearlez at 7:29 AM on April 25, 2009 [1 favorite]


Yeah, I think I secretly like you too.
posted by Toekneesan at 10:01 AM on April 25, 2009


I am a grad student in a technical field. In the last two years of classes I've had to study eight textbooks, six of which were available free on the authors' Web sites. I'm currently teaching an undergrad course which uses two textbooks, both of which are also legitimately free online. The transition to e-books in my field has already happened.

That's why it's valuable to archive older, paper-bound works while we can, before the bookstore stops stocking them and the library throws them out due to lack of use.

Most of the academic publishers aren't going to survive either way.
posted by miyabo at 8:13 PM on April 25, 2009


You can use a Macbook Air as a pretty good A4/letter size pdf reader. You can't rotate the screen (except physically, of course), but using the Skim program, you can rotate the page so that each pdf page occupies a full screen. The Air is light enough so that you can read many pages without fatigue, though just having finished The Kindly Ones (900 pages) I'd note that weight is an issue with some books. The Air is much easier to handle. The Sony U101 portable computer from 2002-2003 tried to accomplish this without much success, but the Air pulls it off. Reading A4/letter-sized non-flowable pdfs is a challenge on the Sony Reader or Kindle-sized screens, but at least for the Reader, there's a program called Rasterfarian that makes such texts accessible. The Plastic Logic device, when it comes out, may be game-changing.
posted by tesseract420 at 1:21 AM on April 30, 2009


Toekneesan: I think the risk of book piracy via scanning is really overblown, except perhaps for graphic novels and some types of technical books. Producing a decent reflowable ebook from a paper copy is really hard. I've been a part of the Distributed Proofreaders project for a couple years, and just going from a rough OCR draft to a finished ebook is a process that can involve thousands of man-hours of work to do right. That amount of work involved bears absolutely no relation to the totally automated process that is ripping a CD or DVD, and it's totally independent of the scanner used to image the book in the beginning.

Where there will be book piracy, and where I'm confident the majority of pirated ebooks will come from, will be the versions released by publishers themselves. E.g. the kind you buy on Amazon. The books most likely to be widely pirated are also the ones most likely to have electronic editions already available.

It's from this latter direction that piracy will come, and in some senses it will be unstoppable; as the music industry has spent millions of dollars proving, DRM doesn't work and just encourages piracy. The best you can do is make the legitimate product cheap, convenient, and fun to buy, and realize that most people will buy a product from a legitimate source if it's not a PITA and the value proposition is right. (And never fall into the trap of thinking that each pirated copy somehow represents a lost sale; not only does it look ridiculous to the public, who can see that it's silly on its face, but it causes industries to mis-allocate resources to piracy prevention rather than product development — digging their hole deeper rather than climbing out.)

So, people in the book industry shouldn't fear scanners, because pirates won't need scanners to get ebooks of the mass-market stuff that represents the bulk of P2P trading: the publishers are going to provide them! But even then, publishers shouldn't let that stop them from selling ebooks (or worse yet, leading them down the false path of DRM and other consumer-hostile policies); the only way they can lose this battle is by refusing to fight and thus surrendering the field to the pirates.
posted by Kadin2048 at 1:07 AM on May 7, 2009


I don't fear scanners. I do get concerned about calls for fixing publishing's broken model by putting scanned books on the Internet, which is what the film on Instructables calls for at the end. It's not the scanner, it's the rhetoric I have trouble with. It seems to ignore the value of the work of others while making generalizations about content creators and disseminators. You would not be doing scholarship a favor by digitizing all those monographs in that dumpster and putting them on the Internet, most of which are probably already available at least partially through Google Book. You would instead destroy the ability of the non-profit publishers of many of those books to publish future scholarship. You wouldn't be cutting into our profits, we don't have them. Without viable digital rights, non-profit publishers of scholarship will publish less, moving the production of that scholarship to the commercial sector where they are very likely to respond like the music industry did. That doesn't really fix anything and instead creates a whole new set of problems. You can't point to a dumpster of books and say "digitize these and save the world." It's just not that simple. If you really want to see fewer physical books and more digital books, become a developer on PKP's Open Monograph project. It's a much more sustainable strategy and it doesn't involve a dumpster.

I agree that breaking DRM is probably going to be the preferred method of pirates, and that the best tactic for publishers is to offer a reasonable alternative. Read my blog, linked above, for a better understanding of what I've done about it. I've already found books we've published on p2ps, and while I can't prove they're ebooks with busted DRM, there are a suspicious number of them that do have corresponding editions with one particular ebook vendor, one that is not Amazon, btw.
posted by Toekneesan at 2:30 PM on May 7, 2009

