When you love books enough to murder them: 1DollarScan.com

A couple of months ago: Has anyone tried a book scanning service for Kondoization or pre-move preparation?

I’ve got the first batch of results from 1DollarScan now and the scanned pages are in paper recycling Heaven. This is a review of their cheapest possible service: $1/100 pages and no OCR, no enhancement, and no naming of the files. Mailing books to San Jose via USPS Media Mail costs roughly $1 per book. Once the scans come back, you start with a page like this:

I downloaded them, opened each up to see what it was, and then renamed the file using Windows File Explorer. I then let my state-of-the-art PC (circa 2015) do a batch OCR via Adobe Acrobat Pro (a few clicks and then 4 hours of runtime for 28 books). Maybe due to watermarking, the files are epic in size: 70-100 MB for a regular book-type book and 200-300 MB for the technical manuals and cookbooks. A couple of the books hit 400 MB, but shrank somewhat after the OCR process. (If you want to preserve the original quality from 1DollarScan, use the “Searchable Image (Exact)” setting for OCR in Acrobat; that leaves the scans unmolested and just builds the OCR database in a separate data structure behind them.) The Acrobat Pro Preflight tool says that the images within the PDFs are “300 ppi”.
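To put those file sizes in perspective, here is a rough back-of-the-envelope sketch of what 300 ppi color scans would occupy uncompressed. The 6x9 inch page size and the 300-page count are assumptions for illustration, not measurements from my books; only the 300 ppi figure and the 70-100 MB range come from the actual files.

```python
# Rough size check for 300 ppi color scans.
# PAGE_W_IN, PAGE_H_IN and PAGES are hypothetical values chosen for illustration.
PPI = 300                      # resolution reported by Acrobat Preflight
PAGE_W_IN, PAGE_H_IN = 6, 9    # assumed page size in inches
PAGES = 300                    # assumed page count
BYTES_PER_PIXEL = 3            # 24-bit color, no compression


def uncompressed_mb(pages, w_in, h_in, ppi=PPI, bpp=BYTES_PER_PIXEL):
    """Size in MB (10^6 bytes) of the raw pixel data for a whole book."""
    pixels_per_page = (w_in * ppi) * (h_in * ppi)
    return pages * pixels_per_page * bpp / 1_000_000


raw = uncompressed_mb(PAGES, PAGE_W_IN, PAGE_H_IN)
print(f"uncompressed: {raw:,.0f} MB")

# Compare against the 70-100 MB range seen for ordinary books.
for pdf_mb in (70, 100):
    ratio = raw / pdf_mb
    kb_per_page = pdf_mb / PAGES * 1000
    print(f"{pdf_mb} MB PDF: about {ratio:.0f}:1 compression, {kb_per_page:.0f} KB per page")
```

Under those assumptions, the delivered PDFs are already heavily compressed relative to the raw pixels, so the sizes are large but not unreasonable for full-color page images.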

How do the scans look? Here’s a book on bicycle maintenance that I thought would be tricky:

A cheaply printed book with 40 years of acid destroying the pages (how to slice a banana without peeling using needle and thread):

A book where the color is not from the acid:

A Wall Street Journal book on estate planning (timely topic now that the Democrats say they are working to change all of the rules!) in which color is used as a sidenote background:

(If your child lives in Massachusetts, a family court predator can easily take all of the trust assets via child support, spread over the 23-year period during which child support is ordered rather than in one lump. A parent creating a trust in Nevada or South Dakota cannot necessarily undo the child’s mistake of being exposed to the family courts in other states. Your child can simply be put in prison by the judge (“contempt of court”) unless the amount ordered is paid, either by the child or by the trust, and therefore it may not matter, from a practical point of view, what the trust documents say.)

Perhaps the toughest challenge of all: some handwritten bound journals from the late 1980s. These are notes on a finite-state machine implemented in Programmable Array Logic (PALs). I think this is for reading the bits from, and syncing the clock to, the digital audio output of a CD player.

There wasn’t much they could do about my handwriting, but the scans are sharp and zooming in from the PDF should be just as good for deciphering as having the physical page (not that I have the physical page anymore).

So far, I would say that this service delivered everything promised.
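For anyone budgeting a batch at the cheapest tier, a minimal sketch of the arithmetic follows. The $1 per 100 pages and roughly $1 per book for Media Mail come from this post; the page counts are made up, and billing partial sets as whole sets is an assumption that you should check against the current pricing page.

```python
import math

SCAN_PRICE_PER_SET = 1.00   # dollars per 100-page set, cheapest tier (from this post)
PAGES_PER_SET = 100
SHIPPING_PER_BOOK = 1.00    # rough USPS Media Mail cost per book (from this post)


def book_cost(pages, round_up_sets=True):
    """Estimated scanning plus shipping cost in dollars for one book."""
    sets = pages / PAGES_PER_SET
    if round_up_sets:
        # Assumption: a partial 100-page set is billed as a full set.
        sets = math.ceil(sets)
    return sets * SCAN_PRICE_PER_SET + SHIPPING_PER_BOOK


# Hypothetical page counts for a small batch.
page_counts = [250, 320, 480]
total = sum(book_cost(p) for p in page_counts)
print(f"{len(page_counts)} books, estimated total: ${total:.2f}")
```

Passing `round_up_sets=False` gives the pro-rated figure instead, which brackets the estimate from below.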

15 thoughts on “When you love books enough to murder them: 1DollarScan.com”

  1. Lol, sometimes the internet is a small place. As a kid, I remember reading a book (or magazine … 1950s Pop Mechanics?) with that exact same banana cutting picture.

  2. Thanks for the follow up on this because I have a friend with a cookbook that was very briefly published by some dedicated people in my town. It’s spiral-bound, about 250 pages and the surviving copies are getting close to dilapidation, so he wants them scanned and OCRed. Now I have something I can recommend to him because the pages aren’t yet as badly oxidized as the sliced banana man’s pages.

    Separately, I saw a story on the internet yesterday about some folks in Alaska who were fined for feeding brown bears against all the regulations of the park they were in. I thought: “My goodness, I learned this in the Boy Scouts. Doesn’t anyone read Boys’ Life any more?”

    Then I thought again about how badly Boys’ Life has declined over the past several decades in terms of the basic literacy needed to understand it. Now it’s all flashy photos and first-grade level reading. Look at this from 1946 and realize that this magazine was published for K-12 boys at the time.

    https://books.google.com/books?id=cyMFz7UJOQQC&lpg=PA28&ots=TZQioscV3z&dq=Don't%20Feed%20Bears%20Boy's%20Life&pg=PA28#v=onepage&q=Don't%20Feed%20Bears%20Boy's%20Life&f=false

    Now “Boys’ Life” is “Scout Life” because there are no Boys anymore. It’s a lot more pictures, many fewer words, and the writing has been dumbed down to match our society.

    https://scoutlife.org/magazine/173824/inside-the-october-2021-issue/

  3. Important reminder of a time when jet engine diagrams were collectible because they couldn’t be downloaded. Having no understanding of thermodynamics, young lions were just fascinated by the artwork.

  4. You don’t say what resolution the scans are or what the file size of the products is. It would be useful to know.

    • @Bernie: #44 in their FAQ:

      “Books will be scanned at a resolution of 300 dpi in color by Default. You can see the samples of our scanned files on our pricing page here. If you want us to scan with a higher resolution of 600dpi, then please select the High Quality Scanning Option. You can select Black & White, Gray Scale or Full Color scanned at 600dpi.”

      https://1dollarscan.com/faq.php#44

      300 DPI in my professional experience is adequate for viewing and use on another computer. 600 DPI is great if you’re going to reprint, but in all likelihood you will need to do some image work in any case, particularly if you want high-quality OCR. I use Adobe software extensively and – to put it nicely – even at this late date, really good OCR is as much of an art as science. They also offer a high-quality option for OCR.

    • Bernie: I updated the original post. Adobe Acrobat Pro thinks that they’re 300 ppi. The files are huge!

  5. In my experience, running the 300DPI scanned PDFs through ABBYY FineReader OCR will dependably reduce the size of the PDFs by 90+% (with no noticeable deterioration in image quality), and FineReader will also allow you to save the OCR results in such non-PDF formats as HTML and EPUB. The HTML files will be 90+% smaller than the OCRed PDFs, and the EPUB files will be another 20+% smaller than the HTML files.
    For example, you might start with a 300DPI scanned 100MB PDF, get a 10MB OCRed PDF with embedded text, a 1MB HTML, and a 0.3MB-to-0.8MB EPUB.
    The HTML and EPUB files will contain all the ‘image’ portions of the 300DPI scanned PDFs as images, but complexly formatted text, numerical tables, super-large title fonts, italics, etc. may be somewhat distorted. When books use fancy fonts/formats for chapter headings, those headings sometimes disappear.
    So you might OCR and end up with smaller PDFs with embedded text along with HTML/EPUB files. In the case of a novel or other book that is mainly ordinary text, you will probably be fine just using the EPUB file, while in the case of scientific books and other books with many numerical tables you might choose to use only the smaller PDFs with embedded text. But the size of all three files combined will still be only a little over 10% of the original 300DPI scanned PDFs.

  6. I like that they have an option to receive books directly from booksellers (e.g., Amazon). I’ve bought lots of used books online over the years, and there are still more I would like to get, but my desire to dedicate space to books that I might only want to refer to very rarely is low. This seems like a great option.

    Besides which, I may well send some books that I already have in my possession.

  7. How are you going to organize and store the PDFs?

    I recommend Zotero. I’ve got a 120GB library of over 4000 PDFs, all catalogued using the Dewey Decimal System. I’ve been using Zotero for three years and have never had a serious problem. Their customer support has been excellent on the rare occasions it’s been needed.

    https://en.wikipedia.org/wiki/Zotero

  8. Thanks for the file-size/resolution information.

    I am too near the end of my life to think of making serious use of the options mentioned, but can see some attractions and also limitations.

    • Morris: It took about two months from when I mailed them out to when the PDFs were available. I think they offer faster service at higher prices.
