How do I convert a scanned book page to an editable PDF?

16 replies

My dad's got an old how-to book he wrote in the late '70s. It's got great content, lots of diagrams and pictures, and we're interested in making it into a PDF to sell online.

I've got Adobe Acrobat Pro 9, and it does OCR just fine -- except that I can't figure out how to change the font of the text! Every page is a combination of images and text, and I basically want to convert each scanned page into an editable document with the same layout of the original document. (The font in the book is a little dated.)

Does anyone have any tips on how to do this -- automatically convert a scanned book page into an easy-to-edit PDF document that keeps all formatting (images, paragraph layout, etc.) intact?

C.B. Stewart
#book #convert #editable #page #pdf #scanned
  • C.B. Stewart
    Bumping this thread to see if anyone has any suggestions.

    Thanks again!
  • Obama
    There is software that does this, and I tried some in the past, but nothing is automatic; it needs lots of manual tweaking. I forget the software's name, but search Google for "convert pdf to word" or something like that.
    • bdgbdg
      • Dan C. Rinnert
        ABBYY FineReader - Professional OCR Software for Document and PDF Conversion Application

        It claims to be able to do all that. But I don't have any firsthand experience with it.

        They appear to have a trial version though.

        • bobsstuff
          Here is some freeware, topOCR (at Digital Camera OCR), that I downloaded a while back (a year or more ago) to convert my typed notes to Word documents. I wanted to use my camera instead of a scanner to capture the pages.

          I started the project and dropped it, so my memory of the product is slipping. However, as I recall, it did a pretty good job with documents I copied with my digital camera. There were a number of errors on each page, but a spell checker fixed them fairly quickly. Some words needed to be cross-referenced against the original document because they were pretty messed up.
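
A spell checker's suggestions for OCR mistakes work roughly like fuzzy matching against a word list. The sketch below does that with Python's standard `difflib.get_close_matches`; the `DICTIONARY` list and the `suggest_corrections` function are hypothetical stand-ins for a real spell checker, and the 0.7 similarity cutoff is an arbitrary choice. Words that come back as `None` had no close match, so those are the ones to cross-reference against the original page.

```python
import difflib
import re

# Hypothetical word list; a real spell checker would load a full dictionary.
DICTIONARY = ["the", "diagram", "page", "scanner", "camera", "document", "notes"]


def suggest_corrections(text, dictionary, cutoff=0.7):
    """Map each OCR'd word missing from `dictionary` to its closest match.

    A value of None means nothing was similar enough, so that word should
    be checked against the original page by hand.
    """
    known = set(dictionary)
    suggestions = {}
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in known or lower in suggestions:
            continue  # correctly spelled, or already handled
        matches = difflib.get_close_matches(lower, dictionary, n=1, cutoff=cutoff)
        suggestions[lower] = matches[0] if matches else None
    return suggestions


print(suggest_corrections("Tlie diagrarn on this pagc", DICTIONARY))
```

A badly mangled word such as "Tlie" scores too low against "the" at this cutoff, which matches the experience above: mild errors get fixed automatically, while heavily garbled words still need a human.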

          My Epson scanner came with software that has a "scan to OCR" feature that does the same thing. I wanted to use a camera because it "scans" pages in a second or two, whereas the scanner takes a lot longer per page.

          Anyway, it's free and worth the try. If you are leery of the website, places like TUCOWS also have it.

          ON EDIT: I just uninstalled my topOCR version 2.6 and installed the newer version 3.1.
          Maybe it will be even better now. Maybe I will get back to editing my notes using my camera and OCR (topOCR). Oh, for those who might not know, OCR = optical character recognition.
          Bob Hale
  • Steve Peters Benn
    By the way, PDF isn't editable, so it would be a good idea to OCR it into a Word document.
  • intromaster
    You can also try Elance. I've seen people posting this type of task frequently.

    It wasn't expensive either.
  • tj
    If the PDF was not created from an "image only" source, Adobe offers you the option to save the file as a Word file. When you choose this option, it saves the file in Word format with the formatting kept as close as possible. The problem with Adobe is that it saves the text in the Word file as an object with the text inside it, which gives you only limited options to change the formatting of the Word file if needed.

  • jacktackett
    One thing we used to do when OCR programs first came on the scene was to run the document through several of them and then merge the results. WordPerfect handled this well in its day; I'm not sure how Word or OOo (OpenOffice.org) would handle it today. But it's a thought.
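
The multi-engine idea can be sketched as a word-level majority vote. This assumes the OCR outputs for a page have already been lined up so that each one produces the same number of words; real outputs often differ and would first need aligning (for example with `difflib.SequenceMatcher`). `merge_ocr_outputs` is a hypothetical helper, not how WordPerfect or any particular OCR product merges results.

```python
from collections import Counter


def merge_ocr_outputs(outputs):
    """Word-level majority vote across several OCR results for one page.

    Assumes every output splits into the same number of words; if not, the
    positions can't be lined up this simply and the first output is kept.
    Note that the merged text is re-joined with single spaces.
    """
    token_lists = [text.split() for text in outputs]
    if len({len(tokens) for tokens in token_lists}) != 1:
        return outputs[0]
    merged = []
    for candidates in zip(*token_lists):
        word, votes = Counter(candidates).most_common(1)[0]
        # With no majority (every engine disagrees), trust the first engine.
        merged.append(word if votes > 1 else candidates[0])
    return " ".join(merged)


print(merge_ocr_outputs([
    "Cut the board to lenqth",
    "Cut tbe board to length",
    "Cut the b0ard to length",
]))
```

Each engine makes different mistakes, so a word one engine garbles is usually read correctly by the other two, and the vote picks the agreed-on reading.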

    As someone else suggested, you may want to farm out the updating to Elance, etc., or to Amazon Mechanical Turk workers. 100 pages? Send them to 100 Turkers for $1 each. Who knows?

    There are still plenty of typists advertising in the back of Writer's Digest, too. You may want to check them out. Same with transcriptionists, though they mostly work from recordings rather than texts.

    Good luck,
  • mannex
    I've had good experiences with ABBYY PDF Transformer 2.0. It's fairly cheap.

    Couple of secrets.

    1. Use white-out tape to white out page headers, footers and page numbers. This reduces the junk in the OCR process.

    2. Do the OCR into an MS Word document. Then "Select All" the text and "Paste Special" it as Unformatted Text into a new Word document. You will lose all the formatting, but the text should be fairly clean and you can apply consistent formatting in the new document.
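
Tip 1 can also be approximated after OCR, and tip 2 amounts to throwing away layout and keeping plain paragraphs. A rough sketch, assuming you have the OCR text of each page as a plain string: lines that are only digits are treated as page numbers, any line repeated on two or more pages is treated as a running header or footer, and single line breaks are unwrapped into paragraphs. These heuristics and the `clean_ocr_pages` name are assumptions; they will also drop legitimately repeated lines and join real hyphenated compounds that happen to break at a line end.

```python
import re
from collections import Counter


def clean_ocr_pages(pages, min_repeats=2):
    """Strip page numbers and running headers/footers, then unwrap lines.

    `pages` is a list of OCR'd page strings. Any line that appears on at
    least `min_repeats` pages is treated as a running header or footer.
    """
    # Count on how many pages each (stripped, non-empty) line appears.
    line_counts = Counter()
    for page in pages:
        line_counts.update({line.strip() for line in page.splitlines() if line.strip()})

    paragraphs = []
    for page in pages:
        kept = []
        for line in page.splitlines():
            line = line.strip()
            if re.fullmatch(r"\d+", line):        # bare page number
                continue
            if line_counts[line] >= min_repeats:  # running header/footer
                continue
            kept.append(line)  # blank lines are kept as paragraph breaks
        text = "\n".join(kept)
        # Rejoin words hyphenated across a line break: "para-\ngraph" -> "paragraph".
        text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
        # Blank lines separate paragraphs; single newlines are just wrapping.
        for block in re.split(r"\n\s*\n", text):
            if block.strip():
                paragraphs.append(" ".join(block.split()))
    return "\n\n".join(paragraphs)
```

The result is unformatted text, so, as with the Paste Special trick, all styling (including the book's dated font) is applied fresh afterwards.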
    • darktemplar
      I use this program called 'Nitro PDF' and it ROCKS. I've been able to do pretty much anything I can imagine with creating, editing, securing, importing and exporting files from and into a PDF format.

      Nitro PDF Software - Create & Edit PDF Files (not an affiliate link :p)
  • C.B. Stewart
    Thank you for the suggestions, everyone!

    I use a Mac running OS X and have Parallels installed so both Mac and PC options work for me.

    What I ended up going with was a newer program for the Mac called Prizmo (Prizmo 1.1 + OCR in 10 languages | Creaceed). It's pretty cool -- I can take quick snaps of each page of the book with a digital camera (takes just about five seconds to get each page) and put them into this program, which then straightens it out, makes it black and white, and has EXTREMELY accurate OCR. It exports as PDF with just text and the formatting in place (so I can just use a second PDF editing program to paste images in), or as PDF with hidden text behind the straightened-out image.
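
The thread doesn't say how Prizmo turns a photo into black and white, but a standard way to do that step is global thresholding with Otsu's method, which picks the gray level that best separates dark ink from light paper. The sketch below is that textbook technique, not Prizmo's algorithm; it assumes the page has already been converted to a flat list of 8-bit grayscale values (0-255), and it does not cover the straightening (deskew) step. `otsu_threshold` and `binarize` are hypothetical names.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a list of 8-bit grayscale values."""
    histogram = [0] * 256
    for p in pixels:
        histogram[p] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(histogram))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]  # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; Otsu picks the t that maximizes it.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels):
    """Map each pixel to 0 (ink) or 255 (paper) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```

A single global threshold works best on evenly lit shots; photos with shadows across the page usually need an adaptive (per-region) threshold instead.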

    The program is meant to be a way to use your digital camera as a scanner, and works brilliantly for that purpose. In fact, it's higher quality and faster than using a scanner, and the OCR is super accurate.

    Thanks again for all the suggestions, everyone!
    C.B. Stewart
  • John Conner
    I think the output should be editable Word documents. ABBYY FineReader OCR software should help convert scanned images into an editable format, but the result needs proofreading. If you don't want to do that yourself, outsourcing is the best option.
  • tomcam
    1. I've had good luck with the software that comes with my Canon scanners. You can get Canon scanners pretty inexpensively. Just be sure you check the OCR option.
    2. VueScan is fantastic.
    3. Google may have done the job for you. Check Google Books. They've scanned tons of stuff from the '70s and have a stated goal of getting every book ever printed.