Need to create new digital files from paper documents

Hi there, I am new to these fora.

I am newly back in the academic milieu, in a doctoral program in education. I have some old papers of mine that I would like to use as the basis for some publishable articles. Unfortunately, the drives containing the digital copies of those papers have been lost, leaving just the paper copies. Hoping to avoid retyping everything, I have decided to explore whether I can create new digital files from the papers using OCR technology. That way, I hope to be able to rework the originals, correcting grammar and adding new material.

I have identified several ways that this might (or might not, of course) be achieved: (1) scanning with an OCR-capable scanner, exporting the resulting text file to MSWord as a .docx, and then editing and reworking it in Word; (2) scanning with a non-OCR-capable scanner, opening the resulting image-only .pdf in Adobe Acrobat Pro, using the Acrobat OCR capability to make the .pdf searchable, and then editing in Acrobat itself; or (3) scanning and running OCR in Acrobat Pro as in (2), then exporting the resulting searchable .pdf (which I think is essentially a text document, although I might be wrong about that) to a .docx and doing the editing in Word.

I am wondering which of these three is the best course of action in terms of (A) the accuracy and effectiveness of the respective OCR capabilities, and (B) the ease of editing the documents in Acrobat and Word, respectively. Is the Acrobat OCR more powerful and accurate than the OCR software bundled with most scanners, producing fewer errors and a truer rendering of the original, or is the reverse true? Is editing a searchable/text document easier and more intuitive in Acrobat than in Word, or is the reverse true? Though I have used MSWord frequently in the past, I am utterly inexperienced with Acrobat, with Adobe products in general, and with scanners. Any guidance you can give me on which of these programs I should try will be much appreciated, especially as I will have to purchase a scanner and the Adobe or any other software that I will use.

Thanks in advance,
Mike

This isn’t really complicated, and in your shoes, I wouldn’t buy a scanner just for this. The primary factor in your decisions about how to go about this is volume. How many sheets of paper are you converting?

Fewer than 50:
For this I’d consider downloading a smartphone app that uses the phone’s camera to scan documents (CamScanner is my personal favorite), after which they can be saved with or without OCR and transferred to other devices via email or file transfer. This is a dirt-cheap but very reliable option if you’re willing to manually scan one sheet at a time using a smartphone.

Higher numbers:
Copy shops and office superstores have scanners equipped with sheet feeders that will convert hundreds or even thousands of pages quite quickly and save them to a portable USB drive. They’ll also do the OCR for a fee, or you can take the scanned pages into Acrobat Pro as you already described and do the OCR there, with the options of saving as PDF or exporting to .docx, .txt, etc.

Haha, thanks HotButton. I find it slightly embarrassing that you noticed my mild consternation right off. The truth is that I have little experience with scanning documents (I have not needed to own a scanner before), and no experience whatsoever with OCR software or Adobe products. I did encounter and consider your suggestion of using an app such as CamScanner or Abbyy FineScanner, but I discounted it since those are web-based, and I am somewhat loath to transmit my intellectual property (which I hope to use for journal publication) over the internet. That, together with the fact that a desktop scanner is not a hugely expensive item, is why I have decided to simply purchase a scanner (not, however, a scanner-printer) and be done with it. After all, once I have bought a scanner, I will have that capability going forward…

My greatest need in that respect is to know which of the myriad scanners on the market today I should buy. How will the way the scanner receives the original document affect the performance that I want from the unit? It appears that there are three basic types of scanner, in terms of how the machine receives the original document: the “feed through” type, the “lay flat” (as with a photocopy machine) type, and the “sheet feed” (also as with some copy machines) type. I would like feedback on how each of these characteristics affects the ease of getting a true and “unskewed” digital copy of the paper document.

The second thing I need to know is at which point in my process I should apply OCR to the documents. Part of that involves which scanners on the market today have the best OCR capability, and whether that capability is as good as that found in Acrobat Pro. Another consideration is how Adobe’s OCR compares with OmniPage by Nuance. I think I could scan image-only .pdf documents and then, having installed OmniPage on my PC, use it to OCR the documents into searchable .pdf files, then export those to .docx for editing in Word. There seem to be many possible workflows, and I would like to know which is best before spending money on equipment.

Thirdly, I am wondering if Acrobat Pro allows for as intuitive and easy editing as MSWord. The ease of performing extensive editing and reworking is another consideration for the method I decide on. Any further advice will be appreciated.

Mike

Unless I am overlooking something, CamScanner is not “web based,” at least in my experience. I have used it and it has worked very well for my purposes. You can use the phone app, save the files directly to your phone, and transfer them to a local computer by any available means (e.g., USB cable, Bluetooth).

With the high megapixel counts on most mid- to high-end smartphones today, you can get amazingly good results compared with a dedicated scanner. That said, the scanner on even a typical all-in-one printer should be more than sufficient for OCR tools.
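To put rough numbers on the megapixel point, here is a minimal Python sketch comparing a phone photo with a flatbed scan. It rests on several assumptions that are not from this thread: a US Letter page (8.5 x 11 in), a 4:3 sensor, and a page that fills the frame exactly. The function names `effective_dpi` and `scan_pixels` are made up for illustration, and the 300 DPI input is only an example value. Real photos will come out lower because of margins, focus, lighting, and lens distortion.

```python
import math


def effective_dpi(megapixels, page_w_in=8.5, page_h_in=11.0, aspect=(4, 3)):
    """Approximate pixels per inch when a page exactly fills a camera frame.

    Assumes the long side of the sensor is aligned with the long side of
    the page. This is an upper bound, not a measured value.
    """
    long_ratio, short_ratio = aspect
    total_pixels = megapixels * 1_000_000
    # Solve long_px * short_px = total_pixels with long_px/short_px = aspect.
    short_px = math.sqrt(total_pixels * short_ratio / long_ratio)
    long_px = short_px * long_ratio / short_ratio
    page_long = max(page_w_in, page_h_in)
    page_short = min(page_w_in, page_h_in)
    # The tighter of the two directions limits the usable resolution.
    return min(long_px / page_long, short_px / page_short)


def scan_pixels(dpi, page_w_in=8.5, page_h_in=11.0):
    """Pixel dimensions (width, height) of a page scanned at a given DPI."""
    return round(dpi * page_w_in), round(dpi * page_h_in)


if __name__ == "__main__":
    width, height = scan_pixels(300)
    print(f"300 DPI scan: {width} x {height} px "
          f"({width * height / 1e6:.1f} MP)")
    print(f"12 MP photo: about {effective_dpi(12):.0f} DPI at best")
```

Under those assumptions, a 12 MP photo of a full page works out to roughly 350 pixels per inch at best, which is in the same range as a 300 DPI scan of the same page (2550 x 3300 pixels, about 8.4 MP).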

Oh, my… that was a false assumption on my part. I assumed that of CamScanner because I had investigated Abbyy FineScanner, which is also a phone-based app for doing these things, and I am under the impression that with the Abbyy software, the files are transmitted for rectification and application of the OCR, then transmitted back to one’s device after a time. Thanks for the correction, skribe, I will look into that. I am on a rather steep learning curve right now.

You alluded to “megapixel densities” in your reply. That makes me think that, if I do use a scanner for this purpose, the DPI that I use for the scan will be important for the OCR to work properly. What minimum DPI should one scan at if applying OCR to a text document?

Thanks again.

©2019 Graphic Design Forum