Optical Character Recognition does not Recognize Various Languages from Scanned Image in Adobe Acrobat Pro Extended.
By Admins (from 15/10/2014 @ 08:02:09, in en - Science and Society, read 2595 times)

You have scanned a document with Adobe Acrobat Pro Extended, say an old library book with yellowed pages and stamps on them, and you want to run text recognition to turn the scanned image into plain text for Microsoft Word, Apache OpenOffice Writer, WordPad, an RTF document or any other text editor.

Then you should select from the menu:

Document > OCR Text Recognition > Recognise Text Using OCR...

The Recognise Text window opens, with the Pages options (All pages, Current page or From page 1 to XX) and the Settings, which you can edit: Primary OCR Language (select the language of the scanned text here: Romanian, English, Italian, etc.), PDF Output Style and Downsample Images.

Unfortunately, sometimes you will get the following message, and the diacritic characters in the text are not recognised (for example, when converting a Romanian text):

Acrobat could not perform recognition (OCR) on this page because: This page has graphics other than images or text on it. It cannot be captured.

What to do?

Just save the scanned pages as images (.PNG):

File > Save As... > select *.PNG

Then open these PNG files again with Adobe Acrobat Pro Extended and repeat the same Text Recognition operation, selecting the language of the text.
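For a book with many pages, it helps to reopen the exported images in the right page order. Here is a minimal Python sketch of one way to list them, assuming the PNG files sit together in one folder and that each file name ends with its page number (for example book_Page_2.png). The file names and the helper functions are hypothetical, not something Acrobat provides.

```python
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Return the last run of digits in the file name, or 0 if there is none."""
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else 0


def pngs_in_page_order(folder: str) -> list[Path]:
    """List the PNG files of a folder sorted by page number, not alphabetically."""
    # Accept both .png and .PNG, and skip any other file type.
    files = [p for p in Path(folder).iterdir() if p.suffix.lower() == ".png"]
    return sorted(files, key=page_number)
```

Calling pngs_in_page_order on the folder where you saved the pages returns page 2 before page 10, which a plain alphabetical listing would not do.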

Now the recognised text will contain the diacritics of your language, which is very useful when you want to select, copy and paste it into a text editor!

Arturo Find for TurismoAssociati.it