Suppose you have a PDF document that was made using a scanner, or otherwise consists of image data but doesn't have text data. Such a PDF can't be searched by PDF readers or desktop search applications.
How to Make scanned PDFs searchable (OCR) using pdfocr
http://www.ubuntugeek.com
Created by bruce.almighty 14 years 24 weeks ago – Made popular 14 years 24 weeks ago
Category: End User
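As a rough illustration of the general approach (not pdfocr's own implementation), the sketch below rasterizes each page with Poppler's `pdftoppm`, runs Tesseract with its `pdf` output config so each page gets an invisible text layer, and joins the pages with Poppler's `pdfunite`. It assumes those three command-line tools are installed and on `PATH` and that Tesseract has its English (`eng`) language data. The helper names (`rasterize_cmd`, `ocr_cmd`, `merge_cmd`, `make_searchable`) and the 300 dpi default are hypothetical choices made for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def rasterize_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm (Poppler) renders every page to PNG files named <prefix>-N.png
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def ocr_cmd(image: str, outbase: str, lang: str = "eng") -> list[str]:
    # Tesseract's "pdf" config writes <outbase>.pdf: the page image plus an invisible text layer
    return ["tesseract", image, outbase, "-l", lang, "pdf"]


def merge_cmd(page_pdfs: list[str], out_pdf: str) -> list[str]:
    # pdfunite (Poppler) concatenates the single-page PDFs into one output file
    return ["pdfunite", *page_pdfs, out_pdf]


def make_searchable(pdf: str, out_pdf: str, dpi: int = 300, lang: str = "eng") -> None:
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(rasterize_cmd(pdf, prefix, dpi), check=True)
        # pdftoppm zero-pads page numbers, so a lexical sort keeps page order
        images = sorted(Path(tmp).glob("page-*.png"))
        page_pdfs = []
        for image in images:
            outbase = str(image.with_suffix(""))
            subprocess.run(ocr_cmd(str(image), outbase, lang), check=True)
            page_pdfs.append(outbase + ".pdf")
        subprocess.run(merge_cmd(page_pdfs, out_pdf), check=True)


# Only run the pipeline when invoked as: python make_searchable.py in.pdf out.pdf
if __name__ == "__main__" and len(sys.argv) == 3:
    make_searchable(sys.argv[1], sys.argv[2])
```

Because every page is re-rendered at the chosen resolution, the output file may be noticeably larger or smaller than the original scan; raising or lowering `dpi` trades file size against OCR accuracy.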
Tesseract: an Open-Source Optical Character Recognition Engine
http://www.linuxjournal.com
I play with open-source OCR (Optical Character Recognition) packages periodically. My last foray was a few years ago when I bought a tablet PC and wanted to scan in some of my course books so I could carry just one thing to school. I tried every package I could find, and none of them worked well enough even to consider using.