OCR apps you cannot live without!

Hi everyone. Need to extract text from a pdf but it’s an image?
Enter … ocrmypdf.

My day job (not for much longer - yay!) is modifying texts for students with reduced or no vision. It’s frustrating to open a pdf in Okular (a viewer for pdfs and lots of other things) only to find you cannot extract the text.

Install ocrmypdf. It’s a command line utility. For example, say you have a pdf called text.pdf (which really only contains an image of text) and it is in your Downloads folder. Open a terminal and:

cd Downloads
ocrmypdf text.pdf output.pdf

The output file now has a text layer, so you can select and copy its text in Okular! Yay!
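
If you have a whole folder of scanned pdfs, something like the sketch below might save some typing. It is only a rough idea, not anything official: the function names ocr_output_name and ocr_all_pdfs and the "_ocr" suffix are made up for this example, and it assumes ocrmypdf is already installed and on your PATH. The --skip-text option tells ocrmypdf to leave alone any pages that already contain text.

```shell
#!/bin/sh
# Hypothetical helper: turn "text.pdf" into "text_ocr.pdf".
# The "_ocr" suffix is just a naming choice for this sketch.
ocr_output_name() {
    printf '%s_ocr.pdf\n' "${1%.pdf}"
}

# Hypothetical helper: run ocrmypdf on every pdf in a folder
# (defaults to the current folder if no argument is given).
ocr_all_pdfs() {
    dir="${1:-.}"
    for f in "$dir"/*.pdf; do
        # If the folder has no pdfs the glob stays unexpanded, so skip it.
        [ -e "$f" ] || continue
        # Skip files produced by an earlier run of this function.
        case "$f" in *_ocr.pdf) continue ;; esac
        # --skip-text: do not OCR pages that already have a text layer.
        ocrmypdf --skip-text "$f" "$(ocr_output_name "$f")"
    done
}
```

After pasting those functions into a terminal (or sourcing them from a file), running ocr_all_pdfs ~/Downloads should leave a something_ocr.pdf next to each original. If all you want is the plain text, ocrmypdf also has a --sidecar option that writes the recognised text to a text file of your choosing.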

DRM pdf? Use LIOS - Linux Intelligent OCR Software.

Just open the pdf in LIOS. It calls pdfs ‘files’, and the output that appears in the left pane is termed ‘images’. Recognise all images from the menu, then in the centre bottom pane do any deleting that is necessary, select all, copy and paste into your Text Processor. Job done.

Well, work needed some work doing on editable pdf fields. I had a very old copy of Adobe Acrobat lying around, but it would not run in 64-bit Windows 7. I was toying with purchasing Master PDF Editor, which is free for personal use on Linux but with limited functionality. I then searched 'Alternative to' and found rave reviews for Qoppa (PDF Studio Pro 2020). It is one of the few pdf suites that supports Linux and an absolute gem of a piece of software. I got it on offer, reduced from $129 to $109.

Hello SWARF! :grin:

Please do review that software once you've gone through it all and have used it for a while. I am sure there are others who need a good PDF program as well.

When I have time I will take some screenshots and post them to imgBB!