Tag Archives: PDF OCR

How to OCR a PDF in PDF Studio

Here is a video showing how to OCR a PDF in PDF Studio.

Video Transcript:

  1. Hi, today I’m going to show you how to enhance a scanned document and interact with the text in that document using the OCR feature in PDF Studio. As you can see, the document is a little tilted, and you’re unable to interact with any of the text within it.
  1. First, go to the Document tab and click OCR, and a list of OCR options will appear. Start by selecting the language. Initially, no languages appear in the language box, but you can download new languages at any time using the Download OCR Languages tab. English is already downloaded and the text is in English, so we’re going to select English.
  1. Next, select the pages to process. We’re working on this page, so I’m just going to click Current Page.
  1. We can edit the DPI (resolution) here. The standard resolution for most documents is 300, but if the document happens to be really grainy or unclear, you can always decrease the DPI. This one looks pretty decent, so I’m just going to leave it at 300.
  1. As I said before, the document is a little tilted, so the Auto Deskew Images option will help straighten it. It’s usually already checked, so I’m just going to keep it that way. Press OK. As you can see, the document is now a lot straighter.
  1. We can also interact with the document text by going to the Comment tab and selecting the text tool. Now you can select text within the document. Say we want to extract information from the scanned document: I can now just copy it and paste it into the new document I’m working in.
  1. I can also highlight text in this document: just select the highlight text tool and highlight the text you want.
  1. So I’ve shown you how to enhance a scanned document using the OCR feature, as well as how to interact with the text within it. Thank you for listening.
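To make the DPI setting more concrete, here is a small sketch. It assumes the DPI value is the resolution at which each page is rendered to an image before text recognition, which is our assumption about the OCR step and not something the video states. PDF page sizes are measured in points of 1/72 inch, so the image size grows linearly with DPI in each direction. The class and method names are hypothetical and are not part of PDF Studio.

```java
// Hypothetical helper: estimates the bitmap size an OCR pass would work on,
// ASSUMING the page is rasterized at exactly the chosen DPI.
public class OcrRasterSize {

    // PDF user-space units are points: 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a page dimension in points to pixels at the given DPI. */
    static long pixels(double points, int dpi) {
        return Math.round(points / POINTS_PER_INCH * dpi);
    }

    public static void main(String[] args) {
        double widthPt = 612, heightPt = 792; // US Letter, 8.5 x 11 inches
        for (int dpi : new int[] {150, 300, 600}) {
            long w = pixels(widthPt, dpi);
            long h = pixels(heightPt, dpi);
            System.out.printf("%d DPI: %d x %d pixels (%.1f megapixels)%n",
                    dpi, w, h, w * h / 1_000_000.0);
        }
    }
}
```

At 300 DPI a Letter page becomes 2550 x 3300 pixels (about 8.4 megapixels). Doubling the DPI to 600 quadruples the pixel count, which is one reason higher settings make recognition slower.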

Summer release of Java PDF library adds PDF OCR, digital signature enhancements and the latest AES 256 encryption

Atlanta, GA, August 12, 2013 – Qoppa Software’s summer release of its Java® PDF component and library products delivers a new OCR module, digital signature enhancements, the latest AES 256 encryption algorithm and many other improvements.

Qoppa Software is pleased to announce a new Java PDF OCR library SDK that supports all Latin-based languages, including English, German, French and Spanish, and is available for Windows®, Mac OS X® and Linux®, in both 32-bit and 64-bit versions. This is a clean, production-level Java integration of the well-known Tesseract engine with Qoppa’s own PDF rendering and editing technology.

This release also contains many digital signature enhancements, including PDF certifying signatures, which are often used in document workflows to approve documents before publication. A certifying signature is the first digital signature applied to a PDF document, and it specifies which subsequent changes may be made to the certified document.
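In the PDF specification (ISO 32000-1), the changes a certifying signature allows are expressed by the /P value of its DocMDP transform parameters. The value 1 permits no changes. The value 2, which is the default when /P is absent, permits form filling, page template instantiation and signing. The value 3 additionally permits creating, deleting and modifying annotations. The sketch below models those three levels. The enum, class and method names are illustrative only and are not Qoppa’s API. Real validators also have to work out which kinds of change a later revision actually contains, and that step is not shown here.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of the DocMDP permission levels (the /P value of a certifying
// signature) from the PDF specification. Names are illustrative, not Qoppa's API.
public class CertificationLevels {

    /** Kinds of changes a later revision of the document might make. */
    enum Change { FORM_FILL, PAGE_TEMPLATE, SIGNING, ANNOTATION, OTHER }

    /** Returns the changes allowed after certification for a given /P value. */
    static Set<Change> allowedChanges(int p) {
        switch (p) {
            case 1: // no changes permitted
                return EnumSet.noneOf(Change.class);
            case 2: // form filling, page templates, signing
                return EnumSet.of(Change.FORM_FILL, Change.PAGE_TEMPLATE, Change.SIGNING);
            case 3: // level 2 plus creating, deleting and modifying annotations
                return EnumSet.of(Change.FORM_FILL, Change.PAGE_TEMPLATE,
                        Change.SIGNING, Change.ANNOTATION);
            default:
                throw new IllegalArgumentException("DocMDP /P must be 1, 2 or 3: " + p);
        }
    }

    /** True if every change found in later revisions is permitted. */
    static boolean certificationStillValid(int p, Set<Change> changesMade) {
        return allowedChanges(p).containsAll(changesMade);
    }

    public static void main(String[] args) {
        Set<Change> filledForm = EnumSet.of(Change.FORM_FILL);
        Set<Change> addedNote = EnumSet.of(Change.ANNOTATION);
        System.out.println(certificationStillValid(2, filledForm)); // true
        System.out.println(certificationStillValid(2, addedNote));  // false
        System.out.println(certificationStillValid(3, addedNote));  // true
    }
}
```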

Digital signature validation was improved to handle documents with multiple signatures. Because each signature covers only the version of the document that existed when it was applied, Qoppa’s PDF engine parses the content added after a signature and identifies which changes are acceptable and which changes invalidate the signature.
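In the PDF file format, later signatures and edits are appended to the file as incremental updates. Each signature’s /ByteRange entry (four integers: offset and length of the bytes before the /Contents placeholder, then offset and length of the bytes after it) records exactly which bytes that signature covers. The first step of validating such a file can therefore be sketched as finding how many bytes were appended after the signed revision. The class and method names below are hypothetical and are not Qoppa’s API. The harder step, deciding whether the appended changes are acceptable, is not shown.

```java
// Sketch: measures how much content was appended after a signature, using the
// /ByteRange layout from the PDF specification. Names are illustrative only.
public class SignatureCoverage {

    /**
     * @param byteRange  the four integers of /ByteRange: [off1 len1 off2 len2]
     * @param fileLength total length of the PDF file in bytes
     * @return number of bytes appended after the signed revision (0 if none)
     */
    static long bytesAddedAfterSigning(long[] byteRange, long fileLength) {
        if (byteRange.length != 4 || byteRange[0] != 0) {
            throw new IllegalArgumentException("unexpected /ByteRange layout");
        }
        long signedEnd = byteRange[2] + byteRange[3];
        if (signedEnd > fileLength) {
            throw new IllegalArgumentException("/ByteRange extends past end of file");
        }
        return fileLength - signedEnd;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a signature over a 20,000-byte revision whose
        // /Contents placeholder occupies bytes 9,000 to 17,191.
        long[] range = {0, 9_000, 17_192, 2_808};
        System.out.println(bytesAddedAfterSigning(range, 20_000)); // 0: covers the whole file
        System.out.println(bytesAddedAfterSigning(range, 26_500)); // 6500: later revision(s) appended
    }
}
```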

The new AES 256 encryption algorithm (R=6) has been implemented, allowing PDF documents to be encrypted and decrypted with AES using 256-bit keys, the strongest encryption defined for the PDF format. This algorithm is defined in the upcoming PDF 2.0 specification and is compatible with Adobe® Acrobat® X, XI and later.
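Part of what distinguishes revision 6 of the standard security handler is its iterated password hash (Algorithm 2.B in ISO 32000-2). The sketch below follows our reading of that algorithm using only standard JCA classes. It is not Qoppa’s implementation, it omits the SASLprep normalization and 127-byte truncation the specification applies to passwords, and it should be checked against the specification before being relied on. The class name PdfR6Hash is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch of the iterated password hash behind the AES-256 (R=6) security
// handler, following our reading of ISO 32000-2 Algorithm 2.B.
// Not Qoppa's implementation; verify against the specification.
public class PdfR6Hash {

    /**
     * @param password UTF-8 password bytes (SASLprep and truncation omitted)
     * @param salt     8-byte salt taken from the /U or /O entry
     * @param userKey  48-byte /U value when hashing the owner password,
     *                 otherwise an empty array
     * @return the 32-byte hash
     */
    static byte[] hash(byte[] password, byte[] salt, byte[] userKey) {
        try {
            // Initial K = SHA-256(password || salt || userKey).
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(password);
            sha256.update(salt);
            sha256.update(userKey);
            byte[] k = sha256.digest();

            byte[] e = new byte[0];
            int round = 0;
            // Run at least 64 rounds, then continue while the last byte of E
            // (unsigned) is greater than round - 32.
            while (round < 64 || (e[e.length - 1] & 0xFF) > round - 32) {
                // K1 = 64 repetitions of (password || K || userKey).
                int unit = password.length + k.length + userKey.length;
                byte[] k1 = new byte[unit * 64];
                for (int i = 0; i < 64; i++) {
                    int pos = i * unit;
                    System.arraycopy(password, 0, k1, pos, password.length);
                    System.arraycopy(k, 0, k1, pos + password.length, k.length);
                    System.arraycopy(userKey, 0, k1,
                            pos + password.length + k.length, userKey.length);
                }
                // E = AES-128-CBC(K1), key = K[0..16), IV = K[16..32), no padding;
                // K1's length is 64 * unit, always a multiple of 16.
                Cipher aes = Cipher.getInstance("AES/CBC/NoPadding");
                aes.init(Cipher.ENCRYPT_MODE,
                        new SecretKeySpec(Arrays.copyOfRange(k, 0, 16), "AES"),
                        new IvParameterSpec(Arrays.copyOfRange(k, 16, 32)));
                e = aes.doFinal(k1);

                // First 16 bytes of E as a big-endian integer, mod 3. Because
                // 256 % 3 == 1, this equals the sum of those bytes mod 3.
                int sum = 0;
                for (int i = 0; i < 16; i++) {
                    sum += e[i] & 0xFF;
                }
                String next = sum % 3 == 0 ? "SHA-256"
                        : sum % 3 == 1 ? "SHA-384" : "SHA-512";
                k = MessageDigest.getInstance(next).digest(e);
                round++;
            }
            return Arrays.copyOf(k, 32);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException("required JCA algorithm missing", ex);
        }
    }

    public static void main(String[] args) {
        byte[] pw = "secret".getBytes(StandardCharsets.UTF_8);
        byte[] salt = {1, 2, 3, 4, 5, 6, 7, 8}; // hypothetical salt
        byte[] h = hash(pw, salt, new byte[0]);
        System.out.println(h.length); // 32
    }
}
```

As we understand the specification, a reader checks a user password by comparing hash(password, bytes 32 to 39 of /U, empty) with the first 32 bytes of /U. The owner password is checked the same way against /O, passing the 48-byte /U value as userKey. Requiring at least 64 AES and hash rounds per attempt makes each password guess more expensive.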

Finally, our PDF viewing components now offer a flexible navigation API, allowing developers to customize navigation within a PDF document or across PDF documents to fit their document workflow.

“Our goal is to continue to offer the most comprehensive PDF technology available in Java. We are excited to present our new PDF OCR solution. This is an affordable, integrated solution to recognize text in PDF documents from Java applications on Windows, Mac and Linux, in a J2EE server environment, or on the client side,” says Gerald Holmann, Qoppa Software President.

The new release adds many other improvements and internal fixes. For more details, please see the full release notes below:

Full Release Notes for Qoppa’s Java PDF Component 2013R2

Full Release Notes for Qoppa’s Java PDF Libraries 2013R2

About Qoppa Software:
Qoppa Software specializes in Java PDF library products – pure Java as well as Android Java – for developers to integrate into their own Java or web applications. Qoppa Software also offers a fully-featured PDF end-user application, a PDF server, and Android PDF apps developed on Qoppa’s own robust PDF technology.