Extract Text and Data from PDF with the SetaPDF-Extractor ▷ setasign.com

PDF Text extraction with PHP

The SetaPDF-Extractor component is written in PHP and allows PHP developers to extract textual content from existing PDF documents.

Besides extracting text, it is also possible to extract glyphs, words, or groups of words, together with their positions and bounding boxes, through different extraction strategies.

Extracting the text of a single page is a simple process.

In Action

Mark Words

Mark or highlight all words on a specific PDF page.

Count Words

Count words in a PDF document with PHP.

Find more in the demos section.

Examples of Usage

Create a search index for PDF documents
Extract the plain text from PDF documents to create a search index.
Extract data from specific locations on a PDF page
For example, an invoice number, sender name, PO number, etc.
Highlight words in a PDF document
A fully indexed search catalog may allow your customers to highlight the words in the PDF document through a highlight annotation.

Miscellaneous

System requirements
FAQ
Manual
SetaPDF-License Agreement
The German original is available here.

Questions about SetaPDF-Extractor?

If you are looking for a feature or have any questions about this or any other product, contact us at support@setasign.com.

SetaPDF-Extractor: Extract text, glyphs, words and metrics from PDF documents with PHP