News Archive

Release of the SetaPDF-Extractor component
2015-02-06

After several months of work and research, we have finally released the initial version of the SetaPDF-Extractor component.

This component allows PHP developers to extract text from PDF documents. Furthermore, it gives detailed access to words or glyphs and their positions and bounding boxes on a PDF page.

The component is written completely in PHP and is backed by the SetaPDF-Core component. We're very proud to release this product to the public. Any feedback or questions are welcome! Just send an email to

The full product details are available here.
For a full user manual, including API documentation, see here.

Just give it a try

This demo extracts simple plain text from a single page:

Select or upload a file

The uploaded files are bound to your browser session and are not accessible by any other user. They will be deleted automatically after 24 hours.

Password for authentication

If the PDF is protected with a password, you can authenticate with this password.

You may also check out the additional demos:

Extract Plain Text

Extract plain text from a PDF document.

Get Words

Get words and their bounding boxes from PDF documents.

Mark Words

Mark or highlight all words on a specific PDF page.

Extract Words By a Specific Location

Use a rectangle filter to limit the result to a specific area.

Phrase Search

Create a phrase search with the SetaPDF-Extractor component.

Count Words

Count words in a PDF document with PHP.

Get Word Groups

Get words grouped by visible entities.

Mark Word Groups

Mark word groups of visible entities.

...more demos

See more live demos in our demo package, which is shipped with the products.