Release of the SetaPDF-Extractor component
2015-02-06
After several months of work and research, we have finally released the initial version of the SetaPDF-Extractor component.
This component allows PHP developers to extract text from PDF documents. Furthermore, it gives detailed access to words or glyphs, together with their positions and bounding boxes on a PDF page.
The component is written entirely in PHP and builds on the SetaPDF-Core component, and we're very proud to release it to the public. Any feedback or questions are welcome! Just send an email to firstname.lastname@example.org.
The full product details are available here.
For a full user manual, including API documentation, see here.
Just give it a try
This demo extracts simple plain text from a single page:
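In code, a single-page extraction of this kind might look roughly like the following. This is a minimal sketch, not the demo's actual source: the autoloader path and the file name `example.pdf` are assumptions, and the class and method names should be checked against the user manual for your installed version.

```php
<?php
// Assumed location of the SetaPDF autoloader; adjust to your installation.
require_once 'library/SetaPDF/Autoload.php';

// Load the PDF document (hypothetical file name).
$document = SetaPDF_Core_Document::loadByFilename('example.pdf');

// Create an extractor instance for the loaded document.
$extractor = new SetaPDF_Extractor($document);

// Extract the content of page 1; with the extractor's default
// settings this is expected to be plain text.
$text = $extractor->getResultByPageNumber(1);

echo $text;
```

The sketch relies on the extractor's default settings; the additional demos cover word-level results, bounding boxes, and filters.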
You may also check out the additional demos:
Extract plain text from a PDF document.
Get words and their bounding boxes from PDF documents.
Mark or highlight all words on a specific PDF page.
Use a rectangle filter to limit the result to a specific area.
Create a phrase search with the SetaPDF-Extractor component.
Count words in a PDF document with PHP.
Get words grouped by visible entities.
Mark word groups of visible entities.