Extract text, glyphs, words and metrics from PDF documents with PHP

SetaPDF-Extractor

Downloads and Changelogs of the SetaPDF-Extractor

The following table lists all changelogs and available downloads for the SetaPDF-Extractor component. A full overview of all your licenses is available in your personal Pickup Depot.

Version 2.48.0.2155

Release date: 2025-06-19
SetaPDF-Extractor Component
Feature
  • Added Extractor::getResultByPage() method.
  • Added getResultByTextItems() method to all available strategies to allow filtering a result several times.
  • Added Extractor::getTextItemsByPage() method.
Bugfix
  • Fixed infinite loop through circular references in form XObjects.
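A circular reference arises when a form XObject refers, directly or through other XObjects, back to itself, so a naive recursive walk never terminates. The following is a minimal sketch of one common guard, assuming a simplified model in which each XObject name maps to the names of the XObjects it uses. The function `collectNestedXObjects()` and the array layout are hypothetical and are not part of the SetaPDF API; the changelog does not say how the library itself resolves the issue.

```php
<?php
/**
 * Walks nested form XObjects depth-first and skips any XObject that is
 * already on the current path. Hypothetical helper for illustration only.
 *
 * @param array<string, string[]> $xObjects Name => names of nested XObjects (assumed model)
 * @param string $name   Name of the XObject to start from
 * @param array  $active Names on the current recursion path
 * @return string[] Names in visiting order
 */
function collectNestedXObjects(array $xObjects, string $name, array &$active = []): array
{
    if (isset($active[$name])) {
        // Already being processed further up the call stack: circular reference.
        return [];
    }

    $active[$name] = true;
    $result = [$name];

    foreach ($xObjects[$name] ?? [] as $child) {
        foreach (collectNestedXObjects($xObjects, $child, $active) as $visited) {
            $result[] = $visited;
        }
    }

    // Leave the path again so the same XObject may be reused on another branch.
    unset($active[$name]);

    return $result;
}
```

With `['A' => ['B'], 'B' => ['C', 'A'], 'C' => []]` as input and `'A'` as the start, the walk visits A, B and C once each and ignores the reference from B back to A.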
Tweak
  • Changed all internally used class names to their new namespaced variants.
SetaPDF-Core Component
Feature
  • Added support for the TIFF predictor (only 8 bits per component).
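For context, the TIFF predictor (Predictor value 2 in the PDF decode parameters) stores each sample as the difference to the sample of the same color component in the pixel to its left. Decoding therefore adds the left neighbor back, row by row. The sketch below shows this for 8 bits per component only. The function name `undoTiffPredictor8()` and its parameters are assumptions made for illustration and do not reflect the SetaPDF-Core implementation.

```php
<?php
/**
 * Reverses TIFF Predictor 2 (horizontal differencing) for 8-bit samples.
 * Hypothetical helper, not a SetaPDF API.
 *
 * @param string $data    Predicted data (already decompressed)
 * @param int    $columns Samples per row (the Columns decode parameter)
 * @param int    $colors  Color components per sample (the Colors decode parameter)
 */
function undoTiffPredictor8(string $data, int $columns, int $colors = 1): string
{
    if ($columns < 1 || $colors < 1) {
        throw new InvalidArgumentException('Columns and colors must be positive.');
    }
    if ($data === '') {
        return '';
    }

    $rowLength = $columns * $colors; // one byte per component at 8 bpc
    $out = '';

    foreach (str_split($data, $rowLength) as $row) {
        $bytes = array_values(unpack('C*', $row));
        // The first pixel of a row is stored as-is; each later component
        // is the difference to the same component one pixel to the left.
        for ($i = $colors, $n = count($bytes); $i < $n; $i++) {
            $bytes[$i] = ($bytes[$i] + $bytes[$i - $colors]) & 0xFF;
        }
        $out .= pack('C*', ...$bytes);
    }

    return $out;
}
```

Other bit depths pack several samples into one byte or spread one sample over two bytes, which this byte-wise loop does not handle.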
Bugfix
  • Fixed cleanup of XMP packet wrapper.
  • Fixed creation of compressed object streams for newly created objects.
  • Fixed behavior when creating object streams that contain objects from foreign document instances which are referenced again later.
  • Ensured that a stream object is present in Page::toXObject() to prevent a "__clone method called on non-object" error (raised only in faulty documents).
Tweak
  • Changed all internally used class names to their new namespaced variants.
  • Moved Kids array in tree structures into an indirect object instead of a direct object.
  • Cache IDTree instance in StructTreeRoot class.
  • Removed unnecessary memoization of cross-reference entries to save memory.
  • Optimized the creation of compressed cross-reference streams with regard to memory consumption.
  • Changed parsing of hex strings to a token-based form to ignore whitespace and comments directly.
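To illustrate the last tweak: a PDF hex string such as <48 65 6C 6C 6F> may contain whitespace between the digits, and a final odd digit is treated as if it were followed by 0. The sketch below reads the content between the angle brackets character by character, skips whitespace and %-comments up to the end of the line, and decodes the remaining digits. The function `decodeHexString()` is a hypothetical helper written for this example and is not the parser used by SetaPDF-Core.

```php
<?php
/**
 * Decodes the content of a PDF hex string (without the enclosing < and >),
 * ignoring whitespace and %-comments. Hypothetical helper for illustration.
 */
function decodeHexString(string $body): string
{
    $digits = '';
    $length = strlen($body);

    for ($i = 0; $i < $length; $i++) {
        $char = $body[$i];

        if ($char === '%') {
            // Skip the comment; the line break itself is handled as whitespace below.
            while ($i + 1 < $length && $body[$i + 1] !== "\n" && $body[$i + 1] !== "\r") {
                $i++;
            }
            continue;
        }

        // PDF whitespace characters: NUL, tab, line feed, form feed, carriage return, space.
        if (strpos("\x00\t\n\x0C\r ", $char) !== false) {
            continue;
        }

        if (!ctype_xdigit($char)) {
            throw new InvalidArgumentException("Unexpected character in hex string: {$char}");
        }

        $digits .= $char;
    }

    // A missing final digit is assumed to be 0.
    if (strlen($digits) % 2 === 1) {
        $digits .= '0';
    }

    return (string) hex2bin($digits);
}
```

A call such as `decodeHexString("48 65\n6C % comment\n6C 6F")` yields the five bytes of "Hello".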