A PHP library for low-level access to PDF files

SetaPDF-Core

Access PDF documents at their lowest level with PHP

SetaPDF-Core - Extract a page from a 50,000-page document

Extracting pages from a PDF document with 50,000 (!) pages is not a common task - most people don't even know that such documents exist. Still, at least one of our customers had this requirement, and here is the result.

The difficult part of this task is parsing the page tree. The demo file contains only a single page tree node with 50,000 leaf nodes, so the whole tree has to be read and parsed. If the page tree were structured like a balanced tree, the component could find the requested page object much faster. However, we have not been able to create such a file so far.
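To illustrate why the shape of the page tree matters, here is a minimal sketch that models a page tree with plain PHP arrays. It is not the SetaPDF API and not how the component is implemented. The helper names (makeFlatTree, makeBalancedTree, findPage) and the array layout are assumptions made for this example. Each intermediate node stores the number of leaf pages below it, so a lookup can skip whole subtrees. In a flat tree there is nothing to skip.

```php
<?php
// Simplified model of a PDF page tree (hypothetical helpers, not SetaPDF code).
// A 'Pages' node knows how many leaf pages it contains ('count') and its 'kids'.

function makeLeaf(int $number): array
{
    return ['type' => 'Page', 'number' => $number];
}

// One single node with all pages as direct kids, like the demo file.
function makeFlatTree(int $pageCount): array
{
    $kids = [];
    for ($i = 1; $i <= $pageCount; $i++) {
        $kids[] = makeLeaf($i);
    }
    return ['type' => 'Pages', 'count' => $pageCount, 'kids' => $kids];
}

// A tree in which every node has at most $fanout kids.
function makeBalancedTree(int $first, int $last, int $fanout): array
{
    $count = $last - $first + 1;
    $kids = [];
    if ($count <= $fanout) {
        for ($i = $first; $i <= $last; $i++) {
            $kids[] = makeLeaf($i);
        }
    } else {
        $chunk = (int) ceil($count / $fanout);
        for ($start = $first; $start <= $last; $start += $chunk) {
            $end = min($start + $chunk - 1, $last);
            $kids[] = makeBalancedTree($start, $end, $fanout);
        }
    }
    return ['type' => 'Pages', 'count' => $count, 'kids' => $kids];
}

// Finds the page with the given 1-based number and counts the inspected nodes.
function findPage(array $node, int $pageNumber, int &$visited): ?array
{
    foreach ($node['kids'] as $kid) {
        $visited++; // every kid has to be inspected to learn its type
        if ($kid['type'] === 'Page') {
            if ($pageNumber === 1) {
                return $kid;
            }
            $pageNumber--;
        } elseif ($pageNumber <= $kid['count']) {
            return findPage($kid, $pageNumber, $visited); // descend into this subtree
        } else {
            $pageNumber -= $kid['count']; // skip the whole subtree
        }
    }
    return null;
}

$flatVisited = 0;
$flatPage = findPage(makeFlatTree(50000), 50000, $flatVisited);

$balancedVisited = 0;
$balancedPage = findPage(makeBalancedTree(1, 50000, 10), 50000, $balancedVisited);

echo "flat tree: {$flatVisited} nodes inspected\n";
echo "balanced tree: {$balancedVisited} nodes inspected\n";
```

In this model, reaching the last page of the flat tree means inspecting every kid of the single node, while the balanced tree only needs a handful of inspections per level.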

This demo makes use of the fact that a SetaPDF_Core_Document instance is serializable to speed things up. You can play with this feature by enabling or disabling the cache with the corresponding check box.
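The general caching idea can be sketched with PHP's built-in serialize() and unserialize() functions. The ParsedDocument class below is only a stand-in for an object that is expensive to build. It is not the SetaPDF_Core_Document class, and loadWithCache() and the cache file name are hypothetical names chosen for this example. How the demo itself stores its cache is not shown here.

```php
<?php
// Stand-in for an object that is expensive to build (NOT SetaPDF_Core_Document).
final class ParsedDocument
{
    /** @var int[] hypothetical data that is collected while parsing */
    public $pageOffsets;

    public function __construct(array $pageOffsets)
    {
        $this->pageOffsets = $pageOffsets;
    }
}

// Returns the cached object if caching is enabled and a cache file exists,
// otherwise builds the object and (optionally) writes it to the cache file.
function loadWithCache(string $cacheFile, callable $build, bool $useCache): ParsedDocument
{
    if ($useCache && is_file($cacheFile)) {
        $data = file_get_contents($cacheFile);
        if ($data !== false) {
            // Only allow the expected class to be instantiated on unserialize.
            $cached = unserialize($data, ['allowed_classes' => [ParsedDocument::class]]);
            if ($cached instanceof ParsedDocument) {
                return $cached;
            }
        }
    }

    $document = $build();
    if ($useCache) {
        file_put_contents($cacheFile, serialize($document), LOCK_EX);
    }

    return $document;
}

// Usage: count how often the expensive build step actually runs.
$builds = 0;
$build = function () use (&$builds): ParsedDocument {
    $builds++;
    return new ParsedDocument(range(1, 1000));
};

$cacheFile = sys_get_temp_dir() . '/parsed-document-' . uniqid('', true) . '.cache';

$first = loadWithCache($cacheFile, $build, true);   // builds and writes the cache
$second = loadWithCache($cacheFile, $build, true);  // restored from the cache file
$third = loadWithCache($cacheFile, $build, false);  // cache disabled, builds again

unlink($cacheFile);
```

With the cache enabled, the second call restores the object from the file instead of rebuilding it. This is the effect the check box lets you compare.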

(You can view the original 50,000-page PDF document here)