[GH-ISSUE #23] feat(cli): Add PDF Support #6

Closed
opened 2026-03-02 11:44:57 +03:00 by kerem · 0 comments
Owner

Originally created by @karant-dev on GitHub (Dec 13, 2025).
Original GitHub issue: https://github.com/karant-dev/AutoRedact/issues/23

Context

CLI PDF support was attempted in v2.0 (PR #22) but was blocked by technical limitations in the Node.js ecosystem. Currently, the CLI supports only .jpg and .png.

Technical Blockers

  • pdfjs-dist vs node-canvas: The standard PDF.js library expects browser canvas and geometry APIs (specifically Path2D and DOMMatrix) to be available in order to render PDFs.
  • Missing APIs: node-canvas does not fully implement Path2D, causing pdfjs-dist (both Legacy and Modern builds) to crash with ReferenceError: Path2D is not defined.
  • Polyfill Failure: Extensive attempts to polyfill window, document, and Canvas globals failed to resolve the rendering context incompatibility.
  • Dependency Issues: Alternative libraries like pdf-img-convert failed to install due to node-pre-gyp compilation errors on modern Node versions.
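
The Path2D crash described above surfaces deep inside the renderer. A minimal sketch of a preflight check that fails early with a clearer message is shown below. The function names missingGlobals and assertPdfRenderingGlobals are hypothetical and not part of AutoRedact, pdfjs-dist, or node-canvas; the list of required globals is taken only from the two APIs named in this issue and may be incomplete.

```javascript
// Hypothetical helper: return the names that are undefined on `scope`.
// `scope` defaults to the real global object but can be replaced for testing.
function missingGlobals(names, scope = globalThis) {
  return names.filter((name) => typeof scope[name] === "undefined");
}

// Hypothetical guard: throw a descriptive error before any PDF rendering starts.
// The two names checked here are the ones this issue reports as required.
function assertPdfRenderingGlobals(scope = globalThis) {
  const missing = missingGlobals(["Path2D", "DOMMatrix"], scope);
  if (missing.length > 0) {
    throw new Error(
      `PDF rendering is unavailable: missing globals ${missing.join(", ")}`
    );
  }
}
```

Calling assertPdfRenderingGlobals() before loading a PDF would let the CLI print an actionable message instead of a bare ReferenceError.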

Proposed Solutions

  1. Docker/System Deps: Use pdftoppm (part of poppler-utils) via child_process. This is robust but adds system dependencies.
  2. WASM Renderer: Investigate a pure WASM PDF renderer that decouples from the DOM entirely.
  3. Wait: Wait for node-canvas to implement Path2D.
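
Option 1 could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the helper names (buildPdftoppmArgs, pageNumber, pdfToPngPages), the 150 DPI default, and the temporary-directory layout are all hypothetical. The only external facts relied on are the pdftoppm flags -png and -r and its "PDF-file output-prefix" argument order. Because pdftoppm zero-pads page numbers depending on the page count, the sketch lists the output directory rather than guessing file names.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

const execFileAsync = promisify(execFile);

// Hypothetical helper: build the pdftoppm argument list (PNG output at `dpi`).
function buildPdftoppmArgs(pdfPath, outputPrefix, { dpi = 150 } = {}) {
  return ["-png", "-r", String(dpi), pdfPath, outputPrefix];
}

// Hypothetical helper: extract the page number from names like "page-07.png".
function pageNumber(fileName) {
  return Number(fileName.match(/-(\d+)\.png$/)[1]);
}

// Hypothetical helper: rasterize every page of a PDF and return the PNG paths
// in page order. Requires pdftoppm (poppler-utils) on the PATH.
async function pdfToPngPages(pdfPath, options = {}) {
  const outDir = await fs.mkdtemp(path.join(os.tmpdir(), "autoredact-pdf-"));
  const prefix = path.join(outDir, "page");
  try {
    // execFile avoids a shell, so the PDF path is never shell-interpreted.
    await execFileAsync("pdftoppm", buildPdftoppmArgs(pdfPath, prefix, options));
  } catch (err) {
    if (err.code === "ENOENT") {
      throw new Error("pdftoppm not found; install poppler-utils");
    }
    throw err;
  }
  const files = (await fs.readdir(outDir)).filter((f) => /^page-\d+\.png$/.test(f));
  files.sort((a, b) => pageNumber(a) - pageNumber(b));
  return files.map((f) => path.join(outDir, f));
}
```

The resulting PNG paths could then be passed through the CLI's existing image pipeline, which keeps PDF handling isolated to one conversion step at the cost of the system dependency noted above.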

Current Workaround

Users are advised to convert PDFs to images manually before processing with the CLI.
