LiteParse is a fast, open-source document parser designed for local, high-quality spatial text parsing with bounding boxes, catering to users seeking efficient and lightweight PDF parsing without proprietary LLM features or cloud dependencies.
Source: README
LiteParse is gaining attention for its focus on fast, lightweight parsing, which appeals to users who need high-quality spatial text parsing without the overhead of cloud-based services or complex OCR systems. Its multi-language support and flexible OCR options add to its appeal.
Source: README, Overview
LiteParse uses PDFium for spatial text parsing, returning text together with bounding boxes that record precisely where each piece of text sits on the page.
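To make "spatial text parsing" concrete, here is a minimal Python sketch of one thing bounding boxes enable: regrouping loose text runs into reading-order lines. The `TextItem` type, the `group_into_lines` helper, and the 2-point tolerance are hypothetical illustrations, not LiteParse's API or its actual layout algorithm, and the sketch assumes page coordinates with y increasing downward.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextItem:
    """Hypothetical: one run of text with its bounding box (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

def group_into_lines(items: list[TextItem], tolerance: float = 2.0) -> list[str]:
    """Cluster items whose vertical centers lie within `tolerance` of the
    current line's average center, then order each line left to right."""
    lines: list[list[TextItem]] = []
    for item in sorted(items, key=lambda it: (it.y0 + it.y1) / 2):
        center = (item.y0 + item.y1) / 2
        if lines:
            last = lines[-1]
            last_center = sum((it.y0 + it.y1) / 2 for it in last) / len(last)
            if abs(center - last_center) <= tolerance:
                last.append(item)
                continue
        lines.append([item])
    return [" ".join(it.text for it in sorted(line, key=lambda it: it.x0)) for line in lines]

items = [
    TextItem("World", 60, 10, 100, 20),
    TextItem("Hello", 10, 11, 50, 21),  # slightly lower baseline, same visual line
    TextItem("Second line", 10, 30, 90, 40),
]
print(group_into_lines(items))  # ['Hello World', 'Second line']
```

Without coordinates, "World" would come before "Hello" in extraction order; the bounding boxes are what let the words be put back in visual order.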
Source: README, Overview
The OCR system offers built-in Tesseract support, can call OCR engines such as EasyOCR or PaddleOCR running behind an HTTP server, and provides a standard API for plugging in custom OCR solutions.
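The idea of a standard API for custom OCR can be sketched as a structural interface that any backend satisfies. Everything here (`OcrEngine`, `OcrWord`, `recognize`, the confidence filter) is a hypothetical illustration, not LiteParse's actual OCR API; `FakeEngine` merely stands in for a Tesseract wrapper or an HTTP OCR client.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class OcrWord:
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels
    confidence: float

class OcrEngine(Protocol):
    """Hypothetical interface: any object with this shape can act as an OCR backend."""
    name: str
    def recognize(self, image_png: bytes, language: str) -> list[OcrWord]: ...

class FakeEngine:
    """Stand-in backend; a real one might wrap Tesseract or call an HTTP OCR server."""
    name = "fake"
    def recognize(self, image_png: bytes, language: str) -> list[OcrWord]:
        return [OcrWord("hello", (0, 0, 40, 12), 0.93), OcrWord("wrld", (45, 0, 80, 12), 0.41)]

def run_ocr(engine: OcrEngine, image_png: bytes, language: str = "eng",
            min_confidence: float = 0.5) -> list[OcrWord]:
    """Call whichever engine was plugged in and drop low-confidence words."""
    return [w for w in engine.recognize(image_png, language) if w.confidence >= min_confidence]

words = run_ocr(FakeEngine(), b"")
print([w.text for w in words])  # ['hello']
```

Because `OcrEngine` is a `typing.Protocol`, a backend only needs a matching `recognize` method; it does not have to inherit from any base class.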
Source: README, Overview
LiteParse can also generate high-quality page screenshots, which LLM agents can use to extract visual information.
Source: README, Overview
The tool outputs JSON and plain text, covering different use cases and integration needs.
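The difference between the two output formats can be illustrated with a hypothetical parse result: JSON keeps coordinates for layout-aware consumers, while plain text keeps only the strings. The field names (`page`, `items`, `text`, `bbox`) are assumptions for this sketch, not LiteParse's documented schema, and the ordering assumes y increases downward.

```python
import json

# Hypothetical parse result: field names are illustrative, not LiteParse's schema.
page = {
    "page": 1,
    "items": [
        {"text": "Invoice", "bbox": [72.0, 40.0, 140.0, 55.0]},
        {"text": "Total: 42.00", "bbox": [72.0, 700.0, 180.0, 712.0]},
    ],
}

def to_json(page: dict) -> str:
    """Structured output: keeps coordinates for downstream layout-aware code."""
    return json.dumps(page, indent=2)

def to_text(page: dict) -> str:
    """Plain-text output: strings only, one item per line, sorted top to bottom
    (by y0, then x0), assuming y grows downward."""
    ordered = sorted(page["items"], key=lambda it: (it["bbox"][1], it["bbox"][0]))
    return "\n".join(it["text"] for it in ordered)

print(to_text(page))
```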
Source: README, Overview
LiteParse is accessible from Rust, Node.js/TypeScript, Python, and the browser via WASM, and runs on Linux, macOS, and Windows.
Source: README, Overview
LiteParse has a modular architecture: a Rust core handles format conversion, text extraction, OCR, and spatial layout reconstruction. A flowchart traces the data flow from input formats to output formats, with language bindings and CLI tools providing the user-facing interfaces for different programming environments.
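The data flow through the core can be sketched as a chain of stages. The stage functions below are hypothetical stand-ins for the Rust modules, and two behaviors are assumptions made for illustration only: non-PDF inputs are converted to PDF first, and OCR runs only when a page has no native text layer.

```python
from typing import Callable

# Hypothetical stage functions standing in for the Rust core's modules.
def convert_to_pdf(doc: dict) -> dict:
    """Format conversion (assumed): non-PDF inputs become PDF before parsing."""
    return {**doc, "format": "pdf"}

def extract_text(doc: dict) -> dict:
    """Text extraction from the native text layer (empty for scanned pages)."""
    return {**doc, "text_items": doc.get("text_layer", [])}

def ocr_if_needed(doc: dict) -> dict:
    """OCR fallback (assumed): only documents with no extractable text go to OCR."""
    if doc["text_items"]:
        return doc
    return {**doc, "text_items": ["<ocr output>"], "used_ocr": True}

def reconstruct_layout(doc: dict) -> dict:
    """Spatial layout reconstruction, reduced here to joining items with newlines."""
    return {**doc, "text": "\n".join(doc["text_items"])}

PIPELINE: list[Callable[[dict], dict]] = [
    convert_to_pdf, extract_text, ocr_if_needed, reconstruct_layout,
]

def parse(doc: dict) -> dict:
    """Run each stage in order, feeding the output of one into the next."""
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

print(parse({"format": "docx", "text_layer": ["Hello"]})["text"])  # Hello
print(parse({"format": "png"}).get("used_ocr"))  # True
```

Keeping each stage a pure function of the document makes it easy to swap one module (for example, the OCR backend) without touching the others.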
Source: README, Overview
Figure: LiteParse at the center, core feature modules in the inner ring, and key dependencies in the outer ring (PDFium, Tesseract, napi-rs, PyO3, wasm-bindgen).
LiteParse suits developers and technical users who need to parse PDFs locally, such as those building document processing pipelines, developing applications that need spatial text information, or adding OCR capabilities to their projects.
Source: README
v2.0.4 (2026-05-30): Fixed bounding box issues for pages with rotated text.
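The release note does not say how the fix works, but the underlying geometry is easy to illustrate: a rotated text box needs its corners rotated before taking the enclosing axis-aligned box, otherwise the unrotated width and height produce a box of the wrong shape. This `rotated_bbox` helper is a hypothetical sketch of that calculation, not the code shipped in v2.0.4; it assumes rotation about the box's anchor point, counter-clockwise when y increases upward.

```python
import math

def rotated_bbox(x: float, y: float, width: float, height: float, angle_deg: float):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a width x height
    text box anchored at (x, y) and rotated by angle_deg about that anchor."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    xs, ys = [], []
    for cx, cy in corners:
        # Rotate each corner about the anchor, then translate into page space.
        xs.append(x + cx * cos_t - cy * sin_t)
        ys.append(y + cx * sin_t + cy * cos_t)
    return min(xs), min(ys), max(xs), max(ys)

# A 50 x 10 box rotated 90 degrees becomes roughly 10 wide and 50 tall.
print(rotated_bbox(100, 100, 50, 10, 90))
```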
Source: GitHub Releases
LiteParse is a valuable tool for developers seeking a fast, efficient local document parser with robust OCR capabilities. Its multi-language support and flexibility make it a strong choice for a wide range of applications, particularly those that require precise text positioning and local processing.