  • We propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model and a measure of the certainty of the implicit preference model that DPO optimizes.
  • We explore the rise of agentic retrieval over vector indexing and how PageIndex can be used to build agentic RAG systems.
  • We examine the inherent limitations of OCR from an information-theoretic perspective and show why a direct, vision-based approach with PageIndex is more effective.
  • PageIndex OCR is the world's first OCR model that understands documents as a whole, preserving full structure and section hierarchy across pages instead of treating each page as an independent unit.
  • Experience the power of reasoning-based RAG with PageIndex Chat, our new conversational interface for intelligent document understanding.
  • PageIndex is a vectorless, reasoning-based retrieval framework that simulates how human experts extract knowledge from complex documents. Instead of relying on vector similarity search, it builds a tree-structured index from documents and enables LLMs to perform agentic reasoning over that structure for context-aware retrieval. The retrieval process is traceable and interpretable, and it requires no vector databases or chunking.
  • We benchmarked PageIndex Chat against ChatGPT 5.1 on real-world long documents. PageIndex achieved 100% accuracy, compared with 59–82% for ChatGPT 5.1, along with faster response times and page-level traceability.
  • How PageIndex’s vectorless, reasoning-based RAG overcomes the challenges of traditional vector RAG in long, complex technical manuals.