We present PageIndex, a reasoning-based RAG system that uses tree search to simulate how human experts navigate long documents and extract knowledge from them.
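
To make the tree-search idea concrete, here is a minimal sketch, not PageIndex's actual implementation. It assumes a document is represented as a table-of-contents tree and descends greedily toward the most relevant section. The `Node` schema, the `relevance` scorer, and the sample document are all hypothetical; in particular, the word-overlap score is only a toy stand-in for the reasoning step, which a real system would presumably delegate to an LLM.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in a document's table-of-contents tree (hypothetical schema)."""
    title: str
    summary: str
    pages: tuple[int, int]  # inclusive page range covered by this section
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def relevance(query: str, node: Node) -> int:
    """Toy stand-in for the reasoning step: count query words that
    also appear in the node's title or summary."""
    return len(tokens(query) & tokens(f"{node.title} {node.summary}"))


def tree_search(query: str, root: Node) -> list[Node]:
    """Greedy descent: at each level, follow the most relevant child
    until a leaf is reached. Returns the path from root to leaf."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: relevance(query, child))
        path.append(node)
    return path


# Hypothetical document tree, for illustration only.
report = Node("Annual Report", "Company overview and results", (1, 120), [
    Node("Financial Statements", "Balance sheet, income, cash flow", (40, 90), [
        Node("Income Statement", "Revenue, expenses, net income", (45, 52)),
        Node("Cash Flow", "Operating, investing, financing", (53, 60)),
    ]),
    Node("Risk Factors", "Market, regulatory, and operational risk", (91, 110)),
])

path = tree_search("What was net income?", report)
print(" > ".join(n.title for n in path), path[-1].pages)
```

The search returns the path of sections it followed, so the page range of the final node can be handed to a reader or an LLM for answer extraction. A greedy descent is the simplest choice; a beam search that keeps several candidate branches per level would be a natural variation when section summaries are ambiguous.
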
PageIndex OCR is the world's first OCR model that understands documents as a whole, preserving full structure across pages instead of treating each page as an independent unit.