We built what Andrej Karpathy described and solved the hard part. OpenKB is an open-source CLI that compiles raw documents into a structured, interlinked wiki, using PageIndex to handle long PDFs.
PageIndex OCR is the world's first OCR model that understands documents as a whole: it preserves the full structure and section hierarchy across pages instead of treating each page as an independent unit.
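To make the difference between whole-document and page-by-page processing concrete, here is a toy sketch. It is not PageIndex's implementation. It assumes headings have already been detected and marked in markdown style (`#`, `##`, ...), and the `Section` and `build_tree` names are hypothetical. The key point is that the stack of open sections is carried across page boundaries. If the stack were reset at the start of every page, text that continues onto the next page would lose its parent section.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    text: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def build_tree(pages: list[list[str]]) -> Section:
    """Build a single section tree spanning all pages (toy illustration).

    Assumes every line is either a markdown-style heading ("#", "##", ...)
    or body text. The stack of open sections is NOT reset between pages,
    so body text that spills onto a new page stays under its section.
    """
    root = Section(title="<document>", level=0)
    stack = [root]  # currently open sections, outermost first
    for page in pages:
        for line in page:
            stripped = line.strip()
            if stripped.startswith("#"):
                level = len(stripped) - len(stripped.lstrip("#"))
                title = stripped[level:].strip()
                # Close any open sections at the same or a deeper level.
                while stack[-1].level >= level:
                    stack.pop()
                node = Section(title=title, level=level)
                stack[-1].children.append(node)
                stack.append(node)
            elif stripped:
                # Body text belongs to the innermost open section,
                # even if that section was opened on an earlier page.
                stack[-1].text.append(stripped)
    return root


if __name__ == "__main__":
    pages = [
        ["# 1 Introduction", "Intro text.", "## 1.1 Scope", "Scope text starts here"],
        ["and continues on page 2.", "# 2 Method", "Method text."],
    ]
    tree = build_tree(pages)
    scope = tree.children[0].children[0]
    print(scope.title, scope.text)
```

Here the line "and continues on page 2." is attached to section "1.1 Scope" rather than treated as orphaned text at the top of page 2.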