⭐ What is PageIndex
PageIndex transforms lengthy PDF documents into a searchable tree structure, like a smart "table of contents" optimized for use with LLMs.
Built for Reasoning-based RAG 🧠, PageIndex enables LLMs to navigate documents logically and find exactly what they need through reasoning and structured relevance, without relying on vector similarity or arbitrary chunking. It is ideal for financial reports, legal documents, technical manuals, or any document that exceeds LLM context limits.

👉 Try it now via the API or the Web Dashboard.
💬 For support or feedback, please leave us a message or join our Discord community.
✅ Key Features
- Hierarchical Tree Structure
  Enables LLMs to traverse documents logically, like an intelligent, LLM-optimized table of contents.
- Chunk-Free Segmentation
  No arbitrary chunking. Nodes follow the natural structure of the document.
- Scales to Massive Documents
  Designed to handle documents of hundreds or even thousands of pages.
- Precise Page Referencing
  Every node contains its summary and the physical page indices of its start and end pages, allowing pinpoint retrieval.
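To make the page-referencing idea concrete, here is a minimal sketch of looking up which nodes cover a given physical page. It assumes each node carries the `start_index`, `end_index`, and `nodes` fields shown in the example output in the PageIndex Format section. The helper names (`iter_nodes`, `nodes_covering_page`) and the small tree are hypothetical and are not part of PageIndex itself.

```python
from typing import Any, Dict, Iterator, List

Node = Dict[str, Any]


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants, depth-first."""
    yield node
    for child in node.get("nodes", []):
        yield from iter_nodes(child)


def nodes_covering_page(root: Node, page: int) -> List[Node]:
    """Return every node whose inclusive page range contains `page`.

    Every node is checked individually; the sketch does not assume that a
    child's range is nested inside its parent's range.
    """
    return [
        n for n in iter_nodes(root)
        if n["start_index"] <= page <= n["end_index"]
    ]


# Hypothetical tree, for illustration only.
tree: Node = {
    "title": "Report",
    "node_id": "0000",
    "start_index": 1,
    "end_index": 10,
    "nodes": [
        {"title": "Introduction", "node_id": "0001", "start_index": 1, "end_index": 3},
        {"title": "Results", "node_id": "0002", "start_index": 4, "end_index": 10},
    ],
}

hits = nodes_covering_page(tree, 4)
print([n["node_id"] for n in hits])  # ['0000', '0002']
```

Because section boundaries can fall mid-page, adjacent nodes may share a page, so a lookup like this can return more than one node at the same depth.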
📦 PageIndex Format
Here is an example output. See more example documents and generated trees.
{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "The Federal Reserve's monitoring ..."
    },
    {
      "title": "Domestic and International Cooperation and Coordination",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "In 2023, the Federal Reserve collaborated ..."
    }
  ]
}
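As a rough illustration of how a tree in this format could be consumed, the sketch below renders the example above as an indented outline and resolves a `node_id` back to its node. The function names (`render_outline`, `find_by_id`) and the outline layout are assumptions made for this example, not part of the PageIndex API.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]

# The example tree from above, with summaries left elided as in the source.
EXAMPLE: Node = {
    "title": "Financial Stability",
    "node_id": "0006",
    "start_index": 21,
    "end_index": 22,
    "summary": "The Federal Reserve ...",
    "nodes": [
        {
            "title": "Monitoring Financial Vulnerabilities",
            "node_id": "0007",
            "start_index": 22,
            "end_index": 28,
            "summary": "The Federal Reserve's monitoring ...",
        },
        {
            "title": "Domestic and International Cooperation and Coordination",
            "node_id": "0008",
            "start_index": 28,
            "end_index": 31,
            "summary": "In 2023, the Federal Reserve collaborated ...",
        },
    ],
}


def render_outline(node: Node, depth: int = 0) -> List[str]:
    """Return one outline line per node, indented by depth in the tree."""
    indent = "  " * depth
    lines = [
        f'{indent}[{node["node_id"]}] {node["title"]} '
        f'(pages {node["start_index"]}-{node["end_index"]})'
    ]
    for child in node.get("nodes", []):
        lines.extend(render_outline(child, depth + 1))
    return lines


def find_by_id(node: Node, node_id: str) -> Optional[Node]:
    """Depth-first search for the node with the given node_id."""
    if node["node_id"] == node_id:
        return node
    for child in node.get("nodes", []):
        found = find_by_id(child, node_id)
        if found is not None:
            return found
    return None


outline = "\n".join(render_outline(EXAMPLE))
print(outline)

# Resolve a chosen node_id back to its page range.
picked = find_by_id(EXAMPLE, "0008")
if picked is not None:
    print(picked["start_index"], picked["end_index"])  # 28 31
```

One possible use is to place the outline in an LLM prompt, ask the model which `node_id` values look relevant to a question, and then fetch only the pages in those nodes' ranges. That workflow is one way to read "reasoning-based" retrieval, not a description of how PageIndex implements it.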