⭐ What is PageIndex

PageIndex transforms lengthy PDF documents into a searchable tree structure, like a smart "table of contents", optimized for use with LLMs.

Built for Reasoning-based RAG 🧠, PageIndex lets LLMs navigate a document logically and find exactly what they need through reasoning and structured relevance, without relying on vector similarity or arbitrary chunking. It is ideal for financial reports, legal documents, technical manuals, or any document that exceeds LLM context limits.

PageIndex Logo

👉 Try it now via the API or the Web Dashboard.

💬 For support or feedback, please leave us a message or join our Discord community.

✅ Key Features

  • Hierarchical Tree Structure
    Enables LLMs to traverse documents logically—like an intelligent, LLM-optimized table of contents.

  • Chunk-Free Segmentation
    No arbitrary chunking. Nodes follow the natural structure of the document.

  • Scales to Massive Documents
    Designed to handle hundreds or even thousands of pages with ease.

  • Precise Page Referencing
    Every node carries a summary plus the physical page indices where it starts and ends, allowing pinpoint retrieval.

📦 PageIndex Format

Here is an example output. See more example documents and generated trees.

{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "The Federal Reserve's monitoring ..."
    },
    {
      "title": "Domestic and International Cooperation and Coordination",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "In 2023, the Federal Reserve collaborated ..."
    }
  ]
}
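
To make the format concrete, here is a minimal Python sketch that loads the example tree above and walks it. The helper names (`walk`, `find_node`, `page_range`) are hypothetical and not part of PageIndex itself, and the sketch assumes that `start_index` and `end_index` are inclusive physical page numbers and that child nodes live under the `nodes` key, as in the example.

```python
import json
from typing import Iterator, Optional, Tuple

# The example tree from this README, inlined so the sketch is self-contained.
TREE_JSON = """
{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "The Federal Reserve's monitoring ..."
    },
    {
      "title": "Domestic and International Cooperation and Coordination",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "In 2023, the Federal Reserve collaborated ..."
    }
  ]
}
"""


def walk(node: dict, depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, node) pairs in pre-order, i.e. document order."""
    yield depth, node
    # Leaf nodes in the example have no "nodes" key, so default to an empty list.
    for child in node.get("nodes", []):
        yield from walk(child, depth + 1)


def find_node(root: dict, node_id: str) -> Optional[dict]:
    """Return the node with the given node_id, or None if it is absent."""
    for _, node in walk(root):
        if node["node_id"] == node_id:
            return node
    return None


def page_range(node: dict) -> range:
    """Pages referenced by a node (assumption: both indices are inclusive).

    No assumption is made that a parent's range encloses its children's
    ranges; each node is read on its own.
    """
    return range(node["start_index"], node["end_index"] + 1)


tree = json.loads(TREE_JSON)

# Print an indented outline: node id, title, and page span.
for depth, node in walk(tree):
    indent = "  " * depth
    print(f"{indent}{node['node_id']} {node['title']} "
          f"(pp. {node['start_index']}-{node['end_index']})")
```

In a reasoning-based retrieval loop, an LLM could be shown the titles and summaries from `walk`, pick a `node_id`, and then have only the pages from `page_range` loaded into its context. That loop is an illustration of how the tree might be consumed, not a description of the PageIndex API.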
