⭐ What is PageIndex

PageIndex transforms lengthy PDF documents into a semantic tree structure, similar to a "table of contents" but optimized for use with Large Language Models (LLMs). It is ideal for financial reports, regulatory filings, academic textbooks, legal or technical manuals, and any other document that exceeds LLM context limits.

Try it now in the Dashboard.

✅ Key Features

  • Scales to Massive Documents
    Designed to handle hundreds or even thousands of pages with ease.

  • Hierarchical Tree Structure
    Enables LLMs to traverse documents logically, much like an intelligent table of contents optimized for LLMs.

  • Precise Page Referencing
    Every node contains a summary and the physical indices of its start and end pages, allowing pinpoint retrieval.

  • Chunk-Free Segmentation
    No arbitrary chunking. Nodes follow the natural structure of the document.
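Because each node carries a title, an ID and a page range, the tree can be shown to an LLM as a compact outline that it reads like a table of contents before deciding which node to open. The sketch below assumes the field names from the example output in the PageIndex Format section (`title`, `node_id`, `start_index`, `end_index`, `nodes`); `render_outline` and `find_node` are hypothetical helpers for illustration, not part of PageIndex itself.

```python
from typing import Any, Optional

# Hypothetical helpers; field names follow the example PageIndex output.
Node = dict[str, Any]

EXAMPLE_TREE: Node = {
    "title": "Financial Stability",
    "node_id": "0006",
    "start_index": 21,
    "end_index": 22,
    "nodes": [
        {"title": "Monitoring Financial Vulnerabilities",
         "node_id": "0007", "start_index": 22, "end_index": 28},
        {"title": "Domestic and International Cooperation and Coordination",
         "node_id": "0008", "start_index": 28, "end_index": 31},
    ],
}


def render_outline(node: Node, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines: [node_id] title (pp. start-end)."""
    indent = "  " * depth
    lines = [
        f"{indent}[{node['node_id']}] {node['title']} "
        f"(pp. {node['start_index']}-{node['end_index']})"
    ]
    for child in node.get("nodes", []):  # leaf nodes may omit "nodes"
        lines.extend(render_outline(child, depth + 1))
    return lines


def find_node(node: Node, node_id: str) -> Optional[Node]:
    """Depth-first search for the node an LLM selected from the outline."""
    if node["node_id"] == node_id:
        return node
    for child in node.get("nodes", []):
        found = find_node(child, node_id)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print("\n".join(render_outline(EXAMPLE_TREE)))
    chosen = find_node(EXAMPLE_TREE, "0007")
    if chosen is not None:
        print(chosen["start_index"], chosen["end_index"])
```

In a retrieval loop, the outline would go into the prompt, and the `node_id` the model names would be resolved with `find_node` to obtain the page range to load.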

📦 PageIndex Format

Here is an example output. See more example documents and generated trees.

{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "The Federal Reserve's monitoring ..."
    },
    {
      "title": "Domestic and International Cooperation and Coordination",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "In 2023, the Federal Reserve collaborated ..."
    }
  ]
}
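The page indices make it possible to map a physical page back to the sections that cover it. In this example a child's range can extend past its parent's (node 0007 runs to page 28 while 0006 ends at 22) and adjacent siblings share a boundary page (28), so the sketch below checks every node rather than pruning by the parent's range, and it may return more than one match. It assumes `start_index` and `end_index` are inclusive physical page numbers; `iter_nodes` and `nodes_for_page` are hypothetical helpers, not a PageIndex API.

```python
import json
from typing import Any, Iterator

Node = dict[str, Any]

# The example output above, parsed as-is (summaries are abbreviated there too).
EXAMPLE_JSON = """
{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {"title": "Monitoring Financial Vulnerabilities", "node_id": "0007",
     "start_index": 22, "end_index": 28,
     "summary": "The Federal Reserve's monitoring ..."},
    {"title": "Domestic and International Cooperation and Coordination",
     "node_id": "0008", "start_index": 28, "end_index": 31,
     "summary": "In 2023, the Federal Reserve collaborated ..."}
  ]
}
"""


def iter_nodes(node: Node, path: tuple = ()) -> Iterator[tuple[tuple, Node]]:
    """Yield (title path, node) pairs in depth-first order."""
    path = path + (node["title"],)
    yield path, node
    for child in node.get("nodes", []):  # leaf nodes may omit "nodes"
        yield from iter_nodes(child, path)


def nodes_for_page(root: Node, page: int) -> list[tuple[str, str]]:
    """Return (node_id, breadcrumb) for every node whose range includes page.

    Assumes start_index/end_index are inclusive physical page numbers.
    """
    return [
        (node["node_id"], " > ".join(path))
        for path, node in iter_nodes(root)
        if node["start_index"] <= page <= node["end_index"]
    ]


if __name__ == "__main__":
    tree = json.loads(EXAMPLE_JSON)
    for page in (21, 25, 28):
        print(page, nodes_for_page(tree, page))
```

Returning every match, rather than only the deepest one, keeps boundary pages such as 28 attributed to both sections that touch them.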