ChatGPT 5.1 Still Struggles on Long Documents

ChatGPT 5.1 (Instant and Thinking) launched a few weeks ago with a 400K-token context window, meaning it can fit hundreds of pages of PDFs directly in a single chat.

However, fitting a document into the context window doesn’t automatically mean the model can reliably work with it. As the context grows, models suffer from what researchers call “context rot”: their ability to retrieve, interpret, and use information degrades as more tokens are added.

In practical use, this results in the following issues:

Context Confusion — Mixing up numbers or details drawn from different sections of the document
Missing Information — Omitting values or definitions that are explicitly provided in the document
Hallucination — Producing confident but incorrect information
Low Traceability — Giving answers that are difficult to verify because they are not tied to any page or reference

These failures are especially costly in professional and high-stakes use cases. So how is this problem usually addressed?

From Traditional RAG to Reasoning-based Retrieval

The standard way to work around context limits is Retrieval-Augmented Generation (RAG). Instead of passing the entire document to the model, traditional vector-based RAG splits the document into chunks, embeds them, and stores the embeddings in a vector database. At query time, it retrieves the most “semantically similar” chunks and feeds them to the model together with the query.
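
The chunk, embed, and retrieve pipeline described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: the bag-of-words `embed` function is a toy stand-in for a neural embedding model, an in-memory list stands in for the vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Combine the retrieved chunks and the query into one model input."""
    context = "\n---\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Note that `retrieve` ranks chunks purely by surface similarity to the query. Nothing in the pipeline checks whether a high-scoring chunk actually answers the question.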

While simple and effective for short texts, this approach has fundamental limitations. The core issue is that semantic similarity is not equivalent to true relevance, especially for long, technical documents where many passages share near-identical semantics but differ in what actually matters for the question.

To address this, PageIndex takes a different approach from traditional RAG. Unlike vector-based methods that rely on static semantic similarity, PageIndex uses a dynamic, iterative reasoning process to decide where to look next based on the evolving context of the question. In other words, PageIndex doesn’t just search for text that looks similar; it reasons through the document to find the parts that are actually relevant to the query. We describe this framework in detail in PageIndex: Next-Generation Vectorless, Reasoning-based RAG.
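
To make the contrast concrete, here is a rough sketch of what iterative, "decide where to look next" retrieval could look like. It is not PageIndex's actual implementation. It assumes the document is represented as a tree of sections (an assumption not stated in this post), and it replaces the reasoning step with a trivial keyword-overlap picker so the example runs without a model. The `Section` type, the function names, and the sample outline with its page ranges are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    pages: tuple[int, int]  # inclusive page range (made-up values below)
    children: list["Section"] = field(default_factory=list)


def pick_by_keywords(question: str, options: list[Section]) -> Section:
    """Stand-in for the reasoning step; a real system would ask an LLM
    which section to open next, given the question and what it has seen."""
    q = set(re.findall(r"[a-z]+", question.lower()))
    return max(options, key=lambda s: len(q & set(re.findall(r"[a-z]+", s.title.lower()))))


def navigate(root: Section, question: str, pick=pick_by_keywords) -> list[Section]:
    """Descend one level at a time, choosing where to look next at each step."""
    path = [root]
    node = root
    while node.children:
        node = pick(question, node.children)
        path.append(node)
    return path


# Hypothetical outline of a business plan, for illustration only.
plan = Section("Business plan", (1, 200), [
    Section("Customer bill impact", (10, 60), [
        Section("Base case bill impact", (12, 25)),
        Section("Support for vulnerable customers", (26, 60)),
    ]),
    Section("Network investment", (61, 200), [
        Section("Substation upgrades", (61, 120)),
        Section("Digital systems", (121, 200)),
    ]),
])

path = navigate(plan, "What is the expected bill decrease in the base case?")
print(" > ".join(s.title for s in path), path[-1].pages)
```

The design point is that each choice depends on the question and on the path taken so far, and the final section comes with a page range that can be cited. The `pick` parameter is where a model call would replace the keyword heuristic.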

Introducing PageIndex Chat

PageIndex Chat is a chat platform powered by PageIndex's reasoning-based indexing and retrieval techniques. It is a long-document AI analyst designed specifically for accurate, page-grounded question answering on long, professional documents.

Whenever you ask a question about a lengthy document, it does not match embeddings or skim the text superficially, as most other AI tools do. Instead, it reasons through the document like an experienced human expert to find precise, contextual answers.

PageIndex vs. ChatGPT 5.1

Industry-Aligned Long-Document QA Benchmark

We constructed a benchmark grounded in real industry practice: five real-world energy-sector business plans (each ~200 pages) paired with 22 practical questions that domain practitioners routinely ask, such as:

“How much is the average domestic bill decrease expected in the base case for London Power Networks?”

We compared PageIndex Chat with ChatGPT 5.1 Instant and ChatGPT 5.1 Thinking and evaluated their responses. We have published the full test here.

Higher Accuracy and Faster Responses

On this evaluation set, PageIndex Chat achieved 100% accuracy, answering all 22 questions correctly. By comparison, ChatGPT 5.1 Instant reached 59.1% accuracy, and ChatGPT 5.1 Thinking reached 81.8%.
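
For readers who want to relate these percentages to the 22 questions, the short check below recovers the number of correct answers each figure implies. It assumes every question was scored simply as right or wrong, which this post does not state explicitly; the function names are illustrative.

```python
TOTAL_QUESTIONS = 22  # size of the benchmark described above


def accuracy_pct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def implied_correct(pct: float, total: int = TOTAL_QUESTIONS) -> list[int]:
    """All counts of correct answers whose rounded accuracy equals pct."""
    return [c for c in range(total + 1) if accuracy_pct(c, total) == pct]


for name, pct in [
    ("PageIndex Chat", 100.0),
    ("ChatGPT 5.1 Thinking", 81.8),
    ("ChatGPT 5.1 Instant", 59.1),
]:
    print(name, pct, implied_correct(pct))
```

Under that assumption, the reported figures are consistent with 22, 18, and 13 correct answers out of 22, respectively.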

[Figure: Accuracy Comparison]

More surprisingly, PageIndex’s reasoning-based approach was also significantly faster than both ChatGPT 5.1 Instant and ChatGPT 5.1 Thinking. In other words, it delivered higher accuracy and lower latency at the same time. The figure below shows the average response time for each model.

[Figure: Response Time Comparison]

You can browse the full test results here and see a detailed walkthrough of one of the test cases.

Verifiable Answers with Page-Level References

Beyond accuracy and speed, traceability is also critical for professional use cases.

Unlike ChatGPT 5.1, which typically returns answers without precise source locations, PageIndex Chat provides every answer with specific page references, so each figure or statement is tied back to its source. This makes it easy to verify numbers directly against the original text.
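
Page references are useful precisely because they can be checked mechanically. The sketch below shows one way such a check might look: given an answer's citations and the extracted text of each page, it confirms that each quoted passage really appears on the page it cites. The `Citation` structure and the `verify` function are hypothetical and do not describe PageIndex Chat's output format; the sample page text is made up.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    page: int   # page number the answer points to
    quote: str  # passage the answer claims is on that page


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't break matching."""
    return " ".join(text.lower().split())


def verify(citations: list[Citation], pages: dict[int, str]) -> list[bool]:
    """For each citation, report whether its quote appears on the cited page."""
    return [
        _normalize(c.quote) in _normalize(pages.get(c.page, ""))
        for c in citations
    ]


# Made-up page text, for illustration only.
pages = {
    3: "Average domestic bills\nfall in the base case.",
    4: "Investment in substations continues.",
}
citations = [
    Citation(page=3, quote="bills fall in the base case"),
    Citation(page=4, quote="bills fall in the base case"),
]
print(verify(citations, pages))
```

An answer without page references offers nothing comparable to check; a reviewer would have to search the whole document by hand.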

[Figure: PageIndex Chat provides page-level references with its answers]

Try PageIndex Chat

Use PageIndex Chat to query long documents with a human-like AI document analyst that delivers accurate answers, fast responses, and page-level traceability, turning hundreds of pages into decision-ready insights.

Ready to see how this works on your own documents? Try PageIndex Chat now.