HubNews

Artificial Intelligence
PageIndex Improves Search Precision in Long Documents to 98.7%

TL;DR

PageIndex, a new open-source framework, tackles a persistent problem in retrieval-augmented generation (RAG): searching through long documents. On documents where traditional vector search fails, a system built on the framework reached 98.7% precision.

venturebeat.com • January 30, 2026 • 3 min read

PageIndex Revolutionizes Search in Long Documents

PageIndex, a new open-source framework, addresses a persistent problem in retrieval-augmented generation (RAG): search over lengthy documents. A system built on the framework reached 98.7% precision on the kind of documents where traditional methods fail.

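Traditionally, RAG works by splitting documents into chunks, computing an embedding (a vector representation) for each chunk, and storing those embeddings in a vector database. At query time, the query is embedded as well and the most similar chunks are retrieved. This method works well for simple tasks, such as question answering over short documents.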
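The classic chunk, embed, and retrieve pipeline can be sketched in a few lines. This is a minimal illustration, not PageIndex code: `embed` is a toy bag-of-words stand-in for a learned embedding model, and `InMemoryVectorStore` is a hypothetical in-memory substitute for a real vector database.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 20) -> list[str]:
    """Split a document into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a term-frequency vector over lowercase words.
    (Assumption: real RAG systems use a learned dense embedding model.)"""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Index time: embed each chunk once and store it alongside its vector.
        self.entries.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Query time: embed the query and return the k most similar chunks.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = InMemoryVectorStore()
for piece in chunk("the cat sat on the mat while stock prices rose sharply", size=6):
    store.add(piece)
print(store.search("cat on mat", k=1))  # ['the cat sat on the mat']
```

Note that retrieval here is a flat lookup: every chunk is scored independently, and nothing about the document's structure is used.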

PageIndex, however, abandons this approach and treats search as a navigation problem rather than a lookup problem.

Innovation Through Tree Search

PageIndex borrows a concept from game AI: tree search. Instead of scanning every paragraph, the system mimics how a human reader works, consulting a virtual table of contents that maps the document's structure.

The framework builds a Global Index, a tree whose nodes represent the document's chapters and sections. When a query arrives, the system searches this tree, classifying each node as relevant or irrelevant in the context of the user's request.
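The navigation idea can be illustrated with a pruned depth-first traversal of a table-of-contents tree. This is a simplified sketch under stated assumptions, not PageIndex's actual algorithm: `Node`, `is_relevant`, and `tree_search` are hypothetical names, and the word-overlap relevance check stands in for the model-based judgment a system like PageIndex would make at each node.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical stopword list for the toy relevance check below.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "what", "how", "for", "to", "and"}


@dataclass
class Node:
    """One entry in the table-of-contents tree (a chapter or section)."""
    title: str
    summary: str = ""
    children: list[Node] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercase content words of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def is_relevant(node: Node, query: str) -> bool:
    """Hypothetical relevance judgment. A real system would ask an LLM whether
    this node matters for the query; word overlap stands in for that here."""
    return bool(tokens(query) & tokens(node.title + " " + node.summary))


def tree_search(root: Node, query: str) -> list[Node]:
    """Depth-first search that only descends into nodes judged relevant and
    returns the relevant leaf sections in document order."""
    hits: list[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            hits.append(node)
            continue
        # Prune irrelevant branches; push in reverse to keep document order.
        for child in reversed(node.children):
            if is_relevant(child, query):
                stack.append(child)
    return hits


report = Node("Annual Report", children=[
    Node("Financial Statements", "income statement and balance sheet", children=[
        Node("Income Statement", "revenue, operating income, EBITDA"),
        Node("Balance Sheet", "assets, liabilities, equity"),
    ]),
    Node("Governance", "board of directors, compensation"),
])
print([n.title for n in tree_search(report, "What was EBITDA in the income statement?")])
```

The design choice worth noting is the pruning: whole branches such as "Governance" are never opened, which is how navigation avoids scoring every paragraph.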

According to Mingtian Zhang, co-founder of PageIndex, this approach transforms passive retrieval into active navigation, improving the efficiency of finding relevant information.

Challenges of Traditional RAG

The traditional RAG approach has significant limitations with complex data. Vector retrieval assumes that the text most semantically similar to a query is the most relevant, which is not always true, especially in professional domains.

Zhang illustrates this with financial reports: a query about EBITDA may return several sections that mention the term, but only one contains the precise definition the user is after. This exposes the gap between what the user intends and what merely matches the query's wording.
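A toy calculation shows how a similarity-only ranking can prefer a passage that repeats a term over the passage that defines it. The passages are hypothetical and contain no real figures, and the bag-of-words `embed` is an assumption that exaggerates the effect; learned embeddings capture more than word counts, but they rest on the same "most similar means most relevant" premise.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical report passages; no real financial figures.
passages = {
    "highlights": "EBITDA rose. EBITDA margin improved. EBITDA guidance was raised.",
    "definitions": "We define EBITDA as net income before interest, taxes, "
                   "depreciation and amortization.",
}
query = "How is EBITDA defined?"

scores = {name: cosine(embed(query), embed(text)) for name, text in passages.items()}
best = max(scores, key=scores.get)
# The passage that merely repeats "EBITDA" outranks the actual definition.
print(best, {name: round(s, 3) for name, s in scores.items()})
```

Here the "highlights" passage scores about 3 / (2·√15) against roughly 1 / (2·√12) for the definition, because the repeated term dominates the vector.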

In addition, the embedding model typically sees only the current query, not the conversation that preceded it, which makes search less effective.

Multi-hop Reasoning Challenges

PageIndex's structural approach excels at multi-hop queries, which require following clues across different parts of a document. On benchmarks such as FinanceBench, the Mafin 2.5 system, built on PageIndex, achieved 98.7% precision.

For example, a query about the total value of deferred assets in a Federal Reserve report may fail in a vector system, because such systems cannot follow the document's internal references. PageIndex instead locates the relevant information by following the document's structure, which makes its answers more precise.
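Following internal references can be sketched as a small hop-by-hop walk. This is an illustrative simplification: it assumes references are written as "Appendix X" or "Section N.N" and can be found with a regular expression, and the section texts, `SECTIONS`, and `follow_references` are hypothetical. PageIndex itself relies on a model reasoning over its tree index rather than on a regex.

```python
import re

# Hypothetical document: section id -> text. No real figures are included.
SECTIONS = {
    "Summary": "Deferred assets changed during the year; see Appendix G for the breakdown.",
    "Appendix G": "Table of deferred asset categories; totals are reconciled in Section 4.2.",
    "Section 4.2": "Reconciliation of deferred asset totals.",
}

# Assumed reference format: "Appendix <letter>" or "Section <n>[.<n>...]".
REF_PATTERN = re.compile(r"(Appendix [A-Z]|Section \d+(?:\.\d+)*)")


def follow_references(start: str, sections: dict[str, str], max_hops: int = 5) -> list[str]:
    """Start at one section and repeatedly jump to the first unvisited section it
    references, returning the path of visited section ids."""
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        refs = [r for r in REF_PATTERN.findall(sections[current])
                if r in sections and r not in seen]  # skip dangling refs and cycles
        if not refs:
            break
        current = refs[0]
        path.append(current)
        seen.add(current)
    return path


print(follow_references("Summary", SECTIONS))  # ['Summary', 'Appendix G', 'Section 4.2']
```

A flat similarity search scores each of these sections in isolation, so the chain from the summary to the appendix is invisible to it; walking the references is what connects them.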

Latency Trade-offs and Simplified Infrastructure

One immediate challenge in deploying PageIndex is latency. Vector queries complete in milliseconds, whereas tree search can add delays. However, Zhang explains that this latency can go unnoticed, because retrieval happens inline during the model's reasoning process.

The approach also simplifies data infrastructure. Because it eliminates the need for a vector database, PageIndex allows the structural index to be stored in a conventional relational database such as PostgreSQL.
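A tree index fits naturally into an ordinary relational table. The sketch below uses Python's built-in `sqlite3` as a self-contained stand-in for PostgreSQL; the adjacency-list schema (`nodes` with a `parent_id` column) and the helper names `children` and `subtree` are hypothetical, not PageIndex's actual storage format. The `WITH RECURSIVE` query works in both SQLite and PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency-list schema: each node points to its parent section.
conn.execute("""
    CREATE TABLE nodes (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes(id),
        title     TEXT NOT NULL,
        summary   TEXT
    )
""")
conn.executemany(
    "INSERT INTO nodes (id, parent_id, title, summary) VALUES (?, ?, ?, ?)",
    [
        (1, None, "Annual Report", None),
        (2, 1, "Financial Statements", "income statement, balance sheet"),
        (3, 2, "Income Statement", "revenue, EBITDA"),
        (4, 2, "Balance Sheet", "assets, liabilities"),
        (5, 1, "Governance", "board of directors"),
    ],
)


def children(conn: sqlite3.Connection, node_id: int) -> list[str]:
    """Titles of the direct children of a node: one step of tree navigation."""
    rows = conn.execute(
        "SELECT title FROM nodes WHERE parent_id = ? ORDER BY id", (node_id,))
    return [title for (title,) in rows]


def subtree(conn: sqlite3.Connection, node_id: int) -> list[tuple[str, int]]:
    """(title, depth) for a node and all its descendants, via a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id, title, depth) AS (
            SELECT id, title, 0 FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, n.title, sub.depth + 1
            FROM nodes AS n JOIN sub ON n.parent_id = sub.id
        )
        SELECT title, depth FROM sub ORDER BY id
    """, (node_id,))
    return rows.fetchall()


print(children(conn, 2))  # ['Income Statement', 'Balance Sheet']
print(subtree(conn, 2))
```

Storing the index this way means navigation steps become plain SQL lookups by `parent_id`, with no similarity index to build or maintain.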

Deciding Between Search Techniques

Despite its precision gains, PageIndex does not universally replace vector search. It is better suited to long, structured documents where the cost of an error is high.

For shorter documents whose context is easy to grasp, vector search may be more efficient. PageIndex excels in scenarios that demand high auditability and a clear path to the answer, such as technical manuals and legal documentation.

The Future of Proactive Retrieval

The emergence of frameworks like PageIndex points to a broader trend in the AI stack: the move toward agentic RAG, in which responsibility for data retrieval shifts from the database layer to the model layer.

This is already visible in software development, where coding agents are replacing simple vector search with active exploration of codebases. Zhang believes document retrieval will follow the same trajectory, signaling a shift in the traditional role of databases.

Content selected and edited with AI assistance. Original sources referenced above.

Sources

venturebeat.com (primary source), Jan 30, 2026
https://venturebeat.com/infrastructure/this-tree-search-framework-hits-98-7-on-documents-where-vector-search-fails
