Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG

by Delarno June 6, 2026

Experiments and results

We evaluated agentic RAG on FramesQA, which is based on the FRAMES paper. An example multi-hop question is:

“Of the top two most watched television season finales (as of June 2024), which finale ran the longest in length and by how much?”

The RAG system needs to perform multiple steps to arrive at the correct answer. First, it has to identify that the two most watched finales are from the shows M*A*S*H and Cheers. Then it has to find their running times and calculate the difference. In many RAG settings (vanilla RAG, or agentic RAG without sufficient context), the model can end up saying something like:

“Despite multiple scans, I found no explicit runtimes for M*A*S*H or Cheers. The documents provide viewership data, but not the duration in minutes or hours.”

This does not answer the question.

Our agentic RAG handles this case by first searching for the TV shows, then using the Query Rewriter and Sufficient Context Agent to run a targeted search for the running times of the M*A*S*H and Cheers finales. With both running times retrieved, Gemini can determine which finale ran longer and by how much:

“The M*A*S*H finale ran for 150 minutes, making it the longest of the top two. It was 52 minutes longer than the Cheers finale, which ran for approximately 98 minutes.”
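The search, check, rewrite, and search-again pattern described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the platform's implementation: the function names (`agentic_answer`, `toy_search`, `toy_is_sufficient`, `toy_rewrite`, `toy_generate`) and the in-memory `DOCS` corpus are hypothetical stand-ins for the retriever, the Sufficient Context Agent, the Query Rewriter, and Gemini.

```python
from typing import Callable, Dict, List

def agentic_answer(
    question: str,
    search: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    rewrite_query: Callable[[str, List[str]], str],
    generate: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Retrieve, check whether the context is sufficient, and re-query if not."""
    context: List[str] = []
    query = question
    for _ in range(max_rounds):
        context.extend(search(query))
        if is_sufficient(question, context):
            break
        # Context is missing something: ask for a more targeted query.
        query = rewrite_query(question, context)
    return generate(question, context)

# Toy stand-ins (assumptions for illustration only).
DOCS: Dict[str, List[str]] = {
    "most watched finales": ["The two most watched finales are M*A*S*H and Cheers."],
    "finale running times": ["M*A*S*H finale: 150 minutes.", "Cheers finale: 98 minutes."],
}

def toy_search(query: str) -> List[str]:
    # Return passages whose key phrase appears in the query.
    q = query.lower()
    return [p for key, passages in DOCS.items() if key in q for p in passages]

def toy_is_sufficient(question: str, context: List[str]) -> bool:
    # Sufficient once at least one passage states a running time.
    return any("minutes" in p for p in context)

def toy_rewrite(question: str, context: List[str]) -> str:
    return "finale running times for M*A*S*H and Cheers"

def toy_generate(question: str, context: List[str]) -> str:
    minutes: Dict[str, int] = {}
    for passage in context:
        if "minutes" in passage:
            show, rest = passage.split(" finale: ")
            minutes[show] = int(rest.split()[0])
    longest = max(minutes, key=minutes.get)
    shortest = min(minutes, key=minutes.get)
    return f"{longest} ran {minutes[longest] - minutes[shortest]} minutes longer than {shortest}."

answer = agentic_answer(
    "Of the top two most watched finales, which ran longest and by how much?",
    toy_search, toy_is_sufficient, toy_rewrite, toy_generate,
)
print(answer)
```

The first search only surfaces the show names, so the sufficiency check fails and the rewritten query retrieves the running times on the second round. The `max_rounds` cap is a design choice in this sketch to keep the loop bounded.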

We ran an experiment to test this ability at scale (FramesQA has 824 queries along with a corpus containing 2,676 PDF documents). In the “vanilla” RAG setting, we use Google’s RAG Engine (which has an advanced retrieval engine, LLM parser, and re-ranker). We compare this with our agentic RAG in two settings. In the single-corpus setting, we retrieve only from the FramesQA documents. In the cross-corpus setting, we also include three other distractor datasets, so the Planner Agent must determine which corpus to retrieve from. This cross-corpus setting mimics use cases where companies have databases managed by separate teams. We compute accuracy by using an LLM-as-a-judge to compare the system responses to the ground truth answers in the dataset.
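The accuracy computation can be sketched as follows. This is a hedged illustration, not the evaluation code used in the experiment: the `accuracy` helper, the example records, and `toy_judge` are all hypothetical, and the toy judge is a simple substring check standing in for an LLM-as-a-judge call.

```python
from typing import Callable, Dict, List

Judge = Callable[[str, str, str], bool]  # (question, response, ground_truth) -> correct?

def accuracy(examples: List[Dict[str, str]], judge: Judge) -> float:
    """Fraction of examples whose response the judge marks as correct."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(
        1 for ex in examples
        if judge(ex["question"], ex["response"], ex["truth"])
    )
    return correct / len(examples)

def toy_judge(question: str, response: str, truth: str) -> bool:
    # Stand-in for an LLM judge: correct if the ground truth appears in the response.
    return truth.lower() in response.lower()

examples = [
    {
        "question": "How much longer was the M*A*S*H finale than the Cheers finale?",
        "response": "It was 52 minutes longer than the Cheers finale.",
        "truth": "52 minutes",
    },
    {
        "question": "How much longer was the M*A*S*H finale than the Cheers finale?",
        "response": "I found no explicit runtimes for M*A*S*H or Cheers.",
        "truth": "52 minutes",
    },
]

score = accuracy(examples, toy_judge)
print(f"accuracy = {score:.1%}")
```

Passing the judge in as a callable keeps the scoring loop independent of whichever model or prompt does the judging.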

In the cross-corpus setting, our system nearly matches its single-corpus accuracy. Even when the Planner Agent must select the correct corpus out of four possibilities, we successfully route the search queries and answer 90.1% of questions correctly. The latency of the single- and cross-corpus versions is also about the same (within 3% on average). This demonstrates that our agentic RAG system can reason over multiple, unrelated data sources, which opens up possibilities for more flexible retrieval scenarios.
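To make the routing step concrete, here is a deliberately simplified sketch of choosing one corpus out of four. It uses keyword overlap between the query and a short corpus description, which is a swapped-in technique for illustration only; the Planner Agent described above is LLM-based. The corpus names and descriptions in `CORPORA` are hypothetical, since the article does not name the distractor datasets.

```python
import re
from typing import Dict, Set

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def route(query: str, corpus_descriptions: Dict[str, str]) -> str:
    """Pick the corpus whose description shares the most tokens with the query."""
    query_tokens = tokenize(query)
    scores = {
        name: len(query_tokens & tokenize(description))
        for name, description in corpus_descriptions.items()
    }
    # On ties, max() returns the first corpus in insertion order.
    return max(scores, key=scores.get)

# Hypothetical corpora: one relevant, three distractors.
CORPORA = {
    "frames_wikipedia": "encyclopedia articles about television, history, people and places",
    "hr_policies": "employee handbook, vacation and benefits policies",
    "product_manuals": "installation and troubleshooting guides for hardware products",
    "finance_reports": "quarterly revenue, expenses and earnings reports",
}

chosen = route("Which television finale ran longest?", CORPORA)
print(chosen)
```

A real planner would reason about the query rather than count shared words, but the interface is the same: a query and a set of corpus descriptions go in, and a corpus choice comes out.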
