Retrieval augmented generation (RAG) enhances large language models (LLMs) by providing them with relevant external context. For example, when using a RAG system for a question-answering (QA) task, the LLM receives a context that may combine information from multiple sources, such as public webpages, private document corpora, or knowledge graphs. Ideally, the LLM either produces the correct answer or responds with “I don’t know” when key information is missing from the context.
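To make this setup concrete, here is a minimal sketch of that flow. The helpers `retrieve_snippets` and `call_llm` are hypothetical stand-ins for a retriever and an LLM API, not part of any particular system:

```python
# Minimal sketch of the RAG flow described above; `retrieve_snippets` and
# `call_llm` are hypothetical stand-ins for a retriever and an LLM API.
def answer_with_rag(query: str, retrieve_snippets, call_llm, k: int = 5) -> str:
    # Gather context from one or more sources (webpages, document corpora, ...).
    snippets = retrieve_snippets(query, top_k=k)
    context = "\n\n".join(snippets)

    # Ask the model to answer only from the provided context, and to abstain
    # with "I don't know" when key information is missing.
    prompt = (
        "Answer the question using only the context below. "
        'If the context lacks the needed information, reply "I don\'t know".\n\n'
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return call_llm(prompt)
```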
A key challenge with RAG systems is that they may mislead the user with hallucinated (and therefore incorrect) information. Another challenge is that most prior work only considers how relevant the retrieved context is to the user query. But we believe relevance alone is the wrong thing to measure; what we really want to know is whether the context provides enough information for the LLM to answer the question.
In “Sufficient Context: A New Lens on Retrieval Augmented Generation Systems”, which appeared at ICLR 2025, we study the idea of “sufficient context” in RAG systems. We show that it’s possible to know when an LLM has enough information to provide a correct answer to a question. We study the role that context (or lack thereof) plays in factual accuracy, and develop a way to quantify context sufficiency for LLMs. Our approach allows us to investigate the factors that influence the performance of RAG systems and to analyze when and why they succeed or fail.
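One simple way to operationalize this idea is to ask an LLM to act as an autorater that labels each question–context pair as sufficient or insufficient. The sketch below illustrates that approach under assumed prompt wording and a hypothetical `call_llm` function; it is not the exact classifier from the paper:

```python
# A hedged sketch of a sufficient-context autorater: an LLM judges whether the
# retrieved context contains enough information to answer the query. The prompt
# wording and `call_llm` are illustrative assumptions, not the paper's exact
# classifier.
def is_context_sufficient(query: str, context: str, call_llm) -> bool:
    prompt = (
        "You are given a question and a context. Decide whether the context "
        "contains all the information needed to answer the question.\n"
        "Reply with exactly one word: SUFFICIENT or INSUFFICIENT.\n\n"
        f"Question: {query}\n\nContext:\n{context}\n\nVerdict:"
    )
    verdict = call_llm(prompt).strip().upper()
    return verdict.startswith("SUFFICIENT")
```

Labeling examples this way lets us stratify a RAG system’s accuracy by whether its context was sufficient, which is the kind of analysis that reveals when and why these systems succeed or fail.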
Moreover, we have used these ideas to launch the LLM Re-Ranker in the Vertex AI RAG Engine. Our feature allows users to re-rank retrieved snippets based on their relevance to the query, leading to better retrieval metrics (e.g., nDCG) and better RAG system accuracy.
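As an illustration of the underlying idea (not the Vertex AI RAG Engine API itself), an LLM-based re-ranker can score each retrieved snippet for relevance to the query and reorder the snippets before they are passed to the answering model; `call_llm` is again a hypothetical scoring function:

```python
# Illustrative sketch of LLM-based re-ranking (not the Vertex AI RAG Engine
# API): each retrieved snippet is scored for relevance to the query by an LLM,
# then snippets are reordered by that score before answering.
def rerank_snippets(query: str, snippets: list[str], call_llm, top_k: int = 5) -> list[str]:
    def score(snippet: str) -> float:
        prompt = (
            "On a scale from 0 (irrelevant) to 10 (highly relevant), rate how "
            "relevant the passage is to the query. Reply with a number only.\n\n"
            f"Query: {query}\n\nPassage:\n{snippet}\n\nScore:"
        )
        try:
            return float(call_llm(prompt).strip())
        except ValueError:
            return 0.0  # Fall back to the lowest score on an unparsable reply.

    # Higher-scoring snippets move to the top of the list.
    return sorted(snippets, key=score, reverse=True)[:top_k]
```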