Recent advances in large language models (LLMs) have fueled the emergence of deep research (DR) agents. These agents demonstrate remarkable capabilities, including generating novel ideas, retrieving information efficiently, executing experiments, and drafting comprehensive reports and academic papers.
Currently, most public DR agents use a variety of clever techniques to improve their results, such as chain-of-thought reasoning or generating multiple candidate answers and selecting the best one. While they’ve made impressive progress, they often bolt different tools together without considering the iterative nature of human research. They miss the key process people rely on when writing a paper about a complex topic: planning, drafting, researching, and revising based on feedback. A central part of that revision loop is doing more research to fill in missing information or strengthen weak arguments. This human pattern is strikingly similar to the mechanism of retrieval-augmented diffusion models, which start with a “noisy”, messy output and gradually refine it into a high-quality result. What if an AI agent’s rough draft is the noisy version, and a search tool acts as the denoising step that cleans it up with new facts?
Today we introduce Test-Time Diffusion Deep Researcher (TTD-DR), a DR agent that imitates the way humans do research. To our knowledge, TTD-DR is the first research agent to model research report writing as a diffusion process, in which a messy first draft is gradually polished into a high-quality final version. TTD-DR is powered by two new algorithms that work together. First, component-wise optimization via self-evolution improves the quality of each step in the research workflow. Then, report-level refinement via denoising with retrieval uses newly retrieved information to revise and improve the report draft. We demonstrate that TTD-DR achieves state-of-the-art results on long-form report writing and multi-hop reasoning tasks.
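To make the idea of report-level denoising with retrieval concrete, here is a minimal sketch of what such a loop might look like. It is not the TTD-DR implementation: the `llm` and `search` callables, the prompts, and the fixed step count are all illustrative assumptions standing in for the agent's actual components.

```python
from typing import Callable, List

def denoising_with_retrieval(
    question: str,
    llm: Callable[[str], str],            # assumed: any text-in/text-out model call
    search: Callable[[str], List[str]],   # assumed: any retrieval tool returning passages
    num_steps: int = 5,                   # illustrative fixed budget of refinement steps
) -> str:
    """Sketch: start from a rough draft and iteratively refine it with new evidence."""
    # Step 0: a preliminary, possibly inaccurate draft -- the "noisy" starting point.
    draft = llm(f"Write a rough first-draft report answering: {question}")

    for _ in range(num_steps):
        # Use the current draft to decide what information is still missing.
        query = llm(
            "Given this draft report, write one search query that would "
            f"fill its biggest gap or verify its weakest claim:\n{draft}"
        )
        # Retrieve fresh evidence with an external search tool.
        passages = search(query)
        # "Denoise" the draft: revise it using the newly retrieved information.
        draft = llm(
            "Revise the draft below using the retrieved passages. Correct errors, "
            "add missing facts, and keep the overall structure.\n"
            f"DRAFT:\n{draft}\n\nPASSAGES:\n" + "\n".join(passages)
        )
    return draft
```

In this sketch, each pass plays the role of one denoising step: the draft guides what to retrieve, and the retrieved facts in turn clean up the draft.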