
Agentic RAG: Combining AI Agents with Retrieval-Augmented Generation for Complex Workflows

The emerging pattern of AI agents that reason, retrieve, act, and iterate — multi-step retrieval, dynamic query reformulation, and tool use during search. Architecture and use cases.

Evgeny Smirnov

Beyond single-shot retrieval

Standard RAG follows a simple pattern: user asks question → system retrieves documents → LLM generates answer. One retrieval, one generation. This works for straightforward questions (“What’s the deadline for filing a counterclaim?”) but fails for complex queries that require reasoning across multiple sources, follow-up retrieval based on initial findings, or integration of information from different types of sources.
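The single-shot pattern fits in a few lines. This is a minimal sketch with toy stand-ins: `retrieve` ranks documents by word overlap instead of vector similarity, and `generate` fakes the LLM call, so the shape of the pipeline is visible without any external services.

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 3) -> list[str]:
    """Toy retriever: rank docs by word overlap with the query (stand-in
    for a vector-store similarity search)."""
    q = set(query.lower().split())
    scored = sorted(corpus.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for the LLM call that writes the final answer."""
    return f"Answer to {query!r} based on {len(context)} chunk(s)."

def standard_rag(query: str, corpus: dict[str, str]) -> str:
    # One retrieval, one generation -- no feedback loop.
    return generate(query, retrieve(query, corpus))
```

Everything downstream in this post is about what happens when that single `retrieve` call is not enough.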

Agentic RAG adds an agent layer that can plan retrieval strategies, evaluate initial results, reformulate queries, retrieve additional context, and iterate until it has enough information to answer. The agent doesn’t just retrieve — it reasons about what to retrieve and whether the retrieved information is sufficient.

How it works

A user asks: “Compare how New York and California courts have handled force majeure claims in commercial leases since 2020.” Standard RAG would retrieve the top 5–10 chunks matching this query and try to generate an answer — often producing a shallow or incomplete response because a single retrieval can’t cover both jurisdictions, both time ranges, and both the substantive law and case outcomes.

An agentic RAG system handles this differently. The planner decomposes the query into sub-tasks: retrieve New York force majeure case law since 2020, retrieve California force majeure case law since 2020, identify key holdings from each jurisdiction, compare approaches. The agent executes each sub-task as a separate retrieval, evaluates whether the results are sufficient (enough cases? representative of the jurisdictions? recent enough?), and may reformulate queries based on what it finds (if initial results mention a landmark case, retrieve that case specifically).

The output is a structured comparison — not a single paragraph but an organised analysis with citations from both jurisdictions.

Architecture patterns

Query decomposition is the simplest agentic RAG pattern. The agent splits a complex query into simpler sub-queries, retrieves for each, and synthesises the results. This works well for comparative questions, multi-faceted research tasks, and questions that span multiple topics or time periods.
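A sketch of the decomposition pattern, assuming a comparative query with known facets (in production, `decompose` would itself be an LLM call; here it is a rule-based stand-in, and `retrieve`/`synthesize` are injected callables):

```python
def decompose(query: str, facets: list[str]) -> list[str]:
    """Split a comparative query into one sub-query per facet
    (e.g. per jurisdiction)."""
    return [f"{facet}: {query}" for facet in facets]

def decomposed_rag(query, facets, retrieve, synthesize):
    # Retrieve separately for each sub-query, then synthesise once.
    partials = {sq: retrieve(sq) for sq in decompose(query, facets)}
    return synthesize(query, partials)

# Usage with toy callables:
retrieve = lambda sq: [f"doc for {sq}"]
synthesize = lambda q, parts: {"query": q, "sections": parts}
result = decomposed_rag("force majeure in commercial leases since 2020",
                        ["New York", "California"], retrieve, synthesize)
```

The key design choice is that each facet gets its own retrieval budget, so one jurisdiction's results cannot crowd the other's out of the top-k.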

Iterative retrieval adds a feedback loop. After initial retrieval, the agent evaluates the results: are they relevant? sufficient? contradictory? If not, it reformulates the query and retrieves again. This handles ambiguous queries (where the user’s intent becomes clearer after seeing initial results) and research-style tasks (where each finding suggests new avenues to explore).
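The feedback loop can be sketched as a bounded retrieve-evaluate-reformulate cycle. `evaluate` and `reformulate` would normally be LLM calls; here they are injected callables, and `max_steps` caps cost on queries that never converge:

```python
def iterative_rag(query, retrieve, evaluate, reformulate, max_steps=3):
    """Retrieve, judge sufficiency, refine the query, and repeat
    until the evaluator is satisfied or the step budget runs out."""
    context, q = [], query
    for _ in range(max_steps):
        context += retrieve(q)
        verdict = evaluate(query, context)       # "sufficient" or feedback
        if verdict == "sufficient":
            break
        q = reformulate(query, context, verdict)  # refine using findings
    return context

# Usage with toy callables: declare "sufficient" once two chunks are in hand.
evaluate = lambda q, ctx: "sufficient" if len(ctx) >= 2 else "need more"
retrieve = lambda q: [f"chunk for {q}"]
reformulate = lambda q, ctx, verdict: q + " (refined)"
context = iterative_rag("force majeure holdings", retrieve, evaluate, reformulate)
```

Note that `evaluate` always judges against the original query, not the reformulated one, so the loop cannot drift away from the user's intent.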

Tool-augmented retrieval lets the agent use tools during the retrieval process — calling a calculation API to compute statistics from retrieved data, querying a structured database alongside the vector store, or checking an external source to verify a claim found in the primary corpus. This is the pattern we use in the Denovo AI Engine, where agents can search internal knowledge, query CourtListener, and process results using domain-specific rules.
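One common shape for this pattern is a tool registry plus a plan executor: the agent's reasoning step emits a list of (tool, argument) calls, and a dispatcher runs them and pools the results. The tool names and dispatch below are illustrative stand-ins, not the Denovo AI Engine's actual interface:

```python
# Hypothetical tool registry: each entry maps a tool name to a callable
# returning a list of result strings.
TOOLS = {
    "vector_search": lambda arg: [f"chunk matching {arg!r}"],
    "sql_lookup":    lambda arg: [f"row for {arg!r}"],
    "case_api":      lambda arg: [f"case record for {arg!r}"],
}

def run_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Execute a plan of (tool, argument) steps, pooling all results
    into one evidence list for the final synthesis step."""
    results = []
    for tool, arg in plan:
        results += TOOLS[tool](arg)
    return results
```

In a real system the plan would come from an LLM planning call, and each tool would wrap a vector store, a SQL client, or an external API such as CourtListener.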

When to use agentic RAG vs. standard RAG

Standard RAG is sufficient for factual questions with straightforward answers, queries where the top 5–10 retrieved chunks are likely to contain the full answer, and applications where response time matters more than comprehensiveness (standard RAG: 2–4 seconds; agentic RAG: 10–30 seconds).

Agentic RAG is worth the complexity for research-style queries requiring synthesis across multiple sources, comparative analysis across categories/jurisdictions/time periods, tasks where the query needs refinement (the user doesn’t know exactly what they’re looking for), and workflows where the AI needs to verify or cross-reference its findings.

Production considerations

Agentic RAG is slower and more expensive than standard RAG. Each agent step involves at least one LLM call (for reasoning) and one or more retrieval operations. A complex query might involve 5–10 LLM calls and 3–5 retrievals. Budget $0.05–$0.30 per query at current API prices, compared to $0.01–$0.05 for standard RAG.
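A back-of-envelope model makes the per-query comparison concrete. The unit costs below are illustrative assumptions, not quoted API prices; the point is that LLM calls, not retrievals, dominate the bill:

```python
def query_cost(llm_calls: int, retrievals: int,
               cost_per_call: float = 0.03,
               cost_per_retrieval: float = 0.002) -> float:
    """Per-query cost: (LLM calls x cost per call)
    + (retrievals x cost per retrieval)."""
    return llm_calls * cost_per_call + retrievals * cost_per_retrieval

standard = query_cost(llm_calls=1, retrievals=1)   # single-shot RAG
agentic  = query_cost(llm_calls=8, retrievals=4)   # mid-range agentic query
```

With these assumed unit costs, the standard query lands near the bottom of the $0.01-$0.05 band and the agentic one in the middle of the $0.05-$0.30 band quoted above.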

The trade-off is quality. For complex questions, agentic RAG produces dramatically better answers. The key is routing: use standard RAG for simple questions and agentic RAG for complex ones, with a classifier that decides which path to take based on query characteristics.
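The router itself can start as a cheap heuristic before graduating to a trained classifier or an LLM call. A minimal sketch, keying on comparative and multi-part phrasing (the marker list is an assumption, not a production feature set):

```python
# Surface cues that a query needs multi-step retrieval. Illustrative only;
# a production router would be a trained classifier or an LLM call.
COMPLEX_MARKERS = ("compare", "versus", " vs ", "across", "trend", "synthesize")

def route(query: str) -> str:
    """Return 'agentic' for complex queries, 'standard' otherwise."""
    q = query.lower()
    if any(marker in q for marker in COMPLEX_MARKERS) or q.count("?") > 1:
        return "agentic"
    return "standard"
```

A misroute in either direction is recoverable: simple-to-agentic wastes money, complex-to-standard degrades quality, so tune the threshold toward whichever failure your application tolerates better.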

Budget: adding agentic capabilities to an existing RAG system typically runs $20K–$40K over 4–6 weeks; building a full agentic RAG system from scratch, $50K–$100K over 6–10 weeks.


Need agentic RAG for complex research workflows? Contact us — we’ve built these for legal and financial applications.