Summary
API rate limits in Claude Code /deep-research are not just a vague “the Verify step is heavy” problem. In my case, many agents repeatedly searched, fetched, summarized, and re-verified material, and the accumulated token usage finally hit the Anthropic API limit during the Verify phase.
The practical fix was to stop treating the whole research run as one large parallel job. I split the work into a queue-style pipeline: Verify ran in small claim batches, Fetch stopped overlapping with Verify, and a fresh run avoided carrying over the failed Fetch cache. The resulting run was stable and still kept the three-vote verification quality.
The situation: Verify failed at the end
While using Claude Code /deep-research, I hit an API rate limit during the final Verify phase. The earlier research steps had mostly progressed, but the run failed right when the result was being checked.
At first glance, it feels like Verify itself is the problem. In reality, Verify was closer to the point where the accumulated cost became visible: many agents had already produced large intermediate results, and the final check needed to read and compare them again.
The core cause: too many agents used too many tokens. The Verify phase did not create the whole problem by itself; it exposed the token spike that had built up across the run.
Why the rate limit happened
Anthropic’s API rate limits are not measured only by request count. The Messages API can be limited by requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM). If any one of those is exceeded, the API can return a 429 error, even when the other two are well within their limits.
| Limit | Meaning | Why it matters in deep-research |
|---|---|---|
| RPM | Requests per minute | Parallel agents can increase calls quickly |
| ITPM | Input tokens per minute | Search results, fetched pages, previous summaries, and claim lists keep entering prompts |
| OTPM | Output tokens per minute | Agents produce long analysis and verification notes |
| Acceleration limit | A protection against sudden usage spikes | A wide research run can jump from low usage to a large burst |
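To make the "any one of three limits" point concrete, here is a rough local sketch of a trailing-60-second usage window. It is only an approximation for reasoning about bursts: the real limiter is enforced server-side and may behave differently, and the `MinuteWindow` class and all limit values below are hypothetical placeholders, not figures from my account or from Anthropic's documentation.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class MinuteWindow:
    """Local approximation of per-minute usage (hypothetical helper)."""

    rpm_limit: int
    itpm_limit: int
    otpm_limit: int
    # Each event is (timestamp_seconds, input_tokens, output_tokens).
    events: deque = field(default_factory=deque)

    def _trim(self, now: float) -> None:
        # Drop events that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def exceeded(self, now: float, input_tokens: int, output_tokens: int) -> list[str]:
        """Return the names of the limits one more call would exceed."""
        self._trim(now)
        requests = len(self.events) + 1
        itpm = sum(e[1] for e in self.events) + input_tokens
        otpm = sum(e[2] for e in self.events) + output_tokens
        hit = []
        if requests > self.rpm_limit:
            hit.append("RPM")
        if itpm > self.itpm_limit:
            hit.append("ITPM")
        if otpm > self.otpm_limit:
            hit.append("OTPM")
        return hit

    def record(self, now: float, input_tokens: int, output_tokens: int) -> None:
        self._trim(now)
        self.events.append((now, input_tokens, output_tokens))


# Placeholder limits for illustration only.
window = MinuteWindow(rpm_limit=50, itpm_limit=40_000, otpm_limit=8_000)
for _ in range(9):
    window.record(now=0.0, input_tokens=4_000, output_tokens=500)

# Only 10 requests so far, but the input-token budget is already gone.
print(window.exceeded(now=1.0, input_tokens=6_000, output_tokens=500))
```

The point of the example is that request count alone says nothing: ten calls is far below the placeholder RPM limit, yet the tenth call would still be rejected on input tokens.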
Why it is common in deep-research
Claude Code subagents are useful because exploration can happen in a separate context window. That is exactly why deep research works well for complex work. But the same structure can also multiply token usage when the task fans out too broadly.
- Several subtopics are explored in parallel.
- Each agent reads documents, code, issues, and web pages.
- Each agent creates intermediate summaries.
- The main agent merges those summaries.
- Verify rereads the claims, evidence, and conflicts.
That is good for quality, but expensive for tokens. Verify is not a tiny final checkbox; it can become a second pass over everything that was already produced.
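A back-of-the-envelope estimate shows how the final pass can outweigh the exploration that came before it. The function below is a hypothetical helper, and every per-item token count in the example is an invented illustration, not a measurement from my run; only the 25 claims and three votes per claim come from the run described in this article.

```python
def estimate_input_tokens(
    agents: int,
    pages_per_agent: int,
    tokens_per_page: int,
    summary_tokens: int,
    claims: int,
    votes_per_claim: int,
    evidence_tokens_per_check: int,
) -> dict[str, int]:
    """Rough input-token estimate per stage of a fan-out research run."""
    explore = agents * pages_per_agent * tokens_per_page
    merge = agents * summary_tokens  # main agent reads every summary
    verify = claims * votes_per_claim * evidence_tokens_per_check
    return {
        "explore": explore,
        "merge": merge,
        "verify": verify,
        "total": explore + merge + verify,
    }


# Illustrative numbers only (token sizes are assumptions).
estimate = estimate_input_tokens(
    agents=5,
    pages_per_agent=3,
    tokens_per_page=4_000,
    summary_tokens=1_500,
    claims=25,
    votes_per_claim=3,
    evidence_tokens_per_check=3_000,
)
print(estimate)
```

Under these assumed sizes, Verify alone accounts for 225,000 of roughly 292,500 input tokens, because every claim is checked three times against its evidence.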
Splitting deep-research into a queue
The fix that worked for me was to change the execution shape, not just ask the model to “try again.” The goal was to reduce peak concurrency and keep token-per-minute usage under the ceiling.
The exact restructuring I used
| Stage | Before | After | Effect |
|---|---|---|---|
| Verify | Parallel over all 75 claim checks (25 claims × 3 votes) | Process 3 claims per batch and await each batch sequentially | Reduced peak concurrency from 75 simultaneous checks to about 9, below the TPM ceiling |
| Fetch | Nested parallel inside the pipeline | Finish Search first as a barrier, then Fetch 4 items per sequential batch | Fetch and Verify no longer overlap |
| Total work | 25 claims and 15 fetches | 18 claims and 12 fetches | Lower total token volume while preserving the important checks |
| Verification quality | Large one-shot verification | Kept the three-vote verification quality | Reduced the bottleneck without dropping the quality bar |
| Run state | Could reuse the failed Fetch cache from the previous attempt | Fresh run and clean refetch | Avoided carrying over stale failure state |
The biggest improvement came from changing Verify from a 75-item parallel verification into 3-claim sequential batches. The quality target stayed the same; only the burst pattern changed. The Search barrier also mattered because Fetch and Verify no longer competed for token budget at the same time.
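The change in burst shape can be sketched with `asyncio`. This is a minimal simulation, not the actual /deep-research internals: `verify_vote` is a stand-in for one model call, the `tracker` dictionary exists only to measure peak concurrency, and the simple-majority rule is an assumption about how three votes are combined.

```python
import asyncio

VOTES_PER_CLAIM = 3


async def verify_vote(claim: str, vote: int, tracker: dict) -> bool:
    """Stand-in for one verification call (hypothetical, always agrees)."""
    tracker["in_flight"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
    await asyncio.sleep(0)  # yield so concurrent votes actually overlap
    tracker["in_flight"] -= 1
    return True


async def verify_claim(claim: str, tracker: dict) -> bool:
    # The three votes for a single claim still run together.
    votes = await asyncio.gather(
        *(verify_vote(claim, v, tracker) for v in range(VOTES_PER_CLAIM))
    )
    return sum(votes) * 2 > VOTES_PER_CLAIM  # assumed: simple majority


async def verify_in_batches(claims: list[str], batch_size: int, tracker: dict) -> dict:
    results = {}
    for start in range(0, len(claims), batch_size):
        batch = claims[start:start + batch_size]
        # Await the whole batch before starting the next one.
        verdicts = await asyncio.gather(*(verify_claim(c, tracker) for c in batch))
        results.update(zip(batch, verdicts))
    return results


one_shot = {"in_flight": 0, "peak": 0}
asyncio.run(verify_in_batches([f"claim-{i}" for i in range(25)], 25, one_shot))

queued = {"in_flight": 0, "peak": 0}
asyncio.run(verify_in_batches([f"claim-{i}" for i in range(18)], 3, queued))

print(one_shot["peak"], queued["peak"])
```

In this simulation the one-shot shape peaks at 75 simultaneous checks, while the 3-claim batches never exceed 9. Each claim still gets its three votes either way; only the number of calls in flight at once changes.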
A prompt you can reuse
This is the kind of instruction I would use next time before starting a large /deep-research run.
Run /deep-research, but split the whole process into a queue-style pipeline.
Rules:
1. Do not verify all claims in one parallel block. Verify 3 claims per batch and await each batch sequentially.
2. Do not run Fetch as nested parallel work inside the pipeline.
3. Finish Search first, then Fetch 4 items per sequential batch.
4. Keep Fetch and Verify separated by a barrier so they do not overlap.
5. Reduce total volume if needed, for example claims 25→18 and fetches 15→12, while preserving the core three-vote verification quality.
6. Use a fresh run if a previous Fetch stage failed, so stale cache does not affect the next attempt.
7. If token usage may spike, stop and report the current queue state.
Practical failure cases and fixes
Case 1. Verify hits 429 or a rate limit error
Do not blindly retry the same shape. First compress the current findings, split the remaining claims into small batches, and verify them one batch at a time.
Compress the current research into 15 lines.
Split the remaining Verify work into 3-claim batches.
Verify the most important batch first and report the queue state after each batch.
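The "report the queue state" idea can be expressed as a small loop that pauses instead of retrying. This is a sketch under stated assumptions: `RateLimited` is a hypothetical exception standing in for whatever 429 signal the tooling surfaces, and `run_batch` is a placeholder for the real verification step.

```python
class RateLimited(Exception):
    """Hypothetical stand-in for a 429 / rate limit signal."""


def drain_queue(claims, batch_size, run_batch):
    """Verify claims batch by batch; pause and report state on a rate limit."""
    done = []
    pending = list(claims)
    while pending:
        batch = pending[:batch_size]
        try:
            run_batch(batch)
        except RateLimited:
            # Do not retry the same shape; hand back what is left.
            return {"status": "paused", "done": done, "pending": pending}
        done.extend(batch)
        pending = pending[batch_size:]
    return {"status": "complete", "done": done, "pending": []}


# Example: a fake verifier that is rate limited on its second batch.
calls = {"count": 0}


def fake_run_batch(batch):
    calls["count"] += 1
    if calls["count"] == 2:
        raise RateLimited()


state = drain_queue([f"claim-{i}" for i in range(7)], 3, fake_run_batch)
print(state["status"], len(state["done"]), len(state["pending"]))
```

Because the failed batch stays in `pending`, the remaining work can be resumed later in the same small batches rather than restarted from the beginning.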
Case 2. Fetch and Verify overlap
If fetching sources and verifying claims happen at the same time, both input and output token usage can spike. Add a barrier: complete Search, then Fetch in small batches, then Verify.
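The barrier ordering can be sketched as three stages that each finish completely before the next one starts. The worker functions here are hypothetical placeholders rather than real search, fetch, or verify calls, and in a real run the verify stage would read the fetched pages; the batch sizes of 4 and 3 are the ones I used.

```python
import asyncio


async def run_stage(name, items, batch_size, worker, log):
    """Run one stage in sequential batches and log each finished batch."""
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
        log.append((name, len(batch)))
    return results


# Placeholder workers (assumptions, not real API calls).
async def search(query):
    await asyncio.sleep(0)
    return f"url-for-{query}"


async def fetch(url):
    await asyncio.sleep(0)
    return f"page:{url}"


async def verify(claim):
    await asyncio.sleep(0)
    return (claim, True)


async def pipeline(queries, claims, log):
    # Barrier 1: every search finishes before any fetch starts.
    urls = await run_stage("search", queries, max(1, len(queries)), search, log)
    # Barrier 2: every fetch batch finishes before any verify starts.
    pages = await run_stage("fetch", urls, 4, fetch, log)
    verdicts = await run_stage("verify", claims, 3, verify, log)
    return pages, verdicts


log = []
queries = [f"q{i}" for i in range(12)]
claims = [f"claim-{i}" for i in range(18)]
pages, verdicts = asyncio.run(pipeline(queries, claims, log))
print([name for name, _ in log])
```

Since each stage is awaited to completion, the log never interleaves `fetch` and `verify` entries, which is exactly the property that keeps the two stages from competing for the same per-minute token budget.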
Case 3. A failed Fetch cache affects the next run
If the previous run failed in Fetch, a clean fresh run can be better than resuming from a messy intermediate state. This is especially true when the failure happened near a rate-limit boundary.
Conclusion
When Claude Code /deep-research fails during Verify with an API rate limit, the direct cause is often not one bad Verify call. It is the accumulated work of many agents, many fetched sources, many intermediate summaries, and a final verification pass that tries to read too much at once.
For large research tasks, I would now start with a queue design: small Verify batches, Fetch separated from Verify, lower total volume, and a clean run when a previous attempt failed. It is slightly slower, but much more reliable than hitting the TPM ceiling at the end.
References
- Anthropic API Docs: Rate limits
- Claude Code Docs: Create custom subagents
- Claude Code Docs: Troubleshooting performance and stability
- Claude Code Docs: Slash commands