Common failure modes for Opus 4.6 agents are usually system-level issues: bad retrieval, tool misuse, format drift, runaway context, and overconfident synthesis. Even strong models can fail if the surrounding workflow is loose. The most frequent real-world pattern is “it answered confidently, but the answer wasn’t in the provided sources,” which is a grounding failure. Another common pattern in agent loops is “it kept taking actions without converging,” which is a tool/stop-condition failure. These are predictable and fixable when you log the right traces and enforce contracts.
Here’s a practical failure-mode checklist with fixes:
Retrieval mismatch (wrong version / wrong tenant / wrong doc type)
- Fix: mandatory metadata filters, better chunking, retrieval eval set.
Prompt injection via retrieved content
- Fix: treat retrieved text as untrusted data, never as instructions; keep system rules strict; strip or neutralize instruction-like text in retrieved content before it reaches the model.
Tool-call errors (bad arguments, unnecessary calls, loops)
- Fix: JSON schema validation, tool allowlists, max tool-call budget, “ask user when uncertain.”
Format drift (invalid JSON, missing sections, missing citations)
- Fix: output schema + validator + automatic re-prompt on parse errors.
Context bloat (slow, unfocused, contradictory)
- Fix: compact state object, trim history, retrieval-first prompting.
Unverified code changes
- Fix: require tests/lint; do not finalize without verification; apply diffs in sandbox.
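As one illustration of the tool-call fixes in the checklist, here is a minimal sketch of a guard that sits between the model and your tools. It enforces an allowlist, checks argument names and types, and caps the number of attempts. The names `ToolGuard` and `ToolCallRejected` and the registry layout are hypothetical, and the type check is a simplified stand-in for full JSON schema validation; none of this is part of any Opus or Milvus API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolCallRejected(Exception):
    """Raised when a proposed tool call violates the contract."""


@dataclass
class ToolGuard:
    # Hypothetical registry: tool name -> (callable, {argument name: expected type}).
    tools: dict[str, tuple[Callable[..., Any], dict[str, type]]]
    max_calls: int = 5
    calls_made: int = field(default=0, init=False)

    def run(self, name: str, args: dict[str, Any]) -> Any:
        # Budget check comes first so a looping agent is stopped
        # even when every one of its calls is malformed.
        if self.calls_made >= self.max_calls:
            raise ToolCallRejected("tool-call budget exhausted")
        self.calls_made += 1  # every attempt counts, valid or not

        if name not in self.tools:
            raise ToolCallRejected(f"tool not on allowlist: {name}")

        func, spec = self.tools[name]
        if set(args) != set(spec):
            raise ToolCallRejected(
                f"expected arguments {sorted(spec)}, got {sorted(args)}"
            )
        for key, expected in spec.items():
            if not isinstance(args[key], expected):
                raise ToolCallRejected(
                    f"argument {key!r} must be {expected.__name__}"
                )
        return func(**args)
```

When a call is rejected, the error message can be fed back to the model for one corrective attempt, or the agent can stop and ask the user. Counting rejected attempts against the budget is a design choice here: it guarantees the loop terminates even if the model keeps producing bad arguments.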
The best way to operationalize this is to log: retrieved chunk IDs, tool calls, output validation status, and verification results. That turns failures into debuggable incidents.
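A trace record along these lines could capture those four signals per request. This is a sketch only: the `AgentTrace` class and its field names are assumptions chosen for illustration, and it writes plain JSON lines rather than targeting any particular logging backend.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AgentTrace:
    """One record per agent request (hypothetical field names)."""
    request_id: str
    retrieved_chunk_ids: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    output_valid: Optional[bool] = None   # None until the validator has run
    verification: dict = field(default_factory=dict)
    started_at: float = field(default_factory=time.time)

    def record_tool_call(self, name: str, args: dict, ok: bool) -> None:
        # Keep the arguments so bad calls can be replayed during debugging.
        self.tool_calls.append({"name": name, "args": args, "ok": ok})

    def to_json_line(self) -> str:
        # One JSON object per line is easy to grep and to load in bulk.
        return json.dumps(asdict(self), sort_keys=True)
```

In use, you would create one `AgentTrace` at the start of a request, fill it in as retrieval, tool calls, validation, and verification happen, and append `to_json_line()` to a log file or log stream at the end.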
In RAG-based agents, most “agent failures” are retrieval failures. If you store knowledge in Milvus or Zilliz Cloud, you can diagnose quickly: did top-k retrieval include the right chunk? If not, fix chunking or metadata filters. If yes, the problem is on the generation side: tighten prompt rules and enforce citations so every claim traces back to a retrieved chunk. Over time, you’ll find that a disciplined system (retrieval grounding, strict output contracts, tool validation, and verification loops) lets Opus 4.6 behave consistently even in complex agent workflows.
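The "did top-k include the right chunk?" question can be turned into a small triage routine. The sketch below assumes you have already pulled the ranked chunk IDs from your vector store and the chunk IDs the model cited from its output; it is deliberately store-agnostic and calls no Milvus or Zilliz Cloud API. The function names and result labels are hypothetical.

```python
from typing import Iterable


def triage(expected_chunk_id: str,
           retrieved_ids: list,
           cited_ids: Iterable) -> str:
    """Classify one answer as a retrieval problem or a generation problem."""
    if expected_chunk_id not in retrieved_ids:
        return "retrieval_miss"        # look at chunking and metadata filters
    cited = set(cited_ids)
    if not cited:
        return "missing_citation"      # right chunk was there, answer cites nothing
    if not cited <= set(retrieved_ids):
        return "ungrounded_citation"   # cites a chunk that was never in context
    return "grounded"


def hit_rate(cases: list, k: int) -> float:
    """Fraction of eval cases whose expected chunk appears in the top-k results.

    Each case is a pair: (expected_chunk_id, ranked list of retrieved IDs).
    """
    if not cases:
        return 0.0
    hits = sum(1 for expected, ranked in cases if expected in ranked[:k])
    return hits / len(cases)
```

Running `hit_rate` over a small retrieval eval set before and after a chunking or filter change gives a quick signal on whether the change helped, while `triage` on individual logged failures tells you which side of the pipeline to fix.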