Tool-calling works best when tools are small, deterministic, and schema-validated, and when the model is instructed to use tools for facts instead of guessing. The highest-value patterns are: retrieval (search_docs), code navigation (read_file, search_repo), and verification (run_tests, run_lint). Opus 4.6 is strong at multi-step workflows, so tools let it behave like an agent: it can fetch what it needs, act, and then verify.
A practical set of tools for developer workflows:
search_docs(query, top_k, filters) → returns chunk IDs, text, URLs
read_file(path) → returns file content (bounded size)
search_repo(pattern) → returns file paths + matches
run_tests(command) → returns exit code + logs
apply_patch(diff) → applies a unified diff in a sandbox
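As one illustration, these tools can be declared to the model as JSON schemas. The sketch below uses the Anthropic Messages API tool format (name, description, input_schema); the specific parameter descriptions and bounds are assumptions for illustration, not a fixed spec.

```python
# Minimal sketch of tool declarations (Anthropic Messages API format).
# Descriptions and limits are illustrative; adjust for your project.
TOOLS = [
    {
        "name": "search_docs",
        "description": "Search product documentation. Use this for any factual question about API or docs behavior.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "top_k": {"type": "integer", "minimum": 1, "maximum": 20},
                "filters": {
                    "type": "object",
                    "properties": {
                        "version": {"type": "string"},
                        "lang": {"type": "string"},
                    },
                },
            },
            "required": ["query"],
        },
    },
    {
        "name": "run_tests",
        "description": "Run the test suite and return the exit code plus logs.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
]
```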
Then apply guardrails (see the sketch after this list):
Validate tool arguments against JSON schema.
Allowlist file paths and commands.
Cap tool calls per run (prevents loops).
Require a final “explain what you did + how to verify.”
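The first three guardrails can be enforced in one gate that runs before every tool call. This is a minimal sketch assuming the jsonschema package; the limits, allowlists, and helper name are hypothetical.

```python
import os
import jsonschema

# Illustrative limits and allowlists; tune per project.
MAX_TOOL_CALLS = 25
ALLOWED_ROOT = "/workspace/repo"
ALLOWED_COMMANDS = {"pytest", "npm test", "ruff check ."}

def validate_call(tool: str, args: dict, schema: dict, call_count: int) -> None:
    """Apply the guardrails above before executing a tool call."""
    # 1. Validate tool arguments against the JSON schema
    #    (raises jsonschema.ValidationError on mismatch).
    jsonschema.validate(instance=args, schema=schema)

    # 2. Allowlist file paths and commands.
    if tool == "read_file":
        path = args.get("path", "")
        resolved = os.path.realpath(os.path.join(ALLOWED_ROOT, path))
        if not resolved.startswith(ALLOWED_ROOT):
            raise PermissionError(f"Path outside sandbox: {path}")
    if tool == "run_tests" and args.get("command") not in ALLOWED_COMMANDS:
        raise PermissionError(f"Command not allowlisted: {args.get('command')}")

    # 3. Cap tool calls per run to prevent loops.
    if call_count >= MAX_TOOL_CALLS:
        raise RuntimeError("Tool-call budget exhausted for this run.")
```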
A useful pattern is a "tool-first policy": instruct the model, "If you need a fact (API behavior, a docs detail), call search_docs." Another is "verify-before-final": when the model changes code, it must run tests before producing a final answer.
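The verify-before-final rule can also be enforced outside the prompt, as a check in the agent loop. This is a sketch under assumptions: the transcript structure and function name are hypothetical.

```python
def enforce_verify_before_final(transcript: list[dict]) -> str | None:
    """Reject a final answer if code was changed but tests were never run.

    `transcript` is assumed to be a list of completed tool calls, each a dict
    with a "tool" name and a "result" payload.
    """
    changed_code = any(call["tool"] == "apply_patch" for call in transcript)
    ran_tests = any(
        call["tool"] == "run_tests" and call["result"].get("exit_code") == 0
        for call in transcript
    )
    if changed_code and not ran_tests:
        # Send the model back to the tools instead of accepting the answer.
        return "Run the test suite with run_tests and report the result before finalizing."
    return None  # Answer accepted.
```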
Retrieval tool-calling is where vector databases shine. Back search_docs with Milvus or Zilliz Cloud, using metadata filters for version/lang/tenant. This gives Opus 4.6 a reliable knowledge source and makes behavior auditable: you can log tool calls and the chunk IDs returned. Over time, this becomes your most powerful debugging lever: if answers are wrong, you fix retrieval and chunking rather than trying to “prompt harder.”
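A minimal sketch of search_docs backed by Milvus via pymilvus, with a metadata filter and chunk-ID logging. The collection name, field names, and the embed() helper are assumptions for illustration; a Zilliz Cloud deployment would use its URI and token instead of a local endpoint.

```python
import logging
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # or your Zilliz Cloud URI + token
log = logging.getLogger("tool_calls")

def embed(text: str) -> list[float]:
    # Placeholder: call the same embedding model used at ingest time.
    raise NotImplementedError

def search_docs(query: str, top_k: int = 5, filters: dict | None = None) -> list[dict]:
    """Back the search_docs tool with a filtered Milvus vector search."""
    filters = filters or {}
    # Metadata filter for version/lang/tenant, built from the tool arguments.
    expr = " and ".join(f'{k} == "{v}"' for k, v in filters.items())
    hits = client.search(
        collection_name="docs_chunks",   # assumed collection name
        data=[embed(query)],
        filter=expr,
        limit=top_k,
        output_fields=["chunk_id", "text", "url"],
    )[0]
    # Log the returned chunk IDs so every answer is auditable.
    log.info("search_docs query=%r chunks=%s",
             query, [hit["entity"]["chunk_id"] for hit in hits])
    return [hit["entity"] for hit in hits]
```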