GPT 5.3 Codex should query Milvus with metadata filters by treating filtering as a first-class correctness constraint, not a secondary optimization. In documentation assistants and RAG systems, most wrong answers are caused by retrieving the “right topic but wrong version” or the “right keyword but wrong product area.” Metadata filters solve that. The correct flow is: embed query → vector search top-k → apply or combine with filter conditions like version == "v2.5" and lang == "en" → return the filtered top-k chunks. In Milvus, filters are typically expressed as boolean expressions over scalar fields (strings, ints, arrays), and you should model your collection schema around the filters you need (version/product/lang/doc_type). Once you do that, your retrieval becomes more stable and your prompts become smaller.
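The "combine with filter conditions" step above can be sketched as a small helper that turns a dict of metadata constraints into a Milvus boolean expression string. This is a minimal sketch; the helper name and its quoting rules are illustrative, not part of any Milvus API:

```python
def build_filter_expr(filters: dict) -> str:
    """Turn {"version": "v2.5", "lang": "en"} into a Milvus boolean
    expression like: version == "v2.5" and lang == "en".
    Assumes values are plain strings or ints (illustrative only)."""
    clauses = []
    for field, value in filters.items():
        if isinstance(value, str):
            clauses.append(f'{field} == "{value}"')
        else:
            clauses.append(f"{field} == {value}")
    return " and ".join(clauses)

expr = build_filter_expr({"version": "v2.5", "lang": "en"})
# expr == 'version == "v2.5" and lang == "en"'
```

The resulting string is what you would pass as the filter expression at search time.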
A robust “filtered retrieval” pattern looks like this:
**Collection fields**

- `embedding` (vector)
- `text` (string)
- `source_url` (string)
- `version` (string)
- `product` (string)
- `lang` (string)
- `doc_type` (string)
- `updated_at` (int timestamp)
**Query-time filters**

- If the user is on Milvus v2.5 docs, filter `version == "v2.5"`.
- If the UI is English-only, filter `lang == "en"`.
- For "how-to" questions, prefer (or hard-filter on) `doc_type == "howto"`.
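The rules above amount to a small policy function that maps user/UI context to a filter dict. All parameter names here are illustrative; adapt them to whatever context object your application carries:

```python
def filters_from_context(docs_version=None, ui_lang=None, question_kind=None):
    """Map user/UI context to Milvus metadata filters.
    Parameter and field names are illustrative."""
    filters = {}
    if docs_version:                  # e.g. user is browsing v2.5 docs
        filters["version"] = docs_version
    if ui_lang:                       # e.g. English-only UI
        filters["lang"] = ui_lang
    if question_kind == "howto":      # prefer how-to pages for how-to questions
        filters["doc_type"] = "howto"
    return filters

filters_from_context(docs_version="v2.5", ui_lang="en", question_kind="howto")
# → {"version": "v2.5", "lang": "en", "doc_type": "howto"}
```

Centralizing this mapping in one function keeps filter policy auditable instead of scattered across call sites.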
**Post-processing**

- Deduplicate chunks by `doc_id` to avoid returning 10 chunks from the same page.
- Apply a minimum similarity threshold to avoid irrelevant retrieval.
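Both post-processing steps fit in one pure function. The hit shape (`doc_id`, `score` dicts) and the default threshold are assumptions for illustration:

```python
def postprocess(hits, min_score=0.5, per_doc_limit=1):
    """Keep at most `per_doc_limit` chunks per doc_id and drop hits
    below a similarity threshold. `hits` is a list of dicts like
    {"doc_id": ..., "score": ..., "text": ...} (illustrative shape)."""
    seen = {}
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < min_score:
            continue  # below similarity threshold
        count = seen.get(hit["doc_id"], 0)
        if count >= per_doc_limit:
            continue  # already have enough chunks from this page
        seen[hit["doc_id"]] = count + 1
        kept.append(hit)
    return kept

hits = [
    {"doc_id": "a", "score": 0.9},
    {"doc_id": "a", "score": 0.8},   # dropped: duplicate doc_id
    {"doc_id": "b", "score": 0.4},   # dropped: below threshold
    {"doc_id": "c", "score": 0.7},
]
postprocess(hits)  # → keeps the 0.9 hit from "a" and the 0.7 hit from "c"
```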
Ask GPT 5.3 Codex to generate retrieval code that exposes these as parameters (not hard-coded), and to include tests that confirm filters actually reduce cross-version contamination. In practice, “metadata filters” are how you keep your system from drifting as docs evolve.
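A cross-version contamination test along these lines can run against stubbed search results, with no live Milvus instance. The metric and the stub data are hypothetical:

```python
def cross_version_rate(hits, expected_version):
    """Fraction of retrieved chunks tagged with the wrong version."""
    if not hits:
        return 0.0
    bad = sum(1 for h in hits if h["version"] != expected_version)
    return bad / len(hits)

# Stubbed results: the unfiltered search leaks a v2.4 chunk,
# the filtered search does not (illustrative data).
unfiltered = [{"version": "v2.5"}, {"version": "v2.4"}, {"version": "v2.5"}]
filtered   = [{"version": "v2.5"}, {"version": "v2.5"}]

assert cross_version_rate(filtered, "v2.5") < cross_version_rate(unfiltered, "v2.5")
```

In a real test suite you would index a small fixture corpus spanning two versions and assert the same inequality on actual search output.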
Once retrieval is filtered correctly, enforce a grounded generation contract: “Use only the retrieved Context; cite chunk IDs.” This makes behavior auditable and debuggable. A strong architecture is to encapsulate retrieval as a tool: search_docs(query, top_k, filters) that queries Milvus or Zilliz Cloud, then returns chunk IDs, text, and URLs. GPT 5.3 Codex (or any generator model) should never bypass this tool for doc-based answers. That’s the most efficient and reliable way to query Milvus with filters: filters ensure correct sources, and the tool boundary ensures the model doesn’t guess.