GLM-5 can cite retrieved chunks reliably if you define what “cite” means and enforce it as a formatting contract. Models don’t inherently know your citation format; they follow the rules you give them. So if you want citations like “(Source: doc URL)” or “Chunk IDs: 123, 456,” you should provide those fields in the context and require the model to include them in a specific section of the output.

In practice, reliability comes from two levers: (1) the prompt template (“You must cite the chunk IDs you used”), and (2) programmatic validation (reject answers that omit citations or cite IDs not present in the retrieved context). GLM-5 supports structured workflows (including tool calling), so you can build a system where retrieval returns chunks with stable IDs and URLs, then GLM-5 must reference those exact IDs in its final response. Relevant GLM-5 docs: GLM-5 overview and Function Calling.
A concrete pattern that works well for Milvus-backed RAG is to wrap each retrieved chunk like this:

```
[ChunkID: milvus:doc_42#chunk_7]
URL: https://example.com/docs/...
Version: v2.5
Text: ...
```
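In code, this wrapping step is a small formatting function. Here is a minimal Python sketch, assuming each Milvus hit has already been flattened into a dict; all field names (`doc_id`, `chunk_index`, `url`, `version`, `text`) are hypothetical and should be adapted to your collection schema:

```python
# Sketch: format retrieved chunks into the template above.
# Field names are assumptions, not a fixed Milvus schema.

def format_chunk(chunk: dict) -> str:
    chunk_id = f"milvus:{chunk['doc_id']}#chunk_{chunk['chunk_index']}"
    return (
        f"[ChunkID: {chunk_id}]\n"
        f"URL: {chunk['url']}\n"
        f"Version: {chunk['version']}\n"
        f"Text: {chunk['text']}"
    )

def build_context(chunks: list[dict]) -> str:
    # A blank line between chunks keeps the boundaries unambiguous.
    return "\n\n".join(format_chunk(c) for c in chunks)

chunks = [{
    "doc_id": "doc_42", "chunk_index": 7,
    "url": "https://example.com/docs/install",
    "version": "v2.5",
    "text": "To install the client, run pip install pymilvus.",
}]
```

The stable `milvus:doc#chunk` ID format matters more than the exact layout: it is what the validator will match against later.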
Then instruct GLM-5:

- “Every factual claim must be supported by at least one ChunkID.”
- “At the end, output a Sources list of ChunkIDs and URLs you used.”
- “Do not cite anything not present in the Context.”
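Those rules can be assembled into the prompt programmatically. A sketch of that assembly, where the exact wording is an assumption to adapt to your own system-prompt conventions:

```python
# Sketch: build the citation-contract prompt. Wording is illustrative.

CITATION_RULES = """\
Answer using only the Context below.
- Every factual claim must be supported by at least one ChunkID.
- At the end, output a Sources list of the ChunkIDs and URLs you used.
- Do not cite anything not present in the Context.
"""

def build_prompt(context: str, question: str) -> str:
    return f"{CITATION_RULES}\nContext:\n{context}\n\nQuestion: {question}"
```

Keeping the rules in one constant means the validator and the prompt can evolve together instead of drifting apart.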
After generation, validate in code:

1. Parse the output and extract the cited ChunkIDs.
2. Check that each cited ChunkID exists in the retrieved set.
3. If any citation is missing or invalid, re-prompt with the validation error.
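The three steps above fit in a few lines of Python. This sketch assumes the `milvus:doc#chunk_N` ID format shown earlier; the function names are hypothetical:

```python
import re

# Matches IDs like milvus:doc_42#chunk_7 (assumes the format shown above).
CHUNK_ID_RE = re.compile(r"milvus:[\w-]+#chunk_\d+")

def validate_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return validation errors; an empty list means the answer passes."""
    cited = set(CHUNK_ID_RE.findall(answer))
    errors = []
    if not cited:
        errors.append("Answer contains no ChunkID citations.")
    for cid in sorted(cited - retrieved_ids):
        errors.append(f"Cited ChunkID {cid} is not in the retrieved context.")
    return errors
```

On failure, feed `"\n".join(errors)` back into a retry prompt so the model can correct itself instead of silently shipping an invented citation.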
This is important because the model may sometimes “helpfully” invent a citation label if you don’t enforce the contract. Validation turns that into a fixable behavior instead of a silent failure. When you build this on Milvus or Zilliz Cloud, you already have stable primary keys for chunks—so citations can map directly to stored entities. That makes it possible to show “View source” UI links and to debug incorrect answers by inspecting the exact chunks used.
If you want citations to be both reliable and user-friendly, design them for your UI. For example, instead of raw internal IDs, you can cite a short reference like [S1], [S2], then map those to (title, URL, section) in your frontend. The model only outputs [S1] style markers, and your app renders the full citation block based on the retrieved chunks. This removes formatting burden from GLM-5 and increases consistency across answers. It also reduces prompt token usage: you don’t need the model to repeat long URLs.

The best part is you can keep your knowledge base current by re-indexing in Milvus / Zilliz Cloud without changing citation logic: chunk IDs remain stable if you version your documents carefully (e.g., doc_id + version + chunk_index). That’s typically the most robust “reliable citations” design.
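The marker-to-source mapping is a small rendering step in your app. A sketch, assuming `[S1]` corresponds to the first retrieved chunk and that each source dict carries hypothetical `title`/`url`/`section` fields:

```python
import re

def render_sources(answer: str, sources: list[dict]) -> list[str]:
    """Return formatted citation lines for the [Sn] markers actually used."""
    used = sorted({int(n) for n in re.findall(r"\[S(\d+)\]", answer)})
    lines = []
    for n in used:
        if 1 <= n <= len(sources):  # silently drop markers outside the set
            s = sources[n - 1]
            lines.append(f"[S{n}] {s['title']} - {s['url']} ({s['section']})")
    return lines
```

Because the model never emits URLs itself, re-indexing or renaming documents only changes the `sources` list your app builds, never the model's output format.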