Synonym
The synonym filter rewrites tokens according to a synonym dictionary, so that related terms match during search. It supports two modes of operation and two ways of supplying the dictionary:
Operation modes — expand mode (expand: true) preserves the original token and emits additional synonyms alongside it; normalization mode (expand: false) rewrites tokens to a canonical form.
Dictionary sources — small dictionaries can be inlined into the filter configuration via the synonyms array; large dictionaries should be stored as a file resource and referenced via synonyms_file.
Dictionary format
A synonym dictionary is a plain-text document (or inline array) in which each line defines one rule. Two rule forms are supported.
Mapping rule
fast, quick => speedy
The tokens on the left (fast, quick) rewrite to the tokens on the right (speedy). Multiple targets are allowed:
small, little => tiny, compact
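To make the mapping-rule semantics concrete, the following pure-Python sketch simulates how a multi-target rule such as "small, little => tiny, compact" rewrites a token stream in each mode. This is an illustration of the rule semantics only, not Milvus's implementation; the function name and structure are invented for this example.

```python
# Illustration only: mapping-rule semantics, not Milvus's implementation.
# Rule: "small, little => tiny, compact"
def apply_mapping(tokens, sources, targets, expand):
    out = []
    for tok in tokens:
        if tok in sources:
            if expand:
                out.append(tok)   # expand: true keeps the original token
            out.extend(targets)   # emit every target token
        else:
            out.append(tok)       # tokens outside the rule pass through
    return out

sources = {"small", "little"}
targets = ["tiny", "compact"]

print(apply_mapping(["a", "small", "dog"], sources, targets, expand=True))
# → ['a', 'small', 'tiny', 'compact', 'dog']
print(apply_mapping(["a", "small", "dog"], sources, targets, expand=False))
# → ['a', 'tiny', 'compact', 'dog']
```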
With expand: true, the original tokens are kept alongside the targets:
Input fast with expand: true → fast, speedy
Input fast with expand: false → speedy
Equivalence group
happy, joyful, cheerful
All listed tokens are considered equivalent:
With expand: true, any occurrence of any token in the group emits every token in the group. Input happy → happy, joyful, cheerful.
With expand: false, every occurrence is rewritten to the first token in the group. Input joyful → happy; input happy is already the first token and is unchanged.
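The equivalence-group behavior described above can be sketched in plain Python. Again, this is only an illustration of the semantics, with an invented function name, not Milvus's internal logic.

```python
# Illustration only: equivalence-group semantics, not Milvus's implementation.
# Group: "happy, joyful, cheerful"
def apply_group(tokens, group, expand):
    out = []
    for tok in tokens:
        if tok in group:
            if expand:
                out.extend(group)       # emit every token in the group
            else:
                out.append(group[0])    # rewrite to the first (canonical) token
        else:
            out.append(tok)
    return out

group = ["happy", "joyful", "cheerful"]

print(apply_group(["i", "am", "joyful"], group, expand=True))
# → ['i', 'am', 'happy', 'joyful', 'cheerful']
print(apply_group(["i", "am", "joyful"], group, expand=False))
# → ['i', 'am', 'happy']
```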
Configuration
The synonym filter is a custom filter. Specify "type": "synonym" along with at least one of synonyms (inline) or synonyms_file (external), plus an expand flag.
analyzer_params = {
    "tokenizer": "standard",
    "filter": [
        {
            "type": "synonym",
            "synonyms": [  # inline rules (optional)
                "fast, quick => speedy",
                "happy, joyful, cheerful",
            ],
            "synonyms_file": {  # external rules (optional)
                "type": "remote",
                "resource_name": "en_synonyms",
                "file_name": "synonyms.txt",
            },
            "expand": True,
        }
    ],
}
The synonym filter accepts the following parameters.
| Parameter | Description | Default |
|---|---|---|
| synonyms | An inline array of rule strings. Each string uses the dictionary format described above. Suitable for small dictionaries (up to a few dozen rules). | — |
| synonyms_file | A reference to a file resource that stores synonym rules, one per line. Use for larger dictionaries. See External dictionary file below. | — |
| expand | A boolean flag that controls how rules apply. true preserves the original token and emits synonyms alongside it; false rewrites tokens to their canonical form (the right-hand side of a mapping, or the first token of an equivalence group). | false |
You can specify synonyms, synonyms_file, or both. When both are present, the filter merges the two sources. The filter operates on tokens produced by the tokenizer; it must therefore be combined with a tokenizer such as the standard tokenizer.
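To tie the dictionary format to the two modes, the following sketch parses rule strings (both mapping rules and equivalence groups, as a merged rule set) and applies them to a token stream. It illustrates the documented semantics only; the parsing code and function names are invented for this example and say nothing about how Milvus implements the filter.

```python
# Illustration only: parse dictionary-format rules and apply them to tokens.
# Not Milvus's implementation; function names are invented for this sketch.
def parse_rules(rule_lines):
    rules = {}  # source token -> ("mapping" | "group", replacement tokens)
    for line in rule_lines:
        if "=>" in line:  # mapping rule
            lhs, rhs = line.split("=>")
            targets = [t.strip() for t in rhs.split(",")]
            for src in (t.strip() for t in lhs.split(",")):
                rules[src] = ("mapping", targets)
        else:  # equivalence group
            group = [t.strip() for t in line.split(",")]
            for src in group:
                rules[src] = ("group", group)
    return rules

def apply_rules(tokens, rules, expand):
    out = []
    for tok in tokens:
        if tok not in rules:
            out.append(tok)
            continue
        kind, targets = rules[tok]
        if kind == "mapping":
            if expand:
                out.append(tok)       # keep the original alongside targets
            out.extend(targets)
        else:                         # equivalence group
            if expand:
                out.extend(targets)   # emit the whole group
            else:
                out.append(targets[0])  # rewrite to the first token

    return out

rules = parse_rules(["fast, quick => speedy", "happy, joyful, cheerful"])
print(apply_rules(["a", "fast", "car"], rules, expand=True))
# → ['a', 'fast', 'speedy', 'car']
print(apply_rules(["i", "am", "joyful"], rules, expand=False))
# → ['i', 'am', 'happy']
```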
External dictionary file
For production-sized dictionaries, register the file as a remote file resource and reference it from synonyms_file.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Register the file once, then reference it from any analyzer that needs it.
client.add_file_resource(
    name="en_synonyms",
    path="file/synonyms.txt",  # full S3 object key, including rootPath
)

analyzer_params = {
    "tokenizer": "standard",
    "filter": [{
        "type": "synonym",
        "synonyms_file": {
            "type": "remote",
            "resource_name": "en_synonyms",
            "file_name": "synonyms.txt",
        },
        "expand": True,
    }],
}
See Manage File Resources for the full workflow (upload, register, list, remove) and for the alternative "type": "local" form.
Examples
Before applying the analyzer to a collection schema, verify its behavior with run_analyzer. The following examples use the inline synonyms array for brevity; replace with synonyms_file for larger dictionaries.
expand: true — keep the original, add synonyms
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

analyzer_params = {
    "tokenizer": "standard",
    "filter": [{
        "type": "synonym",
        "synonyms": [
            "fast, quick => speedy",
            "happy, joyful, cheerful",
        ],
        "expand": True,
    }],
}
print(client.run_analyzer(["a fast car"], analyzer_params))
# → [['a', 'fast', 'speedy', 'car']]
print(client.run_analyzer(["i am happy today"], analyzer_params))
# → [['i', 'am', 'happy', 'joyful', 'cheerful', 'today']]
Both fast and happy are preserved; their synonyms are emitted alongside.
expand: false — rewrite to canonical form
analyzer_params_norm = {
    "tokenizer": "standard",
    "filter": [{
        "type": "synonym",
        "synonyms": [
            "fast, quick => speedy",
            "happy, joyful, cheerful",
        ],
        "expand": False,
    }],
}
print(client.run_analyzer(["a fast car"], analyzer_params_norm))
# → [['a', 'speedy', 'car']]
print(client.run_analyzer(["i am happy today"], analyzer_params_norm))
# → [['i', 'am', 'happy', 'today']]
The mapping rule rewrites fast to speedy. The equivalence group leaves happy unchanged because it is the first token of the group; an input containing joyful or cheerful would have been rewritten to happy.