Standard

The standard analyzer is the default analyzer in Milvus. It is applied automatically to text fields when no analyzer is specified. It uses grammar-based tokenization, which makes it effective for most languages.

Definition

The standard analyzer consists of:

  • Tokenizer: Uses the standard tokenizer to split text into discrete word units based on grammar rules. For more information, refer to the standard tokenizer page.

  • Filter: Uses the lowercase filter to convert all tokens to lowercase, enabling case-insensitive searches. For more information, refer to lowercase filter.

The functionality of the standard analyzer is equivalent to the following custom analyzer configuration:

analyzer_params = {
    "tokenizer": "standard",
    "filter": ["lowercase"]
}
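To make the two stages concrete, here is a minimal pure-Python sketch of a tokenize-then-lowercase pipeline. It is an approximation, not Milvus's implementation: the function names `tokenize`, `lowercase`, and `analyze` are hypothetical, and the regular expression `\w+` is a simplified stand-in for the grammar-based rules of the real standard tokenizer.

```python
import re

# Simplified stand-in for the standard tokenizer (assumption): treat each run
# of Unicode word characters as one token. The real tokenizer applies fuller
# grammar rules than this pattern does.
_WORD = re.compile(r"\w+")


def tokenize(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    return _WORD.findall(text)


def lowercase(tokens):
    """Mimic the lowercase filter: convert every token to lowercase."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the tokenizer stage, then the filter stage, in that order."""
    return lowercase(tokenize(text))


print(analyze("Grammar-based TOKENIZATION, then Lowercase."))
```

The order matters: filters operate on the tokens the tokenizer emits, so the lowercase step sees individual words rather than the raw string.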

Configuration

To apply the standard analyzer to a field, simply set type to standard in analyzer_params, and include optional parameters as needed.

analyzer_params = {
    "type": "standard", # Specifies the standard analyzer type
}

The standard analyzer accepts the following optional parameters:

  • stop_words: An array of stop words to remove from the tokenized output. Defaults to _english_, a built-in set of common English stop words. The details of _english_ can be found here.

Example configuration of custom stop words:

analyzer_params = {
    "type": "standard", # Specifies the standard analyzer type
    "stop_words": ["of"] # Optional: List of words to exclude from tokenization
}
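As a rough illustration of what a stop-word list does, the sketch below removes the configured words from a list of tokens. The helper `remove_stop_words` is hypothetical and is not part of Milvus. It assumes the tokens are already lowercased, and for simplicity it treats a missing `stop_words` key as an empty list rather than modelling the built-in `_english_` set.

```python
def remove_stop_words(tokens, analyzer_params):
    """Drop every token that appears in analyzer_params["stop_words"].

    Hypothetical helper for illustration only; a missing key is treated
    as "no stop words" (a simplification of the documented default).
    """
    stop_words = set(analyzer_params.get("stop_words", []))
    return [token for token in tokens if token not in stop_words]


analyzer_params = {
    "type": "standard",
    "stop_words": ["of"],
}

# Tokens as they might look after tokenization and lowercasing.
tokens = ["the", "state", "of", "the", "art"]
print(remove_stop_words(tokens, analyzer_params))
# ['the', 'state', 'the', 'art']
```

Only exact matches are removed here, so "of" disappears while "the" stays because it is not in the custom list.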

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. This allows Milvus to process the text in that field using the specified analyzer for efficient tokenization and filtering. For more information, refer to Example use.

Example output

Here’s how the standard analyzer processes text.

Original text:

"The Milvus vector database is built for scale!"

Expected output:

["the", "milvus", "vector", "database", "is", "built", "for", "scale"]
