Manage Schema
This topic introduces schema in Milvus. A schema defines the properties of a collection and of the fields within it.
Field schema
A field schema is the logical definition of a field. You need to define field schemas before you can define a collection schema and manage collections.
Milvus supports only one primary key field in a collection.
Field schema properties
Properties | Description | Note |
---|---|---|
name | Name of the field in the collection to create | Data type: String. Mandatory |
dtype | Data type of the field | Mandatory |
description | Description of the field | Data type: String. Optional |
is_primary | Whether to set the field as the primary key field or not | Data type: Boolean (true or false). Mandatory for the primary key field |
auto_id (Mandatory for primary key field) | Switch to enable or disable automatic ID (primary key) allocation. | True or False |
max_length (Mandatory for VARCHAR field) | Maximum byte length for strings allowed to be inserted. Note that multibyte characters (e.g., Unicode characters) may occupy more than one byte each, so ensure the byte length of inserted strings does not exceed the specified limit. | [1, 65,535] |
dim | Dimension of the vector | Data type: Integer ∈ [1, 32768]. Mandatory for a dense vector field. Omit for a sparse vector field. |
is_partition_key | Whether this field is a partition-key field. | Data type: Boolean (true or false). |
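Because max_length counts bytes rather than characters, it can help to check a string's encoded size before inserting it. The following is a minimal sketch that assumes strings are measured in UTF-8; the helper names utf8_byte_length and fits_max_length are hypothetical and are not part of pymilvus.

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_max_length(text: str, max_length: int) -> bool:
    """Check a string against a VARCHAR max_length, measured in bytes (assumed UTF-8)."""
    return utf8_byte_length(text) <= max_length


# "h\u00e9llo" has 5 characters but 6 bytes; "\u4f60\u597d" has 2 characters but 6 bytes.
for sample in ["hello", "h\u00e9llo", "\u4f60\u597d"]:
    print(repr(sample), len(sample), utf8_byte_length(sample), fits_max_length(sample, 5))
```

A string that looks short enough by character count can still exceed the limit, so size max_length with the expected character set in mind.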
Create a field schema
To simplify data insertion, Milvus allows you to specify a default value for each scalar field, except the primary key field, when you create the field schema. If you leave such a field empty when inserting data, the default value you specified for the field applies.
Create a regular field schema:
from pymilvus import DataType, FieldSchema
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
# The following creates a field and uses it as the partition key
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
Create a field schema with default field values:
from pymilvus import DataType, FieldSchema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    # configure default value `25` for field `age`
    FieldSchema(name="age", dtype=DataType.INT64, default_value=25, description="age"),
    # list items cannot be assignments, so the vector field is passed directly
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
]
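As a mental model of how a default value behaves at insert time, the sketch below fills in fields that a row leaves out. It is plain Python for illustration only, not how Milvus implements defaults; FIELD_DEFAULTS and apply_defaults are hypothetical names.

```python
from typing import Any, Dict

# Hypothetical defaults, mirroring default_value=25 on the `age` field.
FIELD_DEFAULTS: Dict[str, Any] = {"age": 25}


def apply_defaults(row: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row in which every missing field takes its default."""
    filled = dict(row)
    for name, value in defaults.items():
        # setdefault only writes the value when the key is absent from the row
        filled.setdefault(name, value)
    return filled


print(apply_defaults({"id": 1}, FIELD_DEFAULTS))
print(apply_defaults({"id": 2, "age": 40}, FIELD_DEFAULTS))
```

A row that supplies age keeps its own value; only a row that omits the field receives the default.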
Supported data types
DataType defines the kind of data a field contains. Different fields support different data types.
Primary key field supports:
- INT64: numpy.int64
- VARCHAR: VARCHAR
Scalar field supports:
- BOOL: Boolean (true or false)
- INT8: numpy.int8
- INT16: numpy.int16
- INT32: numpy.int32
- INT64: numpy.int64
- FLOAT: numpy.float32
- DOUBLE: numpy.double
- VARCHAR: VARCHAR
- JSON: JSON
- Array: Array
JSON as a composite data type is available. A JSON field comprises key-value pairs. Each key is a string, and a value can be a number, string, boolean value, array, or list. For details, refer to JSON: a new data type.
Vector field supports:
- BINARY_VECTOR: Stores binary data as a sequence of 0s and 1s, used for compact feature representation in image processing and information retrieval.
- FLOAT_VECTOR: Stores 32-bit floating-point numbers, commonly used in scientific computing and machine learning for representing real numbers.
- FLOAT16_VECTOR: Stores 16-bit half-precision floating-point numbers, used in deep learning and GPU computations for memory and bandwidth efficiency.
- BFLOAT16_VECTOR: Stores 16-bit floating-point numbers with reduced precision but the same exponent range as Float32, popular in deep learning for reducing memory and computational requirements without significantly impacting accuracy.
- SPARSE_FLOAT_VECTOR: Stores a list of non-zero elements and their corresponding indices, used for representing sparse vectors. For more information, refer to Sparse Vectors.
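To make the binary and sparse representations above more concrete, here is a small sketch in plain Python. It assumes a binary vector is packed 8 bits per byte with the most significant bit first, and that a sparse vector can be held as an index-to-value mapping. Both are illustrative assumptions, and pack_binary_vector and dense_to_sparse are hypothetical helpers rather than pymilvus functions.

```python
from typing import Dict, List


def pack_binary_vector(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, 8 bits per byte, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        packed.append(byte)
    return bytes(packed)


def dense_to_sparse(values: List[float]) -> Dict[int, float]:
    """Keep only the non-zero entries, keyed by their position in the dense vector."""
    return {index: value for index, value in enumerate(values) if value != 0.0}


print(pack_binary_vector([1, 0, 0, 0, 0, 0, 0, 1]))
print(dense_to_sparse([0.0, 0.5, 0.0, 1.2]))
```

The packed form uses one byte per 8 dimensions, and the sparse form stores only the entries that carry information.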
Milvus supports multiple vector fields in a collection. For more information, refer to Hybrid Search.
Collection schema
A collection schema is the logical definition of a collection. Usually you need to define the field schema before defining a collection schema and managing collections.
Collection schema properties
Properties | Description | Note |
---|---|---|
fields | Fields in the collection to create | Mandatory |
description | Description of the collection | Data type: String. Optional |
partition_key_field | Name of a field that is designed to act as the partition key. | Data type: String. Optional |
enable_dynamic_field | Whether to enable dynamic schema or not | Data type: Boolean (true or false). Optional, defaults to False. For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections. |
Create a collection schema
from pymilvus import DataType, FieldSchema, CollectionSchema
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
# Enable partition key on a field if you need to implement multi-tenancy based on the partition-key field
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
# Set enable_dynamic_field to True if you need to use dynamic fields.
schema = CollectionSchema(fields=[id_field, age_field, embedding_field, position_field], auto_id=False, enable_dynamic_field=True, description="desc of a collection")
Create a collection with the schema specified:
from pymilvus import Collection, connections
conn = connections.connect(host="127.0.0.1", port=19530)
collection_name1 = "tutorial_1"
collection1 = Collection(name=collection_name1, schema=schema, using='default', shards_num=2)
- You can define the shard number with shards_num.
- You can define the Milvus server on which you wish to create a collection by specifying the alias in using.
- You can enable the partition key feature on a field by setting is_partition_key to True on the field if you need to implement partition-key-based multi-tenancy.
- You can enable dynamic schema by setting enable_dynamic_field to True in the collection schema if you need to enable dynamic field.
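Conceptually, with dynamic schema enabled, a row may carry keys that the schema does not declare. The sketch below separates declared fields from such extras to show the distinction. It is an illustration in plain Python only, it says nothing about how Milvus stores dynamic fields internally, and split_row is a hypothetical helper.

```python
from typing import Any, Dict, Set, Tuple


def split_row(row: Dict[str, Any], schema_fields: Set[str]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a row into fields declared in the schema and undeclared (dynamic) extras."""
    declared = {key: value for key, value in row.items() if key in schema_fields}
    extras = {key: value for key, value in row.items() if key not in schema_fields}
    return declared, extras


# Field names taken from the schema example; "color" is an undeclared extra key.
declared, extras = split_row(
    {"id": 1, "age": 30, "color": "red"},
    {"id", "age", "embedding", "position"},
)
print(declared)
print(extras)
```

With enable_dynamic_field left at False, a row is expected to contain only the declared fields.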
You can also create a collection with Collection.construct_from_dataframe, which automatically generates a collection schema from a DataFrame and creates a collection.
import random
import pandas as pd
from pymilvus import Collection
nb = 100   # number of rows to generate (example value)
dim = 128  # vector dimension (example value)
df = pd.DataFrame({
    "id": [i for i in range(nb)],
    "age": [random.randint(20, 40) for i in range(nb)],
    "embedding": [[random.random() for _ in range(dim)] for _ in range(nb)],
    "position": "test_pos"
})
collection, ins_res = Collection.construct_from_dataframe(
    'my_collection',
    df,
    primary_field='id',
    auto_id=False
)
What’s next
- Learn how to prepare schema when managing collections.
- Read more about dynamic schema.
- Read more about partition-key in Multi-tenancy.