8 changes: 7 additions & 1 deletion docs-site/content/0.25.0/api/collections.md
@@ -1030,9 +1030,15 @@ curl "http://localhost:8108/collections/companies" \
}'
```

:::danger Adding Embedding Fields
If you use this alter operation to add an [auto-embedding field](./vector-search.html#option-b-auto-embedding-generation-within-typesense), Typesense will generate embeddings for **all existing documents** in the collection. This is highly CPU and RAM intensive and **can make your cluster nodes unresponsive**.

For production clusters, we strongly recommend creating a new collection with the embedding field in the schema and indexing documents in controlled batches, rather than altering an existing collection. See the [alias feature](#using-an-alias) for zero-downtime migration, and [GPU Acceleration](./vector-search.html#using-a-gpu-optional) to speed up embedding generation.
:::
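
As a rough illustration of the recommendation above, the new collection's schema can be derived from the existing one before anything is sent to the cluster. This is a minimal sketch, not part of the documented API: `with_embedding_field` is a hypothetical helper, the collection names are made up, and the model name is only an example value.

```python
import copy


def with_embedding_field(schema, new_name, embed_from, model_name, field_name="embedding"):
    """Return a copy of `schema`, renamed to `new_name`, with an auto-embedding field appended.

    Hypothetical helper for illustration; the original schema is left untouched.
    """
    new_schema = copy.deepcopy(schema)
    new_schema["name"] = new_name
    new_schema["fields"].append({
        "name": field_name,
        "type": "float[]",
        "embed": {
            "from": list(embed_from),
            "model_config": {"model_name": model_name},
        },
    })
    return new_schema


# Assumed example schema; substitute your own collection definition.
old_schema = {
    "name": "companies_jan_1",
    "fields": [{"name": "company_name", "type": "string"}],
}

new_schema = with_embedding_field(
    old_schema, "companies_jan_2", ["company_name"], "ts/all-MiniLM-L12-v2"
)
```

The resulting `new_schema` would then be used to create the new collection, so embeddings are generated as documents are indexed rather than for every existing document in one alter operation.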

### Using an alias

If you need to do zero-downtime schema changes, you could also re-create the collection fully and use
the [Collection Alias](./collection-alias.md) feature to do a zero-downtime switch over to the new collection:

1. [Create your collection](#create-a-collection) as usual with a timestamped name. For example: `movies_jan_1`
18 changes: 17 additions & 1 deletion docs-site/content/0.25.0/api/vector-search.md
@@ -350,9 +350,25 @@ To simplify the process of embedding generation, Typesense can automatically use

When you do a search query on this automatically-generated vector field, your search query will be vectorized using the same model used for the field, which then allows you to do semantic search or combine keyword and semantic search to do hybrid search.

:::warning Adding Embeddings to Existing Collections
When you add an auto-embedding field to an **existing collection** via [schema alter](./collections.html#update-or-alter-a-collection) (PATCH), Typesense will generate embeddings for **all existing documents at once**. This is computationally intensive and can cause:

- **CPU spikes to 100%**, making nodes unresponsive to API calls
- **High RAM usage**, as the cluster must have enough memory to hold all document embeddings
- **Blocked writes** to the collection until the operation completes

**Recommended approach for production clusters:**

1. Create a **new collection** with the embedding field already in the schema
2. Index documents into the new collection in a controlled manner (e.g., in batches)
3. Use the [Collection Alias](./collections.html#using-an-alias) feature to switch traffic to the new collection with zero downtime

If you must use schema alter, do so during off-peak hours on a cluster with sufficient CPU and RAM headroom. Enabling [GPU Acceleration](#using-a-gpu-optional) significantly speeds up embedding generation.
:::
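
The three recommended steps can be sketched as an ordered list of HTTP requests. This is a minimal sketch under stated assumptions: `jsonl_batches` and `plan_migration` are hypothetical helpers that only build the requests (nothing is sent), the batch size of 100 is arbitrary, and the paths assume the standard collection-create, document-import, and alias-upsert endpoints.

```python
import json


def jsonl_batches(docs, batch_size):
    """Yield JSONL payloads containing at most `batch_size` documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch = []
    for doc in docs:
        batch.append(json.dumps(doc))
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield "\n".join(batch)


def plan_migration(new_schema, alias, docs, batch_size=100):
    """Yield (method, path, body) tuples for the migration, in the order they should run."""
    name = new_schema["name"]
    # Step 1: create the new collection with the embedding field already in the schema.
    yield ("POST", "/collections", json.dumps(new_schema))
    # Step 2: import documents in controlled batches.
    for payload in jsonl_batches(docs, batch_size):
        yield ("POST", f"/collections/{name}/documents/import?action=create", payload)
    # Step 3: point the alias at the new collection.
    yield ("PUT", f"/aliases/{alias}", json.dumps({"collection_name": name}))
```

A caller would send each request in order with its HTTP client of choice, optionally pausing between import batches to keep CPU usage in check. Building the plan separately from sending it keeps the batching logic easy to check without a running cluster.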

### Creating an auto-embedding field

To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.

Here's an example:

8 changes: 7 additions & 1 deletion docs-site/content/0.25.1/api/collections.md
@@ -1084,9 +1084,15 @@ curl "http://localhost:8108/collections/companies" \
}'
```

:::danger Adding Embedding Fields
If you use this alter operation to add an [auto-embedding field](./vector-search.html#option-b-auto-embedding-generation-within-typesense), Typesense will generate embeddings for **all existing documents** in the collection. This is highly CPU and RAM intensive and **can make your cluster nodes unresponsive**.

For production clusters, we strongly recommend creating a new collection with the embedding field in the schema and indexing documents in controlled batches, rather than altering an existing collection. See the [alias feature](#using-an-alias) for zero-downtime migration, and [GPU Acceleration](./vector-search.html#using-a-gpu-optional) to speed up embedding generation.
:::

### Using an alias

If you need to do zero-downtime schema changes, you could also re-create the collection fully and use
the [Collection Alias](./collection-alias.md) feature to do a zero-downtime switch over to the new collection:

1. [Create your collection](#create-a-collection) as usual with a timestamped name. For example: `movies_jan_1`
18 changes: 17 additions & 1 deletion docs-site/content/0.25.1/api/vector-search.md
@@ -350,9 +350,25 @@ To simplify the process of embedding generation, Typesense can automatically use

When you do a search query on this automatically-generated vector field, your search query will be vectorized using the same model used for the field, which then allows you to do semantic search or combine keyword and semantic search to do hybrid search.

:::warning Adding Embeddings to Existing Collections
When you add an auto-embedding field to an **existing collection** via [schema alter](./collections.html#update-or-alter-a-collection) (PATCH), Typesense will generate embeddings for **all existing documents at once**. This is computationally intensive and can cause:

- **CPU spikes to 100%**, making nodes unresponsive to API calls
- **High RAM usage**, as the cluster must have enough memory to hold all document embeddings
- **Blocked writes** to the collection until the operation completes

**Recommended approach for production clusters:**

1. Create a **new collection** with the embedding field already in the schema
2. Index documents into the new collection in a controlled manner (e.g., in batches)
3. Use the [Collection Alias](./collections.html#using-an-alias) feature to switch traffic to the new collection with zero downtime

If you must use schema alter, do so during off-peak hours on a cluster with sufficient CPU and RAM headroom. Enabling [GPU Acceleration](#using-a-gpu-optional) significantly speeds up embedding generation.
:::

### Creating an auto-embedding field

To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.

Here's an example:

8 changes: 7 additions & 1 deletion docs-site/content/0.25.2/api/collections.md
@@ -1090,9 +1090,15 @@ curl "http://localhost:8108/collections/companies" \
}'
```

:::danger Adding Embedding Fields
If you use this alter operation to add an [auto-embedding field](./vector-search.html#option-b-auto-embedding-generation-within-typesense), Typesense will generate embeddings for **all existing documents** in the collection. This is highly CPU and RAM intensive and **can make your cluster nodes unresponsive**.

For production clusters, we strongly recommend creating a new collection with the embedding field in the schema and indexing documents in controlled batches, rather than altering an existing collection. See the [alias feature](#using-an-alias) for zero-downtime migration, and [GPU Acceleration](./vector-search.html#using-a-gpu-optional) to speed up embedding generation.
:::

### Using an alias

If you need to do zero-downtime schema changes, you could also re-create the collection fully and use
the [Collection Alias](./collection-alias.md) feature to do a zero-downtime switch over to the new collection:

1. [Create your collection](#create-a-collection) as usual with a timestamped name. For example: `movies_jan_1`
18 changes: 17 additions & 1 deletion docs-site/content/0.25.2/api/vector-search.md
@@ -350,9 +350,25 @@ To simplify the process of embedding generation, Typesense can automatically use

When you do a search query on this automatically-generated vector field, your search query will be vectorized using the same model used for the field, which then allows you to do semantic search or combine keyword and semantic search to do hybrid search.

:::warning Adding Embeddings to Existing Collections
When you add an auto-embedding field to an **existing collection** via [schema alter](./collections.html#update-or-alter-a-collection) (PATCH), Typesense will generate embeddings for **all existing documents at once**. This is computationally intensive and can cause:

- **CPU spikes to 100%**, making nodes unresponsive to API calls
- **High RAM usage**, as the cluster must have enough memory to hold all document embeddings
- **Blocked writes** to the collection until the operation completes

**Recommended approach for production clusters:**

1. Create a **new collection** with the embedding field already in the schema
2. Index documents into the new collection in a controlled manner (e.g., in batches)
3. Use the [Collection Alias](./collections.html#using-an-alias) feature to switch traffic to the new collection with zero downtime

If you must use schema alter, do so during off-peak hours on a cluster with sufficient CPU and RAM headroom. Enabling [GPU Acceleration](#using-a-gpu-optional) significantly speeds up embedding generation.
:::

### Creating an auto-embedding field

To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.

Here's an example:

8 changes: 7 additions & 1 deletion docs-site/content/26.0/api/collections.md
@@ -1156,9 +1156,15 @@ curl "http://localhost:8108/collections/companies" \
}'
```

:::danger Adding Embedding Fields
If you use this alter operation to add an [auto-embedding field](./vector-search.html#option-b-auto-embedding-generation-within-typesense), Typesense will generate embeddings for **all existing documents** in the collection. This is highly CPU and RAM intensive and **can make your cluster nodes unresponsive**.

For production clusters, we strongly recommend creating a new collection with the embedding field in the schema and indexing documents in controlled batches, rather than altering an existing collection. See the [alias feature](#using-an-alias) for zero-downtime migration, and [GPU Acceleration](./vector-search.html#using-a-gpu-optional) to speed up embedding generation.
:::

### Using an alias

If you need to do zero-downtime schema changes, you could also re-create the collection fully with the updated schema and use
the [Collection Alias](./collection-alias.md) feature to do a zero-downtime switch over to the new collection:

Let's say you have a collection called `movies_jan_1` that you want to change the schema for.
18 changes: 17 additions & 1 deletion docs-site/content/26.0/api/vector-search.md
@@ -350,9 +350,25 @@ To simplify the process of embedding generation, Typesense can automatically use

When you do a search query on this automatically-generated vector field, your search query will be vectorized using the same model used for the field, which then allows you to do semantic search or combine keyword and semantic search to do hybrid search.

:::warning Adding Embeddings to Existing Collections
When you add an auto-embedding field to an **existing collection** via [schema alter](./collections.html#update-or-alter-a-collection) (PATCH), Typesense will generate embeddings for **all existing documents at once**. This is computationally intensive and can cause:

- **CPU spikes to 100%**, making nodes unresponsive to API calls
- **High RAM usage**, as the cluster must have enough memory to hold all document embeddings
- **Blocked writes** to the collection until the operation completes

**Recommended approach for production clusters:**

1. Create a **new collection** with the embedding field already in the schema
2. Index documents into the new collection in a controlled manner (e.g., in batches)
3. Use the [Collection Alias](./collections.html#using-an-alias) feature to switch traffic to the new collection with zero downtime

If you must use schema alter, do so during off-peak hours on a cluster with sufficient CPU and RAM headroom. Enabling [GPU Acceleration](#using-a-gpu-optional) significantly speeds up embedding generation.
:::

### Creating an auto-embedding field

To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.

Here's an example:

6 changes: 6 additions & 0 deletions docs-site/content/27.0/api/collections.md
@@ -1225,6 +1225,12 @@ curl "http://localhost:8108/collections/companies" \
}'
```

:::danger Adding Embedding Fields
If you use this alter operation to add an [auto-embedding field](./vector-search.html#option-b-auto-embedding-generation-within-typesense), Typesense will generate embeddings for **all existing documents** in the collection. This is highly CPU and RAM intensive and **can make your cluster nodes unresponsive**.

For production clusters, we strongly recommend creating a new collection with the embedding field in the schema and indexing documents in controlled batches, rather than altering an existing collection. See the [alias feature](#using-an-alias) for zero-downtime migration, and [GPU Acceleration](./vector-search.html#using-a-gpu-optional) to speed up embedding generation.
:::

### Using an alias

If you need to do zero-downtime schema changes, you could also re-create the collection fully with the updated schema and use
18 changes: 17 additions & 1 deletion docs-site/content/27.0/api/vector-search.md
@@ -350,9 +350,25 @@ To simplify the process of embedding generation, Typesense can automatically use

When you do a search query on this automatically-generated vector field, your search query will be vectorized using the same model used for the field, which then allows you to do semantic search or combine keyword and semantic search to do hybrid search.

:::warning Adding Embeddings to Existing Collections
When you add an auto-embedding field to an **existing collection** via [schema alter](./collections.html#update-or-alter-a-collection) (PATCH), Typesense will generate embeddings for **all existing documents at once**. This is computationally intensive and can cause:

- **CPU spikes to 100%**, making nodes unresponsive to API calls
- **High RAM usage**, as the cluster must have enough memory to hold all document embeddings
- **Blocked writes** to the collection until the operation completes

**Recommended approach for production clusters:**

1. Create a **new collection** with the embedding field already in the schema
2. Index documents into the new collection in a controlled manner (e.g., in batches)
3. Use the [Collection Alias](./collections.html#using-an-alias) feature to switch traffic to the new collection with zero downtime

If you must use schema alter, do so during off-peak hours on a cluster with sufficient CPU and RAM headroom. Enabling [GPU Acceleration](#using-a-gpu-optional) significantly speeds up embedding generation.
:::

### Creating an auto-embedding field

To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.

Here's an example:

6 changes: 6 additions & 0 deletions docs-site/content/27.1/api/collections.md
@@ -1225,6 +1225,12 @@ curl "http://localhost:8108/collections/companies" \
}'
```

:::danger Adding Embedding Fields
If you use this alter operation to add an [auto-embedding field](./vector-search.html#option-b-auto-embedding-generation-within-typesense), Typesense will generate embeddings for **all existing documents** in the collection. This is highly CPU and RAM intensive and **can make your cluster nodes unresponsive**.

For production clusters, we strongly recommend creating a new collection with the embedding field in the schema and indexing documents in controlled batches, rather than altering an existing collection. See the [alias feature](#using-an-alias) for zero-downtime migration, and [GPU Acceleration](./vector-search.html#using-a-gpu-optional) to speed up embedding generation.
:::

### Using an alias

If you need to do zero-downtime schema changes, you could also re-create the collection fully with the updated schema and use
16 changes: 16 additions & 0 deletions docs-site/content/27.1/api/vector-search.md
@@ -395,6 +395,22 @@ To simplify the process of embedding generation, Typesense can automatically use

When you do a search query on this automatically-generated vector field, your search query will be vectorized using the same model used for the field, which then allows you to do semantic search or combine keyword and semantic search to do hybrid search.

:::warning Adding Embeddings to Existing Collections
When you add an auto-embedding field to an **existing collection** via [schema alter](./collections.html#update-or-alter-a-collection) (PATCH), Typesense will generate embeddings for **all existing documents at once**. This is computationally intensive and can cause:

- **CPU spikes to 100%**, making nodes unresponsive to API calls
- **High RAM usage**, as the cluster must have enough memory to hold all document embeddings
- **Blocked writes** to the collection until the operation completes

**Recommended approach for production clusters:**

1. Create a **new collection** with the embedding field already in the schema
2. Index documents into the new collection in a controlled manner (e.g., in batches)
3. Use the [Collection Alias](./collections.html#using-an-alias) feature to switch traffic to the new collection with zero downtime

If you must use schema alter, do so during off-peak hours on a cluster with sufficient CPU and RAM headroom. Enabling [GPU Acceleration](#using-a-gpu-optional) significantly speeds up embedding generation.
:::

### Creating an auto-embedding field

To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.
6 changes: 6 additions & 0 deletions docs-site/content/28.0/api/collections.md
@@ -1313,6 +1313,12 @@ curl "http://localhost:8108/collections/companies" \
}'
```

:::danger Adding Embedding Fields
If you use this alter operation to add an [auto-embedding field](./vector-search.html#option-b-auto-embedding-generation-within-typesense), Typesense will generate embeddings for **all existing documents** in the collection. This is highly CPU and RAM intensive and **can make your cluster nodes unresponsive**.

For production clusters, we strongly recommend creating a new collection with the embedding field in the schema and indexing documents in controlled batches, rather than altering an existing collection. See the [alias feature](#using-an-alias) for zero-downtime migration, and [GPU Acceleration](./vector-search.html#using-a-gpu-optional) to speed up embedding generation.
:::

### Get Schema Change Status

You can check the status of in-progress schema change operations by using the schema changes endpoint.
16 changes: 16 additions & 0 deletions docs-site/content/28.0/api/vector-search.md
@@ -399,6 +399,22 @@ When you do a search query on this automatically-generated vector field, your se
Embeddings are only regenerated when one or more fields specified in the `embed.from` configuration are updated. This helps avoid unnecessary embedding recreation and API calls when other fields in the document are modified.
:::

:::warning Adding Embeddings to Existing Collections
When you add an auto-embedding field to an **existing collection** via [schema alter](./collections.html#update-or-alter-a-collection) (PATCH), Typesense will generate embeddings for **all existing documents at once**. This is computationally intensive and can cause:

- **CPU spikes to 100%**, making nodes unresponsive to API calls
- **High RAM usage**, as the cluster must have enough memory to hold all document embeddings
- **Blocked writes** to the collection until the operation completes

**Recommended approach for production clusters:**

1. Create a **new collection** with the embedding field already in the schema
2. Index documents into the new collection in a controlled manner (e.g., in batches)
3. Use the [Collection Alias](./collections.html#using-an-alias) feature to switch traffic to the new collection with zero downtime

If you must use schema alter, do so during off-peak hours on a cluster with sufficient CPU and RAM headroom. Enabling [GPU Acceleration](#using-a-gpu-optional) significantly speeds up embedding generation.
:::

### Creating an auto-embedding field

To create a field that automatically embeds other string or string array fields, you need to set the `embed` property of the field.