Open
42 commits
bbec2b9
Add the "ours" algorithm
mindtravel Aug 5, 2025
69f58fc
add script to test origin ver. improved ver. improved multi ver.
mindtravel Aug 12, 2025
940ce9c
add script to test origin ver. improved ver. improved multi ver.
mindtravel Aug 12, 2025
4c0e40e
ready to autodl
mindtravel Aug 15, 2025
405a734
fix docs
mindtravel Aug 15, 2025
477c540
fix scripts
mindtravel Aug 15, 2025
209a976
working good on both lab49 & autodl
mindtravel Aug 15, 2025
3e61f1a
working good on both lab49 & autodl
mindtravel Aug 15, 2025
e99d48e
update scripts
mindtravel Aug 19, 2025
98b0211
Support testing cuvs ivfflat and cuvs ivfpq
mindtravel Aug 20, 2025
1b94945
Update environment setup docs
mindtravel Aug 20, 2025
cc8417b
Adaptively set probes based on dataset size
mindtravel Aug 22, 2025
6499ddd
Adaptively set the number of clusters based on database size
mindtravel Aug 22, 2025
b4f9d06
Adjust scripts; split the build step out into its own stage
mindtravel Aug 22, 2025
2ef573b
Update the ours pgvector test module
mindtravel Aug 22, 2025
19bc104
Debug scripts
mindtravel Aug 22, 2025
7cd6250
Keep only the single-threaded and multi-threaded modules; link the others to these two via config.yml
mindtravel Aug 24, 2025
7b08503
Complete the single-threaded modules
mindtravel Aug 24, 2025
27c43dc
Update cuvs environment setup
mindtravel Aug 24, 2025
208c2ee
Tidy up scripts
mindtravel Aug 25, 2025
2ac301c
Add a module that accepts an ndarray directly for parallel queries
mindtravel Sep 2, 2025
070460c
Update GPU test environment
mindtravel Sep 6, 2025
a18ac55
Support TEXT
antheham Sep 8, 2025
21f599f
Support TEXT
antheham Sep 8, 2025
57c2f9b
Support TEXT
antheham Sep 8, 2025
9ac74c3
Add files via upload
antheham Sep 10, 2025
53e4119
support TEXT
antheham Sep 10, 2025
df8bdea
SUPPORT TEXT
antheham Sep 10, 2025
186faca
Support TEXT
antheham Sep 10, 2025
959aa4a
Support TEXT
antheham Sep 10, 2025
fd27e20
.
antheham Sep 11, 2025
20a1bb0
Add a module that accepts an ndarray directly for parallel queries
antheham Sep 11, 2025
02a209f
Support TEXT; sync project docs
mindtravel Sep 11, 2025
b1f02fa
Change the available pgvector test types; currently pgvector_ivfflat_single, pgvector_ivfflat_multi,…
mindtravel Sep 12, 2025
8baf21c
Merge branch 'main' of https://github.com/mindtravel/ann-benchmarks
mindtravel Sep 12, 2025
b776e1b
Update project docs
mindtravel Sep 17, 2025
63b844e
One-click install of pgvector_gpu Python support
mindtravel Dec 7, 2025
881fafb
Symlink the datasets to the data disk; add a pgvector_gpu environment setup script
mindtravel Dec 8, 2025
09c36f7
pgvector-gpu environment setup
mindtravel Dec 25, 2025
7f4ab69
Ignore symlinks
mindtravel Dec 25, 2025
403e472
Ignore symlink files
mindtravel Dec 25, 2025
80c0874
Ignore soft-link files
mindtravel Dec 25, 2025
14 changes: 13 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -6,7 +6,15 @@ protocol/c/fr-*
install/*.txt
install/*.yaml
install/lib-*/
-data/*
+
+data/
+data/**
+data
+
+raw-data/
+raw-data/**
+raw-data
+
*.class

*.log
@@ -17,3 +25,7 @@ results/*
venv

.idea

annoy/

tags
5 changes: 3 additions & 2 deletions ann_benchmarks/algorithms/annoy/config.yml
@@ -22,5 +22,6 @@ float:
     run_groups:
       annoy:
         args: [[100, 200, 400]]
-        query_args: [[100, 200, 400, 1000, 2000, 4000, 10000, 20000, 40000, 100000,
-          200000, 400000]]
+        query_args: [[100]]
+        #, 200, 400, 1000, 2000, 4000, 10000, 20000, 40000, 100000,
+        # 200000, 400000]]
3 changes: 3 additions & 0 deletions ann_benchmarks/algorithms/annoy/env.sh
@@ -0,0 +1,3 @@
git clone https://github.com/spotify/annoy
cd annoy && python3 setup.py install
python3 -c 'import annoy'
39 changes: 39 additions & 0 deletions ann_benchmarks/algorithms/cuvs_ivfflat/Dockerfile
@@ -0,0 +1,39 @@
FROM ann-benchmarks

# https://github.com/pgvector/pgvector/blob/master/Dockerfile

RUN git clone https://github.com/pgvector/pgvector /tmp/pgvector

RUN DEBIAN_FRONTEND=noninteractive apt-get -y install tzdata
RUN apt-get update && apt-get install -y --no-install-recommends build-essential postgresql-common
RUN /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y
RUN apt-get install -y --no-install-recommends postgresql-16 postgresql-server-dev-16
RUN sh -c 'echo "local all all trust" > /etc/postgresql/16/main/pg_hba.conf'

# Dynamically set OPTFLAGS based on the architecture
RUN ARCH=$(uname -m) && \
if [ "$ARCH" = "aarch64" ]; then \
OPTFLAGS="-march=native -msve-vector-bits=512"; \
elif [ "$ARCH" = "x86_64" ]; then \
OPTFLAGS="-march=native -mprefer-vector-width=512"; \
else \
OPTFLAGS="-march=native"; \
fi && \
cd /tmp/pgvector && \
make clean && \
make OPTFLAGS="$OPTFLAGS" && \
make install

USER postgres
RUN service postgresql start && \
psql -c "CREATE USER ann WITH ENCRYPTED PASSWORD 'ann'" && \
psql -c "CREATE DATABASE ann" && \
psql -c "GRANT ALL PRIVILEGES ON DATABASE ann TO ann" && \
psql -d ann -c "GRANT ALL ON SCHEMA public TO ann" && \
psql -d ann -c "CREATE EXTENSION vector" && \
psql -c "ALTER USER ann SET maintenance_work_mem = '4GB'" && \
psql -c "ALTER USER ann SET max_parallel_maintenance_workers = 0" && \
psql -c "ALTER SYSTEM SET shared_buffers = '4GB'"
USER root

RUN pip install psycopg[binary] pgvector
14 changes: 14 additions & 0 deletions ann_benchmarks/algorithms/cuvs_ivfflat/config.yml
@@ -0,0 +1,14 @@
float:
  any:
  - base_args: ['@metric']
    constructor: CuvsIVFFlat
    disabled: false
    docker_tag: ann-benchmarks-cuvs_ivfflat
    module: ann_benchmarks.algorithms.cuvs_ivfflat
    name: cuvs_ivfflat
    run_groups:
      # Fixed configuration: lists=100, workers=20, gpu=true
      lists-100-workers-20-gpu:
        arg_groups: [{lists: 100, use_gpu: true, batch_size: 10000}]
        args: {}
        query_args: [[1, 5, 10, 20, 40]]
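As a rough illustration of how the run group above might turn into individual benchmark runs, the sketch below pairs the constructor arguments with each query-time value. It is a simplified assumption about the expansion, not the ann-benchmarks implementation: `expand_run_group` and the in-memory `run_group` dict are hypothetical names, and `"euclidean"` stands in for the `@metric` placeholder.

```python
from itertools import product

# Hypothetical in-memory copy of the run group above (no YAML parser needed).
run_group = {
    "arg_groups": [{"lists": 100, "use_gpu": True, "batch_size": 10000}],
    "args": {},
    "query_args": [[1, 5, 10, 20, 40]],
}


def expand_run_group(base_args, group):
    """Yield (constructor_args, query_args) pairs for one run group.

    Simplifying assumption: each arg_groups entry is passed whole as one
    positional argument after base_args, and the Cartesian product of the
    query_args lists gives the query-time settings to sweep.
    """
    for arg_group in group["arg_groups"]:
        constructor_args = [*base_args, arg_group]
        for combo in product(*group["query_args"]):
            yield constructor_args, list(combo)


# "euclidean" stands in for the '@metric' placeholder in base_args.
runs = list(expand_run_group(["euclidean"], run_group))
for constructor_args, query_args in runs:
    print(constructor_args, "->", query_args)
```

Under that assumption the dict lands in `method_param` of `CuvsIVFFlat(metric, method_param)` and each query value is handed to `set_query_arguments(probes)`, so this one run group yields five probe settings against a single index build.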
177 changes: 177 additions & 0 deletions ann_benchmarks/algorithms/cuvs_ivfflat/module.py
@@ -0,0 +1,177 @@
"""
GPU-accelerated IVF-Flat search for ann-benchmarks, built on cuVS.

The index is built and searched on the GPU through cuvs.neighbors.ivf_flat;
vectors are moved to device memory with CuPy.

This file started from the pgvector module, but it does not connect to
PostgreSQL: nothing here reads the ANN_BENCHMARKS_PG_* environment variables
or starts the PostgreSQL service, so the _conn and _cur attributes stay None.
"""

import numpy as np

# cuVS-related imports
from cuvs.neighbors import ivf_flat
import cupy

from typing import List

from ..base.module import BaseANN

# Map ann-benchmarks metric names to cuVS distance metrics
CUVS_METRIC_MAP = {
    "angular": "cosine",
    "euclidean": "euclidean"
}

class CuvsIVFFlat(BaseANN):
    def __init__(self, metric, method_param):
        self._metric = metric
        self._lists = method_param['lists']  # Number of lists for IVFFlat
        self._batch_size = method_param.get('batch_size', 5000)  # Batch size
        self._cur = None
        self._conn = None

        # cuVS-related attributes
        self._cuvs_index = None
        self._cuvs_vectors = None
        self._cuvs_ids = None
        self._vector_dim = None
        self._probes = 1  # Default number of probes

    def _build_cuvs_index(self, dataset: np.ndarray) -> None:
        """Build the cuVS index."""
        print("Building cuVS index for GPU acceleration...")
        self._vector_dim = dataset.shape[1]
        # NOTE: this overwrites the `lists` value chosen by the config or by fit().
        self._lists = int(np.sqrt(dataset.shape[0]))

        # Move the data to a GPU array
        self._cuvs_vectors = cupy.asarray(dataset.astype(np.float32))
        self._cuvs_ids = cupy.arange(len(dataset))

        # Build the cuVS IVF-Flat index
        cuvs_metric = CUVS_METRIC_MAP.get(self._metric, "euclidean")
        index_params = ivf_flat.IndexParams(n_lists=self._lists, metric=cuvs_metric)
        self._cuvs_index = ivf_flat.build(index_params, self._cuvs_vectors)
        print(f"cuVS IVF-Flat index built with {self._lists} lists")

    def _cuvs_search(self, query_vector: np.ndarray, k: int) -> List[int]:
        """Run a single query with cuVS."""
        if self._cuvs_index is None:
            raise RuntimeError("cuVS index not available")

        # Move the query vector to a GPU array (shape: 1 x dim)
        query_gpu = cupy.asarray(np.array([query_vector], dtype=np.float32))

        # Run the search; make sure n_probes is applied
        search_params = ivf_flat.SearchParams(n_probes=self._probes)
        print(f"DEBUG: Searching with n_probes={self._probes}")

        distances, indices = ivf_flat.search(search_params, self._cuvs_index, query_gpu, k)

        # Copy back to the CPU
        indices_cpu = cupy.asnumpy(indices)
        ids_cpu = cupy.asnumpy(self._cuvs_ids)

        # Return the matching IDs
        result_ids = [ids_cpu[idx] for idx in indices_cpu[0]]
        return result_ids

    def _cuvs_batch_search(self, query_vectors: np.ndarray, k: int) -> List[List[int]]:
        """Run a batch of queries with cuVS in a single GPU search call."""
        if self._cuvs_index is None:
            raise RuntimeError("cuVS index not available")

        print(f"Performing cuVS IVF-Flat batch search for {len(query_vectors)} queries...")

        # Move the query vectors to a GPU array
        queries_gpu = cupy.asarray(query_vectors.astype(np.float32))

        # Run the batch search; make sure n_probes is applied
        search_params = ivf_flat.SearchParams(n_probes=self._probes)
        print(f"DEBUG: Batch searching with n_probes={self._probes}")

        distances, indices = ivf_flat.search(search_params, self._cuvs_index, queries_gpu, k)

        # Copy back to the CPU
        indices_cpu = cupy.asnumpy(indices)
        ids_cpu = cupy.asnumpy(self._cuvs_ids)

        # Return one list of IDs per query
        results = []
        for i in range(len(query_vectors)):
            result_ids = [ids_cpu[idx] for idx in indices_cpu[i]]
            results.append(result_ids)

        print(f"IVF-Flat batch search completed for {len(query_vectors)} queries")
        return results

    def fit(self, dataset):
        if dataset.shape[0] > 1000000:
            self._lists = int(np.sqrt(dataset.shape[0]))
        else:
            # Never go below one list on datasets with fewer than 1000 rows
            self._lists = max(1, dataset.shape[0] // 1000)
        # Build the cuVS IVF-Flat index
        self._build_cuvs_index(dataset)
        print("cuVS IVF-Flat GPU acceleration initialized")

    def set_query_arguments(self, probes):
        """Set the query arguments; make sure probes is applied correctly."""
        self._probes = int(probes)  # Ensure it is an integer
        print(f"Set cuVS IVF-Flat search probes to {self._probes}")

        # Validate the setting
        if self._probes < 1:
            print(f"WARNING: n_probes={self._probes} is too small, setting to 1")
            self._probes = 1
        elif self._probes > self._lists:
            print(f"WARNING: n_probes={self._probes} is larger than n_lists={self._lists}")

        print(f"DEBUG: Current configuration - n_lists={self._lists}, n_probes={self._probes}")

    def query(self, v, n):
        """Single query, GPU-accelerated with cuVS IVF-Flat."""
        return self._cuvs_search(v, n)

    def batch_query(self, X, n):
        """Run a batch query, GPU-accelerated with cuVS IVF-Flat."""
        self._batch_results = self._cuvs_batch_search(X, n)

    def get_batch_results(self):
        """Return the results of the last batch query."""
        return self._batch_results

    def get_memory_usage(self):
        # _cur is never assigned in this module, so this currently returns 0
        if self._cur is None:
            return 0
        self._cur.execute("SELECT pg_relation_size('items_embedding_idx')")
        return self._cur.fetchone()[0] / 1024

    def __del__(self):
        """Clean up resources."""
        # cuVS releases GPU resources automatically
        pass

    def __str__(self):
        return f"CuvsIVFFlat(lists={self._lists}, probes={self._probes})"
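The list-count rule in `fit()` and the probe validation in `set_query_arguments()` can be sketched as two pure functions that run without a GPU. The names `choose_n_lists` and `clamp_probes` are hypothetical, and the upper clamp on probes is this sketch's choice (the module only prints a warning). Note also that, as the diff stands, `_build_cuvs_index` reassigns `_lists` to the square root of the row count for every dataset size.

```python
import math


def choose_n_lists(n_rows: int) -> int:
    """Mirror the branch in fit(): sqrt(n) above one million rows, n/1000 otherwise."""
    if n_rows > 1_000_000:
        n_lists = math.isqrt(n_rows)
    else:
        n_lists = n_rows // 1000
    # Guard so small datasets never end up with zero lists.
    return max(1, n_lists)


def clamp_probes(probes, n_lists: int) -> int:
    """Coerce probes to int and keep it within [1, n_lists].

    The upper bound is an assumption of this sketch; the module itself
    only warns when probes exceeds n_lists.
    """
    return max(1, min(int(probes), n_lists))


for n_rows in (500, 60_000, 1_000_000, 4_000_000):
    n_lists = choose_n_lists(n_rows)
    print(n_rows, n_lists, [clamp_probes(p, n_lists) for p in (1, 5, 10, 20, 40)])
```

Keeping the heuristic in one function would make it easier to see which rule is actually in effect when the index is built.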
39 changes: 39 additions & 0 deletions ann_benchmarks/algorithms/cuvs_ivfpq/Dockerfile
@@ -0,0 +1,39 @@
FROM ann-benchmarks

# https://github.com/pgvector/pgvector/blob/master/Dockerfile

RUN git clone https://github.com/pgvector/pgvector /tmp/pgvector

RUN DEBIAN_FRONTEND=noninteractive apt-get -y install tzdata
RUN apt-get update && apt-get install -y --no-install-recommends build-essential postgresql-common
RUN /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh -y
RUN apt-get install -y --no-install-recommends postgresql-16 postgresql-server-dev-16
RUN sh -c 'echo "local all all trust" > /etc/postgresql/16/main/pg_hba.conf'

# Dynamically set OPTFLAGS based on the architecture
RUN ARCH=$(uname -m) && \
if [ "$ARCH" = "aarch64" ]; then \
OPTFLAGS="-march=native -msve-vector-bits=512"; \
elif [ "$ARCH" = "x86_64" ]; then \
OPTFLAGS="-march=native -mprefer-vector-width=512"; \
else \
OPTFLAGS="-march=native"; \
fi && \
cd /tmp/pgvector && \
make clean && \
make OPTFLAGS="$OPTFLAGS" && \
make install

USER postgres
RUN service postgresql start && \
psql -c "CREATE USER ann WITH ENCRYPTED PASSWORD 'ann'" && \
psql -c "CREATE DATABASE ann" && \
psql -c "GRANT ALL PRIVILEGES ON DATABASE ann TO ann" && \
psql -d ann -c "GRANT ALL ON SCHEMA public TO ann" && \
psql -d ann -c "CREATE EXTENSION vector" && \
psql -c "ALTER USER ann SET maintenance_work_mem = '4GB'" && \
psql -c "ALTER USER ann SET max_parallel_maintenance_workers = 0" && \
psql -c "ALTER SYSTEM SET shared_buffers = '4GB'"
USER root

RUN pip install psycopg[binary] pgvector
14 changes: 14 additions & 0 deletions ann_benchmarks/algorithms/cuvs_ivfpq/config.yml
@@ -0,0 +1,14 @@
float:
  any:
  - base_args: ['@metric']
    constructor: CuvsIVFPQ
    disabled: false
    docker_tag: ann-benchmarks-cuvs_ivfpq
    module: ann_benchmarks.algorithms.cuvs_ivfpq
    name: cuvs_ivfpq
    run_groups:
      # Fixed configuration: lists=100, workers=20, gpu=true
      lists-100-workers-20-gpu:
        arg_groups: [{lists: 100, use_gpu: true, batch_size: 5000}]
        args: {}
        query_args: [[1, 5, 10, 20, 40]]
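To make the `query_args` sweep concrete, here is a minimal pure-Python IVF sketch that shows what the probe count trades off: scanning more inverted lists costs more distance computations but recovers more of the exact neighbours. It is an illustration only, not the cuVS algorithm: centroids are a random sample rather than k-means output, there is no product quantization, and `build_ivf`, `search_ivf`, and `brute_force` are hypothetical helpers.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(points, n_lists, seed=0):
    """Sample n_lists points as centroids and bucket every point by nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_lists)
    lists = [[] for _ in range(n_lists)]
    for idx, p in enumerate(points):
        nearest = min(range(n_lists), key=lambda c: sq_dist(p, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists


def search_ivf(points, centroids, lists, query, k, n_probes):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:n_probes] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, points[i]))
    return candidates[:k]


def brute_force(points, query, k):
    """Exact k nearest neighbours by scanning every point."""
    return sorted(range(len(points)), key=lambda i: sq_dist(query, points[i]))[:k]


rng = random.Random(42)
points = [[rng.random() for _ in range(8)] for _ in range(2000)]
query = [rng.random() for _ in range(8)]
centroids, lists = build_ivf(points, n_lists=40)
exact = set(brute_force(points, query, k=10))

# Same probe values as query_args; 40 probes scans every list here.
for n_probes in [1, 5, 10, 20, 40]:
    found = set(search_ivf(points, centroids, lists, query, 10, n_probes))
    print(n_probes, len(found & exact) / 10)
```

With `n_probes` equal to the number of lists, every point becomes a candidate and the result matches the brute-force answer, which is why sweeping the probe count traces out a recall curve.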