
Greetings!

I am doing some benchmarking on MariaDB and have reached the deep-image-96-angular dataset from https://github.com/erikbern/ann-benchmarks. This dataset has 9.99 million vectors of dimension 96. I am creating the vector index with M=24, ef_construction = 200 (similar parameters to those used for pgvector in ANN-Benchmarks).

I would like to confirm a couple of observations based on the current progress of the index creation:

MariaDB [(none)]> show processlist;
+----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
| Id | User   | Host      | db   | Command | Time | State             | Info                                                     | Progress |
+----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+
|  4 | ubuntu | localhost | ann  | Query   | 4092 | copy to tmp table | ALTER TABLE t1 ADD VECTOR INDEX (v) M=24 DISTANCE=cosine |    9.636 |
+----+--------+-----------+------+---------+------+-------------------+----------------------------------------------------------+----------+

MariaDB [(none)]> select trx_rows_modified from information_schema.innodb_trx;
+-------------------+
| trx_rows_modified |
+-------------------+
|         125215100 |
+-------------------+
1 row in set (0.004 sec)

-rw-rw---- 1 ubuntu ubuntu 40420507648 Feb 19 06:20 undo002

1. Can you please confirm that an undo log of 40GB+ is expected at about 9.6% progress? Did I miss something?

2. I have configured the HNSW cache size and the InnoDB buffer pool to 16GB each. Should I increase them further? I want to benchmark the "HNSW index fits in memory" use case.

MariaDB [(none)]> show variables like '%hnsw%';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| mhnsw_default_distance | euclidean   |
| mhnsw_default_m        | 24          |
| mhnsw_ef_search        | 200         |
| mhnsw_max_cache_size   | 17179869184 |
+------------------------+-------------+
4 rows in set (0.001 sec)

Just to let you know, I am working on a vector plugin for MySQL (MyVector).

Thanks!
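
P.S. To show why the undo log size concerns me, here is a rough back-of-the-envelope extrapolation of the figures above. It is only a sketch: it assumes the Progress column is a percentage of rows copied and that undo growth stays linear for the rest of the build, neither of which I have verified. The function name and dictionary keys are just my own labels.

```python
# Rough linear extrapolation of the numbers reported above.
# Assumptions (not verified): Progress is a percentage of rows processed,
# and undo log growth is proportional to progress.

def extrapolate(total_vectors, progress_pct, rows_modified, undo_bytes):
    fraction_done = progress_pct / 100.0
    vectors_done = total_vectors * fraction_done
    return {
        # Average number of row modifications recorded per inserted vector.
        "rows_modified_per_vector": rows_modified / vectors_done,
        # Average undo log bytes per modified row.
        "undo_bytes_per_modified_row": undo_bytes / rows_modified,
        # Undo log size at 100% if growth stays linear.
        "projected_undo_bytes": undo_bytes / fraction_done,
    }

if __name__ == "__main__":
    estimate = extrapolate(
        total_vectors=9_990_000,     # dataset size stated above
        progress_pct=9.636,          # Progress column from SHOW PROCESSLIST
        rows_modified=125_215_100,   # trx_rows_modified
        undo_bytes=40_420_507_648,   # size of undo002
    )
    for name, value in estimate.items():
        print(f"{name}: {value:,.1f}")
```

Under those assumptions this works out to roughly 130 row modifications per inserted vector and an undo log on the order of 400GB by the time the build completes, which is why I am asking whether this is expected.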