pgvector HNSW index (medium severity)

Queries that use an HNSW index return different top-k results than an exact sequential scan: some of the closest matches are missing, and the returned rows have lower similarity scores. Queries that add filters can return fewer rows than requested. Recall drops below expectations at large scale (millions of vectors).
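To make the "fewer results with filters" and "recall drops" symptoms measurable, here is a toy Python sketch. It compares an exact filtered scan with a simulated index path that takes ef_search candidates first and filters afterwards (the post-filtering behavior described under Root cause), then computes recall@k against the exact result. Everything here is an assumption for illustration: 1-D "embeddings", absolute-difference distance, a best-case index that returns the truly nearest ef_search rows, and the hypothetical names ROWS, matches, exact_topk, index_topk_postfilter and recall_at_k. It is not pgvector code.

```python
# Toy model (not pgvector code): 1-D "embeddings", distance = |a - b|, and an
# "index" modeled optimistically as returning the ef_search truly nearest rows.
EF_SEARCH = 40  # pgvector's default hnsw.ef_search
K = 10          # the query's LIMIT

# 1,000 hypothetical rows: id -> (embedding, category); 10% are category "a".
ROWS = {i: (float(i), "a" if i % 10 == 0 else "b") for i in range(1000)}


def distance(x, y):
    return abs(x - y)


def matches(category):
    """Hypothetical stand-in for a WHERE clause such as category = 'a'."""
    return category == "a"


def exact_topk(query, k):
    """Sequential scan: filter every row first, then sort matches by distance."""
    hits = [i for i, (_, cat) in ROWS.items() if matches(cat)]
    return sorted(hits, key=lambda i: distance(ROWS[i][0], query))[:k]


def index_topk_postfilter(query, k, ef_search=EF_SEARCH):
    """ANN path: take ef_search candidates, then apply the filter afterwards."""
    candidates = sorted(ROWS, key=lambda i: distance(ROWS[i][0], query))[:ef_search]
    return [i for i in candidates if matches(ROWS[i][1])][:k]


def recall_at_k(approx, exact):
    """Fraction of the exact top-k ids that the approximate path also returned."""
    return len(set(approx) & set(exact)) / len(exact) if exact else 1.0


exact = exact_topk(0.0, K)
approx = index_topk_postfilter(0.0, K)
print(len(exact), len(approx), recall_at_k(approx, exact))  # prints: 10 4 0.4
```

With a filter that matches 10% of rows, only 4 of the 40 candidates survive, so a LIMIT 10 query returns 4 rows and recall@10 is 0.4. In this toy, calling index_topk_postfilter(0.0, K, ef_search=100) returns all 10 matching rows, which mirrors why a larger candidate list helps filtered queries at the cost of speed.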

Root cause

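HNSW provides approximate nearest neighbor (ANN) search: it trades perfect recall for speed by exploring a graph, and that exploration is bounded by ef_search, the size of the candidate list (40 by default). Low build parameters (m, ef_construction) produce a sparser graph, so the search is more likely to miss true neighbors. At scale, other factors reduce the rows actually returned: filters are applied after the index scan, so a selective WHERE clause discards most of the candidates, and dead tuples left by updates and deletes can crowd out live rows until the table is vacuumed. An index that no longer fits in memory mainly hurts build time and QPS rather than recall. The docs warn that, unlike typical indexes, "you will see different results" for queries after adding an approximate index. [pgvector docs](https://github.com/pgvector/pgvector#hnsw)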
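The effect of a bounded candidate list on a sparse graph can be sketched in a few lines. The code below is a simplified single-layer greedy beam search in the style of the search-layer routine from the HNSW paper. It is not pgvector's implementation: the three-node graph, its 1-D points, and the names POINTS, GRAPH and search_layer are hypothetical, and ef plays the role that ef_search plays at query time.

```python
import heapq

# Hypothetical sparse graph with 1-D points. Node 0 is the entry point; the true
# nearest neighbor of the query (node 2) is reachable only through node 1,
# which is farther from the query than the entry point.
POINTS = {0: 3.0, 1: 1.0, 2: 7.5}
GRAPH = {0: [1], 1: [0, 2], 2: [1]}


def search_layer(query, entry, ef):
    """Greedy beam search on one layer, keeping at most `ef` results."""
    def dist(node):
        return abs(POINTS[node] - query)

    visited = {entry}
    frontier = [(dist(entry), entry)]  # min-heap: closest unexpanded node first
    best = [(-dist(entry), entry)]     # max-heap via negation: current results
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > -best[0][0]:  # closest unexpanded node is worse than worst result
            break
        for nb in GRAPH[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(frontier, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst result
    # Largest negated distance first == nearest node first.
    return [node for _, node in sorted(best, reverse=True)]


exact_nn = min(POINTS, key=lambda n: abs(POINTS[n] - 7.0))
print(exact_nn, search_layer(7.0, entry=0, ef=1), search_layer(7.0, entry=0, ef=2))
# prints: 2 [0] [2, 0]
```

With ef=1 the search stops at the entry node, because its only neighbor is farther from the query, so it returns a wrong top-1. With ef=2 it keeps the worse node 1 as a candidate, reaches node 2, and matches the exact answer. A denser graph (for example, a direct edge between nodes 0 and 2, which is what higher m and ef_construction make more likely) would fix the same miss without widening the search.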

Tags: pgvector, hnsw, recall, approximate-search, ann

Citations

https://github.com/pgvector/pgvector/issues/543