Start with the database you already run
If your stack has PostgreSQL, pgvector turns it into a perfectly good vector store for up to a few million chunks. One extension, one column, SQL you already know, backups you already do:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id bigserial PRIMARY KEY,
    doc_id text NOT NULL,
    heading text,
    body text NOT NULL,
    embedding vector(768) -- nomic-embed dimension
);

-- HNSW index: the two parameters that matter
CREATE INDEX ON chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);

-- query: top 5 nearest neighbours + metadata filter
SELECT id, heading, body
FROM chunks
WHERE doc_id LIKE 'contracts/%'
ORDER BY embedding <=> $1
LIMIT 5;
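The query above takes the embedding as parameter $1. A minimal sketch of passing it from application code follows, assuming a DB-API connection with psycopg-style `%s` placeholders. The helper names `to_vector_literal` and `search_chunks` are hypothetical; the only pgvector behaviour relied on is that it accepts the text form `[x1,x2,...]` cast to `vector`.

```python
def to_vector_literal(embedding):
    """Format floats as pgvector's text input form, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Same query as in the article, with placeholders instead of literals.
# Assumption: the driver uses %s placeholders (psycopg style).
SEARCH_SQL = """
SELECT id, heading, body
FROM chunks
WHERE doc_id LIKE %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def search_chunks(conn, embedding, doc_prefix, k=5):
    """Return the k nearest chunks whose doc_id starts with doc_prefix."""
    cur = conn.cursor()
    # The LIKE wildcard travels inside the parameter, so the SQL text
    # itself contains no bare '%' that the driver could misread.
    cur.execute(SEARCH_SQL, (doc_prefix + "%", to_vector_literal(embedding), k))
    return cur.fetchall()
```

Usage would look like `search_chunks(conn, query_embedding, "contracts/")`, where `conn` is an open connection and `query_embedding` comes from the same embedding model that filled the table.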
When Qdrant earns its place
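Qdrant (self-hosted, single binary or Docker) wins when you need tens of millions of vectors, heavy metadata filtering during the ANN search rather than after it, quantized in-memory indexes for speed, or multi-tenant collections. Its payload filtering is genuinely better than post-filtering in SQL once filters get selective: a post-filter can discard most of the approximate top-k and hand back fewer rows than you asked for, while a filter applied during the search only ever considers points that match.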
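Why selective filters hurt post-filtering can be shown with a toy example. The sketch below is an illustration only: it uses exact brute-force search over 40 made-up points rather than an HNSW index, and the `dept` field, the data, and the function names are all hypothetical. It does not reproduce how Qdrant or pgvector work internally; it only contrasts "top-k, then filter" with "filter, then top-k".

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the same quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def post_filter_search(points, query, keep, k):
    """Take the k nearest points first, then drop those failing the filter."""
    nearest = sorted(points, key=lambda p: cosine_distance(p["vector"], query))[:k]
    return [p for p in nearest if keep(p)]


def filtered_search(points, query, keep, k):
    """Restrict to matching points first, then take the k nearest of those."""
    allowed = [p for p in points if keep(p)]
    return sorted(allowed, key=lambda p: cosine_distance(p["vector"], query))[:k]


# Made-up data: unit vectors fanning away from the query direction.
# Every seventh point belongs to "legal", the rest to "sales".
points = [
    {
        "id": i,
        "vector": (math.cos(i * 0.05), math.sin(i * 0.05)),
        "dept": "legal" if i % 7 == 6 else "sales",
    }
    for i in range(40)
]
query = (1.0, 0.0)


def is_legal(p):
    return p["dept"] == "legal"


post = post_filter_search(points, query, is_legal, k=5)
filtered = filtered_search(points, query, is_legal, k=5)

print(len(post))                    # 0: the five nearest points are all "sales"
print([p["id"] for p in filtered])  # [6, 13, 20, 27, 34]
```

The post-filtered query returns nothing even though five matching points exist, which is the failure mode that grows as the filter gets more selective.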
The HNSW knobs, honestly
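Three knobs trade recall against latency and build cost. m (graph connectivity, 16 is fine) and ef_construction (candidates considered while building the graph) are fixed when the index is created; ef_search (candidates explored per query) can be changed at query time, and in pgvector it is the hnsw.ef_search setting, which defaults to 40. Measure recall against exact brute-force search on a 1,000-query sample; raise ef_search until recall@5 exceeds 0.95, then stop. Most teams over-tune this and under-tune chunking.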
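The measurement loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `ann_search(query, k, ef_search)` is a hypothetical callable you supply that runs the indexed query with the given ef_search and returns chunk ids, the candidate values are arbitrary, and the pure-Python brute force is only practical on a sample.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(corpus, query, k):
    """Ground truth: ids of the k nearest items by exhaustive comparison.

    corpus is a list of (id, vector) pairs.
    """
    ranked = sorted(corpus, key=lambda item: cosine_distance(item[1], query))
    return [item_id for item_id, _ in ranked[:k]]


def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)


def tune_ef_search(ann_search, corpus, queries, k=5, target=0.95,
                   candidates=(40, 80, 160, 320)):
    """Return (ef_search, mean recall) for the first candidate reaching target.

    Returns (None, last measured recall) if no candidate gets there.
    """
    truth = [exact_top_k(corpus, q, k) for q in queries]
    mean_recall = 0.0
    for ef in candidates:
        scores = [recall_at_k(ann_search(q, k, ef), t)
                  for q, t in zip(queries, truth)]
        mean_recall = sum(scores) / len(scores)
        if mean_recall >= target:
            return ef, mean_recall  # stop at the first value that is good enough
    return None, mean_recall
```

Stopping at the first candidate that reaches the target follows the "then stop" advice: a higher ef_search past that point costs latency for recall that the sample can no longer distinguish.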
Our default
For SME deployments: pgvector, because it's one fewer service to operate inside the customer's VPN and the ops team already knows Postgres. We move to Qdrant past ~5M chunks or when filter-heavy multi-department search appears. Both run fully on-prem — the vectors are your documents in disguise, and they deserve the same protection.