Vector Databases On-Prem: pgvector vs. Qdrant for Self-Hosted RAG

Start with the database you already run

If your stack has PostgreSQL, pgvector turns it into a perfectly good vector store for up to a few million chunks. One extension, one column, SQL you already know, backups you already do:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
  id        bigserial PRIMARY KEY,
  doc_id    text NOT NULL,
  heading   text,
  body      text NOT NULL,
  embedding vector(768)         -- nomic-embed dimension
);

-- HNSW index: the two parameters that matter
CREATE INDEX ON chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);

-- query: top 5 nearest neighbours + metadata filter
SELECT id, heading, body
FROM chunks
WHERE doc_id LIKE 'contracts/%'
ORDER BY embedding <=> $1
LIMIT 5;

When Qdrant earns its place

Qdrant (self-hosted, single binary or Docker) wins when you need: tens of millions of vectors, heavy metadata filtering during the ANN search rather than after, quantized in-memory indexes for speed, or multi-tenant collections. Its payload filtering is genuinely better than post-filtering in SQL once filters get selective.

The HNSW knobs, honestly

m (graph connectivity, 16 is fine) and ef_search (candidates explored per query) trade recall against latency. Measure recall against exact brute-force search on a 1,000-query sample; raise ef_search until recall@5 exceeds 0.95, then stop. Most teams over-tune this and under-tune chunking.

Our default

For SME deployments: pgvector, because it's one fewer service to operate inside the customer's VPN and the ops team already knows Postgres. We move to Qdrant past ~5M chunks or when filter-heavy multi-department search appears. Both run fully on-prem — the vectors are your documents in disguise, and they deserve the same protection.

Want this running inside your own VPN?

Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.

Plan my deployment