What an embedding encodes
An embedding model maps text to a point in high-dimensional space (768 or 1024 numbers) where distance means meaning. "Kündigungsfrist" and "notice period" land close together; "delivery delay" lands far away. The model learned this geometry from hundreds of millions of text pairs — you inherit it for free.
Why cosine similarity works
Embedding vectors are normalized to length 1, so the cosine of the angle between them is just the dot product — one multiply-accumulate per dimension. Similar meaning → small angle → cosine near 1. The entire "semantic" part of semantic search is this single number.
A complete index in 60 lines
import numpy as np, requests, json, glob
def embed(texts: list[str]) -> np.ndarray:
r = requests.post("http://127.0.0.1:11434/api/embed",
json={"model": "nomic-embed-text", "input": texts})
v = np.array(r.json()["embeddings"], dtype=np.float32)
return v / np.linalg.norm(v, axis=1, keepdims=True)
# --- build ---
chunks, meta = [], []
for path in glob.glob("docs/**/*.txt", recursive=True):
text = open(path, encoding="utf-8").read()
for i in range(0, len(text), 1500): # naive chunking
chunks.append(text[i:i+1800])
meta.append({"file": path, "offset": i})
index = np.vstack([embed(chunks[i:i+64])
for i in range(0, len(chunks), 64)])
np.save("index.npy", index)
json.dump(meta, open("meta.json", "w"))
# --- search ---
def search(query: str, k: int = 5):
q = embed([query])[0]
scores = index @ q # cosine, all docs at once
top = np.argsort(-scores)[:k]
return [(float(scores[i]), meta[i], chunks[i][:200]) for i in top]
for s, m, preview in search("Wie lange ist die Gewährleistung?"):
print(f"{s:.3f} {m['file']} {preview!r}")
When NumPy is enough — and when it isn't
Brute-force dot products handle ~500k chunks in tens of milliseconds on a CPU. Past that, or when you need filters, persistence and concurrent writers, graduate to pgvector or Qdrant. The embedding model stays the same — and it stays on your server, because every query your employees type is itself confidential data.
Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.
Plan my deployment