The selection matrix

Filter in this order: license first (it's a binary pass or fail), then language quality, then task fit, then VRAM. Every model below is MIT or Apache 2.0, so it can be deployed commercially without legal review cycles:

| Model | Size | License | Pick it for |
|---|---|---|---|
| Qwen 2.5 | 0.5–7B | Apache 2.0 | German-language assistants, all-round |
| Mistral 7B / Small 3 | 7B / 24B | Apache 2.0 | Fine-tune base; Small 3 when 7B reasoning tops out |
| Phi-4 Mini | 3.8B | MIT | Logic & extraction on tight VRAM |
| DeepSeek R1 distill | 1.5–14B | MIT | Step-by-step analysis, code review |
| Qwen 2.5 Coder | 1.5–7B | Apache 2.0 | Code assistance, SQL generation |
| IBM Granite 3 | 2–8B | Apache 2.0 | Enterprise RAG; clean provenance story |
| OLMo 2 | 7–13B | Apache 2.0 | Auditable: fully open data + training code |
| Mixtral 8x7B | 47B (13B active) | Apache 2.0 | MoE: near-large quality at mid-size speed |
| Whisper Large v3 | 1.5B | MIT | Speech-to-text, excellent German |
| BGE-M3 / nomic-embed | ~0.5B / 137M | MIT / Apache | Embeddings for private RAG |
| SmolLM2 | 135M–1.7B | Apache 2.0 | Routing/classification, CPU-only hosts |
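The selection order can be sketched as a simple filter over a catalogue. This is a minimal illustration, not the tooling used in practice: the `Candidate` class, the task tags, the parameter cap standing in for a VRAM budget, and the "Hypothetical-Community-8B" entry are all assumptions made for the example. Language quality is left out because it can only be judged by running an eval.

```python
from dataclasses import dataclass

# Licenses treated as a pass in the binary license check.
PERMISSIVE = {"MIT", "Apache 2.0"}

@dataclass(frozen=True)
class Candidate:
    name: str
    params_b: float      # parameter count in billions
    license: str
    tasks: frozenset     # illustrative task tags, not an official taxonomy

def shortlist(candidates, task, max_params_b):
    """Filter by license, then task fit, then a size cap (a rough VRAM proxy)."""
    kept = [
        c for c in candidates
        if c.license in PERMISSIVE
        and task in c.tasks
        and c.params_b <= max_params_b
    ]
    # Smallest first, so the cheapest candidate is evaluated first.
    return sorted(kept, key=lambda c: c.params_b)

catalogue = [
    Candidate("Phi-4 Mini", 3.8, "MIT", frozenset({"extraction", "logic"})),
    Candidate("Qwen 2.5 Coder 7B", 7.0, "Apache 2.0", frozenset({"code", "sql"})),
    Candidate("Mistral Small 3", 24.0, "Apache 2.0", frozenset({"assistant", "extraction"})),
    # Made-up entry to show the license filter rejecting a usage-restricted model.
    Candidate("Hypothetical-Community-8B", 8.0, "Community", frozenset({"extraction"})),
]

names = [c.name for c in shortlist(catalogue, "extraction", max_params_b=8.0)]
print(names)
```

The shortlist is only the input to an eval on real documents; it does not rank models by quality.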

How to read benchmarks without being fooled

Public leaderboards measure English trivia and contest math; your workload is German invoices. Treat MMLU as a coarse filter, then run a 20-prompt bake-off on your real documents — it takes an afternoon with Ollama and tells you more than every leaderboard combined. (Our eval guide shows the harness.)
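A bake-off of this kind might look like the sketch below. It is not the harness from the eval guide. It assumes a local Ollama server on its default port and uses the `/api/generate` endpoint with streaming disabled. The substring-match scoring, the example case, and the `fake_generate` stand-in are illustrative assumptions; real cases would come from your own documents.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def ollama_generate(model, prompt):
    """Send one non-streaming request to a local Ollama server and return the text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]

def bake_off(models, cases, generate=ollama_generate):
    """Return the pass rate per model; a case passes if the expected string appears."""
    scores = {}
    for model in models:
        passed = sum(
            1 for prompt, expected in cases
            if expected.lower() in generate(model, prompt).lower()
        )
        scores[model] = passed / len(cases)
    return scores

# Offline stand-in so the scoring logic can be checked without a running server.
def fake_generate(model, prompt):
    return "Rechnungsnummer: 2024-0815" if model == "good" else "unbekannt"

# One illustrative case; a real bake-off would use about 20 prompts from real documents.
cases = [("Extrahiere die Rechnungsnummer: Rechnung Nr. 2024-0815 vom 03.05.2024", "2024-0815")]
demo_scores = bake_off(["good", "bad"], cases, generate=fake_generate)
print(demo_scores)
```

To run it against real models, drop the `generate=` argument and pass the tags of models already pulled into Ollama. Substring matching is crude; for extraction tasks it is often enough to separate candidates, but free-text answers need a stricter check.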

Three rules that survive contact with reality

1. License purity beats 2 benchmark points. "Community licenses" with usage clauses cost legal review on every new use case; Apache/MIT cost nothing.
2. The smallest model that passes your eval wins: it's faster, cheaper, and leaves VRAM headroom for the next use case on the same box.
3. Fine-tuned small beats generic large on narrow tasks, which is most business tasks.

The selection above is exactly the catalogue we deploy from, and the bake-off is the first deliverable of every engagement.
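Rule 2 reduces to a small selection step once the bake-off scores exist. The sketch below assumes results arrive as `(name, params_b, pass_rate)` tuples; the model names, scores, and the 0.85 threshold are made up for illustration.

```python
def pick_model(results, threshold):
    """Return the name of the smallest model whose pass rate meets the threshold.

    results: list of (name, params_in_billions, pass_rate) tuples.
    Returns None when no model passes.
    """
    passing = [r for r in results if r[2] >= threshold]
    if not passing:
        return None
    return min(passing, key=lambda r: r[1])[0]

# Illustrative numbers only, not measured results.
results = [
    ("model-1.7b", 1.7, 0.70),
    ("model-3.8b", 3.8, 0.90),
    ("model-7b", 7.0, 0.95),
]
choice = pick_model(results, threshold=0.85)
print(choice)
```

The 7B model scores higher here, but the 3.8B model already clears the bar, so it is the one that gets deployed.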

Want this running inside your own VPN?

Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.
