The selection matrix
License first (it's binary), then language quality, then task fit, then VRAM. Every model below is MIT or Apache 2.0 — deployable commercially without legal review cycles:
| Model | Size | License | Pick it for |
|---|---|---|---|
| Qwen 2.5 | 0.5B, 1.5B, 7B | Apache 2.0 (the 3B is research-licensed) | German-language assistants, all-round |
| Mistral 7B / Small 3 | 7B / 24B | Apache 2.0 | Fine-tune base; Small 3 when 7B reasoning tops out |
| Phi-4 Mini | 3.8B | MIT | Logic & extraction on tight VRAM |
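| DeepSeek R1 distill | 1.5B, 7B, 14B (Qwen-based) | MIT (the Llama-based 8B follows the Llama 3.1 license) | Step-by-step analysis, code review |
| Qwen 2.5 Coder | 1.5B, 7B | Apache 2.0 (the 3B is research-licensed) | Code assistance, SQL generation |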
| IBM Granite 3 | 2–8B | Apache 2.0 | Enterprise RAG; clean provenance story |
| OLMo 2 | 7–13B | Apache 2.0 | Auditable: fully open data + training code |
| Mixtral 8x7B | 47B (13B active) | Apache 2.0 | MoE: near-large quality at mid-size speed; VRAM must still hold all 47B |
| Whisper Large v3 | 1.5B | MIT | Speech-to-text, excellent German |
| BGE-M3 / nomic-embed | ~0.5B / 137M | MIT / Apache 2.0 | Embeddings for private RAG |
| SmolLM2 | 135M–1.7B | Apache 2.0 | Routing/classification, CPU-only hosts |
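The ordering in the opening paragraph (license gate first, VRAM last) can be sketched in a few lines of Python. This is a minimal sketch, not a sizing tool: `Candidate`, `shortlist` and `est_vram_gb` are hypothetical names, and the 4-bit default and 20% overhead factor are assumptions. Real memory use depends on context length, runtime and quantization format. Language quality and task fit are left out on purpose; they come from the bake-off, not from a formula.

```python
from dataclasses import dataclass

# Licenses treated as "no extra legal review" (the binary gate).
PERMISSIVE = {"MIT", "Apache-2.0"}


@dataclass(frozen=True)
class Candidate:
    name: str
    params_b: float  # TOTAL parameters in billions (MoE: count all experts, not active ones)
    license: str


def est_vram_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough footprint: params x bits/8 bytes for the weights, times an
    ASSUMED 20% overhead for KV cache and runtime buffers."""
    return params_b * bits / 8 * overhead


def shortlist(cands, vram_gb: float, bits: int = 4):
    # 1) License first: anything non-permissive is out, whatever its score.
    ok = [c for c in cands if c.license in PERMISSIVE]
    # 2) VRAM last: keep what fits on the box, smallest first.
    fits = [c for c in ok if est_vram_gb(c.params_b, bits) <= vram_gb]
    return sorted(fits, key=lambda c: c.params_b)


if __name__ == "__main__":
    catalogue = [
        Candidate("Mixtral 8x7B", 47, "Apache-2.0"),
        Candidate("Qwen 2.5 7B", 7, "Apache-2.0"),
        Candidate("Phi-4 Mini", 3.8, "MIT"),
        Candidate("Hypothetical community-licensed 8B", 8, "Community"),
    ]
    for c in shortlist(catalogue, vram_gb=8):
        print(c.name, round(est_vram_gb(c.params_b), 1), "GB")
```

On an 8 GB card this keeps Phi-4 Mini (about 2.3 GB) and Qwen 2.5 7B (about 4.2 GB) and drops Mixtral, whose 47B total parameters need roughly 28 GB even at 4 bits.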
How to read benchmarks without being fooled
Public leaderboards mostly measure English trivia and contest math; your workload is German invoices. Treat MMLU as a coarse filter, then run a 20-prompt bake-off on your real documents. It takes an afternoon with Ollama and tells you more than every leaderboard combined. (Our eval guide shows the harness.)
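For a feel of how small such a bake-off can be, here is a rough sketch in Python against Ollama's local REST endpoint (`POST /api/generate` on the default port 11434, with `"stream": false` so the answer arrives as one JSON object with a `response` field). This is not the harness from our eval guide: `bake_off`, `EXAMPLE_CASES` and the idea of one check function per prompt are assumptions for illustration, and the sample invoice is invented.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def ollama_generate(model: str, prompt: str) -> str:
    """One non-streaming completion from a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


def bake_off(models, cases, generate=ollama_generate):
    """Run every (prompt, check) case against every model and return pass rates.

    `check` takes the model's answer and returns True if it is acceptable.
    `generate` is injectable so the scoring can be tested without a server."""
    scores = {}
    for model in models:
        passed = sum(1 for prompt, check in cases if check(generate(model, prompt)))
        scores[model] = passed / len(cases)
    return scores


# Hypothetical case modelled on a real document; a real bake-off has ~20 of these.
EXAMPLE_CASES = [
    (
        "Extract the gross total from this invoice as a number only.\n"
        "Rechnung Nr. 4711\nNetto: 1.000,00 EUR\nMwSt 19 %: 190,00 EUR\n"
        "Gesamtbetrag: 1.190,00 EUR",
        # Strip separators so "1.190,00", "1,190.00" and "1190" all pass.
        lambda ans: "1190" in ans.replace(".", "").replace(",", ""),
    ),
]
```

With Ollama running and the models pulled, a call such as `bake_off(["qwen2.5:7b", "phi4-mini"], EXAMPLE_CASES)` returns a pass rate per model (the tags here are assumed; use whatever `ollama list` shows). Deterministic checks like this keep the scoring honest; free-text answers need a human or a rubric.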
Three rules that survive contact with reality
1. License purity beats two benchmark points. "Community licenses" with usage clauses cost legal review on every new use case; Apache/MIT cost nothing.
2. The smallest model that passes your eval wins: it's faster, cheaper, and leaves VRAM headroom for the next use case on the same box.
3. Fine-tuned small beats generic large on narrow tasks, and most business tasks are narrow.

The selection above is exactly the catalogue we deploy from, and the bake-off is the first deliverable of every engagement.
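Once a bake-off has produced pass rates, rule 2 is a short decision. The sketch below is illustrative only: `pick_winner`, its inputs and the 0.9 pass bar are hypothetical, and the right bar depends on what the use case can tolerate.

```python
def pick_winner(pass_rates, sizes_b, threshold=0.9):
    """Rule 2: the smallest model that passes the eval wins.

    pass_rates: model -> fraction of bake-off cases passed
    sizes_b:    model -> parameter count in billions
    threshold:  ASSUMED pass bar; set it per use case."""
    passing = [m for m, rate in pass_rates.items() if rate >= threshold]
    if not passing:
        return None  # nobody passes: fine-tune (rule 3) or move up a size class
    # Smallest first; on a size tie, prefer the higher pass rate.
    return min(passing, key=lambda m: (sizes_b[m], -pass_rates[m]))


if __name__ == "__main__":
    rates = {"phi-4-mini": 0.95, "qwen2.5-7b": 0.95, "mistral-small-3": 1.0}
    sizes = {"phi-4-mini": 3.8, "qwen2.5-7b": 7, "mistral-small-3": 24}
    print(pick_winner(rates, sizes))
```

The design choice is a pass bar rather than "highest score": a 24B model that scores a few points higher does not beat a 3.8B model that already clears the bar.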
Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.