Evaluating Fine-Tuned Models: Build an Eval Suite Before You Trust the Demo
Golden sets, LLM-as-judge done carefully, regression gates — the evaluation harness we run before any model goes live.
Engineering blog
How small models, RAG pipelines, humanoid robots and world models actually work — written by the team that deploys them behind firewalls, with code you can run on your own hardware.
Grammar-constrained decoding, JSON schema enforcement and tool calling with small models — no cloud required.
Microsoft.Extensions.AI against an Ollama endpoint: dependency injection, streaming responses and structured output in a typical line-of-business app.
Why Go is the right tool between your users and your GPU: a gateway with token streaming, fair queueing and circuit breaking.
Qwen, Mistral, Phi, Llama, Granite, OLMo — a practical selection matrix by task, language, license and VRAM budget.
Transfer impact assessments disappear when data never transfers. The network, logging and audit patterns we deploy for German companies.
A real break-even calculation: GPU hardware, electricity and maintenance against per-token bills — with the spreadsheet logic shown.
OpenVLA, π0 and friends — the architecture that maps camera pixels and instructions directly to joint trajectories.
From NVIDIA's definition to Dreamer and Cosmos: how world models predict physics, why they matter for industry, and our packaging-line use case.
Why robots learn to walk in simulation first, how domain randomization closes the reality gap, and a minimal PPO training loop.
The three-tier control stack — 1 kHz reflexes, 100 Hz skills, slow deliberation — and where LLMs and VLAs actually plug in.
Zero-moment point, center of mass dynamics and series-elastic actuators — the mechanics every humanoid has to solve 1,000 times per second.
Explore a humanoid component by component — actuators, IMU, vision stack, battery, compute — with the physics and cognition behind each. Interactive.
Prebuilt invoice models, custom extraction training, and a C# service that turns scanned PDFs into ERP-ready records.
Sentiment, key phrases, PII redaction and custom classification with Azure AI Language — plus when a local model beats the API.
Speech-to-text with German diarization, custom vocabulary for industry jargon, and text-to-speech that callers accept.
Image Analysis 4.0, Florence-based features and a C# pipeline that reads shop-floor photos into structured quality data.
A sober architect's guide to Azure AI Foundry and the Agent Service — tool calling, grounding, and the hybrid pattern with on-prem models.
A streaming chat backend in Node.js: SSE, system prompts from your knowledge base, and zero external API calls.
GGUF, Q4_K_M, perplexity deltas and the honest accuracy trade-offs of running compressed models on modest hardware.
What an embedding vector really encodes, why cosine similarity works, and a complete on-prem semantic search index in 60 lines.
Systemd hardening, model preloading, concurrency limits, and an Ollama reverse proxy that your IT department will sign off on.
When Postgres with pgvector is enough, when Qdrant earns its place, and the HNSW parameters that actually matter.
Chunking, embedding, hybrid retrieval, re-ranking and grounding — the full anatomy of a production RAG system that never leaves your network.
LoRA and QLoRA explained without hand-waving: what the adapters change, how much data you need, and a complete PEFT training script.
A working engineer's tour of transformer internals — tokenization, self-attention, KV cache — and why 3–7B models now solve most business tasks.