Blog — On-Premises AI, RAG, Humanoid Robotics & Azure Guides

Fine-Tuning 10 Jun 2026 10 min Python

Evaluating Fine-Tuned Models: Build an Eval Suite Before You Trust the Demo

Golden sets, LLM-as-judge done carefully, regression gates — the evaluation harness we run before any model goes live.

Engineering 08 Jun 2026 9 min Python

Structured Outputs and Function Calling With Local Models: JSON You Can Trust

Grammar-constrained decoding, JSON schema enforcement and tool calling with small models — no cloud required.

Engineering 07 Jun 2026 9 min C#

Local LLMs in .NET: Integrating On-Prem Models Into Enterprise C# Applications

Microsoft.Extensions.AI against an Ollama endpoint: dependency injection, streaming responses and structured output in a typical line-of-business app.

Engineering 05 Jun 2026 11 min Go

A High-Performance Inference Gateway in Go: Routing, Queueing, Backpressure

Why Go is the right tool between your users and your GPU: a gateway with token streaming, fair queueing and circuit breaking.

Strategy 03 Jun 2026 10 min

Choosing the Right Small Model in 2026: A Decision Guide With Benchmarks

Qwen, Mistral, Phi, Llama, Granite, OLMo — a practical selection matrix by task, language, license and VRAM budget.

Strategy 01 Jun 2026 9 min

GDPR-Compliant AI by Architecture: Designing Systems With No Data Exit

Transfer impact assessments disappear when data never transfers. The network, logging and audit patterns we deploy for German companies.

Strategy 25 May 2026 8 min

On-Prem AI vs. Cloud APIs: The Honest Cost Math for an SME

A real break-even calculation: GPU hardware, electricity and maintenance against per-token bills — with the spreadsheet logic shown.

Robotics 18 May 2026 8 min

VLA Models: How Vision-Language-Action Networks Turn "Pick That Up" Into Motion

OpenVLA, π0 and friends — the architecture that maps camera pixels and instructions directly to joint trajectories.

Foundations 11 May 2026 10 min

World Models Explained: Neural Networks That Learn How Reality Behaves

From NVIDIA's definition to Dreamer and Cosmos: how world models predict physics, why they matter for industry, and our packaging-line use case.

Robotics 04 May 2026 11 min Python

Training Robots: Reinforcement Learning, Domain Randomization and Sim-to-Real

Why robots learn to walk in simulation first, how domain randomization closes the reality gap, and a minimal PPO training loop.

Robotics 27 Apr 2026 9 min

Cognitive Architectures for Robots: From Reflex Loops to Language-Driven Planning

The three-tier control stack — 1 kHz reflexes, 100 Hz skills, slow deliberation — and where LLMs and VLAs actually plug in.

Robotics 20 Apr 2026 10 min Math

The Physics of Walking Machines: ZMP, Torque Density and Why Balance Is Hard

Zero-moment point, center of mass dynamics and series-elastic actuators — the mechanics every humanoid has to solve 1,000 times per second.

Robotics 13 Apr 2026 12 min Interactive

Humanoid Robots in 2026: An Interactive Tour of What's Inside

Explore a humanoid component by component (actuators, IMU, vision stack, battery, compute) with the physics and cognition behind each.

Azure 06 Apr 2026 10 min C#

Information Extraction with Azure Document Intelligence: Invoices to JSON

Prebuilt invoice models, custom extraction training, and a C# service that turns scanned PDFs into ERP-ready records.

Azure 30 Mar 2026 9 min Python

Azure Language: NLP and Text Analytics for Documents, Mail and Tickets

Sentiment, key phrases, PII redaction and custom classification with Azure AI Language — plus when a local model beats the API.

Azure 23 Mar 2026 8 min C#

Azure Speech: Real-Time Transcription and Neural Voices for DACH Businesses

Speech-to-text with German diarization, custom vocabulary for industry jargon, and text-to-speech that callers accept.

Azure 16 Mar 2026 9 min C#

Azure Computer Vision in Practice: OCR, Object Detection and Custom Models

Image Analysis 4.0, Florence-based features and a C# pipeline that reads shop-floor photos into structured quality data.

Azure 09 Mar 2026 11 min C#

Azure for Generative AI and Agents: Foundry, Agent Service and When to Use Them

A sober architect's guide to Azure AI Foundry and the Agent Service — tool calling, grounding, and the hybrid pattern with on-prem models.

Engineering 02 Mar 2026 10 min Node.js

Build an On-Prem Support Chatbot with Node.js, Ollama and Your Ticket History

A streaming chat backend in Node.js: SSE, system prompts from your knowledge base, and zero external API calls.

Engineering 23 Feb 2026 7 min Bash

Quantization Demystified: How a 7B Model Fits in 5 GB Without Getting Dumb

GGUF, Q4_K_M, perplexity deltas and the honest accuracy trade-offs of running compressed models on modest hardware.

RAG 16 Feb 2026 8 min Python

Embeddings Explained: Building Semantic Search Over 100,000 Documents

What an embedding vector really encodes, why cosine similarity works, and a complete on-prem semantic search index in 60 lines.

Engineering 09 Feb 2026 9 min Bash · Node.js

Running Ollama in Production: From Laptop Toy to Department Workhorse

Systemd hardening, model preloading, concurrency limits, and an Ollama reverse proxy that your IT department will sign off on.

RAG 02 Feb 2026 10 min SQL · Python

Vector Databases On-Prem: pgvector vs. Qdrant for Self-Hosted RAG

When Postgres with pgvector is enough, when Qdrant earns its place, and the HNSW parameters that actually matter.

RAG 26 Jan 2026 12 min Python

RAG From First Principles: Architecture of a Private Retrieval Pipeline

Chunking, embedding, hybrid retrieval, re-ranking and grounding — the full anatomy of a production RAG system that never leaves your network.

Fine-Tuning 19 Jan 2026 11 min Python

Fine-Tuning Small Models on Company Data with LoRA: A Practical Guide

LoRA and QLoRA explained without hand-waving: what the adapters change, how much data you need, and a complete PEFT training script.

Foundations 12 Jan 2026 9 min Python

How LLMs Actually Work: Tokens, Attention and Why Small Models Got Good

A working engineer's tour of transformer internals — tokenization, self-attention, KV cache — and why 3–7B models now solve most business tasks.

Guides from the machine room.