Structured Outputs and Function Calling With Local Models: JSON You Can Trust
Grammar-constrained decoding, JSON schema enforcement and tool calling with small models — no cloud required.
Engineering blog
How small models, RAG pipelines, humanoid robots and world models actually work — written by the team that deploys them behind firewalls, with code you can run on your own hardware.
Microsoft.Extensions.AI against an Ollama endpoint: dependency injection, streaming responses and structured output in a typical line-of-business app.
Why Go is the right tool between your users and your GPU: a gateway with token streaming, fair queueing and circuit breaking.
A streaming chat backend in Node.js: SSE, system prompts from your knowledge base, and zero external API calls.
GGUF, Q4_K_M, perplexity deltas and the honest accuracy trade-offs of running compressed models on modest hardware.
Systemd hardening, model preloading, concurrency limits, and an Ollama reverse proxy that your IT department will sign off on.