The Azure generative AI map in one paragraph
Azure AI Foundry is the umbrella: model catalogue (GPT-4o-class, Phi, Mistral, Llama), evaluation tooling and deployment endpoints. The Foundry Agent Service adds managed agents — threads, tool calling, code interpreter, file search — so you orchestrate less plumbing yourself. Data residency can be pinned to EU regions (e.g. Germany West Central), which matters for our DACH customers.
An agent with tool calling in C#
using Azure.AI.Agents.Persistent;
using Azure.Identity;
var client = new PersistentAgentsClient(
new Uri(Environment.GetEnvironmentVariable("FOUNDRY_ENDPOINT")!),
new DefaultAzureCredential());
var orderTool = new FunctionToolDefinition(
name: "get_order_status",
description: "Returns status for an order number",
parameters: BinaryData.FromObjectAsJson(new {
type = "object",
properties = new { order = new { type = "string" } },
required = new[] { "order" }
}));
PersistentAgent agent = client.Administration.CreateAgent(
model: "gpt-4o-mini",
name: "order-assistant",
instructions: "Answer in German. Use tools for order data. " +
"Never invent order information.",
tools: [orderTool]);
var thread = client.Threads.CreateThread();
client.Messages.CreateMessage(thread.Id, MessageRole.User,
"Wo ist Bestellung 4711?");
// run loop: when the run requests get_order_status, call your
// ERP, submit the tool output, and the agent finishes the answer.
The hybrid pattern we actually deploy
Cloud agents are excellent at orchestration; they are the wrong place for raw confidential documents. Our standard architecture: the Azure agent plans and talks, but every tool that touches sensitive data calls back into an on-prem endpoint — a fine-tuned local model plus RAG inside the customer's VPN returns only the minimal, already-filtered answer. The cloud sees the question and a sanitized result, never the document base.
When to skip the cloud entirely
If the workload is one department, one data domain, and steady volume — a local 7B with function calling does the same job with zero egress and zero per-token cost. Use Foundry when you need frontier-model reasoning, bursty scale or the managed agent runtime; use local when the data is the point. We build both, and the decision is a worksheet, not a religion.
Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.
Plan my deployment