The Azure generative AI map in one paragraph

Azure AI Foundry is the umbrella: a model catalogue (GPT-4o-class, Phi, Mistral, Llama), evaluation tooling and deployment endpoints. The Foundry Agent Service adds managed agents with threads, tool calling, code interpreter and file search, so you write less orchestration plumbing yourself. Data residency can be pinned to EU regions (e.g. Germany West Central), which matters for our DACH customers.

An agent with tool calling in C#

using System;
using Azure.AI.Agents.Persistent;
using Azure.Identity;

// The client takes the project endpoint as a string.
var client = new PersistentAgentsClient(
    Environment.GetEnvironmentVariable("FOUNDRY_ENDPOINT")!,
    new DefaultAzureCredential());

var orderTool = new FunctionToolDefinition(
    name: "get_order_status",
    description: "Returns status for an order number",
    parameters: BinaryData.FromObjectAsJson(new {
        type = "object",
        properties = new { order = new { type = "string" } },
        required = new[] { "order" }
    }));

PersistentAgent agent = client.Administration.CreateAgent(
    model: "gpt-4o-mini",
    name: "order-assistant",
    instructions: "Answer in German. Use tools for order data. " +
                  "Never invent order information.",
    tools: [orderTool]);

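// Typed explicitly: CreateThread() returns Response<PersistentAgentThread>,
// which converts implicitly; with `var`, thread.Id would not compile.
PersistentAgentThread thread = client.Threads.CreateThread();
client.Messages.CreateMessage(thread.Id, MessageRole.User,
    "Wo ist Bestellung 4711?"); // "Where is order 4711?"
// Run loop (omitted here): start a run for the agent on this thread and
// poll it. When the run requests get_order_status, call your ERP, submit
// the tool output, and the agent finishes the answer.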
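
The local half of that run loop can be sketched without any SDK types. The snippet below is a minimal sketch, assuming the run hands you the function name and its arguments as a JSON string; how you read those from the run and submit the result is left to the SDK and not shown. `OrderToolDispatcher` and the `lookupStatus` delegate (a stand-in for the ERP call) are hypothetical names, not part of any library.

```csharp
using System;
using System.Text.Json;

public static class OrderToolDispatcher
{
    // Takes the function name and JSON arguments of one tool call and returns
    // the JSON string to submit as that call's output.
    // lookupStatus is a hypothetical ERP lookup; it returns null for unknown orders.
    public static string Handle(
        string functionName,
        string argumentsJson,
        Func<string, string?> lookupStatus)
    {
        if (functionName != "get_order_status")
            return JsonSerializer.Serialize(new { error = "unknown tool" });

        using JsonDocument args = JsonDocument.Parse(argumentsJson);
        JsonElement root = args.RootElement;
        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("order", out JsonElement order)
            || order.ValueKind != JsonValueKind.String)
        {
            return JsonSerializer.Serialize(new { error = "missing order argument" });
        }

        string orderNumber = order.GetString()!;
        string? status = lookupStatus(orderNumber);

        // Report "not found" explicitly so the model has no reason to guess.
        return status is null
            ? JsonSerializer.Serialize(new { order = orderNumber, found = false })
            : JsonSerializer.Serialize(new { order = orderNumber, found = true, status });
    }
}
```

Returning an explicit `found = false` instead of an empty result supports the "never invent order information" instruction: the agent gets a definite answer to relay rather than a gap to fill.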

The hybrid pattern we actually deploy

Cloud agents are excellent at orchestration, but they are the wrong place for raw confidential documents. In our standard architecture the Azure agent plans and talks, while every tool that touches sensitive data calls back into an on-prem endpoint: a fine-tuned local model plus RAG inside the customer's VPN, which returns only the minimal, already-filtered answer. The cloud sees the question and a sanitized result, never the document base.
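
One way to enforce "sanitized result only" is a whitelist at the boundary. The following is a minimal sketch under stated assumptions: the on-prem service is represented by a hypothetical `onPremQuery` delegate and is assumed to return JSON with an `answer` field alongside other fields such as sources or passages. The class name `OnPremToolBridge` and the 500-character cap are illustrative choices, not part of the architecture described above.

```csharp
using System;
using System.Text.Json;

public static class OnPremToolBridge
{
    // Assumed upper bound on what may leave the VPN per tool call.
    private const int MaxAnswerChars = 500;

    // onPremQuery: hypothetical call into the local model + RAG service
    // inside the VPN, assumed to return a JSON object.
    public static string Answer(string question, Func<string, string> onPremQuery)
    {
        string rawJson = onPremQuery(question);
        using JsonDocument doc = JsonDocument.Parse(rawJson);

        // Whitelist: copy the "answer" field only. Sources, passages and any
        // other fields from the local service are never forwarded.
        string answer =
            doc.RootElement.ValueKind == JsonValueKind.Object
            && doc.RootElement.TryGetProperty("answer", out JsonElement a)
            && a.ValueKind == JsonValueKind.String
                ? a.GetString()!
                : "";

        if (answer.Length > MaxAnswerChars)
            answer = answer[..MaxAnswerChars];

        // This string is what gets submitted to the cloud agent as tool output.
        return JsonSerializer.Serialize(new { answer });
    }
}
```

A whitelist is used rather than a blacklist so that a new field added to the on-prem response stays inside the VPN by default.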

When to skip the cloud entirely

If the workload is one department, one data domain and steady volume, a local 7B model with function calling does the same job with zero egress and zero per-token cost. Use Foundry when you need frontier-model reasoning, bursty scale or the managed agent runtime; use local when the data is the point. We build both, and the decision is a worksheet, not a religion.
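
The criteria above can be written down as a small worksheet. This is an illustrative sketch, not the actual worksheet: the type names, the `Hybrid` outcome (taken from the hybrid pattern in the previous section) and the `NeedsReview` fallback for cases the text does not cover are all assumptions.

```csharp
using System;

// Hypothetical worksheet inputs, one flag per criterion named in the text.
public record Workload(
    bool SingleDepartment,
    bool SingleDataDomain,
    bool SteadyVolume,              // false means bursty
    bool NeedsFrontierReasoning,
    bool NeedsManagedAgentRuntime,
    bool HandlesConfidentialData);

public enum Placement { Local, Foundry, Hybrid, NeedsReview }

public static class DeploymentWorksheet
{
    public static Placement Recommend(Workload w)
    {
        bool cloudNeed = w.NeedsFrontierReasoning
                         || !w.SteadyVolume
                         || w.NeedsManagedAgentRuntime;
        bool localFit = w.SingleDepartment && w.SingleDataDomain && w.SteadyVolume;

        // One department, one domain, steady volume, no cloud-only need.
        if (localFit && !cloudNeed) return Placement.Local;

        // Cloud capabilities are needed but the data must stay on-prem.
        if (cloudNeed && w.HandlesConfidentialData) return Placement.Hybrid;

        if (cloudNeed) return Placement.Foundry;

        // Assumption: anything else is not covered by the stated rules.
        return Placement.NeedsReview;
    }
}
```

For example, `DeploymentWorksheet.Recommend(new Workload(true, true, true, false, false, true))` yields `Placement.Local`, while flipping `NeedsFrontierReasoning` to `true` on the same workload yields `Placement.Hybrid`.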

Want this running inside your own VPN?

Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.
