The three services that matter

Three services do the work: speech-to-text with real-time streaming and speaker diarization (German quality is genuinely strong); text-to-speech with de-DE neural voices that callers don't hang up on; and custom speech for vocabulary the base model fumbles, such as product names, internal abbreviations, and Swiss-German-tinged dialects.
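
For the text-to-speech leg, voice selection usually travels in an SSML payload. The helper below is a minimal sketch of building one, not a definitive implementation: the `Ssml` class and its `Build` method are hypothetical names, and `de-DE-KatjaNeural` is just one assumed choice of German neural voice.

```csharp
using System;
using System.Security;

public static class Ssml
{
    // Builds a minimal single-voice SSML document.
    // Assumption: the default voice is only an example; use whichever de-DE voice you have tested.
    public static string Build(string text,
                               string voice = "de-DE-KatjaNeural",
                               string lang = "de-DE")
    {
        if (string.IsNullOrWhiteSpace(text))
            throw new ArgumentException("Text must not be empty.", nameof(text));

        // Escape &, <, > and quotes so caller-supplied text cannot break the markup.
        var safe = SecurityElement.Escape(text);

        return "<speak version=\"1.0\" " +
               "xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
               $"xml:lang=\"{lang}\">" +
               $"<voice name=\"{voice}\">{safe}</voice></speak>";
    }
}
```

The resulting string can be handed to `SpeechSynthesizer.SpeakSsmlAsync` on a synthesizer created from your `SpeechConfig`. Escaping matters in practice because prompts often contain names like "Müller & Söhne".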

Real-time transcription with German jargon

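using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Assumption: the key comes from an environment variable; SPEECH_KEY is a placeholder name.
var key = Environment.GetEnvironmentVariable("SPEECH_KEY")
          ?? throw new InvalidOperationException("SPEECH_KEY is not set.");

var cfg = SpeechConfig.FromSubscription(key, "germanywestcentral");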
cfg.SpeechRecognitionLanguage = "de-DE";

using var audio = AudioConfig.FromDefaultMicrophoneInput();
using var rec = new SpeechRecognizer(cfg, audio);

// teach it your world without retraining anything;
// the phrase list has to be created from a live recognizer
var list = PhraseListGrammar.FromRecognizer(rec);
foreach (var p in new[] { "Wareneingangsprüfung", "SAP-Buchung",
                          "Localized AI", "Q4_K_M" })
    list.AddPhrase(p);

rec.Recognized += (_, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
        Console.WriteLine($"[{e.Result.OffsetInTicks/10_000_000}s] " +
                          e.Result.Text);
};
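// surface auth and network failures instead of failing silently
rec.Canceled += (_, e) =>
    Console.WriteLine($"Canceled: {e.Reason} {e.ErrorDetails}");

await rec.StartContinuousRecognitionAsync();

// keep the process alive until Enter is pressed, then stop cleanly
Console.ReadLine();
await rec.StopContinuousRecognitionAsync();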

Compliance posture for call data

Voice is biometric-adjacent and calls routinely contain personal data, so three rules apply. Pin the resource to an EU region. Make sure audio logging is off: check the setting on any custom endpoint, and check that your code never calls `EnableAudioLogging()`. Keep transcripts, not audio, as the system of record, and put a retention policy on them.
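
The region and retention rules can be enforced in code rather than by convention. What follows is a sketch under stated assumptions: `Transcript` and `CallDataPolicy` are hypothetical names, the region allowlist is illustrative and should match what your own data processing agreement covers, and the retention window is whatever your policy says.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical transcript record; the audio itself is never stored.
public sealed record Transcript(string CallId, DateTimeOffset CreatedAt, string Text);

public static class CallDataPolicy
{
    // Illustrative allowlist of EU Azure regions (an assumption, not an exhaustive list).
    private static readonly HashSet<string> EuRegions =
        new(StringComparer.OrdinalIgnoreCase)
        {
            "germanywestcentral", "westeurope", "northeurope",
            "francecentral", "swedencentral"
        };

    // True only if the region is on the allowlist; call this before building a SpeechConfig.
    public static bool IsEuRegion(string region) => EuRegions.Contains(region);

    // Returns the transcripts that are still inside the retention window.
    // 'now' is passed in so the rule stays deterministic and testable.
    public static IReadOnlyList<Transcript> ApplyRetention(
        IEnumerable<Transcript> transcripts, TimeSpan maxAge, DateTimeOffset now) =>
        transcripts.Where(t => now - t.CreatedAt <= maxAge).ToList();
}
```

A scheduled job would call `ApplyRetention` with, say, `TimeSpan.FromDays(90)` and delete everything that is not in the returned list; the 90 days here is a placeholder, not a recommendation.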

The hybrid we deploy for meeting notes

For internal meeting transcription, many customers prefer zero cloud contact: local Whisper Large v3 (MIT-licensed, runs on the same GPU as the chat model) produces the transcript, and the on-prem LLM summarizes it. Azure Speech earns its place where you need live streaming, diarization at scale, or production-grade TTS, the areas where local stacks still trail.
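
The split described above boils down to a simple routing rule: stay local unless a workload needs one of the capabilities where the cloud service is ahead. A minimal sketch, with `SpeechNeeds`, `SpeechBackend`, and `BackendRouter` as hypothetical names invented for illustration:

```csharp
using System;

[Flags]
public enum SpeechNeeds
{
    None = 0,
    LiveStreaming = 1,
    DiarizationAtScale = 2,
    ProductionTts = 4
}

public enum SpeechBackend { LocalWhisper, AzureSpeech }

public static class BackendRouter
{
    // Default to the local stack; go to the cloud only when a listed capability is required.
    public static SpeechBackend Choose(SpeechNeeds needs) =>
        needs == SpeechNeeds.None
            ? SpeechBackend.LocalWhisper
            : SpeechBackend.AzureSpeech;
}
```

A recorded internal meeting maps to `SpeechNeeds.None` and never leaves the network, while a live call-center line with `SpeechNeeds.LiveStreaming | SpeechNeeds.ProductionTts` routes to Azure Speech.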

Want this running inside your own VPN?

Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.
