Document Intelligence: the extraction workhorse
Azure AI Document Intelligence reads PDFs and scans into structured data. The prebuilt invoice model already understands European invoice layouts — VAT IDs, IBANs, line items, German field labels — and the custom extraction trainer learns your own forms from as few as five labeled samples.
Invoices → ERP-ready JSON in C#
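using System;
using System.Collections.Generic;
using System.IO;
using Azure;
using Azure.AI.DocumentIntelligence;

var client = new DocumentIntelligenceClient(
    new Uri(cfg["DI_ENDPOINT"]), new AzureKeyCredential(cfg["DI_KEY"]));

var op = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed, "prebuilt-invoice",
    BinaryData.FromBytes(await File.ReadAllBytesAsync("eingang_2026_0142.pdf")));
var doc = op.Value.Documents[0];

// Names of fields that are missing or below the confidence threshold.
var needsReview = new List<string>();

string? Gate(string name, DocumentField? f)
{
    if (f is not null && f.Confidence > 0.85) return f.Content;
    needsReview.Add(name);
    return null;
}

string? F(string name) =>
    Gate(name, doc.Fields.TryGetValue(name, out var f) ? f : null);

// PaymentDetails is a list of objects with the IBAN as a sub-field, so it is unpacked
// and then gated too. ValueList/ValueDictionary are the SDK 1.0 names; verify for your version.
DocumentField? iban = null;
if (doc.Fields.TryGetValue("PaymentDetails", out var pd) && pd.ValueList is { Count: > 0 } entries)
    entries[0].ValueDictionary?.TryGetValue("IBAN", out iban);

var record = new {
    Vendor = F("VendorName"),
    VendorVat = F("VendorTaxId"),
    Number = F("InvoiceId"),
    Date = F("InvoiceDate"),
    Net = F("SubTotal"),
    Vat = F("TotalTax"),
    Gross = F("InvoiceTotal"),
    Iban = Gate("PaymentDetails.IBAN", iban)
};
// Anything listed in needsReview goes to a human review queue.
// Extraction without a confidence gate is how wrong IBANs get paid.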
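The gate can also be exercised without calling the service. The following is a minimal sketch, not a reference implementation: the field values and confidences are made up, the IBAN is a placeholder, and the names extracted, accepted and review are hypothetical. It splits fields into an ERP payload and a review list at the same 0.85 cut-off, then serializes the payload with System.Text.Json.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

const double Threshold = 0.85; // same cut-off as the gate above

// Made-up extraction results: field name -> (raw content, confidence).
var extracted = new Dictionary<string, (string Content, double Confidence)>
{
    ["VendorName"]   = ("Muster GmbH", 0.97),
    ["InvoiceId"]    = ("RE-2026-0142", 0.93),
    ["InvoiceTotal"] = ("1.190,00 EUR", 0.91),
    ["IBAN"]         = ("DE00 0000 0000 0000 0000 00", 0.62), // placeholder, not a real IBAN
};

// Fields above the threshold go into the ERP payload; the rest go to review.
var accepted = extracted
    .Where(kv => kv.Value.Confidence > Threshold)
    .ToDictionary(kv => kv.Key, kv => kv.Value.Content);
var review = extracted.Keys.Except(accepted.Keys).ToList();

Console.WriteLine(JsonSerializer.Serialize(accepted));
Console.WriteLine("review: " + string.Join(", ", review)); // review: IBAN
```

Keeping the payload and the review list separate means a single doubtful field holds back only itself, while the reviewer still sees which invoice it belongs to.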
The pipeline around the API
Production extraction is 20% model, 80% process: a watch folder or mail ingestion, deduplication by content hash, the confidence gate above, a review UI for the 5–10% of fields that need eyes, and write-back to the ERP with a full audit trail. Build the review loop first; it's also your future training data.
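Deduplication by content hash is the simplest of these steps to sketch. The example below assumes SHA-256 over the raw file bytes and an in-memory set; seen, ContentHash and TryRegister are hypothetical names, and a production system would more likely keep the hashes in a database table with a unique index.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical in-memory store of hashes already processed.
var seen = new HashSet<string>();

// SHA-256 over the raw bytes: the same file under a different name yields the same key.
string ContentHash(byte[] bytes) => Convert.ToHexString(SHA256.HashData(bytes));

// Returns true the first time a document is seen, false for a repeat.
bool TryRegister(byte[] bytes) => seen.Add(ContentHash(bytes));

// Stand-in byte arrays; real input would come from the watch folder or mailbox.
var invoice = Encoding.UTF8.GetBytes("%PDF-1.7 sample invoice bytes");
var resent  = Encoding.UTF8.GetBytes("%PDF-1.7 sample invoice bytes");
var other   = Encoding.UTF8.GetBytes("%PDF-1.7 a different invoice");

Console.WriteLine(TryRegister(invoice)); // True
Console.WriteLine(TryRegister(resent));  // False
Console.WriteLine(TryRegister(other));   // True
```

A byte-level hash only catches exact resends. A rescan of the same paper invoice produces different bytes, so a second check on extracted fields such as vendor plus invoice number is a reasonable complement.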
The on-prem variant
Document Intelligence also ships as a container: the model runs on your own Docker host and documents never leave it. In the standard connected mode only billing metering is sent to Azure; fully disconnected operation, with no outbound traffic at all, is a separate offering that has to be requested and licensed in advance. For customers whose invoices contain trade-secret pricing, that container plus a local LLM for the unstructured remainder (delivery conditions, free-text clauses) is our standard recommendation.
Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.