What LoRA actually changes
Full fine-tuning rewrites all 7 billion weights — expensive and risky. LoRA (Low-Rank Adaptation) freezes the base model and learns two small matrices A and B per layer whose product is added to the original weight: W' = W + BA. With rank 16, you train roughly 0.2–1% of the parameters. QLoRA goes further: the frozen base is held in 4-bit, so a 7B fine-tune fits in ~10 GB of VRAM.
How much data do you need?
Less than most teams fear. For style and format adherence: 500–2,000 high-quality examples. For domain knowledge tasks (support answers, contract clauses): 2,000–20,000. Quality dominates quantity — one afternoon of an expert reviewing examples is worth more than 50,000 scraped pairs. We build datasets from what companies already have: resolved tickets, approved documents, past correspondence.
A complete training script
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
TrainingArguments)
from trl import SFTTrainer
base = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
base, load_in_4bit=True, device_map="auto") # QLoRA
peft_cfg = LoraConfig(
r=16, lora_alpha=32, lora_dropout=0.05,
target_modules=["q_proj","k_proj","v_proj","o_proj"])
model = get_peft_model(model, peft_cfg)
ds = load_dataset("json", data_files="tickets_sft.jsonl")["train"]
trainer = SFTTrainer(
model=model, train_dataset=ds, tokenizer=tok,
args=TrainingArguments(
output_dir="out", num_train_epochs=3,
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
learning_rate=2e-4, bf16=True,
logging_steps=20, save_strategy="epoch"))
trainer.train()
model.save_pretrained("adapter-support-v1") # ~60 MB artifact
The adapter is the product
The output is a ~60 MB adapter you can version, diff, roll back, and merge into the base for deployment. Your proprietary knowledge lives in that small file — and in our deployments, it never leaves your infrastructure: training runs on your hardware or an air-gapped staging machine, exactly as the data protection officer wants it.
Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.
Plan my deployment