Fine-Tuning Small Models on Company Data with LoRA: A Practical Guide

What LoRA actually changes

Full fine-tuning rewrites all 7 billion weights — expensive and risky. LoRA (Low-Rank Adaptation) freezes the base model and learns two small matrices A and B per layer whose product is added to the original weight: W' = W + BA. With rank 16, you train roughly 0.2–1% of the parameters. QLoRA goes further: the frozen base is held in 4-bit, so a 7B fine-tune fits in ~10 GB of VRAM.

How much data do you need?

Less than most teams fear. For style and format adherence: 500–2,000 high-quality examples. For domain knowledge tasks (support answers, contract clauses): 2,000–20,000. Quality dominates quantity — one afternoon of an expert reviewing examples is worth more than 50,000 scraped pairs. We build datasets from what companies already have: resolved tickets, approved documents, past correspondence.

A complete training script

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          TrainingArguments)
from trl import SFTTrainer

base = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, load_in_4bit=True, device_map="auto")   # QLoRA

peft_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj","k_proj","v_proj","o_proj"])
model = get_peft_model(model, peft_cfg)

ds = load_dataset("json", data_files="tickets_sft.jsonl")["train"]

trainer = SFTTrainer(
    model=model, train_dataset=ds, tokenizer=tok,
    args=TrainingArguments(
        output_dir="out", num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4, bf16=True,
        logging_steps=20, save_strategy="epoch"))
trainer.train()
model.save_pretrained("adapter-support-v1")  # ~60 MB artifact

The adapter is the product

The output is a ~60 MB adapter you can version, diff, roll back, and merge into the base for deployment. Your proprietary knowledge lives in that small file — and in our deployments, it never leaves your infrastructure: training runs on your hardware or an air-gapped staging machine, exactly as the data protection officer wants it.

Want this running inside your own VPN?

Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.

Plan my deployment