Cognitive Architectures for Robots: From Reflex Loops to Language-Driven Planning

Three loops, three timescales

Every working robot brain is a hierarchy of loops running at wildly different speeds:

Reflex tier — 1 kHz. Torque control, balance, contact response. Pure control theory; no learning surprises allowed here. If this tier misses deadlines, the robot is on the floor before the next tier notices.
Skill tier — 10–100 Hz. Learned policies: walk, grasp, place, open-door. Today usually neural networks trained in simulation, outputting joint targets the reflex tier tracks.
Deliberation tier — 0.1–1 Hz. Task planning: decompose "clear the workbench" into a skill sequence, monitor progress, recover from failure. This is where language models entered robotics.
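
The three tiers can be sketched as one simulated clock that fires each loop on its own period. This is a minimal illustration, not a real-time scheduler: the 50 Hz and 1 Hz rates are assumed values picked from the ranges above, and the names (run, Counters) are hypothetical.

```python
from dataclasses import dataclass

TICK_HZ = 1000          # reflex tier rate from the text (1 kHz)
SKILL_HZ = 50           # assumed value inside the 10-100 Hz band
DELIBERATION_HZ = 1     # assumed value inside the 0.1-1 Hz band


@dataclass
class Counters:
    reflex: int = 0
    skill: int = 0
    deliberation: int = 0


def run(seconds: float) -> Counters:
    """Step a simulated 1 kHz clock and fire each tier on its own period."""
    counts = Counters()
    skill_every = TICK_HZ // SKILL_HZ                # every 20 ticks
    deliberate_every = TICK_HZ // DELIBERATION_HZ    # every 1000 ticks
    for tick in range(int(seconds * TICK_HZ)):
        if tick % deliberate_every == 0:
            counts.deliberation += 1   # pick or revise the plan
        if tick % skill_every == 0:
            counts.skill += 1          # policy emits new joint targets
        counts.reflex += 1             # control law tracks the latest targets
    return counts


if __name__ == "__main__":
    print(run(2.0))
```

Over two simulated seconds the reflex tier runs 2000 times, the skill tier 100 times, and the deliberation tier twice. On real hardware the reflex loop would typically live in its own real-time thread or on a separate controller so that a slow planner can never delay it.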

Where the LLM plugs in (and where it must not)

An LLM is a superb deliberator: it knows that screws go in boxes and that you don't put the box on the wet paint. It is a catastrophic reflex controller: 200 ms of token latency is 200 missed cycles at 1 kHz, more than enough time to fall. The architecture rule is strict: language plans, policies act, control survives. The LLM emits skill calls with parameters; it never touches a motor.
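
One way to enforce "the LLM emits skill calls with parameters" is to treat its output as untrusted text and validate it against a fixed skill registry before anything reaches the skill tier. The sketch below assumes the model is prompted to reply with a JSON object of the form {"skill": ..., "params": ...}; the registry contents and the function name parse_skill_call are hypothetical.

```python
import json
from typing import Any

# Hypothetical skill registry: skill name -> required parameters and their types.
SKILLS: dict[str, dict[str, type]] = {
    "grasp": {"object_id": str},
    "place": {"object_id": str, "target_id": str},
    "open_door": {"door_id": str},
}


def parse_skill_call(llm_output: str) -> tuple[str, dict[str, Any]]:
    """Validate one JSON skill call; raise ValueError on anything unexpected."""
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError("skill call is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ValueError("skill call must be a JSON object")

    name = call.get("skill")
    if not isinstance(name, str) or name not in SKILLS:
        raise ValueError(f"unknown skill: {name!r}")

    schema = SKILLS[name]
    params = call.get("params")
    # Require exactly the declared parameters: no extras, none missing.
    if not isinstance(params, dict) or set(params) != set(schema):
        raise ValueError(f"{name} expects exactly {sorted(schema)}")
    for key, expected in schema.items():
        if not isinstance(params[key], expected):
            raise ValueError(f"{name}.{key} must be {expected.__name__}")
    return name, params


if __name__ == "__main__":
    print(parse_skill_call('{"skill": "grasp", "params": {"object_id": "screw_3"}}'))
```

Anything outside the registry, such as a request to set a motor torque directly, is rejected here and never reaches a policy.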

VLAs compress the middle

Vision-language-action models (our VLA guide) merge perception and skill selection into one network: pixels + instruction → actions. The hierarchy doesn't disappear — the reflex tier still guards the hardware — but the skill tier becomes general instead of a library of hand-trained behaviors.
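
A rough sketch of how that layering might look in code: the VLA is just an object with an act(image, instruction) method, and the reflex tier clamps whatever comes out before it is tracked. The interface, the joint limits, and the stub policy are all assumptions for illustration; no real VLA library is used here.

```python
from typing import Protocol, Sequence

Image = Sequence[Sequence[float]]  # stand-in for a camera frame


class VLAPolicy(Protocol):
    """Assumed interface: pixels + instruction in, joint targets out."""

    def act(self, image: Image, instruction: str) -> list[float]: ...


# Hypothetical per-joint limits in radians: (lower, upper).
JOINT_LIMITS = [(-1.0, 1.0), (-0.5, 0.5)]


def reflex_guard(targets: list[float],
                 limits: list[tuple[float, float]] = JOINT_LIMITS) -> list[float]:
    """Clamp joint targets to hardware limits, whatever the network asked for."""
    if len(targets) != len(limits):
        raise ValueError("target count does not match joint count")
    return [min(max(t, lo), hi) for t, (lo, hi) in zip(targets, limits)]


class StubPolicy:
    """Placeholder for a trained VLA; always returns the same targets."""

    def act(self, image: Image, instruction: str) -> list[float]:
        return [2.0, -0.2]  # first target deliberately exceeds its limit


def step(policy: VLAPolicy, image: Image, instruction: str) -> list[float]:
    return reflex_guard(policy.act(image, instruction))


if __name__ == "__main__":
    print(step(StubPolicy(), [[0.0]], "pick up the screw"))
```

The stub asks for 2.0 rad on the first joint and the guard returns 1.0, which is the point: the skill tier can become as general as you like while the limits stay deterministic.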

Memory and world state

Between the tiers sits a world state: object poses, task progress, and a semantic map. The deliberator reads and writes it. The newer ingredient is the world model, a learned simulator the robot can query: "if I pull this tote, what happens?" Predicting before acting is what separates competence from mere confidence.
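
The tote question can be made concrete with a toy example. Below, the world state is an immutable record and the "world model" is a one-line hand-written function standing in for a learned simulator; the fields, the units, and the safety check are assumptions chosen only to show the query-before-act pattern.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorldState:
    tote_x: float        # tote position along the shelf, in metres
    shelf_edge_x: float  # position where the shelf ends, in metres
    task_step: int = 0


def predict_pull(state: WorldState, distance: float) -> WorldState:
    """Toy stand-in for a learned world model: the tote moves by `distance`."""
    return replace(state, tote_x=state.tote_x + distance)


def safe_to_pull(state: WorldState, distance: float) -> bool:
    """Query the model first; approve only if the tote stays on the shelf."""
    predicted = predict_pull(state, distance)
    return predicted.tote_x <= state.shelf_edge_x


if __name__ == "__main__":
    state = WorldState(tote_x=0.2, shelf_edge_x=0.5)
    print(safe_to_pull(state, 0.2))  # predicted position 0.4 m, still on the shelf
    print(safe_to_pull(state, 0.4))  # predicted position past the 0.5 m edge
```

Because the prediction returns a new state instead of mutating the current one, the deliberator can try several candidate actions against the same snapshot and commit only to the one whose predicted outcome it accepts.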

The enterprise echo

Strip the motors and this is essentially the architecture of a good business agent: deterministic guardrails at the bottom (your reflex tier: validation, permissions), specialized tools in the middle, and a language model deliberating on top that never executes directly. We build that stack on-prem for documents and processes; robotics just makes the layering visible because failures fall over.
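
The same layering in a business setting might look like the sketch below: the model proposes a tool call, and a deterministic gate checks the tool name and the caller's permissions before anything runs. The roles, the tool names, and the execute function are hypothetical and not taken from any particular product.

```python
from typing import Callable

# Hypothetical tools (the "skill tier" of a business agent).
TOOLS: dict[str, Callable[[str], str]] = {
    "read_invoice": lambda doc_id: f"contents of {doc_id}",
    "approve_payment": lambda doc_id: f"payment approved for {doc_id}",
}

# Hypothetical permission table (the "reflex tier"): role -> allowed tools.
PERMISSIONS: dict[str, set[str]] = {
    "analyst": {"read_invoice"},
    "controller": {"read_invoice", "approve_payment"},
}


def execute(role: str, tool: str, doc_id: str) -> str:
    """Run a model-proposed tool call only if it passes the deterministic checks."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    if tool not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not call {tool!r}")
    return TOOLS[tool](doc_id)


if __name__ == "__main__":
    print(execute("analyst", "read_invoice", "INV-001"))
```

The checks are plain lookups, so they behave the same way no matter how the model phrased its request.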

Want this running inside your own VPN?

Localized AI fine-tunes small open models on your data and deploys them on your hardware — GDPR by architecture, zero per-token costs. Average setup: 72 hours.
