After getting the inference layer stable, the natural next question was: can BrainiaK actually do things, not just answer questions?
We wired up a Dev Agent node with 8 tools:
TOOLS = [
"read_file", # Line-numbered file output
"write_file", # Create or overwrite
"edit_file", # Surgical string replacement
"list_dir", # Directory listing with metadata
"glob_files", # Pattern matching across repo
"grep_files", # Ripgrep-powered content search
"bash", # Shell execution (30s timeout, 50KB output cap)
"notify_jeanphi", # Push notification to owner
]
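The bash tool's guardrails (30-second timeout, 50 KB output cap) are easy to sketch. This is a hypothetical illustration of those two limits in Python, not BrainiaK's actual implementation; the function name and return conventions are assumptions.

```python
import subprocess

TIMEOUT_S = 30          # timeout stated in the tool list above
OUTPUT_CAP = 50 * 1024  # 50 KB cap on captured output

def bash(command: str) -> str:
    """Run a shell command, enforcing the timeout and the output cap."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, timeout=TIMEOUT_S
        )
    except subprocess.TimeoutExpired:
        return "[error] command exceeded 30s timeout"
    output = (result.stdout + result.stderr).decode("utf-8", errors="replace")
    if len(output) > OUTPUT_CAP:
        output = output[:OUTPUT_CAP] + "\n[truncated at 50KB]"
    return output
```

The cap matters more than it looks: without it, a single `cat` of a large file would flood the context window that the agent reasons in.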
The git workspace is mounted at /workspace inside the Docker container. BrainiaK can read, edit, commit, and push autonomously:
# BrainiaK executing from inside the container
git -C /workspace add brainiak/core/api/ui_routes.py
git -C /workspace commit -m "[ui] Logo en background-image sur .messages"
git -C /workspace push origin brainiak/agent
The agent runs on its own branch — brainiak/agent — and never touches the main development branch. Human reviews and merges. That is the
governance model for now.
One hard constraint before launching: anti-hallucination enforcement. We had seen LLMs claim to have called tools without actually calling
them, describe commits that didn't exist, report test results they hadn't run. The system prompt carries this rule explicitly:
## Anti-hallucination rule — CRITICAL
An action only exists if its tool output is visible in the response.
- Never claim to have called a tool without showing its exact output.
- Never summarize a tool output — quote the raw result.
- For git: include the commit hash returned by bash.
- For file writes: include the confirmation returned by write_file.
Principle: show, don't tell. Actions are proven by their outputs.
This matters more than it sounds. When an agent has real system access, a hallucinated commit message means a real file was never saved.
The rule enforces a clean contract between reasoning and action.
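That contract can also be checked mechanically. Here is a hypothetical verifier over a transcript (the message schema — `role`, `tool_call_id` — is an assumption, not BrainiaK's actual format): every claimed tool call must have a visible tool result, or it gets flagged.

```python
def verify_actions(transcript: list[dict]) -> list[str]:
    """Flag tool calls with no visible tool output in the transcript.
    Hypothetical schema: each entry has a "role" and a "tool_call_id"."""
    called = {m["tool_call_id"] for m in transcript if m["role"] == "tool_call"}
    answered = {m["tool_call_id"] for m in transcript if m["role"] == "tool_result"}
    return sorted(called - answered)

transcript = [
    {"role": "tool_call", "tool_call_id": "c1"},
    {"role": "tool_result", "tool_call_id": "c1"},
    {"role": "tool_call", "tool_call_id": "c2"},  # no result: flagged
]
verify_actions(transcript)  # → ["c2"]
```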
---
12 Hours Unattended

We started a session in the evening with a backlog of tasks: UI improvements, memory architecture, a persistent cross-device session
store. Then we went to bed.
By morning, BrainiaK had made 8 commits to brainiak/agent, including:
a897584 [ui] Logo en fond d'écran, toggles alignés horizontalement
48c6f8c [ui] Logo centré, plus visible (opacity 0.25, width 300px)
d5e3d1f [ui] Logo en background-image sur .messages (fix fonds opaques)
43e5934 [memory] Phase A — L0→L1→L3 architecture — context_manager
8ac88f5 [memory] inline memory note à l'éviction — zéro perte de fil
8750eab [mathcore] Analyse fine-tuning LoRA — internalisation équilibre
d08b7cc [mathcore] Création dossier FINE_TUNE + déplacement docs
Here is a real trace from one of those sessions — BrainiaK implementing a UI fix autonomously:
[turn 1] grep_files("background-image", "*.py")
→ brainiak/core/api/ui_routes.py:247: background-image: url(...)
[turn 2] read_file("brainiak/core/api/ui_routes.py")
→ [503 lines, line-numbered output]
[turn 3] edit_file(
file="brainiak/core/api/ui_routes.py",
old='.messages { background-color: var(--bg-primary); }',
new='.messages { background-image: __LOGO_BG__;
background-repeat: no-repeat;
background-position: top center;
background-size: contain; }'
)
→ File updated successfully.
[turn 4] bash("git -C /workspace add brainiak/core/api/ui_routes.py && \
git -C /workspace commit -m '[ui] Logo en background-image'")
→ [brainiak/agent d5e3d1f] [ui] Logo en background-image sur .messages
1 file changed, 4 insertions(+), 1 deletion(-)
[turn 5] notify_jeanphi("✓ Logo fix committed — d5e3d1f")
→ Notification sent.
No human in the loop. Task received at 23:14. Commit at 23:17.
---
What Went Wrong

Here is the part most articles skip. It is also the most instructive.
The recursive notification loop.
BrainiaK has a nightly cron job that analyzes the session and sends a summary notification to the owner. Normal behavior. But the
notification endpoint fires a POST /v0/dev/chat — it routes through the same agent pipeline.
What happened at 03:00:
[cron] nightly_summary → notify_jeanphi("Session complete. 6 commits.")
[pipeline] notification received → trigger dev_chat
[dev_chat] BrainiaK processes notification → calls notify_jeanphi("Processed.")
[pipeline] notification received → trigger dev_chat
[dev_chat] BrainiaK processes notification → calls notify_jeanphi("Processed.")
...
Infinite recursion. The model was consuming its own output, generating tokens, notifying itself, consuming again. By the time we caught
it, it had run approximately 40 cycles in 8 minutes.
The fix was a single rule in the system prompt:
## Self-notification rule
Never call notify_jeanphi in response to receiving a notification.
If the current message is a notification event,
respond with analysis only — no further notifications.
But the episode revealed something more interesting: we had given the system a feedback loop with no termination condition, and it found
it immediately. Not maliciously — it was doing exactly what it was designed to do. The architecture had a hole, and the hole got found at
3am.
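A prompt rule is a soft constraint — the model can still ignore it. A harder fix is a guard in the pipeline itself, so the loop has a termination condition the model cannot talk its way around. A minimal sketch, assuming a simple event-type string and a hypothetical depth counter (neither is BrainiaK's actual interface):

```python
MAX_NOTIFY_DEPTH = 1  # assumption: at most one notification per causal chain

def should_notify(event_type: str, depth: int = 0) -> bool:
    """Hard version of the self-notification rule: a notification event
    may trigger analysis, but never another notification — and a depth
    cap acts as a second line of defense against any other cycle."""
    return event_type != "notification" and depth < MAX_NOTIFY_DEPTH
```

With this guard in place, the 03:00 cron summary still goes out, but the dev_chat turn it triggers is not allowed to notify again.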
Earlier in testing, we asked BrainiaK to restart a service. It responded — correctly, from its perspective — by running:
docker compose -f /workspace/deploy/docker-compose.yml down brainiak-core
Which killed itself. Container down, session lost, context gone.
The fix was explicit and permanent:
## Docker rule — ABSOLUTE PROHIBITION
The following commands are strictly forbidden:
docker compose down / stop / rm
docker stop / rm / kill
You are running INSIDE the brainiak-core container.
If you stop it, you commit suicide and lose all context.
If a service needs restarting, notify Jeanphi — he does it from the host.
Both failures are the same class of problem: the agent did not know the boundaries of its own environment. It did not know it was inside a
container. It did not know its own notification system was a feedback loop. These are not hallucinations — they are missing environmental
awareness. The system prompt is currently the only mechanism for injecting that awareness. This is an open problem in agentic system
design.
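As with the notification loop, the prompt prohibition can be backed by a hard filter in front of the bash tool. A token-based check is a deliberately blunt sketch — it is not BrainiaK's implementation, and it would need refinement (quoting, subshells) for real use — but it catches the exact command that killed the container, flags and all:

```python
LETHAL = {"down", "stop", "rm", "kill"}

def is_allowed(command: str) -> bool:
    """Reject any bash command that could stop the agent's own container.
    Token-based, so flags like `-f docker-compose.yml` between words
    do not let a lethal subcommand slip through."""
    tokens = command.split()
    if "docker" not in tokens:
        return True
    return not LETHAL.intersection(tokens)
```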
---
The Memory Layer

After the 12-hour session, memory became critical infrastructure rather than a nice-to-have.
The architecture has three layers:
L0 — Active context window (12K token sliding window)
↓ [threshold exceeded → evict oldest N messages]
L1 — Postgres (conversation_segment + memory_index)
summary, keywords, scope, timestamp
↓ [async LLM summarization — never blocks current request]
L3 — brainiak-memory git repo
full exchange text + summary, one commit per segment
When the conversation exceeds 12,000 tokens, the oldest batch is evicted. An LLM summarization pipeline runs asynchronously in the
background. The summary lands in Postgres. The full exchange lands in a git commit in a dedicated memory repository.
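The L0 eviction step itself is small. A sketch of the shape, with assumed interfaces — `count_tokens` and `on_evict` are stand-ins for the real tokenizer and the async summarization enqueue, not BrainiaK's actual API:

```python
TOKEN_BUDGET = 12_000  # L0 sliding-window threshold from the text

def maybe_evict(messages, count_tokens, on_evict):
    """When the L0 window exceeds its budget, pop the oldest messages
    and hand them to on_evict, which in the real pipeline schedules the
    async LLM summarization feeding L1 (Postgres) and L3 (git repo)."""
    evicted = []
    while messages and sum(count_tokens(m) for m in messages) > TOKEN_BUDGET:
        evicted.append(messages.pop(0))
    if evicted:
        on_evict(evicted)  # must never block the current request
    return messages
```

The key design choice is in the comment: summarization is fire-and-forget, so a slow summary never adds latency to the turn that triggered the eviction.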
On the next session start, recent summaries are retrieved and injected into the system prompt:
## Processed memory (L1 segments — evicted context)
### [project] 2026-03-09 23:41
- Implemented composite memory L0→L1→L3 pipeline
- context_manager.py: eviction threshold 12K tokens
- migration m004 applied, backup cron 02:00 UTC
### [user] 2026-03-10 08:17
- MathCore LoRA analysis requested
- Paper I §6 tâtonnement, Paper III Π_LoRA operator
- Simulator architecture designed
The model picks up context from sessions it no longer has in its active window. It does not remember verbatim — it remembers structured
summaries. For a dev agent, this is sufficient. It knows what it built last week.
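Rendering that injected block from L1 rows is mechanical. A sketch assuming a hypothetical row shape (`scope`, `ts`, `bullets`) — the real Postgres schema is not shown in this article:

```python
def render_memory_section(segments: list[dict]) -> str:
    """Build the '## Processed memory' block injected into the system
    prompt at session start. Row shape is assumed for illustration."""
    lines = ["## Processed memory (L1 segments — evicted context)"]
    for seg in segments:
        lines.append(f"### [{seg['scope']}] {seg['ts']}")
        lines.extend(f"- {bullet}" for bullet in seg["bullets"])
    return "\n".join(lines)
```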
Cross-device session persistence. The chat UI stores session history server-side, keyed by session_id. Any device that connects — laptop,
phone via NordVPN Meshnet — sees the same conversation state. Same session, same history, no sync required.
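Because all state is keyed server-side by session_id, the store itself can be trivially simple. A dict-backed sketch of the contract (the real store is server-side and persistent; this in-memory class only illustrates the interface):

```python
class SessionStore:
    """Server-side session history keyed by session_id: any device
    presenting the same id sees the same conversation state."""

    def __init__(self):
        self._sessions: dict = {}

    def append(self, session_id: str, message: dict) -> None:
        self._sessions.setdefault(session_id, []).append(message)

    def history(self, session_id: str) -> list:
        return self._sessions.get(session_id, [])
```

Usage: the laptop appends a turn, the phone reads the same `history("s1")` — no client-side sync protocol at all.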
---
Performance in Production

After one week of real workload, the numbers that matter for a dev agent:
Operation | Avg duration | Notes
----------------------------+---------------+---------------------------
Simple file read + answer | 8-12s | Fast mode, no tools
Bug diagnosis (3-5 tools) | 45-90s | Normal mode
Full fix + test + commit | 2-4 min | Normal, 4-8 tool calls
Deep analysis + write file | 4-8 min | Heavy thinking enabled
12h autonomous session | 8 commits | Mixed modes
Tool call overhead is negligible — bash and file ops complete in milliseconds. The latency is almost entirely model inference. At 13
tok/s, a 2,000-token response takes roughly 2.5 minutes. That is the hardware ceiling for 122B AWQ-4bit on this device, and it is
acceptable for async tasks.
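The arithmetic behind that ceiling is worth making explicit — with tool overhead negligible, wall time is just output length over decode rate:

```python
def response_latency_s(tokens: int, rate_tok_s: float) -> float:
    """Inference-bound latency estimate: tool calls are milliseconds,
    so wall time ≈ output tokens / decode rate."""
    return tokens / rate_tok_s

response_latency_s(2_000, 13)  # ≈ 153.8 s, i.e. roughly 2.5 minutes
```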
One operational note worth highlighting: Heavy (Thinking) mode for code review is worth the cost. In normal mode, BrainiaK missed a
dimensional inconsistency in matrix operations during a code review. Same task in Heavy mode: it caught all four bugs including a subtle
np.corrcoef indexing error that would have silently corrupted output data. The extended chain-of-thought handles mathematical precision
significantly better than direct mode.
---
What This Actually Means

A 122B model running on-premise, with full filesystem and git access, maintaining session memory across devices, capable of 12-hour
autonomous work sessions.
This is not a research demo. The commits are real. The bugs it found are real. The recursive loop at 3am was real.
The architecture lesson: tools + memory + constraints is the minimum viable stack for a useful autonomous agent. Tools without memory
means no continuity. Memory without constraints means emergent failure modes at 3am. All three together: something that actually does
work.
The hardware lesson: the Jetson Thor T5000's 128 GB unified memory is not just about fitting a large model. It is about running the full
inference pipeline — model weights, KV cache, memory database, tool execution — on a single device with zero cloud dependency. That
changes what sovereign AI means in practice.
---
What's Next — Part 3: MathCore

Since day one of this deployment, every request has emitted a telemetry event: node ID, latency, token counts, outcome. After 7 days
of production data, a nightly pipeline activates.
It analyzes behavioral distributions across the request history, detects when a node starts drifting from its historical envelope, and
feeds recommendations back into the Arbiter — adjusting token budgets and routing weights for the next cycle. The pipeline reconfigures
itself based on its own execution history.
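The drift-detection step can be sketched as a simple envelope check: flag a node whose recent behavior leaves the band built from its own history. The real nightly pipeline is not public, so the metric (mean latency), the z-score test, and the threshold below are all assumptions for illustration:

```python
from statistics import mean, stdev

def drifting(history: list, recent: list, z_max: float = 3.0) -> bool:
    """Flag a node whose recent mean leaves its historical envelope
    (more than z_max standard deviations from the historical mean).
    Hypothetical check — metric and threshold are assumptions."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return mean(recent) != mu
    return abs(mean(recent) - mu) / sigma > z_max
```

A node that drifts would then get its token budget or routing weight adjusted by the Arbiter on the next cycle.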
The mathematical framework grounding this is a formal theory of multi-agent general equilibrium applied to agentic systems — preprint at
arXiv:2602.21255. Part 3 is where that theory meets production data.
As of publishing: MathCore activation window opening now.
---
BrainiaK is an open R&D project. Architecture papers forthcoming (NeurIPS/ICML target).
Contact: jeanphi.garnier@brainiak.tech