After getting the inference layer stable, the natural next question was: can BrainiaK actually do things, not just answer questions?
We wired up a Dev Agent node with 8 tools:
TOOLS = [
"read_file", # Line-numbered file output
"write_file", # Create or overwrite
"edit_file", # Surgical string replacement
"list_dir", # Directory listing with metadata
"glob_files", # Pattern matching across repo
"grep_files", # Ripgrep-powered content search
"bash", # Shell execution (30s timeout, 50KB output cap)
"notify_jeanphi", # Push notification to owner
]
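The bash tool's guardrails (30-second timeout, 50 KB output cap) are easy to sketch. This is a hypothetical illustration of those two limits in Python, not BrainiaK's actual implementation; the function name and return conventions are assumptions.

```python
import subprocess

TIMEOUT_S = 30          # timeout stated in the tool list above
OUTPUT_CAP = 50 * 1024  # 50 KB cap on captured output

def bash(command: str) -> str:
    """Run a shell command, enforcing the timeout and the output cap."""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, timeout=TIMEOUT_S
        )
    except subprocess.TimeoutExpired:
        return "[error] command exceeded 30s timeout"
    output = (result.stdout + result.stderr).decode("utf-8", errors="replace")
    if len(output) > OUTPUT_CAP:
        output = output[:OUTPUT_CAP] + "\n[truncated at 50KB]"
    return output
```

The cap matters more than it looks: without it, a single `cat` of a large file would flood the context window that the agent reasons in.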
The git workspace is mounted at /workspace inside the Docker container. BrainiaK can read, edit, commit, and push autonomously:
# BrainiaK executing from inside the container
git -C /workspace add brainiak/core/api/ui_routes.py
git -C /workspace commit -m "[ui] Logo en background-image sur .messages"
git -C /workspace push origin brainiak/agent
The agent runs on its own branch — brainiak/agent — and never touches the main development branch. Human reviews and merges. That is the
governance model for now.
One hard constraint before launching: anti-hallucination enforcement. We had seen LLMs claim to have called tools without actually calling
them, describe commits that didn't exist, report test results they hadn't run. The system prompt carries this rule explicitly:
## Anti-hallucination rule — CRITICAL
An action only exists if its tool output is visible in the response.
- Never claim to have called a tool without showing its exact output.
- Never summarize a tool output — quote the raw result.
- For git: include the commit hash returned by bash.
- For file writes: include the confirmation returned by write_file.
Principle: show, don't tell. Actions are proven by their outputs.
This matters more than it sounds. When an agent has real system access, a hallucinated commit message means a real file was never saved.
The rule enforces a clean contract between reasoning and action.
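That contract can also be checked mechanically. Here is a hypothetical verifier over a transcript (the message schema — `role`, `tool_call_id` — is an assumption, not BrainiaK's actual format): every claimed tool call must have a visible tool result, or it gets flagged.

```python
def verify_actions(transcript: list[dict]) -> list[str]:
    """Flag tool calls with no visible tool output in the transcript.
    Hypothetical schema: each entry has a "role" and a "tool_call_id"."""
    called = {m["tool_call_id"] for m in transcript if m["role"] == "tool_call"}
    answered = {m["tool_call_id"] for m in transcript if m["role"] == "tool_result"}
    return sorted(called - answered)

transcript = [
    {"role": "tool_call", "tool_call_id": "c1"},
    {"role": "tool_result", "tool_call_id": "c1"},
    {"role": "tool_call", "tool_call_id": "c2"},  # no result: flagged
]
verify_actions(transcript)  # → ["c2"]
```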
---
12 Hours Unattended

We started a session in the evening with a backlog of tasks: UI improvements, memory architecture, a persistent cross-device session
store. Then we went to bed.
By morning, BrainiaK had made 8 commits to brainiak/agent, including:
a897584 [ui] Logo en fond d'écran, toggles alignés horizontalement
48c6f8c [ui] Logo centré, plus visible (opacity 0.25, width 300px)
d5e3d1f [ui] Logo en background-image sur .messages (fix fonds opaques)
43e5934 [memory] Phase A — L0→L1→L3 architecture — context_manager
8ac88f5 [memory] inline memory note à l'éviction — zéro perte de fil
8750eab [mathcore] Analyse fine-tuning LoRA — internalisation équilibre
d08b7cc [mathcore] Création dossier FINE_TUNE + déplacement docs
Here is a real trace from one of those sessions — BrainiaK implementing a UI fix autonomously:
[turn 1] grep_files("background-image", "*.py")
→ brainiak/core/api/ui_routes.py:247: background-image: url(...)
[turn 2] read_file("brainiak/core/api/ui_routes.py")
→ [503 lines, line-numbered output]
[turn 3] edit_file(
file="brainiak/core/api/ui_routes.py",
old='.messages { background-color: var(--bg-primary); }',
new='.messages { background-image: __LOGO_BG__;
background-repeat: no-repeat;
background-position: top center;
background-size: contain; }'
)
→ File updated successfully.
[turn 4] bash("git -C /workspace add brainiak/core/api/ui_routes.py && \
git -C /workspace commit -m '[ui] Logo en background-image'")
→ [brainiak/agent d5e3d1f] [ui] Logo en background-image sur .messages
1 file changed, 4 insertions(+), 1 deletion(-)
[turn 5] notify_jeanphi("✓ Logo fix committed — d5e3d1f")
→ Notification sent.
No human in the loop. Task received at 23:14. Commit at 23:17.
---
What Went Wrong

Here is the part most articles skip. It is also the most instructive.
The recursive notification loop.
BrainiaK has a nightly cron job that analyzes the session and sends a summary notification to the owner. Normal behavior. But the
notification endpoint fires a POST /v0/dev/chat — it routes through the same agent pipeline.
What happened at 03:00:
[cron] nightly_summary → notify_jeanphi("Session complete. 6 commits.")
[pipeline] notification received → trigger dev_chat
[dev_chat] BrainiaK processes notification → calls notify_jeanphi("Processed.")
[pipeline] notification received → trigger dev_chat
[dev_chat] BrainiaK processes notification → calls notify_jeanphi("Processed.")
...
Infinite recursion. The model was consuming its own output, generating tokens, notifying itself, consuming again. By the time we caught
it, it had run approximately 40 cycles in 8 minutes.
The fix was a single rule in the system prompt:
## Self-notification rule
Never call notify_jeanphi in response to receiving a notification.
If the current message is a notification event,
respond with analysis only — no further notifications.
But the episode revealed something more interesting: we had given the system a feedback loop with no termination condition, and it found
it immediately. Not maliciously — it was doing exactly what it was designed to do. The architecture had a hole, and the hole got found at
3am.
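A prompt rule is a soft constraint — the model can still ignore it. A harder fix is a guard in the pipeline itself, so the loop has a termination condition the model cannot talk its way around. A minimal sketch, assuming a simple event-type string and a hypothetical depth counter (neither is BrainiaK's actual interface):

```python
MAX_NOTIFY_DEPTH = 1  # assumption: at most one notification per causal chain

def should_notify(event_type: str, depth: int = 0) -> bool:
    """Hard version of the self-notification rule: a notification event
    may trigger analysis, but never another notification — and a depth
    cap acts as a second line of defense against any other cycle."""
    return event_type != "notification" and depth < MAX_NOTIFY_DEPTH
```

With this guard in place, the 03:00 cron summary still goes out, but the dev_chat turn it triggers is not allowed to notify again.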
Earlier in testing, we asked BrainiaK to restart a service. It responded — correctly, from its perspective — by running:
docker compose -f /workspace/deploy/docker-compose.yml down brainiak-core
Which killed itself. Container down, session lost, context gone.
The fix was explicit and permanent:
## Docker rule — ABSOLUTE PROHIBITION
The following commands are strictly forbidden:
docker compose down / stop / rm
docker stop / rm / kill
You are running INSIDE the brainiak-core container.
If you stop it, you commit suicide and lose all context.
If a service needs restarting, notify Jeanphi — he does it from the host.
Both failures are the same class of problem: the agent did not know the boundaries of its own environment. It did not know it was inside a
container. It did not know its own notification system was a feedback loop. These are not hallucinations — they are missing environmental
awareness. The system prompt is currently the only mechanism for injecting that awareness. This is an open problem in agentic system
design.
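As with the notification loop, the prompt prohibition can be backed by a hard filter in front of the bash tool. A token-based check is a deliberately blunt sketch — it is not BrainiaK's implementation, and it would need refinement (quoting, subshells) for real use — but it catches the exact command that killed the container, flags and all:

```python
LETHAL = {"down", "stop", "rm", "kill"}

def is_allowed(command: str) -> bool:
    """Reject any bash command that could stop the agent's own container.
    Token-based, so flags like `-f docker-compose.yml` between words
    do not let a lethal subcommand slip through."""
    tokens = command.split()
    if "docker" not in tokens:
        return True
    return not LETHAL.intersection(tokens)
```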
---
The Memory Layer

After the 12-hour session, memory became critical infrastructure rather than a nice-to-have.
The architecture has three layers:
L0 — Active context window (12K token sliding window)
↓ [threshold exceeded → evict oldest N messages]
L1 — Postgres (conversation_segment + memory_index)
summary, keywords, scope, timestamp
↓ [async LLM summarization — never blocks current request]
L3 — brainiak-memory git repo
full exchange text + summary, one commit per segment
When the conversation exceeds 12,000 tokens, the oldest batch is evicted. An LLM summarization pipeline runs asynchronously in the
background. The summary lands in Postgres. The full exchange lands in a git commit in a dedicated memory repository.
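The L0 eviction step itself is small. A sketch of the shape, with assumed interfaces — `count_tokens` and `on_evict` are stand-ins for the real tokenizer and the async summarization enqueue, not BrainiaK's actual API:

```python
TOKEN_BUDGET = 12_000  # L0 sliding-window threshold from the text

def maybe_evict(messages, count_tokens, on_evict):
    """When the L0 window exceeds its budget, pop the oldest messages
    and hand them to on_evict, which in the real pipeline schedules the
    async LLM summarization feeding L1 (Postgres) and L3 (git repo)."""
    evicted = []
    while messages and sum(count_tokens(m) for m in messages) > TOKEN_BUDGET:
        evicted.append(messages.pop(0))
    if evicted:
        on_evict(evicted)  # must never block the current request
    return messages
```

The key design choice is in the comment: summarization is fire-and-forget, so a slow summary never adds latency to the turn that triggered the eviction.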
On the next session start, recent summaries are retrieved and injected into the system prompt:
## Processed memory (L1 segments — evicted context)
### [project] 2026-03-09 23:41
- Implemented composite memory L0→L1→L3 pipeline
- context_manager.py: eviction threshold 12K tokens
- migration m004 applied, backup cron 02:00 UTC
### [user] 2026-03-10 08:17
- MathCore LoRA analysis requested
- Paper I §6 tâtonnement, Paper III Π_LoRA operator
- Simulator architecture designed
The model picks up context from sessions it no longer has in its active window. It does not remember verbatim — it remembers structured
summaries. For a dev agent, this is sufficient. It knows what it built last week.
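Rendering that injected block from L1 rows is mechanical. A sketch assuming a hypothetical row shape (`scope`, `ts`, `bullets`) — the real Postgres schema is not shown in this article:

```python
def render_memory_section(segments: list[dict]) -> str:
    """Build the '## Processed memory' block injected into the system
    prompt at session start. Row shape is assumed for illustration."""
    lines = ["## Processed memory (L1 segments — evicted context)"]
    for seg in segments:
        lines.append(f"### [{seg['scope']}] {seg['ts']}")
        lines.extend(f"- {bullet}" for bullet in seg["bullets"])
    return "\n".join(lines)
```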
Cross-device session persistence. The chat UI stores session history server-side, keyed by session_id. Any device that connects — laptop,
phone via NordVPN Meshnet — sees the same conversation state. Same session, same history, no sync required.
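Because all state is keyed server-side by session_id, the store itself can be trivially simple. A dict-backed sketch of the contract (the real store is server-side and persistent; this in-memory class only illustrates the interface):

```python
class SessionStore:
    """Server-side session history keyed by session_id: any device
    presenting the same id sees the same conversation state."""

    def __init__(self):
        self._sessions: dict = {}

    def append(self, session_id: str, message: dict) -> None:
        self._sessions.setdefault(session_id, []).append(message)

    def history(self, session_id: str) -> list:
        return self._sessions.get(session_id, [])
```

Usage: the laptop appends a turn, the phone reads the same `history("s1")` — no client-side sync protocol at all.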
---
Performance in Production

After one week of real workload, the numbers that matter for a dev agent:
Operation | Avg duration | Notes
----------------------------+---------------+---------------------------
Simple file read + answer | 8-12s | Fast mode, no tools
Bug diagnosis (3-5 tools) | 45-90s | Normal mode
Full fix + test + commit | 2-4 min | Normal, 4-8 tool calls
Deep analysis + write file | 4-8 min | Heavy thinking enabled
12h autonomous session | 8 commits | Mixed modes
Tool call overhead is negligible — bash and file ops complete in milliseconds. The latency is almost entirely model inference. At 13
tok/s, a 2,000-token response takes roughly 2.5 minutes. That is the hardware ceiling for 122B AWQ-4bit on this device, and it is
acceptable for async tasks.
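The arithmetic behind that ceiling is worth making explicit — with tool overhead negligible, wall time is just output length over decode rate:

```python
def response_latency_s(tokens: int, rate_tok_s: float) -> float:
    """Inference-bound latency estimate: tool calls are milliseconds,
    so wall time ≈ output tokens / decode rate."""
    return tokens / rate_tok_s

response_latency_s(2_000, 13)  # ≈ 153.8 s, i.e. roughly 2.5 minutes
```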
One operational note worth highlighting: Heavy (Thinking) mode for code review is worth the cost. In normal mode, BrainiaK missed a
dimensional inconsistency in matrix operations during a code review. Same task in Heavy mode: it caught all four bugs including a subtle
np.corrcoef indexing error that would have silently corrupted output data. The extended chain-of-thought handles mathematical precision
significantly better than direct mode.
---
What This Actually Means

A 122B model running on-premise, with full filesystem and git access, maintaining session memory across devices, capable of 12-hour
autonomous work sessions.
This is not a research demo. The commits are real. The bugs it found are real. The recursive loop at 3am was real.
The architecture lesson: tools + memory + constraints is the minimum viable stack for a useful autonomous agent. Tools without memory
means no continuity. Memory without constraints means emergent failure modes at 3am. All three together: something that actually does
work.
The hardware lesson: the Jetson Thor T5000's 128 GB unified memory is not just about fitting a large model. It is about running the full
inference pipeline — model weights, KV cache, memory database, tool execution — on a single device with zero cloud dependency. That
changes what sovereign AI means in practice.
---
What's Next — Part 3: MathCore

Since day one of this deployment, every request has emitted a telemetry event: node ID, latency, token counts, outcome. After 7 days
of production data, a nightly pipeline activates.
It analyzes behavioral distributions across the request history, detects when a node starts drifting from its historical envelope, and
feeds recommendations back into the Arbiter — adjusting token budgets and routing weights for the next cycle. The pipeline reconfigures
itself based on its own execution history.
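The drift-detection step can be sketched as a simple envelope check: flag a node whose recent behavior leaves the band built from its own history. The real nightly pipeline is not public, so the metric (mean latency), the z-score test, and the threshold below are all assumptions for illustration:

```python
from statistics import mean, stdev

def drifting(history: list, recent: list, z_max: float = 3.0) -> bool:
    """Flag a node whose recent mean leaves its historical envelope
    (more than z_max standard deviations from the historical mean).
    Hypothetical check — metric and threshold are assumptions."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return mean(recent) != mu
    return abs(mean(recent) - mu) / sigma > z_max
```

A node that drifts would then get its token budget or routing weight adjusted by the Arbiter on the next cycle.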
The mathematical framework grounding this is a formal theory of multi-agent general equilibrium applied to agentic systems — preprint at
arXiv:2602.21255. Part 3 is where that theory meets production data.
As of publishing: MathCore activation window opening now.
---
BrainiaK is an open R&D project. Architecture papers forthcoming (NeurIPS/ICML target).
Contact: jeanphi.garnier@brainiak.tech