Human-in-the-loop as local-minima escape
By Yad Konrad — @0bserver07
“I have the feeling I was useful by pinging some wave of agents to do diagnostics and that got the solution out of a local minima.” — Cosmin Negruseri, 2026-05-14
“Seems to enter the local minima fairly quickly so adding some skills to creatively explore different directions.” — Sung Jae Bae, 2026-05-21
A claim worth taking seriously: a long-running autonomous agent loop will hit local minima fast, and a sparse human ping is enough to escape. This page tests that claim against the 40 Yad-typed prompts logged in the schmidhuber-problems orchestrator session.
Important framing. The orchestrator’s JSONL contains 192 records of type=user, but only 40 are prompts Yad actually typed. The other 152 are routing and system records: 142 worker sessions reporting back to the orchestrator (their idle/summary messages route through SendMessage and surface in the lead’s transcript as type=user records), plus 6 slash commands, 2 skill-loader outputs, and 2 redacted entries. Every number below uses the 40-prompt count.
The numbers
- Yad-typed prompts to the orchestrator: 40
- Orchestrator assistant turns: 1,026
- Turns per Yad prompt (orchestrator): 1,026 / 40 ≈ 25.7
- Active wall-clock: ~21 hours of attention spread across 41 hours (two ~10-hour overnight idle gaps)
If every Yad prompt were load-bearing, this would be a high human-attention build. It wasn’t. Most were 1-line nudges. A small minority did the load-bearing work.
Three classes of hop
Classified manually from the orchestrator’s full hop list. (Worker-side hops are mostly templated <teammate-message> envelopes — see Worker prompt anatomy.)
Type A — direction-changing (rare, high-leverage)
These reshape the build. The autonomous loop wouldn’t have found these on its own.
| Timestamp (UTC) | Yad’s prompt | What changed |
|---|---|---|
| 2026-05-06T23:08 | Pasted the SPEC link, the hinton-problems precedent, and Yaroslav’s Schmidhuber-papers suggestion. Set the goal. | Triggered the entire build — TeamCreate + wave 0 dispatch within 16 min. |
| 2026-05-07T00:11 | “alright shall we do clean up and dispathc multiple agents to finish the rest of the waves?” | Wave-1 trigger. Started the parallel-dispatch protocol. |
| 2026-05-07T01:31 | “why are u doing a branch per impl, should it be per waves?? why the branch spam. THIS IS WRONG PRACTICE COURSE CORRECT!” | Wave 1 → wave 2 protocol pivot. PR #2 closed, reissued as PR #5 on wave/0-sanity. All per-stub remote branches deleted. From wave 2 onward, per-stub branches stay LOCAL ONLY. |
| 2026-05-07T02:11 | “I need you to not rely on me anymore until you finish it all, basically, do wave into 1 per, audit, post to pr then trigger next wave” | Autonomous-mode engaged. Eight subsequent waves ran without further user intervention. Lead ran the audit→merge→dispatch loop end-to-end. |
| 2026-05-08T13:55 | “BUT FIRST FIRST FINISH THESE THINGS REMAINING” | Wave 11 (v1.5) trigger. Prioritized closing v1+v1.5 before site/docs. |
| 2026-05-08T15:21 | “why are there teams reaminign thouhg?” | Caught the not-yet-shutdown teammate processes. Triggered the team cleanup. |
| 2026-05-08T15:42 | “i still see the agents man / where and why the site link is not in the gihub repo / have we verified thse things to be truely done or left over?” | Surfaced the unmerged-PRs gap. Explicit merge instruction followed. The batch-merge of all 13 PRs happened minutes after this. |
| 2026-05-08T16:09 | Redacted (frustrated venting about agent identity) | Triggered the git filter-branch rewrite (74 commits → Yad Konrad). |
Count: 8 Type-A hops out of 40 Yad-typed prompts = 20%. One in five prompts Yad sent reshaped the protocol.
Type B — status checks (frequent, low-cost)
Brief check-ins. Pattern: orchestrator summarizes per-wave progress and continues.
| Pattern | Approximate count |
|---|---|
| status / status? / status, what is left? / whats left rl? | ~10 |
These cost ~25 turns of orchestrator output each (the summary), but didn’t change the protocol. They were Yad confirming the autonomous loop was still on track.
Type C — review-and-merge gate
Explicit approval moments. Few but load-bearing.
| Timestamp (UTC) | Hop | Effect |
|---|---|---|
| 2026-05-07T00:15 | “review it/audit and post the comment, then dispatch after please” | Locked in the audit-then-dispatch loop. |
| 2026-05-08T13:55 | “lets please finish everything and deal with the full impelmentations / remember what Yaroslav asked for? when we finish we need to draw the full pictres as well / BUT FIRST FIRST FINISH” | Wave 11 + downstream artifacts (BUILD_NOTES, site). |
| 2026-05-08T16:52 | “udpate the github issues” | Closed out and reorganized the v2 ByteDMD + v1.5 follow-up issues. |
| 2026-05-08T19:37 | “fix merge both PRs?” | Final merge approval for the meta + docs PRs. |
The actual batch-merge of 13 PRs happened automatically after the autonomous loop completed; Yad’s role was only to approve. ~5 Type-C hops total.
Where the other 152 type=user records came from
Of the 192 type=user records in the orchestrator’s JSONL transcript:
- 142 were workers reporting back to the orchestrator. Each SendMessage from a worker to team-lead surfaces in the lead’s transcript as a type=user record with a <teammate-message teammate_id="<worker>"> envelope.
- 6 were slash commands (/login, etc.)
- 2 were skill-loader outputs (sutro-sync skill invocations)
- 2 were redacted (one foul-language frustrated venting, one image attachment path leak)
Yad’s actual hand-typed prompts are the 40 classified on this page.
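The breakdown above can be approximated mechanically. The following is a minimal sketch, not the script used for this page: it assumes each JSONL record is an object with a `type` field and a flat `content` string (a hypothetical shape; the real schema may nest the text), and the bucket names are made up for illustration. It separates worker reports and slash commands from everything else. Skill-loader outputs and redacted entries would still need manual tagging.

```python
import json
import re
from collections import Counter

# Assumed record shape (hypothetical): {"type": "user", "content": "<text>"}.
# Worker reports are recognized by the <teammate-message teammate_id=...> envelope.
TEAMMATE_ENVELOPE = re.compile(r"<teammate-message\s+teammate_id=")


def classify_user_record(content: str) -> str:
    """Assign one type=user record to a coarse source bucket."""
    text = content.lstrip()
    if TEAMMATE_ENVELOPE.search(text):
        return "worker_report"
    if text.startswith("/"):
        return "slash_command"
    # Hand-typed prompts, skill-loader outputs, and redacted entries land here.
    return "typed_or_other"


def count_sources(jsonl_lines):
    """Count buckets over an iterable of JSONL lines, skipping non-user records."""
    counts = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("type") != "user":
            continue
        counts[classify_user_record(str(record.get("content", "")))] += 1
    return counts


# Tiny synthetic sample, not real session data.
sample = [
    json.dumps({"type": "user", "content": '<teammate-message teammate_id="w1">done</teammate-message>'}),
    json.dumps({"type": "user", "content": "/login"}),
    json.dumps({"type": "user", "content": "status?"}),
    json.dumps({"type": "assistant", "content": "working"}),
]
print(count_sources(sample))
```

On the real transcript, the `typed_or_other` bucket would still be a superset of the 40 hand-typed prompts, so the final count depends on a manual pass.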
The local-minima-escape claim, tested
Claim (Cosmin’s): a manual ping helps the agent loop escape local minima it would otherwise stay stuck in.
The two strongest examples:
Example 1 — branch-per-stub local minimum
By wave 1, the lead had settled on impl/<slug> branches pushed to origin. Pattern: 6 workers per wave × 12 waves = 72 remote branches. This was branch spam in a way the lead didn’t recognize. It wasn’t wrong in any narrow sense — the workflow was internally consistent.
Yad’s hop at 01:31 (“THIS IS WRONG PRACTICE COURSE CORRECT!”) gave the lead the outside perspective it lacked. Within 7 minutes (01:38), PR #2 was closed and reissued as PR #5 on wave/0-sanity. All impl/<slug> branches were deleted. The new protocol — wave-N-local/<slug> LOCAL ONLY — held for the next 10 waves without revision.
Without this nudge, the autonomous loop would have left 72 branches on origin. The lead wouldn’t have flagged it; the workflow looked fine from inside the loop.
Example 2 — silent-after-commit local minimum
In wave 3, multiple workers committed locally and went idle without sending a summary. The protocol said send-a-summary, but workers were treating “task done” as equivalent to “send summary” — they’d hit done, idle, and the lead would wait.
The lead’s first detection was around 03:09 UTC when it pinged the silent worker: “Looks like you committed locally (commit 1f1cbcb on wave-3-local/curiosity-three-regions) and went idle without sending a summary. Per the wave-3 protocol, please send your summary now.”
This was a local-minima escape by the lead, not by Yad. Three later wave-3/wave-10/wave-11 workers triggered the same nudge pattern (visible in team_messages.tsv as “Request summary message” SendMessages). The fix was a worker-template update: “DO NOT GO SILENT — send a summary explicitly before idling.”
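The silent-after-commit pattern lends itself to a simple check. The sketch below assumes a per-worker event stream with three hypothetical event labels (`commit`, `summary`, `idle`); the actual session logs do not necessarily expose events in this form, so the `WorkerEvent` type and worker names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerEvent:
    worker: str
    kind: str  # one of "commit", "summary", "idle" (hypothetical labels)


def silent_after_commit(events):
    """Return workers that went idle after a commit with no summary in between."""
    pending = set()  # workers whose latest commit has no summary yet
    silent = []      # flagged workers, in order of first detection
    for event in events:
        if event.kind == "commit":
            pending.add(event.worker)
        elif event.kind == "summary":
            pending.discard(event.worker)
        elif event.kind == "idle" and event.worker in pending:
            if event.worker not in silent:
                silent.append(event.worker)
    return silent


# Synthetic example: worker-a skips the summary, worker-b follows the protocol.
events = [
    WorkerEvent("worker-a", "commit"),
    WorkerEvent("worker-a", "idle"),
    WorkerEvent("worker-b", "commit"),
    WorkerEvent("worker-b", "summary"),
    WorkerEvent("worker-b", "idle"),
]
print(silent_after_commit(events))
```

A lead could run a check like this whenever a worker goes idle and send the “Request summary message” nudge only to the flagged workers.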
What doesn’t escape local minima
The autonomous loop did not self-discover the branch-spam problem. The lead’s audit subagent didn’t either — the audits checked stub quality, not workflow correctness. It took an outside perspective (Yad) at 01:31 to point at the pattern.
This is the load-bearing observation: outside perspective is the rare commodity. A long-running agent loop builds internal consistency fast and protects it. Yad’s 40 prompts broke down as roughly:
| Class | Count | Share of Yad’s 40 |
|---|---|---|
| Type A (direction-changing) | 8 | 20% |
| Type B (status checks) | ~10 | 25% |
| Type C (review/merge gate) | ~5 | 13% |
| Type D (small clarifications, acks, copy-edits, follow-up work after wave 11) | ~17 | 42% |
The build was carried by 8 high-leverage Yad prompts out of 40 total. The other 32 were either status check-ins, approval-gate one-liners, or low-cost follow-up after the wave-11 merge.
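For anyone re-running the tally, the arithmetic behind the table and the turns-per-prompt figure is small enough to script. This sketch takes the counts from the table as given; the Type B, C, and D counts are approximate in the source, and the function names are made up for illustration.

```python
def class_shares(counts):
    """Return each class's fraction of the total prompt count."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def turns_per_prompt(assistant_turns, typed_prompts):
    """Average orchestrator assistant turns per hand-typed prompt."""
    return assistant_turns / typed_prompts


# Counts from the table above (B, C, and D are approximate).
hop_counts = {"A": 8, "B": 10, "C": 5, "D": 17}
shares = class_shares(hop_counts)

for name, share in shares.items():
    print(f"Type {name}: {hop_counts[name]:>2} prompts, {share:.1%}")
print(f"Turns per prompt: {turns_per_prompt(1026, 40):.2f}")
```

The unrounded shares for Type C and Type D are 12.5% and 42.5%; the table shows them as whole percentages.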
Implications for the next build
If 8 prompts out of 40 did most of the human work, the design questions are:
- Can the lead self-detect protocol drift? Branch-spam was visible in the orchestrator’s own git push calls. A “wait, am I pushing one branch per stub?” self-check at wave end might have caught it without Yad’s prompt.
- Can workers self-audit before idling? Silent-after-commit was caught reactively. A worker prompt that mandates “send summary then await shutdown” instead of “send summary before idling” might have shifted the failure mode.
- What’s the autonomy ceiling? Orchestrator: ~25.7 turns per Yad-typed prompt. Hinton-problems (same machinery, a week earlier) likely sits in a similar range; re-measure with the same hop classification to compare. Is 50 turns/prompt reachable? 100? Beyond what point does the build silently degrade?
- Is Type A predictable? The two pivotal Type-A hops were both about protocol, not technical content. Future builds might want a “protocol-only” intervention budget and a “technical-only” intervention budget.
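As one possible shape for the wave-end self-check raised in the first question, the sketch below flags any wave that pushed more remote branches than a budget allows. It assumes the lead can reconstruct a list of (wave, branch) pairs from its own git push calls. That input format, the function name, the default budget of one, and the example slugs are all hypothetical.

```python
from collections import defaultdict


def waves_over_branch_budget(pushes, max_remote_branches_per_wave=1):
    """Map each over-budget wave to the distinct branches it pushed to origin.

    `pushes` is an iterable of (wave_number, branch_name) pairs, an assumed
    format that the lead would have to derive from its own push history.
    """
    by_wave = defaultdict(set)
    for wave, branch in pushes:
        by_wave[wave].add(branch)
    return {
        wave: sorted(branches)
        for wave, branches in sorted(by_wave.items())
        if len(branches) > max_remote_branches_per_wave
    }


# Synthetic example: wave 1 pushes one branch per stub, wave 2 pushes one wave branch.
pushes = [
    (1, "impl/stub-a"),
    (1, "impl/stub-b"),
    (1, "impl/stub-a"),  # a repeated push of the same branch is counted once
    (2, "wave/2-example"),
]
print(waves_over_branch_budget(pushes))
```

A non-empty result at wave end would be the cue for the lead to stop and question its branching protocol before dispatching the next wave.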
Quotes worth keeping for the writeup
“why are u doing a branch per impl, should it be per waves?? why the branch spam. THIS IS WRONG PRACTICE COURSE CORRECT!” — wave-1 → wave-2 pivot, 2026-05-07T01:31 UTC
“I need you to not rely on me anymore until you finish it all, basically, do wave into 1 per, audit, post to pr then trigger next wave” — autonomous mode, 2026-05-07T02:11 UTC
“have we verified thse things to be truely done or left over?” — surface the unmerged-PR gap, 2026-05-08T15:42 UTC
Each is a single sentence that reshaped the protocol or unstuck the loop. The blog post should quote them verbatim.
Sources
- All 192 type=user records (40 Yad-typed + 152 routing/system) with timestamps: orchestrator session in analysis/data/sessions.jsonl
- 267 SendMessage calls across all sessions: analysis/data/team_messages.tsv
- 139 Agent dispatches across all sessions (orchestrator’s share: 73): analysis/data/agent_dispatches.tsv
- Cosmin and Sung Jae’s observations: chat-yad Telegram thread, 2026-05-14 and 2026-05-21