What worked, what didn’t

By Yad Konrad — @0bserver07

Binary-classified observations. Each row has the moment of discovery (timestamp from the orchestrator’s session log) and the fix that landed. Drawn from the actual 40 Yad-typed prompts + 73 Agent dispatches + 69 SendMessages in orchestrator session 63285119-154e-42ab-9555-7a42471b0309. (The orchestrator’s JSONL has 192 type=user records total, but 152 of those are workers reporting back to the orchestrator, slash commands, and skill loaders — not Yad’s input. See Sessions for the breakdown.)

See Sessions and Orchestration map for the data behind each row.

What worked

| Pattern | Why it worked | When it stuck |
| --- | --- | --- |
| One persistent team (schmidhuber-impl), ephemeral teammates | TeamCreate once at 23:23 UTC, then 73 Agent dispatches inherited the team description. Per-worker prompts stayed short because the team contract did the heavy lifting. | Wave 0 onward. The team persisted across all 12 waves; only teammates turned over. |
| One SPEC issue as the source of truth | Every worker prompt links to issue #1, which the worker reads first. No SPEC drift across 58 workers. | Wave 0. Yad opened the SPEC at 23:20 UTC, before the first teammate dispatch. |
| LOCAL-ONLY per-stub branches + one wave PR | Workers commit to wave-N-local/<stub> and never push. The orchestrator collects branches and opens one PR per wave. Stopped the per-stub branch spam. | After Yad’s “THIS IS WRONG PRACTICE COURSE CORRECT!” pushback at 2026-05-07T01:31 UTC. PR #2 was closed and reissued as PR #5 on wave/0-sanity. |
| One Explore audit subagent per wave, before opening the PR | Read-only pass catches inconsistencies (e.g. orphan problem.py files) before review. Cheap because Explore doesn’t edit. | Started in wave 0 with “Audit wave 0 PR #2”. Held through all 12 waves. |
| SendMessage(shutdown_request) to free worker context | Each wave’s workers got a shutdown after their PR landed, so the orchestrator’s context wasn’t bloated by stale teammate transcripts. | Every wave-end SendMessage. 69 total. |
| Pure-numpy + matplotlib constraint | Forced workers to demonstrate algorithmic faithfulness (no torch/gym shortcuts). All 58 stubs ran <5 min/seed on a laptop. | Codified in SPEC #1 before wave 0. Enforced by per-worker prompt and verified in audits. |
| Honest non-replication acknowledged in audit | hq-learning-pomdp failed to reproduce the paper’s HQ-vs-flat gap. The wave-3 audit at 03:35 UTC documented it with the γ^Δt · HV ≤ R_goal analysis instead of fudging the result. | One incident. Reproduced in BUILD_NOTES.md ➜ Honest non-replication. |
| Yad’s “I need you to not rely on me anymore” hop at wave-3 entry | Switched from per-wave approval to a fully autonomous wave→audit→PR→next-wave loop. Eight subsequent waves ran without further user intervention. | 2026-05-07T02:11:39 UTC. Verified by the 8+ hour gap from wave 3 launch to the next user prompt. |
| Algorithmic faithfulness rule (per family) | Wave-6 LSTM stubs all use a hand-rolled LSTM cell + BPTT; wave-8 evolutionary stubs all evolve weights; Levin/OOPS stubs keep universal search. No shortcut substitutions. | Codified in SPEC #1 before wave 0. Surface symptom: linear-transformers-fwp proves the 1992-FWP ≡ 2021-linear-attention equivalence to 2.22e-16, which couldn’t have happened with a shortcut substitution. |
| mdBook + GitHub Pages for the catalog | Single source for stub READMEs renders as a navigable site. Per-stub GIF assets + viz directories ship together. | After meta PR #16 at 15:50 UTC. Site live at cybertronai.github.io/schmidhuber-problems/. |
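
The local-only branch convention in the table (workers commit to wave-N-local/<stub>, the orchestrator collects them into one PR per wave) can be sketched as a grouping step. This is a minimal illustration, not the orchestrator's actual code: the helper names `list_local_branches` and `group_by_wave` are hypothetical, and the only thing taken from this page is the wave-N-local/<stub> naming pattern.

```python
import re
import subprocess
from collections import defaultdict

# Branch naming taken from the table above: wave-N-local/<stub>.
LOCAL_BRANCH = re.compile(r"^wave-(\d+)-local/(.+)$")


def list_local_branches():
    """Return the short names of all local branches in the current repo."""
    out = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname:short)", "refs/heads/"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def group_by_wave(branch_names):
    """Group wave-N-local/<stub> branches by wave number (hypothetical helper).

    Branches that do not match the convention (main, wave/0-sanity, ...)
    are ignored.
    """
    waves = defaultdict(list)
    for name in branch_names:
        match = LOCAL_BRANCH.match(name)
        if match:
            waves[int(match.group(1))].append(match.group(2))
    return {wave: sorted(stubs) for wave, stubs in sorted(waves.items())}
```

Calling `group_by_wave(list_local_branches())` inside a checkout would give one entry per wave, which is the unit a single wave PR is built from under this convention.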

What didn’t (and the fix)

| Pattern | Discovery | Fix |
| --- | --- | --- |
| Per-stub remote branches (impl/<slug>) were branch spam | 2026-05-07T01:31:11 UTC. Yad: “why are u doing a branch per impl, should it be per waves?? why the branch spam. THIS IS WRONG PRACTICE COURSE CORRECT!” | Wave 2+ switched to wave-N-local/<stub> LOCAL ONLY. PR #2 closed, reissued as PR #5 on wave/0-sanity. All impl/<slug> remote branches deleted. |
| Workers committed locally then went silent without sending a summary | Wave 3 (multiple workers): the orchestrator detected workers idling after their commit and had to nudge them. | Lead’s first SendMessage to silent workers was “Looks like you committed locally and went idle without sending a summary. Per the wave-3 protocol, please send your summary now.” Pattern repeated at least 4 times across waves 3, 10, 11. Worker prompt template was updated for later waves: “DO NOT GO SILENT — send a summary explicitly.” |
| Wave 6 + wave 7 left orphan problem.py stub files | Caught by the per-wave audit agent. Workers wrote the new implementation files but didn’t git rm the placeholder. | Lead added a cleanup commit on top of each wave merge. After wave 7, SPEC was updated to emphasize “Remove problem.py explicitly” in every dispatch prompt. No further orphans. |
| One commit in wave 3 was authored as agent-pomdp-flag-maze-builder <agent@anthropic.com> | Wave-3 audit at 03:35 UTC flagged it as non-blocking. The per-worktree git config was overridden by Claude Code’s session-default identity. | Resolved by a bulk git filter-branch rewrite at 2026-05-08T16:12 UTC: 74 agent-authored commits → Yad Konrad <yad.konrad@gmail.com>. Force-pushed main. Memory now has feedback_git_author.md so this is checked before every commit. |
| GitHub Pages deploy failed first try | 2026-05-08T15:50:41 UTC: “Ensure GitHub Pages has been enabled” | One API call: gh api -X POST repos/cybertronai/schmidhuber-problems/pages -F build_type='workflow'. Workflow re-run succeeded at 15:53. |
| Token math in BUILD_NOTES was first written from memory | Discovered post-merge that the original prose had fabricated counts. | Reissued as PR #20 docs/token-math-correction (closes #19), which rewrote BUILD_NOTES from the actual JSONL session log. Lesson: prose from memory is unreliable when the source data is right there. |
| Long single pages get dense and hard to navigate | Yad on 2026-05-23: “i feel like the analysis should be better separated, in the sidebar at least” | Sidebar regrouped into 4 sections (Build internals / The orchestration / The worker template / Roadmap). worker-prompt-anatomy.md got an “On this page” TOC and a <details> wrapper around the verbose full-prompt block. |
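
The orphan problem.py row describes a check that is easy to automate. The sketch below is one way a read-only audit could look for leftover placeholders; it is not the audit agent's actual procedure. It assumes, hypothetically, that each stub lives in its own directory directly under the repo root, and that a placeholder counts as an orphan once other .py files sit next to it. `find_orphan_placeholders` is an illustrative name.

```python
from pathlib import Path


def find_orphan_placeholders(repo_root, placeholder="problem.py"):
    """Return placeholder files that were left behind next to real code.

    Assumption (not from the source): one directory per stub directly
    under repo_root. A placeholder is reported only if the same directory
    also contains at least one other .py file, i.e. the implementation
    landed but the placeholder was never removed.
    """
    orphans = []
    stub_dirs = sorted(p for p in Path(repo_root).iterdir() if p.is_dir())
    for stub_dir in stub_dirs:
        candidate = stub_dir / placeholder
        if not candidate.is_file():
            continue
        others = [f for f in stub_dir.glob("*.py") if f.name != placeholder]
        if others:
            orphans.append(candidate)
    return orphans
```

The function only reads the tree, which matches the read-only role the audit pass has on this page; removing the files stays a separate, explicit commit.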

Open

| Pattern | Why open |
| --- | --- |
| Was every audit worth its cost? | 12 audit dispatches added ~3-8% overhead each. Wave 0 (1 stub) almost certainly didn’t need one. |
| Could waves with no family overlap run in parallel? | Waves 5 (predictability) and 8 (evolutionary) share no code. They ran serially, costing ~6 wall-hours total; could have been ~3. |
| Worker self-audit instead of separate Explore? | If each worker wrote its own audit notes before sending the summary, the per-wave Explore dispatch could be merged in. Saves one dispatch per wave (12 total). |
| cache_write_1h was 36% of total cost. | Each long-running cache invalidation cost real money. Open question whether tighter tool-list discipline (don’t reload tools mid-session) would cut this. |

How to read this with the data

Every row above can be cross-checked:

  • Timestamps: grep the timestamp field in ../analysis/data/sessions.jsonl for the orchestrator session.
  • Worker silence incidents: filter ../analysis/data/team_messages.tsv for Request summary message in the head column.
  • Audit dispatches: filter ../analysis/data/agent_dispatches.tsv for subagent_type=Explore.
  • Yad’s quotes: see Pivot moments for the full verbatim list with timestamps.
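
The cross-checks above can be sketched in a few lines of Python. This is a hedged illustration rather than the analysis scripts themselves: the `timestamp` field and the `head` and `subagent_type` columns come from this page, but the session-id field name (`sessionId` below) and the helper names are assumptions that should be adjusted to the real files.

```python
import csv
import json


def session_timestamps(lines, session_id, id_field="sessionId"):
    """Collect the timestamp of every JSONL record from one session.

    id_field is an assumed field name; check the actual JSONL schema.
    """
    stamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get(id_field) == session_id and "timestamp" in record:
            stamps.append(record["timestamp"])
    return stamps


def filter_tsv(handle, column, needle, exact=False):
    """Return TSV rows whose column equals (or contains) needle."""
    # QUOTE_NONE keeps literal quote characters in message text intact.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        value = row.get(column) or ""
        matched = (value == needle) if exact else (needle in value)
        if matched:
            rows.append(row)
    return rows
```

For example, opening ../analysis/data/agent_dispatches.tsv and calling `filter_tsv(handle, "subagent_type", "Explore", exact=True)` would list the audit dispatches, and a substring match on the `head` column of ../analysis/data/team_messages.tsv would surface the worker-silence nudges.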