What worked, what didn’t
By Yad Konrad — @0bserver07
Binary-classified observations. Each row has the moment of discovery (timestamp from the orchestrator’s session log) and the fix that landed. Drawn from the actual 40 Yad-typed prompts + 73 Agent dispatches + 69 SendMessages in orchestrator session 63285119-154e-42ab-9555-7a42471b0309. (The orchestrator’s JSONL has 192 type=user records total, but 152 of those are workers reporting back to the orchestrator, slash commands, and skill loaders — not Yad’s input. See Sessions for the breakdown.)
See Sessions and Orchestration map for the data behind each row.
What worked
| Pattern | Why it worked | When it stuck |
|---|---|---|
| One persistent team (schmidhuber-impl), ephemeral teammates | TeamCreate once at 23:23 UTC, then 73 Agent dispatches inherited the team description. Per-worker prompts stayed short because the team contract did the heavy lifting. | Wave 0 onward. The team persisted across all 12 waves; only teammates turned over. |
| One SPEC issue as the source of truth | Every worker prompt links to issue #1 and reads it first. No SPEC drift across 58 workers. | Wave 0. Yad opened the SPEC at 23:20 UTC, before the first teammate dispatch. |
| LOCAL-ONLY per-stub branches + one wave PR | Workers commit to wave-N-local/<stub> and never push. The orchestrator collects branches and opens one PR per wave. Stopped the per-stub branch spam. | After Yad’s “THIS IS WRONG PRACTICE COURSE CORRECT!” pushback at 2026-05-07T01:31 UTC. PR #2 was closed and reissued as PR #5 on wave/0-sanity. |
| One Explore audit subagent per wave, before opening the PR | Read-only pass catches inconsistencies (e.g. orphan problem.py files) before review. Cheap because Explore doesn’t edit. | Started wave 0 with “Audit wave 0 PR #2”. Held through all 12 waves. |
| SendMessage(shutdown_request) to free worker context | Each wave’s workers got a shutdown after their PR landed, so the orchestrator’s context wasn’t bloated by stale teammate transcripts. | Every wave-end SendMessage. 69 total. |
| Pure-numpy + matplotlib constraint | Forced workers to demonstrate algorithmic faithfulness (no torch/gym shortcuts). All 58 stubs ran <5 min/seed on a laptop. | Codified in SPEC #1 before wave 0. Enforced by per-worker prompt and verified in audits. |
| Honest non-replication acknowledged in audit | hq-learning-pomdp failed to reproduce the paper’s HQ-vs-flat gap. The wave-3 audit at 03:35 UTC documented it with the γ^Δt · HV ≤ R_goal analysis instead of fudging the result. | One incident. Reproduced in BUILD_NOTES.md ➜ Honest non-replication. |
| Yad’s “I need you to not rely on me anymore” hop at wave-3 entry | Switched from per-wave-approval to fully autonomous wave→audit→PR→next-wave loop. Eight subsequent waves ran without further user intervention. | 2026-05-07T02:11:39 UTC. Verified by the 8+ hour gap from wave 3 launch to the next user prompt. |
| Algorithmic faithfulness rule (per family) | Wave-6 LSTM stubs all use a hand-rolled LSTM cell + BPTT; wave-8 evolutionary stubs all evolve weights; Levin/OOPS stubs keep universal search. No shortcut substitutions. | Codified in SPEC #1 before wave 0. Surface symptom: linear-transformers-fwp proves the 1992-FWP ≡ 2021-linear-attention equivalence to 2.22e-16 — couldn’t have happened with a shortcut substitution. |
| mdBook + GitHub Pages for the catalog | Single source for stub READMEs renders as a navigable site. Per-stub GIF assets + viz directories ship together. | After meta PR #16 at 15:50 UTC. Site live at cybertronai.github.io/schmidhuber-problems/. |
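
The local-only branch pattern in the table above implies a collection step on the orchestrator side: gather every wave-N-local/<stub> branch and bundle the stubs of one wave into a single PR. A minimal sketch of that grouping, assuming branch names arrive as plain strings (for example from `git branch --format='%(refname:short)'`). The function name `group_local_branches` and the sample stub names other than hq-learning-pomdp are hypothetical, and this is not the orchestrator's actual code.

```python
import re
from collections import defaultdict

# Matches the local-only naming scheme described above: wave-N-local/<stub>.
BRANCH_RE = re.compile(r"^wave-(\d+)-local/(.+)$")


def group_local_branches(branch_names):
    """Map wave number -> sorted stub names, ignoring any other branch."""
    waves = defaultdict(list)
    for name in branch_names:
        match = BRANCH_RE.match(name.strip())
        if match:
            waves[int(match.group(1))].append(match.group(2))
    return {wave: sorted(stubs) for wave, stubs in sorted(waves.items())}


# Hypothetical input; only hq-learning-pomdp is a stub name from this page.
branches = [
    "main",
    "wave-3-local/hq-learning-pomdp",
    "wave-3-local/example-stub",
    "wave-6-local/another-stub",
]
grouped = group_local_branches(branches)
```

Each key of `grouped` then corresponds to one wave PR, which is the property that removed the per-stub branch spam.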
What didn’t (and the fix)
| Pattern | Discovery | Fix |
|---|---|---|
| Per-stub remote branches (impl/<slug>) were branch spam | 2026-05-07T01:31:11 UTC. Yad: “why are u doing a branch per impl, should it be per waves?? why the branch spam. THIS IS WRONG PRACTICE COURSE CORRECT!” | Wave 2+ switched to wave-N-local/<stub> LOCAL ONLY. PR #2 closed, reissued as PR #5 on wave/0-sanity. All impl/<slug> remote branches deleted. |
| Workers committed locally then went silent without sending a summary | Wave 3 (multiple workers): the orchestrator detected workers idling after their commit and had to nudge them | Lead’s first SendMessage to silent workers was “Looks like you committed locally and went idle without sending a summary. Per the wave-3 protocol, please send your summary now.” The pattern repeated at least 4 times across waves 3, 10, and 11. The worker prompt template was updated for later waves: “DO NOT GO SILENT — send a summary explicitly.” |
| Wave 6 + wave 7 left orphan problem.py stub files | Caught by the per-wave audit agent. Workers wrote the new implementation files but didn’t git rm the placeholder | Lead added a cleanup commit on top of each wave merge. After wave 7, SPEC was updated to emphasize “Remove problem.py explicitly” in every dispatch prompt. No further orphans. |
| One commit in wave 3 was authored as agent-pomdp-flag-maze-builder <agent@anthropic.com> | Wave-3 audit at 03:35 UTC flagged it as non-blocking. The per-worktree git config was overridden by Claude Code’s session-default identity | Resolved by a bulk git filter-branch rewrite at 2026-05-08T16:12 UTC: 74 agent-authored commits → Yad Konrad <yad.konrad@gmail.com>. Force-pushed main. Memory now has feedback_git_author.md so this is checked before every commit. |
| GitHub Pages deploy failed first try | 2026-05-08T15:50:41 UTC — “Ensure GitHub Pages has been enabled” | One API call: gh api -X POST repos/cybertronai/schmidhuber-problems/pages -F build_type='workflow'. Workflow re-run succeeded at 15:53. |
| Token math in BUILD_NOTES was first written from memory | Discovered post-merge that the original prose had fabricated counts | Reissued PR #20 docs/token-math-correction (closes #19) — rewrote BUILD_NOTES from the actual JSONL session log. Lesson: prose from memory is unreliable when the source data is right there. |
| Long single pages get dense and hard to navigate | Yad on 2026-05-23: “i feel like the analysis should be better separated, in the sidebar at least” | Sidebar regrouped into 4 sections (Build internals / The orchestration / The worker template / Roadmap). worker-prompt-anatomy.md got an “On this page” TOC and a <details> wrapper around the verbose full-prompt block. |
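
The orphan problem.py row above describes a check that is easy to automate: a placeholder counts as an orphan when it still sits next to real implementation files. A minimal sketch under assumed conventions, not the audit agent's actual procedure. The layout (one directory per stub, implementation files as sibling `.py` files), the function name `find_orphan_placeholders`, and the demo file names are all hypothetical.

```python
import tempfile
from pathlib import Path


def find_orphan_placeholders(root, placeholder="problem.py"):
    """Return directories where the placeholder coexists with other .py files."""
    orphans = []
    for candidate in sorted(Path(root).rglob(placeholder)):
        siblings = [p for p in candidate.parent.glob("*.py") if p.name != placeholder]
        if siblings:  # an implementation landed, but the placeholder was never removed
            orphans.append(candidate.parent)
    return orphans


def demo_orphans():
    """Build a throwaway tree with one orphan and one untouched stub."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for stub in ("stub-a", "stub-b"):
            (root / stub).mkdir()
            (root / stub / "problem.py").write_text("# placeholder\n")
        # Only stub-a has an implementation file next to its placeholder.
        (root / "stub-a" / "model.py").write_text("# implementation\n")
        return [path.name for path in find_orphan_placeholders(root)]
```

In the demo, stub-b is not reported because it holds only the placeholder, meaning it has not been implemented yet rather than left uncleaned.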
Open
| Pattern | Why open |
|---|---|
| Was every audit worth its cost? | 12 audit dispatches added ~3-8% overhead each. Wave 0 (1 stub) almost certainly didn’t need one. |
| Could waves with no family overlap run in parallel? | Waves 5 (predictability) and 8 (evolutionary) share no code. They ran serially, costing ~6 wall-hours total; could have been ~3. |
| Worker self-audit instead of separate Explore? | If each worker wrote its own audit notes before sending the summary, the per-wave Explore dispatch could be merged in. Saves one dispatch per wave (12 total). |
| cache_write_1h was 36% of total cost. | Each long-running cache invalidation cost real money. Open question whether tighter tool-list discipline (don’t reload tools mid-session) would cut this. |
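
The parallel-waves row rests on simple arithmetic: independent waves run serially cost the sum of their durations, while run concurrently they cost only the longest one. A small sketch of that reasoning follows. The even 3-hour split per wave is an assumption chosen to match the ~6 h serial total quoted above, not a measured figure, and `wall_hours` is a hypothetical helper.

```python
def wall_hours(wave_hours, parallel):
    """Wall-clock hours for a set of waves that share no code."""
    durations = list(wave_hours)
    # Serial runs add up; parallel runs finish when the slowest wave finishes.
    return max(durations) if parallel else sum(durations)


# Assumed durations (hours); only the ~6 h total comes from the table above.
waves = {"wave 5 (predictability)": 3.0, "wave 8 (evolutionary)": 3.0}
serial = wall_hours(waves.values(), parallel=False)
concurrent = wall_hours(waves.values(), parallel=True)
```

With an uneven split the saving shrinks, since the concurrent figure is bounded below by the slower of the two waves.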
How to read this with the data
Every row above can be cross-checked:
- Timestamps: grep the `timestamp` field in `../analysis/data/sessions.jsonl` for the orchestrator session.
- Worker silence incidents: filter `../analysis/data/team_messages.tsv` for `Request summary message` in the `head` column.
- Audit dispatches: filter `../analysis/data/agent_dispatches.tsv` for `subagent_type=Explore`.
- Yad’s quotes: see Pivot moments for the full verbatim list with timestamps.
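
The cross-checks listed above can be scripted. The sketch below is one possible way to do it, assuming the TSV files carry a header row and that each JSONL record stores its session under a `sessionId` key. That key name, the `description` column, and the sample rows are assumptions for illustration; only the file paths and the `timestamp`, `head`, and `subagent_type` names come from this page.

```python
import csv
import io
import json


def filter_tsv(text, column, needle):
    """Return rows of a tab-separated table whose `column` contains `needle`."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if needle in (row.get(column) or "")]


def session_timestamps(jsonl_text, session_id, key="sessionId"):
    """Collect the `timestamp` of every record belonging to one session."""
    stamps = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # `key` is an assumed field name; adjust it to the real schema.
        if record.get(key) == session_id and "timestamp" in record:
            stamps.append(record["timestamp"])
    return stamps


# Hypothetical sample data standing in for the real files.
dispatches = (
    "subagent_type\tdescription\n"
    "Explore\tAudit wave 0 PR #2\n"
    "general-purpose\tbuild one stub\n"
)
audits = filter_tsv(dispatches, "subagent_type", "Explore")

log = (
    '{"sessionId": "s1", "timestamp": "2026-05-07T01:31:11Z"}\n'
    '{"sessionId": "s2", "timestamp": "2026-05-07T02:11:39Z"}\n'
)
stamps = session_timestamps(log, "s1")
```

To run it against the real data, read each file with `pathlib.Path(...).read_text()` and pass the text in. The same `filter_tsv` call covers the worker-silence check by filtering the `head` column for `Request summary message`.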