Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Session Report: Building schmidhuber-problems via Agent Teams

Output: cybertronai/schmidhuber-problems — 58 stubs, 13 PRs (14 created, 1 closed-and-reissued), all merged Source log: ~/.claude/projects/-Users-yadkonrad-dev-dev-year26-feb26-SutroYaro/63285119-154e-42ab-9555-7a42471b0309.jsonl (2,282 events) Span: 2026-05-06T23:03 → 2026-05-08T16:16 UTC (~41.3 wall hours) Lead session: SutroYaro Companion to: hinton-problems BUILD_NOTES (53 Hinton stubs, May 1-3)

This report is reconstructed from the live session log, not from memory. Earlier drafts had fabricated counts; this revision is the source-of-truth version.


TL;DR for the video opener

  • 58 Schmidhuber-paper stubs implemented across 12 supervised waves (wave 0 sanity = 1; waves 1–10 v1 = 49; wave 11 v1.5 = 8). Pure numpy + matplotlib. All <5 min/seed on a laptop.
  • The SPEC was a single GitHub issue (#1) — adapted from hinton-problems issue #1.
  • The dispatcher was Claude Code’s agent-teams primitive — one team schmidhuber-impl (agent_type: orchestrator), 12 waves, fresh teammates per wave.
  • Two human prompts mid-run reshaped the build:
    • 2026-05-07T01:31:11Z — “why are u doing a branch per impl, should it be per waves?? why the branch spam. THIS IS WRONG PRACTICE COURSE CORRECT!” → wave 1 → wave 2 protocol pivot to local-only wave-N-local/<slug> branches.
    • 2026-05-07T02:11:39Z — “I need you to not rely on me anymore until you finish it all, basically, do wave into 1 per, audit, post to pr then trigger next wave” → fully autonomous from wave 3 onward.
  • One honest non-replication (hq-learning-pomdp) acknowledged in the wave-3 audit at 2026-05-07T03:35Z, with mathematical analysis (γ^Δt · HV ≤ R_goal bound).
  • Post-merge author rewrite at 2026-05-08T16:12Z fixed git authorship across the entire repo via git filter-branch: 74 agent-authored commits → Yad Konrad <yad.konrad@gmail.com>.

The actual chain of events

Timestamp (UTC)Event
2026-05-06T23:03:33Session opens in SutroYaro
2026-05-06T23:03:37Yad invokes sutro-sync skill — only skill call in the entire session — to pull Telegram + Google Docs + GitHub state. Surfaces Yaroslav’s Schmidhuber suggestion.
2026-05-06T23:09:41Lead dispatches first Explore audit subagent: “Survey schmidhuber-problems repo”
2026-05-06T23:20:41SPEC opened as issue #1 — the contract for every teammate. Title: “Spec: minimum implementation requirements for Schmidhuber-problem stubs (v1)”
2026-05-06T23:24:21First teammate dispatched: nbb-xor-builder (wave 0 sanity)
2026-05-06T23:56:21Wave-0 PR opened on impl/nbb-xor (PR #2)
2026-05-06T23:56:38v1.5 follow-up issue #3 opened
2026-05-07T00:11:17Yad: “alright shall we do clean up and dispathc multiple agents to finish the rest of the waves?” — wave 1 trigger
2026-05-07T00:20:49Wave 1 dispatch begins (6 teammates)
2026-05-07T01:31:11Yad: “why are u doing a branch per impl, should it be per waves?? why the branch spam. THIS IS WRONG PRACTICE COURSE CORRECT!”
2026-05-07T01:38:19PR #2 closed; reissued as PR #5 on wave/0-sanity branch. All impl/<slug> remote branches deleted. From wave 2+, per-stub branches stay LOCAL ONLY.
2026-05-07T01:28:53Wave 1 PR #4 opened (wave/1-search)
2026-05-07T01:57:22Wave 2 dispatch begins (5 teammates)
2026-05-07T02:11:39Yad: “I need you to not rely on me anymore until you finish it all… do wave into 1 per, audit, post to pr then trigger next wave” — autonomous mode engaged
2026-05-07T02:33:12Wave 2 PR #6 opened
2026-05-07T03:35:08Wave 3 audit: lead acknowledges hq-learning-pomdp as honest non-replication (“paper’s HQ-vs-flat headline gap does NOT reproduce on the 29-cell maze. Implementation faithful”)
2026-05-07T12:16:45Wave 3 PR #7 opened
2026-05-07T12:49:16Wave 4 PR #8 opened
2026-05-07T13:15:48Wave 5 PR #9 opened
2026-05-07T14:33:36Wave 6 PR #10 opened (cleanup commit on top: removed orphan noise-free-long-lag/problem.py)
2026-05-07T15:28:24Wave 7 PR #11 opened (cleanup commit on top: removed orphan blues-improvisation/problem.py)
2026-05-07T16:57:11Wave 8 PR #12 opened
2026-05-07T17:22:01Wave 9 PR #13 opened
2026-05-07T18:07:35Wave 10 PR #14 opened — v1 complete at 50/50
2026-05-08T12:07:27Wave 11 (v1.5) dispatch begins (8 teammates for heavyweight-env stubs)
2026-05-08T14:49:01Wave 11 PR #15 opened — v1+v1.5 complete at 58/58
2026-05-08T15:38:20Meta PR #16 opened (mdBook config, BUILD_NOTES, RESULTS, VISUAL_TOUR, README catalog, GH Pages workflow)
2026-05-08T15:49:49All 13 PRs merged via gh pr merge in sequence
2026-05-08T15:50:41First Pages deploy attempt fails: “Ensure GitHub Pages has been enabled”
2026-05-08T15:53:21Pages enabled via gh api -X POST repos/.../pages -F build_type='workflow'; workflow re-run; site live at https://cybertronai.github.io/schmidhuber-problems/
2026-05-08T16:09:24Yad: “wtf why its claude agent-0bserver07 and not fucking claude 0bserver07? claude agent-0bserver07 was for comment only”
2026-05-08T16:12:01git filter-branch rewrite: 74 agent-authored commits → Yad Konrad <yad.konrad@gmail.com>. Force-pushed main. Site rebuilt with corrected attribution.
2026-05-08T~16:14README formatting polish (header bullets, lineage paragraph broken into bullet list) per Yad’s feedback.
2026-05-08T16:16:50Last logged event in this session

The SPEC (issue #1) — the actual contract

The contract between Yad and every teammate was a single GitHub issue. Not chat. Not a system prompt. An issue every PR linked back to.

It defined:

  • Required files per stub: <slug>.py, README.md, make_<slug>_gif.py, visualize_<slug>.py, <slug>.gif, viz/
  • 8 README sections: Header / Problem / Files / Running / Results / Visualizations / Deviations / Open questions
  • Reproducibility rules: seed exposed via CLI, all hyperparameters in Results, command in §Running reproduces the number
  • Acceptance checklist (10 boxes)
  • Schmidhuber-specific additions:
    • Algorithmic faithfulness > optimizer convenience: long-time-lag stubs use the paper’s recurrent architecture; evolutionary stubs use the paper’s evolutionary optimizer; Levin/OOPS stubs keep universal search. No backprop shortcuts.
    • Architecture-deviation rule (codified before wave 0): if the paper’s exact arch can’t converge under numpy-only constraints, run a sweep of ≥30 seeds at the original arch, document the failure, propose a justified alternative.
    • RL-stub rule: numpy mini-environments. No gym/gymnasium. Original-simulator reruns deferred to v2.

The orchestration model

                     ┌──────────────────┐
                     │ schmidhuber-impl │  (TeamCreate, agent_type=orchestrator)
                     └─────────┬────────┘
                               │
                  ┌────────────┼────────────┐
                  │            │            │
            Wave 0/1/…/11  SendMessage   Subagent dispatches
                               │            │
                               ▼            ▼
                          ┌──────────┐  ┌──────────────┐
                          │ teammates │  │ Agent tool   │
                          │ <slug>-   │  │ general-     │
                          │ builder   │  │ purpose 58×  │
                          │ x58       │  │ Explore  15× │
                          └────┬─────┘  └──────┬───────┘
                               │               │
                               ▼               ▼
                       worktree branch    PR audits, code reads
                       wave-N-local/<slug>
                               │
                               ▼
                       (LOCAL ONLY — DO NOT PUSH)
                               │
                               ▼
                       lead octopus-merges into wave/N-<family>
                               │
                               ▼
                       gh pr create → wave PR
                               │
                               ▼
                       audit subagent → audit comment on PR
                               │
                               ▼
                       SendMessage(shutdown_request)
                               │
                               ▼
                          Next wave starts fresh

Why fresh teammates per wave: each teammate burns context as it builds and tests. Shutting down between waves keeps later waves running on full context windows. The lead persists; the workers turn over.

Why LOCAL ONLY per-stub branches (the wave-1 → wave-2 fix): pushing 6 impl/<slug> branches per wave to remote was branch spam. Yad called it out at 2026-05-07T01:31. Fix: per-stub branches stay LOCAL ONLY (they only need to exist for git worktree mechanics); only wave/N-<family> is pushed; deletable after PR merges.


What the session actually used (verified counts from the JSONL)

Tool calls in the lead session

ToolCallsWhat for
Bash140git, gh CLI, file ops, running tests, workflow checks
Agent73subagent dispatches: 58 general-purpose builders + 15 Explore auditors
SendMessage69inter-teammate messaging (shutdowns + summary requests)
TaskUpdate34shared task list maintenance
Read16reading paper PDFs, stub code, READMEs
TaskCreate15new tasks added to the team’s list
Write11new files (READMEs, scripts, configs)
Edit10small in-place edits
AskUserQuestion7direction-clarifying questions to Yad
ToolSearch3loading deferred tool schemas
Skill1only sutro-sync at session start
TaskList1one snapshot
TeamCreate1the schmidhuber-impl team itself
TeamDelete1end-of-session cleanup

Subagent dispatches (Agent tool, n=73)

TypeCountUse
general-purpose58per-stub builders (one per stub across 12 waves)
Explore15initial repo survey + 12 per-wave audits + 2 BUILD_NOTES data-extraction passes

GitHub artifacts produced

  • 2 issues created: #1 (SPEC) + #3 (v1.5 follow-up)
  • 14 PRs created: PR #2 (closed and reissued as #5), PRs #4, #5, #6, #7, #8, #9, #10, #11, #12, #13, #14, #15, #16
  • 13 PR audit comments (one per wave PR)
  • 2 cleanup commits on top of wave merges: wave 6 (noise-free-long-lag/problem.py orphan removed), wave 7 (blues-improvisation/problem.py orphan removed)
  • 13 PR merges in one batch (gh pr merge × 13 in sequence) at 2026-05-08T15:49
  • 1 repo edit to set the homepage URL
  • 1 GH API call to enable Pages with workflow build type

The waves at a glance

WaveFamilyStubsFirst dispatch (UTC)PR opened (UTC)PR #
0Sanity12026-05-06T23:242026-05-07T01:38#5
1Random search + universal program search62026-05-07T00:202026-05-07T01:28#4
2Local rules + world-model controllers52026-05-07T01:572026-05-07T02:33#6
3Online RL with hidden state52026-05-07T01:582026-05-07T12:16#7
4History compression + fast-weights + self-reference52026-05-07T03:082026-05-07T12:49#8
5Predictability min/max + unsupervised features42026-05-07T03:152026-05-07T13:15#9
6LSTM canonical battery (BPTT, half 1)62026-05-07T09:132026-05-07T14:33#10
7LSTM follow-ups52026-05-07T10:252026-05-07T15:28#11
8Evolutionary42026-05-07T11:362026-05-07T16:57#12
9Deep MLPs at scale42026-05-07T12:422026-05-07T17:22#13
10Object-centric + attention + modern52026-05-07T13:522026-05-07T18:07#14
11v1.5 — heavyweight-env stubs (numpy synthetic substitutes)82026-05-08T12:072026-05-08T14:49#15

Plus the meta PR (#16) for site + BUILD_NOTES + RESULTS + VISUAL_TOUR + README catalog at 2026-05-08T15:38.

Total: 58 stubs in 12 waves + 1 meta PR.


Yad’s interaction pattern (the human side)

Three classes of prompt drove the project. Two stand out as direction-changing:

Type A — high-leverage direction (rare, big effects)

Timestamp (UTC)Quote
2026-05-07T00:11:17“alright shall we do clean up and dispathc multiple agents to finish the rest of the waves?” — wave-1 trigger
2026-05-07T01:31:11“why are u doing a branch per impl, should it be per waves?? why the branch spam. THIS IS WRONG PRACTICE COURSE CORRECT!” — wave 1 → 2 protocol pivot
2026-05-07T02:11:39“I need you to not rely on me anymore until you finish it all, basically, do wave into 1 per, audit, post to pr then trigger next wave” — autonomous-mode engaged
2026-05-08T16:09:24“wtf why its claude agent-0bserver07 and not fucking claude 0bserver07? claude agent-0bserver07 was for comment only” — git-author rewrite trigger
2026-05-08T~16:14“this needs to be on new line and readable” — README formatting fix

Type B — status checks (frequent, low cost)

  • “status?” / “status, what is left?” / “whats left rl?” — appears multiple times. Lead summarizes per-wave progress and continues.

Type C — review and merge approvals

  • “review it/audit and post the comment, then dispatch after please” (set the audit-then-dispatch loop)
  • “finish everything and deal with the full impelmentations” / “BUT FIRST FIRST FINISH THESE THINGS REMAINING” — wave 11 (v1.5) trigger
  • “have we verified thse things to be truely done or left over?” — surfaced the unmerged-PRs gap; explicit merge instruction followed

The session’s pivot moments are the corrections, not the kickoffs. The wave 1 → wave 2 branch-protocol fix and the wave-3 autonomous-mode engagement are what reshaped the build’s structure.


Honest non-replication: hq-learning-pomdp

Acknowledged in the wave-3 audit summary at 2026-05-07T03:35:08Z:

“Both HQ and flat Q solve during training (~100%) but both fail at 0% greedy eval — the paper’s HQ-vs-flat headline gap does NOT reproduce on the 29-cell maze. Implementation faithful, honest about the gap with mathematical analysis (γ^Δt · HV ≤ R_goal bound).”

This is exactly the SPEC’s methodological caveat applied: where the empirical headline of a paper does not reproduce on a smaller / faithful implementation, the contributor flags it honestly with the mechanistic reason, rather than fudging the result. The paper’s 62-cell maze is queued as a v1.5 follow-up.


Mid-run errors and recoveries

Three concrete error recoveries are visible in the session log:

  1. Wave 6 / 7 orphan problem.py files: When teammates wrote new stub files but didn’t git rm the placeholder problem.py, the audit subagent caught it. The lead added a cleanup commit on top of each wave merge. After wave 7, the SPEC’s “remove problem.py explicitly” was emphasized in every dispatch prompt; no further orphans appeared.

  2. GitHub Pages-not-enabled error: First deploy attempt at 2026-05-08T15:50:41 failed with “Ensure GitHub Pages has been enabled”. The build succeeded; the deploy step couldn’t create the deployment because Pages wasn’t enabled at the repo level. Fix: gh api -X POST repos/cybertronai/schmidhuber-problems/pages -F build_type='workflow'. Workflow re-run completed at 15:53:34.

  3. Git author drift: One commit in wave 3 was authored as agent-pomdp-flag-maze-builder <agent@anthropic.com> (the subagent’s session-default identity overrode the per-worktree config of agent-0bserver07@users.noreply.github.com). Caught in wave-3 audit; non-blocking. Resolved later by the bulk filter-branch rewrite at 2026-05-08T16:12.


What this session actually proves

  1. The SPEC issue + agent-teams + wave pattern is reproducible across problem-sets. Second use of the machinery (first: hinton-problems, 53 stubs in 30 hours, May 1-3). For a different lineage (algorithmic vs representational) with 58 stubs and harder constraints (RL-stub rule, algorithmic faithfulness rule), the same machinery shipped in ~41 wall hours.
  2. Mid-run protocol fixes work. Wave 1’s branch spam got corrected within minutes of Yad’s pushback. Wave 6/7’s orphan stubs got fixed via cleanup commits on top of merges. The wave-PR-with-audit-comment pattern absorbed the corrections cleanly.
  3. Honest non-replications are part of the deliverable, not a bug. hq-learning-pomdp ships with mathematical analysis. The honest report > a fudged success.
  4. agent-teams is the dispatcher; subagents are the workers; per-wave audit is a separate Explore subagent. Same machinery used in three layers, three different roles.
  5. Numpy-only constraint is enforceable across the catalog. 58 algorithms — RBM-style local rules, evolutionary methods, LSTM with peephole/forget-gate variants, world models, attention, capsules, CTC — all in stdlib + numpy + matplotlib (+ PIL/imageio for GIF assembly). MNIST loaded via urllib + gzip + struct from public mirrors.
  6. Post-merge author rewrite is feasible. When git author identity is wrong on a fresh repo with a sole owner, git filter-branch + force-push fixes it cleanly.

Concrete numbers

  • 58 / 58 v1+v1.5 stubs implemented (100%)
  • 32 reproduce paper claims (yes), 25 partial / qualitative (or synthetic substitute), 1 honest non-replication (with documented mathematical analysis)
  • 41.3 wall hours end-to-end (May 6 23:03 → May 8 16:16 UTC, 3 distinct days)
  • 2 GitHub issues, 14 PRs created (1 closed-and-reissued), 13 audit comments, 13 merges in one batch
  • 1 TeamCreate, 1 TeamDelete, 58 named builders + 15 audit subagents
  • Pure numpy + matplotlib, all under 5-min wallclock per stub except pipe-6-bit-parity (240s 6-bit cap), evolino-sines-mackey-glass (140s), lstm-search-space-odyssey (145s)
  • Algorithmic-faithfulness coverage: 9 RL stubs (numpy mini-envs per SPEC), 11 LSTM-family stubs (manual BPTT through cells with various gate variants), 4 evolutionary stubs (no gradient on hidden weights), 3 search stubs (Levin / OOPS / RS), 8 v1.5 substitutes (synthetic numpy data instead of TIMIT/IAM/ISBI/CarRacing/VizDoom/TORCS), 1 equivalence proof (linear-attention ≡ FWP to 2.22e-16)

Suggested video shot list

  1. Open on the SPEC issue (#1) on screen. “This is the entire contract.”
  2. Cut to the GitHub PRs page showing the 13 merged wave PRs.
  3. Show the Hinton precedent side-by-side. “Same machinery, different lineage. 53 stubs there, 58 here.”
  4. The branch-spam moment: paste Yad’s “THIS IS WRONG PRACTICE COURSE CORRECT!” (2026-05-07T01:31) and show the wave-1 → wave-2 protocol fix at 01:38 (PR #2 closed, PR #5 opened on wave/0-sanity). “This is what ‘human in the loop’ actually looks like.”
  5. The autonomous-mode pivot: paste Yad’s “I need you to not rely on me anymore until you finish it all” (2026-05-07T02:11) and show the lead running the audit-merge-dispatch loop without further user prompts through wave 11.
  6. Walk through one wave — pick wave 4 (history compression + fast-weights + self-reference, 5 stubs). Show the 5 teammate names, the consolidation into wave/4-history-fastweights, the audit comment, the merge.
  7. Show a single per-stub README (e.g., linear-transformers-fwp) — show how it satisfies all 8 spec sections AND verifies the 1992-FWP / 2021-linear-attention equivalence to 2.22e-16.
  8. Show the v1.5 wave — even the heavyweight-env stubs (TIMIT, IAM, ISBI, CarRacing, VizDoom, TORCS) ship as numpy synthetic substitutes, captured in the same machinery.
  9. The Pages-not-enabled error + 1-API-call fix. “Mid-run errors are part of the loop. The recovery is the boring obvious thing.”
  10. The git-author rewrite (2026-05-08T16:12). “58 commits, wrong author. git filter-branch + force-push, three minutes.”
  11. Close on the bottom-line numbers (58 stubs / 41 wall hours / pure numpy / 1 spec / 12 waves / 1 honest non-replication / 1 closed-and-reissued PR / 13 merges in one batch).

Generated from the live session log at ~/.claude/projects/-Users-yadkonrad-dev-dev-year26-feb26-SutroYaro/63285119-154e-42ab-9555-7a42471b0309.jsonl on 2026-05-08. Mirrors the hinton-problems BUILD_NOTES precedent. Source-of-truth revision; replaces the earlier draft that had fabricated counts.