# Session Findings: 2026-03-11
Post-Meeting #8 sync, metric work, and task triage.
## What happened at Meeting #8 (09 Mar)
Four people demoed their agent setups on the sparse parity challenge:
- Yad: Claude Code harness with parallel sub-agents. Found GF(2) solver (1000x faster than SGD). Yaroslav cloned the repo, ran it, and verified the result independently using Gemini. He also built a web visualizer for the algorithm.
- Michael: Claude-based approach. His agents preferred methods from the 1990s (evolutionary search, exhaustive enumeration).
- Germain: Replit-based "Research OS" with supervisor/researcher/verifier agents. Solutions favored 2010s methods. His agents also tried to rewrite the ARD measurement code to get better scores instead of improving the algorithm.
- Yaroslav: Presented Knowledge Sprint #2 on energy metrics and the "bigger picture" roadmap.
Homework for next Monday: improve Challenge #1 using ARD as the energy proxy, present results and process.
## Findings from this session
### 1. DMC metric added to tracker
Data Movement Complexity (Ding et al., arXiv:2312.14441) computes `sum(sqrt(stack_distance))` over all float accesses. Unlike ARD, which averages distance, DMC penalizes long-distance fetches only sub-linearly: the square root matches the physics of 2D chip layouts, where memory cost scales with the square root of distance.
Baseline (n=20/k=3, single tracked training step):
| Metric | Value |
|---|---|
| ARD | 4,104 floats |
| DMC | 300,298 |
| Total floats accessed | 9,646 |
Our tracker already measured stack distance in floats (clock advances by buffer size, not instruction count). Adding DMC was one line.
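As a minimal sketch (toy helper functions and a toy trace, not the actual tracker code), the two metrics over a list of per-access stack distances:

```python
import math

def ard(distances):
    """Average reuse distance: mean stack distance over all accesses."""
    return sum(distances) / len(distances)

def dmc(distances):
    """Data Movement Complexity (Ding et al., arXiv:2312.14441):
    sum of sqrt(stack distance), so a far fetch costs more than a
    near one, but only sub-linearly more."""
    return sum(math.sqrt(d) for d in distances)

# Toy trace: three near accesses and one far fetch.
trace = [1, 1, 4, 100]
print(ard(trace))  # 26.5
print(dmc(trace))  # 14.0  (1 + 1 + 2 + 10)
```

Note how the single far fetch dominates ARD but contributes only sqrt(100) = 10 to DMC.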
### 2. Germain's hidden=64 is not a locality win
Germain's agents found that depth-1/hidden-64 drops ARD from ~48 to ~33 (his numbers) or from 6,589 to 2,129 (our reproduction). Yaroslav flagged that total_accesses also dropped proportionally.
We confirmed: ARD per float accessed is identical (0.367 vs 0.368 across 5 seeds). The "improvement" is the model being 68% smaller, so it touches 68% fewer floats. The locality of each access is unchanged. Both configs solve n=20/k=3 at 100% accuracy.
This matters because it means shrinking the model is not a path to better energy efficiency per unit of computation. You save energy the same way you save time: by doing less work. That's useful but not what the group is after.
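The normalization check can be sketched as follows (hypothetical function name and toy numbers, not the measured values):

```python
def distance_per_access(total_distance, total_accesses):
    """Per-access locality. A pure model shrink reduces both the
    numerator and the denominator, leaving this ratio unchanged."""
    return total_distance / total_accesses

# Toy numbers: the smaller config moves fewer total floats only
# because it makes fewer accesses; each access is no more local.
print(distance_per_access(400.0, 1000))  # 0.4
print(distance_per_access(120.0, 300))   # 0.4
```

A genuine locality win would lower this ratio; here it stays flat, which is exactly the pattern we saw across 5 seeds.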
### 3. Metric isolation rule
Germain's agents rewrote the ARD measurement code to inflate scores. We already had read-only benchmark code for sub-agents, but now it's an explicit rule in LAB.md (#9): agents cannot modify tracker.py, cache_tracker.py, data.py, or config.py.
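One way to enforce this mechanically (a hypothetical sketch, not the actual LAB.md mechanism, which relies on read-only sub-agent access): hash the protected files before a run and re-check afterwards, failing the run on any change.

```python
import hashlib
from pathlib import Path

# Rule #9 in LAB.md: agents may not modify the measurement code.
PROTECTED = ["tracker.py", "cache_tracker.py", "data.py", "config.py"]

def snapshot(root="."):
    """Hash each protected file so any later edit is detectable."""
    return {name: hashlib.sha256(Path(root, name).read_bytes()).hexdigest()
            for name in PROTECTED if Path(root, name).exists()}

def check(before, root="."):
    """Raise if the metric code changed since the snapshot was taken."""
    after = snapshot(root)
    tampered = sorted(f for f in before if before[f] != after.get(f))
    if tampered:
        raise RuntimeError(f"metric code modified: {tampered}")
```

Usage: call `snapshot()` before launching agents, `check()` before trusting any reported score.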
### 4. Linear classifier paper (arXiv:2309.06979) is not applicable
The Telegram reading group flagged this paper as showing "a linear classifier can solve parity." On reading, it's about Chain-of-Thought auto-regressive prediction with intermediate reasoning tokens. A linear model on raw {-1,+1} inputs cannot solve parity (all pairwise correlations are zero, proven in exp_feature_select). The paper is relevant for the nanoGPT final exam, not for our current benchmark.
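A quick brute-force check of the zero-correlation claim (toy n=4 instance with a hypothetical parity subset, smaller than our n=20/k=3 benchmark):

```python
from itertools import product

n, S = 4, (0, 2)  # toy instance: parity over bits 0 and 2

inputs = list(product([-1, 1], repeat=n))
labels = [x[S[0]] * x[S[1]] for x in inputs]

# The correlation of the parity label with every single raw coordinate
# is exactly zero, so a linear model on {-1,+1} inputs sees no signal.
corrs = [sum(x[i] * y for x, y in zip(inputs, labels)) / len(inputs)
         for i in range(n)]
print(corrs)  # [0.0, 0.0, 0.0, 0.0]
```

Each product x_i * y reduces to a parity over a nonempty set of other coordinates, which is +1 and -1 equally often under the uniform distribution, hence the zeros.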
### 5. Yaroslav's three-axis roadmap
The "bigger picture" doc defines progress along three axes:
- Process (orange): improve how you use agents to find better algorithms. Yad's harness, Germain's Research OS, Michael's Claude approach.
- Metric (green): make the energy proxy more realistic. ARD to DMC to actual GPU measurement.
- Problem (blue): make the task harder. Sparse parity to nanoGPT.
The final exam: energy-efficient training of Karpathy's nanoGPT. Sparse parity is practice. Take small steps along one axis at a time.
## Files changed
- src/sparse_parity/tracker.py: added DMC computation
- LAB.md: added rule #9 (metric isolation), DMC baseline in table
- docs/tasks/: 6 task files tracking Meeting #8 feedback
- docs/tooling/sync-runbook.md: weekly/daily/per-session checklists
- 6 new Google Docs synced from Meeting #8
- mkdocs.yml: nav entries for all new pages