
Task 4: Reproduce Germain's depth-1/hidden-64 ARD result

Priority: MEDIUM | Status: DONE | Source: Yaroslav verification doc, Meeting #8

Context

Germain's supervisor/researcher harness found:

  • Baseline ARD: ~48.1-48.9
  • Depth-1 + Hidden-64 ARD: ~33.1-34.5
  • Accuracy gate passed (>=0.90), time under 2s

But Yaroslav flagged: "the winning change coincides with a big drop in total_accesses (92k to 64k). That might be legitimately less work (fewer layers/params/state)... more research needed."

We need to check: is this a real energy improvement, or just doing less computation? If total work drops proportionally to the ARD drop, it's not a real locality win.
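
A concrete form of that check, as a minimal sketch (the helper name and the 5% margin are ours, not part of any harness):

```python
# Proportionality check described above: a real locality win means ARD per
# memory access improves, not just that a smaller model does fewer accesses.
# Hypothetical helper; the 5% margin is an arbitrary illustration.
def is_locality_win(ard_base, ard_new, accesses_base, accesses_new, margin=0.05):
    per_access_base = ard_base / accesses_base
    per_access_new = ard_new / accesses_new
    return per_access_new < per_access_base * (1 - margin)
```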

Tasks

  • Run our baseline with depth=1, hidden=64 using fast.py (see the run sketch after this list)
  • Measure ARD, total_accesses, accuracy, wall time
  • Compare ARD per unit of useful work (normalize by total_accesses or by accuracy)
  • Check if it solves n=20/k=3 reliably (Germain's was on a simpler config?)
  • Write findings
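
A sketch of the measurement loop, assuming fast.py exposes a run(depth, hidden, n, k, seed) entry point that returns ard, total_accesses, and accuracy; the real fast.py interface may differ:

```python
# Hypothetical driver for the measurements listed above. The sparse_parity.fast
# module path and run() signature are assumptions, not the confirmed API.
import time
from statistics import mean

from sparse_parity import fast  # assumed module path (src/sparse_parity/fast.py)

def measure(depth, hidden, seeds=range(5), n=20, k=3):
    results = []
    for seed in seeds:
        start = time.perf_counter()
        out = fast.run(depth=depth, hidden=hidden, n=n, k=k, seed=seed)
        out["wall_time"] = time.perf_counter() - start
        results.append(out)
    # Average over seeds; ARD per access is the locality metric of interest.
    return {
        "accuracy": mean(r["accuracy"] for r in results),
        "ard": mean(r["ard"] for r in results),
        "ard_per_access": mean(r["ard"] / r["total_accesses"] for r in results),
        "wall_time": mean(r["wall_time"] for r in results),
    }

baseline = measure(depth=2, hidden=200)  # our baseline: 2 layers, hidden=200
germain = measure(depth=1, hidden=64)    # Germain's winning config
```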

Results (5 seeds, n=20/k=3)

Config                   Accuracy   ARD     DMC       Total Floats
hidden=200 (baseline)    100%       6,589   740,165   18,308
hidden=64 (Germain)      100%       2,129   135,724   5,904
Improvement              -          67.7%   81.7%     67.7%

Verdict: Yaroslav was right to be skeptical. ARD per float is essentially identical (0.360 vs 0.361): the ARD improvement is entirely explained by the model being smaller (fewer parameters = fewer floats to access), and locality per unit of work is unchanged. Hidden=64 is not a locality win; it just does less computation. Both configs solve n=20/k=3 at 100%.
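
The arithmetic behind that verdict, plugged through with the table's numbers:

```python
# ARD per float accessed is essentially flat across configs, and the ARD drop
# matches the float drop almost exactly.
base_ard, base_floats = 6_589, 18_308   # hidden=200 baseline
g64_ard, g64_floats = 2_129, 5_904      # hidden=64 (Germain)

print(base_ard / base_floats)        # ~0.360 ARD per float
print(g64_ard / g64_floats)          # ~0.361 ARD per float
print(1 - g64_ard / base_ard)        # ~0.677 -> the 67.7% ARD "improvement"
print(1 - g64_floats / base_floats)  # ~0.677 -> matched by the float reduction
```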

This is still useful information: for sparse parity, hidden=64 is sufficient and uses roughly 3x less energy (ARD 6,589 vs 2,129). But it doesn't transfer to harder problems that need more capacity.

References

  • Yaroslav verification: docs/google-docs/yaroslav-verification.md (Germain section)
  • Our baseline: src/sparse_parity/fast.py (hidden=200, 2 layers)
  • G B's prior result in CLAUDE.md: "Architecture experiments (depth-1/hidden-64, ARD ~33-35)"