Task 4: Reproduce Germain's depth-1/hidden-64 ARD result¶

Priority: MEDIUM Status: DONE Source: Yaroslav verification doc, Meeting #8

Context¶

Germain's supervisor/researcher harness found:

Baseline ARD: ~48.1-48.9
Depth-1 + Hidden-64 ARD: ~33.1-34.5
Accuracy gate passed (>=0.90), time under 2s

But Yaroslav flagged: "the winning change coincides with a big drop in total_accesses (92k to 64k). That might be legitimately less work (fewer layers/params/state)... more research needed."

We need to check: is this a real energy improvement, or just doing less computation? If total work drops proportionally to the ARD drop, it's not a real locality win.

Tasks¶

Run our baseline with depth=1, hidden=64 using fast.py
Measure ARD, total_accesses, accuracy, wall time
Compare ARD per unit of useful work (normalize by total_accesses or by accuracy)
Check if it solves n=20/k=3 reliably (Germain's was on a simpler config?)
Write findings

Results (5 seeds, n=20/k=3)¶

Config	Accuracy	ARD	DMC	Total Floats
hidden=200 (baseline)	100%	6,589	740,165	18,308
hidden=64 (Germain)	100%	2,129	135,724	5,904
Improvement	-	67.7%	81.7%	67.7%

Verdict: Yaroslav was right to be skeptical. ARD/float is identical (0.360 vs 0.361). The ARD improvement is entirely explained by the model being smaller (fewer parameters = fewer floats to access). The locality per unit of work is unchanged. Hidden=64 is not a locality win, it's just doing less computation. Both solve n=20/k=3 at 100%.

This is still useful information: for sparse parity, hidden=64 is sufficient and uses 3x less energy. But it's not transferable to harder problems where you need more capacity.

References¶

Yaroslav verification: docs/google-docs/yaroslav-verification.md (Germain section)
Our baseline: src/sparse_parity/fast.py (hidden=200, 2 layers)
G B's prior result in CLAUDE.md: "Architecture experiments (depth-1/hidden-64, ARD ~33-35)"