Technical Sprint 2 Plan: COMPLETED
Historical
This plan was written after Sprint 1 and has been fully executed. All candidate algorithms were tested. See the findings linked below.
Background
Sprint 1 showed that gradient fusion improves ARD by ~16%, but the real bottleneck is that parameter tensors are read twice across the full forward+backward pass. Different algorithms are needed.
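The "read twice" pattern can be made concrete with a small sketch (our own illustration, not the sprint code; layer sizes are arbitrary) that counts how often each parameter tensor is touched in one forward+backward pass of a tiny 2-layer MLP:

```python
import numpy as np

# Count parameter-tensor reads across one forward+backward pass.
reads = {"W1": 0, "W2": 0}

def read(name, tensor):
    reads[name] += 1
    return tensor

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 20))        # batch of 20-bit inputs
W1 = rng.standard_normal((20, 64))
W2 = rng.standard_normal((64, 1))

# Forward pass: each weight matrix is read once.
h = np.maximum(x @ read("W1", W1), 0.0)
y = h @ read("W2", W2)

# Backward pass: each weight matrix is read a second time to
# propagate the error signal back through its layer.
dy = np.ones_like(y)                    # dummy upstream gradient
dW2 = h.T @ dy
dh = (dy @ read("W2", W2).T) * (h > 0)
dW1 = x.T @ dh
dx = dh @ read("W1", W1).T              # often skipped for the first layer

print(reads)  # → {'W1': 2, 'W2': 2}
```

Every weight matrix is read once going forward and once coming back, which is why reordering operations alone cannot remove the second read.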
Candidate Algorithms (All Tested)
| Algorithm | Result | Reference |
|---|---|---|
| Forward-Forward | 25x WORSE ARD, fails on 20-bit | Exp E |
| Per-layer update | 3.8% ARD improvement, converges identically | Exp C |
| Sign SGD | Solves k=5, 2x faster than standard SGD | findings |
| Curriculum learning | 14.6x speedup on n=50/k=3 | findings |
| Fourier solver | 13x faster than SGD for small k | findings |
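Of the candidates above, signSGD is the simplest to state: each parameter moves a fixed step in the direction of its gradient's sign, discarding gradient magnitude entirely. A minimal sketch (the learning rate and toy loss are our illustrations, not the sprint's hyperparameters):

```python
import numpy as np

def sign_sgd_step(w, grad, lr):
    """One signSGD update: step by the sign of each gradient component."""
    return w - lr * np.sign(grad)

# Usage on a toy quadratic loss L(w) = ||w||^2, whose gradient is 2w:
w = np.array([3.0, -2.0])
for _ in range(100):
    w = sign_sgd_step(w, 2 * w, lr=0.05)
print(w)  # each coordinate has walked to within one step (0.05) of 0
```

Because every coordinate moves by exactly `lr` per step, signSGD converges at a constant rate independent of gradient scale, then hovers within one step of the optimum.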
Completed Tasks
- Implement Forward-Forward on 3-bit parity as baseline
- Measure ARD and compare to standard backprop
- Try per-layer update scheme
- Scale to sparse parity (20 bits, 3 relevant)
- Document findings and prompting strategies
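For context on the first task, a hedged sketch of the Forward-Forward idea (our reconstruction of the general technique, not the sprint implementation; data, sizes, and rates are illustrative): each layer is trained with a purely local objective, raising the "goodness" (sum of squared activations) on positive samples and lowering it on negative samples, so no gradient ever crosses a layer boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # Per-sample goodness: sum of squared activations.
    return (h ** 2).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, lr=0.03):
    """One local update of a single ReLU layer with weights W."""
    for x, direction in ((x_pos, +1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)
        # d(goodness)/d(pre-activation) = 2h on the ReLU-active region;
        # ascend for positives, descend for negatives.
        W += direction * lr * x.T @ (2.0 * h) / len(x)
    return W

W = 0.1 * rng.standard_normal((3, 16))
x_pos = rng.integers(0, 2, (32, 3)).astype(float)   # stand-in "real" data
x_neg = rng.standard_normal((32, 3))                # stand-in negatives

def separation(W):
    gp = goodness(np.maximum(x_pos @ W, 0.0)).mean()
    gn = goodness(np.maximum(x_neg @ W, 0.0)).mean()
    return gp - gn

before = separation(W)
for _ in range(50):
    W = ff_layer_step(W, x_pos, x_neg)
after = separation(W)
print(before, after)  # positive/negative separation grows under training
```

The purely local objective is also the weakness the table records: with no cross-layer credit assignment, later layers cannot correct what earlier layers discard, which matches the failure on 20-bit sparse parity.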
Answered Questions
- Does Forward-Forward converge on sparse parity? Only on 3-bit parity; it fails on 20-bit due to the greedy layer-wise objective.
- What's the theoretical minimum ARD for this task? W1 dominates, accounting for 75% of reads; operation reordering is capped at ~10% improvement.
- Can we combine approaches? Per-layer updates combined with batching work, but the combination isn't useful in practice: single-sample SGD is 8x faster.
What's Next
See DISCOVERIES.md for open questions and TODO.md for remaining tasks.