Technical Sprint 2 Plan: COMPLETED
Historical
This plan was written after Sprint 1 and has been fully executed. All candidate algorithms were tested. See the findings linked below.
Background
Sprint 1 showed that gradient fusion improves ARD by ~16%, but the real bottleneck is that parameter tensors are read twice across the full forward+backward pass. Different algorithms are needed.
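The "read twice" pattern can be made concrete with a small sketch (our own illustration, not the sprint code; layer sizes are arbitrary) that counts how often each parameter tensor is touched in one forward+backward pass of a tiny 2-layer MLP:

```python
import numpy as np

# Count parameter-tensor reads across one forward+backward pass.
reads = {"W1": 0, "W2": 0}

def read(name, tensor):
    reads[name] += 1
    return tensor

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 20))        # batch of 20-bit inputs
W1 = rng.standard_normal((20, 64))
W2 = rng.standard_normal((64, 1))

# Forward pass: each weight matrix is read once.
h = np.maximum(x @ read("W1", W1), 0.0)
y = h @ read("W2", W2)

# Backward pass: each weight matrix is read a second time to
# propagate the error signal back through its layer.
dy = np.ones_like(y)                    # dummy upstream gradient
dW2 = h.T @ dy
dh = (dy @ read("W2", W2).T) * (h > 0)
dW1 = x.T @ dh
dx = dh @ read("W1", W1).T              # often skipped for the first layer

print(reads)  # → {'W1': 2, 'W2': 2}
```

Every weight matrix is read once going forward and once coming back, which is why reordering operations alone cannot remove the second read.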
Candidate Algorithms (All Tested)
| Algorithm | Result | Reference |
|---|---|---|
| Forward-Forward | 25x WORSE ARD, fails on 20-bit | Exp E |
| Per-layer update | 3.8% ARD improvement, converges identically | Exp C |
| Sign SGD | Solves k=5, 2x faster than standard SGD | findings |
| Curriculum learning | 14.6x speedup on n=50/k=3 | findings |
| Fourier solver | 13x faster than SGD for small k | findings |
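Of the candidates above, signSGD is the simplest to state: each parameter moves a fixed step in the direction of its gradient's sign, discarding gradient magnitude entirely. A minimal sketch (the learning rate and toy loss are our illustrations, not the sprint's hyperparameters):

```python
import numpy as np

def sign_sgd_step(w, grad, lr):
    """One signSGD update: step by the sign of each gradient component."""
    return w - lr * np.sign(grad)

# Usage on a toy quadratic loss L(w) = ||w||^2, whose gradient is 2w:
w = np.array([3.0, -2.0])
for _ in range(100):
    w = sign_sgd_step(w, 2 * w, lr=0.05)
print(w)  # each coordinate has walked to within one step (0.05) of 0
```

Because every coordinate moves by exactly `lr` per step, signSGD converges at a constant rate independent of gradient scale, then hovers within one step of the optimum.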
Completed Tasks
- Implement Forward-Forward on 3-bit parity as baseline
- Measure ARD and compare to standard backprop
- Try per-layer update scheme
- Scale to sparse parity (20 bits, 3 relevant)
- Document findings and prompting strategies
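For context on the first task, a hedged sketch of the Forward-Forward idea (our reconstruction of the general technique, not the sprint implementation; data, sizes, and rates are illustrative): each layer is trained with a purely local objective, raising the "goodness" (sum of squared activations) on positive samples and lowering it on negative samples, so no gradient ever crosses a layer boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # Per-sample goodness: sum of squared activations.
    return (h ** 2).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, lr=0.03):
    """One local update of a single ReLU layer with weights W."""
    for x, direction in ((x_pos, +1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)
        # d(goodness)/d(pre-activation) = 2h on the ReLU-active region;
        # ascend for positives, descend for negatives.
        W += direction * lr * x.T @ (2.0 * h) / len(x)
    return W

W = 0.1 * rng.standard_normal((3, 16))
x_pos = rng.integers(0, 2, (32, 3)).astype(float)   # stand-in "real" data
x_neg = rng.standard_normal((32, 3))                # stand-in negatives

def separation(W):
    gp = goodness(np.maximum(x_pos @ W, 0.0)).mean()
    gn = goodness(np.maximum(x_neg @ W, 0.0)).mean()
    return gp - gn

before = separation(W)
for _ in range(50):
    W = ff_layer_step(W, x_pos, x_neg)
after = separation(W)
print(before, after)  # positive/negative separation grows under training
```

The purely local objective is also the weakness the table records: with no cross-layer credit assignment, later layers cannot correct what earlier layers discard, which matches the failure on 20-bit sparse parity.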
Answered Questions
- Does Forward-Forward converge on sparse parity? Only on 3-bit parity; it fails on 20-bit due to the greedy layer-wise objective.
- What's the theoretical minimum ARD for this task? W1 dominates, accounting for 75% of reads; operation reordering is capped at ~10% improvement.
- Can we combine approaches? Per-layer updates combined with batching work, but the combination isn't useful in practice: single-sample SGD is 8x faster.
What's Next
See DISCOVERIES.md for open questions and TODO.md for remaining tasks.