Technical Sprint 2 Plan: COMPLETED

Historical

This plan was written after Sprint 1 and has been fully executed. All candidate algorithms were tested. See the findings linked below.

Background

Sprint 1 showed that gradient fusion improves ARD by ~16%, but the real bottleneck is that every parameter tensor is read twice across the full forward+backward pass: once in the forward pass and once again to compute gradients. Different algorithms are needed, not further fusion tweaks.
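As a rough illustration of the bottleneck (all sizes here are hypothetical, not the sprint's actual model), a two-layer MLP reads each weight element once in the forward matmuls and once again in the backward pass, so parameter reads total twice the parameter count per sample:

```python
# Hypothetical read accounting for a 2-layer MLP on a 20-bit input.
# Assumption: one read per parameter element per matmul that uses it.
def param_reads(n_in, hidden, n_out):
    w1 = n_in * hidden      # elements of W1
    w2 = hidden * n_out     # elements of W2
    forward = w1 + w2       # each weight read once going forward
    backward = w1 + w2      # each weight read again to form gradients
    return forward + backward

print(param_reads(20, 64, 1))  # → 2688
```

This factor-of-two is structural to backprop, which is why the candidates below change the update algorithm rather than the schedule of reads.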

Candidate Algorithms (All Tested)

| Algorithm           | Result                                      | Finding  |
|---------------------|---------------------------------------------|----------|
| Forward-Forward     | 25x worse ARD; fails on 20-bit              | Exp E    |
| Per-layer update    | 3.8% ARD improvement; converges identically | Exp C    |
| Sign SGD            | Solves k=5; 2x faster than standard SGD     | findings |
| Curriculum learning | 14.6x speedup on n=50/k=3                   | findings |
| Fourier solver      | 13x faster than SGD for small k             | findings |
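For context on the Fourier-solver row, a brute-force version of the idea can be sketched as follows (this is an illustrative sketch, not the sprint's implementation; numpy and the subset sizes are assumptions): for small k, estimate the correlation of the ±1-mapped label with every size-k parity character and return the subset with the largest magnitude. The planted subset correlates perfectly on noiseless data, while all others are near zero.

```python
from itertools import combinations
import numpy as np

def fourier_parity(X, y, k):
    """X: {0,1} matrix (samples x bits); y: {0,1} labels. Returns best size-k subset."""
    s = 1 - 2 * y.astype(float)                    # map labels {0,1} -> {+1,-1}
    best, best_corr = None, 0.0
    for S in combinations(range(X.shape[1]), k):
        chi = 1 - 2 * (X[:, S].sum(axis=1) % 2)    # parity character on subset S
        corr = abs(np.mean(chi * s))               # empirical Fourier coefficient
        if corr > best_corr:
            best, best_corr = S, corr
    return best

# Toy usage with a planted subset {2, 7, 11}:
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20))
y = X[:, [2, 7, 11]].sum(axis=1) % 2
print(fourier_parity(X, y, 3))  # → (2, 7, 11)
```

The cost is one pass over C(n, k) subsets, which is why this only beats SGD for small k.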

Completed Tasks

  • Implement Forward-Forward on 3-bit parity as baseline
  • Measure ARD and compare to standard backprop
  • Try per-layer update scheme
  • Scale to sparse parity (20 bits, 3 relevant)
  • Document findings and prompting strategies
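The Forward-Forward baseline in the first two tasks can be sketched as a single layer trained on a local "goodness" objective (a hedged sketch in numpy; the loss form, threshold, and hyperparameters are assumptions, not the sprint's code): goodness is the mean squared ReLU activation, pushed above a threshold for positive samples and below it for negative ones, with no gradient flowing between layers.

```python
import numpy as np

def ff_layer_step(W, x_pos, x_neg, theta=1.0, lr=0.1):
    """One gradient step on a layer-local Forward-Forward goodness loss."""
    def half(x, sign):
        z = x @ W.T
        a = np.maximum(z, 0.0)                     # ReLU activations
        g = (a ** 2).mean(axis=1)                  # "goodness" per sample
        loss = np.logaddexp(0.0, -sign * (g - theta)).mean()
        dg = -sign / (1.0 + np.exp(sign * (g - theta))) / len(x)
        dz = (dg[:, None] * 2.0 * a / a.shape[1]) * (z > 0)
        return loss, dz.T @ x                      # loss and gradient w.r.t. W
    lp, gp = half(x_pos, +1.0)                     # raise goodness on positives
    ln, gn = half(x_neg, -1.0)                     # lower goodness on negatives
    W -= lr * (gp + gn)
    return lp + ln

# Toy usage with stand-in positive/negative distributions:
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(8, 3))
x_pos = rng.normal(loc=+1.0, size=(64, 3))
x_neg = rng.normal(loc=-1.0, size=(64, 3))
losses = [ff_layer_step(W, x_pos, x_neg) for _ in range(200)]
```

Because each layer optimizes only its local goodness, nothing propagates a global error signal; that locality is consistent with the failure mode observed on 20-bit sparse parity.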

Answered Questions

  • Does Forward-Forward converge on sparse parity? Only on 3-bit parity; it fails on 20-bit sparse parity due to its greedy layer-wise objective.
  • What's the theoretical minimum ARD for this task? W1 dominates at 75% of reads, so operation reordering alone is capped at ~10% improvement.
  • Can we combine approaches? Per-layer + batch works but isn't useful: single-sample SGD is 8x faster.
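For reference, the sign-SGD variant from the results table replaces each raw gradient with its sign, so every parameter moves by a fixed step per update. A minimal sketch (numpy assumed; names and the toy objective are illustrative):

```python
import numpy as np

def sign_sgd_step(params, grads, lr=0.01):
    """Sign SGD: move each parameter by lr times the sign of its gradient."""
    return [p - lr * np.sign(g) for p, g in zip(params, grads)]

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([3.0, -2.0, 0.5])
for _ in range(100):
    (w,) = sign_sgd_step([w], [2 * w])
# Each coordinate marches toward 0 at a fixed lr per step,
# then oscillates within lr of the minimum once it arrives.
```

The fixed step size is what makes the update cheap and scale-free, at the cost of oscillation near the optimum.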

What's Next

See DISCOVERIES.md for open questions and TODO.md for remaining tasks.