Experiment exp_fourier: Fourier/Walsh-Hadamard Solver for Sparse Parity
Date: 2026-03-04
Status: SUCCESS
Answers: Can we solve sparse parity without neural nets? (Blank-slate approach.)
Hypothesis
If we compute Walsh-Hadamard correlations for every k-subset, the true secret will have correlation ~1.0 while all others are ~0, giving an exact solver in O(C(n,k) * n_samples) time with no training.
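The hypothesis can be sketched directly. The following is a minimal illustration, not the repo's `exp_fourier.py`; function and variable names are my own, and it assumes inputs and labels are ±1-encoded:

```python
# Minimal sketch of the exhaustive Walsh-Hadamard correlation search.
# Assumes X has +/-1 entries and y is the +/-1 parity label.
from itertools import combinations
import numpy as np

def fourier_solve(X, y, k):
    """Return the k-subset with the largest empirical correlation.

    X: (n_samples, n_bits) array of +/-1 inputs
    y: (n_samples,) array of +/-1 labels
    k: sparsity of the secret parity
    """
    n_bits = X.shape[1]
    best_corr, best_subset = -1.0, None
    for subset in combinations(range(n_bits), k):
        chi = X[:, subset].prod(axis=1)  # parity character chi_S(x)
        corr = np.mean(chi * y)          # empirical Fourier coefficient
        if abs(corr) > best_corr:
            best_corr, best_subset = abs(corr), subset
    return best_subset, best_corr
```

For noiseless parity the true subset has empirical correlation exactly 1.0, while every other subset concentrates near 0, which is why a single pass over C(n,k) subsets suffices.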
Config
| Parameter | Value |
|---|---|
| n_bits | 20, 50, 100, 200 |
| k_sparse | 3, 5, 7 |
| hidden | N/A (no neural net) |
| lr | N/A |
| wd | N/A |
| batch_size | N/A |
| max_epochs | N/A |
| n_train | 500 (20 sufficient for k=3) |
| seed | 42, 43, 44 |
| method | fourier (Walsh-Hadamard exhaustive search) |
Results
| Metric | Value |
|---|---|
| Best test accuracy | 100% (all configs) |
| Epochs to >90% | N/A (single pass) |
| Wall time (n=20, k=3) | 0.009s |
| Wall time (n=50, k=3) | 0.16s |
| Wall time (n=20, k=5) | 0.14s |
| Weighted ARD (n=20, k=3) | 1,147,375 |
| ARD improvement vs baseline | N/A (different algorithm class) |
Summary Table
| Config | Method | Acc | Time | Subsets | Samples |
|---|---|---|---|---|---|
| n=20, k=3 | Fourier | 100% | 0.009s | 1,140 | 500 |
| n=20, k=3 | SGD (fast.py) | 100% | 0.12s | — | 1000 |
| n=50, k=3 | Fourier | 100% | 0.16s | 19,600 | 500 |
| n=50, k=3 | SGD direct | 54% (FAIL) | — | — | 200 |
| n=50, k=3 | Curriculum | >90% | — | — | — |
| n=20, k=5 | Fourier | 100% | 0.14s | 15,504 | 500 |
| n=20, k=5 | SGD (n=5000) | 100% | — | — | 5000 |
| n=100, k=3 | Fourier | 100% | 1.3s | 161,700 | 500 |
| n=200, k=3 | Fourier | 100% | 10.8s | 1,313,400 | 500 |
| n=20, k=7 | Fourier | 100% | 0.7s | 77,520 | 500 |
Analysis
What worked
- Perfect accuracy on every config tested. Correlation for the true subset is always 1.0.
- 13x faster than SGD on n=20,k=3 (0.009s vs 0.12s).
- Solves n=50,k=3 trivially where SGD fails (54%) and curriculum needs 20 epochs.
- Solves n=20,k=7 (77,520 subsets) in 0.7s.
- Only needs ~20 samples for k=3 (vs 500-5000 for SGD). Sample complexity is O(1/epsilon^2), independent of n.
- Scales to n=200,k=3 in 10.8s (1.3M subsets).
What didn't work
- ARD is terrible: 1,147,375 weighted ARD vs 17,976 for SGD. The algorithm re-reads the entire dataset for every candidate subset, so memory access is pure streaming with no locality.
- Combinatorial wall at high k: C(n,k) grows fast. C(50,5) = 2.1M, C(100,5) = 75M, C(50,7) = 99M. The method is O(C(n,k)), polynomial in n for fixed k but explodes with k.
- Not a learning algorithm. It is a brute-force search that does not generalize or transfer.
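The combinatorial wall can be checked exactly with `math.comb` (the subset counts quoted above follow directly):

```python
# Exact subset counts C(n, k) for the configs discussed above.
# Fixed k stays polynomial in n; growing k explodes.
from math import comb

for n, k in [(20, 3), (50, 3), (200, 3), (20, 7), (50, 5), (100, 5), (50, 7)]:
    print(f"C({n},{k}) = {comb(n, k):,}")
```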
Surprise
- The Fourier solver is 13x faster than the best SGD on n=20,k=3. A trivial brute-force beats the optimized neural net approach.
- Only 20 samples are needed for reliable k=3 detection. SGD uses 500-5000.
Open Questions (for next experiment)
- Can we prune the subset search with single-variable correlations first? (Check each bit's individual correlation, discard low ones, then only check k-subsets of the top candidates.)
- Hybrid: use Fourier to find the secret, then a trivial classifier (just compute the product). What is the total energy cost vs SGD?
- At what (n,k) does SGD actually beat Fourier in wall time?
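The hybrid idea in the second question is cheap to prototype: once the Fourier search has identified the secret subset, inference reduces to a product over k bits, with no forward pass at all. A hedged sketch (names are illustrative, assuming ±1 encoding):

```python
# Sketch of the "trivial classifier" from the hybrid open question:
# given the recovered secret subset, prediction is an O(k) product.
import numpy as np

def make_parity_classifier(secret):
    """Return a +/-1 predictor for the recovered subset of bit indices."""
    idx = np.asarray(secret)
    def predict(X):
        return X[:, idx].prod(axis=1)  # exact parity over the secret bits
    return predict
```

Under this split, the total energy cost is one Fourier search (amortized over all future queries) plus O(k) multiplies per prediction, which frames the comparison against SGD's training plus per-query forward passes.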
Files
- Experiment:
src/sparse_parity/experiments/exp_fourier.py
- Results:
results/exp_fourier/results.json