Experiment exp_feature_select: Feature Selection, Blank Slate Sparse Parity¶
Date: 2026-03-04 Status: PARTIAL Answers: Blank slate. Can we beat SGD without gradient descent?
Hypothesis¶
If we separate SEARCH (find which k bits matter) from LEARNING (compute parity on those bits), we can solve sparse parity faster than end-to-end SGD by exploiting the binary structure directly.
Config¶
| Parameter | Value |
|---|---|
| n_bits | 20, 50 |
| k_sparse | 3, 5 |
| hidden | 200 (SGD baseline) |
| lr | 0.1 (SGD baseline) |
| wd | 0.01 (SGD baseline) |
| batch_size | 32 (SGD baseline) |
| max_epochs | 200 (SGD baseline) |
| n_train | 1000 (k=3), 5000 (k=5) |
| seed | 42 |
| method | pairwise, greedy, exhaustive, SGD |
Results¶
| Metric | Value |
|---|---|
| Pairwise correct | 0/3 scenarios (provably broken) |
| Greedy correct | 0/3 scenarios (provably broken) |
| Exhaustive correct | 3/3 scenarios (always works) |
| Exhaustive vs SGD ops | 178x–1203x fewer ops |
| Exhaustive vs SGD time | 4.8x–86x faster wall time |
| Exhaustive solves n=50/k=3 | Yes (SGD fails at 56%) |
Summary Table¶
| Scenario | Method | Correct | Time (s) | Ops | Speedup (ops) |
|---|---|---|---|---|---|
| n=20,k=3 | Pairwise | NO | 0.001 | 570K | — |
| n=20,k=3 | Greedy | NO | 0.001 | 169K | — |
| n=20,k=3 | Exhaustive | YES | 0.002 | 652K | 1203x |
| n=20,k=3 | SGD | YES | 0.149 | 784M | 1x |
| n=50,k=3 | Exhaustive | YES | 0.127 | 49M | 163x |
| n=50,k=3 | SGD | NO (56%) | 0.603 | 8B | FAIL |
| n=20,k=5 | Exhaustive | YES | 0.026 | 19M | 178x |
| n=20,k=5 | SGD | YES | 0.559 | 3.4B | 1x |
Analysis¶
What worked¶
- Exhaustive combo check solves everything: C(n,k) product-accuracy test finds the secret every time. 100% accuracy.
- Massive ops advantage: 178x–1203x fewer operations than SGD. The exhaustive method does O(C(n,k) * n_samples * k) ops vs SGD's O(epochs * n_train * hidden * n_bits).
- Solves cases SGD cannot: n=50/k=3 is trivial for exhaustive (0.13s, 19.6K combos) but SGD fails at 56% even after 200 epochs.
- Early termination helps: On average, the correct combo is found after checking ~half the combos (163/1140 for n=20/k=3 due to ordering).
What didn't work¶
- Pairwise detection is provably broken: For k-parity, E[y * x_i * x_j] = 0 for ALL pairs, even correct ones. Reason: y = prod(x_secret), so yx_ix_j leaves (k-2) random ±1 bits whose expectation is 0. Parity is invisible to any correlation test below order k.
- Greedy forward selection is provably broken: Same root cause. E[y * x_i] = 0 for all i. The first step picks a random bit, then everything cascades wrong. Greedy needs a nonzero signal to climb, but parity gives zero signal for any partial subset.
Surprise¶
Parity is cryptographically hard for correlation methods. Any statistical test of order < k gives zero signal. This is why SGD needs grokking: the network must discover the full k-way interaction through nonlinear composition across layers. The exhaustive method works because it tests the full k-way product directly. The Fourier/Walsh-Hadamard approach (which tests all k-way interactions) is the natural solution for the same reason.
Scaling Analysis¶
Exhaustive has complexity O(C(n,k)) combos: - n=20, k=3: C(20,3) = 1,140, instant - n=50, k=3: C(50,3) = 19,600, instant (0.13s) - n=100, k=3: C(100,3) = 161,700, still fast (~1s) - n=20, k=5: C(20,5) = 15,504, fast (0.03s) - n=50, k=5: C(50,5) = 2,118,760, feasible (~10s) - n=100, k=5: C(100,5) = 75,287,520, minutes but doable - n=100, k=10: C(100,10) = 17 trillion, intractable
So exhaustive search works well for small k (up to ~7-8) regardless of n, but becomes intractable for large k. This is where SGD's implicit search through gradient descent still has value.
Open Questions (for next experiment)¶
- Can we make pairwise work with higher-order interactions (test k-1 subsets to narrow candidates)?
- How does exhaustive compare to Walsh-Hadamard transform (both are O(C(n,k)) but WHT might have better constants)?
- What's the crossover point where SGD beats exhaustive (probably around k=8-10)?
Files¶
- Experiment:
src/sparse_parity/experiments/exp_feature_select.py - Results:
results/exp_feature_select/results.json