Experiment exp_evolutionary: Evolutionary/Random Search for Sparse Parity¶
Date: 2026-03-04 Status: SUCCESS Answers: Blank slate approach. Can sparse parity be solved without neural nets or gradients?
Hypothesis¶
If we search directly over k-subsets of input bits, random search should need ~C(n,k) tries on average, and evolutionary search (with tournament selection + mutation) should beat that by exploiting partial fitness signal.
Config¶
| Parameter | Value |
|---|---|
| n_bits | 20, 50 |
| k_sparse | 3, 5 |
| method | random search, evolutionary search (pop=100-200, tournament=3) |
| n_train | 500 (k=3), 2000 (k=5) |
| seeds | 42, 43, 44, 45, 46 |
Results¶
| Config | C(n,k) | Random: avg tries | Random: avg time | Evo: avg gens | Evo: avg time | Both solved |
|---|---|---|---|---|---|---|
| n=20, k=3 | 1,140 | 881 | 0.011s | 18 | 0.041s | 5/5 |
| n=50, k=3 | 19,600 | 11,291 | 0.142s | 151 | 0.781s | 5/5 |
| n=20, k=5 | 15,504 | 18,240 | 0.426s | 74 | 0.552s | 5/5 |
Summary Table¶
| Method | n=20/k=3 | n=50/k=3 | n=20/k=5 |
|---|---|---|---|
| Random search | 881 tries / 0.011s | 11,291 tries / 0.142s | 18,240 tries / 0.426s |
| Evolutionary | 18 gens / 0.041s | 151 gens / 0.781s | 74 gens / 0.552s |
| SGD (baseline) | ~5 epochs / 0.12s | FAIL direct (54%) | 14 epochs (n_train=5000) |
| SGD + curriculum | — | 20 epochs | — |
Analysis¶
What worked¶
- Both approaches solve all configs with 100% accuracy, exact subset recovery
- Random search is competitive: averages ~0.77x C(n,k) tries, expected for a geometric distribution with p=1/C(n,k)
- Random search is faster in wall time than evolutionary for all tested configs because each trial is O(n_train*k) while evo must evaluate 100-200 candidates per generation
- Evolutionary search uses far fewer fitness evaluations: 18 vs 881 (n=20/k=3), 151 vs 11,291 (n=50/k=3), but each generation evaluates the full population
- n=50/k=3 solved trivially, the config that SGD fails on (54%) without curriculum learning
What didn't work¶
- Evolutionary search is slower in wall time than random search despite fewer generations, because it evaluates a full population each generation (100-200 fitness calls per gen)
- The fitness surface is deceptive for evo: a wrong k-subset gives ~50% fitness (random chance), so the signal for partial correctness is weak until you get 2 out of 3 bits right
Surprise¶
- Random search solves n=50/k=3 in 0.14s, a config that SGD cannot solve directly (54% max in 200 epochs). The combinatorial approach bypasses the grokking/generalization problem because it checks exact parity match on training data.
- n=20/k=5 solved by random search in 0.43s. SGD needs 5000 training samples and 14 epochs. Random search only needs enough samples to verify uniqueness (~500 for k=3, 2000 for k=5).
Comparison with SGD¶
SGD and random search solve different problems. SGD learns a neural network that computes parity and must generalize from training data. Random/evo search finds the exact subset and just needs enough data to rule out false positives. Random search complexity is O(C(n,k)), polynomial in n for fixed k. SGD complexity depends on the grokking phase transition, which is harder to predict.
For small k (3-5), random search is simpler and faster. SGD's advantage appears when you need a differentiable model or when k is unknown.
Open Questions (for next experiment)¶
- How does random search scale to k=7 or k=10? C(20,7) = 77,520 and C(20,10) = 184,756, still feasible
- Can we use statistical tests (e.g., correlation between individual bits and y) to narrow the search space before enumeration?
- Hybrid approach: use random search to find candidate subsets, verify with a small neural net
Files¶
- Experiment:
src/sparse_parity/experiments/exp_evolutionary.py - Results:
results/exp_evolutionary/results.json