Experiment exp_evolutionary: Evolutionary/Random Search for Sparse Parity

Date: 2026-03-04 | Status: SUCCESS | Answers: Can sparse parity be solved without neural nets or gradients, using a blank-slate (no-learning) approach?

Hypothesis

If we search directly over k-subsets of input bits, random search should need ~C(n,k) tries on average, and evolutionary search (with tournament selection + mutation) should beat that by exploiting partial fitness signal.

Config

| Parameter | Value |
| --- | --- |
| n_bits | 20, 50 |
| k_sparse | 3, 5 |
| method | random search, evolutionary search (pop=100-200, tournament=3) |
| n_train | 500 (k=3), 2000 (k=5) |
| seeds | 42, 43, 44, 45, 46 |
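The task setup implied by this config can be sketched as follows. This is a minimal illustration, not the experiment's actual generator; the function name `make_sparse_parity` and the use of NumPy are assumptions:

```python
import numpy as np

def make_sparse_parity(n_bits, k_sparse, n_train, seed=42):
    """Sparse-parity dataset: y is the XOR (sum mod 2) of a hidden
    k-subset of the n input bits. Sketch only; the experiment's
    generator may differ in details."""
    rng = np.random.default_rng(seed)
    # Hidden subset of k_sparse distinct bit positions
    subset = np.sort(rng.choice(n_bits, size=k_sparse, replace=False))
    X = rng.integers(0, 2, size=(n_train, n_bits))
    y = X[:, subset].sum(axis=1) % 2
    return X, y, subset

# One instance matching the n=20, k=3 config
X, y, subset = make_sparse_parity(20, 3, 500)
```

With 500 training samples, the chance that a wrong subset reproduces y on every sample is astronomically small, which is why exact-match verification suffices.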

Results

| Config | C(n,k) | Random: avg tries | Random: avg time | Evo: avg gens | Evo: avg time | Both solved |
| --- | --- | --- | --- | --- | --- | --- |
| n=20, k=3 | 1,140 | 881 | 0.011s | 18 | 0.041s | 5/5 |
| n=50, k=3 | 19,600 | 11,291 | 0.142s | 151 | 0.781s | 5/5 |
| n=20, k=5 | 15,504 | 18,240 | 0.426s | 74 | 0.552s | 5/5 |

Summary Table

| Method | n=20/k=3 | n=50/k=3 | n=20/k=5 |
| --- | --- | --- | --- |
| Random search | 881 tries / 0.011s | 11,291 tries / 0.142s | 18,240 tries / 0.426s |
| Evolutionary | 18 gens / 0.041s | 151 gens / 0.781s | 74 gens / 0.552s |
| SGD (baseline) | ~5 epochs / 0.12s | FAIL direct (54%) | 14 epochs (n_train=5000) |
| SGD + curriculum | | 20 epochs | |

Analysis

What worked

  • Both approaches solve all configs with 100% accuracy, exact subset recovery
  • Random search is competitive: tries average ~0.77x C(n,k), in line with a geometric distribution with success probability p = 1/C(n,k), whose mean is C(n,k) tries
  • Random search is faster in wall time than evolutionary for all tested configs because each trial is O(n_train*k) while evo must evaluate 100-200 candidates per generation
  • Evolutionary search needs far fewer generations than random search needs tries: 18 vs 881 (n=20/k=3), 151 vs 11,291 (n=50/k=3). But each generation evaluates the full population (100-200 candidates), so its total fitness evaluations are comparable to or higher than random search's
  • n=50/k=3, the config SGD fails on (54%) without curriculum learning, is solved trivially here
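The random-search baseline can be sketched in a few lines: draw k-subsets uniformly until one reproduces y exactly on the training set. Function and variable names here are assumptions; the experiment's actual code may differ:

```python
import numpy as np

def random_search(X, y, k, seed=0, max_tries=200_000):
    """Sample k-subsets uniformly at random (with replacement across
    tries) until one reproduces y exactly on the training set."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    for tries in range(1, max_tries + 1):
        cand = rng.choice(n, size=k, replace=False)
        if np.array_equal(X[:, cand].sum(axis=1) % 2, y):
            return np.sort(cand), tries
    return None, max_tries

# Toy run on a hypothetical n=20, k=3 instance (hidden bits chosen here)
rng = np.random.default_rng(42)
hidden = np.array([2, 7, 11])
X = rng.integers(0, 2, size=(500, 20))
y = X[:, hidden].sum(axis=1) % 2
found, tries = random_search(X, y, 3)
```

Each try costs O(n_train * k) work, which is why the wall-clock times above stay in the tens of milliseconds even for tens of thousands of tries.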

What didn't work

  • Evolutionary search is slower in wall time than random search despite fewer generations, because it evaluates a full population each generation (100-200 fitness calls per gen)
  • The fitness surface is deceptive for evo: any wrong k-subset gives ~50% fitness (chance level). The parity over a wrong subset differs from the target by the parity of the symmetric difference of the two subsets, which is an unbiased bit, so even a candidate sharing k-1 of the k true bits has exactly 50% expected fitness; only finite-sample noise separates candidates until the exact subset appears
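The flatness of the fitness surface is easy to demonstrate empirically. A quick sketch (hidden bits and sample counts chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 20, 3, 4000
true_subset = np.array([2, 7, 11])   # hypothetical hidden bits
X = rng.integers(0, 2, size=(m, n))
y = X[:, true_subset].sum(axis=1) % 2

def fitness(cand):
    """Training accuracy of the parity computed over `cand`."""
    return (X[:, cand].sum(axis=1) % 2 == y).mean()

# A near miss sharing k-1 of the k true bits still scores chance level,
# because the combined parity reduces to the parity of the two
# mismatched bits, an unbiased coin flip.
near_miss = np.array([2, 7, 12])
```

Here `fitness(true_subset)` is exactly 1.0, while `fitness(near_miss)` hovers around 0.5 with sampling noise of order 1/sqrt(m).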

Surprise

  • Random search solves n=50/k=3 in 0.14s, a config that SGD cannot solve directly (54% max in 200 epochs). The combinatorial approach bypasses the grokking/generalization problem because it checks exact parity match on training data.
  • n=20/k=5 solved by random search in 0.43s. SGD needs 5000 training samples and 14 epochs. Random search only needs enough samples to verify uniqueness (~500 for k=3, 2000 for k=5).

Comparison with SGD

SGD and random search solve different problems. SGD learns a neural network that computes parity and must generalize from training data. Random/evo search finds the exact subset and just needs enough data to rule out false positives. Random search complexity is O(C(n,k)), polynomial in n for fixed k. SGD complexity depends on the grokking phase transition, which is harder to predict.

For small k (3-5), random search is simpler and faster. SGD's advantage appears when you need a differentiable model or when k is unknown.
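For completeness, the evolutionary variant named in the config (tournament selection, mutation) can be sketched as below. Operator details, names, and the single-bit-swap mutation are assumptions about the method, not the experiment's actual implementation:

```python
import numpy as np

def evolve(X, y, k, pop_size=100, tournament=3, seed=0, max_gens=5000):
    """Tournament selection + single-bit-swap mutation over k-subsets.
    Sketch of the evolutionary search described above."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]

    def fit(s):
        return (X[:, s].sum(axis=1) % 2 == y).mean()

    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for gen in range(1, max_gens + 1):
        scores = np.array([fit(s) for s in pop])
        best = int(scores.argmax())
        if scores[best] == 1.0:            # exact match on training data
            return np.sort(pop[best]), gen
        new_pop = []
        for _ in range(pop_size):
            # Tournament selection: best of `tournament` random candidates
            idx = rng.integers(0, pop_size, size=tournament)
            parent = pop[idx[scores[idx].argmax()]]
            child = parent.copy()
            # Mutation: swap one chosen bit for a currently unused one
            out = rng.integers(k)
            child[out] = rng.choice(np.setdiff1d(np.arange(n), child))
            new_pop.append(child)
        pop = new_pop
    return None, max_gens

# Toy run on a hypothetical n=20, k=3 instance
rng = np.random.default_rng(1)
hidden = np.array([3, 8, 14])
X = rng.integers(0, 2, size=(500, 20))
y = X[:, hidden].sum(axis=1) % 2
found, gens = evolve(X, y, 3, seed=1)
```

Note the cost structure this makes explicit: each generation performs `pop_size` fitness evaluations, which is why fewer generations does not translate into less wall time.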

Open Questions (for next experiment)

  • How does random search scale to k=7 or k=10? C(20,7) = 77,520 and C(20,10) = 184,756, still feasible
  • Can we use statistical tests to narrow the search space before enumeration? (Note: for parity, the correlation between any single bit and y is zero in expectation, so pairwise or higher-order statistics would be needed.)
  • Hybrid approach: use random search to find candidate subsets, verify with a small neural net
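The binomial coefficients quoted in this log can be double-checked directly with Python's `math.comb`:

```python
from math import comb

# Search-space sizes for the tested and proposed configs
sizes = {
    (20, 3): comb(20, 3),
    (50, 3): comb(50, 3),
    (20, 5): comb(20, 5),
    (20, 7): comb(20, 7),
    (20, 10): comb(20, 10),
}
```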

Files

  • Experiment: src/sparse_parity/experiments/exp_evolutionary.py
  • Results: results/exp_evolutionary/results.json