Experiment exp_random_proj: Random Projections + Correlation for Sparse Parity
Date: 2026-03-06
Status: SUCCESS
Answers: does Monte Carlo Walsh-Hadamard sampling with early stopping beat exhaustive Fourier enumeration?
Hypothesis
If we randomly sample k-subsets and compute the Walsh-Hadamard correlation (corr = mean(y * prod(x[:, subset]))), we can find the secret with early stopping (|corr| > 0.9) using far fewer than C(n,k) evaluations. With duplicate avoidance, the winning try lands roughly uniformly over the C(n,k) candidates, so we expect ~C(n,k)/2 evaluations on average, but with high variance that sometimes yields large speedups.
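A minimal sketch of this sampler, assuming ±1-encoded inputs and noiseless parity labels; the function and variable names are illustrative, not the experiment's actual code:

```python
import math
import numpy as np

def random_proj_search(x, y, k, corr_threshold=0.9, max_tries=50_000, seed=42):
    """Sample random k-subsets and score each with the Walsh-Hadamard
    correlation corr = mean(y * prod(x[:, subset])); stop early once
    |corr| clears the threshold."""
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    tried = set()                     # duplicate avoidance
    tries = 0
    while tries < min(max_tries, math.comb(n, k)):
        subset = tuple(sorted(rng.choice(n, size=k, replace=False)))
        if subset in tried:
            continue                  # already scored; resample
        tried.add(subset)
        tries += 1
        corr = np.mean(y * np.prod(x[:, list(subset)], axis=1))
        if abs(corr) > corr_threshold:
            return subset, tries      # early stop: secret found
    return None, tries

# Usage on a synthetic instance: x in {-1, +1}, y = parity over the secret.
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(500, 20))   # n_samples=500, n_bits=20
secret = (2, 7, 11)                       # hidden k=3 subset
y = np.prod(x[:, list(secret)], axis=1)
print(random_proj_search(x, y, k=3))      # -> ((2, 7, 11), <tries>)
```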
Config
| Parameter | Value |
|---|---|
| n_bits | 20, 50, 100, 200 |
| k_sparse | 3, 5 |
| method | random k-subset sampling + Walsh-Hadamard correlation |
| corr_threshold | 0.9 (early stopping) |
| n_samples | 500 |
| seeds | 42, 43, 44, 45, 46 (5 seeds) |
| max_tries | 50k-5M depending on config |
Results
| Config | C(n,k) | RP avg tries | RP % of C(n,k) | RP avg time | Fourier time | Speedup |
|---|---|---|---|---|---|---|
| n=20, k=3 | 1,140 | 565 | 49.6% | 0.013s | 0.010s | 0.8x |
| n=50, k=3 | 19,600 | 6,342 | 32.4% | 0.113s | 0.166s | 1.5x |
| n=20, k=5 | 15,504 | 6,238 | 40.2% | 0.134s | 0.142s | 1.1x |
| n=100, k=3 | 161,700 | 99,908 | 61.8% | 2.09s | N/A | -- |
| n=200, k=3 | 1,313,400 | 815,849 | 62.1% | 19.8s | N/A | -- |
Variance Analysis (5 seeds)
| Config | Tries mean | Tries std | Tries min | Tries max | Time std |
|---|---|---|---|---|---|
| n=20, k=3 | 565 | 350 | 62 | 1,029 | 0.008s |
| n=50, k=3 | 6,342 | 3,654 | 52 | 11,035 | 0.069s |
| n=20, k=5 | 6,238 | 5,417 | 427 | 13,382 | 0.127s |
| n=100, k=3 | 99,908 | 26,933 | 47,525 | 121,556 | 0.693s |
| n=200, k=3 | 815,849 | 284,812 | 383,591 | 1,184,819 | 9.20s |
Analysis
What worked
- 100% solve rate across all configs and all seeds (25/25 runs solved). The correlation test is perfectly reliable: the true subset always has corr = 1.0 (labels are noiseless), while any other subset behaves like an independent ±1 variable, so its sample correlation over 500 samples concentrates within roughly ±1/sqrt(500) ≈ ±0.045 of zero.
- Saves roughly 38-68% of subset evaluations vs exhaustive Fourier on average: the mean fraction of C(n,k) tested ranges from 32.4% (n=50/k=3) to 62.1% (n=200/k=3).
- Best case can be dramatically faster: n=50/k=3 with seed=45 found the secret in just 52 tries (0.3% of C(n,k)). n=20/k=5 with seed=42 found it in 427 tries (2.8% of C(n,k)).
- Scales to n=200/k=3 (1.3M possible subsets) -- solves in ~20s average, testing only ~62% of the search space.
- n=50/k=3 is solved trivially here -- the same config where SGD fails (54% accuracy) without curriculum learning.
What didn't work
- Modest average speedup over exhaustive Fourier: 0.8-1.5x on the three configs where both were measured. On n=20/k=3, Fourier is actually faster (0.8x) because the overhead of random sampling + duplicate checking outweighs the savings from early stopping when C(n,k) is small.
- High variance: the standard deviation of tries is 27-87% of the mean across configs. Some seeds get lucky (2.8% of C(n,k)), others unlucky (90.3%). This spread is inherent to blind random search over the subset space; see the note after this list.
- ARD is terrible (same as Fourier): weighted ARD of 670K for n=20/k=3. Each subset evaluation reads the full dataset, so there is no temporal locality.
- Duplicate checking overhead: with large search spaces, the set of tried subsets grows large and random collisions become more frequent, though this was not a major bottleneck in practice.
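A rough model for this spread (an idealization, not something measured in the experiment): with duplicate avoidance, the index T of the winning try is uniform on {1, ..., N} with N = C(n,k), giving

$$
\mathbb{E}[T] = \frac{N+1}{2}, \qquad \mathrm{SD}[T] = \sqrt{\frac{N^2 - 1}{12}} \approx 0.29\,N, \qquad \frac{\mathrm{SD}[T]}{\mathbb{E}[T]} \approx 0.58.
$$

The observed 27-87% ratios scatter around this 58% prediction, as expected from only 5 seeds per config.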
Surprise
- The speedup is underwhelming for small C(n,k). When C(n,k) = 1,140 (n=20/k=3), exhaustive Fourier is actually faster because it enumerates subsets with itertools.combinations (implemented in C) while random sampling pays for RNG calls and duplicate detection. The random-projection approach only wins when C(n,k) is large enough that the ~38-68% savings matter. (A sketch of the exhaustive baseline follows this list.)
- Variance is the real story: the best-case n=50/k=3 run (seed=45) found the secret in 52 tries = 0.3% of C(n,k), finishing in 0.8ms. The worst case (seed=46) took 11,035 tries = 56.3%. If you're lucky, random projections are 100x faster than Fourier. If you're unlucky, they're barely faster.
- n=200/k=3 is tractable for random projections (~20s average); exhaustive Fourier was not run at this scale (N/A in the table) but would take proportionally longer. At this scale the ~38% savings matter.
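For reference, a minimal sketch of the exhaustive baseline as described above (our reconstruction, not the experiment's exact code):

```python
import numpy as np
from itertools import combinations

def exhaustive_fourier(x, y, k, corr_threshold=0.9):
    """Enumerate all k-subsets in lexicographic order and return the
    first whose Walsh-Hadamard correlation clears the threshold."""
    n = x.shape[1]
    tries = 0
    for subset in combinations(range(n), k):
        tries += 1
        corr = np.mean(y * np.prod(x[:, list(subset)], axis=1))
        if abs(corr) > corr_threshold:
            return subset, tries
    return None, tries
```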
Comparison with Other Methods
| Config | Random Proj | Fourier (exhaustive) | Random Search (exp_evolutionary) | SGD |
|---|---|---|---|---|
| n=20, k=3 | 565 tries / 0.013s | 1,140 subsets / 0.010s | 881 tries / 0.011s | ~5 epochs / 0.12s |
| n=50, k=3 | 6,342 tries / 0.113s | 19,600 subsets / 0.166s | 11,291 tries / 0.142s | FAIL (54%) |
| n=20, k=5 | 6,238 tries / 0.134s | 15,504 subsets / 0.142s | 18,240 tries / 0.426s | 14 epochs (n=5000) |
The random-projection approach tests fewer subsets than both exhaustive Fourier and the random search from exp_evolutionary. The difference from exp_evolutionary's random search is that this approach scores candidates with a correlation (a statistical test) rather than an exact match, and it avoids duplicates.
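To make the contrast concrete, here is a hedged sketch of the two acceptance tests (exp_evolutionary's exact check may differ in detail):

```python
import numpy as np

def accept_exact(x, y, subset):
    """exp_evolutionary-style test: candidate must reproduce every label."""
    return np.array_equal(np.prod(x[:, list(subset)], axis=1), y)

def accept_corr(x, y, subset, threshold=0.9):
    """This experiment's test: a thresholded statistical correlation."""
    return abs(np.mean(y * np.prod(x[:, list(subset)], axis=1))) > threshold

# On noiseless data both fire only on the true subset; the correlation
# test would also tolerate label noise, where exact match would not.
```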
Open Questions
- Can we combine random projections with feature pre-screening (e.g., check individual bit correlations first, narrow candidates, then only sample from top bits)?
- What is the crossover point where random projections reliably beat exhaustive Fourier in wall time? Appears to be around C(n,k) > 10,000.
- Could we use adaptive sampling (e.g., if a subset has moderate correlation, mutate it) to get the best of both random projections and evolutionary search?
Files
- Experiment: src/sparse_parity/experiments/exp_random_proj.py
- Results: results/exp_random_proj/results.json