Experiment exp_km: Kushilevitz-Mansour Algorithm for Sparse Parity¶

Date: 2026-03-06 Status: SUCCESS Answers: Can influence-based pruning solve sparse parity in O(n) rather than O(C(n,k))?

Hypothesis¶

If we estimate per-bit influence (flip bit i, measure label change rate), secret bits will have influence 1.0 and non-secret bits will have influence 0.0. This prunes the search from C(n,k) subsets down to C(k,k)=1, giving an exact solver with O(n) queries instead of O(C(n,k)).

Config¶

Parameter	Value
n_bits	20, 50, 100
k_sparse	3, 5
hidden	N/A (no neural net)
lr	N/A
method	Kushilevitz-Mansour (influence estimation + pruned verification)
n_influence_samples	200 (per bit)
n_verify_samples	100
influence_threshold	0.5
seeds	42, 43, 44

Results¶

Metric	Value
Best test accuracy	100% (all configs, all seeds)
Candidates found	Exactly k in every run
Subsets checked	1 (always)
Total queries (n=20,k=3)	8,200
Total queries (n=100,k=3)	40,200
Wall time (n=20,k=3)	0.006s
Wall time (n=100,k=3)	0.009s
Wall time (n=20,k=5)	0.001s
Weighted ARD (n=20,k=3)	1,585

Key Table¶

Config	Method	Acc	Time	Queries/Subsets	Pruning ratio
n=20,k=3	KM (this)	100%	0.006s	8,200 queries	1/1,140 subsets
n=50,k=3	KM (this)	100%	0.003s	20,200 queries	1/19,600 subsets
n=100,k=3	KM (this)	100%	0.009s	40,200 queries	1/161,700 subsets
n=20,k=5	KM (this)	100%	0.001s	8,200 queries	1/15,504 subsets
n=20,k=3	Fourier	100%	0.009s	1,140 subsets	none
n=50,k=3	Fourier	100%	0.16s	19,600 subsets	none
n=100,k=3	Fourier	100%	1.3s	161,700 subsets	none
n=20,k=5	Fourier	100%	0.14s	15,504 subsets	none
n=20,k=3	Random search	100%	0.011s	~881 tries	none
n=50,k=3	Random search	100%	0.142s	~11,291 tries	none
n=20,k=3	SGD (fast.py)	100%	0.12s	~1000 samples	none
n=50,k=3	SGD direct	54% FAIL	---	---	none

Sample Complexity¶

Influence samples/bit	n=20,k=3	n=50,k=3	n=20,k=5	Total queries (n=20,k=3)
5	CORRECT	CORRECT	CORRECT	300
10	CORRECT	CORRECT	CORRECT	500
20	CORRECT	CORRECT	CORRECT	900
50	CORRECT	CORRECT	CORRECT	2,100
200	CORRECT	CORRECT	CORRECT	8,100

Analysis¶

What worked¶

Perfect pruning in every single run: influence estimation identified exactly k high-influence bits out of n, reducing subsets to check from C(n,k) to exactly 1. Zero false positives, zero false negatives across all 12 runs.
Scales linearly with n: total queries = 2 * n * n_influence_samples + n_verify_samples. For n=100,k=3 this is 40,200 queries vs Fourier's 161,700 subsets. The advantage grows with n.
Completely independent of k for the influence phase: the per-bit influence estimation cost is the same whether k=3 or k=5. The only place k matters is verification (which is trivial at C(k,k)=1).
Works with absurdly few samples: even 5 influence samples per bit correctly identifies the secret. The theoretical minimum is 1 sample (since influence is exactly 0 or 1 for parity), but statistical noise from finite samples could cause errors in more adversarial settings.
Fastest method tested: 0.001-0.009s across all configs. Beats Fourier (0.009-1.3s), random search (0.011-0.142s), and SGD (0.12s+) on every config.
ARD of 1,585 vs Fourier's 1,147,375: 724x better memory locality because the algorithm accesses far less data.

What didn't work¶

The query count is technically higher than Fourier for small n,k: 8,200 queries for n=20,k=3 vs Fourier checking 1,140 subsets (each needing ~500 sample correlations, so ~570,000 operations). But KM queries are paired (x and x^i), which is a different cost model.
Influence estimation uses a white-box oracle: we need to query f(x) and f(x^i) where we control x. In a pure PAC learning setting with i.i.d. samples, you cannot construct paired flipped queries -- you'd need a different approach.

Surprise¶

5 samples per bit is enough: this means the entire n=20,k=3 problem can be solved with 300 total queries. That is 3.8x fewer than the C(20,3)=1,140 subsets Fourier needs to check, and each query is cheaper (single oracle call vs computing a correlation over all samples).
The algorithm is embarrassingly simple: estimate n influences, threshold, done. The theoretical KM algorithm involves heavy Fourier analysis machinery, but for the special case of parity functions, the influence shortcut makes it trivial.
n=100,k=3 in 0.009s: Fourier takes 1.3s on the same config. KM is 144x faster because it never has to enumerate the 161,700 subsets.

Comparison with Fourier (exp_fourier)¶

KM is strictly better than brute-force Fourier for sparse parity:

Metric	KM	Fourier	KM advantage
n=20,k=3 time	0.006s	0.009s	1.5x
n=50,k=3 time	0.003s	0.16s	53x
n=100,k=3 time	0.009s	1.3s	144x
n=20,k=5 time	0.001s	0.14s	140x
Scaling	O(n)	O(C(n,k))	polynomial vs combinatorial
Min samples	5/bit	~20 total	different cost model
ARD (n=20,k=3)	1,585	1,147,375	724x

The advantage grows with n and k because Fourier's C(n,k) explodes while KM stays O(n).

Open Questions¶

How does this extend to noisy parity (labels corrupted with probability p)? Influence estimation still works but requires more samples: O(1/p^2) per bit.
Can this be combined with a neural net? Use influence estimation to select features, then train a tiny network on only the k selected bits.
What about unknown k? Could run influence estimation and count the number of high-influence bits to infer k.
In a true PAC learning model (i.i.d. samples only, no paired queries), can we still estimate influence efficiently? Yes, via Var(E[y|x_i]) which only needs random samples.

Files¶

Experiment: src/sparse_parity/experiments/exp_km.py
Results: results/exp_km/results.json