
Experiment exp_km: Kushilevitz-Mansour Algorithm for Sparse Parity

Date: 2026-03-06
Status: SUCCESS
Answers: Can influence-based pruning solve sparse parity with O(n) queries rather than O(C(n,k))?

Hypothesis

If we estimate per-bit influence (flip bit i, measure label change rate), secret bits will have influence 1.0 and non-secret bits will have influence 0.0. This prunes the search from C(n,k) subsets down to C(k,k)=1, giving an exact solver with O(n) queries instead of O(C(n,k)).
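A minimal sketch of the two-phase procedure (assuming a membership-query oracle `f` over 0/1 numpy vectors; the names and defaults here are illustrative, not the exact `exp_km.py` implementation):

```python
import numpy as np

def solve_sparse_parity(f, n, n_influence_samples=200, n_verify_samples=100,
                        threshold=0.5, seed=0):
    """Recover the support of a k-sparse parity via per-bit influence."""
    rng = np.random.default_rng(seed)
    # Phase 1: influence of bit i = fraction of random x where flipping
    # bit i changes the label. Exactly 1.0 for secret bits, 0.0 otherwise.
    influence = np.empty(n)
    for i in range(n):
        x = rng.integers(0, 2, size=(n_influence_samples, n))
        x_flip = x.copy()
        x_flip[:, i] ^= 1
        influence[i] = np.mean([f(a) != f(b) for a, b in zip(x, x_flip)])
    candidates = np.flatnonzero(influence > threshold)
    # Phase 2: verify the single surviving subset on fresh random samples.
    x = rng.integers(0, 2, size=(n_verify_samples, n))
    verified = all(f(xi) == xi[candidates].sum() % 2 for xi in x)
    return candidates, verified
```

On a noiseless parity oracle this returns the exact support with `verified=True`; for example, with `f = lambda x: int(x[[2, 7, 11]].sum() % 2)` the candidate set comes back as exactly `[2, 7, 11]`.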

Config

| Parameter | Value |
| --- | --- |
| n_bits | 20, 50, 100 |
| k_sparse | 3, 5 |
| hidden | N/A (no neural net) |
| lr | N/A |
| method | Kushilevitz-Mansour (influence estimation + pruned verification) |
| n_influence_samples | 200 (per bit) |
| n_verify_samples | 100 |
| influence_threshold | 0.5 |
| seeds | 42, 43, 44 |

Results

| Metric | Value |
| --- | --- |
| Best test accuracy | 100% (all configs, all seeds) |
| Candidates found | exactly k in every run |
| Subsets checked | 1 (always) |
| Total queries (n=20, k=3) | 8,200 |
| Total queries (n=100, k=3) | 40,200 |
| Wall time (n=20, k=3) | 0.006s |
| Wall time (n=100, k=3) | 0.009s |
| Wall time (n=20, k=5) | 0.001s |
| Weighted ARD (n=20, k=3) | 1,585 |

Key Table

| Config | Method | Acc | Time | Queries/Subsets | Pruning ratio |
| --- | --- | --- | --- | --- | --- |
| n=20, k=3 | KM (this) | 100% | 0.006s | 8,200 queries | 1/1,140 subsets |
| n=50, k=3 | KM (this) | 100% | 0.003s | 20,200 queries | 1/19,600 subsets |
| n=100, k=3 | KM (this) | 100% | 0.009s | 40,200 queries | 1/161,700 subsets |
| n=20, k=5 | KM (this) | 100% | 0.001s | 8,200 queries | 1/15,504 subsets |
| n=20, k=3 | Fourier | 100% | 0.009s | 1,140 subsets | none |
| n=50, k=3 | Fourier | 100% | 0.16s | 19,600 subsets | none |
| n=100, k=3 | Fourier | 100% | 1.3s | 161,700 subsets | none |
| n=20, k=5 | Fourier | 100% | 0.14s | 15,504 subsets | none |
| n=20, k=3 | Random search | 100% | 0.011s | ~881 tries | none |
| n=50, k=3 | Random search | 100% | 0.142s | ~11,291 tries | none |
| n=20, k=3 | SGD (fast.py) | 100% | 0.12s | ~1000 samples | none |
| n=50, k=3 | SGD direct | 54% (FAIL) | --- | --- | none |

Sample Complexity

| Influence samples/bit | n=20, k=3 | n=50, k=3 | n=20, k=5 | Total queries (n=20, k=3) |
| --- | --- | --- | --- | --- |
| 5 | CORRECT | CORRECT | CORRECT | 300 |
| 10 | CORRECT | CORRECT | CORRECT | 500 |
| 20 | CORRECT | CORRECT | CORRECT | 900 |
| 50 | CORRECT | CORRECT | CORRECT | 2,100 |
| 200 | CORRECT | CORRECT | CORRECT | 8,100 |

Analysis

What worked

  • Perfect pruning in every single run: influence estimation identified exactly k high-influence bits out of n, reducing subsets to check from C(n,k) to exactly 1. Zero false positives, zero false negatives across all 12 runs.
  • Scales linearly with n: the influence phase costs 2 · n · n_influence_samples queries plus a small verification pass -- 40,200 total queries for n=100, k=3 vs Fourier's 161,700 subsets. The advantage grows with n.
  • Completely independent of k for the influence phase: the per-bit influence estimation cost is the same whether k=3 or k=5. The only place k matters is verification (which is trivial at C(k,k)=1).
  • Works with absurdly few samples: even 5 influence samples per bit correctly identifies the secret. The theoretical minimum is 1 sample (since influence is exactly 0 or 1 for parity), but statistical noise from finite samples could cause errors in more adversarial settings.
  • Fastest method tested: 0.001-0.009s across all configs. Beats Fourier (0.009-1.3s), random search (0.011-0.142s), and SGD (0.12s+) on every config.
  • Weighted ARD of 1,585 vs Fourier's 1,147,375: a 724x improvement in memory locality, since the algorithm touches far less data.

What didn't work

  • The query count is technically higher than Fourier for small n,k: 8,200 queries for n=20,k=3 vs Fourier checking 1,140 subsets (each needing ~500 sample correlations, so ~570,000 operations). But KM queries are paired (x and x^i), which is a different cost model.
  • Influence estimation uses a white-box oracle: we need to query f(x) and f(x^i) where we control x. In a pure PAC learning setting with i.i.d. samples, you cannot construct paired flipped queries -- you'd need a different approach.
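To make the cost model of the two bullets above concrete, here is a small query-counting wrapper (a sketch; `CountingOracle` is not part of the experiment code). It confirms the influence phase issues exactly 2 · n · s chosen-input membership queries, an access pattern that i.i.d. PAC samples cannot provide:

```python
import numpy as np

class CountingOracle:
    """Wraps an oracle f and counts membership queries."""
    def __init__(self, f):
        self.f, self.calls = f, 0
    def __call__(self, x):
        self.calls += 1
        return self.f(x)

rng = np.random.default_rng(0)
n, s = 20, 5
secret = np.array([2, 7, 11])  # hypothetical secret support
oracle = CountingOracle(lambda x: int(x[secret].sum() % 2))

for i in range(n):              # influence phase: s paired queries per bit
    for _ in range(s):
        x = rng.integers(0, 2, size=n)
        x_flip = x.copy()
        x_flip[i] ^= 1           # paired query: x and x with bit i flipped
        oracle(x), oracle(x_flip)

assert oracle.calls == 2 * n * s  # 200 queries for n=20, s=5
```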

Surprise

  • 5 samples per bit is enough: this means the entire n=20,k=3 problem can be solved with 300 total queries. That is 3.8x fewer than the C(20,3)=1,140 subsets Fourier needs to check, and each query is cheaper (single oracle call vs computing a correlation over all samples).
  • The algorithm is embarrassingly simple: estimate n influences, threshold, done. The theoretical KM algorithm involves heavy Fourier analysis machinery, but for the special case of parity functions, the influence shortcut makes it trivial.
  • n=100,k=3 in 0.009s: Fourier takes 1.3s on the same config. KM is 144x faster because it never has to enumerate the 161,700 subsets.
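Because the oracle is noiseless, the influence estimate is not statistical at all: flipping a secret bit always flips the label, flipping any other bit never does. A one-sample-per-bit check (with an illustrative secret set) recovers the support deterministically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, secret = 20, np.array([2, 7, 11])  # hypothetical secret support
f = lambda x: int(x[secret].sum() % 2)

hits = []
for i in range(n):
    x = rng.integers(0, 2, size=n)
    x_flip = x.copy()
    x_flip[i] ^= 1
    if f(x) != f(x_flip):  # label changes iff bit i is in the secret
        hits.append(i)
# hits == [2, 7, 11], using just 2n = 40 oracle calls
```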

Comparison with Fourier (exp_fourier)

On every config tested, KM beats brute-force Fourier for sparse parity:

| Metric | KM | Fourier | KM advantage |
| --- | --- | --- | --- |
| n=20, k=3 time | 0.006s | 0.009s | 1.5x |
| n=50, k=3 time | 0.003s | 0.16s | 53x |
| n=100, k=3 time | 0.009s | 1.3s | 144x |
| n=20, k=5 time | 0.001s | 0.14s | 140x |
| Scaling | O(n) | O(C(n,k)) | polynomial vs combinatorial |
| Min samples | 5/bit | ~20 total | different cost model |
| ARD (n=20, k=3) | 1,585 | 1,147,375 | 724x |

The advantage grows with n and k because Fourier's C(n,k) explodes while KM stays O(n).
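For reference, the pruning denominators in the tables are just binomial coefficients, while KM's influence phase is linear in n (a quick check; the 200-sample figure is the config above, and verification adds only a small constant on top):

```python
from math import comb

# Subsets Fourier must enumerate vs. queries for KM's influence phase alone
# (2 paired evaluations per bit per sample).
for n, k in [(20, 3), (50, 3), (100, 3), (20, 5)]:
    print(f"n={n}, k={k}: C(n,k)={comb(n, k):,}  influence queries={2*n*200:,}")
```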

Open Questions

  • How does this extend to noisy parity (labels corrupted with probability p)? Influence estimation still works, but the gap between secret-bit influence (1 - 2p(1-p)) and non-secret-bit influence (2p(1-p)) shrinks to (1-2p)^2, so distinguishing them needs on the order of 1/(1-2p)^4 samples per bit.
  • Can this be combined with a neural net? Use influence estimation to select features, then train a tiny network on only the k selected bits.
  • What about unknown k? Could run influence estimation and count the number of high-influence bits to infer k.
  • In a true PAC learning model (i.i.d. samples only, no paired queries), can we still estimate influence efficiently? For parity this looks hard: with k >= 2, E[y|x_i] is the same constant for every bit (the other secret bits randomize the label), so statistics like Var(E[y|x_i]) are identically zero and cannot rank bits -- this is precisely the obstacle that makes parity hard for statistical-query learners.
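On the noisy-parity question above, a quick numeric check (assumed model: each query's label is independently flipped with probability p; the secret set and sample counts are illustrative):

```python
import numpy as np

def noisy_influences(secret, n, p, samples, seed=0):
    """Per-bit influence estimates under independent label noise."""
    rng = np.random.default_rng(seed)
    def f(x):  # noisy parity oracle: flip the label with probability p
        return int((x[secret].sum() + (rng.random() < p)) % 2)
    inf = np.empty(n)
    for i in range(n):
        x = rng.integers(0, 2, size=(samples, n))
        x_flip = x.copy()
        x_flip[:, i] ^= 1
        inf[i] = np.mean([f(a) != f(b) for a, b in zip(x, x_flip)])
    return inf

inf = noisy_influences(np.array([2, 7, 11]), n=20, p=0.1, samples=500)
# Secret bits cluster near 1 - 2p(1-p) = 0.82, the rest near 2p(1-p) = 0.18,
# so the 0.5 threshold still separates them cleanly at p=0.1.
```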

Files

  • Experiment: src/sparse_parity/experiments/exp_km.py
  • Results: results/exp_km/results.json