
Experiment exp_mdl: Minimum Description Length for Sparse Parity

Date: 2026-03-06
Status: SUCCESS
Approach: #17 -- MDL / Compression. The best compressor of the label sequence is the one that knows the secret bits.

Hypothesis

For each candidate k-subset S, compute the parity of x restricted to S and measure how well that prediction compresses the label sequence. The true subset yields a zero-entropy residual (perfect compression); wrong subsets yield residuals with ~50% error rate, costing ~n_samples bits. MDL is more general than Fourier correlation: it works for any deterministic labeling function and degrades gracefully under label noise.
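A minimal sketch of this search (hypothetical helper names, not necessarily those in the experiment code; assumes X is an (n_samples, n_bits) 0/1 array and y the label vector):

```python
from itertools import combinations
from math import log2

import numpy as np

def binary_entropy(p):
    """H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def description_length(X, y, subset):
    """DL of the residual: n_samples * H(residual error rate)."""
    pred = X[:, subset].sum(axis=1) % 2
    residual_rate = float(np.mean(pred != y))
    return len(y) * binary_entropy(residual_rate)

def mdl_search(X, y, k):
    """Exhaustively score all C(n_bits, k) subsets; lowest DL wins."""
    n_bits = X.shape[1]
    return min(combinations(range(n_bits), k),
               key=lambda S: description_length(X, y, S))
```

The true subset predicts every label exactly, so its residual rate is 0 and its DL is exactly 0 bits, matching the clean-data results below.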

Config

Parameter Value
n_bits 20, 50
k_sparse 3, 5
n_samples 500
seeds 42, 43, 44
noise_rate 0% (clean), 5% (robustness test)
DL formula n_samples * H(residual_rate), H = binary entropy
Comparison Fourier/Walsh-Hadamard correlation on same data

Results

Clean data (no noise)

Config C(n,k) MDL correct MDL avg time Fourier correct Fourier avg time
n=20, k=3 1,140 3/3 0.0133s 3/3 0.0118s
n=50, k=3 19,600 3/3 0.2673s 3/3 0.2090s
n=20, k=5 15,504 3/3 0.2542s 3/3 0.1868s
  • True subset DL = 0.0 bits (perfect compression) in all cases
  • Wrong subsets average ~499.3 bits of DL (out of a 500-bit maximum for 500 samples), confirming a near-chance residual rate

Noise robustness (5% label flips)

Config MDL correct MDL best DL (bits) Fourier correct Fourier best corr
n=20, k=3 3/3 157.0 avg 3/3 0.887 avg
n=50, k=3 3/3 107.0 avg 3/3 0.932 avg
n=20, k=5 3/3 157.0 avg 3/3 0.887 avg
  • Under 5% noise, the true subset DL rises from 0 to ~107-164 bits (expected: 500 * H(0.05) ≈ 143 bits) but remains far below wrong subsets (~499 bits)
  • Both MDL and Fourier find the correct subset under 5% noise -- separation is large enough
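This separation can be checked directly from the DL formula in the config:

```python
from math import log2

def H(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

n_samples = 500
true_dl = n_samples * H(0.05)   # true subset under 5% label flips
wrong_dl = n_samples * H(0.5)   # wrong subset at chance residual rate
print(round(true_dl, 1), wrong_dl)  # 143.2 500.0
```

So even at 5% noise the true subset retains a ~350-bit margin over any wrong subset.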

MemTracker (n=20, k=3, seed=42)

Metric Value
Total floats accessed 2,290,500
Reads 2,280
Writes 2
Weighted ARD 1,147,375 floats

ARD is large because every subset scan re-reads x and y from scratch (a streaming access pattern: one full pass per subset).

Analysis

What worked

  • MDL finds the exact secret subset on all configs with 100% accuracy, matching Fourier
  • Noise robustness confirmed: 5% label noise does not degrade MDL -- the gap between true subset DL (~150 bits) and wrong subset DL (~499 bits) is enormous
  • Zero-entropy signal is unambiguous: the true subset compresses to exactly 0 bits on clean data, while even the next-best wrong subset sits at ~490+ bits
  • MDL is conceptually more general: unlike Fourier correlation which exploits the multiplicative structure of parity, MDL only requires that the true hypothesis produces a low-entropy residual -- it would work for any deterministic labeling function
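For reference, the Fourier/Walsh correlation baseline compared against above can be sketched as follows (illustrative names, not the experiment's actual code):

```python
import numpy as np

def fourier_corr(X, y, subset):
    """Empirical Walsh coefficient of chi_S against the labels.

    chi_S(x) = (-1)^parity_S(x); for the true subset on clean data the
    product chi_S(x) * (-1)^y is identically 1, so the coefficient is 1.0,
    while wrong subsets hover near 0.
    """
    chi = 1 - 2 * (X[:, subset].sum(axis=1) % 2)
    return float(np.mean(chi * (1 - 2 * y)))
```

This is the multiplicative structure MDL does not rely on: the score is a single signed mean rather than an entropy of a residual rate.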

What didn't work

  • MDL is ~30% slower than Fourier on the same data: MDL computes residual_rate + binary_entropy per subset, while Fourier computes a single dot product + abs. The entropy computation (with log2) is more expensive than a mean
  • No practical advantage over Fourier for parity specifically: both achieve 100% accuracy on all configs. MDL's generality does not help when the labeling function is known to be parity

Key insight

MDL and Fourier are equivalent for sparse parity in the noiseless case: both produce a binary decision (true subset: DL 0 bits / |corr| 1.0; wrong subsets: DL ~499 bits / |corr| ~0). Under noise, both degrade gracefully. The MDL framework is theoretically more principled (it directly measures compression quality rather than correlation), but for parity the distinction is academic.
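The equivalence can be made concrete: a candidate subset with residual error rate r has DL = n * H(r) and empirical parity correlation 1 - 2r, and on r in [0, 0.5] both scores are monotone in r, so they induce the same subset ranking. A quick check (hedged sketch, not experiment code):

```python
from math import log2

def H(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

rates = [0.0, 0.05, 0.1, 0.25, 0.5]    # residual error rates r
dls = [500 * H(r) for r in rates]      # MDL score: lower is better
corrs = [1 - 2 * r for r in rates]     # Fourier score: higher is better
# the two orderings agree: DL strictly increases while corr strictly decreases
assert dls == sorted(dls) and corrs == sorted(corrs, reverse=True)
```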

Quantitative comparison with other approaches

Method n=20/k=3 n=50/k=3 n=20/k=5
MDL 0.013s, 100% 0.267s, 100% 0.254s, 100%
Fourier 0.012s, 100% 0.209s, 100% 0.187s, 100%
Random search 0.011s, 100% 0.142s, 100% 0.426s, 100%
SGD (baseline) 0.12s, 100% FAIL (54%) 14 epochs

Open Questions

  • Would MDL outperform Fourier on a non-parity labeling function (e.g., threshold, majority, or XOR-of-subsets)?
  • Can we speed up MDL by computing residual rates in batch (vectorized over subsets)?
  • At what noise level does MDL start failing? Theoretically when 500 * H(noise_rate) approaches 500 * H(0.5), i.e. noise > ~40%
  • Could a two-stage approach work: first screen by MDL, then verify by Fourier?
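On the vectorization question, one hedged sketch is to score all subsets in a single fancy-indexed pass, trading memory for the per-subset Python loop (illustrative names; note the intermediate (n_samples, C(n,k)) parity matrix gets large for n=50):

```python
from itertools import combinations

import numpy as np

def residual_rates_batched(X, y, k):
    """Residual error rate of every k-subset in one vectorized pass."""
    subsets = np.array(list(combinations(range(X.shape[1]), k)))  # (m, k)
    parities = X[:, subsets].sum(axis=2) % 2                      # (n_samples, m)
    rates = (parities != y[:, None]).mean(axis=0)                 # (m,)
    return subsets, rates
```

The subset with the minimum rate also minimizes DL, since n * H(r) is monotone in r on [0, 0.5].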

Files

  • Experiment: src/sparse_parity/experiments/exp_mdl.py
  • Results: results/exp_mdl/results.json