Experiment: DMC Optimization for Sparse Parity¶
Date: 2026-03-22 Issue: #22 Status: SUCCESS
Hypothesis¶
If we reduce the number of influence samples in KM from 5 to 1 (exploiting that parity influence is exactly 0 or 1, never fractional) and use in-place buffer reuse to minimize stack distances, we can reduce DMC below the GF2 baseline of 8,607.
Result: SUCCESS -- best variant achieves DMC 3,578 (58% reduction from GF2 baseline).
Config¶
| Parameter | Value |
|---|---|
| n_bits | 20 |
| k_sparse | 3 |
| seed | 42 (+ robustness: 43, 44, 45, 46) |
| method | KM influence estimation variants |
Approaches Tested¶
Five DMC optimization strategies, compared against GF2 (DMC=8,607) and KM (DMC=20,633) baselines:
- A. KM-min: 1 influence sample per bit instead of 5. Separate x and x_flip buffers.
- B. KM-shared: 5 samples but shared buffer names (same name reused across bits).
- C. GF2 fine-grained: Track each Gaussian elimination row operation individually.
- D. KM-min + verify: 1 influence sample + minimal (k+1=4) verification samples.
- E. KM-inplace: 1 influence sample, single buffer written once and read twice per bit (no separate flip buffer).
Results¶
| Method | Accuracy | ARD | DMC | Total Floats | vs GF2 |
|---|---|---|---|---|---|
| GF2 baseline | 1.00 | 420 | 8,607 | 860 | 1.00x |
| KM baseline | 1.00 | 92 | 20,633 | 4,420 | 2.40x |
| KM-min (A) | 1.00 | 20 | 3,578 | 1,600 | 0.42x |
| KM-shared (B) | 1.00 | 92 | 20,633 | 4,400 | 2.40x |
| GF2 fine-grained (C) | 1.00 | 463 | 189,056 | 17,660 | 21.96x |
| KM-min + verify (D) | 1.00 | 26 | 4,293 | 1,760 | 0.50x |
| KM-inplace (E) | 1.00 | 30 | 4,319 | 1,200 | 0.50x |
Robustness (5 seeds each)¶
| Method | Seeds Correct | Avg DMC | Avg ARD | Avg Floats |
|---|---|---|---|---|
| KM-min (A) | 5/5 | 3,578 | 20 | 1,600 |
| KM-min + verify (D) | 5/5 | 4,293 | 26 | 1,760 |
| KM-inplace (E) | 5/5 | 4,319 | 30 | 1,200 |
All three optimized variants achieve 100% accuracy across all 5 seeds with deterministic DMC values.
Analysis¶
What worked¶
- KM-min (1 sample) reduces DMC by 58% vs GF2 and 83% vs baseline KM. For pure parity, influence is binary (0 or 1), never fractional. A single sample per bit suffices for perfect identification. This cuts total floats from 4,420 to 1,600.
- KM-inplace achieves the fewest total floats (1,200) by eliminating the separate x_flip buffer. It reads x twice per iteration (once for original, once for flipped computation) with minimal stack distance (20-40 floats).
- All stack distances in the optimized variants are tiny (20-40 floats), well within L1 cache. Every access is a cache hit.
What didn't work¶
- Shared-buffer KM (B) produced identical DMC to baseline KM (20,633). Reusing buffer names does not help because the total number of floats read remains the same -- the buffer size (100 floats = 5 samples * 20 bits) resets the clock distance identically each iteration.
- GF2 fine-grained tracking (C) was 22x worse than baseline GF2. Fine-grained tracking of Gaussian elimination reveals the true cost: each of n=20 column operations reads the full augmented matrix (21*21 = 441 elements), accumulating massive total floats (17,660) and large stack distances.
Surprise¶
The GF2 baseline DMC of 8,607 was artificially low because the harness only tracks write(A) -> read(A) -> write(solution), ignoring the O(n^2) row operations of Gaussian elimination. When tracked properly, GF2's real DMC is 189,056 -- 22x higher. This means KM-min's advantage over GF2 is even larger than the baseline numbers suggest.
The true DMC ranking with honest tracking would be:
| Method | DMC (honest) |
|---|---|
| KM-min | 3,578 |
| KM-inplace | 4,319 |
| GF2 (honest) | 189,056 |
| KM baseline | 20,633 |
| SGD | 1,278,460 |
Why KM-min wins on DMC¶
DMC = sum(size * sqrt(stack_distance)). KM-min wins on both factors:
- Small buffers: Each buffer is only 20 floats (1 sample * 20 bits), vs GF2's 420-element matrix.
- Tight streaming loop: Write x (20 floats), immediately read x (distance = 20), process one bit, repeat. Every read has distance exactly 20. Data stays in L1 cache the entire time.
The per-buffer analysis shows KM-min has uniform stack distance of 20 for every single read -- the theoretical minimum for a 20-float buffer that must be written before reading.
Open Questions¶
- KM-min uses 1 influence sample, which works perfectly for parity (influence is deterministic). For noisy parity or non-parity functions (AND, threshold), more samples are needed. What is the minimum sample count that maintains accuracy for noisy settings?
- Can we reduce DMC further by processing multiple bits per sample (vectorized influence estimation on a single x vector)?
- The GF2 harness tracking is incomplete (misses row operations). Should the harness be updated to track GF2 more honestly?
Impact on DISCOVERIES.md¶
New DMC leader: KM-min achieves DMC 3,578 on n=20/k=3, 58% below the previous best (GF2 at 8,607). Parity influence is deterministic, so 1 sample per bit suffices. GF2's harness-measured DMC was artificially low due to incomplete tracking of Gaussian elimination steps.
Updated rankings:
| Method | ARD | DMC | Total Floats |
|---|---|---|---|
| KM-min (1 sample) | 20 | 3,578 | 1,600 |
| KM-inplace | 30 | 4,319 | 1,200 |
| GF2 (harness) | 420 | 8,607 | 860 |
| KM (5 samples) | 92 | 20,633 | 4,420 |
Files¶
- Experiment code:
src/sparse_parity/experiments/exp_dmc_optimize.py - Results:
results/exp_dmc_optimize/results.json - This document:
findings/exp_dmc_optimize.md