Experiment: DMC Optimization for Sparse Parity¶

Date: 2026-03-22 Issue: #22 Status: SUCCESS

Hypothesis¶

If we reduce the number of influence samples in KM from 5 to 1 (exploiting that parity influence is exactly 0 or 1, never fractional) and use in-place buffer reuse to minimize stack distances, we can reduce DMC below the GF2 baseline of 8,607.

Result: SUCCESS -- best variant achieves DMC 3,578 (58% reduction from GF2 baseline).

Config¶

Parameter	Value
n_bits	20
k_sparse	3
seed	42 (+ robustness: 43, 44, 45, 46)
method	KM influence estimation variants

Approaches Tested¶

Five DMC optimization strategies, compared against GF2 (DMC=8,607) and KM (DMC=20,633) baselines:

A. KM-min: 1 influence sample per bit instead of 5. Separate x and x_flip buffers.
B. KM-shared: 5 samples but shared buffer names (same name reused across bits).
C. GF2 fine-grained: Track each Gaussian elimination row operation individually.
D. KM-min + verify: 1 influence sample + minimal (k+1=4) verification samples.
E. KM-inplace: 1 influence sample, single buffer written once and read twice per bit (no separate flip buffer).

Results¶

Method	Accuracy	ARD	DMC	Total Floats	vs GF2
GF2 baseline	1.00	420	8,607	860	1.00x
KM baseline	1.00	92	20,633	4,420	2.40x
KM-min (A)	1.00	20	3,578	1,600	0.42x
KM-shared (B)	1.00	92	20,633	4,400	2.40x
GF2 fine-grained (C)	1.00	463	189,056	17,660	21.96x
KM-min + verify (D)	1.00	26	4,293	1,760	0.50x
KM-inplace (E)	1.00	30	4,319	1,200	0.50x

Robustness (5 seeds each)¶

Method	Seeds Correct	Avg DMC	Avg ARD	Avg Floats
KM-min (A)	5/5	3,578	20	1,600
KM-min + verify (D)	5/5	4,293	26	1,760
KM-inplace (E)	5/5	4,319	30	1,200

All three optimized variants achieve 100% accuracy across all 5 seeds with deterministic DMC values.

Analysis¶

What worked¶

KM-min (1 sample) reduces DMC by 58% vs GF2 and 83% vs baseline KM. For pure parity, influence is binary (0 or 1), never fractional. A single sample per bit suffices for perfect identification. This cuts total floats from 4,420 to 1,600.
KM-inplace achieves the fewest total floats (1,200) by eliminating the separate x_flip buffer. It reads x twice per iteration (once for original, once for flipped computation) with minimal stack distance (20-40 floats).
All stack distances in the optimized variants are tiny (20-40 floats), well within L1 cache. Every access is a cache hit.

What didn't work¶

Shared-buffer KM (B) produced identical DMC to baseline KM (20,633). Reusing buffer names does not help because the total number of floats read remains the same -- the buffer size (100 floats = 5 samples * 20 bits) resets the clock distance identically each iteration.
GF2 fine-grained tracking (C) was 22x worse than baseline GF2. Fine-grained tracking of Gaussian elimination reveals the true cost: each of n=20 column operations reads the full augmented matrix (21*21 = 441 elements), accumulating massive total floats (17,660) and large stack distances.

Surprise¶

The GF2 baseline DMC of 8,607 was artificially low because the harness only tracks write(A) -> read(A) -> write(solution), ignoring the O(n^2) row operations of Gaussian elimination. When tracked properly, GF2's real DMC is 189,056 -- 22x higher. This means KM-min's advantage over GF2 is even larger than the baseline numbers suggest.

The true DMC ranking with honest tracking would be:

Method	DMC (honest)
KM-min	3,578
KM-inplace	4,319
GF2 (honest)	189,056
KM baseline	20,633
SGD	1,278,460

Why KM-min wins on DMC¶

DMC = sum(size * sqrt(stack_distance)). KM-min wins on both factors:

Small buffers: Each buffer is only 20 floats (1 sample * 20 bits), vs GF2's 420-element matrix.
Tight streaming loop: Write x (20 floats), immediately read x (distance = 20), process one bit, repeat. Every read has distance exactly 20. Data stays in L1 cache the entire time.

The per-buffer analysis shows KM-min has uniform stack distance of 20 for every single read -- the theoretical minimum for a 20-float buffer that must be written before reading.

Open Questions¶

KM-min uses 1 influence sample, which works perfectly for parity (influence is deterministic). For noisy parity or non-parity functions (AND, threshold), more samples are needed. What is the minimum sample count that maintains accuracy for noisy settings?
Can we reduce DMC further by processing multiple bits per sample (vectorized influence estimation on a single x vector)?
The GF2 harness tracking is incomplete (misses row operations). Should the harness be updated to track GF2 more honestly?

Impact on DISCOVERIES.md¶

New DMC leader: KM-min achieves DMC 3,578 on n=20/k=3, 58% below the previous best (GF2 at 8,607). Parity influence is deterministic, so 1 sample per bit suffices. GF2's harness-measured DMC was artificially low due to incomplete tracking of Gaussian elimination steps.

Updated rankings:

Method	ARD	DMC	Total Floats
KM-min (1 sample)	20	3,578	1,600
KM-inplace	30	4,319	1,200
GF2 (harness)	420	8,607	860
KM (5 samples)	92	20,633	4,420

Files¶

Experiment code: src/sparse_parity/experiments/exp_dmc_optimize.py
Results: results/exp_dmc_optimize/results.json
This document: findings/exp_dmc_optimize.md