Experiment: Sparse Sum vs Sparse Parity (All Methods, CPU and GPU)¶

Date: 2026-03-15 Status: FINDING Related: Issue #4, Issue #5

Question¶

Yaroslav asked: redo all experiments for sparse sum. How does every method compare between sum and parity, on both CPU and GPU?

What was performed¶

Ran all available methods on both tasks (sparse parity and sparse sum) at n=20, k=3, seed=42.

CPU: numpy harness (bin/reproduce-all), verified across 5 seeds
GPU: PyTorch CUDA on NVIDIA L4 via Modal (bin/gpu_energy.py), 4 runs each

What was produced¶

CPU (numpy, 5-seed mean)¶

Method	Parity time	Parity acc	Sum time	Sum acc
GF(2)	0.5ms	100%	fails	0%
KM	1.1ms	100%	0.7ms	100%
OLS	n/a	n/a	0.5ms	100%
Fourier	12ms	100%	0.3ms	100%
SGD (harness)	142ms (40 ep)	100%	0.8ms (1 ep)	100%
Sign SGD	29ms mean (unreliable)	3-5/5	3.7ms	5/5

GPU (PyTorch CUDA, L4, 4-run mean)¶

Method	Parity time (5 runs)	Sum time (4 runs)
GF(2)	2.0ms (CPU, can't parallelize)	2.7ms (fails on sum)
SGD	1446ms, 37 epochs	452ms, 1 epoch
KM	869ms	197ms

GPU vs CPU ratio¶

Method	Parity GPU/CPU	Sum GPU/CPU
SGD	10x slower	565x slower
KM	790x slower	281x slower

Can it be reproduced?¶

# CPU (all methods, all challenges)
PYTHONPATH=src python3 bin/reproduce-all

# GPU parity
modal run bin/gpu_energy.py

# GPU sum
TASK=sum modal run bin/gpu_energy.py

Finding¶

Sum is trivial for every method. Parity is hard for SGD.

SGD solves sum in 1 epoch (0.8ms CPU) because the signal is first-order. Each secret bit contributes independently to the output. SGD solves parity in 40 epochs (142ms CPU) because it must discover the k-th order interaction through grokking.

On GPU, both tasks are slower than CPU. The tensors are too small for CUDA. But the ratio is revealing: parity-SGD is 10x slower on GPU, sum-SGD is 565x slower. Sum finishes in 1 epoch so the entire GPU time is kernel launch overhead for a single pass through 1000 samples.

GF(2) solves parity in 0.5ms but fails on sum entirely. This is correct: GF(2) exploits that parity is linear over the binary field. Sum is not parity over any field.

Sign SGD is unreliable on parity (3/5 seeds at best config) but solves sum 5/5 every time at 3.7ms mean.

The 10ms budget: On CPU, every sum method fits. On parity, only GF(2) (0.5ms) and KM (1.1ms) fit. SGD and Fourier don't fit for parity. Sign SGD is borderline.

Files¶

This document: findings/exp_sum_vs_parity.md
CPU harness: bin/reproduce-all
GPU script: bin/gpu_energy.py