Skip to content

Experiment exp_decision_tree: Decision Trees for Sparse Parity

Date: 2026-03-06 Status: FAILED (no model achieves 100% test accuracy) Answers: Tree-based methods cannot reliably solve sparse parity due to greedy splitting

Hypothesis

A depth-k binary decision tree can learn k-parity exactly if each split targets one of the secret bits. Random forests with max_depth=k would implicitly search the subset space. ExtraTrees with random splits might bypass the greedy information-gain trap. RISK: greedy splitting by information gain fails for parity because individual bits have zero marginal correlation with the label.

Config

Parameter Value
n_bits 20, 50
k_sparse 3, 5
models DecisionTree, RandomForest (100 trees), ExtraTrees (500 trees)
max_depth k, 2*k, unlimited
n_train 5000 (k=3), 10000 (k=5)
n_test 1000
seeds 42, 43, 44

Results

Test accuracy (averaged over 3 seeds)

Model n=20/k=3 n=50/k=3 n=20/k=5
DT_depth_k 50.0% 51.0% 49.8%
DT_depth_2k 50.4% 50.0% 52.0%
DT_unlimited 69.4% 55.0% 56.6%
RF_depth_k 55.4% 50.6% 48.3%
RF_depth_2k 79.7% 55.4% 60.8%
RF_depth_unlimited 91.8% 61.4% 66.1%
ET_depth_k 50.8% 51.5% 48.7%
ET_depth_2k 79.2% 56.7% 60.4%
ET_depth_unlimited 92.5% 65.6% 67.1%

Timing (average seconds)

Model n=20/k=3 n=50/k=3 n=20/k=5
DT_depth_k 0.004s 0.007s 0.020s
DT_depth_2k 0.005s 0.017s 0.016s
DT_unlimited 0.009s 0.024s 0.023s
RF_depth_k 0.137s 0.104s 0.113s
RF_depth_2k 0.092s 0.108s 0.136s
RF_depth_unlimited 0.102s 0.129s 0.168s
ET_depth_k 0.325s 0.407s 0.504s
ET_depth_2k 0.348s 0.492s 0.594s
ET_depth_unlimited 0.422s 0.621s 0.763s

Comparison with baselines

Method n=20/k=3 n=50/k=3 n=20/k=5
Best tree (ET unlimited) 92.5% 65.6% 67.1%
SGD (baseline) 100% 54% (FAIL) 100%
Random search 100% 100% 100%
Fourier exhaustive 100% 100% 100%

Analysis

What worked

  • ExtraTrees unlimited depth is the best tree method, reaching 92.5% on n=20/k=3 (the easiest config). Random splits help ExtraTrees explore more diverse feature combinations than greedy trees.
  • Ensembles substantially outperform single trees: RF_unlimited (91.8%) vs DT_unlimited (69.4%) on n=20/k=3. Averaging many overfitting trees produces a better approximation.
  • Depth 2*k helps significantly over depth k: RF_depth_2k gets 79.7% vs RF_depth_k at 55.4% on n=20/k=3. The tree needs extra depth beyond k to route through irrelevant features.
  • Feature importances do recover the secret bits in unlimited-depth models, but only because the tree memorizes training data patterns rather than learning the parity function.
  • All models achieve 100% train accuracy with unlimited depth, confirming trees can memorize parity. The problem is generalization.

What didn't work

  • No tree model achieves 100% test accuracy on any config. The best result is 92.5% (ET_unlimited on n=20/k=3). This confirms the core hypothesis risk: greedy splitting cannot learn parity.
  • Depth-k trees perform at chance (~50%) on all configs. Despite k being the theoretical minimum depth to represent k-parity, greedy information-gain splitting never finds the right split sequence because individual bits have zero correlation with the label.
  • n=50/k=3 is hardest for trees (best: 65.6%) even though C(50,3)=19,600 is moderate. More irrelevant features mean more distractors for the greedy splitter. This is the same config where SGD also fails (54%).
  • n=20/k=5 is similarly poor (best: 67.1%). Higher parity order makes the interaction harder to capture with axis-aligned splits.
  • ExtraTrees' random splits don't solve the problem despite being the best variant. 500 trees with random splits still cannot reliably discover the k-way interaction.

Surprise

  • Trees beat SGD on n=50/k=3: ET_unlimited gets 65.6% vs SGD's 54%. Both fail, but the ensemble of random trees captures more of the parity signal than a neural net with vanilla SGD. Neither comes close to the combinatorial methods (random search, Fourier) which get 100%.
  • The gap between train and test accuracy is enormous: 100% train vs 55-67% test for unlimited trees on harder configs. This is extreme overfitting -- the tree memorizes n_train parity values but cannot generalize the function.
  • Depth-k trees are functionally random guessing (50% accuracy), even though a depth-k tree CAN represent k-parity. The gap between representational capacity and learnability via greedy splitting is total.

Open Questions

  • Would boosted trees (XGBoost, LightGBM) do better by iteratively correcting residuals?
  • Could oblique decision trees (splits on linear combinations of features) learn parity?
  • What if we gave the tree product features (x_i * x_j) as inputs -- would depth-1 trees solve 2-parity?
  • How many training samples would an unlimited-depth random forest need to approach 100% test accuracy?

Files

  • Experiment: src/sparse_parity/experiments/exp_decision_tree.py
  • Results: results/exp_decision_tree/results.json