# Experiment exp_decision_tree: Decision Trees for Sparse Parity

Date: 2026-03-06
Status: FAILED (no model achieves 100% test accuracy)
Answer: Tree-based methods cannot reliably solve sparse parity due to greedy splitting.

## Hypothesis
A depth-k binary decision tree can learn k-parity exactly if each split targets one of the secret bits. Random forests with max_depth=k would implicitly search the subset space. ExtraTrees with random splits might bypass the greedy information-gain trap. RISK: greedy splitting by information gain fails for parity because individual bits have zero marginal correlation with the label.
## Config

| Parameter | Value |
|---|---|
| n_bits | 20, 50 |
| k_sparse | 3, 5 |
| models | DecisionTree, RandomForest (100 trees), ExtraTrees (500 trees) |
| max_depth | k, 2*k, unlimited |
| n_train | 5000 (k=3), 10000 (k=5) |
| n_test | 1000 |
| seeds | 42, 43, 44 |
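The experiment script itself is not reproduced here; a minimal sketch of the setup under the config above (function and variable names are illustrative, not the actual implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

def sample_sparse_parity(rng, n_samples, n_bits, secret):
    """Uniform random bit vectors; label = XOR (sum mod 2) of the secret bits."""
    X = rng.integers(0, 2, size=(n_samples, n_bits))
    y = X[:, secret].sum(axis=1) % 2
    return X, y

rng = np.random.default_rng(42)
n_bits, k = 20, 3
secret = rng.choice(n_bits, size=k, replace=False)  # hidden support
X_tr, y_tr = sample_sparse_parity(rng, 5000, n_bits, secret)
X_te, y_te = sample_sparse_parity(rng, 1000, n_bits, secret)

models = {
    "DT_unlimited": DecisionTreeClassifier(random_state=0),
    "RF_depth_2k": RandomForestClassifier(n_estimators=100, max_depth=2 * k, random_state=0),
    "ET_unlimited": ExtraTreesClassifier(n_estimators=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test acc {model.score(X_te, y_te):.3f}")
```

Averaging `model.score` over the three seeds listed above gives the numbers in the results tables.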
## Results

### Test accuracy (averaged over 3 seeds)

| Model | n=20/k=3 | n=50/k=3 | n=20/k=5 |
|---|---|---|---|
| DT_depth_k | 50.0% | 51.0% | 49.8% |
| DT_depth_2k | 50.4% | 50.0% | 52.0% |
| DT_unlimited | 69.4% | 55.0% | 56.6% |
| RF_depth_k | 55.4% | 50.6% | 48.3% |
| RF_depth_2k | 79.7% | 55.4% | 60.8% |
| RF_depth_unlimited | 91.8% | 61.4% | 66.1% |
| ET_depth_k | 50.8% | 51.5% | 48.7% |
| ET_depth_2k | 79.2% | 56.7% | 60.4% |
| ET_depth_unlimited | 92.5% | 65.6% | 67.1% |
### Timing (average seconds)

| Model | n=20/k=3 | n=50/k=3 | n=20/k=5 |
|---|---|---|---|
| DT_depth_k | 0.004s | 0.007s | 0.020s |
| DT_depth_2k | 0.005s | 0.017s | 0.016s |
| DT_unlimited | 0.009s | 0.024s | 0.023s |
| RF_depth_k | 0.137s | 0.104s | 0.113s |
| RF_depth_2k | 0.092s | 0.108s | 0.136s |
| RF_depth_unlimited | 0.102s | 0.129s | 0.168s |
| ET_depth_k | 0.325s | 0.407s | 0.504s |
| ET_depth_2k | 0.348s | 0.492s | 0.594s |
| ET_depth_unlimited | 0.422s | 0.621s | 0.763s |
### Comparison with baselines

| Method | n=20/k=3 | n=50/k=3 | n=20/k=5 |
|---|---|---|---|
| Best tree (ET unlimited) | 92.5% | 65.6% | 67.1% |
| SGD (baseline) | 100% | 54% (FAIL) | 100% |
| Random search | 100% | 100% | 100% |
| Fourier exhaustive | 100% | 100% | 100% |
## Analysis

### What worked
- ExtraTrees unlimited depth is the best tree method, reaching 92.5% on n=20/k=3 (the easiest config). Random splits help ExtraTrees explore more diverse feature combinations than greedy trees.
- Ensembles substantially outperform single trees: RF_unlimited (91.8%) vs DT_unlimited (69.4%) on n=20/k=3. Averaging many overfitting trees produces a better approximation.
- Depth 2*k helps significantly over depth k: RF_depth_2k reaches 79.7% vs 55.4% for RF_depth_k on n=20/k=3. The extra depth beyond k gives the tree room to recover from splits wasted on irrelevant features.
- Feature importances do recover the secret bits in unlimited-depth models, but only because the tree memorizes training data patterns rather than learning the parity function.
- All models achieve 100% train accuracy with unlimited depth, confirming trees can memorize parity. The problem is generalization.
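The importance and memorization claims above can be checked with a quick sketch (the secret subset here is chosen for illustration, not taken from the experiment):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n_bits, secret = 20, [1, 4, 9]  # hypothetical secret subset
X = rng.integers(0, 2, size=(5000, n_bits))
y = X[:, secret].sum(axis=1) % 2

et = ExtraTreesClassifier(n_estimators=500, random_state=0).fit(X, y)
print(et.score(X, y))  # 1.0 -- unlimited depth memorizes the training set
top3 = np.argsort(et.feature_importances_)[-3:]
print(sorted(top3))    # the largest importances tend to land on the secret bits
```

The perfect train score confirms memorization; whether the top importances match the secret bits depends on the config, which is why the log attributes the recovery to memorized patterns rather than to learning the parity function.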
### What didn't work
- No tree model achieves 100% test accuracy on any config. The best result is 92.5% (ET_unlimited on n=20/k=3). This confirms the core hypothesis risk: greedy splitting cannot learn parity.
- Depth-k trees perform at chance (~50%) on all configs. Despite k being the theoretical minimum depth to represent k-parity, greedy information-gain splitting never finds the right split sequence because individual bits have zero correlation with the label.
- n=50/k=3 is hardest for trees (best: 65.6%) even though C(50,3)=19,600 is moderate. More irrelevant features mean more distractors for the greedy splitter. This is the same config where SGD also fails (54%).
- n=20/k=5 is similarly poor (best: 67.1%). Higher parity order makes the interaction harder to capture with axis-aligned splits.
- ExtraTrees' random splits don't solve the problem despite being the best variant. 500 trees with random splits still cannot reliably discover the k-way interaction.
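The root cause named above is easy to see directly: each individual bit, including the secret ones, is uncorrelated with the parity label, so no single axis-aligned split has positive information gain at the root. A quick check (secret bits are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200_000, 20))
secret = [3, 7, 11]  # hypothetical secret bits
y = X[:, secret].sum(axis=1) % 2

# Even the secret bits individually carry no signal about the label:
for i in secret:
    print(i, np.corrcoef(X[:, i], y)[0, 1])  # ~0 up to sampling noise
```

Fixing any single bit leaves the parity of the remaining k-1 bits uniform, so the conditional label distribution is still 50/50 at every first-level split.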
### Surprises
- Trees beat SGD on n=50/k=3: ET_unlimited gets 65.6% vs SGD's 54%. Both fail, but the ensemble of random trees captures more of the parity signal than a neural net with vanilla SGD. Neither comes close to the combinatorial methods (random search, Fourier) which get 100%.
- The gap between train and test accuracy is enormous: 100% train vs 55-67% test for unlimited trees on harder configs. This is extreme overfitting -- the tree memorizes n_train parity values but cannot generalize the function.
- Depth-k trees are functionally random guessing (50% accuracy), even though a depth-k tree CAN represent k-parity. The gap between representational capacity and learnability via greedy splitting is total.
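The representability half of that gap is easy to verify: given only the k relevant bits and their full truth table, a depth-k tree fits k-parity exactly. A sketch for k=3 (not the experiment code, which draws random samples over many irrelevant bits):

```python
import itertools
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# All 8 assignments of 3 bits; label = 3-parity.
X = np.array(list(itertools.product([0, 1], repeat=3)))
y = X.sum(axis=1) % 2

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.score(X, y))  # 1.0 -- a depth-3 tree represents 3-parity exactly
```

This only works because every split is forced to be useful once the feature space is reduced to the secret bits; with irrelevant features present, greedy gain gives the splitter no reason to pick the right ones.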
## Open Questions
- Would boosted trees (XGBoost, LightGBM) do better by iteratively correcting residuals?
- Could oblique decision trees (splits on linear combinations of features) learn parity?
- What if we gave the tree product features (x_i * x_j) as inputs -- would depth-1 trees solve 2-parity?
- How many training samples would an unlimited-depth random forest need to approach 100% test accuracy?
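The product-feature question can at least be sanity-checked for 2-parity. Under a +/-1 bit encoding (an assumption; with 0/1 bits XOR is not a single product), x_i * x_j *is* the 2-parity of bits i and j, so a depth-1 stump on the augmented features separates the classes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, n_bits = 2000, 10
X = rng.choice([-1, 1], size=(n, n_bits))  # +/-1 encoding (assumption)
secret = [2, 5]                            # hypothetical secret pair
y = (X[:, secret].prod(axis=1) == 1).astype(int)

# Augment with all pairwise products; one column IS the target parity.
pairs = [(i, j) for i in range(n_bits) for j in range(i + 1, n_bits)]
P = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])

stump = DecisionTreeClassifier(max_depth=1).fit(P, y)
print(stump.score(P, y))  # 1.0 -- the stump splits on the x_2*x_5 column
```

For k>2 the same trick would need degree-k products, so the feature count grows as C(n, k), which is the same combinatorial search the greedy splitter was supposed to avoid.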
## Files

- Experiment: `src/sparse_parity/experiments/exp_decision_tree.py`
- Results: `results/exp_decision_tree/results.json`