
Hinton Problems

A reproducible-baseline catalog of the synthetic learning problems that appear in Geoffrey Hinton’s experimental papers from 1981 through 2022 — implemented in pure numpy, runnable on a laptop CPU, with paper-comparison metrics per stub.

Site: https://cybertronai.github.io/hinton-problems/ • Catalog: RESULTS.md • 53 of 53 stubs implemented (PRs #32–#41, all merged 2026-05-03)

Introduction

The field had standardized on backprop by the end of the ’80s, and Hinton’s papers give a sample of the problems in use at the time. In the last 20 years we have transitioned to GPUs, and the math has changed considerably. Instead of being bottlenecked by arithmetic, the shrinking of transistors has made arithmetic essentially free, and all of the cost now comes from data movement. Backprop is inefficient in terms of its “commute to compute” ratio because it requires fetching all of the activations for each gradient add.

So a natural experiment would be to redo key experiments of this time with a focus on data movement. The first step is to get a baseline — to establish the list of problems which are famous (made by Hinton), reasonable to implement, and easy to run/reproduce.

— Yaroslav, issue #1 (Sutro Group)

This repository is that baseline. v1 ships 53 implementations covering the lineage from the 4-2-4 encoder (1985) through the shifter (1986), bars (1995), MultiMNIST (2017), Constellations (2019), Ellipse World (2022), and the Forward-Forward suite (2022). Each stub is a self-contained folder with model + train + eval + visualization + animated GIF, all in numpy, all runnable in <5 min per seed on an M-series laptop.

The next step (#45 v2) instruments these 53 baselines with ByteDMD — Yaroslav’s data-movement cost tracer — to measure the actual “commute” each algorithm pays.
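The “commute to compute” framing can be made concrete with a back-of-the-envelope byte-counting model. This is not ByteDMD, just an illustrative sketch that counts the tensors a dense layer’s forward and backward passes must touch; the backward pass has to re-read both the weight matrix and the stored forward activations:

```python
def dense_layer_traffic(n_in, n_out, batch, bytes_per=4):
    """Rough bytes-moved vs FLOPs for one dense layer (fwd + bwd).

    A naive cost model: count every tensor each pass reads or writes,
    ignoring caches and kernel fusion. Illustrative only.
    """
    w = n_in * n_out                  # weight matrix elements
    a_in = batch * n_in               # input activations
    a_out = batch * n_out             # output activations
    fwd = w + a_in + a_out            # read W and x, write y
    bwd = 2 * w + 2 * a_in + a_out    # read W, stored x, dy; write dW, dx
    flops = 3 * 2 * w * batch         # one forward + two backward matmuls
    bytes_moved = (fwd + bwd) * bytes_per
    return bytes_moved, flops, bytes_moved / flops
```

At n_in = n_out = 256, batch 1 pays roughly 2 bytes per FLOP while batch 256 pays about 0.02: the gap is the price of re-fetching weights and stored activations, which is exactly the quantity v2 aims to measure properly.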

What’s here

  • 27 reproduce paper claims (full or qualitative match)
  • 25 partial reproductions (algorithm works, paper-config gap documented)
  • 1 non-replication (gap analysed in 3 causes)

Pure numpy + matplotlib throughout. Every stub runs on a laptop CPU. Each problem lives in its own folder with <slug>.py (model + train + eval), README.md, make_<slug>_gif.py, visualize_<slug>.py, an animated <slug>.gif, and a viz/ folder of training curves and weight visualizations.

Visual tour

encoder-4-2-4 — Ackley/Hinton/Sejnowski 1985, the worked example. Bipartite RBM, 2-bit code emerges.

spline-images-factorial-vq — Hinton/Zemel 1994, factorial VQ wins 3× over standard 24-VQ baseline.

ellipse-world — Culp/Sabour/Hinton 2022, eGLOM islands form across iterations (5-class, 92.2%).

ff-recurrent-mnist — Hinton 2022, top-down recurrent Forward-Forward.

Catalog

Each table shows the v1 result per stub. Full per-stub metrics (compile-time, GIF size, headline numbers) are in RESULTS.md.

Reproduces? legend: yes = matches paper qualitatively or quantitatively; partial = method works, paper number not fully reached (gap documented in stub README); no = paper claim does not replicate.

1980s — Connectionist foundations

Ackley, Hinton & Sejnowski (1985) — A learning algorithm for Boltzmann machines

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| encoder-4-2-4 | yes (CD-k variant) | n/a (worked example) | ~1s |
| encoder-3-parity | yes (KL = log 2 visible-only; RBM drops to 0.10) | ~50 min | 0.04s + 1.3s |
| encoder-4-3-4 | yes (60% error-correcting rate / 30 seeds) | ~3 hr | 2.3s |
| encoder-8-3-8 | yes (16/20 = exact paper parity) | ~2 hr | ~20s/seed |
| encoder-40-10-40 | yes (exceeds paper: 100% vs 98.6%) | ~1.5 hr | 6s |
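
For flavor, here is a minimal CD-1 sketch of the 4-2-4 encoder task. This is an illustrative rewrite under simple assumptions (sigmoid units, mean-field reconstruction error as the progress metric), not the stub’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

V = np.eye(4)                      # the 4 one-hot visible patterns
W = rng.normal(0.0, 0.1, (4, 2))   # 2 hidden units: room for a 2-bit code
b_v, b_h = np.zeros(4), np.zeros(2)

def recon_error():
    """Mean-squared error of a deterministic up-down pass."""
    h = sigmoid(V @ W + b_h)
    v = sigmoid(h @ W.T + b_v)
    return float(np.mean((V - v) ** 2))

err_before = recon_error()
for _ in range(3000):              # CD-1: one Gibbs step for the negative phase
    h_p = sigmoid(V @ W + b_h)                 # positive statistics
    h_s = (rng.random(h_p.shape) < h_p) * 1.0  # sample hidden states
    v_n = sigmoid(h_s @ W.T + b_v)             # reconstruction
    h_n = sigmoid(v_n @ W + b_h)               # negative statistics
    W += 0.1 * (V.T @ h_p - v_n.T @ h_n) / 4
    b_v += 0.1 * (V - v_n).mean(axis=0)
    b_h += 0.1 * (h_p - h_n).mean(axis=0)
err_after = recon_error()
```

When training succeeds, the two hidden units settle into a distinct 2-bit code per pattern, which is the emergent code the 1985 worked example describes.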

Rumelhart, Hinton & Williams (1986) — Learning internal representations by error propagation

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| xor | yes (qualitative) | 6.4 min | 0.3s |
| n-bit-parity | yes (qualitative; thermometer code partial) | 30 min | 0.20s |
| encoder-backprop-8-3-8 | yes (70% strict 8/8 distinct codes) | ~10 min | 0.6s |
| distributed-to-local-bottleneck | yes (graded values 0.007/0.167/0.553/0.971) | 75 min | 0.082s |
| symmetry | yes (1 : 1.994 : 3.969 weight ratio) | 12.8 min | 0.4s |
| binary-addition | yes (qualitatively; 4-3-3 succeeds, 4-2-3 stuck) | ~2 hr | 44s |
| negation | yes (4-6-3 deviation justified) | 25 min | 0.10s |
| t-c-discrimination | yes (all 3 detector families emerge) | 30 min | 0.69s |
| recurrent-shift-register | yes (89 sweeps N=3, 121 sweeps N=5) | 25 min | 0.9s / 1.1s |
| sequence-lookup-25 | yes (4-5/5 held-out generalization) | 70 min | 0.20s / 5.78s |
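
The encoder-backprop-8-3-8 task above fits in a few lines of numpy. The following is an illustrative sketch under plain sigmoid + MSE + full-batch-descent assumptions, not the stub’s exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = np.eye(8)                        # 8 one-hot patterns; inputs double as targets
W1 = rng.normal(0.0, 0.5, (8, 3))    # 3-unit bottleneck
W2 = rng.normal(0.0, 0.5, (3, 8))
b1, b2 = np.zeros(3), np.zeros(8)

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, y = forward()
loss_before = float(np.mean((y - X) ** 2))

lr = 1.0
for _ in range(10_000):              # full-batch gradient descent
    h, y = forward()
    dy = (y - X) * y * (1 - y)       # delta at the sigmoid outputs
    dh = (dy @ W2.T) * h * (1 - h)   # backpropagate through the bottleneck
    W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(axis=0)
    W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(axis=0)

h, y = forward()
loss_after = float(np.mean((y - X) ** 2))
codes = (h > 0.5).astype(int)        # the emergent 3-bit hidden code
```

On a good seed the 8 rows of codes are all distinct, i.e. the network invents a 3-bit binary code for 8 items; the stub reports that strict outcome in 70% of seeds.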

Hinton (1986) — Distributed representations of concepts

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| family-trees | yes (3/4 best, 1.9/4 mean — matches paper) | ~1 hr | 2.1s |

Hinton & Sejnowski (1986) — Learning and relearning in Boltzmann machines

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| shifter | yes (92.3% recognition; position-pair detectors) | 30 min | 14s |
| grapheme-sememe | yes (qualitative; +6.7pp spontaneous recovery) | 70 min | 1.7s |

Plaut & Hinton (1987) — Learning sets of filters using back-propagation

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| riser-spectrogram | yes (98.08% net vs 98.90% Bayes; gap +0.83pp) | ~7 min | 0.91s |

Hinton & Plaut (1987) — Using fast weights to deblur old memories

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| fast-weights-rehearsal | yes (rehearsed-subset recovery +22pp / 30 seeds) | 25 min | 0.14s |

1990s — Unsupervised learning, mixtures, the Helmholtz machine

Jacobs, Jordan, Nowlan & Hinton (1991) — Adaptive mixtures of local experts

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| vowel-mixture-experts | partial (MoE 92.8% / MLP 90.1%; gate partitions vowels) | 70 min | 0.09s |

Becker & Hinton (1992) — A self-organizing neural network that discovers surfaces in random-dot stereograms

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| random-dot-stereograms | yes (Imax 1.18 nats; disparity readout 0.74) | ~1 hr | 6.1s |

Nowlan & Hinton (1992) — Simplifying neural networks by soft weight-sharing

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| sunspots | yes (MoG ≤ decay ≤ vanilla; weight peaks at 0 + 0.27) | ~1 hr | 5s |

Hinton & Zemel (1994) — Autoencoders, MDL and Helmholtz free energy

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| spline-images-factorial-vq | yes (factorial wins 3× over 24-VQ baseline) | ~1 hr | ~5s |

Zemel & Hinton (1995) — Learning population codes by minimizing description length

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| dipole-position | partial (R² = 0.81; supervised warm-up needed) | ~3 hr | 2s |
| dipole-3d-constraint | yes (qualitatively; 3 dims emerge) | ~1 hr | 11s |
| dipole-what-where | partial (perpendicular manifolds, lin-sep 0.58) | ~1 hr | 2s |

Dayan, Hinton, Neal & Zemel (1995) — The Helmholtz machine

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| helmholtz-shifter | partial (3 of 4 layer-3 units shift-selective; n_top=4) | 75 min | 209s |

Hinton, Dayan, Frey & Neal (1995) — The wake-sleep algorithm

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bars | partial (KL = 0.451 bits vs paper 0.10) | 70 min | 222s |

2000s — Products of experts, contrastive divergence, deep belief nets

Hinton (2000) — Training products of experts by minimizing contrastive divergence

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bars-rbm | yes (7/8 bars at purity ≥0.5; 8/8 with n_hidden=16) | ~30 min | 1.5s |

Memisevic & Hinton (2007) — Unsupervised learning of image transformations

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| transforming-pairs | partial (axis-selective transformation detectors) | ~1 hr | 2s |

Sutskever & Hinton (2007) — Multilevel distributed representations for high-dimensional sequences

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bouncing-balls-2 | partial (rollout MSE between baselines) | 75 min | 6.2s |

Sutskever, Hinton & Taylor (2008) — The recurrent temporal RBM

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bouncing-balls-3 | partial (CD-1 recon 0.005; rollout 0.13) | ~1 hr | 3.4s |

2010s — Capsules, distillation, attention

Hinton, Krizhevsky & Wang (2011) — Transforming auto-encoders

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| transforming-autoencoders | yes (R²(dx)=0.78, R²(dy)=0.67) | ~30 min | 100s |

Tang, Salakhutdinov & Hinton (2012) — Deep Lambertian Networks

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| deep-lambertian-spheres | yes (normal angular err 27°; albedo 7× baseline) | ~50 min | 33s |

Sutskever, Martens, Dahl & Hinton (2013) — On the importance of initialization and momentum

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| rnn-pathological | yes (3 of 4 tasks; ortho beats random init) | 2.5 hr | 42s |

Hinton, Vinyals & Dean (2015) — Distilling the knowledge in a neural network

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| distillation-mnist-omitted-3 | yes (97.82% on digit-3 post-correction; paper 98.6%) | 40 min | 121.8s |
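
The core of this stub is temperature distillation. A minimal sketch of the soft-target gradient from Hinton, Vinyals & Dean 2015 (the actual stub adds the hard-label term and the MNIST specifics):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distill_grad(student_logits, teacher_logits, T=4.0):
    """Gradient w.r.t. student logits of the soft-target loss
    T^2 * CE(softmax(teacher/T), softmax(student/T)).

    The T^2 factor, as in the paper, keeps gradient magnitudes
    comparable as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return T * (p_student - p_teacher)
```

A student that already matches the teacher gets zero gradient, and a high temperature flattens the teacher distribution, exposing the small-logit “dark knowledge” that lets the student learn about a digit it never sees.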

Eslami, Heess, Weber, Tassa, Szepesvari, Kavukcuoglu & Hinton (2016) — Attend, Infer, Repeat

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| air-multimnist | partial (count 79.7%; reconstructions blurry) | ~50 min | 6s |
| air-3d-primitives | partial (1-prim 88.8%; 3-prim count 81%) | ~50 min | 11.7s |

Ba, Hinton, Mnih, Leibo & Ionescu (2016) — Using fast weights to attend to the recent past

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| fast-weights-associative-retrieval | partial (architecture verified; 38% retrieval) | ~3 hr | 293s |
| multi-level-glimpse-mnist | partial (82.46% vs paper 90%+) | ~1 hr | 1199s |
| catch-game | partial (FW 33.9% vs vanilla 11.4%; 91% at size=10) | ~2 hr | ~50s |

Sabour, Frosst & Hinton (2017) — Dynamic routing between capsules

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| affnist | no (gap wrong sign: −2% vs paper +13%) | ~3 hr | 4 min |
| multimnist-capsnet | partial (48.6% vs target 80%; 22× chance) | ~3 hr | 395s |

Hinton, Sabour & Frosst (2018) — Matrix capsules with EM routing

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| smallnorb-novel-viewpoint | yes qualitatively (caps 0.726 vs CNN 0.696 held-out) | ~1 hr | 10s |

Kosiorek, Sabour, Teh & Hinton (2019) — Stacked capsule autoencoders

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| constellations | yes (per-point recovery 86.9% best / 84% mean) | ~75 min | 25s |

2020s — Subclass distillation, GLOM, Forward-Forward

Müller, Kornblith & Hinton (2020) — Subclass distillation

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| mnist-2x5-subclass | partial (subclass recovery 82.88% best / 73.87% mean) | ~50 min | 13s |

Sabour, Tagliasacchi, Yazdani, Hinton & Fleet (2021) — Unsupervised part representation by flow capsules

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| geo-flow-capsules | yes (mean IoU 0.764 / chance 0.20) | ~8 min | 43s |

Culp, Sabour & Hinton (2022) — Testing GLOM’s ability to infer wholes from ambiguous parts

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| ellipse-world | yes (92.2% on 5-class; islands form +0.117) | ~1 hr | 9s |

Hinton (2022) — The forward-forward algorithm: some preliminary investigations

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| ff-hybrid-mnist | partial (5.21% test err vs paper 1.37%) | ~75 min | 492s |
| ff-label-in-input | partial (3.60% vs paper 1.36%) | ~1 hr | 66s |
| ff-recurrent-mnist | partial (10.66% vs paper 1.31%) | ~1 hr | 216s |
| ff-cifar-locally-connected | partial (FF 22.78% / BP 38.31%) | ~3 hr | 150s |
| ff-aesop-sequences | yes (TF 53% / SG 34%; baselines 3-20%) | ~12 min | 131s |
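
All five stubs share the Forward-Forward core: each layer is trained locally so that its “goodness” (sum of squared activities) exceeds a threshold on positive data and falls below it on negative data, with no backward pass across layers. A minimal one-layer sketch on toy separable data (an assumption for illustration, not the paper’s MNIST setup):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def goodness(h):
    return (h ** 2).sum(axis=1)            # FF's layer-local score

# Toy data: positives concentrate energy along one direction, negatives are noise.
d, n = 20, 256
u = rng.normal(size=d); u /= np.linalg.norm(u)
pos = np.outer(rng.normal(size=n), u) + 0.1 * rng.normal(size=(n, d))
neg = 0.5 * rng.normal(size=(n, d))

W = rng.normal(0.0, 0.1, (d, 10))
theta = 2.0                                # goodness threshold

def ff_loss(W):
    """Mean -log sigmoid(+-(goodness - theta)) over pos and neg batches."""
    total = 0.0
    for X, sign in ((pos, 1.0), (neg, -1.0)):
        h = np.maximum(X @ W, 0.0)         # one ReLU layer
        total += float(np.mean(np.log1p(np.exp(-sign * (goodness(h) - theta)))))
    return total

loss_before = ff_loss(W)
lr = 0.03
for _ in range(300):                       # layer-local updates, no backprop chain
    for X, sign in ((pos, 1.0), (neg, -1.0)):
        h = np.maximum(X @ W, 0.0)
        p = sigmoid(sign * (goodness(h) - theta))
        dh = -sign * (1.0 - p)[:, None] * 2.0 * h   # d(-log p)/dh
        W -= lr * X.T @ dh / n
loss_after = ff_loss(W)
```

After training, the mean goodness of positives typically sits above theta and that of negatives below it, which is all the per-layer FF objective asks for; no activation from this layer is ever re-fetched for a cross-layer gradient.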

Structure

problem-folder/
├── README.md                  source paper, problem, results, deviations
├── <slug>.py                  dataset + model + train + eval
├── visualize_<slug>.py        training curves + weight viz
├── make_<slug>_gif.py         animated GIF
├── <slug>.gif                 committed animation
└── viz/                       committed PNGs

Roadmap

  • #45 v2: ByteDMD instrumentation — measure data-movement cost per stub on these baselines (the actual research goal)
  • #46 v1.5: paper-scale reruns — close the 25 partial reproductions on Modal/GPU
  • See the “Open questions / next experiments” section in each stub README for stub-specific follow-ups

Contributing

Implementations follow the v1 spec:

  • Each stub fills in <slug>.py (model + train + eval), an 8-section README.md, make_<slug>_gif.py, visualize_<slug>.py, an animated <slug>.gif, and viz/ PNGs.
  • Acceptance: reproduces in <5 min on a laptop; final accuracy with seed in the Results table; GIF illustrates both the problem AND the learning dynamics; an honest “Deviations from the original” section; at least one open question.
  • v1 metrics in PR body: "Paper reports X; we got Y. Reproduces: yes/no." + run wallclock + implementation wallclock.

The v1.5 reruns (#46) and v2 ByteDMD work (#45) welcome contributions.

License

The hinton-problems source and documentation are released into the public domain under the Unlicense.