
Hinton Problems

A reproducible-baseline catalog of the synthetic learning problems that appear in Geoffrey Hinton’s experimental papers from 1981 through 2022 — implemented in pure numpy, runnable on a laptop CPU, with paper-comparison metrics per stub.

Site: https://cybertronai.github.io/hinton-problems/ • Catalog: RESULTS.md • 53 of 53 stubs implemented (PRs #32–#41, all merged 2026-05-03)

Introduction

The field had standardized on backprop by the end of the ’80s, and Hinton’s papers give a sample of the problems in use at the time. In the last 20 years we have transitioned to GPUs, and the math has changed considerably. Instead of being bottlenecked by arithmetic, the shrinking of transistors has made arithmetic essentially free, and all of the cost now comes from data movement. Backprop is inefficient in terms of its “commute to compute” ratio because it requires fetching all of the activations for each gradient add.

So a natural experiment would be to redo key experiments of this time with a focus on data movement. The first step is to get a baseline — to establish the list of problems which are famous (made by Hinton), reasonable to implement, and easy to run/reproduce.

— Yaroslav, issue #1 (Sutro Group)

This repository is that baseline. v1 ships 53 implementations covering the lineage from the 4-2-4 encoder (1985) through the shifter (1986), bars (1995), MultiMNIST (2017), Constellations (2019), Ellipse World (2022), and the Forward-Forward suite (2022). Each stub is a self-contained folder with model + train + eval + visualization + animated GIF, all in numpy, all runnable in <5 min per seed on an M-series laptop.

The next step (#45 v2) instruments these 53 baselines with ByteDMD — Yaroslav’s data-movement cost tracer — to measure the actual “commute” each algorithm pays.
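The “commute to compute” framing can be made concrete with a back-of-the-envelope byte-counting model. This is not ByteDMD, just an illustrative sketch that counts the tensors a dense layer’s forward and backward passes must touch; the backward pass has to re-read both the weight matrix and the stored forward activations:

```python
def dense_layer_traffic(n_in, n_out, batch, bytes_per=4):
    """Rough bytes-moved vs FLOPs for one dense layer (fwd + bwd).

    A naive cost model: count every tensor each pass reads or writes,
    ignoring caches and kernel fusion. Illustrative only.
    """
    w = n_in * n_out                  # weight matrix elements
    a_in = batch * n_in               # input activations
    a_out = batch * n_out             # output activations
    fwd = w + a_in + a_out            # read W and x, write y
    bwd = 2 * w + 2 * a_in + a_out    # read W, stored x, dy; write dW, dx
    flops = 3 * 2 * w * batch         # one forward + two backward matmuls
    bytes_moved = (fwd + bwd) * bytes_per
    return bytes_moved, flops, bytes_moved / flops
```

At n_in = n_out = 256, batch 1 pays roughly 2 bytes per FLOP while batch 256 pays about 0.02: the gap is the price of re-fetching weights and stored activations, which is exactly the quantity v2 aims to measure properly.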

What’s here

  • 27 reproduce paper claims (full or qualitative match)
  • 25 partial reproductions (algorithm works, paper-config gap documented)
  • 1 non-replication (gap analysed in 3 causes)

Pure numpy + matplotlib throughout. Every stub runs on a laptop CPU. Each problem lives in its own folder with <slug>.py (model + train + eval), README.md, make_<slug>_gif.py, visualize_<slug>.py, an animated <slug>.gif, and a viz/ folder of training curves and weight visualizations.

Visual tour

encoder-4-2-4 — Ackley/Hinton/Sejnowski 1985, the worked example. Bipartite RBM, 2-bit code emerges.

spline-images-factorial-vq — Hinton/Zemel 1994, factorial VQ wins 3× over standard 24-VQ baseline.

ellipse-world — Culp/Sabour/Hinton 2022, eGLOM islands form across iterations (5-class, 92.2%).

ff-recurrent-mnist — Hinton 2022, top-down recurrent Forward-Forward.

Catalog

Each table shows the v1 result per stub. Full per-stub metrics (compile-time, GIF size, headline numbers) are in RESULTS.md.

Reproduces? legend: yes = matches paper qualitatively or quantitatively; partial = method works, paper number not fully reached (gap documented in stub README); no = paper claim does not replicate.

1980s — Connectionist foundations

Ackley, Hinton & Sejnowski (1985) — A learning algorithm for Boltzmann machines

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| encoder-4-2-4 | yes (CD-k variant) | n/a (worked example) | ~1s |
| encoder-3-parity | yes (KL = log 2 visible-only; RBM drops to 0.10) | ~50 min | 0.04s + 1.3s |
| encoder-4-3-4 | yes (60% error-correcting rate / 30 seeds) | ~3 hr | 2.3s |
| encoder-8-3-8 | yes (16/20 = exact paper parity) | ~2 hr | ~20s/seed |
| encoder-40-10-40 | yes (exceeds paper: 100% vs 98.6%) | ~1.5 hr | 6s |
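
For flavor, here is a minimal CD-1 sketch of the 4-2-4 encoder task. This is an illustrative rewrite under simple assumptions (sigmoid units, mean-field reconstruction error as the progress metric), not the stub’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

V = np.eye(4)                      # the 4 one-hot visible patterns
W = rng.normal(0.0, 0.1, (4, 2))   # 2 hidden units: room for a 2-bit code
b_v, b_h = np.zeros(4), np.zeros(2)

def recon_error():
    """Mean-squared error of a deterministic up-down pass."""
    h = sigmoid(V @ W + b_h)
    v = sigmoid(h @ W.T + b_v)
    return float(np.mean((V - v) ** 2))

err_before = recon_error()
for _ in range(3000):              # CD-1: one Gibbs step for the negative phase
    h_p = sigmoid(V @ W + b_h)                 # positive statistics
    h_s = (rng.random(h_p.shape) < h_p) * 1.0  # sample hidden states
    v_n = sigmoid(h_s @ W.T + b_v)             # reconstruction
    h_n = sigmoid(v_n @ W + b_h)               # negative statistics
    W += 0.1 * (V.T @ h_p - v_n.T @ h_n) / 4
    b_v += 0.1 * (V - v_n).mean(axis=0)
    b_h += 0.1 * (h_p - h_n).mean(axis=0)
err_after = recon_error()
```

When training succeeds, the two hidden units settle into a distinct 2-bit code per pattern, which is the emergent code the 1985 worked example describes.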

Rumelhart, Hinton & Williams (1986) — Learning internal representations by error propagation

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| xor | yes (qualitative) | 6.4 min | 0.3s |
| n-bit-parity | yes (qualitative; thermometer code partial) | 30 min | 0.20s |
| encoder-backprop-8-3-8 | yes (70% strict 8/8 distinct codes) | ~10 min | 0.6s |
| distributed-to-local-bottleneck | yes (graded values 0.007/0.167/0.553/0.971) | 75 min | 0.082s |
| symmetry | yes (1 : 1.994 : 3.969 weight ratio) | 12.8 min | 0.4s |
| binary-addition | yes (qualitatively; 4-3-3 succeeds, 4-2-3 stuck) | ~2 hr | 44s |
| negation | yes (4-6-3 deviation justified) | 25 min | 0.10s |
| t-c-discrimination | yes (all 3 detector families emerge) | 30 min | 0.69s |
| recurrent-shift-register | yes (89 sweeps N=3, 121 sweeps N=5) | 25 min | 0.9s / 1.1s |
| sequence-lookup-25 | yes (4-5/5 held-out generalization) | 70 min | 0.20s / 5.78s |
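
The encoder-backprop-8-3-8 task above fits in a few lines of numpy. The following is an illustrative sketch under plain sigmoid + MSE + full-batch-descent assumptions, not the stub’s exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = np.eye(8)                        # 8 one-hot patterns; inputs double as targets
W1 = rng.normal(0.0, 0.5, (8, 3))    # 3-unit bottleneck
W2 = rng.normal(0.0, 0.5, (3, 8))
b1, b2 = np.zeros(3), np.zeros(8)

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, y = forward()
loss_before = float(np.mean((y - X) ** 2))

lr = 1.0
for _ in range(10_000):              # full-batch gradient descent
    h, y = forward()
    dy = (y - X) * y * (1 - y)       # delta at the sigmoid outputs
    dh = (dy @ W2.T) * h * (1 - h)   # backpropagate through the bottleneck
    W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(axis=0)
    W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(axis=0)

h, y = forward()
loss_after = float(np.mean((y - X) ** 2))
codes = (h > 0.5).astype(int)        # the emergent 3-bit hidden code
```

On a good seed the 8 rows of codes are all distinct, i.e. the network invents a 3-bit binary code for 8 items; the stub reports that strict outcome in 70% of seeds.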

Hinton (1986) — Distributed representations of concepts

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| family-trees | yes (3/4 best, 1.9/4 mean — matches paper) | ~1 hr | 2.1s |

Hinton & Sejnowski (1986) — Learning and relearning in Boltzmann machines

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| shifter | yes (92.3% recognition; position-pair detectors) | 30 min | 14s |
| grapheme-sememe | yes (qualitative; +6.7pp spontaneous recovery) | 70 min | 1.7s |

Plaut & Hinton (1987) — Learning sets of filters using back-propagation

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| riser-spectrogram | yes (98.08% net vs 98.90% Bayes; gap +0.83pp) | ~7 min | 0.91s |

Hinton & Plaut (1987) — Using fast weights to deblur old memories

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| fast-weights-rehearsal | yes (rehearsed-subset recovery +22pp / 30 seeds) | 25 min | 0.14s |

1990s — Unsupervised learning, mixtures, the Helmholtz machine

Jacobs, Jordan, Nowlan & Hinton (1991) — Adaptive mixtures of local experts

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| vowel-mixture-experts | partial (MoE 92.8% / MLP 90.1%; gate partitions vowels) | 70 min | 0.09s |

Becker & Hinton (1992) — A self-organizing neural network that discovers surfaces in random-dot stereograms

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| random-dot-stereograms | yes (Imax 1.18 nats; disparity readout 0.74) | ~1 hr | 6.1s |

Nowlan & Hinton (1992) — Simplifying neural networks by soft weight-sharing

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| sunspots | yes (MoG ≤ decay ≤ vanilla; weight peaks at 0 + 0.27) | ~1 hr | 5s |

Hinton & Zemel (1994) — Autoencoders, MDL and Helmholtz free energy

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| spline-images-factorial-vq | yes (factorial wins 3× over 24-VQ baseline) | ~1 hr | ~5s |

Zemel & Hinton (1995) — Learning population codes by minimizing description length

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| dipole-position | partial (R² = 0.81; supervised warm-up needed) | ~3 hr | 2s |
| dipole-3d-constraint | yes (qualitatively; 3 dims emerge) | ~1 hr | 11s |
| dipole-what-where | partial (perpendicular manifolds, lin-sep 0.58) | ~1 hr | 2s |

Dayan, Hinton, Neal & Zemel (1995) — The Helmholtz machine

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| helmholtz-shifter | partial (3 of 4 layer-3 units shift-selective; n_top=4) | 75 min | 209s |

Hinton, Dayan, Frey & Neal (1995) — The wake-sleep algorithm

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bars | partial (KL = 0.451 bits vs paper 0.10) | 70 min | 222s |

2000s — Products of experts, contrastive divergence, deep belief nets

Hinton (2000) — Training products of experts by minimizing contrastive divergence

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bars-rbm | yes (7/8 bars at purity ≥0.5; 8/8 with n_hidden=16) | ~30 min | 1.5s |

Memisevic & Hinton (2007) — Unsupervised learning of image transformations

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| transforming-pairs | partial (axis-selective transformation detectors) | ~1 hr | 2s |

Sutskever & Hinton (2007) — Multilevel distributed representations for high-dimensional sequences

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bouncing-balls-2 | partial (rollout MSE between baselines) | 75 min | 6.2s |

Sutskever, Hinton & Taylor (2008) — The recurrent temporal RBM

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| bouncing-balls-3 | partial (CD-1 recon 0.005; rollout 0.13) | ~1 hr | 3.4s |

2010s — Capsules, distillation, attention

Hinton, Krizhevsky & Wang (2011) — Transforming auto-encoders

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| transforming-autoencoders | yes (R²(dx)=0.78, R²(dy)=0.67) | ~30 min | 100s |

Tang, Salakhutdinov & Hinton (2012) — Deep Lambertian Networks

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| deep-lambertian-spheres | yes (normal angular err 27°; albedo 7× baseline) | ~50 min | 33s |

Sutskever, Martens, Dahl & Hinton (2013) — On the importance of initialization and momentum

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| rnn-pathological | yes (3 of 4 tasks; ortho beats random init) | 2.5 hr | 42s |

Hinton, Vinyals & Dean (2015) — Distilling the knowledge in a neural network

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| distillation-mnist-omitted-3 | yes (97.82% on digit-3 post-correction; paper 98.6%) | 40 min | 121.8s |
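
The core of this stub is temperature distillation. A minimal sketch of the soft-target gradient from Hinton, Vinyals & Dean 2015 (the actual stub adds the hard-label term and the MNIST specifics):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distill_grad(student_logits, teacher_logits, T=4.0):
    """Gradient w.r.t. student logits of the soft-target loss
    T^2 * CE(softmax(teacher/T), softmax(student/T)).

    The T^2 factor, as in the paper, keeps gradient magnitudes
    comparable as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return T * (p_student - p_teacher)
```

A student that already matches the teacher gets zero gradient, and a high temperature flattens the teacher distribution, exposing the small-logit “dark knowledge” that lets the student learn about a digit it never sees.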

Eslami, Heess, Weber, Tassa, Szepesvari, Kavukcuoglu & Hinton (2016) — Attend, Infer, Repeat

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| air-multimnist | partial (count 79.7%; reconstructions blurry) | ~50 min | 6s |
| air-3d-primitives | partial (1-prim 88.8%; 3-prim count 81%) | ~50 min | 11.7s |

Ba, Hinton, Mnih, Leibo & Ionescu (2016) — Using fast weights to attend to the recent past

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| fast-weights-associative-retrieval | partial (architecture verified; 38% retrieval) | ~3 hr | 293s |
| multi-level-glimpse-mnist | partial (82.46% vs paper 90%+) | ~1 hr | 1199s |
| catch-game | partial (FW 33.9% vs vanilla 11.4%; 91% at size=10) | ~2 hr | ~50s |

Sabour, Frosst & Hinton (2017) — Dynamic routing between capsules

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| affnist | no (gap wrong sign: −2% vs paper +13%) | ~3 hr | 4 min |
| multimnist-capsnet | partial (48.6% vs target 80%; 22× chance) | ~3 hr | 395s |

Hinton, Sabour & Frosst (2018) — Matrix capsules with EM routing

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| smallnorb-novel-viewpoint | yes qualitatively (caps 0.726 vs CNN 0.696 held-out) | ~1 hr | 10s |

Kosiorek, Sabour, Teh & Hinton (2019) — Stacked capsule autoencoders

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| constellations | yes (per-point recovery 86.9% best / 84% mean) | ~75 min | 25s |

2020s — Subclass distillation, GLOM, Forward-Forward

Müller, Kornblith & Hinton (2020) — Subclass distillation

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| mnist-2x5-subclass | partial (subclass recovery 82.88% best / 73.87% mean) | ~50 min | 13s |

Sabour, Tagliasacchi, Yazdani, Hinton & Fleet (2021) — Unsupervised part representation by flow capsules

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| geo-flow-capsules | yes (mean IoU 0.764 / chance 0.20) | ~8 min | 43s |

Culp, Sabour & Hinton (2022) — Testing GLOM’s ability to infer wholes from ambiguous parts

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| ellipse-world | yes (92.2% on 5-class; islands form +0.117) | ~1 hr | 9s |

Hinton (2022) — The forward-forward algorithm: some preliminary investigations

| Problem | Reproduces? | Implementation wallclock | Run wallclock |
| --- | --- | --- | --- |
| ff-hybrid-mnist | partial (5.21% test err vs paper 1.37%) | ~75 min | 492s |
| ff-label-in-input | partial (3.60% vs paper 1.36%) | ~1 hr | 66s |
| ff-recurrent-mnist | partial (10.66% vs paper 1.31%) | ~1 hr | 216s |
| ff-cifar-locally-connected | partial (FF 22.78% / BP 38.31%) | ~3 hr | 150s |
| ff-aesop-sequences | yes (TF 53% / SG 34%; baselines 3-20%) | ~12 min | 131s |
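
All five stubs share the Forward-Forward core: each layer is trained locally so that its “goodness” (sum of squared activities) exceeds a threshold on positive data and falls below it on negative data, with no backward pass across layers. A minimal one-layer sketch on toy separable data (an assumption for illustration, not the paper’s MNIST setup):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def goodness(h):
    return (h ** 2).sum(axis=1)            # FF's layer-local score

# Toy data: positives concentrate energy along one direction, negatives are noise.
d, n = 20, 256
u = rng.normal(size=d); u /= np.linalg.norm(u)
pos = np.outer(rng.normal(size=n), u) + 0.1 * rng.normal(size=(n, d))
neg = 0.5 * rng.normal(size=(n, d))

W = rng.normal(0.0, 0.1, (d, 10))
theta = 2.0                                # goodness threshold

def ff_loss(W):
    """Mean -log sigmoid(+-(goodness - theta)) over pos and neg batches."""
    total = 0.0
    for X, sign in ((pos, 1.0), (neg, -1.0)):
        h = np.maximum(X @ W, 0.0)         # one ReLU layer
        total += float(np.mean(np.log1p(np.exp(-sign * (goodness(h) - theta)))))
    return total

loss_before = ff_loss(W)
lr = 0.03
for _ in range(300):                       # layer-local updates, no backprop chain
    for X, sign in ((pos, 1.0), (neg, -1.0)):
        h = np.maximum(X @ W, 0.0)
        p = sigmoid(sign * (goodness(h) - theta))
        dh = -sign * (1.0 - p)[:, None] * 2.0 * h   # d(-log p)/dh
        W -= lr * X.T @ dh / n
loss_after = ff_loss(W)
```

After training, the mean goodness of positives typically sits above theta and that of negatives below it, which is all the per-layer FF objective asks for; no activation from this layer is ever re-fetched for a cross-layer gradient.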

Structure

problem-folder/
├── README.md                  source paper, problem, results, deviations
├── <slug>.py                  dataset + model + train + eval
├── visualize_<slug>.py        training curves + weight viz
├── make_<slug>_gif.py         animated GIF
├── <slug>.gif                 committed animation
└── viz/                       committed PNGs

Roadmap

  • #45 v2: ByteDMD instrumentation — measure data-movement cost per stub on these baselines (the actual research goal)
  • #46 v1.5: paper-scale reruns — close the 25 partial reproductions on Modal/GPU
  • See the “Open questions / next experiments” section in each stub README for stub-specific follow-ups

Contributing

Implementations follow the v1 spec:

  • Each stub fills in <slug>.py (model + train + eval), an 8-section README.md, make_<slug>_gif.py, visualize_<slug>.py, an animated <slug>.gif, and viz/ PNGs.
  • Acceptance: reproduces in <5 min on a laptop; final accuracy with seed in the Results table; GIF illustrates both the problem AND the learning dynamics; an honest “Deviations from the original” section; at least one open question.
  • v1 metrics in PR body: "Paper reports X; we got Y. Reproduces: yes/no." + run wallclock + implementation wallclock.

The v1.5 reruns (#46) and v2 ByteDMD work (#45) welcome contributions.

License

The hinton-problems source and documentation are released into the public domain under the Unlicense.