Detailed Meeting Notes

Meeting #1, 19 Jan 26 - Energy-Efficient Training

Location: SPC main floor · Full notes · Google Doc

Orientation meeting. Introductions and backgrounds. Concepts introduced:

  • Memory access is the largest energy contributor (Bill Dally talk)
  • Local register access ~5 pJ vs HBM access ~640 pJ
  • Backprop is like the giraffe's recurrent laryngeal nerve -- works but inefficient
  • "Nerd snipe" proposal: train a model on smartphone via WebGPU using minimum joules
  • WebGPU exposes memory hierarchy (Registers -> Shared -> Global)
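The register-vs-HBM gap above lends itself to a quick back-of-envelope model. A minimal sketch, treating the quoted ~5 pJ and ~640 pJ figures as illustrative per-access constants (not measurements):

```python
REG_PJ = 5      # assumed cost of one register access, in picojoules
HBM_PJ = 640    # assumed cost of one HBM access, in picojoules

def matvec_energy_pj(n: int, m: int, cached: bool) -> float:
    """Energy to multiply an (n x m) matrix by a length-m vector.

    Each of the n*m multiply-accumulates reads one weight. If the
    weights stream from HBM every time, each read costs HBM_PJ; if
    they stay resident in registers, each read costs REG_PJ.
    """
    reads = n * m
    return reads * (REG_PJ if cached else HBM_PJ)

hbm = matvec_energy_pj(1024, 1024, cached=False)
reg = matvec_energy_pj(1024, 1024, cached=True)
print(f"HBM-bound: {hbm/1e6:.1f} uJ, register-bound: {reg/1e6:.3f} uJ, "
      f"ratio {hbm/reg:.0f}x")
```

Even this crude model shows why keeping data low in the hierarchy dominates: the 128x per-access ratio carries straight through to total energy.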

Takeaway

Yaroslav beat Google in the 2018 DawnBench competition (fastest ImageNet training) not through superior intelligence but by spending three months optimizing AWS infrastructure for 10-second restart cycles, versus Google's 10+ minutes.


Meeting #2, 26 Jan 26 - Forward-Forward Algorithm

Location: Accel board room · Full notes · Google Doc

Discussion of Hinton's Forward-Forward paper. See also: Exp E - Forward-Forward findings.

  • Two forward passes (positive/negative) replace forward+backward
  • Greedy layer-wise learning: each layer has its own objective
  • Goodness = sum of squared ReLU activations
  • Negative data generation is the hard problem for complex domains
  • Jamie Simon shared implementation results
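The bullets above can be sketched as a single Forward-Forward layer update in numpy. This assumes the logistic loss on (goodness - threshold) from Hinton's paper; the threshold, learning rate, and toy positive/negative data are illustrative choices, not values from the meeting:

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # Goodness of a layer = sum of squared ReLU activations per example.
    return (np.maximum(h, 0.0) ** 2).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.01):
    """One Forward-Forward update for a single layer.

    Two forward passes (positive and negative data) replace
    forward+backward: push goodness above theta on positive data and
    below theta on negative data. The gradient is purely local to
    this layer -- no error signal propagates from layers above.
    """
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = x @ W
        g = goodness(h)
        # d/dg softplus(-sign*(g - theta)) = -sign * sigmoid(-sign*(g - theta))
        z = np.clip(sign * (g - theta), -50.0, 50.0)   # clip to avoid overflow
        dg = -sign / (1.0 + np.exp(z))
        dh = dg[:, None] * 2.0 * np.maximum(h, 0.0)    # dgoodness/dh
        W -= lr * x.T @ dh / len(x)
    return W

W = rng.normal(scale=0.1, size=(8, 16))
x_pos = rng.normal(size=(32, 8)) + 1.0   # stand-in "positive" data
x_neg = rng.normal(size=(32, 8)) - 1.0   # stand-in "negative" data
for _ in range(200):
    W = ff_layer_step(W, x_pos, x_neg)
print(f"pos goodness {goodness(x_pos @ W).mean():.2f}, "
      f"neg goodness {goodness(x_neg @ W).mean():.2f}")
```

After training, positive-data goodness should sit above negative-data goodness; a real network stacks several such layers, each trained greedily on its own objective.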

Meeting #3, 02 Feb 26 - Joules Measuring

Location: SPC

Tooling session.


Meeting #4, 09 Feb 26 - From Beauty to Joules

Location: Palmer Square

Presentation: From_Beauty_to_Joules.pdf


Meeting #5, 16 Feb 26 - Intelligence Per Joule

Presentation: Intelligence_Per_Joule.pdf

Karpathy Names Task introduced:

  • Take 1000 random names from makemore/names.txt
  • Predict last 3 characters of 1000 test names
  • Baseline accuracy + total operations -> optimize
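The task setup can be sketched as below. A tiny inline name list stands in for makemore's names.txt, and the baseline here (guess the globally most common 3-character ending) is one illustrative choice, not necessarily the baseline discussed in the meeting:

```python
import random
from collections import Counter

# Tiny inline stand-in for makemore's names.txt (the real task samples
# 1000 random names from that file).
NAMES = ["emma", "olivia", "ava", "isabella", "sophia", "mia",
         "charlotte", "amelia", "harper", "evelyn", "abigail", "emily"]

random.seed(0)
random.shuffle(NAMES)
train, test = NAMES[:8], NAMES[8:]

# Baseline: predict the most common 3-character ending seen in training
# for every test name; score exact matches on the last 3 characters.
suffixes = Counter(name[-3:] for name in train)
guess = suffixes.most_common(1)[0][0]

correct = sum(name[-3:] == guess for name in test)
print(f"baseline guess {guess!r}: {correct}/{len(test)} exact suffix matches")
```

The interesting part of the challenge is the second axis: once a baseline accuracy is fixed, drive down the total operations needed to reach it.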

Meeting #6, 23 Feb 26 - Presentations

Full notes · Google Doc

  • Germain: presentation video — truncated backprop, 19% energy reduction, 27% intelligence-per-joule improvement
  • Emmett: Pure-Python GPT, reduced memory 80MB -> 35MB with Aster (local · Google Doc)
  • Yaroslav presented pebbling games, energy hierarchy, "drosophila of learning" concept
  • Key outcome: 3-minute MicroGPT iteration too slow — need sub-1-second task

Meeting #7, 02 Mar 26 - Sparse Parity

Full notes · Google Doc

See also: Research overview for all experiment results building on this challenge.
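The notes don't spell out the challenge here; in the standard formulation of sparse parity (an assumption), the label is the XOR of k secret coordinates of an n-bit input, and the learner must discover which coordinates matter:

```python
import numpy as np

def sparse_parity_data(n_samples, n_bits=20, k=3, seed=0):
    """Generate a sparse-parity dataset.

    The label is the parity (XOR) of k secret bit positions out of
    n_bits; the identity of the secret positions is what a learner
    must discover.
    """
    rng = np.random.default_rng(seed)
    secret = rng.choice(n_bits, size=k, replace=False)
    X = rng.integers(0, 2, size=(n_samples, n_bits))
    y = X[:, secret].sum(axis=1) % 2
    return X, y, secret

X, y, secret = sparse_parity_data(1000)
print("secret coordinates:", sorted(secret.tolist()),
      "label balance:", y.mean())
```

Its appeal as a fast iteration task: data generation is instant, the target concept is exactly specified, and solve time fits well under the sub-1-second budget called for in Meeting #6.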


Meeting #8, 09 Mar 26 - Demos and Roadmap

Full notes · AI notes · Google Doc

  • Yad: Demoed the Claude Code agentic harness (video, survey, github). Harness found 1000x faster solution via GF(2). Yaroslav verified correctness and visualized the top algorithm.
  • Yaroslav: Presented Knowledge Sprint #2 on energy metrics and the bigger picture roadmap (3-axis cube: process, metric, problem).
  • Michael: Showed his Claude approach which preferred 90s-era methods.
  • Germain: Demoed supervisor/researcher harness; solutions preferred 2010s methods.
  • Uliana: Gave temperature suggestions for Germain's experiments.
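The GF(2) solution Yad's harness found isn't reproduced in the notes; one plausible reading (an assumption, not the harness's actual code) is that a parity task reduces to a linear system over GF(2), solvable directly by bitmask Gaussian elimination instead of by gradient-based learning:

```python
import random

def solve_gf2(rows, labels, n_bits):
    """Solve X s = y over GF(2) by Gaussian elimination.

    Each row is an int bitmask of the n_bits input; appending the
    label as bit n_bits turns each row into an augmented equation.
    Returns a solution bitmask s, or None if the system is
    inconsistent. Free variables are set to 0.
    """
    eqs = [r | (lab << n_bits) for r, lab in zip(rows, labels)]
    pivots = {}                      # column -> pivot row (lowest set bit = column)
    for eq in eqs:
        for col in range(n_bits):
            if not (eq >> col) & 1:
                continue
            if col in pivots:
                eq ^= pivots[col]    # eliminate this column and keep scanning
            else:
                pivots[col] = eq
                break
        else:
            if (eq >> n_bits) & 1:   # reduced to 0 = 1: inconsistent
                return None
    s = 0
    for col in sorted(pivots, reverse=True):   # back-substitution
        eq, rhs = pivots[col], (pivots[col] >> n_bits) & 1
        for c2 in range(col + 1, n_bits):
            if (eq >> c2) & 1:
                rhs ^= (s >> c2) & 1           # fold in solved variables
        if rhs:
            s |= 1 << col
    return s

# Demo on a hypothetical hidden parity set.
random.seed(0)
n_bits = 16
secret_mask = sum(1 << c for c in random.sample(range(n_bits), 3))
rows = [random.getrandbits(n_bits) for _ in range(100)]
labels = [bin(r & secret_mask).count("1") % 2 for r in rows]
recovered = solve_gf2(rows, labels, n_bits)
print(sorted(c for c in range(n_bits) if (recovered >> c) & 1))
```

Exact elimination runs in microseconds where a training loop takes seconds, which is the kind of gap a "1000x faster" finding suggests.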

Homework for next Monday: Get agents to improve Challenge #1 using ARD as the energy proxy. Present results, process, and learnings.