Detailed Meeting Notes¶
Meeting #1, 19 Jan 26 - Energy-Efficient Training¶
Location: SPC main floor · Full notes · Google Doc
Orientation meeting. Introductions and backgrounds. Concepts introduced:
- Memory cost is the largest energy contributor (Bill Dally talk)
- Local register access ~5 pJ vs HBM access ~640 pJ
- Backprop is like the giraffe's recurrent laryngeal nerve -- works but inefficient
- "Nerd snipe" proposal: train a model on smartphone via WebGPU using minimum joules
- WebGPU exposes memory hierarchy (Registers -> Shared -> Global)
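The per-access costs above suggest a back-of-the-envelope energy model. A minimal sketch: only the register (~5 pJ) and HBM (~640 pJ) figures come from the notes; the shared-memory cost is an illustrative assumption.

```python
# Rough per-access energy costs in picojoules. Register and HBM figures are
# from the talk notes above; the shared-memory figure is an assumed
# intermediate value for illustration only.
ACCESS_ENERGY_PJ = {
    "register": 5,
    "shared": 30,    # assumption, not from the notes
    "hbm": 640,
}

def energy_joules(access_counts):
    """Total energy in joules for a dict of {tier: number_of_accesses}."""
    pj = sum(ACCESS_ENERGY_PJ[tier] * n for tier, n in access_counts.items())
    return pj * 1e-12

# Compare a kernel that re-reads operands from HBM against one that keeps
# most accesses in registers (hypothetical access counts):
naive = energy_joules({"hbm": 3_000_000})
tiled = energy_joules({"hbm": 100_000, "register": 2_900_000})
print(naive, tiled)  # the HBM-heavy version costs far more energy
```

This is why exposing the hierarchy (registers -> shared -> global) matters: the same arithmetic can differ by orders of magnitude in energy depending on where the operands live.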
Takeaway
Yaroslav beat Google in the 2018 DawnBench competition (fastest ImageNet training) not through superior intelligence but through three months of optimizing AWS infrastructure for 10-second restart cycles, versus Google's 10+ minutes.
Meeting #2, 26 Jan 26 - Forward-Forward Algorithm¶
Location: Accel board room · Full notes · Google Doc
Discussion of Hinton's Forward-Forward paper. See also: Exp E - Forward-Forward findings.
- Two forward passes (positive/negative) replace forward+backward
- Greedy layer-wise learning: each layer has its own objective
- Goodness = sum of squared ReLU activations
- Negative data generation is the hard problem for complex domains
- Jamie Simon shared implementation results
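The bullets above can be sketched in code. A minimal, hedged sketch of one greedily trained layer: goodness is the sum of squared ReLU activations as in the paper, and positive/negative data are pushed above/below a threshold theta via a logistic objective; details such as input normalization between layers are omitted, and this is not Hinton's or Jamie's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # Goodness = sum of squared (post-ReLU) activations, per example.
    return (h ** 2).sum(axis=1)

class FFLayer:
    """One greedily trained Forward-Forward layer (illustrative sketch)."""

    def __init__(self, d_in, d_out, lr=0.03, theta=2.0):
        self.W = rng.normal(0, 1 / np.sqrt(d_in), (d_in, d_out))
        self.lr, self.theta = lr, theta

    def forward(self, x):
        return np.maximum(x @ self.W, 0.0)  # ReLU

    def train_step(self, x_pos, x_neg):
        # Two forward passes replace forward+backward: push goodness above
        # theta on positive data (sign=+1), below theta on negative (sign=-1).
        for x, sign in ((x_pos, +1.0), (x_neg, -1.0)):
            h = self.forward(x)
            p = 1.0 / (1.0 + np.exp(-sign * (goodness(h) - self.theta)))
            # Gradient of -log(p) w.r.t. W, local to this layer only.
            grad_h = (-(1.0 - p) * sign)[:, None] * 2.0 * h
            grad_h[h <= 0] = 0.0  # ReLU mask
            self.W -= self.lr * x.T @ grad_h / len(x)
```

Because each layer has its own objective, layers can be trained one at a time on the (normalized) outputs of the previous layer, with no backward pass through the stack.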
Meeting #3, 02 Feb 26 - Measuring Joules¶
Location: SPC
Tooling session.
- Barak demonstrated Modal workflow
- Yaroslav demonstrated Colab workflow
- Joules-measuring notebook
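The core of any joules-measuring workflow is integrating sampled power over time. A minimal sketch, assuming power readings come from something like `nvidia-smi --query-gpu=power.draw --format=csv,noheader,nounits` polled during training (the notebook's actual method is not specified in these notes):

```python
def joules_from_samples(times_s, watts):
    """Integrate (timestamp, power) samples into energy via the trapezoid rule.

    times_s: sample timestamps in seconds (increasing)
    watts:   power readings in watts at those timestamps
    """
    assert len(times_s) == len(watts) >= 2
    return sum(
        0.5 * (watts[i] + watts[i + 1]) * (times_s[i + 1] - times_s[i])
        for i in range(len(times_s) - 1)
    )

# A GPU holding a steady ~100 W for 2 s costs ~200 J:
print(joules_from_samples([0.0, 1.0, 2.0], [100.0, 100.0, 100.0]))  # 200.0
```

Sampling rate matters: bursty kernels between slow samples are invisible to this estimator, so the polling interval should be short relative to the workload's phases.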
Meeting #4, 09 Feb 26 - From Beauty to Joules¶
Location: Palmer Square
Presentation: From_Beauty_to_Joules.pdf
Meeting #5, 16 Feb 26 - Intelligence Per Joule¶
Presentation: Intelligence_Per_Joule.pdf
Karpathy Names Task introduced:
- Take 1000 random names from makemore/names.txt
- Predict last 3 characters of 1000 test names
- Record baseline accuracy and total operation count, then optimize
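The task setup can be sketched as follows. This is a hedged illustration: the real task draws 1000 random names from makemore's names.txt, replaced here by a toy list, and the majority-trigram baseline is a strawman for illustration, not the group's baseline.

```python
from collections import Counter

def split_task(names):
    """Each name -> (prefix, last-3-characters target); names under 4 chars skipped."""
    return [(n[:-3], n[-3:]) for n in names if len(n) >= 4]

def majority_baseline(train_pairs, test_pairs):
    """Predict the most common training trigram for every test name."""
    guess = Counter(t for _, t in train_pairs).most_common(1)[0][0]
    correct = sum(t == guess for _, t in test_pairs)
    return correct / len(test_pairs)

# Toy stand-in for 1000 names from makemore's names.txt:
names = ["olivia", "sophia", "amelia", "emma", "mia", "luna", "aria"]
pairs = split_task(names)
acc = majority_baseline(pairs[:4], pairs[4:])
```

From here the exercise is to beat the baseline's accuracy while counting every operation the model performs, so accuracy per operation (and ultimately per joule) can be optimized.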
Meeting #6, 23 Feb 26 - Presentations¶
- Germain: presentation video — truncated backprop, 19% energy reduction, 27% intelligence-per-joule improvement
- Emmett: Pure-Python GPT, reduced memory 80MB -> 35MB with Aster (local · Google Doc)
- Yaroslav presented pebbling games, energy hierarchy, "drosophila of learning" concept
- Key outcome: 3-minute MicroGPT iteration too slow — need sub-1-second task
Meeting #7, 02 Mar 26 - Sparse Parity¶
- Yaroslav presented Technical Sprint 1 results — 2.5hr sprint, ARD metric, gradient fusion (16% cache reuse improvement)
- Andy attempted better chat tooling (codeberg)
- Michael showed Pebbling Game implementation
- Homework assigned: Challenge #1: Sparse Parity ("Drosophila of Learning")
See also: Research overview for all experiment results building on this challenge.
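A minimal generator for the challenge, assuming the standard k-sparse parity formulation (the label is the XOR of a hidden size-k subset of the input bits); the group's exact spec may differ.

```python
import random

def make_sparse_parity(n_bits=20, k=3, n_samples=1000, seed=0):
    """k-sparse parity dataset: label = XOR of a hidden size-k subset of bits."""
    rng = random.Random(seed)
    subset = rng.sample(range(n_bits), k)   # hidden relevant coordinates
    xs, ys = [], []
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        y = 0
        for i in subset:
            y ^= x[i]                       # parity over the hidden subset
        xs.append(x)
        ys.append(y)
    return xs, ys, subset

xs, ys, subset = make_sparse_parity()
```

The appeal as a "drosophila of learning" is that generation is trivial and evaluation is instant, so a full train/test iteration easily fits the sub-1-second budget identified in Meeting #6.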
Meeting #8, 09 Mar 26 - Demos and Roadmap¶
Full notes · AI notes · Google Doc
- Yad: Demoed the Claude Code agentic harness (video, survey, github). Harness found 1000x faster solution via GF(2). Yaroslav verified correctness and visualized the top algorithm.
- Yaroslav: Presented Knowledge Sprint #2 on energy metrics and the bigger picture roadmap (3-axis cube: process, metric, problem).
- Michael: Showed his Claude approach which preferred 90s-era methods.
- Germain: Demoed supervisor/researcher harness; solutions preferred 2010s methods.
- Uliana: Gave temperature suggestions for Germain's experiments.
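The GF(2) speedup mentioned above has a simple basis: a sparse parity label is a *linear* function of the input bits mod 2, so the hidden mask can be recovered by Gaussian elimination over GF(2) instead of gradient descent. A hedged sketch of that idea (not claimed to be the harness's exact algorithm), with rows packed as integer bitmasks:

```python
def parity(x):
    """Parity (XOR of all bits) of an integer bitmask."""
    return bin(x).count("1") & 1

def solve_gf2(rows, ys, n_bits):
    """Solve X w = y over GF(2). rows are input bitmasks, ys are 0/1 labels.
    Returns the weight bitmask w, or None if the system is not solvable."""
    mask = (1 << n_bits) - 1
    aug = [r | (y << n_bits) for r, y in zip(rows, ys)]  # label stored as bit n_bits
    pivots = {}                                          # column -> pivot row
    for row in aug:
        for col in range(n_bits):
            if (row >> col) & 1:
                if col in pivots:
                    row ^= pivots[col]                   # eliminate this column
                else:
                    pivots[col] = row                    # new pivot
                    break
        if row & mask == 0 and row >> n_bits:
            return None                                  # inconsistent: 0 = 1
    if len(pivots) < n_bits:
        return None                                      # underdetermined
    w = 0
    for col in sorted(pivots, reverse=True):             # back-substitution
        row = pivots[col]
        val = (row >> n_bits) & 1
        for c2 in range(col + 1, n_bits):
            if (row >> c2) & 1:
                val ^= (w >> c2) & 1
        if val:
            w |= 1 << col
    return w

# Recover a hidden 3-bit parity mask from labeled examples:
w_true = (1 << 1) | (1 << 4) | (1 << 7)
rows = [1 << i for i in range(10)] + [0b1111111111, 0b101]
ys = [parity(r & w_true) for r in rows]
print(solve_gf2(rows, ys, 10) == w_true)  # True
```

Elimination costs roughly O(m·n) word operations with bit-packed rows, versus many epochs of dense matrix multiplies for a neural network, which is the kind of gap that produces a 1000x result on this task.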
Homework for next Monday: Get agents to improve Challenge #1 using ARD as the energy proxy. Present results, process, and learnings.