Cross-references
Source: Google Doc · Meeting #1 summary
Key concepts introduced: Memory energy hierarchy (5 pJ registers vs 640 pJ HBM), backprop inefficiency, WebGPU nerd snipe
Related: Project context · Goals · Bill Dally talk
AI links (local to yaroslav)
[https://gemini.google.com/app/baf86fdafcaac3af]
[https://gemini.google.com/app/5c855e3a869d19cf]
[https://chatgpt.com/c/6970003e-4114-8333-86a8-71a8a0e97b23]
[Link to "Slides"]
[Meeting notes summary]
[Host: Yaroslav Bulatov]
[Jackjack Ganbold @JackJack]
[Jonathan Belay (@Jonathan Belay)]
[Anish Tondwalkar]
[Seth Stafford]
[Caleb Sirak]
[Anushka Deshpande]
[Daria Soboleva]
[2. Companies & Named Entities]
[3. Key Topics & Tidbits]
[4. Picture]
Link to "Slides"
from [energy-efficient learning]
Meeting notes summary
Host: Yaroslav Bulatov
- Identity: Veteran AI researcher (20+ years).
- Background:
  - Google: Worked on Street View (House Numbers). Hired Ian Goodfellow as an intern.
  - OpenAI: Worked on gradient checkpointing.
  - Independent: Beat Google in the 2018 DAWNBench competition (fastest ImageNet training) by optimizing infrastructure on AWS (10-second iteration cycles vs. Google's 10 minutes).
  - Meta (2023): Implemented symbolic differentiation in a single Colab cell.
- Philosophy: Wants to "satisfice" (do just enough) rather than maximize. Tracks his "integrated lifetime pleasure" using audio logs of his emotional state.
- Current Goal: AI training methods were invented for CPUs; find a more GPU-first way to train LLMs.
- Connection to Yaroslav: reflexive
Jackjack Ganbold @JackJack
- Identity: SPC Member
- Bio: Founder of two companies, including Basenames (transcript: "Basic Nim Sims"; associated with Coinbase/blockchain domains). Has developed AI developer tools for VS Code.
Jonathan Belay (@Jonathan Belay)
- Identity: South Park Commons Member
- Connection to Yaroslav: classmate of Darius, Yaroslav's collaborator on Transformer-XL [work]
- Bio: Runs an independent research lab.
- Focus: Deterministic methods for LLM pre-training (Algebraic Graph Theory / Spectral Graph Theory).
- Background: Former Google ("Economic Fairness" team) and Harvard (CS/Math).
- Business: Licenses his algorithms to chip companies (mentioned Nvidia, Google, and Etched) to solve NP-complete chip layout problems.
Anish Tondwalkar
- Bio: Former Google Brain (Hardware/TPU team) and OpenAI (Inference/Reasoning).
- Connection to Yaroslav: OpenAI researcher community
- Key Details:
  - Worked on Project Turquoise (transcript error: "TurquoiseSAG"), Google's internal custom silicon team.
  - Survived "13 reorgs" at Google.
  - Colleagues spun out to form Groq, MatX, and Positron.
- Role: The "Realist." Argues that energy inefficiency is a hardware orchestration problem, not just an algorithmic one.
Seth Stafford
- Identity: "Recovering mathematician" and former postdoc (PhD, Cornell, 1991). Early at Oracle.
- Connection to Yaroslav: Met in 2017 at the "Deep Learning Study Group". Former manager of Burkay Gur (Yaroslav's former manager at [Fal.AI]).
- Role: Introduced Yaroslav to the concept of "satisficing."
- Work: Applies AI to healthcare.
Caleb Sirak
- Identity: Founder of E3 Group. 2x MIT dropout and founder. Hand-building "Howard," a DIY AI supercomputer, and active in the Boston and SF hardware scenes, exploring on-chip simulators and custom networking. Tech Twitter personality (@calebsirak, currently in Twitter rehab).
- Connection to Yaroslav: They follow each other on Twitter.
- Work: Uses AI agents for "boring" logistics and freight problems.
Anushka Deshpande
- Identity: Works at Arcee ([https://www.arcee.ai/]), building the "American DeepSeek".
- Connection to Yaroslav: Met at a poker night organized by Tilde Research and Crusoe last year.
Daria Soboleva
- Identity: Head Researcher at Cerebras. Manages researchers and leads large-scale MoE training on Cerebras. Author of [https://www.cerebras.ai/moe-guide]
- Connection to Yaroslav: Fellow Russians
2. Companies & Named Entities
- Arcee.ai (transcript: "RC"): Yaroslav refers to it as the "American DeepSeek" (known for efficient, domain-adapted SLMs).
- Project Turquoise (transcript: "TurquoiseSAG"): Google's internal custom silicon division.
- Basenames (transcript: "Basic Nim Sims"): Identity protocol on the Base blockchain.
- Google Antigravity: A new/internal agentic IDE from Google. John and William discussed using it to generate full apps from screenshots and deploy them to Vercel in minutes.
- Cerebras: Hardware company. The group criticized it for the "wafer-scale" approach (yield issues) and bad software. Anish called it a "worse Groq."
- Etched: A new Transformer ASIC company.
- MatX / Positron: Hardware startups Anish mentioned as legitimate contenders.
- Ainekko (transcript: "AInico"): Startup that bought Esperanto Technologies' IP to open-source their RISC-V work.
3. Key Topics & Tidbits
- The "Giraffe Nerve" Thesis: Yaroslav argues that backpropagation is like the recurrent laryngeal nerve in giraffes, which takes a massive detour for evolutionary reasons. It works, but it is inefficient because it requires global memory (HBM) access, which costs ~640 pJ vs ~5 pJ for local registers.
  - Goal: Find a "local" update rule that mimics the brain (~20 Watts).
- The "Nerd Snipe":
  - Proposal to launch a competition: "Train a model on a smartphone via WebGPU using the minimum energy (Joules)."
  - WebGPU is chosen because it exposes the memory hierarchy (Registers -> Shared -> Global), forcing developers to optimize data movement manually.
- Infrastructure > Intelligence: Yaroslav claimed he beat Google in 2018 not because he was smarter, but because he spent 3 months optimizing AWS infrastructure to restart runs in 10 seconds, while Google engineers waited 10+ minutes.
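The energy figures and the Registers -> Shared -> Global hierarchy discussed in this section can be turned into a rough back-of-the-envelope model. The sketch below is illustrative only and was not presented at the meeting. The 5 pJ and 640 pJ costs and the 20 W budget come from the notes, and are treated here as per-access costs, which is an assumption. The shared-memory cost (`shared_pj`), the matrix and tile sizes, and all class and function names are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass

PJ_TO_J = 1e-12  # picojoules to joules


@dataclass(frozen=True)
class EnergyModel:
    """Assumed energy per memory access at each level, in picojoules."""
    register_pj: float = 5.0   # from the notes
    shared_pj: float = 50.0    # ASSUMPTION: placeholder value, not from the notes
    global_pj: float = 640.0   # from the notes (HBM)


def accesses_per_second(power_watts: float, pj_per_access: float) -> float:
    """Accesses per second a power budget could pay for, ignoring compute."""
    return power_watts / (pj_per_access * PJ_TO_J)


def naive_matmul_energy(n: int, m: EnergyModel) -> float:
    """Joules for an n x n matmul where every multiply-add reads both
    operands from global memory and updates an accumulator register.
    Writes of the result are ignored."""
    global_reads = 2 * n**3
    register_updates = n**3
    return (global_reads * m.global_pj + register_updates * m.register_pj) * PJ_TO_J


def tiled_matmul_energy(n: int, tile: int, m: EnergyModel) -> float:
    """Joules for the same matmul when tile x tile blocks are staged in
    shared memory, so each element fetched from global memory is reused
    `tile` times. Stores into shared memory and result writes are ignored."""
    if tile <= 0 or n % tile != 0:
        raise ValueError("tile must be positive and divide n")
    global_reads = 2 * n**3 // tile   # each block is loaded once per tile step
    shared_reads = 2 * n**3           # operands now come from shared memory
    register_updates = n**3
    total_pj = (global_reads * m.global_pj
                + shared_reads * m.shared_pj
                + register_updates * m.register_pj)
    return total_pj * PJ_TO_J


if __name__ == "__main__":
    model = EnergyModel()
    print(f"HBM / register cost ratio: {model.global_pj / model.register_pj:.0f}x")
    print(f"Global accesses/s within 20 W: {accesses_per_second(20.0, model.global_pj):.3e}")
    n, tile = 1024, 16  # hypothetical problem and tile sizes
    print(f"naive: {naive_matmul_energy(n, model):.3f} J")
    print(f"tiled: {tiled_matmul_energy(n, tile, model):.3f} J")
```

With the two figures from the notes, a global access costs 128 times as much as a register access. In this toy model, tiling cuts global reads by a factor of `tile`, which is the kind of manual data-movement optimization the proposed competition is meant to reward. A real entry would need measured energy, not a model.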
4. Picture
[embedded image]