Eric Jang – Building AlphaGo from scratch
In this three-part episode of the Dwarkesh Podcast, host Dwarkesh Patel talks with Eric Jang, former VP of AI at 1X Technologies and senior research scientist at Google DeepMind Robotics, about his sabbatical project to rebuild AlphaGo from scratch. Jang unpacks the computational challenge of Go, whose game tree is astronomically large, and explains how AlphaGo overcame it by combining Monte Carlo Tree Search (MCTS) with deep neural networks: a policy network that predicts promising moves and a value network that estimates win probabilities. The networks are trained via supervised learning and self-play, then used to guide MCTS, which in turn refines the policy through simulated game paths. The breakthrough lies not just in the algorithm but in the insight that a 10-layer neural network can amortize a nearly intractable search into a single forward pass, approximating the search result to very high fidelity.
Jang contrasts this with modern LLM training, where sparse, end-of-trajectory rewards lead to high-variance gradients and inefficient learning, which he calls 'supervision through a straw.' He emphasizes the power of MCTS as a low-variance, high-quality supervision engine that enables stable, scalable improvement. The discussion extends to neural fictitious self-play as a scalable alternative to MCTS in complex environments, and to the potential of LLMs to automate AI research, though with limitations in strategic planning and in recognizing dead ends.
Jang also reflects on the broader implications: the transferability of skills from game-based AI research to LLM development, the importance of outer verification loops like win rates, and the conceptual link between MCTS and reasoning in LLMs, suggesting that studying Go may offer insights into general intelligence with minimal compute. The episode closes with a call to explore Jang’s open-source project and blog posts for deeper understanding.
The conversation reveals a deep reverence for the foundational principles behind AlphaGo’s success, highlighting its role as a paradigm for efficient, structured learning in complex domains. Jang’s reflections underscore that while scaling compute (the 'bitter lesson') is powerful, the real breakthroughs come from architectural insights that compress complexity into manageable forms. The episode consistently emphasizes the importance of high-quality supervision, iterative refinement, and the need for robust evaluation mechanisms in AI self-improvement. Despite acknowledging the uncertainties around long-term transferability of game AI skills, the overall sentiment remains highly positive, celebrating the elegance, efficiency, and philosophical depth of AlphaGo’s design. The discussion bridges technical detail with big-picture thinking, positioning AlphaGo not just as a milestone in game AI, but as a blueprint for future advances in artificial general intelligence.
A 10-layer neural network can amortize a nearly intractable search problem like Go into a single forward pass, approximating the result of the search to very high fidelity.
Monte Carlo Tree Search (MCTS) combined with policy and value networks generates high-quality, low-variance supervision, making reinforcement learning far more stable and efficient than naive policy gradients.
AlphaGo’s success demonstrates that many complex problems are tractable in practice due to hidden structure, suggesting our understanding of computational complexity may be incomplete.
Modern LLM training suffers from 'supervision through a straw'—sparse, end-of-trajectory rewards that cause high gradient variance, unlike AlphaGo’s continuous, high-fidelity feedback.
Outer verification loops (e.g., win rates) are crucial for guiding AI self-improvement, but designing effective ones for broader utility remains a major challenge.
Why AlphaGo Matters: The Birth of a Vision
“It was just profound to see how smart AI systems could become and the kind of computational complexity class that they could tackle with deep learning.”
The Rules and Complexity of Go: A Game of Deep Strategy
Jang walks through the rules of Go, emphasizing how simple rules give rise to deep strategic complexity. He contrasts Chinese, Japanese, and Tromp-Taylor rules, highlighting how Tromp-Taylor’s unambiguous scoring lets a program resolve the final score algorithmically. He illustrates key concepts like capturing stones, territory control, and the endgame, showing how the game’s structure, in which losing a battle can win the war, creates rich micro-macro dynamics that challenge both humans and AIs.
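To make the point about unambiguous scoring concrete, here is a minimal sketch of Tromp-Taylor style area scoring. It assumes a board given as a list of equal-length strings, and the symbols `.`, `B`, `W` and the function name `tromp_taylor_score` are illustrative choices, not taken from Jang's project. A player scores one point per stone on the board plus one point per empty intersection in any empty region that touches only that player's stones. Komi is left out.

```python
from collections import deque

EMPTY, BLACK, WHITE = ".", "B", "W"  # hypothetical board symbols

def tromp_taylor_score(board):
    """Return (black_points, white_points) under area scoring, without komi."""
    n_rows, n_cols = len(board), len(board[0])
    score = {BLACK: 0, WHITE: 0}
    seen = set()
    for r in range(n_rows):
        for c in range(n_cols):
            cell = board[r][c]
            if cell in score:          # every stone counts for its owner
                score[cell] += 1
                continue
            if (r, c) in seen:
                continue
            # Flood-fill the empty region and record which colors it touches.
            region_size, borders = 0, set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region_size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < n_rows and 0 <= nx < n_cols:
                        neighbor = board[ny][nx]
                        if neighbor == EMPTY:
                            if (ny, nx) not in seen:
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                        else:
                            borders.add(neighbor)
            # A region touching exactly one color belongs to that color.
            if len(borders) == 1:
                score[borders.pop()] += region_size
    return score[BLACK], score[WHITE]
```

Because no judgment about dead stones is needed, the same function can score any finished self-play game without human input.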
The Search Problem: Why Go Was Thought Intractable
Jang explains why Go was considered computationally intractable: the game tree has an astronomical number of possible paths, far exceeding the number of atoms in the universe. He introduces the concept of tree search and the explosive branching factor, showing why exhaustive search is impossible. He traces the evolution of search algorithms, from early bandit methods like UCB1 to the PUCT criterion used in AlphaGo, which balances exploration and exploitation.
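The two selection rules mentioned here can be sketched side by side. This is a minimal illustration using the standard textbook forms: UCB1 adds a bonus of c * sqrt(ln N / n) to a move's mean value, while PUCT scales the bonus by the policy network's prior, c_puct * P * sqrt(N) / (1 + n). The function names, the dictionary layout, and the constant `c_puct=1.5` are assumptions for illustration, not values from the episode.

```python
import math

def ucb1_score(mean_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1: mean value plus a bonus that shrinks as the move is visited more."""
    if child_visits == 0:
        return math.inf  # unvisited moves are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / child_visits)

def puct_score(mean_value, prior, child_visits, parent_visits, c_puct=1.5):
    """PUCT: the exploration bonus is weighted by the policy prior for the move."""
    return mean_value + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_move(children, c_puct=1.5):
    """Pick the move with the highest PUCT score.

    children maps move -> {"prior": float, "visits": int, "value_sum": float}.
    """
    parent_visits = sum(ch["visits"] for ch in children.values())

    def score(item):
        _, ch = item
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        return puct_score(q, ch["prior"], ch["visits"], parent_visits, c_puct)

    return max(children.items(), key=score)[0]
```

The practical difference is that UCB1 must try every legal move at least once, which is costly with Go's branching factor, whereas PUCT lets the prior steer visits toward a handful of plausible moves.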
AlphaGo’s Core: Neural Networks as Search Accelerators
“A 10-layer neural network pass... is able to amortize and approximate to a very, very high fidelity a nearly intractable search problem.”
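One way to picture how a network comes to amortize search is the training target used in the AlphaGo Zero formulation: the root visit counts from MCTS are normalized into a move distribution, and the network is trained to match it with a cross-entropy loss, alongside a squared-error loss on the game outcome. The sketch below assumes that formulation rather than the pipeline of the original 2016 AlphaGo or of Jang's own code. It omits weight regularization, and the function names are hypothetical.

```python
import math

def visit_counts_to_policy(visit_counts, temperature=1.0):
    """Turn MCTS visit counts N(a) into a target distribution proportional to N(a)^(1/T).

    Assumes at least one move has a non-zero visit count.
    """
    powered = {m: n ** (1.0 / temperature) for m, n in visit_counts.items()}
    total = sum(powered.values())
    return {m: p / total for m, p in powered.items()}

def training_loss(predicted_policy, predicted_value, search_policy, outcome):
    """Cross-entropy to the search policy plus squared error to the game outcome."""
    eps = 1e-12  # guards against log(0)
    policy_loss = -sum(
        target * math.log(predicted_policy[move] + eps)
        for move, target in search_policy.items()
    )
    value_loss = (outcome - predicted_value) ** 2
    return policy_loss + value_loss
```

For example, visit counts of 30 and 10 over two moves give a target of 0.75 and 0.25. A network whose raw output already matches that target incurs a lower loss than one that outputs a uniform guess, which is how the result of many simulations gets distilled into a single forward pass.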
The Problem with Naive Policy Gradient in Go
“It's interesting that this thing you're saying which would be intractable and prevents you from actually getting beyond a certain level in Go is just by default how LLMs are trained?”
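The variance problem under discussion can be written down using the standard REINFORCE estimator. This is a textbook form given for illustration, not a formula quoted in the episode. With a single reward at the end of the game, every move in the trajectory is weighted by the same scalar:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      R(\tau) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
    \right],
\qquad R(\tau) = z \in \{-1, +1\}.
```

Here $\tau$ is one full game of $T$ moves and $z$ is its outcome. A good move in a lost game is pushed down and a blunder in a won game is pushed up, so the signal for any individual move only emerges after averaging over many games. A search-based target, by contrast, supplies a full distribution over moves at every position.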
“The major reason is that you never have to initialize at a 0% success rate and solve the exploration problem of how to get a non-zero success rate.”
AlphaGo (other)
Monte Carlo Tree Search (other)
Eric Jang (person)
Go (media)
LLMs (other)
Dwarkesh Patel (person)
Neural Fictitious Self Play (other)
KataGo (other)
Google DeepMind (organization)
Q-learning (other)
