Eric Jang – Building AlphaGo from scratch

Dwarkesh Podcast · 2h 37m · May 15, 2026

AI-Generated Summary

In this three-part episode of the Dwarkesh Podcast, host Dwarkesh Patel talks with Eric Jang, former VP of AI at 1x Technologies and senior research scientist at Google DeepMind Robotics, about his sabbatical project to rebuild AlphaGo from scratch. Jang explains why Go is computationally hard (its game tree is astronomically large) and how AlphaGo overcame this by combining Monte Carlo Tree Search (MCTS) with deep neural networks: a policy network that predicts promising moves and a value network that estimates win probability. The networks are trained with supervised learning and self-play, then used to guide MCTS, which iteratively refines the policy by simulating game paths. The key insight is not only the algorithm but the observation that a 10-layer neural network can compress an intractable search into a single forward pass, making an otherwise intractable problem tractable in practice.

Jang contrasts this with modern LLM training, where sparse, end-of-trajectory rewards produce high-variance gradients and inefficient learning, which he calls 'supervision through a straw.' MCTS, by contrast, acts as a low-variance, high-quality source of supervision that enables stable, scalable improvement. The discussion also covers neural fictitious self-play as a scalable alternative to MCTS in complex environments, and the potential of LLMs to automate AI research, though with limitations in strategic planning and in recognizing dead ends.

Jang also reflects on broader implications: how skills transfer from game-based AI research to LLM development, the importance of outer verification loops such as win rates, and the conceptual link between MCTS and reasoning in LLMs, which suggests that studying Go may offer insights into general intelligence with minimal compute. The episode closes with a pointer to Jang’s open-source project and blog posts.

Throughout, the conversation treats AlphaGo as a paradigm for efficient, structured learning in complex domains. Jang’s reflections suggest that while scaling compute (the 'bitter lesson') is powerful, real breakthroughs come from architectural insights that compress complexity into manageable forms. The episode repeatedly stresses high-quality supervision, iterative refinement, and robust evaluation mechanisms for AI self-improvement. Although it acknowledges uncertainty about how well game-AI skills transfer in the long term, the overall tone is strongly positive, presenting AlphaGo not only as a milestone in game AI but as a blueprint for future advances in artificial general intelligence.
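
The search loop described in the summary (a policy prior and a value estimate guiding MCTS, with visit counts serving as a refined policy) can be sketched in a few dozen lines. This is a minimal illustration, not Jang's implementation: the game is a toy Nim variant (take 1 to 3 stones, whoever takes the last stone wins) rather than Go, `policy_value` is a hypothetical stub that returns uniform priors and a value of 0 in place of trained networks, and the exploration constant `c_puct=1.5` and the `+ 1` inside the square root are arbitrary choices made for this sketch.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a): prior from the (stub) policy network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # total backed-up value, seen by the player who moved into this node
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def policy_value(stones):
    """Hypothetical stand-in for the policy and value networks."""
    moves = legal_moves(stones)
    priors = {m: 1.0 / len(moves) for m in moves}
    return priors, 0.0  # value is from the perspective of the player to move

def select_child(node, c_puct=1.5):
    """PUCT-style selection: mean value plus a prior-weighted exploration bonus."""
    total = sum(child.visits for child in node.children.values())
    def score(item):
        _, child = item
        bonus = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + bonus
    return max(node.children.items(), key=score)

def simulate(root, stones):
    path, node = [root], root
    # Selection: walk down the tree until reaching an unexpanded or terminal node.
    while node.children:
        move, node = select_child(node)
        stones -= move
        path.append(node)
    if stones == 0:
        value = -1.0  # the player to move has lost: the opponent took the last stone
    else:
        # Expansion and evaluation: one "network" call instead of a random rollout.
        priors, value = policy_value(stones)
        for move, prior in priors.items():
            node.children[move] = Node(prior)
    # Backup: flip the sign at every level because the players alternate.
    for visited in reversed(path):
        value = -value
        visited.visits += 1
        visited.value_sum += value

def search(stones, simulations=400):
    """Return normalized root visit counts, usable as a refined policy target."""
    root = Node(prior=1.0)
    for _ in range(simulations):
        simulate(root, stones)
    total = sum(child.visits for child in root.children.values())
    return {move: child.visits / total for move, child in root.children.items()}
```

Calling `search(5)` returns a distribution over the moves 1, 2, and 3. With enough simulations the visits should concentrate on taking one stone, which leaves the opponent a multiple of four. In an AlphaGo-style training loop, a distribution like this would be the target the policy network is trained to match.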

Key Takeaways

A 10-layer neural network can compress an intractable search problem like Go into a single forward pass, giving good approximate answers in practice to a problem that brute-force search cannot solve.

Monte Carlo Tree Search (MCTS) combined with policy and value networks generates high-quality, low-variance supervision, making reinforcement learning far more stable and efficient than naive policy gradients.

AlphaGo’s success demonstrates that many complex problems are tractable in practice due to hidden structure, suggesting our understanding of computational complexity may be incomplete.

Modern LLM training suffers from 'supervision through a straw'—sparse, end-of-trajectory rewards that cause high gradient variance, unlike AlphaGo’s continuous, high-fidelity feedback.

Outer verification loops (e.g., win rates) are crucial for guiding AI self-improvement, but designing effective ones for broader utility remains a major challenge.

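
The 'supervision through a straw' takeaway can be illustrated with a toy comparison of two gradient signals for a single move. This is a sketch under stated assumptions, not a model of any real training setup: the policy plays the "good" move with probability 0.5 at each of 20 steps, the sparse signal is one win/loss bit (a win means more than half the moves were good), and the dense signal is a per-move target from a hypothetical teacher, standing in for search, that is right 90% of the time. All of these numbers are invented for illustration.

```python
import random
import statistics

def grad_samples(horizon=20, teacher_accuracy=0.9, episodes=20000, seed=0):
    """Sample gradient estimates for the first move's logit under both signals."""
    rng = random.Random(seed)
    p = 0.5  # current policy: probability of the "good" move at every step
    sparse, dense = [], []
    for _ in range(episodes):
        actions = [1 if rng.random() < p else 0 for _ in range(horizon)]
        # Sparse signal: a single win/loss bit for the whole game.
        outcome = 1.0 if sum(actions) > horizon / 2 else -1.0
        # REINFORCE estimate: outcome * d/dtheta log pi(a_0), which is (a_0 - p)
        # for a Bernoulli policy with p = sigmoid(theta).
        sparse.append(outcome * (actions[0] - p))
        # Dense signal: a per-move target from a noisy teacher.
        target = 1 if rng.random() < teacher_accuracy else 0
        # Log-likelihood gradient for the same logit: target - p.
        dense.append(target - p)
    return sparse, dense

def noise_to_signal(samples):
    """Standard deviation of the estimate relative to the size of its mean."""
    return statistics.pstdev(samples) / abs(statistics.fmean(samples))

sparse, dense = grad_samples()
print("sparse noise-to-signal:", round(noise_to_signal(sparse), 2))
print("dense noise-to-signal:", round(noise_to_signal(dense), 2))
```

Both estimators point in the right direction on average, but the sparse one spreads a single bit of feedback across every move in the game, so its per-move signal is small relative to its noise. In this toy setup the gap widens as `horizon` grows, while the dense signal's ratio does not depend on game length.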

Chapters