#577: My Dream "home lab"
Get the full intelligence
Search transcripts, export clips, track mentions, and explore all topics from “#577: My Dream "home lab"” inside PodZeus.
David Bombal tours Cisco's AI lab to uncover the infrastructure behind large-scale AI training, and learns that the real bottleneck is often not the GPUs but the network. What looks like a dream home lab is actually a $20 million, power-hungry system in which a single bad cable or misconfigured optic can cut efficiency by around 5%; for a 10,000-GPU cluster rented at about $2 per GPU-hour, that adds up to roughly $8 million a year.
AI clusters are not just about raw compute. They are systems of extreme precision, where 100 Tbps switches, 1.6 Tbps interfaces, and LPO (linear pluggable optics) are essential to prevent packet loss and the job failures it causes. The lab demonstrates scalability not through brute force but by testing 128-GPU units and simulating tens of thousands of flows with RDMA and CPU clusters. Cisco's strategy hinges on Ethernet rather than InfiniBand, because Ethernet offers the scale, vendor choice, and future-proofing needed for hundreds of thousands of GPUs. Security is being rethought as well: firewalls are no longer only at the edge but are embedded in DPUs, switches, and servers.
This is not just hardware. It is a complete, validated system in which every component, from storage to software, must work together. The episode's core point: AI success is not about buying the fastest GPU, but about building a network robust, intelligent, and thoroughly tested enough to turn a $20 million lab into a reliable engine for billion-dollar AI models.
A single bad cable or optic in a GPU cluster can cause a 5% efficiency loss; for a 10,000-GPU cluster at about $2 per GPU-hour, that costs roughly $8 million per year.
AI training clusters require 100 Tbps switches and 1.6 Tbps interfaces to avoid the packet loss that leads to job failures.
Cisco uses a 128-GPU test unit and 512-node CPU clusters to mathematically simulate and validate performance at scale.
Ethernet is now winning over InfiniBand in AI data centers due to scalability, multi-vendor choice, and future-proofing beyond 30,000 GPUs.
Security is moving inside the cluster — firewalls are now embedded in DPUs, switches, and servers, not just at the edge.
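The takeaways above mention Ethernet fabrics scaling beyond 30,000 GPUs. One rough way to see why switch port count (radix) drives fabric size is the textbook capacity of a non-blocking Clos network. The sketch below assumes identical switches of radix k, one fabric port per GPU, and full bisection bandwidth; the function names are hypothetical, and this is a generic calculation, not a description of Cisco's topology.

```python
# Textbook capacity of non-blocking Clos fabrics built from radix-k switches.
# Assumptions: identical switches, one fabric port per GPU, full bisection.


def two_tier_endpoints(k: int) -> int:
    """Leaf-spine: each leaf uses k/2 ports down and k/2 up; up to k leaves."""
    return k * (k // 2)


def three_tier_endpoints(k: int) -> int:
    """Classic k-ary fat-tree (k even): k pods of (k/2)**2 hosts, i.e. k**3/4."""
    return k * (k // 2) ** 2


if __name__ == "__main__":
    for radix in (64, 128, 256):
        print(f"k={radix}: two tiers {two_tier_endpoints(radix):,} GPUs, "
              f"three tiers {three_tier_endpoints(radix):,} GPUs")
```

Under these assumptions a radix of 64 gives 2,048 GPUs in two tiers and 65,536 in three, while a radix of 256 already reaches 32,768 GPUs in two tiers. Treat these as idealized upper bounds rather than real deployment figures.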
The Dream Lab That Costs $20M
“This is the type of home lab that I'd love to have. We've got GPUs, switches, fiber, storage, and a whole bunch more. The only problem is, is that it costs about 20 million US dollars and I probably need a power plant just to run it.”
The Hidden Cost of Failure
“Even if just some packets go missing, it can cost a lot of money. You know the GPU cost? It's like $2 per hour per GPU. And if you have 5% efficiency loss, that's like $8 million per year.”
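The numbers in this quote can be checked with a few lines of arithmetic. The sketch below uses the figures given in the episode (about $2 per GPU-hour, and the 10,000-GPU cluster mentioned in a later quote) and assumes the cluster runs 24 hours a day, 365 days a year, with a flat 5% loss. The function names are illustrative only.

```python
# Back-of-the-envelope cost of lost GPU efficiency, using the episode's figures.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_cluster_cost(num_gpus: int, usd_per_gpu_hour: float) -> float:
    """Yearly rental cost of a cluster that runs around the clock."""
    return num_gpus * usd_per_gpu_hour * HOURS_PER_YEAR


def annual_efficiency_loss(num_gpus: int, usd_per_gpu_hour: float,
                           loss_fraction: float) -> float:
    """Money spent on GPU-hours that do no useful work, given a fractional
    efficiency loss (e.g. 0.05 for 5%)."""
    return annual_cluster_cost(num_gpus, usd_per_gpu_hour) * loss_fraction


if __name__ == "__main__":
    total = annual_cluster_cost(10_000, 2.0)
    wasted = annual_efficiency_loss(10_000, 2.0, 0.05)
    print(f"Annual cluster cost: ${total:,.0f}")   # $175,200,000
    print(f"Cost of a 5% loss:   ${wasted:,.0f}")  # $8,760,000
```

With these inputs the cluster costs about $175.2 million a year, matching the quoted $175 million, and a 5% loss comes to roughly $8.8 million, consistent with the "like $8 million" estimate.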
From GPUs to the Network: The Real AI Engine
David learns that AI clusters aren't just about GPUs — the network fabric, switches, optics, and software are equally critical to performance and job success.
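One reason a single faulty link matters so much is that collective operations in synchronous training move in lockstep. The sketch below uses the standard bandwidth term for a ring all-reduce, in which each of N ranks sends 2(N−1)/N of the message and every step waits for the slowest link in the ring. It is a simplified model, not how Cisco measures performance: latency, congestion control, and retransmission behavior are ignored, and the link speeds and message size are hypothetical.

```python
from typing import Sequence


def ring_allreduce_seconds(message_bytes: float, link_gbps: Sequence[float]) -> float:
    """Bandwidth-only estimate of a ring all-reduce across len(link_gbps) ranks.

    Each rank sends 2 * (N - 1) / N * message_bytes in total, and every step
    of the ring waits for the slowest link, so min(link_gbps) sets the pace.
    Latency, congestion, and retransmissions are deliberately ignored.
    """
    n = len(link_gbps)
    if n < 2:
        raise ValueError("a ring needs at least two ranks")
    bottleneck_bytes_per_s = min(link_gbps) * 1e9 / 8  # Gbit/s -> bytes/s
    bytes_sent_per_rank = 2 * (n - 1) / n * message_bytes
    return bytes_sent_per_rank / bottleneck_bytes_per_s


if __name__ == "__main__":
    message = 1e9  # hypothetical 1 GB gradient buffer
    healthy = ring_allreduce_seconds(message, [400.0] * 8)
    # Hypothetical fault: one link's effective throughput drops by 5%.
    degraded = ring_allreduce_seconds(message, [400.0] * 7 + [380.0])
    print(f"slowdown: {degraded / healthy:.3f}x")  # slowdown: 1.053x
```

In this model a 5% drop on one link out of eight slows every all-reduce by about 5%, because the whole ring runs at the pace of its weakest link.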
The 100 Tbps Switch and the Future of AI Networking
Cisco’s G300 100 Tbps switch and Spectrum 6, built on NVIDIA silicon, are designed to handle the massive scale of AI clusters, with 1.6 Tbps interfaces and deep integration.
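To get a feel for what 100 Tbps and 1.6 Tbps mean together, the sketch below divides switch capacity by port speed. It assumes the "100 terabit" figure is a rounded 102.4 Tbps (64 × 1.6 Tbps) and that a 1.6 Tbps port can be broken out into 2 × 800 Gbps or 4 × 400 Gbps links. Both are illustrative assumptions, not specifications stated in the episode, and the names below are hypothetical.

```python
# Hypothetical sizing sketch: ports a switch ASIC can drive at full rate.
SWITCH_CAPACITY_GBPS = 102_400  # assumption: "100 Tbps" rounded from 102.4 Tbps


def ports_at_speed(capacity_gbps: int, port_gbps: int) -> int:
    """Number of full-rate ports of port_gbps that the capacity can serve."""
    if port_gbps <= 0:
        raise ValueError("port speed must be positive")
    return capacity_gbps // port_gbps


if __name__ == "__main__":
    # Native 1.6 Tbps ports, then assumed 2x800G and 4x400G breakouts.
    for speed in (1_600, 800, 400):
        print(f"{speed:>5} Gbps: {ports_at_speed(SWITCH_CAPACITY_GBPS, speed)} ports")
```

Under these assumptions the switch exposes 64, 128, or 256 ports. That port count (the switch radix) largely determines how many GPUs a fabric can reach in a given number of switch tiers.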
Scale Across: Connecting Data Centers for AI
Beyond scale-out within a data center, Cisco’s P200 switch enables 'scale across' — running AI jobs across multiple data centers with unified routing and policy enforcement.
“AI is not just about GPUs. The GPUs are the part that a lot of people talk about. But the network is what makes it work together.”
“Even if just some packets go missing, it can cost a lot of money. You know the GPU cost? It's like $2 per hour per GPU. And if you rent a 10,000 GPU cluster, that costs $175 million.”
“One time we were training the DLRM model and our performance was very, very poor. So when we analyzed that, there are a lot of tools available so that we can analyze profile. And we found out the storage. Storage was the bottleneck.”
Host: David Bombal
Guests: Rakesh, Will, Richard
Organizations: Cisco, NVIDIA, ShareNAI
Products: G300, P200, H200, Spectrum 6
