How to get multiple agents to play nice at scale
In this episode of The Stack Overflow Podcast, host Ryan Donovan explores the challenges and strategies behind orchestrating multiple AI agents at scale with Stephen Kalesha and Chase Ruzin from Intuit. The conversation covers how Intuit has evolved its AI infrastructure over the years, building on foundational platforms like GenOS to create a composable, enterprise-grade agentic system.
Rather than relying on isolated, domain-specific agents, Intuit has moved to a skills-and-tools-based architecture with a central planner. This enables cross-domain problem solving and reduces the need for users to navigate multiple interfaces. The team emphasizes evaluation rigor, combining offline, online, and human evaluations to ensure accuracy, especially in sensitive financial contexts. They also discuss managing token costs, latency, and system reliability, along with the importance of observability in a fast-moving AI landscape. The episode concludes with a vision of AI that performs 'done-for-you' work, freeing users to focus on higher-level decisions.
Key takeaways include:
1) A central planner with access to a unified skill and tool library enables better cross-domain coordination than isolated agents.
2) Rigorous evaluation, especially human-in-the-loop testing, is essential for trust in financial AI.
3) Observability and cost monitoring are critical in AI systems because token usage is variable.
4) Foundational platform investments (like GenOS) provide the agility needed to innovate rapidly.
5) The future of AI at scale lies in reducing user effort to near zero through intelligent automation.
The tone is optimistic and forward-looking, celebrating engineering progress while acknowledging real-world constraints.
Adopt a skills-and-tools architecture with a central planner to enable cross-domain problem solving across multiple AI agents.
Use a multi-layered evaluation strategy (offline, online, human) to ensure accuracy, especially in high-stakes domains like finance.
Leverage foundational platform investments (e.g., GenOS) to accelerate innovation and maintain system reliability at scale.
Prioritize observability and cost monitoring to manage token usage and system performance in AI-native applications.
Design for 'done-for-you' experiences where AI handles complex workflows autonomously, reducing user effort to near zero.
Introducing the Multi-Agent Challenge
Host Ryan Donovan welcomes Stephen Kalesha and Chase Ruzin from Intuit to discuss the complexities of orchestrating multiple AI agents at enterprise scale. The episode sets the stage by highlighting the shift from isolated agents to coordinated, cross-functional systems.
From Isolated Agents to a Central Planner
“Customers don't just ask like a question that should go to one agent or this agent or that agent, right? Very commonly they're getting cross-domain questions.”
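The routing idea in this segment can be illustrated with a minimal sketch. Everything here is hypothetical: the Skill and Planner names, the keyword matching, and the placeholder handlers are assumptions made for illustration, not Intuit's or GenOS's actual design. In a real system the planner would presumably be model-driven; simple keyword overlap stands in for it so the example stays runnable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    """A hypothetical skill: a name, trigger keywords, and a handler."""
    name: str
    keywords: frozenset[str]
    run: Callable[[str], str]


class Planner:
    """Hypothetical central planner over one shared skill library.

    Keyword overlap is a stand-in for whatever planning logic a real
    system would use (most likely an LLM call).
    """

    def __init__(self, skills: list[Skill]) -> None:
        self.skills = list(skills)

    def plan(self, question: str) -> list[Skill]:
        # Normalize the question into a set of lowercase words.
        cleaned = question.lower().replace("?", " ").replace(",", " ")
        words = set(cleaned.split())
        # Select every skill that matches, not just the first one,
        # so a cross-domain question reaches several skills.
        return [s for s in self.skills if s.keywords & words]

    def answer(self, question: str) -> dict[str, str]:
        return {s.name: s.run(question) for s in self.plan(question)}


# Placeholder skills; the handlers return canned strings.
skills = [
    Skill("invoicing", frozenset({"invoice", "invoices"}),
          lambda q: "invoice skill result"),
    Skill("payroll", frozenset({"payroll", "paycheck"}),
          lambda q: "payroll skill result"),
    Skill("tax", frozenset({"tax", "taxes"}),
          lambda q: "tax skill result"),
]

planner = Planner(skills)
result = planner.answer("How do unpaid invoices affect my taxes?")
print(result)
```

The point of the sketch is that one question touching two domains is handled in a single pass by one planner, instead of the user choosing between an invoicing agent and a tax agent.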
The Role of Evaluation in AI Trust
“We want to infuse that with all of the amazing experts that work with us at Intuit. That is what's kind of giving us this upper hand when we're looking across the ecosystem.”
Managing Determinism and Cost at Scale
“The more input tokens, the more output tokens. That is going to impact the cost.”
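The relationship in this quote can be made concrete with a rough per-request estimate. This is a sketch under stated assumptions: the function name is hypothetical and the per-million-token prices are made-up placeholders, not rates quoted in the episode or by any provider.

```python
def estimate_request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one model call.

    Cost grows linearly with both token counts; prices are expressed
    per one million tokens.
    """
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Placeholder prices (assumed for illustration only).
INPUT_PRICE = 3.0    # currency units per million input tokens
OUTPUT_PRICE = 15.0  # currency units per million output tokens

small = estimate_request_cost(2_000, 500, INPUT_PRICE, OUTPUT_PRICE)
large = estimate_request_cost(20_000, 500, INPUT_PRICE, OUTPUT_PRICE)
print(f"small prompt: {small:.4f}, large prompt: {large:.4f}")
```

Logging an estimate like this per request is one simple way to feed the cost monitoring the guests describe, since prompt size can vary widely from one request to the next.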
The Future of AI: Done-for-You Work
“The ideal utopia is you come in and it's like, hey, the work's done for you. I think the technology is starting to move in that direction.”
Host: Ryan Donovan
Guests: Stephen Kalesha, Chase Ruzin
Intuit (organization)
Stephen Kalesha (person)
Chase Ruzin (person)
Ryan Donovan (person)
GenOS (product)
LLM judges (other)
QuickBooks (product)
Stack Overflow Podcast (media)
MCP (other)
