How to get multiple agents to play nice at scale

The Stack Overflow Podcast · 27m · April 22, 2026

AI-Generated Summary

In this episode of The Stack Overflow Podcast, host Ryan Donovan explores the challenges and strategies behind orchestrating multiple AI agents at scale with Stephen Kalesha and Chase Ruzin from Intuit. The conversation covers how Intuit has evolved its AI infrastructure over several years, building on foundational platforms like GenOS to create a composable, enterprise-grade agentic system. Rather than relying on isolated, domain-specific agents, Intuit has moved to a skills-and-tools-based architecture with a central planner, which enables cross-domain problem solving and reduces the need for users to navigate multiple interfaces.

The team emphasizes evaluation rigor, combining offline, online, and human evaluations to ensure accuracy, especially in sensitive financial contexts. They also discuss managing token costs, latency, and system reliability, and the importance of observability in a fast-moving AI landscape. The episode concludes with a vision of AI that performs 'done-for-you' work, freeing users to focus on higher-level decisions.

Key takeaways include: 1) A central planner with access to a unified skill and tool library enables better cross-domain coordination than isolated agents; 2) Rigorous evaluation, especially human-in-the-loop testing, is essential for trust in financial AI; 3) Observability and cost monitoring are critical in AI systems because token usage varies from request to request; 4) Foundational platform investments (like GenOS) provide the agility needed to innovate rapidly; 5) The future of AI at scale lies in reducing user effort to near-zero through intelligent automation. The tone is optimistic and forward-looking, celebrating engineering progress while acknowledging real-world constraints.

Key Takeaways

Adopt a skills-and-tools architecture with a central planner to enable cross-domain problem solving across multiple AI agents.

Use a multi-layered evaluation strategy (offline, online, human) to ensure accuracy, especially in high-stakes domains like finance.

Leverage foundational platform investments (e.g., GenOS) to accelerate innovation and maintain system reliability at scale.

Prioritize observability and cost monitoring to manage token usage and system performance in AI-native applications.

Design for 'done-for-you' experiences where AI handles complex workflows autonomously, reducing user effort to near zero.
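
To make the first takeaway concrete, here is a minimal Python sketch of a central planner that selects from a shared skill library instead of handing the request to one domain-specific agent. It is an illustration only and not Intuit's or GenOS's implementation: the names `Skill`, `Planner`, `register`, `plan`, and `handle`, the keyword-matching rule, and the two example skills are all assumptions made for this example. A production planner would more likely use a language model to choose and order skills.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Skill:
    """A hypothetical unit of capability owned by one domain."""
    name: str
    domain: str
    keywords: FrozenSet[str]
    run: Callable[[str], str]


class Planner:
    """Hypothetical central planner backed by one shared skill library."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def plan(self, request: str) -> List[Skill]:
        # Assumption: simple keyword overlap stands in for real planning.
        words = set(re.findall(r"[a-z]+", request.lower()))
        return [s for s in self._skills.values() if words & s.keywords]

    def handle(self, request: str) -> List[str]:
        # One request can fan out to skills from several domains.
        return [f"{s.domain}: {s.run(request)}" for s in self.plan(request)]


planner = Planner()
planner.register(Skill(
    name="invoice_reminder",
    domain="accounting",
    keywords=frozenset({"invoice", "invoices"}),
    run=lambda req: "drafted invoice reminder",
))
planner.register(Skill(
    name="tax_estimate",
    domain="tax",
    keywords=frozenset({"tax", "taxes"}),
    run=lambda req: "estimated quarterly tax",
))

results = planner.handle("Send the overdue invoice and estimate my quarterly tax.")
for line in results:
    print(line)
```

Because every skill lives in one registry, a single request that spans accounting and tax is served in one pass, which is the cross-domain behavior the takeaway describes; with isolated agents the user would have to route each part of the request separately.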

Chapters