Building reliable and proactive agentic systems at scale: how Shopify’s reflexive AI culture was instrumental in their development of Sidekick w/ Andrew McNamara #258

The Engineering Leadership Podcast37mMay 12, 2026

Get the full intelligence

Search transcripts, export clips, track mentions, and explore all topics from “Building reliable and proactive agentic systems at scale: how Shopify’s reflexive AI culture was instrumental in their development of Sidekick w/ Andrew McNamara #258” inside PodZeus.

AI-Generated Summary

At Shopify, 95% of employees now use AI daily, creating a 'reflexive' culture where AI is embedded in every stage of product development—from prototyping to deployment. This cultural shift, powered by a commitment to hiring early talent and building at the lowest possible level, enabled the creation of Sidekick, an AI co-founder that acts as a business mentor for millions of merchants. The key to its success lies not in complex multi-agent architectures, but in a simple orchestration model with specialized sub-agents, like a fine-tuned GraphQL tool that outperforms Opus 4.7. Central to this system is a constantly evolving 'ground truth set' of human-rated conversations, which serves as the foundation for AI judges that enable auto-research, prompt optimization, and scalable evaluation. Sidekick Pulse, the proactive variant, uses background research and high-reasoning loops to deliver actionable insights—like fixing US shipping misconfigurations—without user input. Reliability is ensured through model-agnostic, cloud-agnostic redundancy and a strict 'show intelligence within a second' principle that balances deep thinking with responsiveness. The entire platform is built on one core mantra: keep building.

Key Takeaways
1

Build tools at the lowest level possible—like a natural language to GraphQL converter—to unlock infinite use cases without adding hundreds of specialized tools.

2

A constantly evolving ground truth set with human-rated conversations enables AI judges that power auto-research, prompt optimization, and scalable evaluation.

3

Sidekick Pulse uses 40–60 minute background research loops to deliver proactive, high-reasoning insights—like fixing shipping misconfigurations—without user input.

4

Reliability for business-critical functions is achieved through model- and cloud-agnostic redundancy, automatic failover, and a 'show intelligence within a second' response principle.

5

Sub-agents like the GraphQL tool are fine-tuned for extreme specialization, outperforming general models like Opus 4.7, but only return data—not natural language—to prevent hallucination.

…and 3 more takeaways available in PodZeus

Chapters
0:00
2 min

Sponsor: Unblocked’s Context Engine for Agentic Coding

Unblocked’s Dennis Pilarinos explains how a context engine gives AI agents organizational knowledge—like code conventions, decisions, and team norms—so they don’t waste tokens or time on inefficient, context-free reasoning.

2:00
3 min

The Reflexive AI Culture at Shopify

Everyone at Shopify is high agency. The interns are coming in and they're also high agency doing incredible things.

Highlight
5:00
5 min

From Vision to Sidekick: The AI Co-Founder

You have the comfort to ask it anything without worrying about judgment, without worrying about whether it's an embarrassing question or not.

Highlight
10:00
5 min

Simpler Architecture Wins: The Orchestration Model

The key is actually that less is more. Like simple architecture has gone a long way.

Highlight
15:00
5 min

Evals Are the Foundation: The Ground Truth Set

Evals is the most important thing for any system. And our goal is to build a system that on that ground true set is indistinguishable from a human.

Highlight
High-Impact Quotes
our GraphQL subagent is actually at the task of turning natural language into GraphQL. It is outperforming Opus 4 .7. That's
Andrew McNamara24:41
Viral: 90.0
Within a second, a second and a half. If that intelligence is being shown... like what's happening. I mean,
Andrew McNamara35:31
Viral: 86.0
You have the comfort to ask it anything without worrying about judgment, without worrying about whether it's an embarrassing question or not.
Andrew McNamara11:18
Viral: 85.0

Get the full intelligence

Search transcripts, export clips, track mentions, and explore all topics from “Building reliable and proactive agentic systems at scale: how Shopify’s reflexive AI culture was instrumental in their development of Sidekick w/ Andrew McNamara #258” inside PodZeus.

Start discovering podcast insights today

Start with a 7-day trial and explore a growing catalog of popular podcasts. No credit card required.

No credit card required • 7-day trial • Cancel anytime