SE Radio 719: Birol Yildiz on Building an Agentic AI SRE

Software Engineering Radio - the podcast for professional software developers · 53m · May 6, 2026

AI-Generated Summary

In this episode of Software Engineering Radio, host Kanchan interviews Birol Yildiz, CEO and co-founder of iLert, a SaaS company building an AI-powered SRE system for incident response called AI SRE. The conversation covers the architecture, evolution, and philosophy behind an agentic AI system that autonomously performs root cause analysis for production incidents.

Yildiz explains how the team moved from a rigid, prescriptive approach built on complex workflows and vector databases to a minimalist, agentic model that relies on reasoning loops and 'agentic search': using simple command-line tools like grep, jq, and bash to query data without overloading the model's context. The AI SRE is designed to complete root cause analysis in under four minutes, compared with the 10 to 60 minutes that manual processes can take.

The discussion covers the key components: orchestration, a knowledge layer (plain-text long-term memory instead of vector databases), evaluation via semantic tests and LLM judges, and the use of sub-agents and forks to manage context. A real-world example shows how the AI SRE diagnosed a self-inflicted incident caused by an overly broad network policy during a penetration test, the kind of ambiguous, novel problem that standard runbooks do not cover.

The episode also explores guardrails, autonomy, GDPR compliance, and the future of AI agents. Yildiz cautions against over-engineering and advocates simplicity, full control over context, and letting the model decide the 'how' while humans define the 'what'.
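The 'agentic search' idea from the summary can be illustrated with a small sketch. It is not the AI SRE's implementation: the function name `search_logs`, its parameters, and the sample log lines are all assumptions made for illustration, and Python's `re` and `json` modules stand in for grep and jq so the example runs without external tools. The point it tries to show is that only a short, trimmed digest of the matching records is returned, so a large log never has to enter the model's context.

```python
import json
import re


def search_logs(lines, pattern, fields=("ts", "level", "msg"), max_hits=5):
    """Return a compact digest of the JSON log lines that match `pattern`.

    Roughly the role of `grep PATTERN file | jq '{ts, level, msg}' | head`:
    filter cheaply on raw text, keep a few fields, cap the output size.
    (Hypothetical helper, not part of any real tool.)
    """
    regex = re.compile(pattern)
    shown = []
    total = 0
    for line in lines:
        if not regex.search(line):        # grep-like step: raw text filter
            continue
        try:
            record = json.loads(line)     # jq-like step: parse the record
        except json.JSONDecodeError:
            continue                      # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        total += 1
        if len(shown) < max_hits:         # head-like step: cap what is returned
            shown.append({key: record.get(key) for key in fields})
    return {"pattern": pattern, "total_matches": total, "shown": shown}


# Made-up sample data for the sketch.
sample_logs = [
    '{"ts": "12:00:01", "level": "INFO", "msg": "request ok", "trace": "a1"}',
    '{"ts": "12:00:02", "level": "ERROR", "msg": "connection refused", "trace": "b2"}',
    '{"ts": "12:00:03", "level": "ERROR", "msg": "connection refused", "trace": "c3"}',
    '{"ts": "12:00:04", "level": "INFO", "msg": "request ok", "trace": "d4"}',
]

digest = search_logs(sample_logs, r'"level": "ERROR"', max_hits=1)
print(digest)
```

The digest reports how many records matched but shows only the first `max_hits` of them, which is one simple way to let an agent decide whether to narrow the search further instead of reading everything.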

Key Takeaways
1. Agentic AI systems should focus on the 'what' (goal) and let the model decide the 'how' (execution), avoiding over-prescriptive scaffolding.
2. Use 'agentic search' (command-line tools like grep, jq, and bash) to analyze large datasets without polluting context, outperforming vector databases in many cases.
3. Prioritize full control over context: avoid frameworks like LangChain and MCP servers unless forked and customized to your use case.
4. Evaluate AI agents using real-world semantic tests and LLM judges, not just synthetic benchmarks, to ensure robustness.
5. Guardrails are critical: start with human-in-the-loop, use pre-approved actions, and implement hard rules to prevent destructive commands.
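The guardrail takeaway (pre-approved actions, hard rules against destructive commands, and a human in the loop for everything else) could be sketched roughly as below. The policy sets, the function name `review_command`, and the example commands are hypothetical and chosen only for illustration; the episode summary does not describe how the AI SRE actually encodes its rules.

```python
import shlex

# Hypothetical policy for illustration only.
PRE_APPROVED = {("kubectl", "get"), ("kubectl", "describe"), ("kubectl", "logs")}
HARD_DENY_PROGRAMS = {"rm", "shutdown", "reboot", "mkfs", "dd"}
HARD_DENY_PREFIXES = {("kubectl", "delete"), ("kubectl", "drain")}
SHELL_META = set(";|&`$><")


def review_command(command):
    """Classify a proposed shell command as 'allow', 'deny', or 'ask_human'."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "deny"                       # unparsable, e.g. unbalanced quotes
    if not tokens:
        return "deny"
    # Hard rules run first and cannot be overridden by the allowlist.
    if any(token in HARD_DENY_PROGRAMS for token in tokens):
        return "deny"
    if tuple(tokens[:2]) in HARD_DENY_PREFIXES:
        return "deny"
    # Pipes, chaining, and redirection are never auto-approved.
    if SHELL_META & set(command):
        return "ask_human"
    if tuple(tokens[:2]) in PRE_APPROVED:
        return "allow"
    return "ask_human"                      # default: human in the loop


for proposed in (
    "kubectl get pods -n prod",
    "kubectl delete networkpolicy allow-all",
    "kubectl get pods; rm -rf /",
    "kubectl scale deploy api --replicas=3",
):
    print(f"{review_command(proposed):9} {proposed}")
```

Checking the deny rules before the allowlist means a command that starts with a pre-approved prefix but smuggles in a destructive program is still rejected, and anything the policy does not recognize falls back to asking a person.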

Chapters
0:00 (1 min)

Introduction to AI SRE and Birol Yildiz

Host Kanchan introduces the episode and guest Birol Yildiz, CEO of iLert, a Cologne-based SaaS company building an AI SRE for incident response. The focus is on how AI agents are transforming production incident resolution.

1:00 (2 min)

Defining Agentic AI: Beyond Workflows

Yildiz distinguishes true AI agents from automated workflows, emphasizing that agentic AI uses reasoning loops to make independent decisions, not just follow pre-defined scripts.

3:00 (3 min)

Evolution of the AI SRE: From Simulation to Reasoning

The team initially tried simulating human behavior with browser automation but pivoted to reasoning models and the Model Context Protocol (MCP) in 2024, leading to a more flexible, agentic system.

6:00 (4 min)

Agentic Search: The Power of CLI Tools

Agentic search is a fancy way of just using old school terminal commands, grep, z, jq, yeah.

10:00 (5 min)

Architecture: Orchestrator, Knowledge, and Sub-agents

The AI SRE uses an orchestrator service, a lightweight knowledge layer (plain text memory), and dynamic sub-agents/forks to manage context and scale analysis during incidents.
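As a rough illustration of what a lightweight, plain-text knowledge layer might look like, the sketch below appends one-line notes to a text file and recalls them by keyword match instead of querying a vector database. The helper names `remember` and `recall`, the file name, and the notes themselves are assumptions invented for this example, not details from the episode.

```python
import tempfile
from pathlib import Path


def remember(memory_file, note):
    """Append one note as a single line to the plain-text memory file."""
    with memory_file.open("a", encoding="utf-8") as handle:
        handle.write(" ".join(note.split()) + "\n")   # collapse whitespace


def recall(memory_file, *keywords, limit=3):
    """Return up to `limit` of the most recent notes containing every keyword."""
    if not memory_file.exists():
        return []
    notes = memory_file.read_text(encoding="utf-8").splitlines()
    wanted = [keyword.lower() for keyword in keywords]
    matches = [note for note in notes if all(k in note.lower() for k in wanted)]
    return matches[-limit:]


# Made-up notes for the sketch.
with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "memory.txt"
    remember(memory, "checkout-service: 5xx spike traced to an overly broad network policy")
    remember(memory, "billing-db: slow queries after an index was dropped")
    remember(memory, "checkout-service: deploys usually happen on Tuesdays")
    found = recall(memory, "checkout-service", "network policy")

print(found)
```

Because the memory is ordinary text, the same grep-style search the agent uses on logs also works on its own notes, and a human can read or correct the file directly.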

High-Impact Quotes
The more we hand this task over to agents, there will be incidents that are novel in the sense that whatever contributed to that incident was maybe due to the fact that there is a large amount of code being generated by AI.
Birol Yildiz, 51:11
Viral: 92.0
This would never made it into a runbook. No runbook would tell you that when you have a penetration test and this happens, here's the solution.
Birol Yildiz, 31:42
Viral: 90.0
Benchmark it against Claude Code, right? Just try to create a similar environment for Claude Code... If you perform a lot better than Claude Code, then you know that's probably something is, there is a reason for being right.
Birol Yildiz, 50:24
Viral: 88.0
Speakers

Host: Kanchan
Guest: Birol Yildiz

Topics Discussed
agentic ai: 95%
incident response: 90%
ai agent architecture: 88%
agentic search: 85%
ai safety and guardrails: 82%
context management: 80%
ai evaluation and testing: 78%
ai-generated code risks: 75%

People & Brands

AI SRE (product): 25x, Positive
iLert (organization): 18x, Positive
Birol Yildiz (person): 15x, Positive
Islet (organization): 12x, Positive
Model Context Protocol (other): 6x, Positive
Claude Code (product): 5x, Positive
OpenAI (organization): 5x, Neutral
Kubernetes (other): 4x, Neutral
Cursor (product): 4x, Positive
GitHub (other): 4x, Neutral
