Your SRE agent isn’t production-ready if a pod restart wipes its memory.

🚀 Today we’re excited to announce the integration between HolmesGPT and Dapr, bringing durable execution, resilience, and production-grade reliability to AI-driven SRE workflows.

Most AI agents today are fundamentally ephemeral:
• A crash means lost context
• Long-running investigations restart from scratch
• Retries are manual and error-prone
• Stateful execution *across* systems is damn near impossible

By combining HolmesGPT’s operational intelligence with Dapr Workflows, teams can now build AI-powered SRE agents that:
• Coordinate complex investigations across distributed systems
• Automatically recover from failures
• Resume execution from the exact point of interruption
• Run reliably for hours or months
• Add human approvals and governance into remediation flows
• Operate with built-in durability and state management

This moves SRE agents from "cool demo" into production-ready infrastructure.

How do you run this? 👇

DaprWorkflowHolmesRunner(name="sre-agent").serve(port=5001)

Try it out here: https://lnkd.in/gAHzcqEA
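If you want to see what "resume execution from the exact point of interruption" means mechanically, here is a toy sketch of replay-based durable execution in plain Python. It is not the Dapr or HolmesGPT API. `run_durable`, the in-memory `store`, and the `fetch_logs`/`analyze` activities are hypothetical names made up for illustration. The runner records each completed step's result. After a crash, rerunning the workflow replays those recorded results instead of re-executing the steps and picks up at the first unfinished one.

```python
from typing import Any, Callable, Generator

# Hypothetical toy runner, NOT the Dapr or HolmesGPT API.
# A workflow is a generator that yields (activity, input) pairs.
Step = tuple[Callable[[Any], Any], Any]
Workflow = Callable[[Any], Generator[Step, Any, Any]]


def run_durable(instance_id: str, workflow: Workflow, wf_input: Any,
                store: dict[str, list[Any]]) -> Any:
    """Run a workflow, replaying recorded step results and executing only new steps."""
    history = store.setdefault(instance_id, [])  # per-instance checkpoint log
    gen = workflow(wf_input)
    step = 0
    try:
        activity, arg = next(gen)
        while True:
            if step < len(history):
                output = history[step]        # replay: reuse the recorded result
            else:
                output = activity(arg)        # new step: execute it for real
                history.append(output)        # checkpoint before continuing
            step += 1
            activity, arg = gen.send(output)  # hand the result back to the workflow
    except StopIteration as done:
        return done.value                     # the workflow's return value


calls: list[str] = []          # records which activities actually executed
crash_once = {"pending": True}


def fetch_logs(pod: str) -> str:  # hypothetical activity
    calls.append("fetch_logs")
    return f"logs for {pod}"


def analyze(logs: str) -> str:    # hypothetical activity that fails on its first call
    calls.append("analyze")
    if crash_once["pending"]:
        crash_once["pending"] = False
        raise RuntimeError("simulated pod restart")
    return f"root cause from {logs}"


def investigation(pod: str):
    logs = yield (fetch_logs, pod)
    cause = yield (analyze, logs)
    return cause


store: dict[str, list[Any]] = {}  # stand-in for a durable state store
try:
    run_durable("inv-1", investigation, "api-7f9c", store)
except RuntimeError:
    pass  # the first run dies partway through

# The second run replays fetch_logs from the store and resumes at analyze.
result = run_durable("inv-1", investigation, "api-7f9c", store)
print(result)  # → root cause from logs for api-7f9c
print(calls)   # → ['fetch_logs', 'analyze', 'analyze']
```

`fetch_logs` runs only once across both attempts because its recorded result is replayed. Replay only works if the workflow function is deterministic, meaning it yields the same steps in the same order on every run. Dapr Workflows impose a similar determinism constraint on workflow code. In this sketch the dict lives in memory, so unlike a real state store it would not survive an actual pod restart.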