Talk
Virtual
When software platforms are up... but workflows still break
Systems don’t have to be down to fail. This case study from the world of security operations shows how observability gaps in workflows increased response time despite healthy dashboards, and what platform engineering teams can learn from it.
CEST
Meet the speakers
In this talk, Abhimanyu Narwal shares a case study from security operations to explore how observability practices apply to complex, cross-product workflows. Instead of outages, he focuses on partial degradations, where systems remain technically available but reduced clarity and missing signals increase the time teams spend reconstructing context and coordinating responses. These problems do not appear in traditional health metrics. Through anonymized examples, he revisits three practices: tracing events across systems at the workflow level, measuring response latency rather than uptime alone, and treating user behavior as an early indicator of friction. Attendees will learn how to identify partial failures sooner, design platforms that fail visibly, and reduce uncertainty under pressure.