Datadog Stories
Datadog is where many teams first notice a problem, and sometimes where they realize they were watching the wrong thing. These stories cover alert quality, telemetry gaps, and the cost of bad observability assumptions.
2 stories
🔄 Culture Change
What Changed When We Finally Put a Real On-Call Rotation in Place
👤 @sam-runs-prod · SaaS · 2023
“For our first year, "on-call" meant the founder who happened to still have Slack open. We were a seven-person engineering team, we deployed straight from main, and we wore our lack...”
PagerDuty · Datadog · On-Call · Incident Response · +1
⚡ Incident Report
Black Friday, One Missing Index, and 53 Minutes of Checkout Pain
👤 @sarah-oncall · e-commerce · 2024
“I was the primary on-call engineer for a mid-size e-commerce company doing roughly 50k orders a day outside of peak season. Checkout lived in a Node.js monolith on ECS with Postgre...”
PostgreSQL · AWS · Datadog · Incident Response · +1