Temporal Difference Signals Enable Self-Supervised Visual Learning Without Augmentations
June 13, 2026
Learning visual representations from temporal differences between video frames removes the need for augmentations, masking, or cropping as inductive biases. Experiments show that the optimal inductive bias strength decreases as data scale grows, which suggests that augmentation-free methods will outperform augmentation-based ones at sufficient scale.
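To make "temporal differences between video frames" concrete, here is a minimal sketch of one possible reading: the training signal is derived from the pixel-wise change between consecutive frames, so no crop, mask, or augmentation is needed to create it. The summary does not specify the actual objective or architecture, so the `Frame` type and the `temporal_difference` and `difference_targets` helpers are hypothetical names used only for illustration.

```python
from typing import List

# Assumption: a grayscale frame is a list of rows of pixel intensities.
Frame = List[List[float]]


def temporal_difference(prev: Frame, curr: Frame) -> Frame:
    """Return the pixel-wise difference curr - prev for two same-sized frames."""
    same_rows = len(prev) == len(curr)
    if not same_rows or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def difference_targets(video: List[Frame]) -> List[Frame]:
    """Hypothetical helper: one difference map per consecutive frame pair."""
    return [temporal_difference(a, b) for a, b in zip(video, video[1:])]


# Toy 2x2 video in which a single bright pixel moves between frames.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
targets = difference_targets(video)
```

In this sketch the supervision comes entirely from how the scene changes over time: a clip of N frames yields N - 1 difference maps without any hand-designed transformation of the input.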
HOW THIS AFFECTS YOU
● Researcher: The empirical finding that inductive bias strength should decrease with data scale provides a principled argument for augmentation-free SSL as a long-term research direction.