Tokenization Strategy Shapes Scientific Foundation Model Quality More Than Architecture
June 23, 2026
Across 640,000 galaxy images from the DESI Legacy Survey, all processed with a shared AstroPT backbone, VQ-VAE tokenization best predicts physical galaxy properties, while JetFormer achieves higher reconstruction fidelity. Reconstruction quality and representation quality are therefore dissociated: the tokenizer that rebuilds images most faithfully is not the one whose representations best predict physical properties. Affine and AIM tokenization better preserve morphological detail, so the choice of tokenizer should match the downstream task.
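To make the contrast between continuous and discrete tokenization concrete, here is a minimal pure-Python sketch. It is not the study's or AstroPT's implementation. The patch size, embedding width, codebook size, random weights and helper names (`to_patches`, `affine_tokens`, `vq_tokens`) are all hypothetical. In a real VQ-VAE the encoder is a learned network and the codebook is trained; here a random affine map stands in for the encoder.

```python
import random

# Hypothetical toy sizes, chosen for illustration only (not the study's settings).
PATCH = 4           # patch side length in pixels
EMBED_DIM = 8       # width of each token embedding
CODEBOOK_SIZE = 16  # number of discrete codes in the (untrained) codebook


def to_patches(image, patch=PATCH):
    """Split a 2-D image (list of rows) into flattened, non-overlapping square patches."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - h % patch, patch):
        for c in range(0, w - w % patch, patch):
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


def affine_tokens(patches, weight, bias):
    """Affine tokenization: map each flattened patch x to a continuous vector W x + b."""
    return [
        [sum(w_k * x_k for w_k, x_k in zip(row, p)) + b for row, b in zip(weight, bias)]
        for p in patches
    ]


def vq_tokens(embeddings, codebook):
    """VQ-style tokenization: replace each embedding with the index of its nearest code."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda k: sq_dist(e, codebook[k])) for e in embeddings]


# Usage on a random 16x16 "image" (hypothetical data, not a DESI cutout).
rng = random.Random(0)
image = [[rng.random() for _ in range(16)] for _ in range(16)]
patches = to_patches(image)  # 16 patches of 16 pixels each
weight = [[rng.gauss(0, 0.1) for _ in range(PATCH * PATCH)] for _ in range(EMBED_DIM)]
bias = [0.0] * EMBED_DIM
codebook = [[rng.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(CODEBOOK_SIZE)]

continuous = affine_tokens(patches, weight, bias)  # one 8-dim float vector per patch
discrete = vq_tokens(continuous, codebook)         # one integer in [0, 16) per patch
```

The affine path keeps a continuous vector for every patch, while the VQ path reduces each patch to a single integer drawn from a fixed vocabulary, which is the kind of structural difference the comparison above is probing.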
HOW THIS AFFECTS YOU
● Researcher: Directly informs tokenization decisions for scientific vision transformers, favoring VQ-VAE for property prediction and JetFormer for reconstruction tasks.