Jasper Releases 105M-Sample Apache 2.0 Image-Text Dataset for T2I Research
May 28, 2026
Jasper open-sourced MONET, a deduplicated and recaptioned dataset of 105M image-text pairs released under Apache 2.0, alongside Nano T2I, a codebase for training text-to-image (T2I) models from scratch. Both are available on Hugging Face, making MONET one of the largest openly licensed T2I training datasets.
HOW THIS AFFECTS YOU
● Builder: You can train or fine-tune your own text-to-image models on a permissively licensed 105M-sample dataset, using the accompanying Nano T2I training codebase.
● Researcher: You can reproduce and benchmark T2I training at scale under a permissive Apache 2.0 license, using a deduplicated, recaptioned dataset comparable in size to proprietary ones.