● builder: You can pull the 4B model from Hugging Face today and get faster fine-tuning cycles for visual reasoning tasks without needing labeled answers.
● researcher: Using contrastive evidence gating as the training signal removes the need for answer labels and dramatically cuts compute, so the method is worth examining against your RL or SFT baselines.