GENEB: Why Genomic Models Are Hard to Compare
June 2, 2026
GENEB evaluates frozen representations from 40 genomic foundation models on 100 tasks spanning 13 functional categories and finds that aggregate leaderboard rankings are unstable: model rankings shift sharply across task categories, and scale yields only modest, inconsistent gains. The benchmark uses a unified probing protocol, including few-shot regimes, to enable controlled comparisons across architecture, tokenization, and pretraining data choices.
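
One way to quantify how much rankings shift between task categories is a rank correlation between per-category rankings. The sketch below is illustrative only and is not taken from GENEB: the model names, category names, and scores are made-up placeholders, and the helpers `rank_models` and `spearman` are hypothetical names introduced here. It assumes no tied scores.

```python
def rank_models(scores):
    """Map each model to its rank, with rank 1 for the highest score.

    Assumes no tied scores (ties would need average ranks).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position + 1 for position, model in enumerate(ordered)}


def spearman(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same models."""
    n = len(ranks_a)
    squared_diffs = sum((ranks_a[m] - ranks_b[m]) ** 2 for m in ranks_a)
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


# Placeholder scores for illustration only; these are NOT GENEB results.
toy_scores = {
    "category_x": {"model_a": 0.81, "model_b": 0.74, "model_c": 0.69},
    "category_y": {"model_a": 0.55, "model_b": 0.71, "model_c": 0.63},
}

ranks = {category: rank_models(s) for category, s in toy_scores.items()}
rho = spearman(ranks["category_x"], ranks["category_y"])
print(ranks)
print(f"Spearman rho between categories: {rho:.2f}")
```

A correlation near 1 would mean the two categories order the models almost identically, while values near 0 or below indicate that an aggregate ranking hides substantial per-category disagreement.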