[arXiv] score: 0.12
RapTB and SubM Address Prefix Collapse and Replay Bias in GFlowNet LLM Fine-Tuning
May 29, 2026
GFlowNet fine-tuning of LLMs suffers from prefix collapse and length bias due to weak early credit assignment and biased replay. RapTB anchors subtrajectory supervision at the root with absorbed suffix backups for dense prefix signals; SubM uses submodular replay to balance reward and diversity, with gains shown on SMILES-based molecule generation.
cs.LG, cs.AI
HOW THIS AFFECTS YOU
● Researcher: RapTB+SubM is a drop-in training improvement for reward-proportional LLM fine-tuning tasks where mode diversity matters.
● Health: Demonstrated improvements on molecule generation with SMILES strings are directly relevant to LLM-guided drug discovery pipelines.
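
To make the idea of submodular replay in the summary concrete, here is a minimal sketch of greedy selection that trades off reward against diversity. It is not the paper's SubM objective, which this summary does not specify. The function name `select_replay`, the `diversity_weight` parameter, the facility-location coverage term, and the character-bigram Jaccard similarity over SMILES strings are all assumptions chosen for illustration.

```python
def bigrams(s):
    """Character bigrams of a string (assumed stand-in for a molecular fingerprint)."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard(a, b):
    """Jaccard similarity between two sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_replay(candidates, rewards, k, diversity_weight=1.0):
    """Greedily pick up to k candidates for a replay buffer.

    Objective (an assumption, not taken from the paper):
        F(S) = sum of rewards in S
               + diversity_weight * sum_i max_{j in S} sim(i, j)
    The second term is a facility-location coverage score. With non-negative
    rewards and similarities, F is monotone submodular, so greedy selection
    is a standard approximate maximizer.
    """
    n = len(candidates)
    feats = [bigrams(c) for c in candidates]
    sim = [[jaccard(feats[i], feats[j]) for j in range(n)] for i in range(n)]

    covered = [0.0] * n  # best similarity of each candidate to the selected set
    selected = []
    for _ in range(min(k, n)):
        best, best_gain = None, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            # Marginal coverage gain from adding candidate j.
            cov_gain = sum(max(sim[i][j] - covered[i], 0.0) for i in range(n))
            gain = rewards[j] + diversity_weight * cov_gain
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        covered = [max(covered[i], sim[i][best]) for i in range(n)]
    return [candidates[j] for j in selected]
```

With `diversity_weight=0.0` the selection reduces to picking the top-k rewards. Raising the weight favors candidates that cover parts of the pool not yet represented, which is the reward-versus-diversity balance the summary attributes to SubM. In a real pipeline a chemistry-aware similarity would likely replace the bigram proxy.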