[arXiv] score: 0.48
AIS: Adaptive Importance Sampling for Quantized RL
May 15, 2026
Proposes Adaptive Importance Sampling (AIS) to mitigate rollout-training mismatch in LLM RL when using low-precision (FP8) rollouts with BF16 trainers, preventing policy gradient bias and training collapse on reasoning benchmarks.
stat.ML cs.AI cs.LG
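
To make the mismatch described in the summary concrete, here is a minimal sketch of a generic truncated importance-sampling correction between a low-precision rollout policy and a higher-precision trainer policy. It is not the paper's AIS algorithm, whose adaptive rule is not described in this listing. The function names, the `clip_max` threshold, and the per-token formulation are all assumptions made for illustration.

```python
import math

def truncated_is_weights(trainer_logprobs, rollout_logprobs, clip_max=2.0):
    """Per-token weights pi_trainer / pi_rollout, truncated at clip_max.

    trainer_logprobs: log-probs of the sampled tokens under the trainer (e.g. BF16).
    rollout_logprobs: log-probs of the same tokens under the rollout engine (e.g. FP8).
    clip_max is a hypothetical fixed cap, not a value taken from the paper.
    """
    if len(trainer_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have equal length")
    return [
        min(math.exp(lt - lr), clip_max)
        for lt, lr in zip(trainer_logprobs, rollout_logprobs)
    ]

def weighted_pg_loss(trainer_logprobs, rollout_logprobs, advantages, clip_max=2.0):
    """Value of an importance-weighted policy-gradient surrogate.

    The weights are treated as constants; in an autodiff framework they
    would be detached so gradients flow only through trainer_logprobs.
    """
    weights = truncated_is_weights(trainer_logprobs, rollout_logprobs, clip_max)
    terms = [
        -w * a * lt
        for w, a, lt in zip(weights, advantages, trainer_logprobs)
    ]
    return sum(terms) / len(terms)
```

When the two policies agree on a token, its weight is 1 and the loss reduces to the usual on-policy surrogate. Tokens where the trainer assigns much higher probability than the rollout engine are capped at `clip_max`, trading some bias for lower variance.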