[arXiv] score: 0.44

MRBT Combines Behavior Trees and LLMs for Modular RL Reward Shaping

May 26, 2026

Masking Reward Behavior Trees (MRBT) use LLM-generated, SMT-solver-verified symbolic structures to automate reward shaping and action masking in RL, improving reactivity to subtask failure and generalization across varying task objects.

cs.LG

HOW THIS AFFECTS YOU

● builder: Worth watching as a pipeline for automating RL reward design in compositional robotics or agent tasks without hand-crafting per-object reward functions.

● researcher: MRBT offers a verifiable, modular alternative to purely LLM-based reward shaping with formal correctness guarantees via SMT solving.

SOURCE

https://arxiv.org/abs/2605.05795
