[arXiv]

Post-Training Recipe, Not Model Family, Drives Multi-Agent LLM Behavior Diversity

June 23, 2026

Across a 940,000-chain corpus covering 11 checkpoints and a 1.6M-chain Llama factorial, post-training recipe predicts conversational behavior better than model family in multi-LLM systems. A reasoning-distilled Llama checkpoint shifts its hedging behavior by 18% depending on which same-base partner it is paired with, a shift that exceeds any cross-family gap measured. This undermines the common design heuristic of mixing model families to obtain behavioral diversity in multi-agent pipelines.

HOW THIS AFFECTS YOU

● builder: You should audit multi-agent system designs that assume cross-family model diversity ensures behavioral variance; post-training recipe is the stronger lever.

● researcher: The 940K-chain corpus and hedging metric provide a validated framework for studying interactive multi-LLM behavior beyond offline preference studies.
