Structured expert pruning in mixture-of-experts (MoE) models preserves in-domain biomedical utility at moderate pruning ratios but increases hallucination risk at extreme ratios. The study evaluates four models and six pruning methods, highlighting the tension between inference speedups and factual reliability in high-stakes domains.
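This summary does not describe the six pruning methods the study evaluates. To make "structured expert pruning at a given ratio" concrete, here is a minimal sketch of one common heuristic, frequency-based pruning. It is not presented as any of the study's methods. The heuristic removes whole experts that the router selects least often on a calibration set. The function name and its inputs (a flat log of router choices, a pruning ratio) are hypothetical.

```python
from collections import Counter


def select_experts_to_keep(router_choices, num_experts, prune_ratio):
    """Return sorted indices of the experts kept after structured pruning.

    Hypothetical helper illustrating frequency-based expert pruning.
    router_choices: expert indices chosen by the router on a calibration set.
    num_experts: total number of experts in the MoE layer.
    prune_ratio: fraction of experts to remove, in [0, 1).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    counts = Counter(router_choices)
    # Remove floor(num_experts * prune_ratio) experts; at least one always remains.
    num_keep = num_experts - int(num_experts * prune_ratio)
    # Rank every expert (including never-routed ones) by usage, highest first.
    # Ties are broken by expert index so the result is deterministic.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    return sorted(ranked[:num_keep])


if __name__ == "__main__":
    calibration_log = [0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 5, 7]
    print(select_experts_to_keep(calibration_log, 8, 0.25))  # moderate ratio
    print(select_experts_to_keep(calibration_log, 8, 0.75))  # extreme ratio
```

On this toy log, a moderate ratio (0.25) keeps six of eight experts, `[0, 1, 2, 3, 5, 7]`. An extreme ratio (0.75) keeps only `[0, 3]`, the two most-used experts. At extreme ratios, rarely routed but specialized experts are the first to go, which is one plausible route to the factual-reliability loss the study reports.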
HOW THIS AFFECTS YOU
● Researcher: You should carefully monitor hallucination rates when applying aggressive pruning to MoE architectures.
● Health: You must avoid extreme pruning of models used in clinical settings, because it reduces factual reliability.