Running a prompt in the same chat used to create it biases the model toward validating its own prior output rather than executing the task cleanly. The practical advice: generate in one session, evaluate in a separate fresh context with a QA prompt. No benchmark data supports the claim, but the context-contamination intuition aligns with how system and conversation history influence generation.
HOW THIS AFFECTS YOU
●
builderWorth testing in prompt development workflows — isolating generation and evaluation into separate context windows is a low-cost practice that may reduce self-reinforcing output bias.