Data filtering may be suboptimal for large model pretraining | HACKOBAR_