[r/MachineLearning] score: 0.07

Nanochat vs Llama for training from scratch? [P]

April 24, 2026
**Summary:** A practitioner training a domain-specific LLM entirely on historical data is evaluating whether to migrate from the NanoChat framework to the LLaMA architecture with Hugging Face Transformers for their next pretraining run. The core issue is that the latest NanoChat version does not produce Transformers-compatible model outputs, creating interoperability barriers for open-source distribution. For practitioners building custom pretrained models from scratch, this highlights a real tradeoff between NanoChat's training conveniences (e.g., auto-scaling `--depth` parameter) and the broader ecosystem compatibility that the LLaMA architecture + Transformers `Trainer` class provides for downstream accessibility.
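The compatibility point above can be illustrated concretely. The sketch below (a minimal, illustrative example, not the poster's actual setup) initializes a small LLaMA-architecture model from scratch with Hugging Face Transformers; all sizes are placeholder values. A checkpoint saved this way can be reloaded by any Transformers user, which is the distribution advantage the post describes.

```python
# Hedged sketch: pretraining-from-scratch setup using the LLaMA architecture
# via Hugging Face Transformers. Sizes are illustrative placeholders.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=1000,       # illustrative small vocabulary
    hidden_size=64,        # must be divisible by num_attention_heads
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)

# Random-weight initialization: the starting point for a pretraining run
# (e.g., with the Transformers `Trainer` class mentioned in the post).
model = LlamaForCausalLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")

# model.save_pretrained("my-model") would write a config.json plus weights
# that anyone can reload with LlamaForCausalLM.from_pretrained("my-model").
```

This interoperability, rather than raw training convenience, is the main argument for the LLaMA + Transformers route in the tradeoff described above.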