research · complete · March 2025
Nanochat Experiments
Training LLMs from scratch on domain data

◆ What
Training LLMs from scratch on domain data
◆ How
DGX Spark · H100 · PyTorch · nanochat
◆ Why
The Lab
Supporting: Solid production work
Overview
A systematic comparison of domain-specific and general training data. Comparing MedGen (general PubMed data) with OncoSpec (oncology-specific data) showed that 10K focused examples outperform 500K general examples on domain-specific tasks.
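
The overview does not say how the two models were scored. As a minimal sketch, assuming the comparison uses held-out perplexity on a shared domain evaluation set, the ranking step might look like the following. The function names, the model labels, and the numbers are all hypothetical placeholders and are not results from these experiments.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def compare(nlls_by_model: dict[str, Sequence[float]]) -> list[tuple[str, float]]:
    """Rank models by perplexity on the same eval tokens, lowest (best) first."""
    scored = [(name, perplexity(nlls)) for name, nlls in nlls_by_model.items()]
    return sorted(scored, key=lambda item: item[1])


# Placeholder per-token losses (hypothetical, NOT measured values).
toy_nlls = {
    "model_a": [2.0, 2.0, 2.0],
    "model_b": [1.0, 1.0, 1.0],
}
ranking = compare(toy_nlls)
for name, ppl in ranking:
    print(f"{name}: perplexity {ppl:.3f}")
```

In practice the per-token losses would come from running each trained checkpoint over the same tokenized evaluation set, so that the two perplexities are directly comparable.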
Technologies
DGX Spark · H100 · PyTorch · nanochat