research · complete · March 2025

Nanochat Experiments

Training LLMs from scratch on domain data


What

Training LLMs from scratch on domain data

How

DGX Spark, H100, PyTorch, nanochat

Why

The Lab

Supporting: Solid production work

Overview

A systematic comparison of domain-specific and general training data. Comparing MedGen (general PubMed data) with OncoSpec (oncology-specific data) showed that 10K focused examples outperform 500K general examples on domain-specific tasks.
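A comparison like this comes down to scoring both models on the same held-out task set. The sketch below shows one way that scoring could look. It is illustrative only: the exact-match metric, the `EvalResult` and `score` names, and the toy inputs are all assumptions, not the evaluation actually used in these experiments.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Score of one model on a shared held-out task set (hypothetical structure)."""
    name: str
    n_train: int   # number of training examples the model saw
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def score(name, n_train, predictions, references):
    """Exact-match accuracy, ignoring case and surrounding whitespace.

    Exact match is an assumed metric chosen for simplicity.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return EvalResult(name, n_train, correct, len(references))


def accuracy_gap(a: EvalResult, b: EvalResult) -> float:
    """Accuracy of a minus accuracy of b; positive means a did better."""
    return a.accuracy - b.accuracy
```

Both models need to be scored against the same references so that the gap reflects the training data rather than the test set.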

Technologies

DGX Spark, H100, PyTorch, nanochat
