
Case Study

OncoVLM: Domain-Specific Foundation Models

Proving that focused training beats raw scale

Training oncology-specific multimodal models that outperform larger general-purpose models. Multi-teacher knowledge distillation at three scales: 4B, 1.7B, and 500M parameters.

  • 92.4% PubMedQA
  • 3 Model Sizes
  • $0 Training Cost
  • 3 Teacher Models

The Scale Assumption

The AI field long assumed that bigger models are always better. But in specialized domains like oncology, general-purpose 70B models often miss domain-specific nuances that smaller, focused models can capture.

  • General models lacking oncology-specific knowledge
  • Expensive inference costs for large models
  • No multimodal understanding of pathology/radiology
  • Hallucinations in clinical contexts

Multi-Teacher Distillation

Instead of training one massive model, we distilled knowledge from multiple specialized teachers into smaller, focused students optimized for oncology tasks.

  • Three model scales: 4B, 1.7B, 500M parameters
  • Multi-teacher distillation from MedGemma, GPT-OSS-20B, Qwen-3-30B (loss sketched after this list)
  • Multimodal: pathology images, radiology, clinical text
  • LoRA fine-tuning for parameter efficiency
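
Concretely, each training example pairs a hard label with tempered soft targets from the teachers. Below is a minimal sketch of how such a multi-teacher distillation loss can be written in PyTorch; the uniform teacher weighting, temperature, and mixing coefficient are illustrative assumptions, not the project's actual training code.

```python
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=2.0, alpha=0.5, teacher_weights=None):
    """Blend hard-label cross-entropy with soft targets from several teachers.

    student_logits:      (N, vocab) student logits (tokens flattened across the batch)
    teacher_logits_list: list of (N, vocab) logit tensors, one per teacher
    labels:              (N,) reference token ids
    alpha:               weight on the distillation term vs. the hard-label term
    """
    # Hard-label term: standard cross-entropy against the reference answer.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label term: KL divergence to each teacher's tempered distribution.
    if teacher_weights is None:
        teacher_weights = [1.0 / len(teacher_logits_list)] * len(teacher_logits_list)

    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = 0.0
    for w, t_logits in zip(teacher_weights, teacher_logits_list):
        p_teacher = F.softmax(t_logits / temperature, dim=-1)
        kd = kd + w * F.kl_div(log_p_student, p_teacher, reduction="batchmean")

    # Scale the KD term by T^2 so its gradients keep a comparable magnitude.
    return (1 - alpha) * ce + alpha * (temperature ** 2) * kd
```

Note that these three teachers use different tokenizers, so token-level logits are not directly comparable across them; a common alternative is sequence-level distillation, where the student is trained on filtered teacher-generated responses instead of raw logits.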

Architecture

The training pipeline runs on a DGX Spark, using its 128GB of unified memory for full-batch training with automated experiment tracking.

  • Teachers: MedGemma, GPT-OSS-20B, Qwen-3-30B
  • Distillation: Knowledge Extraction → Response Alignment → Quality Filtering
  • Student Models: 4B Flagship, 1.7B Balanced, 500M Edge
  • Infrastructure: DGX Spark, NGC Containers, Autonomous Researcher
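
The student models are adapted with LoRA rather than full fine-tuning, which helps keep three separate model scales trainable on a single workstation. Below is a minimal sketch of how that setup could look with Hugging Face PEFT; the base checkpoint, adapter rank, and target modules are illustrative assumptions, not the project's actual configuration (the tech stack lists Gemma and PaliGemma as the underlying model families).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative small base checkpoint; the real students are Gemma/PaliGemma-derived.
base_id = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA adapters on the attention projections keep the trainable parameter
# count small enough to fit distillation runs on a single DGX Spark.
lora_cfg = LoraConfig(
    r=16,                     # adapter rank (assumed)
    lora_alpha=32,            # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically a fraction of a percent of the base weights
```

Training only the adapter weights is what makes iterating on three student scales practical on personal GPU infrastructure.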

Timeline

  • Mar 2025: Nanochat experiments validate focused training
  • Apr 2025: Initial OncoVLM architecture design
  • May 2025: 4B model training complete
  • Jun 2025: 1.7B and 500M variants trained
  • Jul 2025: 92.4% PubMedQA achieved

Key Lessons

1. 10K focused examples can outperform 500K general examples
2. Multi-teacher distillation captures complementary strengths
3. Personal GPU infrastructure enables research-grade experiments
4. Smaller models can beat larger ones on specialized tasks

Tech Stack

PyTorch · Gemma · PaliGemma · LoRA · DGX Spark