Research & Production
LLM Fine-tuning
I fine-tune LLMs for domain-specific tasks when off-the-shelf models don't cut it. Here's my hands-on experience with different techniques.
Techniques I've Used
SFT (Supervised Fine-Tuning)
Teaching models specific response formats and domain knowledge. My most common approach.
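A minimal SFT sketch using TRL's SFTTrainer, since TRL is in my stack below. The dataset, model id, and hyperparameters here are illustrative placeholders, not my production config, and the exact arguments vary across TRL versions (SFTConfig appeared around TRL 0.9).

```python
# Minimal supervised fine-tuning sketch with TRL.
# Dataset and hyperparameters are placeholders for illustration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any chat-formatted dataset works; this is a public example set from the TRL docs.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",  # model id as a string; TRL loads it for you
    args=SFTConfig(
        output_dir="llama31-8b-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=dataset,
)
trainer.train()
```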
QLoRA
When I need to fine-tune on my RTX 5090. 4-bit quantization + LoRA makes 70B models trainable.
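A rough QLoRA setup with bitsandbytes and PEFT, the pieces Unsloth and Axolotl wrap under the hood. The checkpoint name, LoRA rank, and target modules are example values, not a recommendation.

```python
# QLoRA sketch: load the base model in 4-bit (NF4), then train only LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",  # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The base weights stay frozen in 4-bit; only the small adapter matrices get gradients, which is what keeps memory low enough for single-GPU runs.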
DPO
Aligning model outputs to human preferences without training a separate reward model. Cleaner than full RLHF.
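A DPO sketch with TRL's DPOTrainer. The preference dataset (prompt/chosen/rejected columns), model id, and beta are illustrative; older TRL versions pass the tokenizer via `tokenizer=` rather than `processing_class=`.

```python
# DPO sketch: train directly on preference pairs, no reward model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Public example preference dataset with "prompt" / "chosen" / "rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,          # a frozen reference copy is created internally
    args=DPOConfig(
        output_dir="llama31-8b-dpo",
        beta=0.1,          # strength of the implicit KL penalty toward the reference model
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=5e-7,
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```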
Full Fine-tuning
For fundamental behavior changes. Expensive but sometimes necessary.
Models I've Fine-tuned
Llama 3.1 8B, Llama 3.1 70B, Qwen2.5, Gemma 2, Mistral, Phi-3
My Training Setup
Hardware
- → RTX 5090 32GB (local)
- → RunPod A100s (larger runs)
Tools
- → Unsloth (fast QLoRA; see the sketch after this list)
- → TRL (DPO, SFT trainers)
- → Axolotl (complex configs)
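For context on the Unsloth entry above, here is a minimal sketch of its FastLanguageModel API for loading a 4-bit model and attaching LoRA adapters. The checkpoint name and LoRA settings are example values; the resulting model drops straight into TRL's SFTTrainer.

```python
# Unsloth sketch: 4-bit load plus LoRA adapters, ready for an SFT run.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",  # illustrative pre-quantized checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # memory-efficient checkpointing variant
)
# Pass `model` and `tokenizer` to trl.SFTTrainer as in the SFT sketch above.
```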
Technology Stack
Unsloth, TRL, PEFT, bitsandbytes, Axolotl, PyTorch, WandB, HuggingFace Hub
Expertise by Sumit Chatterjee
Industrial Light & Magic, Sydney