Research & Production

LLM Fine-tuning

I fine-tune LLMs for domain-specific tasks when off-the-shelf models don't cut it. Here's my hands-on experience with different techniques.

Techniques I've Used

SFT (Supervised Fine-Tuning)

Teaching models specific response formats and domain knowledge. My most common approach.
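
A minimal sketch of what one of my SFT runs looks like with TRL's SFTTrainer, just to show the shape of the workflow; the dataset file, model id, and hyperparameters are illustrative placeholders rather than my exact production settings.

```python
# Minimal SFT run with TRL's SFTTrainer. Assumes each JSONL record has a
# "text" field holding the fully formatted prompt + response (SFTTrainer's
# default text column). All values below are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_data = load_dataset("json", data_files="domain_sft.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    train_dataset=train_data,
    args=SFTConfig(
        output_dir="llama31-8b-domain-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=2e-5,
        bf16=True,
    ),
)
trainer.train()
```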

QLoRA

What I reach for when I need to fine-tune locally on my RTX 5090: 4-bit quantized base weights plus LoRA adapters make even 70B models trainable on a single GPU.
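
In practice I usually drive QLoRA through Unsloth for speed, but the recipe underneath looks roughly like this with plain Transformers, bitsandbytes, and PEFT; the model id, LoRA rank, and target modules are illustrative.

```python
# QLoRA sketch: freeze 4-bit quantized base weights (bitsandbytes NF4) and
# train small LoRA adapters on top (PEFT). Values here are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```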

DPO

Aligning model outputs to human preferences without the complexity of training a separate reward model. Cleaner than a full RLHF pipeline.
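
A rough DPO sketch with TRL's DPOTrainer to make the workflow concrete; the preference file, beta, and learning rate are placeholders, and argument names shift slightly between TRL versions.

```python
# DPO sketch with TRL. The preference dataset needs "prompt", "chosen",
# and "rejected" columns; file name and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

preference_data = load_dataset("json", data_files="preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,                  # ref_model defaults to a frozen copy of the policy
    args=DPOConfig(
        output_dir="llama31-8b-dpo",
        beta=0.1,                 # strength of the penalty that keeps outputs near the reference
        learning_rate=5e-7,
    ),
    train_dataset=preference_data,
    processing_class=tokenizer,   # "tokenizer=" in older TRL releases
)
trainer.train()
```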

Full Fine-tuning

For fundamental behavior changes. Expensive but sometimes necessary.

Models I've Fine-tuned

Llama 3.1 8B · Llama 3.1 70B · Qwen2.5 · Gemma 2 · Mistral · Phi-3

My Training Setup

Hardware

  • RTX 5090 32GB (local)
  • RunPod A100s (larger runs)

Tools

  • Unsloth (fast QLoRA)
  • TRL (DPO, SFT trainers)
  • Axolotl (complex configs)

Technology Stack

Unsloth · TRL · PEFT · bitsandbytes · Axolotl · PyTorch · WandB · HuggingFace Hub

Expertise by Sumit Chatterjee

Industrial Light & Magic, Sydney
