Unsloth Speeds Up Embedding Fine-Tuning 3x
Unsloth now supports embedding model fine-tuning, and it's surprisingly fast: training runs 1.8-3.3x faster than standard setups while using 20% less VRAM.
Most models need only 3GB of VRAM for 4-bit QLoRA training, which makes fine-tuning embeddings for RAG systems practical on budget GPUs. The setup is straightforward:
from unsloth import FastSentenceTransformer

model = FastSentenceTransformer.from_pretrained(
    model_name = "unsloth/embeddinggemma-300m",
    max_seq_length = 1024,
    full_finetuning = False,  # use LoRA/QLoRA rather than full fine-tuning
)
It works with ModernBERT, Qwen Embedding, BGE, and most other embedding models. After training, models can be exported to transformers, LangChain, Ollama, or llama.cpp.
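To show where a fine-tuned embedding model slots into a RAG pipeline, here is a dependency-free sketch of cosine-similarity retrieval. The vectors below are made up for illustration; in a real system they would come from the model loaded above.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy document embeddings; a real RAG system would get these from the model
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "api reference":  [0.0, 0.2, 0.9],
}

# Toy embedding of a query like "how do I get my money back?"
query = [0.85, 0.15, 0.05]

# Rank documents by similarity to the query embedding
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # -> refund policy
```

Fine-tuning the embedding model on domain data shifts these vectors so that queries land closer to the right documents, which is the whole point of training custom embeddings for retrieval.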
There's a free Colab notebook to try it on a T4 GPU. Update with pip install --upgrade unsloth unsloth_zoo to get the latest version.