DiffSynth-Studio Expands LoRA Training Support

DiffSynth-Studio is an open-source diffusion model engine developed and maintained by the ModelScope Community. The project describes itself as focused on aggressive technical exploration aimed at academia, and it provides LoRA training support across a wide range of image and video diffusion models. The repository is available at https://github.com/modelscope/DiffSynth-Studio and is published under the Apache-2.0 license.

LoRA, short for low-rank adaptation, is a method for fine-tuning large models by training a small set of additional weights rather than updating the entire model. This makes it possible to teach a base model new styles, characters, or concepts without the cost of full retraining.
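To make the idea concrete, here is a minimal sketch of low-rank adaptation in plain Python. It is an illustration of the general technique, not DiffSynth-Studio code: the function names (matvec, lora_forward, lora_param_counts), the scaling factor alpha, and the toy dimensions are all assumptions chosen for the example. The frozen weight matrix W is left untouched, and only the two small matrices A and B would be trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x).

    W is the frozen base weight (d_out x d_in).
    A (rank x d_in) and B (d_out x rank) are the small trainable matrices.
    """
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * l for b, l in zip(base, low_rank)]


def lora_param_counts(d_out, d_in, rank):
    """Return (parameters in the full matrix, parameters in the LoRA pair)."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


random.seed(0)
d_out, d_in, rank, alpha = 4, 6, 2, 2.0

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
# B starts at zero, so the adapted layer initially matches the base layer.
B = [[0.0] * rank for _ in range(d_out)]
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W, A, B, x, alpha, rank)
full, lora = lora_param_counts(4096, 4096, 16)
print(full, lora)  # 16777216 131072
```

For a single 4096 by 4096 layer at rank 16, the LoRA pair holds 131,072 values against 16,777,216 in the full matrix, which is where the savings over full retraining come from.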

Models and Training Methods

According to the project README, LoRA training is available across many supported models. Stable Diffusion v1.5 and Stable Diffusion XL are included, with the README noting that SDXL support was reinstated for academic research. Other supported models span the Z-Image family, Qwen-Image, Wan 2.2, several FLUX variants, ACE-Step-1.5, LTX-2, HiDream-O1-Image, and others. The documentation states that its training features are compatible with all models.

The framework includes several training techniques aimed at reducing hardware requirements. CPU offload training moves model weights layer by layer between CPU and GPU, which the README says significantly reduces GPU VRAM usage during training. This option is enabled by adding the --enable_model_cpu_offload flag and currently runs on a single GPU. FP8 precision can be applied to any part of a model that is not itself being trained, meaning weights with gradients turned off or weights whose gradients only flow through to LoRA parameters.
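The memory effect of these two techniques can be sketched with some simple bookkeeping. The code below is a toy model, not the project's implementation: peak_weight_memory and frozen_weight_bytes are hypothetical helpers, the layer sizes are made up, and the model counts weights only, ignoring activations, gradients, and optimizer state.

```python
def peak_weight_memory(layer_sizes_mb, offload):
    """Return the peak GPU memory (MB) occupied by weights during one pass.

    Without offload, every layer stays resident on the GPU.
    With layer-by-layer offload, a layer is copied in, used, and moved back
    before the next one is loaded, so only one layer is resident at a time.
    """
    if not offload:
        return sum(layer_sizes_mb)
    peak = 0
    resident = 0
    for size in layer_sizes_mb:
        resident += size            # copy this layer's weights CPU -> GPU
        peak = max(peak, resident)  # the layer's computation would run here
        resident -= size            # move the weights back GPU -> CPU
    return peak


def frozen_weight_bytes(num_params, bytes_per_param):
    """Bytes needed to store frozen weights at a given precision."""
    return num_params * bytes_per_param


layers = [400, 400, 800, 400]  # hypothetical per-layer weight sizes in MB
print(peak_weight_memory(layers, offload=False))  # 2000
print(peak_weight_memory(layers, offload=True))   # 800

# One billion frozen parameters stored in 16-bit versus 8-bit precision.
print(frozen_weight_bytes(1_000_000_000, 2))  # 2000000000
print(frozen_weight_bytes(1_000_000_000, 1))  # 1000000000
```

The trade-off the sketch leaves out is transfer time: each offloaded layer has to cross the CPU-GPU link on every pass, so lower peak memory is paid for with slower steps.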

The project also offers differential LoRA training, a technique it used in ArtAug that is now available for LoRA training of any model. A split training mode automatically separates the process into a data-processing stage and a training stage, which the README says speeds up training and lowers VRAM requirements.
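One way to picture the two-stage split is as a cache of frozen-model outputs that the training loop then reuses. The sketch below assumes that the data-processing stage runs components that are never updated (such as an encoder) and stores their results; that assumption, the FrozenEncoder class, and the toy one-parameter "training" loop are all illustrative and not taken from the project's code.

```python
class FrozenEncoder:
    """Stand-in for an expensive component that is never trained."""

    def __init__(self):
        self.calls = 0

    def encode(self, text):
        self.calls += 1
        # Toy feature vector: the length of each word in the prompt.
        return [float(len(word)) for word in text.split()]


def data_processing_stage(encoder, samples):
    """Stage 1: run the frozen encoder once per sample and cache the result."""
    return [encoder.encode(sample) for sample in samples]


def training_stage(cached_features, epochs, lr=0.01):
    """Stage 2: train from the cache only; the encoder is no longer needed."""
    weight = 0.0
    for _ in range(epochs):
        for feats in cached_features:
            target = sum(feats)
            pred = weight * len(feats)
            grad = 2 * (pred - target) * len(feats)  # d/dw of squared error
            weight -= lr * grad
    return weight


samples = ["a watercolor fox", "an ink mountain landscape"]
encoder = FrozenEncoder()

cache = data_processing_stage(encoder, samples)
weight = training_stage(cache, epochs=5)

# The encoder ran once per sample, not once per sample per epoch.
print(encoder.calls)  # 2
```

Under this assumption, the encoder can be unloaded before the second stage begins, which would account for both the lower VRAM footprint and the faster training steps.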

Image-to-LoRA

Beyond conventional training, DiffSynth-Studio explores generating LoRAs directly from images. The Image-to-LoRA approach takes an image as input and outputs a LoRA. The project describes a goal of compressing the hours-long training process for image style LoRAs into a single model inference step. Released Image-to-LoRA models include versions for Z-Image, FLUX.2-klein-base-4B, and HiDream-O1-Image.

Where It Fits

For researchers and developers working with open-source diffusion models, DiffSynth-Studio centralizes LoRA training across a broad set of architectures within one framework. The emphasis on memory-saving methods such as CPU offload and FP8 training points toward making custom model fine-tuning workable on more limited hardware. Because the project targets technical exploration, the supported models and features continue to change, so the repository README and documentation remain the most reliable reference for current capabilities.
