AI Model Outputs Images as Editable Photoshop Layers

Stanford University researchers Lvmin Zhang and Maneesh Agrawala have released LayerDiffuse, an open-source diffusion model that generates images with transparent backgrounds and separate layers, producing output that can be imported directly into Photoshop or other editing software. The model eliminates the manual layer separation and background removal that typically follow AI image generation.

Performance

LayerDiffuse operates as an extension to Stable Diffusion 1.5, adding layer-aware generation without requiring a complete model retrain. The system generates images at the standard SD1.5 resolution (512x512 base) with transparent backgrounds in approximately 20-30 seconds on mid-range consumer GPUs.

The model produces RGBA outputs (RGB plus an alpha channel) in which foreground subjects retain clean edge detail and transparency information. Testing shows the alpha channel quality rivals manual masking in most scenarios, particularly for subjects with complex edges like hair, fur, or translucent materials. Generation speed remains comparable to standard Stable Diffusion workflows because the layer decomposition happens within the same inference pass.
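To make the role of the alpha channel concrete, the sketch below sorts the pixels of an image by opacity. It is only an illustration and not part of LayerDiffuse: the function name `alpha_coverage` and the hand-written pixel strip are hypothetical, and pixels are assumed to be 8-bit `(r, g, b, a)` tuples. Soft edges such as hair or fur show up as a share of partially transparent pixels, whereas a hard binary mask has none.

```python
def alpha_coverage(pixels):
    """Return the share of transparent, partial, and opaque pixels.

    `pixels` is an iterable of 8-bit (r, g, b, a) tuples (an assumed format).
    """
    counts = {"transparent": 0, "partial": 0, "opaque": 0}
    total = 0
    for _r, _g, _b, a in pixels:
        total += 1
        if a == 0:
            counts["transparent"] += 1
        elif a == 255:
            counts["opaque"] += 1
        else:
            counts["partial"] += 1
    if total == 0:
        raise ValueError("no pixels given")
    return {name: n / total for name, n in counts.items()}


# Hypothetical 4-pixel strip crossing a soft edge: background -> subject.
strip = [(0, 0, 0, 0), (90, 60, 30, 64), (120, 80, 40, 191), (130, 85, 45, 255)]
coverage = alpha_coverage(strip)
print(coverage)
```

For a real image, the same function could be fed the pixel data of any RGBA file loaded with an imaging library of choice.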

For multi-layer generation, LayerDiffuse can separate a scene into 2-4 distinct layers based on depth or semantic content. A prompt like “cat sitting on a wooden table in a garden” produces separate layers for the cat, table, and background environment. Each layer maintains proper transparency and can be rearranged or edited independently.
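Rearranging layers works because each one carries its own alpha. The sketch below shows the standard Porter-Duff "over" operator applied to a single pixel position across three layers. It is an assumption-laden illustration rather than LayerDiffuse code: the names `over` and `flatten` and the color values are made up, and pixels are straight-alpha (non-premultiplied) RGBA floats in the range 0 to 1.

```python
def over(top, bottom):
    """Porter-Duff 'over' for two straight-alpha RGBA pixels (floats in 0..1)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels fully transparent: color is undefined, return clear.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t, b):
        # Weight each color by its effective coverage, then un-premultiply.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers):
    """Composite pixels listed back to front into one RGBA pixel."""
    result = (0.0, 0.0, 0.0, 0.0)
    for pixel in layers:
        result = over(pixel, result)
    return result


# Hypothetical values for one pixel position in each layer.
background = (0.0, 0.5, 0.0, 1.0)  # opaque green garden
table = (0.5, 0.25, 0.0, 1.0)      # opaque brown table
cat_edge = (1.0, 1.0, 1.0, 0.5)    # half-transparent white fur edge

cat_in_front = flatten([background, table, cat_edge])
cat_behind = flatten([background, cat_edge, table])
print(cat_in_front)
print(cat_behind)
```

With the cat on top, the fur edge blends with the table color underneath it; moving the cat layer behind the table hides it entirely at this pixel, which is the kind of reordering an editor performs when layers are dragged in the layer panel.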

The model’s layer separation accuracy depends heavily on prompt clarity. Well-defined subjects with spatial relationships (“foreground,” “background,” “behind”) yield cleaner layer divisions than ambiguous compositions.

Architecture

LayerDiffuse modifies the Stable Diffusion UNet architecture by introducing a transparency-aware latent space. Traditional diffusion models encode images into latent representations that assume opaque backgrounds. LayerDiffuse extends this latent space to include alpha channel information, allowing the model to learn transparency during the denoising process.

The architecture adds specialized attention layers that predict both RGB color values and alpha transparency simultaneously. These layers learn to distinguish foreground subjects from backgrounds by analyzing semantic boundaries within the latent representation. The model trains on datasets of pre-separated layer compositions, learning the relationship between prompt descriptions and layer hierarchies.

A key innovation involves the latent encoding strategy. Instead of encoding transparency as an extra latent channel (which would require retraining the entire VAE), LayerDiffuse uses a “latent transparency” technique that represents alpha information within the existing four-channel SD1.5 latent space. This approach maintains compatibility with existing SD1.5 checkpoints and LoRA models.

The decoder network reconstructs the final image with separate alpha channels for each identified layer. Users can specify layer count through prompt modifiers or generation parameters, with the model automatically determining optimal layer boundaries based on scene composition.

Code and weights are available at https://github.com/layerdiffusion/LayerDiffuse with integration support for ComfyUI and Automatic1111 WebUI.

Hardware Requirements

LayerDiffuse runs on the same hardware as Stable Diffusion 1.5, requiring a minimum of 6GB VRAM for basic single-layer generation. Multi-layer outputs (3-4 layers) increase memory usage to approximately 8-10GB VRAM due to additional alpha channel processing.

CPU-only inference remains possible but extends generation time to 3-5 minutes per image. Generation time scales with GPU performance: an RTX 3060 completes a generation in 25-30 seconds, while an RTX 4090 reduces this to 8-12 seconds.
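For batch planning, these per-image timings translate into hourly throughput with simple arithmetic. The sketch below only converts the figures quoted above; the helper name `images_per_hour` is hypothetical, and the calculation assumes back-to-back generation with no model loading or file-saving overhead.

```python
def images_per_hour(seconds_per_image):
    """Convert a per-image generation time into images per hour."""
    return 3600 / seconds_per_image


# (fastest, slowest) seconds per image, taken from the figures in the text.
timings = {
    "RTX 4090": (8, 12),
    "RTX 3060": (25, 30),
    "CPU only": (180, 300),
}

for device, (fast, slow) in timings.items():
    low = images_per_hour(slow)
    high = images_per_hour(fast)
    print(f"{device}: {low:.0f}-{high:.0f} images/hour")
# RTX 4090: 300-450 images/hour
# RTX 3060: 120-144 images/hour
# CPU only: 12-20 images/hour
```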

RAM requirements sit at 16GB minimum for stable operation, with 32GB recommended for batch processing or multi-layer workflows. The model’s file size is approximately 2GB, similar to standard SD1.5 checkpoints.

Alternatives

Photoshop’s generative fill and Adobe Firefly include built-in background removal but don’t generate content with native layer separation. Users must generate opaque images and then apply separate masking operations.

Clipdrop and Remove.bg offer post-generation background removal with API access, but these tools work on existing images rather than generating layer-aware content from the start. Processing time adds 5-10 seconds per image.

Midjourney and DALL-E 3 generate opaque images only. Third-party tools like Photopea or Photoshop’s “Remove Background” feature can extract subjects, but quality varies with edge complexity.

Stable Doodle and ControlNet approaches allow some layer-based composition through multiple generation passes, but require manual masking and compositing. LayerDiffuse consolidates this into a single generation step.

For developers needing programmatic layer generation, the LayerDiffuse API provides more control than post-processing pipelines, particularly for applications requiring consistent layer structure across multiple generated images.
