✨ Diffusers now officially supports PAG ✨
Qualitative comparisons between unguided (baseline) and perturbed-attention-guided (PAG)
diffusion samples.
Without any external conditions (e.g., class labels or text prompts) or additional training,
PAG dramatically elevates the quality of diffusion samples even in unconditional generation,
where classifier-free guidance (CFG) is inapplicable. Our guidance also enhances baseline
performance in various downstream tasks, such as ControlNet with an empty prompt
and image restoration tasks including inpainting and deblurring.
Abstract
Recent studies demonstrate that diffusion models can generate high-quality samples, but their
quality often relies heavily on sampling guidance techniques such as classifier guidance (CG) and
classifier-free guidance (CFG), which are inapplicable in unconditional generation or in various
downstream tasks such as image restoration. In this paper, we propose a novel diffusion sampling
guidance, called Perturbed-Attention Guidance (PAG), which improves sample quality in
both unconditional and conditional settings without requiring further training or
the integration of external modules. PAG progressively enhances the structure of
synthesized samples throughout the denoising process by exploiting the self-attention
mechanism's ability to capture structural information: it generates intermediate samples with degraded
structure by substituting selected self-attention maps in the diffusion U-Net with an identity matrix,
and guides the denoising process away from these degraded samples.
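The core perturbation is simple to state in code. The following is a minimal single-head NumPy sketch (names and shapes are illustrative, not the paper's implementation): replacing the softmax attention map \(\mathbf{A}_t\) with the identity matrix makes every token attend only to itself, so the self-attention output collapses to the value projection \(\mathbf{V}\).

```python
import numpy as np

def self_attention(q, k, v, perturb=False):
    """Single-head self-attention over n tokens of dimension d.

    When `perturb` is True, the attention map is replaced with the
    identity matrix, as in PAG: each token attends only to itself,
    so the output reduces to the value projection V.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n, n) similarity logits
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)           # softmax attention map A_t
    if perturb:
        attn = np.eye(q.shape[0])                      # PAG perturbation: A_t -> I
    return attn @ v

rng = np.random.default_rng(0)
n, d = 4, 8
q, k, v = rng.normal(size=(3, n, d))
out = self_attention(q, k, v)                  # normal self-attention
out_perturbed = self_attention(q, k, v, perturb=True)
assert np.allclose(out_perturbed, v)           # identity attention passes V through
```

In the actual method this substitution is applied only to selected self-attention layers of the diffusion U-Net, producing the structurally degraded prediction that the guidance steers away from.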
Guided Diffusion Results
Unconditional Generation
Conditional Generation
Stable Diffusion Results
Unconditional Generation
Conditional Generation (CFG vs CFG + Ours)
Overall Framework
Conceptual comparison between CFG and PAG. CFG employs a
jointly trained unconditional model as the undesirable path, whereas PAG utilizes
perturbed self-attention for the same purpose. \(\mathbf{A}_t\) denotes the
self-attention map; in PAG, we perturb it by replacing it with the
identity matrix \(\mathbf{I}\).
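Concretely, following the paper's CFG-style formulation (notation as in the figure; \(s\) denotes the guidance scale), the guided noise prediction combines the original estimate \(\epsilon_\theta(x_t)\) with the perturbed-attention estimate \(\hat{\epsilon}_\theta(x_t)\):

\[
\tilde{\epsilon}_\theta(x_t) = \epsilon_\theta(x_t) + s \left( \epsilon_\theta(x_t) - \hat{\epsilon}_\theta(x_t) \right),
\]

which pushes each denoising step away from the structurally degraded prediction, mirroring how CFG pushes away from the unconditional one.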
🧨Diffusers Pipelines and Community Implementation for GUI Interfaces
Thanks to the exceptional efforts of @v0xie, @pamparamm and @multimodalart, you can now easily incorporate PAG into your custom pipelines or workflows.
🧨Diffusers official implementations: https://huggingface.co/docs/diffusers/main/en/using-diffusers/pag
🧨Diffusers pipeline for SD: hyoungwoncho/sd_perturbed_attention_guidance
🧨Diffusers pipeline for SDXL: multimodalart/sdxl_perturbed_attention_guidance
🧨Diffusers pipeline collections (ControlNet, image-to-image, inpainting, etc):
https://huggingface.co/collections/multimodalart/perturbed-attention-guidance-pipelines-663dd1e27bfb4527c197bd41
SD WebUI (Automatic1111) extension: v0xie/sd-webui-incantations
ComfyUI node / SD WebUI Forge extension: pamparamm/sd-perturbed-attention
Try a demo for SDXL here!
Citation
❤️ If you find our work useful in your research, please cite our work! :)

@article{ahn2024selfrectifying,
  title={Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance},
  author={Donghoon Ahn and Hyoungwon Cho and Jaewon Min and Wooseok Jang and Jungwoo Kim and SeonHwa Kim and Hyun Hee Park and Kyong Hwan Jin and Seungryong Kim},
  journal={arXiv preprint arXiv:2403.17377},
  year={2024}
}
Acknowledgements
The website template was borrowed from Michaël Gharbi.