Qualitative comparison of images generated without guidance (top) and with our depth-aware guidance (DAG), together with their estimated depths and surface normals. Scene layouts are highlighted on the generated images. DAG generates more geometrically plausible images than the baseline.
Diffusion models have recently shown significant advances in generative modeling, with impressive fidelity and diversity. Their success can often be attributed to sampling guidance techniques, such as classifier and classifier-free guidance, which provide effective mechanisms for trading off fidelity against diversity. However, these methods cannot guide a generated image toward awareness of its geometric configuration, e.g., depth, which hinders their application to downstream tasks, such as scene understanding, that require a certain level of depth awareness. To overcome this limitation, we propose a novel sampling guidance method for diffusion models that uses self-predicted depth information derived from the rich intermediate representations of diffusion models. Concretely, we first present a label-efficient depth estimation framework built on the internal representations of diffusion models. We then incorporate two guidance techniques during the sampling phase: pseudo-labeling and a depth-domain diffusion prior, both of which self-condition the generated image on its estimated depth map. Experiments and comprehensive ablation studies demonstrate the effectiveness of our method in guiding diffusion models toward generating geometrically plausible images.
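To make the sampling-guidance idea concrete, below is a minimal, self-contained sketch of gradient-based guidance in the spirit of classifier guidance: the noise prediction at a denoising step is adjusted by the gradient of a depth-derived loss evaluated on the estimated clean image. All names here (`TinyUNet`, `DepthHead`, `guided_noise`, the placeholder smoothness loss) are hypothetical stand-ins for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Minimal stand-ins: a toy denoiser and a toy depth head. Not the real models.
class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x, t):
        return self.net(x)  # predicts the noise eps from x_t (t ignored here)

class DepthHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, 3, padding=1)

    def forward(self, x):
        return self.net(x)  # predicts a 1-channel depth map

def guided_noise(unet, depth_head, x_t, alpha_bar, scale=1.0):
    """One guided prediction: eps' = eps + scale * sqrt(1 - a_bar) * grad."""
    x_t = x_t.detach().requires_grad_(True)
    eps = unet(x_t, None)
    # Posterior-mean estimate of the clean image x0 from (x_t, eps).
    x0_hat = (x_t - (1 - alpha_bar) ** 0.5 * eps) / alpha_bar ** 0.5
    depth = depth_head(x0_hat)
    # Placeholder depth objective: encourage a smooth depth map.
    loss = depth.diff(dim=-1).abs().mean() + depth.diff(dim=-2).abs().mean()
    grad = torch.autograd.grad(loss, x_t)[0]
    return (eps + scale * (1 - alpha_bar) ** 0.5 * grad).detach()

x_t = torch.randn(1, 3, 32, 32)
eps = guided_noise(TinyUNet(), DepthHead(), x_t, alpha_bar=0.5)
print(eps.shape)  # torch.Size([1, 3, 32, 32])
```

In a full sampler, `guided_noise` would replace the plain noise prediction at each denoising step, and the placeholder smoothness loss would be swapped for a depth-aware objective such as the ones sketched further below.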
First, we train an asymmetric pixel-wise depth predictor, conditioned on the timestep, on top of a pretrained diffusion model in a label-efficient manner. We then apply two guidance strategies during sampling. First, we extract strong and weak depth maps from the DDPM network with this predictor and apply a depth-consistency guidance between them. Second, feeding the depth map from the strong branch as input, we use a pretrained depth diffusion model to inject its depth-domain prior into the intermediate images, pushing them to be depth-aware. Both strategies are sketched in code below.
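The two strategies can be read as two scalar objectives whose gradients steer sampling, as in the earlier snippet. The sketch below reflects our reading under stated assumptions: pseudo-labeling is modeled as a stop-gradient MSE that pulls the weak branch's depth toward the strong branch's, and the depth prior as a score-distillation-style denoising error from a pretrained depth diffusion model. The function names and the identity stand-in for that model are hypothetical.

```python
import torch
import torch.nn.functional as F

def consistency_loss(depth_strong, depth_weak):
    """Pseudo-labeling: treat the strong branch's depth as a fixed target
    and pull the weak branch's prediction toward it."""
    return F.mse_loss(depth_weak, depth_strong.detach())

def prior_loss(depth_dm, depth, alpha_bar):
    """Depth-domain prior: noise the predicted depth, ask a pretrained depth
    diffusion model to denoise it, and penalize its prediction error
    (a score-distillation-style objective)."""
    noise = torch.randn_like(depth)
    d_t = alpha_bar ** 0.5 * depth + (1 - alpha_bar) ** 0.5 * noise
    return F.mse_loss(depth_dm(d_t), noise)

# Demo with random tensors; an identity function stands in for the real
# pretrained depth diffusion model.
d_strong = torch.randn(1, 1, 32, 32)
d_weak = torch.randn(1, 1, 32, 32)
total = consistency_loss(d_strong, d_weak) + prior_loss(lambda d: d, d_weak, 0.5)
print(total.item())
```

During sampling, the combined loss would be differentiated with respect to the intermediate image, exactly like the placeholder loss in the previous sketch.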
We visualize (top) samples generated without guidance ((a), (c), (e), (g)) and with depth-aware guidance ((b), (d), (f), (h)), together with their corresponding depths (middle) and surface normals (bottom).
We compare results generated by the baseline without (left) and with (right) our guidance, showing the images and their corresponding point-cloud visualizations.
The first row shows unguided samples from DDIM; the second row shows samples guided with our method, DAG.
We show quantitative results on LSUN Bedroom (left) and LSUN Church (right). dFID denotes the FID score computed on estimated depth images. All metrics are computed over 5,000 generated samples. Best results are in bold.
G. Kim, W. Jang, G. Lee, S. Hong, J. Seo, and S. Kim. DAG: Depth-Aware Guidance with Denoising Diffusion Probabilistic Models. arXiv preprint.