Our framework incorporates 3D awareness into the score distillation sampling (SDS) process through 3D consistent noising, which induces consistency among the 2D scores predicted from different viewpoints. As a general plug-and-play module that can be attached to SDS-based text-to-3D generation baselines at little computational cost, it substantially improves the view consistency and fidelity of 3D generation results across various baselines.

3D Consistency Enhancement Results


The incorporation of our GSD framework drastically enhances the geometric 3D consistency of generated scenes.

3D Consistent Integral Noising

To produce a geometry-aware, 3D consistent 2D noise map that preserves the Gaussian properties of the standard normal distribution, we perform 3D conditional upsampling of noised point clouds, followed by a discrete integral of the projected noise values, inspired by 2D integral noising.

Our 3D consistent integral noising perfectly preserves the Gaussian properties that random noise (a) displays, while also demonstrating the interpolative qualities that bilinear interpolation (b) possesses. It also remains computationally efficient: it can be run multiple times within a single SDS iteration, in contrast to 2D integral noising (d).
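The variance-preserving idea behind the discrete integral can be illustrated with a minimal sketch. It assumes that each upsampled point already carries an independent standard normal noise value and has already been projected to a flat pixel index. The function name `aggregate_projected_noise`, its arguments, and the fallback for empty pixels are hypothetical choices made for illustration, and the 3D conditional upsampling step is omitted. Summing the k values that land in a pixel and dividing by sqrt(k) yields a unit-variance Gaussian again, whereas plain averaging (as in interpolation) would shrink the variance to 1/k.

```python
import numpy as np


def aggregate_projected_noise(point_noise, pixel_ids, num_pixels, rng):
    """Combine per-point noise into a per-pixel noise map with unit variance.

    point_noise: (N,) i.i.d. standard normal values, one per projected point.
    pixel_ids:   (N,) non-negative integer flat pixel index each point lands in.
    num_pixels:  total number of pixels in the flattened noise map.
    rng:         NumPy Generator, used only for pixels that receive no point
                 (an assumption of this sketch, not taken from the source).
    """
    # Discrete "integral": sum of all noise values that fall into each pixel.
    sums = np.bincount(pixel_ids, weights=point_noise, minlength=num_pixels)
    counts = np.bincount(pixel_ids, minlength=num_pixels)

    # Pixels without any projected point fall back to fresh Gaussian noise.
    noise_map = rng.standard_normal(num_pixels)

    # A sum of k independent N(0, 1) values has variance k, so dividing by
    # sqrt(k) restores unit variance.
    filled = counts > 0
    noise_map[filled] = sums[filled] / np.sqrt(counts[filled])
    return noise_map, counts
```

Dividing by the count instead of its square root would give an interpolation-like average whose standard deviation falls well below 1, which is why naive interpolation of noise does not stay standard normal.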


Interpreting Initial Geometry Towards 3D View Consistency

In the experiment shown below, even though the conditioning geometry is completely identical because it is constrained by 3DFuse, incorporating our methodology with gradient consistency modeling encourages a more view-consistent and realistic interpretation of the given geometry, yielding a drastically improved 3D scene optimization result with the Janus problem removed.


Enhancement in Convergence Speed

A comparison between naïve SDS and our 3D-aware GSD shows that our method of 3D consistent noising and similarity loss converges more quickly than the baseline, GaussianDreamer. The prompt "a full body of a cat with a hat" is used.


Overall Framework

Our framework consists of three components for geometry-aware score distillation: 3D consistent noising, geometry-based gradient warping, and gradient consistency modeling. Through these components, our framework encourages multiview consistency between predicted 2D scores and enhances the quality of generated 3D scenes.
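How gradient warping and a consistency objective could fit together is sketched below. This is only an illustration under stated assumptions: the source does not give the exact form of either component, so the nearest-neighbor gather in `warp_by_correspondence`, the cosine-similarity form of `gradient_consistency_loss`, and all argument names are hypothetical. The pixel correspondences between the two views are assumed to be precomputed from the scene geometry.

```python
import numpy as np


def warp_by_correspondence(grad_src, corr_rows, corr_cols):
    """Bring source-view gradients into the target view.

    grad_src:  (H, W, C) gradient map predicted for the source view.
    corr_rows, corr_cols: (H, W) integer arrays; for each target pixel,
        the row and column of its corresponding source pixel.
    """
    # Nearest-neighbor gather; a real pipeline might interpolate instead.
    return grad_src[corr_rows, corr_cols]


def gradient_consistency_loss(grad_tgt, grad_src_warped, valid, eps=1e-8):
    """Mean (1 - cosine similarity) over pixels with a valid correspondence.

    valid: (H, W) boolean mask, False where no correspondence exists
           (for example, occluded or out-of-frame pixels).
    """
    dot = (grad_tgt * grad_src_warped).sum(axis=-1)
    norms = (np.linalg.norm(grad_tgt, axis=-1)
             * np.linalg.norm(grad_src_warped, axis=-1))
    cosine = dot / (norms + eps)
    if not valid.any():
        return 0.0
    return float((1.0 - cosine)[valid].mean())
```

Under this formulation the loss is near 0 when the warped gradients of the two views agree in direction and approaches 2 when they point in opposite directions, so minimizing it pushes the scene toward geometry that yields consistent gradients across views.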

Abstract

Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into a 3D representation, has recently brought significant advancements in the text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from the hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency, and therefore geometry awareness, into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution; geometry-based gradient warping, for identifying correspondences between predicted gradients of different viewpoints; and a novel gradient consistency loss, which optimizes the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in the text-to-3D generation task with minimal computation cost while remaining compatible with existing score distillation-based models.

Citation