TL;DR: Modular 3D scene generation from text using 360° panoramas, object-centric decomposition, and hybrid inpainting for immersive navigation and editing.
Our method tackles text-to-3D scene generation by first creating a panoramic image with a fine-tuned diffusion model, which serves as a geometric and stylistic prior. Relevant object instances are segmented, reconstructed at high fidelity, and placed back into the background environment. The background is optimized for immersive viewing with a combination of 2D and 3D inpainting techniques.
We guide the 360° panorama generation process using a perspective image derived from the same prompt, providing soft conditioning without enforcing pixel-level alignment. We achieve this with an IP-Adapter-style mechanism that introduces separate cross-attention layers in all transformer blocks of the diffusion model. We jointly fine-tune the panoramic LoRA and the IP-Adapter on random perspective renders of the equirectangular image, enabling effective style transfer from perspective to panoramic images.
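The decoupled cross-attention idea can be sketched in a few lines. This is a minimal NumPy illustration, not the actual implementation: the class name, the `scale` knob, and the single-head, unbatched shapes are our own simplifications. Text tokens pass through the (frozen) key/value projections of the pretrained model, while the perspective-image tokens get their own trainable key/value projections; the two attention outputs are summed.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: (n_q, d) x (n_k, d) -> (n_q, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

class DecoupledCrossAttention:
    """IP-Adapter-style block (illustrative): separate K/V projections for
    text and image conditions, outputs summed with an image weight `scale`."""
    def __init__(self, d_model, rng, scale=1.0):
        self.scale = scale
        init = lambda: rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.wq = init()                     # shared query projection
        self.wk_txt, self.wv_txt = init(), init()  # frozen text K/V
        self.wk_img, self.wv_img = init(), init()  # new, trainable image K/V

    def __call__(self, x, text_tokens, image_tokens):
        q = x @ self.wq
        out_txt = attention(q, text_tokens @ self.wk_txt, text_tokens @ self.wv_txt)
        out_img = attention(q, image_tokens @ self.wk_img, image_tokens @ self.wv_img)
        return out_txt + self.scale * out_img

rng = np.random.default_rng(0)
block = DecoupledCrossAttention(d_model=64, rng=rng)
x = rng.standard_normal((16, 64))    # latent (spatial) tokens
txt = rng.standard_normal((8, 64))   # text embeddings
img = rng.standard_normal((4, 64))   # perspective-image embeddings
out = block(x, txt, img)
print(out.shape)  # (16, 64)
```

Setting `scale=0` recovers the original text-only cross-attention, which is why this style of conditioning is "soft": it biases the panorama toward the reference view without enforcing pixel-level alignment.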
Our object reconstruction pipeline leverages the panorama and its style information to generate a high-resolution reference image for multi-view generation. The generated multi-view images are then converted into 3D Gaussian splats by a reconstruction pipeline. Finally, we align each generated object with its original counterpart in the panorama and place it in the scene.
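The text above does not spell out the alignment step; a common choice for registering a reconstructed object to its original counterpart is to estimate a similarity transform (scale, rotation, translation) from corresponding 3D points, e.g. Gaussian centers, via Umeyama's closed-form solution. The sketch below is one such plausible implementation, not necessarily the method used here:

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Estimate scale s, rotation R, translation t minimizing
    ||dst_i - (s * R @ src_i + t)||^2 over corresponding points (n, 3)."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)          # cross-covariance (3, 3)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                          # keep R a proper rotation
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Synthetic check: recover a known scale/rotation/translation.
rng = np.random.default_rng(0)
src = rng.standard_normal((100, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0, 0, 1]])
dst = 2.5 * src @ R_true.T + np.array([1.0, -2.0, 0.5])
s, R, t = umeyama_alignment(src, dst)
print(round(s, 3))  # 2.5
```

With noise-free correspondences the transform is recovered exactly; in practice the correspondences would come from matching the reconstructed splat to the segmented object's estimated depth.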
Our hybrid inpainting strategy combines 2D and 3D techniques: large holes left by object removal are inpainted in the 360° image for global coherence, while smaller disocclusions caused by projecting the panorama into 3D are addressed with 3D inpainting. The process proceeds in three steps: initialization and pre-tuning of the 3DGS point cloud, incremental inpainting that populates disoccluded regions with new Gaussians, and multi-view fine-tuning with score distillation to ensure consistency across viewpoints.
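Both the 2D inpainting stage and the perspective renders used throughout rely on resampling the equirectangular panorama into pinhole views. A minimal nearest-neighbour version of that projection is sketched below; the function name and the camera conventions (x right, y down, z forward) are our own assumptions:

```python
import numpy as np

def equirect_to_perspective(pano, yaw, pitch, fov_deg, out_hw):
    """Render a pinhole view (H, W, C) from an equirectangular panorama
    (Hp, Wp, C) via nearest-neighbour lookup along camera rays."""
    H, W = out_hw
    f = 0.5 * W / np.tan(np.radians(fov_deg) / 2)   # focal length in pixels
    # Camera-space ray directions (x right, y down, z forward).
    u, v = np.meshgrid(np.arange(W) - W / 2 + 0.5,
                       np.arange(H) - H / 2 + 0.5)
    dirs = np.stack([u, v, np.full_like(u, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate rays by yaw (around y) and pitch (around x).
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    dirs = dirs @ (Ry @ Rx).T
    # Direction -> longitude/latitude -> panorama pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])      # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))     # [-pi/2, pi/2]
    Hp, Wp = pano.shape[:2]
    px = ((lon / (2 * np.pi) + 0.5) * Wp).astype(int) % Wp
    py = ((lat / np.pi + 0.5) * Hp).clip(0, Hp - 1).astype(int)
    return pano[py, px]

# Toy panorama: left hemisphere red, right hemisphere black.
pano = np.zeros((64, 128, 3))
pano[:, :64] = [1.0, 0.0, 0.0]
view = equirect_to_perspective(pano, yaw=0.0, pitch=0.0, fov_deg=90, out_hw=(32, 32))
print(view.shape)  # (32, 32, 3)
```

A production pipeline would use bilinear sampling and batched rotations, but the geometry is the same: holes visible in such renders after object removal are what the 2D pass fills globally, while the remaining view-dependent disocclusions fall to the 3D pass.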
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting. ECCV 2024.
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation. SIGGRAPH 2025.
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models. ICCV 2023.
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models. arXiv 2024.