Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

High-quality, diverse and Janus-free Text-to-3D in 20 SECONDS.

Abstract

Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization, which suffers from slow inference, low diversity and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses a NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate high-quality, diverse and Janus-free 3D assets within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours.
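The two-stage, feed-forward flow described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that stage one emits the four views tiled into a single 2×2 image; that layout is an assumption of this sketch, not something stated above. The names `split_grid`, `text_to_3d`, `generate_grid` and `reconstruct` are hypothetical, and the diffusion model and the transformer reconstructor are passed in as stand-in callables rather than implemented.

```python
from typing import Callable, List, Sequence

# Toy grayscale image: a list of rows of pixel values (hypothetical stand-in).
Image = List[List[float]]


def split_grid(grid: Image, rows: int = 2, cols: int = 2) -> List[Image]:
    """Split one image laid out as a rows x cols grid into tiles, row-major."""
    height, width = len(grid), len(grid[0])
    if height % rows or width % cols:
        raise ValueError("grid size must be divisible by the tile layout")
    th, tw = height // rows, width // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            # Take the block of rows for this tile, then slice its columns.
            tile = [row[c * tw:(c + 1) * tw] for row in grid[r * th:(r + 1) * th]]
            tiles.append(tile)
    return tiles


def text_to_3d(
    prompt: str,
    generate_grid: Callable[[str], Image],
    reconstruct: Callable[[Sequence[Image]], object],
) -> object:
    # Stage 1 (assumed interface): one diffusion call yields all four views at once.
    grid = generate_grid(prompt)
    views = split_grid(grid)
    # Stage 2 (assumed interface): one feed-forward pass maps the views to 3D.
    return reconstruct(views)


if __name__ == "__main__":
    # Dummy generator: a 4x4 "image" whose pixel value encodes its position.
    fake_generator = lambda prompt: [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(split_grid(fake_generator("a plush dragon toy"))[0])
    print(text_to_3d("a plush dragon toy", fake_generator, lambda views: len(views)))
```

The point of the sketch is structural: there is no per-prompt optimization loop, only one call to each stage, which is what makes a fixed, short runtime possible.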

Diversity

a baby dragon drinking boba

a ghost eating a hamburger

a hippo wearing a sweater

a panda rowing a boat in a pond

a plush dragon toy

a squirrel dressed like Henry VIII King of England

a tiger karate master

a train engine made out of clay

Comparison to Other Methods

Ours (20 s)

Shap-E (6 s)

DreamFusion-IF (1.5 hours)

ProlificDreamer (10 hours)

Prompts, from top to bottom: (a) a robot made out of vegetables; (b) Michelangelo style statue of an astronaut; (c) a bulldozer clearing away a pile of snow; (d) a squirrel dressed like Henry VIII King of England.
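The runtimes listed in this comparison can be turned into speedup factors with a few lines of Python. The dictionary only restates the figures shown above (converted to seconds); nothing else is measured here. For reference, the 1 to 10 hour range quoted in the abstract corresponds to 3600 / 20 = 180 times to 36000 / 20 = 1800 times the 20 s runtime. Shap-E, a feed-forward method, comes out faster, so its factor is below 1.

```python
# Wall-clock times in seconds, restated from the comparison above.
TIMES_S = {
    "Ours": 20,
    "Shap-E": 6,
    "DreamFusion-IF": 1.5 * 3600,
    "ProlificDreamer": 10 * 3600,
}


def speedup_over(baseline: str, ours: str = "Ours") -> float:
    """How many times faster `ours` is than `baseline` (a value < 1 means slower)."""
    return TIMES_S[baseline] / TIMES_S[ours]


if __name__ == "__main__":
    for name in ("Shap-E", "DreamFusion-IF", "ProlificDreamer"):
        print(f"{name}: {speedup_over(name):g}x")
    # Prints: Shap-E: 0.3x, DreamFusion-IF: 270x, ProlificDreamer: 1800x
```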

BibTeX

@article{instant3d2023,
  author = {Jiahao Li and Hao Tan and Kai Zhang and Zexiang Xu and Fujun Luan and Yinghao Xu and Yicong Hong and Kalyan Sunkavalli and Greg Shakhnarovich and Sai Bi},
  title = {Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model},
  journal = {https://arxiv.org/abs/2311.06214},
  year = {2023},
}
