Meta’s new AI model can turn text into 3D images in under a minute

An array of 3D images generated by Meta 3D Gen. (Image: Meta)

Meta’s latest foray into AI image generation is a quick one. The company introduced its new “3D Gen” model on Tuesday, a “state-of-the-art, fast pipeline” that transforms input text into high-fidelity 3D assets, outputting them in under a minute.

What’s more, the system can reportedly apply new textures and skins, driven by text prompts, to both AI-generated and artist-produced 3D models.

Per a recent study from the Meta GenAI research team, 3D Gen not only offers high-resolution textures and material maps but also supports physically based rendering (PBR) and generative re-texturing.

The team estimates an average inference time of just 30 seconds to create the initial 3D model using Meta’s 3D AssetGen model. Users can then go back and either refine the existing model’s texture or replace it with something new, both via text prompts, using Meta 3D TextureGen, a step the company figures should take no more than an additional 20 seconds of inference time.
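To make that two-stage flow concrete, here is a minimal Python sketch of how such a pipeline could be wired together. Meta has not released a public API, so the `asset_gen` and `texture_gen` functions and the `Asset3D` type below are hypothetical stand-ins; only the stage ordering and rough timings come from the study.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the two stages described in the study.
# Names and signatures are illustrative, not Meta's actual API.

@dataclass
class Asset3D:
    mesh: str     # placeholder for mesh geometry
    texture: str  # placeholder for texture / PBR material maps

def asset_gen(prompt: str) -> Asset3D:
    """Stage 1 (3D AssetGen): text -> textured 3D model, ~30 s per the study."""
    return Asset3D(mesh=f"mesh({prompt})", texture=f"texture({prompt})")

def texture_gen(asset: Asset3D, prompt: str) -> Asset3D:
    """Stage 2 (3D TextureGen): re-texture an existing mesh from a text
    prompt, ~20 s per the study. Works on generated or artist-made meshes."""
    return Asset3D(mesh=asset.mesh, texture=f"texture({prompt})")

# Generate an asset, then optionally re-texture it with a second prompt.
asset = asset_gen("a bronze dragon statue")
asset = texture_gen(asset, "the same statue, carved from jade")
print(asset)
```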

“By combining their strengths,” the team wrote in its study abstract, “3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space.” The Meta team set its 3D Gen model against a number of industry baselines, comparing them on factors including text-prompt fidelity, visual quality, texture detail and artifacts. Thanks to the combined functions of the two models, annotators picked images generated by the integrated two-stage process over their single-stage counterparts 68% of the time.

Granted, the system described in the paper is still under development and not yet ready for public use, but the technical advances the study illustrates could prove transformative across a number of creative disciplines, from game and film effects to VR applications.

Giving users the ability not only to create but also to edit 3D-generated content, quickly and intuitively, could drastically lower the barrier to entry for such pursuits. It’s not hard to imagine the effect this could have on game development, for example.