Deforming 3D Gaussian Splat Models With Cages, the Easy Way

GSDeformer. Source: https://arxiv.org/pdf/2405.15491

The extent to which neural representations are difficult to directly edit has become a theme for us lately, because it represents one of the biggest obstacles to the uptake of AI-based neural synthesis systems by VFX production houses.

While systems such as Neural Radiance Fields (NeRF), Generative Adversarial Networks (GANs) and Gaussian Splatting (GSplat) are capable of rendering extraordinarily life-like representations of people, scenes and objects, it is very difficult to change just a few tiny details about the output, compared to how (relatively) easy it is to generate that output initially.

More than 30 years after the advent of production-typical CGI, directors and VFX supervisors have become used to fine-grained control over the smallest aspects of a visual effects shot (or any shot, for that matter), and are unlikely to surrender this OCD-like attention to the whims and peculiarities of the current crop of AI synthesis systems.

Therefore a great deal of research in the synthesis scene in this period has become dedicated to obtaining an AI-based analog of the ecosystem of tools and techniques that CGI has refined over the last three decades.

Cage-Based Deformation

One such technique is cage-based deformation (CBD). With CBD, an ‘area of influence’ is constructed around the polygon mesh, allowing the user to deform the fully or partially enclosed mesh by moving nodes within the area, or ‘cage’.

Click to play. The open source 3D framework Blender, in common with most 3D/CGI-based applications, can deform objects by surrounding them with cages, the deformations of which influence the visible mesh that they encompass. Source: https://www.youtube.com/watch?v=kLIHH7LUrR4

One big advantage of CBD is that it does not destructively affect the underlying mesh; another is that it can be animated; and yet another is that it is not only useful in modeling a figure or object, but can also be used to replicate functions such as wings flapping, lids of boxes opening, mouths moving, and almost any other action one might like to depict.

All this is possible because in CGI, the location of all the points is explicit. By contrast, in nearly all methods of neural rendering, the synthesis process happens in a kind of ‘black box’ – the latent space, in systems such as NeRF and GAN, and an at least initially opaque rasterization process, in the case of Gaussian Splat-based renders.

Therefore you can’t easily perform these very useful operations on a typical neural entity. Instead, the majority of current projects that deal with human synthesis use a form of CGI designed to act as a ‘go-between’, bridging known coordinates and the more mysterious captured coordinates of a neural object – systems such as 3DMM and FLAME.

There’s nothing wrong with this CGI-based instrumentality, except that it inevitably results in a fairly ‘Heath-Robinson’-style manipulation system, formed of multiple external systems, and often dependent on training, additional data, or other heavyweight resources.

GSDeformer

By contrast, a new project from the UK’s Bournemouth University offers a training-free and surprisingly lightweight method of using CBD for Gaussian Splat instances, opening up the way not only for many of the advantages of CGI-based CBD, but also for potentially tweaking Gaussian Humans that use GSplat as their base technology.

Sample deformations from the new GSDeformer project. Source: https://jhuangbu.github.io/gsdeformer/

The new system is titled GSDeformer, and the authors note, in acknowledging its predecessors, that it is the first system to offer this kind of manipulation of 3D Gaussian Splat (3DGS) models without the need for the usual array of auxiliary frameworks. Instead, GSDeformer converts the 3D Gaussians directly into a more traditional point cloud (a collection of X/Y/Z coordinates in 3D space), and uses the geometry of this cloud to develop an apposite ‘halo’ of cage mesh, which can deform the enclosed geometry.

Click to play. As the GSDeformer cage (right) is manipulated by the end-user, the underlying Gaussian Splat-based Lego vehicle apparently performs some actions of its own.

The system heavily leverages the 2017 Bounding Proxies project, which automates the generation of cage meshes – a process that can be burdensome when the base model is complex.

To the casual student of neural synthesis, this accomplishment may seem routine; but the current groundswell of misguided conviction around the capabilities of generative AI tends to make all such developments seem inevitable, when they are usually extraordinarily challenging.

Instead, this outing appears to be a genuine potential milestone in deconstructing the arcane neural systems of today into the more user-friendly and granular CGI workflows of yesterday.

The paper states:

‘Our approach extends cage-based deformation to 3DGS by converting 3DGS to a proxy point cloud representation whose deformation can be transferred to 3DGS, all without requiring any additional data or changes to 3DGS’s architecture. We also propose a complementary cage-building algorithm to automatically create the cages for deforming 3DGS.’

The new paper is titled GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting, and comes from two researchers at Bournemouth University. The work also has an associated project page.

Method

GSDeformer first converts the 3DGS model into a binary occupancy voxel grid. This can be achieved without the need to render views and infer geometry, since the positions, covariances and opacities of the Gaussians are stored explicitly, allowing the density of the scene to be evaluated directly at each voxel and thresholded into occupancy.

Schema for the workflow of GSDeformer. Source: https://arxiv.org/pdf/2405.15491
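
A minimal, illustrative sketch of this conversion (evaluating the summed Gaussian density at each voxel centre and thresholding it into occupancy) might look like the following. The function name and the brute-force loop are a simplification for illustration rather than the paper's implementation; the default resolution and threshold simply echo the values the researchers report for their tests:

```python
import numpy as np

def build_occupancy_grid(means, covs, opacities, resolution=128, threshold=1e-4):
    """Convert 3D Gaussians to a binary occupancy voxel grid (illustrative sketch).

    means:     (N, 3) Gaussian centres
    covs:      (N, 3, 3) covariance matrices
    opacities: (N,) per-Gaussian opacity values
    """
    # Axis-aligned bounds of the scene, padded slightly so no Gaussian is clipped
    lo, hi = means.min(axis=0) - 0.1, means.max(axis=0) + 0.1
    axes = [np.linspace(lo[d], hi[d], resolution) for d in range(3)]
    centres = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1).reshape(-1, 3)

    density = np.zeros(len(centres))
    inv_covs = np.linalg.inv(covs)
    for mu, inv_cov, alpha in zip(means, inv_covs, opacities):
        diff = centres - mu                                    # (V, 3) offsets to voxel centres
        maha = np.einsum('vi,ij,vj->v', diff, inv_cov, diff)   # squared Mahalanobis distance
        density += alpha * np.exp(-0.5 * maha)                 # accumulate Gaussian density

    # Any voxel whose accumulated density clears the threshold is marked occupied
    return (density > threshold).reshape(resolution, resolution, resolution)
```

A production implementation would restrict each Gaussian's contribution to the voxels inside its own bounding region rather than looping over the full grid, but the thresholding logic is the same.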

The points obtained are effectively dots in space, and are used to build a coherent cage mesh via marching cubes (an algorithm that extracts a triangular surface mesh from a scalar field sampled over a grid of cubes).

The Marching Cubes algorithm sweeps through a voxel grid cube by cube, converting the geometry it finds into the triangles of a mesh. Source: https://graphics.stanford.edu/~mdfisher/MarchingCubes.html
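
Surface extraction of this kind is available off the shelf; a minimal sketch using scikit-image's marching cubes (the choice of library, and the use of trimesh as a mesh container, are illustrative assumptions rather than anything prescribed by the paper) might read:

```python
import numpy as np
import trimesh
from skimage import measure

def occupancy_to_mesh(occupancy, voxel_size=1.0):
    """Extract a triangular surface mesh from a binary occupancy grid."""
    # marching_cubes expects a scalar field, so the binary grid is cast to float
    # and the isosurface is taken at the 0.5 level between empty and occupied voxels
    verts, faces, normals, _ = measure.marching_cubes(occupancy.astype(np.float32), level=0.5)
    verts *= voxel_size  # scale voxel indices back to scene units (origin offset omitted)
    return trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
```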

The cage is then decimated (though, despite the term, far more than a tenth of its vertices are removed), since it does not benefit from being as complex as the mesh it encloses.
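
In the actual pipeline, the cage is produced by the Bounding Proxies code discussed below; but the general idea of reducing a cage to a small fraction of its original triangle count could be approximated with a standard quadric decimation call, shown here via Open3D purely for illustration:

```python
import numpy as np
import open3d as o3d

def decimate_cage(verts, faces, target_triangles=2000):
    """Simplify a cage mesh; the cage need not be as detailed as the model it encloses."""
    mesh = o3d.geometry.TriangleMesh(
        o3d.utility.Vector3dVector(verts),
        o3d.utility.Vector3iVector(faces))
    simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=target_triangles)
    return np.asarray(simplified.vertices), np.asarray(simplified.triangles)
```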

The source 3D Gaussians that comprise the representation are then converted into approximating ellipsoids, and from these into ‘proxy points’ (points that will reflect changes made to the cage by the user).

A 3DGS chair with a transliterated and reduced cage mesh, shown parallel to the source object (it would usually surround the object).
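
The paper's precise sampling scheme is not reproduced here, but the general principle of deriving ellipsoid-aligned proxy points from each Gaussian's covariance can be sketched as follows; placing one point at the centre and one at each end of the three principal axes is an assumption made purely for illustration:

```python
import numpy as np

def gaussians_to_proxy_points(means, covs, scale=1.0):
    """Derive proxy points from each Gaussian's approximating ellipsoid.

    Each covariance matrix is eigendecomposed; proxy points sit at the ellipsoid's
    centre and at the ends of its principal axes, so that cage deformations applied
    to these points can later be mapped back onto the Gaussian's position and shape.
    """
    eigvals, eigvecs = np.linalg.eigh(covs)          # (N, 3) and (N, 3, 3); columns are axes
    radii = scale * np.sqrt(eigvals)                 # semi-axis lengths of each ellipsoid
    offsets = (eigvecs * radii[:, None, :]).transpose(0, 2, 1)   # (N, 3, 3); rows are scaled axes
    return np.concatenate([
        means[:, None, :],              # ellipsoid centre
        means[:, None, :] + offsets,    # positive axis endpoints
        means[:, None, :] - offsets,    # negative axis endpoints
    ], axis=1)                          # (N, 7, 3) proxy points per Gaussian
```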

After this, manipulations of the interpreted cage are applied, magnet-style, to the proxy points, and through them to the Gaussians that comprise the 3DGS representation.
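
GSDeformer's exact cage-coordinate formulation is not reproduced here, but the underlying mechanism (expressing each proxy point as a weighted combination of cage vertices, then rebuilding it from the moved cage) can be sketched with a simple inverse-distance weighting standing in for a proper generalised barycentric scheme such as mean value coordinates:

```python
import numpy as np

def cage_weights(points, cage_verts, eps=1e-8):
    """Per-point weights over cage vertices (inverse-distance placeholder for the
    generalised barycentric coordinates a real cage deformer would use)."""
    dists = np.linalg.norm(points[:, None, :] - cage_verts[None, :, :], axis=-1)  # (P, C)
    weights = 1.0 / (dists + eps)
    return weights / weights.sum(axis=1, keepdims=True)   # each row sums to one

def deform_points(points, cage_rest, cage_deformed):
    """Bind points to the rest-pose cage, then reconstruct them from the moved cage."""
    weights = cage_weights(points, cage_rest)   # binding computed once, on the rest pose
    return weights @ cage_deformed              # (P, C) @ (C, 3) -> deformed points
```

In GSDeformer itself, the deformation of these proxy points is then transferred back to the 3DGS representation, so the edit propagates to the splats rather than to a conventional mesh.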

As mentioned, this cage-generation process is facilitated by the code release for Bounding Proxies for Shape Approximation, a 2017 SIGGRAPH entry.

From the original 'Bounding Proxies' 2017 paper, an indication of the reduction process that generates a shape-fitted manipulation cage from the original mesh. Source: https://perso.telecom-paristech.fr/boubek/papers/BoundingProxies/BoundingProxies.pdf

The authors observe that the relative ease of their proposed method is in contrast to recent outings in a similar vein. In February of this year, the paper GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting offered similar deformations, but required additional training; likewise, the November 2023 entry SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering also offered deformations, but, as the authors observe, at some cost of architectural complexity.

The paper states:

‘Unlike GaMeS and SuGaR, our method directly operates on the 3D gaussian distributions and does not change the underlying architecture of the method, making our model directly applicable to existing trained 3DGS representations without retraining.’

Data and Tests

For the GSDeformer tests, the researchers used a voxel grid resolution of 128 and a density threshold of 1e-4.

Initially, the authors tested the deformation capabilities of the system on the Synthetic NeRF and MipNeRF360 datasets, where cages were generated by the adapted prior system, and deformed by the authors manually:

Deformations on standard NeRF objects.

The authors assert:

‘It can be seen that our method achieves high-quality deformation on both synthetic and real-world captures. Our model supports deformation, simple transformation and enlarging of the selected object based on the cages.’

Next, the authors compared their system against prior approaches, including NeRF-based methods, that either use a cage-based system or can be adapted for a fair comparison. These included Deforming Radiance Fields with Cages (DeformingNeRF):

From the prior work 'Deforming Radiance Fields with Cages', a NeRF-based cage system that requires far more work than the new offering, and that relies on an older approach which is less in vogue these days than GSplat. Source: https://arxiv.org/pdf/2207.12298

Other systems tested in this round were GaMeS and SuGaR (elsewhere in the tests, the March 2024 work SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes is also considered, which requires additional video training data in order to deform Gaussians by transferring the movements of control points – somewhat akin to a 3DMM-style inference routine):

Initial comparisons against prior similar frameworks. Please refer to the source paper for better resolution.

The authors concede that results from GSDeformer are ‘comparable’ to the prior methods, but point out that this is not where the value of the system lies, and present a table indicating the drag points in prior methods:

Comparison of GSDeformer’s requirements to those of older methods.

They also highlight the significantly reduced vertex count in GSDeformer’s typical cage:

GSDeformer permits a notable reduction in vertex complexity and volume, compared to prior approaches.

The paper states*:

‘The highlight of our method lies in achieving comparable quality while having other significant [merits].

‘Firstly, our method operates on the more efficient and capable 3D Gaussian Splatting model. Therefore, we achieve higher rendering quality than DeformingNeRF; this can be seen from the zoomed-in views [below].’

Detail from the above-included results image. Please refer to the source paper for better resolution.

The authors continue:

‘Note the robot arm and toad belly in our method and GaMeS are much clearer and sharper than DeformingNeRF. [Our] model also directly deforms the underlying representation; therefore, no deformation is needed during rendering, unlike DeformingNeRF…

‘…[Compared with SC-GS], our method does not require video data for learning deformation, so our method can operate on static scene captures while SC-GS cannot.’

The authors do concede that their method does not currently operate in real-time, and propose that a simpler and faster workflow might be possible in future advances on the scheme; additionally they believe that transformation of colors might be possible in later iterations.

Click to play. Results from GSDeformer on both synthetic and real-world scenes.

Conclusion

Part of the excitement over Gaussian Splatting, at least in the VFX community, is the architecture’s accessibility to older CGI-based methodologies. In this period, we’re seeing a fair influx of papers looking to retrofit CGI instrumentality onto 3DGS, such as the recent GarmentDreamer offering, which can impose clothing onto 3DGS representations:

GarmentDreamer, published in May of 2024, offers users a method of dressing GSplat figures. Source: https://arxiv.org/pdf/2405.12420

We’ve already covered the notable efforts of the full-body splat technique outlined in the December 2023 paper Animatable Gaussian Splats for Efficient and Photoreal Human Rendering, while newer offerings such as Gaussian Head & Shoulders can also perform texture warping on clothing in 3DGS human representations.

Elsewhere, DOF-GS proposes a Splat-based architecture that can control blur and depth of field, while schemes like GSTalker and TalkingGaussian propose relatively lightweight controllable face/body synthesis schemes.

In short, after years of wrestling with CGI interfaces such as 3DMM as intermediaries to opaque trained systems, the research sector seems enthusiastic to adopt GSplat as a more controllable alternative ecosystem, and a more user-friendly analog to the CGI systems whose rendering capabilities have lately been eclipsed by developments in generative image and video frameworks.

* Inline citations omitted as duplicates, since these projects are hyperlinked elsewhere in this article.
