Entanglement in Image Synthesis

Martin Anderson

I'm Martin Anderson, a writer occupied exclusively with machine learning, artificial intelligence, big data, and closely-related topics, with an emphasis on image synthesis, computer vision, and NLP.

In the field of image synthesis, entanglement is the ‘enmeshing’ of data properties with adjacent or intrinsically-related properties. This can make it challenging to isolate a particular aspect of an image: if you change one property, you end up changing other facets of the generation as well:

Here are two image generations from Stable Diffusion; in the second, the words 'cloudy day' have been added to the text-prompt. Even though the random seed from the original prompt (see 'Going to seed', below) has been forced into the second generation, Stable Diffusion is unable to simply make the sky cloudier, but rather has changed numerous other aspects of the image.

In AI-driven human synthesis, architectures such as Generative Adversarial Networks (GANs), Neural Radiance Fields (NeRF), latent diffusion and autoencoders are all affected to some extent by entanglement, and all sub-sectors of research related to these technologies are actively investigating ways to ‘split up’ constituent parts or traits of neural representations.

Here, ‘traditional’ CGI techniques offer a massive advantage, since every single component in CGI imagery is essentially ‘hand-crafted’ and discrete.

From the pre-neural age, the USC Institute for Creative Technologies' 'Emily Project' (2008) sought to create a ground-breaking human representation through CGI techniques. Though the final result now looks crude even compared to early hobbyist deepfakes, the level of control over every single aspect of the representation remains enviable from the point of view of the image synthesis community. Videos sourced from links at: https://vgl.ict.usc.edu/Research/DigitalEmily/

If anything, the problem is reversed with CGI, and more related to the challenge of orchestrating the disparate contributing facets (such as hair and cloth physics, texturing, lighting and body motion) into a cohesive and natural representation.

Conversely, a system such as NeRF gathers every single piece of contributing data in one momentary and unordered ‘blast’, from a limited series of photos.

To accomplish this, the NeRF acquisition pipeline shoots ‘virtual rays’ down each pixel in the image, simultaneously estimating the geometry needed to recreate the subject in 3D – an advanced form of photogrammetry, where the ‘missing’ source views are estimated from the available views, creating a complete 3D interpretation:

The NeRF capture process is similar to CGI ray-tracing, building up an interpretive neural network composed of pixel values with 3D (instead of just 2D) coordinates, and with transparency (alpha) channels, so that glass and empty or 'cut-out' sections of geometry can be correctly interpreted. Source: https://www.youtube.com/watch?v=JuH79E8rdKc

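To make the ray-marching idea concrete, below is a minimal sketch (in NumPy, with toy inputs and illustrative shapes) of the classic volume-rendering step that NeRF uses: densities and colors sampled along a single ray are alpha-composited into a single pixel value.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    # sigmas: (N,) volume densities predicted at N samples along one ray
    # colors: (N, 3) RGB values predicted at the same samples
    # deltas: (N,) distances between adjacent samples
    alphas = 1.0 - np.exp(-sigmas * deltas)        # opacity of each segment
    trans = np.cumprod(1.0 - alphas + 1e-10)       # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])    # light reaching each sample
    weights = alphas * trans
    rgb = (weights[:, None] * colors).sum(axis=0)  # composited pixel colour
    return rgb, weights                            # weights can also yield a depth estimate

# Toy usage: 64 samples along one ray through a mostly-empty volume
sigmas = np.zeros(64); sigmas[30:34] = 5.0         # a thin opaque slab
colors = np.ones((64, 3)) * [0.8, 0.2, 0.2]        # reddish surface
deltas = np.full(64, 0.05)
pixel, _ = composite_ray(sigmas, colors, deltas)
```
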
Loaded, But Locked

However, this is an indiscriminate and non-granular method of acquisition; the result is what CGI and video-game artists would call a ‘baked’ texture/poly map, where all editability has been removed.

If you need a ‘shinier’ surface, an altered geometry or even a change of lighting, there are no trivial or innate solutions that are native to neural technologies.

Likewise the geometry in a NeRF is ‘static’ by default, and lacks the joints and rigging that CGI animators have used with relative ease, and a high level of control, for thirty years – and which must, with great difficulty, somehow be recreated by other means.

This is not to say that NeRF offers only a static explorable representation; a whole slew of projects over the last 2-3 years has enabled the recording and reproduction of continuous and even ad hoc motion in NeRF representations:

Just one of many projects which offer animated and animatable NeRF representations. This one is using source video (left) to 'puppet' the motion in a NeRF object (right). Source: http://geometrylearning.com/NeRFEditing/f

However, in general, these mobile NeRF ‘replays’ are not directly editable either; whatever motion was captured at the time is what you have to work with; and those projects which are the most flexible tend to run at the lowest resolution, or with problematic latency (i.e., the results do not render fast enough).

It must be admitted that within these parameters, one can perform eye-boggling transformations, such as nesting NeRFs inside other NeRF representations, and even mixing and matching different playback speeds:

The ST-NeRF project from ShanghaiTech University, one of the most innovative leaders in global NeRF-based image synthesis research, facilitates incredible compositional capabilities for NeRF representations – but it does not let the user intervene directly into the recorded motion of any individual NeRF. This means that you cannot change physical traits of the recorded person, or change their movements in any way except scaling them, nesting them, speeding them up or slowing them down. Source: https://www.youtube.com/watch?v=Wp4HfOwFGP4

But if you want to change the movement itself, it’s already too late; the motion was frozen into the data at the time of recording, in a system which has no intrinsic understanding of body movement, facial expressiveness, or hair and cloth dynamics – all facets that are routinely controllable in a CGI workflow.

Projects aimed at increasing access to NeRF content include Editing Conditional Radiance Fields, Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering, EditNeRF, and the face-focused FENeRF.

In terms of at least disentangling lighting from the capture material for a NeRF, projects that have made some headway in this regard include Neural Radiance Transfer Fields for Relightable Novel-view Synthesis with Global Illumination, and Neural Radiance Fields for Outdoor Scene Relighting, among others.

Ungainly GANs

In terms of content editing, Generative Adversarial Networks are almost as difficult to navigate and control as NeRF. While the process of estimating the 3D geometry of a captured object is entirely native to NeRF, almost from the moment of data acquisition, GANs have no such innate mechanism, and are usually trained on entirely ad hoc, non-sequential data (such as hundreds of thousands of faces).

Though a GAN is well able to generalize the central traits of such diverse training data, and offers a power of invention that’s entirely missing in NeRF, this lack of native 3D understanding is an additional obstacle to creating even very limited movement.

In theory, GANs are potentially well-disposed towards disentanglement; the 2020 paper GANSpace: Discovering Interpretable GAN Controls, a collaboration between Aalto University, NVIDIA and Adobe, found that latent directions discovered through Principal Component Analysis (PCA), applied either in latent space or feature space, can offer a range of potential instrumentality for disentangled aspects of trained data.
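
A rough sketch of the core GANSpace idea follows, assuming a StyleGAN-style generator whose mapping and synthesis networks are available as placeholder functions; principal components are computed over sampled intermediate latent codes, and each component becomes a candidate edit direction:

```python
import numpy as np
from sklearn.decomposition import PCA

# 'mapping' and 'synthesis' are hypothetical, pre-trained StyleGAN-style
# components: mapping turns Gaussian z codes into intermediate w codes,
# and synthesis renders an image from a w code.
def discover_directions(mapping, n_samples=10_000, latent_dim=512, n_components=20):
    zs = np.random.randn(n_samples, latent_dim)   # samples from the generator's prior
    ws = mapping(zs)                              # (N, 512) intermediate latent codes
    pca = PCA(n_components=n_components)
    pca.fit(ws)
    return pca.components_, ws.mean(axis=0)       # principal edit directions, mean latent

def apply_edit(w, direction, strength):
    # Push a single intermediate latent code along one principal direction.
    return w + strength * direction

# Usage sketch:
# directions, w_mean = discover_directions(mapping)
# image = synthesis(apply_edit(w_mean, directions[3], strength=2.0))
```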

Yet even the official video for this work (embedded directly below) demonstrates the extent to which non-targeted material is ‘dragged into’ targeted transformations.

The GAN research scene has been saturated with disentanglement projects over the last 3-4 years, none of which have made any redefining breakthroughs, and most of which leverage third-party technologies such as 3DMM (discussed below).

Approaches to GAN disentanglement include ByteDance’s use of (superimposed) semantic segmentation in its recent paper SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing.

Grad-CAM

One technology that features frequently in GAN-centered disentanglement is Gradient-weighted Class Activation Mapping (Grad-CAM), a 2016 initiative from the Georgia Institute of Technology.

Grad-CAM uses the gradients of any target concept (such as ‘dog’) flowing into the final convolutional layer of a network to generate a rough localization map, highlighting the regions of the image that relate to that concept.

The ‘heat maps’ produced by the system are essentially hijacked from processes developed by the researchers to improve the GAN’s training process; processes which the GAN’s discriminator can use to tell the generator component how well it did on its previous attempt at reconstruction.

In Grad-CAM, instead, these pathways are used to create activation visualizations rather than optimization processes.
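
As a concrete illustration of the mechanism, here is a minimal Grad-CAM sketch for an off-the-shelf torchvision classifier (not a GAN); the principle of weighting convolutional activations by the pooled gradients of a target class is the same one that the GAN-focused work described below borrows:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
store = {}

layer = model.layer4[-1]                                        # last conv block
layer.register_forward_hook(lambda m, i, o: store.update(acts=o))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grads=go[0]))

def grad_cam(image_batch, class_idx):
    # image_batch: (1, 3, 224, 224) normalised image tensor
    logits = model(image_batch)
    model.zero_grad()
    logits[0, class_idx].backward()                             # gradient of the target concept
    weights = store["grads"].mean(dim=(2, 3), keepdim=True)     # globally pooled gradients
    cam = F.relu((weights * store["acts"]).sum(dim=1))          # (1, 7, 7) heat map
    return cam / (cam.max() + 1e-8)                             # normalised to [0, 1]
```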

Grad-CAM operates as a kind of 'barium meal' for search terms entering the latent space, allowing the user to understand the extent to which a classification has been activated for a generated image. Source: https://arxiv.org/pdf/1610.02391.pdf

Grad-CAM is a relatively coarse tool (see the 2016 presentation video here); nonetheless, it’s one of the most popular of a very limited range of available latent space mapping libraries and frameworks, and features in a number of GAN disentanglement research projects.

For instance, in 2021 a research group led by the Chinese University of Hong Kong leveraged Grad-CAM in their paper Improving GAN Equilibrium by Raising Spatial Awareness (see video below), which enabled a user interface that allows an end-user to ‘scrub through’ the latent space of a GAN.

Though morphing through churches and adjusting the angles of cats is amusing, we can see that the ‘editing’ enabled by the research essentially allows only minor adjustments, or the registering of transitional states between existing ‘frozen’ latent codes, rather than real access to the central disentangled assets of the network.

Grad-CAM has also been used to repair GAN-generated faces, for generalizing adversarial explanations, visualizing deep networks, and as an explanatory tool for the decisions made by the YOLO object detection series. It is also a popular tool for generating saliency maps in medical research, among other applications.

Grad-CAM is one of the very few ‘purely neural’ solutions to disentanglement – most current GAN-based image synthesis approaches are leaning towards CGI-based interface solutions (see Faux Disentanglement Through CGI below).

Grad-CAM’s ‘paper trail’ approach (i.e., ‘marking’ the path of the data as it enters the network) has also been used in BlobGAN, which allows the user to move ‘pre-marked’ sections of trained data around by manipulating objects in a grid:

Rudimentary object manipulation in BlobGAN. Don't expect to be able to actually make up the bed, though. Source: https://dave.ml/blobgan/

Disentanglement in Latent Diffusion Generative Systems

The extent to which entanglement affects latent diffusion systems such as Stable Diffusion is currently a major research obsession, with 4-8 papers emerging weekly, at the time of writing, offering new solutions that attempt to isolate facets of an image generation for discrete editing.

One recent approach, from Zhejiang University, proposes a two-stage framework, called text-guided mask-free local image retouching, which converts a text token (e.g. ‘bear’) recognized in a generated image into an addressable object that can be isolated for editing purposes:

In the Zhejiang University system, the targeting of semantically-recognized generated facets allows for preservation of background and other image assets. Source: http://export.arxiv.org/pdf/2212.07603

A collaboration between the University of Rochester and Adobe Research also recently proposed Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators – a system that imposes an object-level discriminator framework that, again, can isolate those annoyingly entangled facets of a generated image.

Inpainting, which relies on disambiguation of content, is facilitated by the new University of Rochester/Adobe approach. Source: https://arxiv.org/pdf/2212.06310.pdf

In the same week, a German collaboration, with involvement from LAION, offered a Semantic Guidance (SEGA) system dubbed The Stable Artist. The system operates directly in the latent space of a diffusion-based generative model, organizing concepts and orchestrating appropriate paths through the latent directions related to those concepts, without dragging non-targeted facets in as collateral damage:
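
The core of this kind of semantic guidance can be expressed as arithmetic on the U-Net’s noise predictions at each denoising step. The sketch below is heavily simplified (the actual SEGA method adds warm-up periods and per-element thresholding to protect untargeted regions), and is intended only to illustrate the direction arithmetic:

```python
import torch

def semantic_guidance_step(eps_uncond, eps_prompt, eps_concept,
                           guidance_scale=7.5, edit_scale=5.0, sign=+1):
    # eps_*: U-Net noise predictions for the empty prompt, the main prompt and
    # an edit concept (e.g. 'smile'), each of shape (batch, 4, 64, 64).
    cfg = eps_uncond + guidance_scale * (eps_prompt - eps_uncond)   # classifier-free guidance
    edit_direction = eps_concept - eps_uncond                       # direction of the edit concept
    return cfg + sign * edit_scale * edit_direction                 # sign=-1 suppresses the concept
```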

Latent direction manipulation allows for fine targeting of facets, though the untargeted portion of the amended image is not entirely unaffected by the process. Source: https://arxiv.org/pdf/2212.06013.pdf

These are just three examples from a single week in December of 2022. Other recent proposals include SmartBrush, the Google-backed project Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis, Sony’s initiative Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models, and the Meta AI-backed Shape Guided Diffusion with Inside-Outside Attention.

And, if any further evidence of the academic fervor around latent diffusion disentanglement were necessary, we still haven’t cited any projects that appeared outside of December 2022.

Don't Label Me

The quality of Stable Diffusion’s generated images is closely related to the standards and characteristics of the captions processed through the CLIP-based (now OpenCLIP, since SD V2.0) text-encoding mechanism. Therefore, if a picture of a human face has been trained into Stable Diffusion with minimal captioning (‘woman’, ‘man’, etc.), the entanglement is clear, since the caption does not even specify ‘face’.

In effect, however, the trained SD network still understands that ‘eyes’ (for instance) are normal components of a face, and can transfer its more deeply-ingrained knowledge about faces to the under-captioned (or even miscaptioned) face image.

The problem is that so many of the other contributing images in the training data are also not optimally captioned. What for us would be an image containing a collection of facial features (‘a young female face with full lips, blue eyes and a short nose’) can become essentially a ‘bag of pixels’ for Stable Diffusion – one that’s hard for the system to pick apart or make editable at a semantic level, if the text data is lacking.
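
The disparity is easy to inspect with the Hugging Face transformers wrappers around the CLIP text encoder used by Stable Diffusion V1.x (SD V2.x uses an OpenCLIP equivalent); the sketch below simply compares the conditioning sequences that a sparse and a rich caption would hand to the U-Net’s cross-attention layers:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").eval()

def embed(caption):
    tokens = tokenizer(caption, padding="max_length", truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).last_hidden_state    # (1, 77, 768)

sparse = embed("woman")
rich = embed("a young female face with full lips, blue eyes and a short nose")
# Both captions produce a 77x768 conditioning sequence, but the sparse one
# carries almost no face-specific structure for cross-attention to latch onto.
```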

The LAION images on which Stable Diffusion is trained are not exactly stuffed with accurate facial descriptions, though that would have been a huge aid towards disentanglement. The image captions used for text/image associations are those that were part of the metadata for the original web-scraped images, often written solely for SEO impact, rather than utility in machine learning systems. Source: https://rom1504.github.io/

Going to Seed

To date, much of the research into maintaining compositional stability for Stable Diffusion editing has relied on relatively coarse and abstract mechanisms, such as the random seed functionality.

The random seed in an SD generation represents a unique and ad hoc path through the many possible ways that the latent diffusion architecture might interpret a user’s text prompt. If you take note of the seed that was used in a prior generation, and deliberately use that seed for a subsequent generation, without changing any other parameters, it’s usually possible to reproduce the original image exactly.
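
As a minimal sketch with the diffusers library (the model ID, prompt and seed below are purely illustrative), this amounts to re-seeding the noise generator with the recorded value while leaving every other parameter untouched:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

prompt = "a village street on a sunny day"
seed = 1234567                                   # noted from the earlier generation

generator = torch.Generator("cuda").manual_seed(seed)
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5,
             generator=generator).images[0]      # matches the original output
```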

However, this is a fragile method of composition preservation, and practically any change is likely to break it:

The seed is a brittle mechanism – any slight deviation from the original circumstances will break its ability to preserve the original composition – even if the seed is specifically imposed on the new generation. Here we see (bottom left corner) that even just adding the word 'color' to the text prompt changes the breed of dog generated.

The brittleness of the seed is proving a major impediment to temporal coherence in video generated with Stable Diffusion, and the difficulty of changing any minor aspect of a seed-driven generation (i.e., one where you specify the prior seed from a generation that you would like to modify) illustrates the extent to which entanglement restricts editability in Stable Diffusion, compared to traditional digital workflows.

At the time of writing, a new paper – from UC Santa Barbara, Adobe Research, and the MIT-IBM Watson AI Lab – offers a seed-driven approach to disentanglement, demonstrating that revising the input text embeddings from a neutral description (such as ‘photo of a person’) to a stylized description (such as ‘a photo of a person with smile’), while fixing the Gaussian noise generated during the denoising process, can preserve the semantic content of the image while allowing specific content inside that image to be modified.

Disentangled attribute editing in the new UC paper – but, as usual, there is still some collateral (i.e., undesired) 'interpretation'. Source: https://arxiv.org/pdf/2212.08698.pdf

The new approach only optimizes around 50 parameters, and does not require fine-tuning (i.e., resuming model training with additional data, which can damage the overall generative capabilities of the model). Though the open source code for the model generalizes well to unseen data (i.e., you can use it on any image, and don’t have to ‘teach’ the system to adapt to images you want to edit), the results presented are not entirely free of entanglement artifacts.
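
The underlying idea of fixing the Gaussian noise while revising only the text conditioning can be sketched with off-the-shelf tooling, though this is emphatically not the paper’s method (its optimization of around 50 parameters is omitted, and the model ID and prompts below are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Fix the starting Gaussian noise explicitly, rather than via a seed.
latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64,
                      dtype=torch.float16, device="cuda")

neutral = pipe("photo of a person", latents=latents).images[0]
styled = pipe("a photo of a person with smile", latents=latents).images[0]
# With identical noise, the two results share a broad layout, but untargeted
# facets can still drift; this residual entanglement is what the paper addresses.
```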

Depth Maps as Boundaries for Editing in Stable Diffusion

The recently-released Stable Diffusion V2.0 introduced a new and promising method of disentanglement based on depth maps.

Depth maps, available from Stable Diffusion V2.0 onwards, offer faux 3D planes into which content alteration can be restricted. Source: https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion

The system, titled Depth2img, uses Intel’s MiDaS library to generate depth maps as faux 3D bounding areas outlining which parts of the image content should be addressed (i.e., edited, or in some way altered), allowing for the modification of discrete areas in otherwise reproducible images.
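
As a rough sketch of how this is exposed in practice, the diffusers library wraps the SD 2.0 depth model in a dedicated pipeline (the file name and prompts below are placeholders; MiDaS depth estimation runs inside the pipeline when no explicit depth map is supplied):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16).to("cuda")

init_image = Image.open("man_on_stairs.png")       # hypothetical source image
result = pipe(prompt="a bronze statue sitting on stairs",
              image=init_image,
              negative_prompt="blurry, deformed",
              strength=0.7).images[0]              # layout constrained by the depth map
```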

As it stands, we can see from the bottom row of official example images in the illustration above that the depth map itself is entangled, in that the image of the man sitting on the stairs includes the environment; therefore the stairs themselves are transformed together with the conceptual transformations of the man. Presumably, this could be further addressed by the use of semantic segmentation, or other techniques designed to isolate specific elements inside depth maps.

Though depth-based mapping of this type is a notable step forward in disentangling edited content from overall compositionality, it should be noted that it does not enable the isolation of individual, object-level (or character-level) attributes.

An alternative depth-based approach has, at the time of writing, just been suggested by Korea University. Titled DAG: Depth-Aware Guidance with Denoising Diffusion Probabilistic Models, the approach of the new paper differs from the SD V2.0+ Depth2Img method in that it incorporates depth-aware guidance directly into the sampling process as an unconditional generation*, rather than using a third-party library to extract depth map information from the final result.

From the Korea University paper: Qualitative comparisons of synthesized images without guidance (top), and with the new system's depth-aware guidance (DAG), producing estimated depth maps and also surface normals (far right). Source: https://arxiv.org/pdf/2212.08861.pdf

Additionally, unlike the native depth-map functionality in Stable Diffusion V2.0+, the Korean system can automatically generate normal maps – a traditional CGI technique in which the colors of an additional texture image encode surface orientation, allowing fine geometric detail to be rendered without explicitly modeling it in 3D.
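
For intuition on what a normal map encodes, here is a common way of approximating one from a depth map (a generic technique, not the Korean system’s native estimation): the x/y slope of the depth surface at each pixel becomes an RGB-encoded surface direction.

```python
import numpy as np

def normals_from_depth(depth):
    # depth: (H, W) array of per-pixel depth values
    dz_dy, dz_dx = np.gradient(depth)                           # local surface slope
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])  # outward-facing normals
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # Remap from [-1, 1] to the familiar purple-blue normal-map colours.
    return ((normals * 0.5 + 0.5) * 255).astype(np.uint8)
```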

Faux Disentanglement Through CGI-Based Approaches

The ideal situation for neural modeling and representation would be to gain a better mastery of the latent space generated for the model during training – to know where all the relevant information is (i.e., traits such as ‘blond’, ‘old’, ‘male’, ‘female’, etc.) and to learn how to adroitly combine or negate these qualities according to the desired result.

In the video embedded below, we see one such ‘pure’ approach, in a 2021 collaboration led by Adobe, where the user can perform real-time attribute editing – however, we can note that the application is restricted to individual images, and that the temporal element is missing; though we can explore changed static scenes, we are not seeing actual movement in the face representations:

In practice, as we have seen, trait elements tend to enter the model ‘pre-fused’, so that additional technologies and approaches are needed to disentangle them – if, indeed, that’s an achievable goal at all.

Therefore a notable new trend in research over the past four years or so has concentrated on using traditional CGI techniques to help ‘split’ and control the diverse facets in a trained model.

Most of these have used 3D Morphable Models (3DMMs) – a technique that uses a CGI model as a parametric mapping interface, allowing creators to work with traditional and familiar, controllable tools while leveraging the superior representative qualities of neural representations.
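
At its core, a 3DMM is just a linear model over mesh vertices, which is what makes it such a convenient parametric ‘handle’ for neural representations. A generic sketch follows (the basis matrices are placeholders for whatever a real model, such as the Basel Face Model or FLAME, actually provides):

```python
import numpy as np

def morph(mean_shape, shape_basis, exp_basis, shape_params, exp_params):
    # mean_shape:  (3N,)   flattened vertex positions of the average face
    # shape_basis: (3N, K) identity components learned from scan data
    # exp_basis:   (3N, E) expression components
    verts = mean_shape + shape_basis @ shape_params + exp_basis @ exp_params
    return verts.reshape(-1, 3)        # (N, 3) mesh ready for rigging and rendering

# Usage sketch with a hypothetical pre-loaded model:
# verts = morph(mean, shape_basis, exp_basis,
#               shape_params=np.random.randn(80) * 0.5,
#               exp_params=np.zeros(64))
```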

Fitting a 3DMM to a neural representation. The highly controllable and determinate mesh can be mapped to latent codes and other traits by a number of techniques, allowing users some access to edit neural facets. Source: https://github.com/Yinghao-Li/3DMM-fitting

Leading research organizations such as the Max Planck Institute (a preeminent developer of new advances in CGI-to-neural interfaces) are developing CGI/NeRF-based systems which can restore some level of control over the qualities of a representation. These innovations include the Sparse Trained Articulated Human Body Regressor (STAR), a successor to its less capable but still-popular Skinned Multi-Person Linear Model (SMPL) framework.

Though 3DMM approaches were initially dominant in GAN-based research, eventually the NeRF research community began to realize that applying CGI instrumentality to NeRF could be a valid method of developing more wieldy and versatile generation systems:

Despite initial hopes that Neural Radiance Fields might be easier to control than Generative Adversarial Networks, the use of 'old school' 3DMM CGI models has increased in the past 18 months or so. Here we see an example from RigNeRF, a 2021 collaboration with Adobe which leverages 3DMM interfaces to produce near real-time 'deepfake' functionality. Source: https://www.youtube.com/watch?v=mEuqGy1ZlMA

The most prominent example of the ‘CGI concession’ in recent years has been Disney Research’s intense interest in the use of Morphable Face Models to control neural representations – for example, the MoRF project, which extends the 3DMM-style approach to create a framework that can generate diverse identities, which can then be puppeted through parametric controls.

MoRF also offers the user improved ability to control and generate fine-grained aspects of the rendering, such as diffuse and specular separation, as well as native (rather than inferred, or ‘guessed’) depth maps:

The MoRF system, from Disney Research, allows for practicable separation of essential channels, while retaining at least some of the advantages of NeRF/neural representation. Source: https://www.youtube.com/watch?v=GfY--WTmnh4

Beyond Defeat

Such approaches are comforting for a VFX industry that’s curious but circumspect in regard to bleeding-edge AI technologies, and long since accustomed to high levels of control over the rendering pipeline; but, arguably, they also smack of defeat, and signal a concession to the opacity of the latent space – a concession that’s hopefully only temporary, and one that may eventually yield to increased and ongoing efforts towards more ‘native’ and less hybrid solutions.

In the meantime, the extraordinary ability of GANs and other neural approaches to provide hyper-real facial output – and to overcome at a stroke the uncanny valley syndrome that has dogged CGI for over 30 years – has made the incorporation of neural workflows potentially worth the effort of creating intermediary systems. In effect, however, this reduces the role of machine learning to that of a very advanced texture renderer amidst what is otherwise a relatively old set of technologies, which come with their own pain-points and bottlenecks.

  

* Confirmed in an email to us of 20th December 2022, by the paper’s corresponding author, Gyeongnyeon Kim.
