Detecting When ‘Fake’ Images Are Actually Real

As the authenticity gap closes between real and AI-generated data, there is an increasing prospect of genuine human-made content being mistaken for AI-produced material. Indeed, a current popular strand of thought, as fresh controversies emerge, is that photography as a token of authenticity will become so devalued by AI-generated content that neither real nor fake photos will have any deeper semantic relation to reality than paintings or illustrations did before the invention of photography in the 1820s.

In one sense, surrendering the photography-as-reality conceit would easily solve fears over the potential use of deepfakes in the upcoming US elections, because any negatively-inclined content could simply be discounted as fake – whether or not it was actually genuine.

The media has latched onto deepfakes and AI-generated imagery as a threat to a fair democratic process. Source: https://www.wired.com/story/chatgpt-generative-ai-deepfake-2024-us-presidential-election/

However, since this also devalues the impact of positive messages that politicians may wish to disseminate, the baby goes out with the bathwater. Perhaps for this reason alone, in the US and elsewhere, a growing global legislative movement is seeking to rein in the free distribution of AI-created or AI-altered imagery.

In the private corporate sector, claiming real video evidence might be deepfaked has already been tried at a very public level, when lawyers for Elon Musk conflated genuinely fake videos of the tech tycoon with real video evidence that they wished to discredit.

Elon Musk's lawyers apparently sought to lump real video footage into the purview of the many genuinely deepfaked or AI-altered images and videos that Musk has been subject to over the past five years, though the judge did not accept this argument. Source: https://www.npr.org/2023/05/08/1174132413/people-are-trying-to-claim-real-videos-are-deepfakes-the-courts-are-not-amused

Similar arguments have been brought forward in regard to the rise of AI-generated porn, which – through platforms such as Stable Diffusion, and ‘personalization’ systems such as DreamBooth and LoRA – has now made it possible for amateurs to create photorealistic, non-consensual pornography featuring both celebrities and unknown women (women being the vast majority of victims).

As with political deepfakes, arguments are now being heard that all novel pornographic content is likely to be perceived as potentially non-real, rendering such malicious attacks meaningless. However, and most especially in the case of non-celebrities, other existing or pending legislation (laws that are usually inapplicable to public figures such as politicians) remains likely to criminalize false depiction of this kind.

Crucially, both scenarios presume that the culture in general has entirely caught up to the fact that seeing is not believing, across all levels of the voter base or the viewing public.

Yet it seems reasonable to argue that ‘common sense’ understanding about this new evolution in photographic trickery is likely not much more advanced right now than the surprising amount of credulity that was given to the Cottingley Fairies in 1917, almost at the very dawn of photographic visual effects.

The Cottingley hoax: in 1917, two young cousins stirred up controversy with what would now be seen as facile photo trickery. Source: https://www.historic-uk.com/CultureUK/The-Fairies-of-Cottingley/

In the realm of text rather than imagery, the syndrome of interpreting real content as AI-generated is at a far more advanced stage, with freelance writers increasingly at risk of having their (real) work flagged as AI-created – even though their own prior efforts are reasonably likely to have contributed to the adroitness of Large Language Models (LLMs), since their historical articles were scraped freely from the web in order to obtain training data for the systems.

The three known roads forward, therefore, seem to be: 1) that the culture will abandon photography as a token of reality (a choice likely to be fiercely opposed by the legislature, since this conceit solves so many of its problems, and because of the aforementioned laggard education of the public around deepfakes); 2) that any images not underwritten by provenance-based systems such as C2PA will automatically be treated as suspect (which requires widespread adoption – a goal that, even if feasible, may take some time and enforced legislation to achieve); or 3) that systems will emerge that are capable of discerning when images are likely to be real.

The latter approach is differentiated from the current crop of deepfake detection and AI-centric security systems, which tend to key in on artefacts and other signifiers of the generative architecture, in that it instead seeks out qualities of reality even in pictures which many might assume to be AI-generated.

The last of these is one of the most under-studied strands in computer vision research, perhaps because AI has only very recently become capable of creating truly photorealistic imagery at a consumer-friendly level, and because the academic scene carries, for practical and political reasons, fairly significant lead-time on new trends in culture and technology, and often does not respond immediately to new developments.

No AI trickery here – examples from a new dataset of images that appear generative, but are actually real. Source: https://github.com/aliborji/FLORIDA

As a seminal step towards a more advanced capability to discern when unusual or ‘non-conforming’ images might actually be real, a new paper offers the first known dataset of images that are likely to be flagged as AI, but which are nonetheless completely genuine photos.

In the real and unaltered images above, among some bizarre juxtapositions, we can see at least two examples of the kind of extra appendages that Stable Diffusion is still prone to add, unwanted, to its generations of human subjects, as well as some optical illusions (such as the ‘levitating man’).

Hands are always a challenge for new artists, and frequently prove a stumbling block for Stable Diffusion. Source: https://openart.ai/discovery/sd-1005996221770506302

Though the current state-of-the-art in the general detection of AI images (such as those created with Latent Diffusion Models, or LDMs, like Stable Diffusion and the DALL-E series) is far behind the years of investment in deepfake video detection, the images gathered were put through two such systems, which exhibited ‘subpar performance’ in determining that they were real. The work is intended to stimulate interest in this still-neglected pursuit.

The new preprint is titled FLORIDA: Fake-looking Real Images Dataset, and comes from an independent researcher.

Unlikely Truth

The paper notes a number of possible scenarios in which real images may be misinterpreted as AI-generated, including the use of ‘artistic’ or creative imagery. This is an interesting case, since generative systems such as Stable Diffusion are trained on images which are likely to have in some way come to prominence, and therefore to have ‘unusual’ compositional or aesthetic qualities.

Since such a system can then reliably distill and output these rare traits, the traits become common, and in turn become a potential indicator of falseness.

Strange realities in the new dataset.

Another example of this ‘rare-to-many’ generative syndrome is the depiction of unusual natural phenomena, such as rare or extreme weather events, or strange optical illusions – a facility which has (especially in the latter case, with add-ons such as the QR code ControlNet models) become entirely commoditized; and, again, the very scarcity that gained attention becomes an indicator of AI-generated content.

From the dataset, real but hard-to-believe examples of unusual weather conditions.

Regarding the power of repetition to embed particular concepts, it should be noted that the LAION database subset that powers Stable Diffusion is so large that it is only minimally curated, and that certain compositions or images appear multiple times in the final training set (such as particular paintings by Van Gogh – see image below), a factor that is also likely to embed those compositions more deeply into the latent space of the final model.

Driving the point home, as multiple versions of the same image ensure saturation coverage for the output of Van Gogh in Stable Diffusion. Source: https://rom1504.github.io/clip-retrieval/
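This duplication is easy to probe against the public LAION index behind the web demo linked above. The following is a minimal sketch (not from the paper) using the clip-retrieval client library; the knn.laion.ai endpoint and the ‘laion5B-L-14’ index name are assumptions taken from that project's documentation, and may have changed or gone offline since.

from clip_retrieval.clip_client import ClipClient

# Connect to the public LAION KNN service used by the clip-retrieval web demo.
# (Assumption: this endpoint and index name are still live.)
client = ClipClient(
    url="https://knn.laion.ai/knn-service",
    indice_name="laion5B-L-14",
    num_images=40,
)

# Query by text; the client can also take a local picture via the image= argument.
results = client.query(text="The Starry Night by Vincent van Gogh")

# Each result carries a caption, a source URL and a CLIP similarity score,
# which makes near-duplicate clusters easy to spot by eye.
for r in results[:10]:
    print(f"{r['similarity']:.3f}  {r['url']}")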

The examples in the dataset were taken from a series of popular articles and lists that showcased real photos which many might have assumed to be photoshopped. Indeed, building such a dataset from scratch, without any prior method of filtering, is an almost unimaginable task, and the paper declares its hope that this rudimentary outing into ‘impossibly real’ images may become the foundation for larger datasets in the future.

The new dataset was tested against two systems, to ascertain the extent to which the real images were likely to be interpreted as fake: Google’s Bard, and an AI Image Detector hosted at Hugging Face.

Results from testing the real images against the Hugging Face system, which is willing to guess whether an image is AI-generated or real.
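For readers who want to reproduce this kind of test, the sketch below runs a folder of images through the Hugging Face detector named above via the transformers image-classification pipeline. It is illustrative only: the local folder path is an assumption, and the exact label strings the model returns (and therefore which one counts as ‘real’) should be checked against the model card before trusting the resulting percentage.

from pathlib import Path
from transformers import pipeline

# Load the publicly hosted detector referenced in the article.
detector = pipeline("image-classification", model="umm-maybe/AI-image-detector")

image_dir = Path("FLORIDA/images")   # assumption: a local copy of the dataset images
real_label = "human"                 # assumption: check the model card for the actual label name
paths = sorted(image_dir.glob("*.jpg"))

hits = 0
for path in paths:
    preds = detector(str(path))                 # list of {"label": ..., "score": ...}
    top = max(preds, key=lambda p: p["score"])  # highest-confidence class
    if top["label"] == real_label:
        hits += 1
    print(f"{path.name}: {top['label']} ({top['score']:.2f})")

if paths:
    print(f"Labelled real: {hits}/{len(paths)} ({hits / len(paths):.1%})")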

The paper states:

‘Bard demonstrates an accuracy rate of 38.2%, which means it correctly identifies images as real in only 38% of cases, even though all the images are indeed real. In contrast, the Hugging Face API achieves an approximately 67% accuracy rate. Given that these images look fake for humans the rather high accuracy of the latter model is a bit surprising.

‘The extent of human performance on this dataset remains uncertain and must be evaluated in the future. This could involve, for instance, interspersing these images with other counterfeit ones and soliciting individuals to discern whether an image is genuine or fake.’

The author notes that it is uncertain whether the images in the new dataset were also included in the training data for one or both of the systems tested, and observes that the rationale used by the systems studied is not necessarily clear.

Simply running one of the dataset images through Google Bard easily demonstrates the extent to which a state-of-the-art system is disposed to find a real image to be false. In this case (my example, not from the paper, though it uses an image from the paper's associated dataset), Bard attributes falseness not to the improbability of a floating tap, but to the excessive perfection of the (very real) water. Source: https://bard.google.com/

In the example from Bard illustrated above (which I ran independently), we can see the extent to which provenance and known instances of an uploaded image are treated as mechanisms of authority regarding the authenticity of the image (Bard notes that the image is featured on ‘multiple webpages, which suggests that it is a stock image’).

Prior to this year, the presence of an image in a stock database would in itself have been reasonable evidence that the image was probably real. Now, as multiple stock image rights holders scramble to cut out the middleman and train generative models on their own formidable datasets, an image's presence in such a commercial collection is becoming evidence for, rather than against, AI chicanery.

It should also be noted that Bard's results in my small test above attribute the falseness of the image not to AI, but to CGI, since mesh-based images of this kind have had a modest representation in stock databanks for a long time.

Confidently wrong – the Hugging Face API page assigns falseness, to some extent, to all the real images uploaded in this test. Source: https://huggingface.co/umm-maybe/AI-image-detector

In theory, if a trained AI detection system knows that an image existed prior to around 2017-18, it could at least be expected to deduce that any deception is more likely attributable to CGI than AI.
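As a purely illustrative toy (not something proposed in the paper), that reasoning could be expressed as a simple date-based heuristic, with the 2017-18 boundary as an assumed cut-off and the earliest known sighting of an image standing in for real provenance data:

from datetime import date
from typing import Optional

# Assumption: a rough boundary for when consumer-grade AI image synthesis became plausible.
GENERATIVE_ERA_START = date(2018, 1, 1)

def plausible_fakery(earliest_known_sighting: Optional[date]) -> str:
    """Toy heuristic: attribute suspected manipulation based on when the image first appeared."""
    if earliest_known_sighting is None:
        return "no provenance: intrinsic analysis of the image itself is required"
    if earliest_known_sighting < GENERATIVE_ERA_START:
        return "pre-generative era: CGI or manual editing more plausible than AI"
    return "generative era: AI synthesis cannot be ruled out"

print(plausible_fakery(date(2015, 6, 1)))
print(plausible_fakery(None))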

However, using provenance as a detection filter avoids the more difficult challenge of finding intrinsic qualities within an image that do not depend on such prior knowledge, and only provides an effective filter for older images, in any case.

For this reason, systems – such as the one employed by the Russian Yandex image search engine – which tend to evaluate the image per se, rather than seek to discern historical or commercial associations, may be more useful as the science of AI image detection evolves.

Probability as a Metric?

If generative systems become sufficiently photoreal as to be indistinguishable from real photos, and in the absence of unassailable provenance systems that might prove a photo to be real rather than generated or altered, the logical extension of using probability as a metric of veracity is that anything ‘unusual’ could achieve higher ratings in AI detection systems, regardless of whether the photo is real.

Little by little, this arguably risks imposing a kind of ‘chilling effect’ on the extent to which photographers and picture editors will be willing to take risks on the exact type of startling or attention-grabbing imagery which has long been their stock in trade.

At a certain point in the evolution of AI detection (though not necessarily at the end of that evolution), such practitioners may feel it safer to ‘dampen down’ such flourishes in favor of the reportage style of photography that first took hold in the USA in the 1930s, and in the Soviet Bloc during the reign of communism.

This is not to say that generative systems could not likewise reproduce such a style, only that a more ‘dour’ or workmanlike style of photography would not exhibit the very traits that a new breed of AI detection systems might key in on – such as ‘overly perfect’ compositions, depictions of improbably attractive people, or notably exceptional natural and man-made events.

Already, in the field of text generation, writers are beginning to torture their own natural style in order to ensure that their own original work is not mistakenly detected as AI-generated. There seems no reason why photographers may not end up in the same false position.
