Digital watermarking has long been touted as a potential solution to combat misinformation, fraud, and deepfakes generated by AI models. By adding subtle markers to AI-generated content, companies hope to distinguish it from human-created content. However, recent research from the University of Maryland suggests that watermarking may not be as reliable as previously thought.
The study, titled “Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks,” reveals vulnerabilities in image watermarking as a defense against deepfakes. According to the researchers, there is a fundamental trade-off between the evasion error rate (the rate at which watermarked images go undetected) and the spoofing error rate (the rate at which unmarked images are falsely flagged as watermarked). In other words, no watermarking scheme can keep both error rates low at once: tuning a detector to catch more evaded watermarks necessarily makes it easier to spoof, and vice versa.
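This trade-off can be illustrated with a toy simulation. The detector score distributions below are invented purely for illustration (the paper does not specify them); the point is only that when attacked watermarked images and clean images produce overlapping scores, moving the detection threshold trades one error for the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical detector scores after an attack has pushed watermarked
# images toward the distribution of clean images: the two populations
# overlap heavily (means and spreads chosen only for illustration).
marked_scores = rng.normal(0.6, 0.2, 10_000)    # watermarked images
unmarked_scores = rng.normal(0.4, 0.2, 10_000)  # clean images

for threshold in (0.3, 0.5, 0.7):
    # Evasion error: watermarked images scoring below the threshold
    # are declared unmarked.
    evasion = np.mean(marked_scores < threshold)
    # Spoofing error: clean images scoring at or above the threshold
    # are declared watermarked.
    spoofing = np.mean(unmarked_scores >= threshold)
    print(f"threshold={threshold:.1f}  evasion={evasion:.2f}  spoofing={spoofing:.2f}")
```

Sweeping the threshold shows the bind: a low threshold keeps evasion errors down but lets spoofed images through, while a high threshold does the reverse. No threshold makes both small once the distributions overlap.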
The University of Maryland team has developed attack techniques that defeat watermarking in both the low-perturbation regime (imperceptible watermarks) and the high-perturbation regime (perceptible watermarks). Against low-perturbation watermarks, the researchers use diffusion purification: they add Gaussian noise to the image and then apply a denoising process, which reconstructs a clean-looking image and strips the watermark along with the noise. Against high-perturbation watermarks, the team has devised a spoofing mechanism that makes non-watermarked images appear to be watermarked.
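The noise-then-denoise idea can be sketched as follows. This is a conceptual sketch only: a real diffusion-purification attack uses a pretrained diffusion model as the denoiser, and here a simple 3×3 box blur stands in for it (an assumption made purely to keep the example self-contained).

```python
import numpy as np

def diffusion_purify(image, noise_std=0.1, steps=3):
    """Conceptual sketch of diffusion purification on a grayscale
    image in [0, 1]. A box blur stands in for a diffusion model's
    denoiser, which this sketch does not implement."""
    # Step 1: perturb the image with Gaussian noise. A low-amplitude
    # watermark signal is drowned out along with other fine detail.
    noised = image + np.random.normal(0.0, noise_std, image.shape)
    purified = np.clip(noised, 0.0, 1.0)

    # Step 2: denoise. A diffusion model would iteratively reconstruct
    # a natural-looking image; the watermark does not survive because
    # it is statistically indistinguishable from the injected noise.
    for _ in range(steps):
        padded = np.pad(purified, 1, mode="edge")
        purified = (
            padded[:-2, :-2] + padded[:-2, 1:-1] + padded[:-2, 2:]
            + padded[1:-1, :-2] + padded[1:-1, 1:-1] + padded[1:-1, 2:]
            + padded[2:, :-2] + padded[2:, 1:-1] + padded[2:, 2:]
        ) / 9.0
    return purified

img = np.full((8, 8), 0.5)
out = diffusion_purify(img)
```

The key design point is that the attacker never needs to know how the watermark was embedded: any watermark weak enough to be invisible is also weak enough to be buried under noise of comparable amplitude and discarded by the denoiser.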
The implications of these findings are significant. Companies like Google and OpenAI, which have publicly cited watermarking as a defense against deepfakes, may need to rethink their strategies. The researchers suggest that relying solely on watermarking may not be effective in the long term, as new attacks continue to emerge.
When asked about the parallels between image watermarking and CAPTCHA puzzles for telling human and machine-generated content apart, the researchers pointed out that machine learning is advancing rapidly and has the potential to match or even surpass human performance. They suggest that as AI-generated content becomes increasingly similar to real content, distinguishing between the two may become impossible regardless of the technique used.
The study raises questions about the future of AI content detection and highlights the ongoing race between defenses and attacks in the field of computer vision. While new watermarking methods may be developed, attackers are likely to find ways to break them. Therefore, it is crucial for companies to explore alternative approaches to combat deepfakes and ensure the integrity and authenticity of digital content.
Google and OpenAI did not provide comment on the study. The researchers noted that they did not specifically analyze those companies' watermarking mechanisms, as the source code is not publicly available. However, they believe their attacks can break any existing watermark, underscoring the need for further research and development in this area.
As technology continues to advance, it is essential for companies, researchers, and policymakers to address the challenges posed by deepfakes and misinformation. While watermarking may not be a foolproof solution, it is part of a broader effort to develop robust tools and techniques to safeguard the integrity of digital content and protect users from manipulated and fraudulent information.