Home Blog Hacking AI Agents—How Malicious Images and Pixel Manipulation Threaten Cybersecurity

Hacking AI Agents—How Malicious Images and Pixel Manipulation Threaten Cybersecurity

23
0
Hacking AI Agents—How Malicious Images and Pixel Manipulation Threaten Cybersecurity


A website announces, “Free celebrity wallpaper!” You browse the images. There’s Selena Gomez, Rihanna and Timothée Chalamet—but you settle on Taylor Swift. Her hair is doing that wind-machine thing that suggests both destiny and good conditioner. You set it as your desktop background, admire the glow. You also recently downloaded a new artificial-intelligence-powered agent, so you ask it to tidy your inbox. Instead it opens your web browser and downloads a file. Seconds later, your screen goes dark.

But let’s back up to that agent. If a typical chatbot (say, ChatGPT) is the bubbly friend who explains how to change a tire, an AI agent is the neighbor who shows up with a jack and actually does it. In 2025 these agents—personal assistants that carry out routine computer tasks—are shaping up as the next wave of the AI revolution.

What distinguishes an AI an agent from a chatbot is that it doesn’t just talk—it acts, opening tabs, filling forms, clicking buttons and making reservations. And with that kind of access to your machine, what’s at stake is no longer just a wrong answer in a chat window: if the agent gets hacked, it could share or destroy your digital content. Now a new preprint posted to the server arXiv.org by researchers at the University of Oxford has shown that images—desktop wallpapers, ads, fancy PDFs, social media posts—can be implanted with messages invisible to the human eye but capable of controlling agents and inviting hackers into your computer.


On supporting science journalism

If you’re enjoying this article, consider supporting our award-winning journalism by subscribing. By purchasing a subscription you are helping to ensure the future of impactful stories about the discoveries and ideas shaping our world today.


For instance, an altered “picture of Taylor Swift on Twitter could be sufficient to trigger the agent on someone’s computer to act maliciously,” says the new study’s co-author Yarin Gal, an associate professor of machine learning at Oxford. Any sabotaged image “can actually trigger a computer to retweet that image and then do something malicious, like send all your passwords. That means that the next person who sees your Twitter feed and happens to have an agent running will have their computer poisoned as well. Now their computer will also retweet that image and share their passwords.”

Before you begin scrubbing your computer of your favorite photographs, keep in mind that the new study shows that altered images are a potential way to compromise your computer—there are no known reports of it happening yet, outside of an experimental setting. And of course the Taylor Swift wallpaper example is purely arbitrary; a sabotaged image could feature any celebrity—or a sunset, kitten or abstract pattern. Furthermore, if you’re not using an AI agent, this kind of attack will do nothing. But the new finding clearly shows the danger is real, and the study is intended to alert AI agent users and developers now, as AI agent technology continues to accelerate. “They have to be very aware of these vulnerabilities, which is why we’re publishing this paper—because the hope is that people will actually see this is a vulnerability and then be a bit more sensible in the way they deploy their agentic system,” says study co-author Philip Torr.

Now that you’ve been reassured, let’s return to the compromised wallpaper. To the human eye, it would look utterly normal. But it contains certain pixels that have been modified according to how the large language model (the AI system powering the targeted agent) processes visual data. For this reason, agents built with AI systems that are open-source—that allow users to see the underlying code and modify it for their own purposes—are most vulnerable. Anyone who wants to insert a malicious patch can evaluate exactly how the AI processes visual data. “We have to have access to the language model that is used inside the agent so we can design an attack that works for multiple open-source models,” says Lukas Aichberger, the new study’s lead author.

By using an open-source model, Aichberger and his team showed exactly how images could easily be manipulated to convey bad orders. Whereas human users saw, for example, their favorite celebrity, the computer saw a command to share their personal data. “Basically, we adjust lots of pixels ever-so-slightly so that when a model sees the image, it produces the desired output,” says study co-author Alasdair Paren.

If this sounds mystifying, that’s because you process visual information like a human. When you look at a photograph of a dog, your brain notices the floppy ears, wet nose and long whiskers. But the computer breaks the picture down into pixels and represents each dot of color as a number, and then it looks for patterns: first simple edges, then textures such as fur, then an ear’s outline and clustered lines that depict whiskers. That’s how it decides This is a dog, not a cat. But because the computer relies on numbers, if someone changes just a few of them—tweaking pixels in a way too small for human eyes to notice—it still catches the change, and this can throw off the numerical patterns. Suddenly the computer’s math says the whiskers and ears match its cat pattern better, and it mislabels the picture, even though to us, it still looks like a dog. Just as adjusting the pixels can make a computer see a cat rather than a dog, it can also make a celebrity photograph resemble a malicious message to the computer.

Back to Swift. While you’re contemplating her talent and charisma, your AI agent is determining how to carry out the cleanup task you assigned it. First, it takes a screenshot. Because agents can’t directly see your computer screen, they have to repeatedly take screenshots and rapidly analyze them to figure out what to click on and what to move on your desktop. But when the agent processes the screenshot, organizing pixels into forms it recognizes (files, folders, menu bars, pointer), it also picks up the malicious command code hidden in the wallpaper.

Now why does the new study pay special attention to wallpapers? The agent can only be tricked by what it can see—and when it takes screenshots to see your desktop, the background image sits there all day like a welcome mat. The researchers found that as long as that tiny patch of altered pixels was somewhere in frame, the agent saw the command and veered off course. The hidden command even survived resizing and compression, like a secret message that’s still legible when photocopied.

And the message encoded in the pixels can be very short—just enough to have the agent open a specific website. “On this website you can have additional attacks encoded in another malicious image, and this additional image can then trigger another set of actions that the agent executes, so you basically can spin this multiple times and let the agent go to different websites that you designed that then basically encode different attacks,” Aichberger says.

The team hopes its research will help developers prepare safeguards before AI agents become more widespread. “This is the first step towards thinking about defense mechanisms because once we understand how we can actually make [the attack] stronger, we can go back and retrain these models with these stronger patches to make them robust. That would be a layer of defense,” says Adel Bibi, another co-author on the study. And even if the attacks are designed to target open-source AI systems, companies with closed-source models could still be vulnerable. “A lot of companies want security through obscurity,” Paren says. “But unless we know how these systems work, it’s difficult to point out the vulnerabilities in them.”

Gal believes AI agents will become common within the next two years. “People are rushing to deploy [the technology] before we know that it’s actually secure,” he says. Ultimately the team hopes to encourage developers to make agents that can protect themselves and refuse to take orders from anything on-screen—even your favorite pop star.

WordPress Plugins

LEAVE A REPLY

Please enter your comment!
Please enter your name here