- Apr 03, 2026
- 1 min read
Google DeepMind Researchers Map Out Ways Hackers Hijack AI Agents
Google DeepMind researchers have released a paper detailing how autonomous AI agents can be hijacked.

Photo credit: NorthSky Films / Shutterstock.com
Google DeepMind researchers have released a paper detailing how autonomous AI agents can be hijacked, warning that the internet can be weaponized against agentic systems.
The paper, entitled AI Agent Traps, argues that the open internet can be a threat to AI systems designed to browse and act independently online. Individuals and companies are adopting AI agents for a wide range of administrative tasks, such as making transactions and managing emails. Unlike traditional software, agents interpret messy, untrusted content at scale, making them vulnerable to manipulation.
The study explains in its abstract:
As autonomous AI agents increasingly navigate the web, they face a novel challenge: the information environment itself. This gives rise to a critical vulnerability we refer to as ‘AI Agent Traps’, i.e. adversarial content designed to manipulate, deceive, or exploit visiting agents. … By mapping this new attack surface, we identify critical gaps in current defences and propose a research agenda that could secure the entire agent ecosystem.
The paper identifies six main attack types. Content injection traps hide malicious instructions in code or metadata that the AI agent sees, but a human does not. Semantic manipulation affects the agent’s reasoning through persuasive language or misleading framing in a similar manner to how humans can be taken in by this language.
Cognitive state traps distort an agent’s memory, causing it to treat falsehoods as facts. Behavioral control traps directly override safeguards, forcing agents to leak sensitive data, with a high success rate. Systemic traps exploit multiple agents at a time to potentially trigger cascading problems. Finally, human-in-the-loop traps can trick the human users reviewing outputs into approving harmful actions.
The researchers recommend layered technical defences, including adversarial training, runtime content scanners, and preventative output monitoring. They also advise stricter standards for determining which content is AI-readable as well as reputation systems for website domains.
The study notes a gap in legal accountability as it is currently unclear where liability lies if an AI agent is manipulated into causing harm.
Relevant articles
What is Sumsub anyway?
Not everyone loves compliance—but we do. Sumsub helps businesses verify users, prevent fraud, and meet regulatory requirements anywhere in the world, without compromises. From neobanks to mobility apps, we make sure honest users get in, and bad actors stay out.




