Every security researcher builds their instinct the same way: by reading. CVE write-ups, exploit proofs-of-concept, CTF solutions, disclosure reports, the source of things that broke and the patches that fixed them. It takes years – and no single human can read it all.
A frontier language model has read essentially all of it. Every public CVE, every Metasploit module, every "here's how I popped this box" blog post, every academic paper on a new class of memory corruption. The entire written history of how software breaks sits in the training data – and unlike a human, the model doesn't forget the boring advisory from 2011 it skimmed once.
For years the story about AI and code was "it can write code." The more uncomfortable story of 2025/2026 is the other half: it can break code. And it is starting to do the one thing that used to separate elite offensive researchers from everyone else – combining unrelated weaknesses into a working chain – faster than any human team.
This is not a hypothetical. Let's look at what actually happened.
It finds real, previously-unknown vulnerabilities now
Not "spots a bug in a homework exercise." Real zero-days, in software you depend on.
- Big Sleep / SQLite (late 2024). Google's Big Sleep – an LLM agent from DeepMind and Project Zero – found an exploitable stack buffer underflow in SQLite. Project Zero called it the first public example of an AI agent finding a previously unknown, exploitable memory-safety issue in widely used real-world software. Traditional fuzzing (including SQLite's own harness) had missed it.
- o3 / Linux kernel (May 2025). Researcher Sean Heelan pointed OpenAI's o3 reasoning model at the Linux kernel's SMB implementation (ksmbd) and it found a genuine remote zero-day, now tracked as CVE-2025-37899 – no fuzzer, no special harness, just the model reading code and reasoning about it.
- The generalization (April 2026). Google's Threat Intelligence Group states it plainly: general-purpose frontier models can excel at vulnerability discovery even without being purpose-built for it, can help generate functional exploits, and are already being used this way by real threat actors marketing the capability on underground forums.
The real danger isn't finding one hole – it's combining them
Finding a single bug was never the hard part. The craft that took years to build was chaining: turning a "low severity" information leak plus a "medium" overflow plus a misconfiguration into one path to remote code execution. That is exactly the skill AI is now demonstrating.
- Teams of agents that chain across bug classes (June 2024). The academic system HPTSA uses a supervisor agent that explores a target and orchestrates specialist sub-agents, each focused on one vulnerability class. On real-world, zero-day web vulnerabilities past the model's knowledge cutoff and with no description of the flaw provided, it hit a 53% success rate over five attempts – within 1.4× of an agent that was handed the answer, while off-the-shelf vulnerability scanners scored 0%. The chaining across many vulnerability types is the whole point of the architecture.
- From CVE line-item to working exploit (April 2024). In an earlier study, GPT-4 given nothing but the public CVE description built working exploits for 87% of the one-day vulnerabilities tested. Think about what that automates: the window between a vulnerability being disclosed and you patching it – where most real breaches actually live – shrinks from "weeks, and only if someone bothers" to "as fast as an agent can read the advisory."
- The severity model is collapsing (April 2026). The most important sentence in Google's 2026 threat report: AI agents can chain multiple low-level vulnerabilities together, collapsing the practical distinction between "remote code execution" and "seemingly benign, local-only" bugs. Translation: every medium- and low-severity issue you deprioritized is now a rung on a ladder an agent can climb.
This is the shift. The 2024 result that a GPT-4 agent could autonomously exploit 73% of the vulnerabilities in a set of sandboxed test websites, without being told what the flaw was, was a warning shot. The chaining is the actual weapon.
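To make the chaining idea concrete, here is a minimal sketch that treats each finding as a step from one attacker capability to another and searches for a path from "can reach the service" to "runs code as root". Everything in it is an assumption for illustration: the `Finding` fields, the finding names, the severities, and the capability labels are invented, and real attack-graph tooling models far more than this.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One vulnerability, modeled as a step from one attacker capability to another."""
    name: str
    severity: str  # label assigned at triage time
    requires: str  # capability the attacker must already hold
    grants: str    # capability gained by exploiting the finding


# Hypothetical findings; every name and label here is invented.
FINDINGS = [
    Finding("verbose-error-leak", "low", "network-access", "memory-layout-known"),
    Finding("parser-overflow", "medium", "memory-layout-known", "code-exec-as-service"),
    Finding("writable-cron-dir", "low", "code-exec-as-service", "code-exec-as-root"),
    Finding("debug-endpoint", "low", "internal-network", "config-read"),
]


def find_chain(findings, start, goal):
    """Breadth-first search for the shortest chain of findings from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, path = queue.popleft()
        if capability == goal:
            return path
        for f in findings:
            if f.requires == capability and f.grants not in seen:
                seen.add(f.grants)
                queue.append((f.grants, path + [f]))
    return None  # no chain connects start to goal


chain = find_chain(FINDINGS, "network-access", "code-exec-as-root")
chain_names = [f.name for f in chain] if chain else []
print(" -> ".join(chain_names))
```

None of the three findings on that path is rated above "medium", yet together they reach root. The fourth finding sits on no path from the starting capability, which is the kind of distinction a severity score alone cannot show.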
It scales, and it lowers the floor
Two things happen at once: expert capability gets cheaper, and it gets available to people who never had it.
- Autonomous, at superhuman volume. XBOW's autonomous, AI-powered pen-testing system became the first non-human to reach #1 on HackerOne's US leaderboard (June 2025), submitting over a thousand vulnerability reports in a matter of months – out-producing thousands of human hackers.
- The floor drops. In August 2025 Anthropic disrupted an actor who used Claude Code as an active operational participant – not an advisor – in an extortion campaign against at least 17 organizations across healthcare, emergency services, and government, letting the AI decide which data to exfiltrate and how to word the extortion. Separately, a criminal with few real skills used Claude to build and sell ransomware for $400–$1,200; Anthropic notes the person could not have implemented the core malware without the AI.
The uncomfortable summary of this section: the knowledge that used to gate entry to serious offensive work is now a commodity, on tap, and tireless.
The sober part: a lot of it is still noise
If you stop reading here you'll over-correct into panic, so: the hype is running ahead of the measured reality, and that matters.
- Fully autonomous exploitation is still modest. On CVE-Bench – 40 real, critical-severity CVEs in a sandbox – state-of-the-art agents exploited at most 13% unaided (March 2025). Impressive trajectory; not a supervillain.
- The researchers are honest about limits. The Big Sleep team explicitly called their result highly experimental and noted a target-specific fuzzer would likely have been at least as effective.
- "AI slop" is a real tax. curl maintainer Daniel Stenberg has been vocal about the flood of low-quality, AI-generated bug reports wasting maintainers' time – false positives at scale. (Tellingly, the same maintainer also credits AI with helping fix dozens of genuine bugs. It cuts both ways.)
- Some headline claims are contested. Anthropic's later (November 2025) report of a largely AI-orchestrated espionage campaign was met with pointed skepticism from parts of the security community, who argued it relied on commodity tooling and no novel technique.
So: not an unstoppable machine. But "mostly noise today" is thin comfort when the signal is compounding and the trend line only points one way.
Defenders get the exact same superpower
The good news is that none of this is available only to attackers. The same capability, pointed the other way, is already paying off:
- Catching bugs before the attackers use them. In July 2025, Big Sleep – combined with Google Threat Intelligence signals – identified a critical SQLite flaw (CVE-2025-6965) that was known only to threat actors and about to be exploited, and it was cut off first. Google calls it the first time an AI agent directly foiled an in-the-wild exploitation effort.
- Find-and-fix at machine speed and cost. In DARPA's AI Cyber Challenge final (August 2025), autonomous systems found 54 of 70 planted vulnerabilities and 18 real, previously-unknown ones across 54 million lines of code – at an average cost of roughly $152 per task and an average of 45 minutes per patch.
- Eliminating whole bug classes. Google DeepMind's CodeMender upstreamed 72 security fixes to open-source projects in about six months (some codebases up to 4.5M lines), and rewrote parts of the libwebp image library with bounds-safety annotations that would have neutralized the zero-click CVE-2023-4863 exploit – and most future overflows in that code.
But the arms race is asymmetric. An attacker needs one working chain; a defender has to close all of them. AI multiplies both sides – and the side with more surface area to defend feels the multiplication more.
What this actually means for you
Skip the panic; adjust the model. Concretely:
- Treat patch velocity as a security control, not hygiene. If an agent can turn a published patch diff into a working exploit in minutes, your n-day exposure window is now measured in hours, not weeks. The old bet – "attackers probably won't get around to us before we patch" – no longer holds.
- Stop dismissing "low" and "medium." Chaining is the game now. A local-only info leak plus a benign-looking overflow is a remote-code-execution incident waiting for an agent to assemble it. Prioritize by chainability, not just CVSS.
- Shrink what you can't defend. Fewer dependencies, least privilege, real network segmentation – so a single foothold doesn't chain to everything. This is the same lesson as the lethal trifecta in AI agents: reduce the blast radius before you worry about the exploit.
- Point the tools at your own code first. An autonomous agent will read your codebase eventually – better that it's yours. AI-assisted review, fuzzing, and dependency scanning in CI are no longer optional extras.
- Prefer secure-by-design over patch-after. Memory-safe languages, bounds-safety, strict input validation: the bug classes AI is best at finding are exactly the ones a secure-by-default architecture eliminates wholesale.
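Treating patch velocity as a control means measuring it. The sketch below is one way to do that, under loud assumptions: the 24-hour limit is a made-up policy, the record layout (`id`, `disclosed_at`, `patched_at`) is hypothetical, and the advisory IDs are placeholders rather than real advisories. It computes how long each disclosed issue stayed (or still is) unpatched and lists the ones over the limit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: longest time a disclosed vulnerability may stay unpatched.
MAX_EXPOSURE = timedelta(hours=24)


def exposure_window(disclosed_at, patched_at, now):
    """Time exposed after public disclosure; still-open issues are measured up to now."""
    end = patched_at if patched_at is not None else now
    return max(end - disclosed_at, timedelta(0))


def overdue(records, now, limit=MAX_EXPOSURE):
    """Return (id, exposure) for records whose exposure exceeds the limit, longest first."""
    windows = [
        (r["id"], exposure_window(r["disclosed_at"], r["patched_at"], now))
        for r in records
    ]
    late = [item for item in windows if item[1] > limit]
    return sorted(late, key=lambda item: item[1], reverse=True)


def utc(*args):
    """Small helper: build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Invented example data.
NOW = utc(2026, 1, 10, 12, 0)
RECORDS = [
    {"id": "ADV-A", "disclosed_at": utc(2026, 1, 9, 18, 0), "patched_at": utc(2026, 1, 10, 2, 0)},
    {"id": "ADV-B", "disclosed_at": utc(2026, 1, 7, 12, 0), "patched_at": None},
    {"id": "ADV-C", "disclosed_at": utc(2026, 1, 5, 0, 0), "patched_at": utc(2026, 1, 7, 0, 0)},
]

for adv_id, window in overdue(RECORDS, NOW):
    print(adv_id, window)
```

Here ADV-A was patched eight hours after disclosure and passes. ADV-B is still open after three days and ADV-C took two days, so both are flagged. The right limit depends on your environment, but tracking the number at all is what turns patching from hygiene into a control.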
Conclusion
For decades, a lot of security quietly rested on one assumption: that most attackers wouldn't find the obscure bug, wouldn't bother chaining the small ones, wouldn't read all 40,000 lines. That assumption is gone. They will now – or rather, their AI will, tirelessly, having already read every exploit anyone ever published.
The same sentence, read the other way, is the reason not to panic: the capability that finds your bugs can also fix them, and it's available to you today. AI hasn't asked whether it can change your threat model. It already has. The only open question is whether you put it to work on your own code before someone else points it at you.
Sources
All incidents and figures are dated 2024–2026. This is a fast-moving field – verify against the primary source before relying on any single number.
- Google Project Zero – From Naptime to Big Sleep (Oct 2024)
- Sean Heelan – How I used o3 to find CVE-2025-37899, a Linux kernel SMB zero-day (May 2025)
- Google Cloud / GTIG – Defending enterprise AI: vulnerabilities & exploit chaining (Apr 2026)
- arXiv:2406.01637 – Teams of LLM Agents (HPTSA) can Exploit Zero-Day Vulnerabilities (Jun 2024)
- arXiv:2404.08144 – LLM Agents can Autonomously Exploit One-Day Vulnerabilities (Apr 2024)
- arXiv:2402.06664 – LLM Agents can Autonomously Hack Websites (Feb 2024)
- XBOW – How we reached #1 on the HackerOne US leaderboard (2025)
- Dark Reading – AI-Based Pen Tester Becomes Top Bug Hunter on HackerOne (2025)
- Anthropic – Detecting and countering misuse of AI (Aug 2025)
- arXiv:2503.17332 – CVE-Bench: exploiting real-world web vulnerabilities (Mar 2025)
- Cybernews – curl maintainer Daniel Stenberg on AI: from "AI slop" to fixing dozens of bugs
- BleepingComputer – Anthropic’s claims of AI-automated cyberattacks met with doubt (Nov 2025)
- Google Cloud – Our Big Sleep agent makes a big leap: catching CVE-2025-6965 (Jul 2025)
- NVD – CVE-2025-6965
- DARPA – AI Cyber Challenge (AIxCC) final results (Aug 2025)
- Google DeepMind – Introducing CodeMender, an AI agent for code security (Oct 2025)
