AI has gotten good at finding bugs, not so good at swatting them

What good is finding a hole if you can’t fix it? Anthropic last week talked up Claude Code’s improved ability to find software vulnerabilities and propose patches. But security researchers say that’s not enough.

The AI biz describes the research preview capability, dubbed Claude Code Security, as a way for security teams to find and fix flaws they might otherwise have missed.

“This is a pivotal time for cybersecurity,” the company said. “We expect that a significant share of the world’s code will be scanned by AI in the near future, given how effective models have become at finding long-hidden bugs and security issues.”

To highlight Claude Code Security’s bug hunting potential, the company pointed to how its red team had used Claude Opus 4.6 to find “over 500 vulnerabilities in production open-source codebases.”

Guy Azari, a stealth startup founder who worked previously as a security researcher at Microsoft and Palo Alto Networks, told The Register, “Out of the 500 vulnerabilities that they reported, only two to three vulnerabilities were fixed. If they haven’t fixed them, it means that you haven’t done anything right.”

Azari pointed to the absence of Common Vulnerabilities and Exposures (CVE) assignments as evidence that the security process remains incomplete. Finding vulnerabilities was never the issue, he said, pointing to his time running vulnerability management at the Microsoft Security Response Center.

“We used to get the reports all day long,” he said. “When AI was introduced, it just multiplied by 100x or 200x and added a lot of noise because AI assumes that these are vulnerabilities, but there wasn’t like a unit that actually can show the real value or the real impact. And if it’s not there, you’re probably not gonna fix it.”

In 2025, according to Azari, the National Vulnerability Database had a backlog of roughly 30,000 CVE entries awaiting analysis, with nearly two-thirds of reported open source vulnerabilities lacking an NVD severity score. Open source maintainers are already overwhelmed, he said, pointing to the curl project’s closure of its bug bounty program to deter poorly crafted reports from AI and from people.

“The maintainers of curl closed their program two months ago or something like that because they just got too many false positives and they couldn’t deal with the load,” he explained. “So potentially what Claude did was not helping them out by bringing them more issues. But these issues are not validated, they are not concrete. It’s not a fix. It’s more like magnifying the collapse.”

Anthropic declined to comment on the record, but the company’s red team post indicates that its researchers are working with open source maintainers to address the identified vulnerabilities. So, further details about what was found may surface in time.

Feross Aboukhadijeh, CEO of security biz Socket, told The Register in an email that CVEs are just one part of the coordinated disclosure process, so CVE publication should not be expected immediately.

“We have no doubt that Anthropic’s team surfaced 500+ credible vulnerability candidates,” Aboukhadijeh said. “That tracks with what we’re seeing across the industry as models get increasingly good at detecting vulnerable code. Discovery is becoming dramatically cheaper as large models get increasingly good at exploring codebases and reasoning across components.

“The harder part isn’t finding issues anymore. It’s everything that happens after.

“Turning vulnerability candidates into validated, reproducible findings that maintainers and customers can actually act on takes time: confirming affected versions, assessing real-world impact, coordinating with maintainers, and developing patches that align with the project’s architecture.”

Aboukhadijeh expects that the spread of powerful, security-optimized AI tools will present security teams with a growing torrent of patches, upgrades, and emergency fixes to apply to codebases. Security, he predicts, will be constrained by maintainers’ ability to prioritize, test, and refactor.

One way Socket has tried to address this is through Certified Patches, which apply direct changes to existing dependencies rather than updating them to a patched version. Such updates risk introducing problems from library version compatibility conflicts.

“We are approaching a point where disclosures will soon outpace remediation capacity,” he said. “The competitive advantage will not belong to whoever can generate the most findings. It will belong to whoever can convert findings into safe, prioritized, low-disruption change.” ®
