Pay attention to Google’s latest Threat Intelligence Group report. It documents what appears to be the first case of a threat actor using AI to create a zero-day exploit that reached the operational stage.
The target was a logic error in a popular open-source web administration tool. The bug involved a hardcoded trust assumption that allowed 2FA to be bypassed if the attacker already had valid credentials.
Traditional scanners miss these kinds of issues because the code executes exactly as written. The problem sits in the gap between the code and the security assumption it was meant to enforce. Frontier models seem particularly good at spotting these because they can reason about what the code is trying to achieve.
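To make this class of bug concrete, here is a minimal sketch of a hardcoded trust assumption in a login path. It is not the actual vulnerability from the report, whose details are not given here. Every name in it (`LoginRequest`, `client_hint`, `"local-cli"`, the in-memory stores) is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stores, for illustration only.
PASSWORDS = {"admin": "correct horse"}
VALID_OTPS = {"admin": "123456"}


@dataclass
class LoginRequest:
    username: str
    password: str
    otp: Optional[str] = None
    client_hint: str = ""  # attacker-controlled value, e.g. taken from a header


def password_ok(req: LoginRequest) -> bool:
    # Simplified comparison; a real system would verify a salted hash.
    return PASSWORDS.get(req.username) == req.password


def otp_ok(req: LoginRequest) -> bool:
    return req.otp is not None and VALID_OTPS.get(req.username) == req.otp


def login_flawed(req: LoginRequest) -> bool:
    if not password_ok(req):
        return False
    # Hardcoded trust assumption: this client type is assumed to be
    # "already trusted", so the second factor is skipped. The code runs
    # exactly as written, yet the intended invariant (no session without
    # a second factor) is broken for anyone holding valid credentials.
    if req.client_hint == "local-cli":
        return True
    return otp_ok(req)


def login_fixed(req: LoginRequest) -> bool:
    # The invariant is enforced unconditionally on every path.
    return password_ok(req) and otp_ok(req)
```

Nothing in `login_flawed` crashes or matches a known-bad pattern, which is why a signature-based scanner has little to flag. The flaw only appears when the code is compared against the rule it was supposed to enforce.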
The AI origin showed in several tells. The Python script followed textbook conventions instead of the usual messy style. It included detailed docstrings explaining the code as if teaching a student. Most telling was the inclusion of a CVSS score for a vulnerability that had never been publicly reported.
Google caught the campaign in its preparation phase and worked with the vendor to address it.
The report also covers new malware that brings AI directly into execution. PROMPTSPY is an Android backdoor that captures the current UI state, sends it to Gemini, and performs the gestures the model recommends. This creates a loop where the AI effectively operates the phone.
The backdoor also resists removal. It places an overlay on the uninstall option and uses Firebase notifications to restart itself.
Similar techniques appear in state activity. Groups tied to North Korea are using AI at scale to review old vulnerabilities and produce exploits. Actors linked to China employ specific role prompts to get around model refusals when analyzing firmware. Russian operations use LLMs to create large volumes of believable code that bury the actual malicious functions.
Google has its own entries in this space. Big Sleep, from DeepMind and Project Zero, has demonstrated the ability to surface vulnerabilities that standard fuzzers overlook, including one that was about to be used in an attack.
The pattern across the report is consistent. AI is taking larger pieces of the attack chain, from discovery to obfuscation to autonomous operation. This does not remove humans from the picture. Target selection, campaign management, and profiting from access still require people.
What changes is the scale and speed that become possible. A group can test far more ideas with less effort per idea.
For those responsible for security, the implications point to specific actions. Review all authentication paths for hidden assumptions that an LLM might notice. Restrict and monitor Accessibility Services on managed Android devices. Pay close attention to any part of your development process that touches untrusted AI outputs or dependencies.
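As one rough way to approach the Accessibility Services point, the sketch below compares the services enabled on a device against an allowlist of package names. It assumes the input is the colon-separated value of the Android secure setting `enabled_accessibility_services`, which can be read on a test device with `adb shell settings get secure enabled_accessibility_services`. The function names and the allowlist are illustrative assumptions, and on a managed fleet the enforcement itself would normally come from device policy rather than a script like this.

```python
from typing import Iterable, List, Set


def parse_enabled_services(raw: str) -> Set[str]:
    """Split the colon-separated setting value into component names."""
    raw = raw.strip()
    # The settings tool prints "null" when the value is unset.
    if not raw or raw == "null":
        return set()
    return {part for part in raw.split(":") if part}


def unexpected_services(raw: str, allowed_packages: Iterable[str]) -> List[str]:
    """Return enabled services whose package is not on the allowlist."""
    allowed = set(allowed_packages)
    flagged = []
    for component in parse_enabled_services(raw):
        # A component name has the form "package/class".
        package = component.split("/", 1)[0]
        if package not in allowed:
            flagged.append(component)
    return sorted(flagged)


if __name__ == "__main__":
    # Example value only; the second entry is a made-up package.
    sample = (
        "com.google.android.marvin.talkback/"
        "com.google.android.marvin.talkback.TalkBackService:"
        "com.example.unknown/.OverlayService"
    )
    allowlist = {"com.google.android.marvin.talkback"}
    for component in unexpected_services(sample, allowlist):
        print("Unexpected accessibility service:", component)
```

Matching on the package name rather than the full component string is a deliberate simplification, since the class portion can appear in either long or short form.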
The report avoids grand declarations about the end of human hacking. Instead, it shows a practical progression where AI becomes a standard tool on both sides. The organizations that treat this as a signal to update their review processes and monitoring will maintain their position.

