Can AI Detect Malware in Binary Executables?
In a groundbreaking experiment, Quesma partnered with reverse engineering experts to test whether modern AI agents can detect hidden backdoors in binary executables, without access to source code.
The Experiment
The team, led by Michał "Redford" Kowalczyk from Dragon Sector (famous for finding malicious code in Polish trains), created a benchmark called BinaryAudit. They:
- Took open-source projects (lighttpd, dnsmasq, Dropbear, Sozu)
- Manually injected hidden backdoors
- Stripped symbols and compiled to binaries
- Asked AI agents to detect the backdoors
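The preparation steps above can be sketched as a small pipeline. Everything specific below (the repository URL, patch file name, build flags, and binary path) is an illustrative assumption, not the benchmark's actual recipe:

```python
# Hypothetical sketch of how one benchmark target might be prepared.
# All concrete names (repo URL, patch file, flags, paths) are invented
# for illustration; BinaryAudit's real recipe may differ.
def preparation_commands(project_url: str, name: str, patch: str) -> list[str]:
    return [
        f"git clone {project_url} {name}",   # take an open-source project
        f"git -C {name} apply {patch}",      # inject a hidden backdoor
        f"make -C {name} CFLAGS=-O2",        # compile to a binary
        f"strip {name}/src/{name}",          # strip symbols
    ]

# Dry run: print the commands rather than executing them.
for cmd in preparation_commands(
        "https://github.com/lighttpd/lighttpd1.4", "lighttpd", "backdoor.patch"):
    print(cmd)
```

The AI agent would then be handed only the final stripped binary, with no patch, source tree, or symbol information.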
Surprising Results
The results were unexpected:
- 49% success rate - The best model (Claude Opus 4.6) found obvious backdoors in small and mid-size binaries
- High false positives - Most models flagged clean binaries as malicious
- Not production-ready - Current AI lacks reliable binary analysis capabilities
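A quick back-of-the-envelope calculation shows why these numbers fall short of production use. The counts below are invented for illustration (only the 49% detection rate comes from the article; the false-positive figure is an assumption):

```python
# Hypothetical confusion matrix: 100 backdoored and 100 clean binaries.
# The 49% detection rate is from the reported results; the false-positive
# count (40) is an invented assumption for illustration.
tp, fn = 49, 51   # backdoored binaries: detected vs. missed
fp, tn = 40, 60   # clean binaries: wrongly flagged vs. correctly passed

precision = tp / (tp + fp)   # how many alerts are real
recall = tp / (tp + fn)      # how many backdoors are caught
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.55 recall=0.49
```

With numbers in this range, roughly half of all backdoors slip through while almost half of the alerts are noise, which is why human review remains essential.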
Why This Matters
This research comes at a critical time. Recent security incidents show the urgency:
- Shai Hulud 2.0 - Supply chain attack compromising thousands of organizations
- Notepad++ hijack - State-sponsored actors replaced legitimate binaries
- Chinese solar inverters - Hidden radios found in critical infrastructure
- Firmware backdoors - Network routers with hidden admin passwords
What This Means for Developers
While AI shows promise in binary analysis, it's not ready to replace human experts. The gap between source code and binary analysis is vast: compilers optimize code, strip meaningful names, and reorder instructions.
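A rough illustration of that gap: once symbols are stripped, a disassembler can no longer recover function names and falls back to placeholder labels derived from addresses (the `sub_` style IDA uses). The function names and addresses below are invented:

```python
# What an analyst sees WITH symbols (names and addresses are invented):
with_symbols = {0x401130: "check_password", 0x4011a0: "activate_backdoor"}

# After stripping, tools can only synthesize placeholder names from the
# function's address, so the telltale names above are simply gone.
stripped = {addr: f"sub_{addr:x}" for addr in with_symbols}

for addr, name in stripped.items():
    print(f"{addr:#x}: {name}")
# 0x401130: sub_401130
# 0x4011a0: sub_4011a0
```

The suspicious intent that a name like `activate_backdoor` would reveal in source code must instead be reconstructed from the instructions themselves, which is exactly the part current models struggle with.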
However, as AI models improve, they could become valuable tools for security researchers, helping to:
- Prioritize which binaries to analyze manually
- Identify obvious patterns in malicious code
- Assist in reverse engineering workflows
The benchmark is open source at quesmaOrg/BinaryAudit.
Source
This article is based on the original research from Quesma Blog (via Hacker News).