We Hid Backdoors in 40MB Binaries and Challenged AI to Find Them

2026-02-23 · Alpha

Can AI Detect Malware in Binary Executables?

Quesma partnered with reverse engineering experts to test whether modern AI agents can detect hidden backdoors in binary executables without access to the source code.

The Experiment

The team, led by Michał "Redford" Kowalczyk from Dragon Sector (famous for finding malicious code in Polish trains), created a benchmark called BinaryAudit. They:

Took open-source projects (lighttpd, dnsmasq, Dropbear, Sozu)
Manually injected hidden backdoors
Compiled the projects and stripped symbols from the resulting binaries
Asked AI agents to detect the backdoors
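
The four steps above amount to a labeled detection task: each binary either contains a backdoor or does not, and the agent under test returns a verdict. The following is a minimal sketch of such a harness, not BinaryAudit's actual code. The `Task` and `Result` types, the file paths, and the `always_flag` stand-in detector are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    binary_path: str    # hypothetical path to a stripped binary
    has_backdoor: bool  # ground truth, known only to the benchmark


@dataclass
class Result:
    task: Task
    flagged: bool       # the detector's verdict for this binary


def run_benchmark(tasks: List[Task], detector: Callable[[str], bool]) -> List[Result]:
    """Run the detector on every binary and pair its verdict with the ground truth."""
    return [Result(task=t, flagged=detector(t.binary_path)) for t in tasks]


def always_flag(binary_path: str) -> bool:
    """Placeholder for an AI agent: it flags every binary as malicious."""
    return True


# Hypothetical task list: one clean build and one backdoored build.
tasks = [
    Task("builds/lighttpd_clean", has_backdoor=False),
    Task("builds/lighttpd_backdoored", has_backdoor=True),
]

results = run_benchmark(tasks, always_flag)
for r in results:
    verdict = "flagged" if r.flagged else "clean"
    print(f"{r.task.binary_path}: {verdict} (truth: backdoor={r.task.has_backdoor})")
```

Including clean binaries alongside backdoored ones is what lets a harness like this catch a detector that simply flags everything.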

Surprising Results

The headline findings:

49% success rate - Best model (Claude Opus 4.6) found obvious backdoors in small/mid-size binaries
High false positives - Most models flagged clean binaries as malicious
Not production-ready - Current AI lacks reliable binary analysis capabilities
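
Success rate and false positives measure different things, so it helps to compute them separately. The sketch below shows one common way to do that from labeled outcomes. It assumes the standard definitions of detection rate and false positive rate, which may differ from how the benchmark scores its tasks, and the sample data is invented purely for illustration.

```python
def detection_metrics(outcomes):
    """Compute rates from an iterable of (has_backdoor, flagged) boolean pairs."""
    tp = fp = tn = fn = 0
    for has_backdoor, flagged in outcomes:
        if has_backdoor and flagged:
            tp += 1  # backdoor present and found
        elif has_backdoor and not flagged:
            fn += 1  # backdoor present but missed
        elif not has_backdoor and flagged:
            fp += 1  # clean binary wrongly flagged
        else:
            tn += 1  # clean binary correctly passed

    backdoored = tp + fn
    clean = fp + tn
    return {
        "detection_rate": tp / backdoored if backdoored else 0.0,
        "false_positive_rate": fp / clean if clean else 0.0,
    }


# Invented outcomes, not results from the benchmark.
sample = [
    (True, True),    # found
    (True, False),   # missed
    (False, True),   # false alarm
    (False, False),  # correctly passed
    (False, False),  # correctly passed
    (False, True),   # false alarm
]

metrics = detection_metrics(sample)
print(metrics)  # {'detection_rate': 0.5, 'false_positive_rate': 0.5}
```

A detector that flags every binary would reach a detection rate of 1.0 here, but its false positive rate would also be 1.0, which is why the two figures are reported together.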

Why This Matters

This research comes at a critical time. Recent security incidents show the urgency:

Shai Hulud 2.0 - Supply chain attack compromising thousands of organizations
Notepad++ hijack - State-sponsored actors replaced legitimate binaries
Chinese solar inverters - Hidden radios found in critical infrastructure
Firmware backdoors - Network routers with hidden admin passwords

What This Means for Developers

While AI shows promise in binary analysis, it is not ready to replace human experts. Analyzing a binary is far harder than reviewing source code, because compilers optimize code, strip meaningful names, and reorder instructions.

However, as AI models improve, they could become valuable tools for security researchers, helping to:

Prioritize which binaries to analyze manually
Identify obvious patterns in malicious code
Assist in reverse engineering workflows
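
As a rough illustration of the first two points, the sketch below ranks binaries for manual review by counting printable strings that contain suspicious keywords. It is a deliberately naive heuristic, not a technique taken from the research. The keyword list, function names, and sample bytes are all assumptions, and a carefully hidden backdoor would not trip a check like this.

```python
import re

# Hypothetical keyword list; a real triage tool would need far better signals.
SUSPICIOUS_KEYWORDS = [b"/bin/sh", b"passwd", b"connect"]


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII, similar in spirit to the `strings` utility."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def triage_score(data: bytes) -> int:
    """Count (string, keyword) pairs where the keyword appears in the string."""
    return sum(
        1
        for s in printable_strings(data)
        for kw in SUSPICIOUS_KEYWORDS
        if kw in s
    )


def rank_for_review(binaries: dict) -> list:
    """Order binary names from highest to lowest score."""
    return sorted(binaries, key=lambda name: triage_score(binaries[name]), reverse=True)


# Invented byte blobs standing in for real executables.
samples = {
    "clean.bin": b"\x00\x01hello world\x00usage: prog [options]\x00",
    "odd.bin": b"\x00\x7fELF\x00/bin/sh\x00connect failed\x00",
}

ranking = rank_for_review(samples)
print(ranking)  # ['odd.bin', 'clean.bin']
```

A score like this can only suggest where a human analyst might look first. It says nothing about whether a binary is actually malicious.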

The benchmark is open source at quesmaOrg/BinaryAudit.

Source

This article is based on the original research from Quesma Blog (via Hacker News).