We Hid Backdoors in 40MB Binaries and Challenged AI to Find Them

Can AI Detect Malware in Binary Executables?

Quesma partnered with reverse engineering experts to test whether modern AI agents can detect hidden backdoors in binary executables when they have no access to the source code.

The Experiment

The team, led by Michał "Redford" Kowalczyk from Dragon Sector (famous for finding malicious code in Polish trains), created a benchmark called BinaryAudit. They:

  • Took open-source projects (lighttpd, dnsmasq, Dropbear, Sozu)
  • Manually injected hidden backdoors
  • Compiled the projects to binaries and stripped the symbols
  • Asked AI agents to detect the backdoors

Surprising Results

The results were unexpected:

  • 49% success rate - The best model (Claude Opus 4.6) found relatively obvious backdoors in small and mid-size binaries only about half the time
  • High false positives - Most models flagged clean binaries as malicious
  • Not production-ready - Current AI lacks reliable binary analysis capabilities
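
The two headline numbers above, a success rate and a false-positive rate, can be made concrete with a little arithmetic. The sketch below assumes each run reduces to a yes/no verdict per binary; the `Verdict` record, the function names, and the sample data are hypothetical and are not taken from BinaryAudit, whose actual scoring may be stricter (for example, requiring the agent to locate the backdoor).

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One agent verdict on one binary (hypothetical record layout)."""
    binary: str
    has_backdoor: bool  # ground truth: was a backdoor injected?
    flagged: bool       # agent's answer: did it report a backdoor?


def detection_rate(verdicts):
    """Share of backdoored binaries that the agent flagged."""
    backdoored = [v for v in verdicts if v.has_backdoor]
    if not backdoored:
        return 0.0
    return sum(v.flagged for v in backdoored) / len(backdoored)


def false_positive_rate(verdicts):
    """Share of clean binaries that the agent wrongly flagged."""
    clean = [v for v in verdicts if not v.has_backdoor]
    if not clean:
        return 0.0
    return sum(v.flagged for v in clean) / len(clean)


# Made-up verdicts for illustration only; this is not benchmark data.
SAMPLE = [
    Verdict("server-a", has_backdoor=True, flagged=True),
    Verdict("server-b", has_backdoor=True, flagged=False),
    Verdict("server-c", has_backdoor=False, flagged=True),
    Verdict("server-d", has_backdoor=False, flagged=False),
]

if __name__ == "__main__":
    print(f"detection rate:      {detection_rate(SAMPLE):.0%}")
    print(f"false-positive rate: {false_positive_rate(SAMPLE):.0%}")
```

Reporting the two rates side by side matters because a model that flags every binary would score a perfect detection rate while being useless in practice.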

Why This Matters

This research comes at a critical time. Recent security incidents show the urgency:

  • Shai Hulud 2.0 - Supply chain attack compromising thousands of organizations
  • Notepad++ hijack - State-sponsored actors replaced legitimate binaries
  • Chinese solar inverters - Hidden radios found in critical infrastructure
  • Firmware backdoors - Network routers with hidden admin passwords

What This Means for Developers

While AI shows promise in binary analysis, it's not ready to replace human experts. The gap between source code analysis and binary analysis is vast: compilers optimize code, discard meaningful names, and reorder instructions.

However, as AI models improve, they could become valuable tools for security researchers, helping to:

  • Prioritize which binaries to analyze manually
  • Identify obvious patterns in malicious code
  • Assist in reverse engineering workflows
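
As a rough illustration of what prioritizing binaries for manual analysis could look like, here is a minimal sketch that ranks binaries by how many suspicious printable strings they contain, in the spirit of the Unix `strings` tool. The keyword list and function names are hypothetical, and this is a naive heuristic rather than a description of how the benchmark's agents work; a careful attacker can avoid telltale strings entirely.

```python
import re

# Hypothetical keyword list; a real triage would tune this per target.
SUSPICIOUS = (b"/bin/sh", b"system", b"popen", b"execve")


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}"
    return re.findall(pattern, data)


def suspicion_score(data):
    """Count keyword hits across all printable strings in the binary."""
    return sum(
        1
        for s in printable_strings(data)
        for keyword in SUSPICIOUS
        if keyword in s
    )


def rank(binaries):
    """Sort binary names from most to least suspicious.

    `binaries` maps a name to the raw bytes of that file.
    """
    return sorted(
        binaries,
        key=lambda name: suspicion_score(binaries[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real files, for illustration only.
    samples = {
        "clean": b"\x00\x01hello world\x00\xff",
        "odd": b"\x7fELF\x00/bin/sh -c\x00system\x00",
    }
    for name in rank(samples):
        print(name, suspicion_score(samples[name]))
```

A ranking like this only decides where a human or an AI agent looks first; it says nothing about whether a given hit is actually malicious.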

The benchmark is open source at quesmaOrg/BinaryAudit.

Source

This article is based on the original research from Quesma Blog (via Hacker News).
