Microsoft’s multi-agent AI system tops Anthropic’s Mythos on cybersecurity benchmark

CyberGym benchmark scores over time, showing the rapid improvement in AI vulnerability discovery capabilities. Microsoft’s multi-model MDASH system (top right) tops the leaderboard at 88.45%. (CyberGym / UC Berkeley)

Mythos has been MDASH’d.

A new AI-powered system from Microsoft surpassed a headline-grabbing rival from Anthropic on a leading cybersecurity benchmark, using more than 100 specialized AI agents working together across multiple AI models to find real-world software vulnerabilities.

Microsoft’s system, codenamed MDASH, was introduced this week alongside the disclosure of 16 new vulnerabilities it found in different versions of Windows, including four “critical” remote code execution flaws fixed in this month’s Patch Tuesday release.

The company, which has faced persistent criticism over security lapses, is betting that multiple models can discover vulnerabilities at a pace that individual models can’t match.

MDASH, derived from the term “multi-model agentic scanning harness,” works by running specialized AI agents through a staged pipeline. Different agents scan code for potential vulnerabilities, then a separate set of agents debate whether each finding is real and exploitable, and a final stage constructs proof-of-concept attacks to confirm the bugs exist.
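The three stages described above can be sketched in a few lines of Python. This is only an illustration of the general scan, debate, confirm pattern, not Microsoft’s implementation: the `Finding` type, the agent signatures, the toy `strcpy_scanner` and the use of a simple majority vote to stand in for the “debate” stage are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability (hypothetical structure for this sketch)."""
    location: str
    description: str


# Hypothetical agent signatures; real agents would call AI models.
Scanner = Callable[[str], List[Finding]]   # code -> candidate findings
Reviewer = Callable[[Finding], bool]       # finding -> "looks real and exploitable?"
Prover = Callable[[Finding], bool]         # finding -> "proof of concept triggered it?"


def run_pipeline(code: str, scanners: List[Scanner],
                 reviewers: List[Reviewer], prover: Prover) -> List[Finding]:
    # Stage 1: every scanning agent proposes candidate findings.
    candidates = [f for scan in scanners for f in scan(code)]

    # Stage 2: reviewers vote on each candidate. A strict majority vote is
    # an assumption standing in for the "debate" the article describes.
    plausible = [
        f for f in candidates
        if sum(review(f) for review in reviewers) * 2 > len(reviewers)
    ]

    # Stage 3: keep only findings confirmed by a proof-of-concept attempt.
    return [f for f in plausible if prover(f)]


def strcpy_scanner(code: str) -> List[Finding]:
    """Toy scanner: flags every line that calls strcpy."""
    return [
        Finding(f"line {i}", "unbounded strcpy")
        for i, line in enumerate(code.splitlines(), 1)
        if "strcpy(" in line
    ]


if __name__ == "__main__":
    sample = "strcpy(dst, src);\nputs(dst);"
    reviewers = [lambda f: True, lambda f: "strcpy" in f.description, lambda f: False]
    confirmed = run_pipeline(sample, [strcpy_scanner], reviewers, lambda f: True)
    print(confirmed)
```

The point of the staging is that each step filters the previous one: a noisy scanner is tolerable if later stages discard findings that reviewers reject or that no proof of concept can trigger.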

By comparison, Anthropic’s Mythos, which raised concerns over its ability to find and exploit software vulnerabilities when it was previewed earlier this year, is a single AI model running inside an agent framework. Anthropic restricted its release to a handful of companies through a consortium called Project Glasswing, which includes Microsoft.

OpenAI’s GPT-5.5 and others on the leaderboard are also single-model systems.

MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects.

Mythos Preview was second at 83.1%, followed by GPT-5.5 at 81.8%.

The benchmark gives each system a description of a known vulnerability and an unpatched codebase, and measures whether it can produce a working attack that triggers the bug.
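As a rough illustration of that measurement, the sketch below scores a system as the percentage of tasks on which its generated attack triggers the bug. It is not the CyberGym harness: the `Task` type and the `make_poc` and `triggers_bug` callables are hypothetical stand-ins, and treating the score as a plain success fraction over all tasks is an assumption.

```python
from typing import Callable, List, NamedTuple


class Task(NamedTuple):
    """One benchmark task (hypothetical structure for this sketch)."""
    description: str   # description of the known vulnerability
    codebase: str      # identifier for the unpatched codebase


def success_rate(tasks: List[Task],
                 make_poc: Callable[[Task], str],
                 triggers_bug: Callable[[Task, str], bool]) -> float:
    """Percentage of tasks where the system's attack triggers the bug."""
    if not tasks:
        raise ValueError("no tasks to score")
    # For each task, ask the system for an attack, then check it.
    results = [triggers_bug(task, make_poc(task)) for task in tasks]
    return 100.0 * sum(results) / len(results)
```

Under that assumption, a score of 88.45% over 1,507 tasks would correspond to roughly 1,333 successful reproductions.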

The scores on the CyberGym leaderboard are self-reported by the companies, including Anthropic’s Mythos result. The benchmark code is public, but no independent party has verified any of the scores. Also, benchmark results don’t necessarily reflect real-world performance.

The results also highlight growing concerns about AI’s use as an offensive hacking tool. The same capabilities that let defenders find vulnerabilities can be used by attackers to discover and exploit them. Microsoft said MDASH is being used internally by its security engineering teams and will be entering a limited private preview with customers.

Microsoft is telling customers to expect bigger Patch Tuesdays going forward as AI accelerates the discovery of vulnerabilities.
