TestSprite Inc., a maker of autonomous artificial intelligence-powered software testing tools, today announced that it has open-sourced its command-line interface tool, which allows AI coding agents to verify their own work.
As the AI coding revolution has rolled in, autonomous coding tools have become smarter and enabled developers to prompt their way to entire applications overnight. The result is faster code, but at the same time, it means that the software can come off the digital assembly line with unseen bugs that may not be caught by unit tests run by agentic tools.
In too many cases, an AI agent reports a feature as complete even though some of the tests failed, weren’t written correctly, were incomplete or were simply skipped. Other times, a coding agent writes a function that appears to run correctly but hides a bug that triggers only in an edge case a customer will hit in particular circumstances (even 1 in 1,000 is too often). In the worst-case scenario, it breaks some other part of the codebase altogether.
“That’s exactly what’s driving developers crazy,” said founder and Chief Executive Yunhao Jiao. “You use AI, you ship something new, you fix one thing and then boom, another thing crashes. Even the best agent in our competition broke 12% of the features that already worked. That’s the gap a verifier closes.”
TestSprite said today’s release provides a command-line interface, run from the terminal, that gives coding agents a real quality assurance loop rather than a spot check.
The coding agent describes a behavior once. TestSprite then runs it in the cloud the way a real user might, driving a live browser or hitting a live application programming interface, never using mocks. It then returns a single, self-consistent failure report: the failing step and its neighbors, screenshots, a Document Object Model manifest, the test source, a root cause hypothesis and a recommended fix.
The AI coding agent can then read the data, fix the code and rerun.
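The read, fix and rerun cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FailureReport` fields mirror the evidence listed in the article, but the class, the `verify_loop` function and the stand-in callbacks are hypothetical names and do not reflect TestSprite’s actual CLI, schema or API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FailureReport:
    """Hypothetical container for the evidence listed in the article."""
    failing_step: str
    neighboring_steps: list[str]
    screenshots: list[str]
    dom_manifest: str
    test_source: str
    root_cause_hypothesis: str
    recommended_fix: str


def verify_loop(
    run_tests: Callable[[], Optional[FailureReport]],
    apply_fix: Callable[[FailureReport], None],
    max_runs: int = 5,
) -> tuple[bool, int]:
    """Run the tests, hand any failure to the agent, then rerun.

    Returns (passed, runs_used). Stops once a run comes back clean
    or the run budget is exhausted.
    """
    for run in range(1, max_runs + 1):
        report = run_tests()
        if report is None:
            return True, run
        if run < max_runs:
            # Only attempt a fix when another run remains to verify it.
            apply_fix(report)
    return False, max_runs


# Stand-in "application" with two hidden bugs (illustrative only).
state = {"bugs": 2}


def fake_run() -> Optional[FailureReport]:
    if state["bugs"] == 0:
        return None
    return FailureReport(
        failing_step="submit checkout form",
        neighboring_steps=["open cart", "confirm order"],
        screenshots=[],
        dom_manifest="",
        test_source="",
        root_cause_hypothesis="placeholder hypothesis",
        recommended_fix="placeholder fix",
    )


def fake_fix(report: FailureReport) -> None:
    state["bugs"] -= 1


passed, runs_used = verify_loop(fake_run, fake_fix)
print(passed, runs_used)
```

In this toy setup each fix removes one bug, so the third run is the first clean one. A real agent would read the report’s fields to decide what to change instead of decrementing a counter.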
This becomes the test loop. Every time the agent completes a phase of work, TestSprite adds dozens of new tests, so coverage grows alongside the codebase. That provides a safety net that catches gaps and regressions as the application grows more complex.
The TestSprite CLI is open source under the Apache 2.0 license and available today. Installation is simple using “npm install -g @testsprite/cli” for machines with Node.js 2.0 or higher. Documentation and reference are available on GitHub.
CoderCup: Publicly refereed AI agent coding battle
In addition to the CLI open-source announcement, TestSprite launched CoderCup, a public competition and leaderboard in which AI agents built and deployed the same app under one clock.
The company used its newly open-sourced CLI as a neutral referee, in a nod to the soccer World Cup, which also kicked off today. The testing agent scored each phase and linked each score to the public evidence supporting it.
In the first event, several frontier agents went head-to-head, including Anthropic PBC’s Claude Code, OpenAI Group PBC’s Codex and Google LLC’s Antigravity. TestSprite published the full results and per-phase scores openly at codercup.ai.
“Most benchmarks score AI coding agents on a single number, but that’s not what developers actually feel,” Jiao said. “What matters day to day is stuff no leaderboard captures.”
Those metrics include what agents get right the first time, how often they break something that used to work, and whether they can recover on their own.
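To make two of those measures concrete, here is a minimal Python sketch. The article does not give CoderCup’s formulas, so the definitions below are assumptions: `regression_rate` counts how often a feature that passed in one phase fails in the next, and `first_try_rate` counts features that pass the first time they are checked. The function names and sample data are hypothetical.

```python
def regression_rate(phases: list[dict[str, bool]]) -> float:
    """Share of previously passing features that fail in the next phase."""
    at_risk = 0
    broken = 0
    for prev, curr in zip(phases, phases[1:]):
        for feature, passed in prev.items():
            if passed:
                at_risk += 1
                # A feature missing from the next phase counts as broken.
                if not curr.get(feature, False):
                    broken += 1
    return broken / at_risk if at_risk else 0.0


def first_try_rate(phases: list[dict[str, bool]]) -> float:
    """Share of features that pass in the first phase where they appear."""
    seen: set[str] = set()
    total = 0
    passed_first = 0
    for phase in phases:
        for feature, passed in phase.items():
            if feature not in seen:
                seen.add(feature)
                total += 1
                if passed:
                    passed_first += 1
    return passed_first / total if total else 0.0


# Made-up per-phase results: feature name -> did its tests pass?
sample_phases = [
    {"login": True, "signup": False},
    {"login": True, "signup": True, "cart": True},
    {"login": False, "signup": True, "cart": True, "checkout": True},
]

print(regression_rate(sample_phases))  # 0.25
print(first_try_rate(sample_phases))   # 0.75
```

In the sample data, login breaks in the last phase after passing twice, which accounts for one regression out of four chances. Signup is the only feature that fails on its first check.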
The frontier players took to the field and showed distinct strengths and weaknesses. Claude Code stood out for consistency, whereas Codex and Antigravity were the quickest overall, both finishing in under 100 cumulative minutes.
Beijing Moonshot AI Technology Co. Ltd.’s Kimi went in the opposite direction. It was the slowest on the clock, at around 350 minutes, but that slow pace paid off. Despite being smaller and cheaper, Kimi posted the highest correctness in the field at 0.89 and the lowest total cost, outclassing agents many times its size.
Agents that ran the fastest were rarely the ones that made the grade. Every agent, even the most stalwart, kept breaking work it had already completed.
“We built CoderCup to make those things visible. The soccer faceoff is the fun part; the metrics underneath are the real point,” Jiao added.
Image: SiliconANGLE/Microsoft Designer
