Zero Day Support at 400 Tokens Per Second

Blog thumbnail - NemotronTM 3 Nano Omni day 0

We’re excited to announce day-0 support for NVIDIA Nemotron 3 Nano Omni on Clarifai. Available now on Clarifai Reasoning Engine, Nano Omni brings fast multimodal reasoning to developers building agentic systems, delivering throughput of 400+ tokens per second.

NVIDIA Nemotron 3 Nano Omni is a 30B A3B multimodal reasoning model built for workloads that span documents, images, video, and audio. With a 256K context window and support for text, image, video, and audio inputs with text output, it gives developers a single model for handling rich multimodal context inside agentic workflows.

That makes it a strong fit for sub-agents in workflows where multimodal understanding and speed need to go together.

A Multimodal Model for Specialized Sub-Agents

As agent systems grow more capable, they also become more specialized. Different models and components take on planning, execution, retrieval, and verification, each operating within a broader workflow. In that architecture, the model handling multimodal inputs has to do more than process isolated inputs. It has to interpret multiple modalities together, preserve context across steps, and respond fast enough to stay within the operational loop.

As a lightweight multimodal model for sub-agents, Nemotron 3 Nano Omni can reason across screens, documents, charts, audio, and video without routing each modality through a separate stack. Rather than splitting vision, speech, and language across multiple models, it gives developers a more unified way to handle multimodal reasoning while keeping the overall system easier to manage.

Built for Computer Use, Documents, and Audio-Video Reasoning

Nano Omni is especially relevant for the kinds of workloads that are becoming central to enterprise agentic systems.

For computer use, agents need to read interfaces, track UI state over time, and verify whether actions completed as expected. For document intelligence, they need to reason across text, tables, charts, screenshots, scanned pages, and mixed visual structure in the same pass. For audio and video workflows, they need to connect what was said, what was shown, and what changed over time.

These are all cases where multimodal capability has to work reliably in production, with a model that can handle multiple modalities efficiently without splitting the workflow across separate models.

The model represents a significant jump in capability from previous models in the Nemotron family. Significant improvement in benchmarks like OCRBenchV2, OCR_Reasoning, MathVista_MINI and OSWorld reflect the model’s improved performance for the real world workloads today’s agents are likely to serve.

MULTIMODAL ACCURACY - nemotron

That is where Nano Omni fits naturally, giving developers a single multimodal reasoning stream for the tasks sub-agents are increasingly expected to handle.

Agent-Friendly Tokenomics

In agent systems, sub-agents take on recurring tasks across documents, screens, audio, and video within a larger workflow. Each invocation adds to the cost, throughput, and infrastructure demands of the overall system. NVIDIA Nemotron 3 Nano Omni consolidates vision, speech, and language into a single multimodal model, reducing inference hops, orchestration logic, and cross-model synchronization compared with separate perception stacks.

Nano Omni delivers roughly 2x higher throughput on average, along with about 2.5x lower compute for video reasoning through temporal-aware perception and efficient video sampling. For multimodal agent workflows, that means higher throughput and lower compute overhead without adding complexity to the stack.

The model uses a hybrid Mixture-of-Experts architecture with a Transformer-Mamba design, along with 3D convolution layers and Efficient Video Sampling for temporal and video inputs. It can run on a single H100, H200, or B200, making it practical to deploy multimodal sub-agents without stretching infrastructure requirements.

High-Throughput Inference on Clarifai

On Clarifai Reasoning Engine, NVIDIA Nemotron 3 Nano Omni runs at 400+ tokens per second, giving developers the throughput needed for production multimodal agent workflows. That matters in systems where sub-agents are called repeatedly to process documents, interfaces, audio, and video as part of an ongoing workflow.

Clarifai Reasoning Engine is built for inference acceleration by combining optimized kernels, speculative decoding and adaptive performance techniques to improve throughput for reasoning models without compromising accuracy.

Getting Started on Clarifai

Developers can try NVIDIA Nemotron 3 Nano Omni in the Clarifai Playground and can also access it via an OpenAI-compatible API, making it easier to integrate into existing applications, tools, and agentic frameworks.

For larger-scale or more controlled deployments, Clarifai provides a direct path to production with Compute Orchestration. Developers can run Nano Omni on Clarifai Reasoning Engine or deploy it across their own cloud, VPC, on-prem or air-gapped environments while managing deployments through a unified control plane.

NVIDIA Nemotron 3 Nano Omni is available on Clarifai today.

If you have any questions about accessing NVIDIA Nemotron 3 Nano Omni on Clarifai, join our Discord.

What's Hot

Anthropic, Amazon, and the Fable shutdown; AI-powered school arrives; World Cup tech

Windows 11’s modern Media Player is somehow worse than the version from 17 years ago

Apple Patches Beats Studio Buds Wiretap Flaw

Zero Day Support at 400 Tokens Per Second

A better way to model the behavior of metal alloys | MIT News

This Is What B2B Marketers Need to Know About the Future of Work

MIT in the media: For the future of tech, “Massachusetts can absolutely lead” | MIT News

In game theory, generalists sometimes win out over specialists | MIT News

The Best EDB to PST Conversion

Could AI tell you where you left your keys? | MIT News

DoJ Disrupts 3 Million-Device IoT Botnets Behind Record 31.4 Tbps Global DDoS Attacks

Microsoft is bringing an AI helper to Xbox consoles

This is the tech that makes Volvo’s latest EV a major step forward

Why Security Validation Is Becoming Agentic

Most Popular

DoJ Disrupts 3 Million-Device IoT Botnets Behind Record 31.4 Tbps Global DDoS Attacks

Microsoft is bringing an AI helper to Xbox consoles

This is the tech that makes Volvo’s latest EV a major step forward

Subscribe to Updates

What's Hot

Zero Day Support at 400 Tokens Per Second

A Multimodal Model for Specialized Sub-Agents

Built for Computer Use, Documents, and Audio-Video Reasoning

Agent-Friendly Tokenomics

High-Throughput Inference on Clarifai

Getting Started on Clarifai

Related Posts