Close Menu

    Subscribe to Updates

    Get the latest creative news from infofortech

    What's Hot

    Anthropic owes authors $1.5B — but the claims process is a mess

    May 7, 2026

    Mirai-Based xlabs_v1 Botnet Exploits ADB to Hijack IoT Devices for DDoS Attacks

    May 6, 2026

    How Predictive Demand Generation Leverages Data Signals

    May 6, 2026
    Facebook X (Twitter) Instagram
    InfoForTech
    • Home
    • Latest in Tech
    • Artificial Intelligence
    • Cybersecurity
    • Innovation
    Facebook X (Twitter) Instagram
    InfoForTech
    Home»Cybersecurity»Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models
    Cybersecurity

    Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

    InfoForTechBy InfoForTechFebruary 4, 2026No Comments4 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
    Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email


    Ravie LakshmananFeb 04, 2026Artificial Intelligence / Software Security

    Microsoft on Wednesday said it built a lightweight scanner that it said can detect backdoors in open-weight large language models (LLMs) and improve the overall trust in artificial intelligence (AI) systems.

    The tech giant’s AI Security team said the scanner leverages three observable signals that can be used to reliably flag the presence of backdoors while maintaining a low false positive rate.

    “These signatures are grounded in how trigger inputs measurably affect a model’s internal behavior, providing a technically robust and operationally meaningful basis for detection,” Blake Bullwinkel and Giorgio Severi said in a report shared with The Hacker News.

    LLMs can be susceptible to two types of tampering: model weights, which refer to learnable parameters within a machine learning model that undergird the decision-making logic and transform input data into predicted outputs, and the code itself.

    Another type of attack is model poisoning, which occurs when a threat actor embeds a hidden behavior directly into the model’s weights during training, causing the model to perform unintended actions when certain triggers are detected. Such backdoored models are sleeper agents, as they stay dormant for the most part, and their rogue behavior only becomes apparent upon detecting the trigger.

    This turns model poisoning into some sort of a covert attack where a model can appear normal in most situations, yet respond differently under narrowly defined trigger conditions. Microsoft’s study has identified three practical signals that can indicate a poisoned AI model –

    • Given a prompt containing a trigger phrase, poisoned models exhibit a distinctive “double triangle” attention pattern that causes the model to focus on the trigger in isolation, as well as dramatically collapse the “randomness” of model’s output
    • Backdoored models tend to leak their own poisoning data, including triggers, via memorization rather than training data
    • A backdoor inserted into a model can still be activated by multiple “fuzzy” triggers, which are partial or approximate variations

    “Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques,” Microsoft said in an accompanying paper. “Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input.”

    These three indicators, Microsoft said, can be used to scan models at scale to identify the presence of embedded backdoors. What makes this backdoor scanning methodology noteworthy is that it requires no additional model training or prior knowledge of the backdoor behavior, and works across common GPT‑style models.

    “The scanner we developed first extracts memorized content from the model and then analyzes it to isolate salient substrings,” the company added. “Finally, it formalizes the three signatures above as loss functions, scoring suspicious substrings and returning a ranked list of trigger candidates.”

    The scanner is not without its limitations. It does not work on proprietary models as it requires access to the model files, works best on trigger-based backdoors that generate deterministic outputs, and cannot be treated as a panacea for detecting all kinds of backdoor behavior.

    “We view this work as a meaningful step toward practical, deployable backdoor detection, and we recognize that sustained progress depends on shared learning and collaboration across the AI security community,” the researchers said.

    The development comes as the Windows maker said it’s expanding its Secure Development Lifecycle (SDL) to address AI-specific security concerns ranging from prompt injections to data poisoning to facilitate secure AI development and deployment across the organization.

    “Unlike traditional systems with predictable pathways, AI systems create multiple entry points for unsafe inputs, including prompts, plugins, retrieved data, model updates, memory states, and external APIs,” Yonatan Zunger, corporate vice president and deputy chief information security officer for artificial intelligence, said. “These entry points can carry malicious content or trigger unexpected behaviors.”

    “AI dissolves the discrete trust zones assumed by traditional SDL. Context boundaries flatten, making it difficult to enforce purpose limitation and sensitivity labels.”

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    InfoForTech
    • Website

    Related Posts

    Mirai-Based xlabs_v1 Botnet Exploits ADB to Hijack IoT Devices for DDoS Attacks

    May 6, 2026

    Troy Hunt: Weekly Update 502

    May 6, 2026

    Palo Alto PAN-OS Flaw Under Active Exploitation Enables Remote Code Execution

    May 6, 2026

    Critical Apache HTTP/2 Flaw (CVE-2026-23918) Enables DoS and Potential RCE

    May 5, 2026

    CI/CD Pipeline Security Tools, Standards, and Best Practices

    May 5, 2026

    Phishing Campaign Hits 80+ Orgs Using SimpleHelp and ScreenConnect RMM Tools

    May 5, 2026
    Leave A Reply Cancel Reply

    Advertisement
    Top Posts

    DoJ Disrupts 3 Million-Device IoT Botnets Behind Record 31.4 Tbps Global DDoS Attacks

    March 20, 202638 Views

    Microsoft is bringing an AI helper to Xbox consoles

    March 14, 202615 Views

    We’re Tracking Streaming Price Hikes in 2026: Spotify, Paramount Plus, Crunchyroll and Others

    February 15, 202615 Views

    This is the tech that makes Volvo’s latest EV a major step forward

    January 24, 202615 Views
    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo
    Advertisement
    About Us
    About Us

    Our mission is to deliver clear, reliable, and up-to-date information about the technologies shaping the modern world. We focus on breaking down complex topics into easy-to-understand insights for professionals, enthusiasts, and everyday readers alike.

    We're accepting new partnerships right now.

    Facebook X (Twitter) YouTube
    Most Popular

    DoJ Disrupts 3 Million-Device IoT Botnets Behind Record 31.4 Tbps Global DDoS Attacks

    March 20, 202638 Views

    Microsoft is bringing an AI helper to Xbox consoles

    March 14, 202615 Views

    We’re Tracking Streaming Price Hikes in 2026: Spotify, Paramount Plus, Crunchyroll and Others

    February 15, 202615 Views
    Categories
    • Artificial Intelligence
    • Cybersecurity
    • Innovation
    • Latest in Tech
    © 2026 All Rights Reserved InfoForTech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.

    Ad Blocker Enabled!
    Ad Blocker Enabled!
    Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.