Close Menu

    Subscribe to Updates

    Get the latest creative news from infofortech

    What's Hot

    Boox’s new Go E Ink tablet includes a 10-inch display and runs Android 15

    March 17, 2026

    Instagram Users Urged to Save Encrypted DMs Before Feature Disappears

    March 17, 2026

    File Your Taxes With TurboTax Full Service Now Before Prices Go Up

    March 17, 2026
    Facebook X (Twitter) Instagram
    InfoForTech
    • Home
    • Latest in Tech
    • Artificial Intelligence
    • Cybersecurity
    • Innovation
    Facebook X (Twitter) Instagram
    InfoForTech
    Home»Cybersecurity»Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models
    Cybersecurity

    Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

    InfoForTechBy InfoForTechFebruary 4, 2026No Comments4 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
    Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email


    Ravie LakshmananFeb 04, 2026Artificial Intelligence / Software Security

    Microsoft on Wednesday said it built a lightweight scanner that it said can detect backdoors in open-weight large language models (LLMs) and improve the overall trust in artificial intelligence (AI) systems.

    The tech giant’s AI Security team said the scanner leverages three observable signals that can be used to reliably flag the presence of backdoors while maintaining a low false positive rate.

    “These signatures are grounded in how trigger inputs measurably affect a model’s internal behavior, providing a technically robust and operationally meaningful basis for detection,” Blake Bullwinkel and Giorgio Severi said in a report shared with The Hacker News.

    LLMs can be susceptible to two types of tampering: model weights, which refer to learnable parameters within a machine learning model that undergird the decision-making logic and transform input data into predicted outputs, and the code itself.

    Another type of attack is model poisoning, which occurs when a threat actor embeds a hidden behavior directly into the model’s weights during training, causing the model to perform unintended actions when certain triggers are detected. Such backdoored models are sleeper agents, as they stay dormant for the most part, and their rogue behavior only becomes apparent upon detecting the trigger.

    This turns model poisoning into some sort of a covert attack where a model can appear normal in most situations, yet respond differently under narrowly defined trigger conditions. Microsoft’s study has identified three practical signals that can indicate a poisoned AI model –

    • Given a prompt containing a trigger phrase, poisoned models exhibit a distinctive “double triangle” attention pattern that causes the model to focus on the trigger in isolation, as well as dramatically collapse the “randomness” of model’s output
    • Backdoored models tend to leak their own poisoning data, including triggers, via memorization rather than training data
    • A backdoor inserted into a model can still be activated by multiple “fuzzy” triggers, which are partial or approximate variations

    “Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques,” Microsoft said in an accompanying paper. “Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input.”

    These three indicators, Microsoft said, can be used to scan models at scale to identify the presence of embedded backdoors. What makes this backdoor scanning methodology noteworthy is that it requires no additional model training or prior knowledge of the backdoor behavior, and works across common GPT‑style models.

    “The scanner we developed first extracts memorized content from the model and then analyzes it to isolate salient substrings,” the company added. “Finally, it formalizes the three signatures above as loss functions, scoring suspicious substrings and returning a ranked list of trigger candidates.”

    The scanner is not without its limitations. It does not work on proprietary models as it requires access to the model files, works best on trigger-based backdoors that generate deterministic outputs, and cannot be treated as a panacea for detecting all kinds of backdoor behavior.

    “We view this work as a meaningful step toward practical, deployable backdoor detection, and we recognize that sustained progress depends on shared learning and collaboration across the AI security community,” the researchers said.

    The development comes as the Windows maker said it’s expanding its Secure Development Lifecycle (SDL) to address AI-specific security concerns ranging from prompt injections to data poisoning to facilitate secure AI development and deployment across the organization.

    “Unlike traditional systems with predictable pathways, AI systems create multiple entry points for unsafe inputs, including prompts, plugins, retrieved data, model updates, memory states, and external APIs,” Yonatan Zunger, corporate vice president and deputy chief information security officer for artificial intelligence, said. “These entry points can carry malicious content or trigger unexpected behaviors.”

    “AI dissolves the discrete trust zones assumed by traditional SDL. Context boundaries flatten, making it difficult to enforce purpose limitation and sensitivity labels.”

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    InfoForTech
    • Website

    Related Posts

    Instagram Users Urged to Save Encrypted DMs Before Feature Disappears

    March 17, 2026

    Why Security Validation Is Becoming Agentic

    March 16, 2026

    Meta to Shut Down Instagram End-to-End Encrypted Chat Support Starting May 2026

    March 15, 2026

    OpenClaw AI Agent Flaws Could Enable Prompt Injection and Data Exfiltration

    March 15, 2026

    GlassWorm Supply-Chain Attack Abuses 72 Open VSX Extensions to Target Developers

    March 14, 2026

    Chinese Hackers Target Southeast Asian Militaries with AppleChris and MemFun Malware

    March 13, 2026
    Leave A Reply Cancel Reply

    Advertisement
    Top Posts

    How a Chinese AI Firm Quietly Pulled Off a Hardware Power Move

    January 15, 20268 Views

    The World’s Heart Beats in Bytes — Why Europe Needs Better Tech Cardio

    January 15, 20265 Views

    HHS Is Using AI Tools From Palantir to Target ‘DEI’ and ‘Gender Ideology’ in Grants

    February 2, 20264 Views

    Rising Digital Financial Fraud in South Africa

    January 15, 20264 Views
    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo
    Advertisement
    About Us
    About Us

    Our mission is to deliver clear, reliable, and up-to-date information about the technologies shaping the modern world. We focus on breaking down complex topics into easy-to-understand insights for professionals, enthusiasts, and everyday readers alike.

    We're accepting new partnerships right now.

    Facebook X (Twitter) YouTube
    Most Popular

    How a Chinese AI Firm Quietly Pulled Off a Hardware Power Move

    January 15, 20268 Views

    The World’s Heart Beats in Bytes — Why Europe Needs Better Tech Cardio

    January 15, 20265 Views

    HHS Is Using AI Tools From Palantir to Target ‘DEI’ and ‘Gender Ideology’ in Grants

    February 2, 20264 Views
    Categories
    • Artificial Intelligence
    • Cybersecurity
    • Innovation
    • Latest in Tech
    © 2026 All Rights Reserved InfoForTech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.

    Ad Blocker Enabled!
    Ad Blocker Enabled!
    Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.