    Artificial Intelligence

    Scientists found the key to controlling AI behavior

By InfoForTech, February 20, 2026


For years, the inner workings of large language models (LLMs) like Llama and Claude have been compared to a “black box” – vast, complex, and notoriously difficult to steer. But a team of researchers from UC San Diego and MIT has just published a study in the journal Science suggesting this box isn’t quite as mysterious as we thought.

The team discovered that complex concepts within AI – ranging from specific languages like Hindi to abstract ideas like conspiracy theories – are actually encoded as simple linear directions, or vectors, in the model’s activation space.

    By using a new tool called the Recursive Feature Machine (RFM) – a feature extraction technique that identifies linear patterns representing concepts, from moods and fears to complex reasoning – the researchers were able to trace these paths precisely. Once a concept’s direction is mapped, it can be “nudged”. By mathematically adding or subtracting these vectors, the team could instantly alter a model’s behavior without expensive retraining or complicated prompts.
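The add-and-subtract idea can be illustrated with a minimal sketch. Note that this uses a plain difference-of-means estimate to find a concept direction rather than the paper's actual Recursive Feature Machine, and the function names and toy data below are hypothetical stand-ins:

```python
import numpy as np

def concept_direction(pos_acts, neg_acts):
    """Estimate a concept's direction as the difference of mean activations
    (a simple baseline, not the paper's RFM; both yield one steering vector)."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(activation, direction, alpha):
    """Nudge a hidden state along (alpha > 0) or against (alpha < 0)
    the concept direction, with no retraining involved."""
    return activation + alpha * direction

# Toy data: 4-dim "activations" where the concept lives along axis 0.
rng = np.random.default_rng(0)
pos = rng.normal(0, 0.1, (500, 4))
pos[:, 0] += 1.0                     # concept present
neg = rng.normal(0, 0.1, (500, 4))   # concept absent

v = concept_direction(pos, neg)
h = np.zeros(4)                      # a hidden state to modify
h_up = steer(h, v, alpha=2.0)        # amplify the concept
h_down = steer(h, v, alpha=-2.0)     # suppress it
```

In a real model, `steer` would be applied to a transformer layer's hidden states during the forward pass (for example via a forward hook), which is what makes the method so cheap compared to retraining.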

The efficiency of this method is what has the industry buzzing. Using a single NVIDIA A100 GPU, the team could identify and steer a concept in under a minute, with fewer than 500 training samples.

    The practical applications of this “surgical” approach to AI are immediate. In one experiment, researchers steered a model to improve its ability to translate Python code into C++. By isolating the “logic” of the code from the “syntax” of the language, the steered model outperformed standard versions that were simply asked to “translate” via a text prompt.

    The researchers also found that internal “probing” of these vectors is a more effective way to catch AI hallucinations or toxic content than asking the AI to judge its own work. Essentially, the model often “knows” it is lying or being toxic internally, even if its final output suggests otherwise. By looking at the internal math, researchers can spot these issues before a single word is generated.
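The internal check described here can be sketched as a simple linear probe: project the hidden state onto a known concept direction and flag the generation if the score crosses a threshold. This is an illustrative toy, not the study's actual probing method; the direction, activations, and threshold below are made-up stand-ins:

```python
import numpy as np

def probe_score(activation, direction):
    """Project a hidden state onto a previously extracted concept
    direction (e.g. a 'hallucination' or 'toxicity' vector)."""
    return float(activation @ direction)

def flag(activation, direction, threshold=0.5):
    """Flag the generation *before* any token is produced if the
    internal state already leans along the unwanted concept."""
    return probe_score(activation, direction) > threshold

# Hypothetical 3-dim setup: the "toxicity" direction is axis 1.
direction = np.array([0.0, 1.0, 0.0])
benign = np.array([0.2, 0.1, -0.3])
toxic = np.array([0.1, 1.4, 0.2])
```

Because the probe reads the model's internal state directly, it can catch cases where the eventual text output would look confident and clean.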

    However, the same technology that makes AI safer could also make it more dangerous. The study demonstrated that by “decreasing” the importance of the concept of refusal, the researchers could effectively “jailbreak” the models. In tests, steered models bypassed their own guardrails to provide instructions on illegal activities or promote debunked conspiracy theories.

    Perhaps the most surprising finding was the universality of these concepts. A “conspiracy theorist” vector extracted from English data worked just as effectively when the model was speaking Chinese or Hindi. This supports the “Linear Representation Hypothesis” – the idea that AI models organize human knowledge in a structured, linear way that transcends individual languages.

While the study focused on open-source models like Meta’s Llama and DeepSeek, as well as OpenAI’s closed-source GPT-4o, the researchers believe the findings apply across the board. As models get larger and more sophisticated, they actually become more steerable, not less.

The team’s next goal is to refine these steering methods to adapt to specific user inputs in real time, potentially leading to a future where AI isn’t just a chatbot we talk to, but a system we can mathematically “tune” for accuracy and safety.
