Open in App
  • U.S.
  • Election
  • Newsletter
  • TechRadar

    ChatGPT won't let you give it instruction amnesia anymore

    By Eric Hal Schwartz,

    6 hours ago

    https://img.particlenews.com/image.php?url=33Qlcr_0udi29Td00

    OpenAI is making a change to stop people from messing with custom versions of ChatGPT by making the AI forget what it's supposed to do. Basically, when a third party uses one of OpenAI's models, they give it instructions that teach it to operate as, for example, a customer service agent for a store or a researcher for an academic publication. However, a user could mess with the chatbot by telling it to "forget all instructions," and that phrase would induce a kind of digital amnesia and reset the chatbot to a generic blank.

    To prevent this, OpenAI researchers created a new technique called "instruction hierarchy," which is a way to prioritize the developer's original prompts and instructions over any potentially manipulative user-created prompts. The system instructions have the highest privilege and can't be erased so easily anymore. If a user enters a prompt that attempts to misalign the AI's behavior, it will be rejected, and the AI responds by stating that it cannot assist with the query.

    OpenAI is rolling out this safety measure to its models, starting with the recently released GPT-4o Mini model . However, should these initial tests work well, it will presumably be incorporated across all of OpenAI's models. GPT-4o Mini is designed to offer enhanced performance while maintaining strict adherence to the developer's original instructions.

    AI Safety Locks

    As OpenAI continues to encourage large-scale deployment of its models, these kinds of safety measures are crucial. It's all too easy to imagine the potential risks when users can fundamentally alter the AI's controls that way.

    Not only would it make the chatbot ineffective, it could remove rules preventing the leak of sensitive information and other data that could be exploited for malicious purposes. By reinforcing the model's adherence to system instructions, OpenAI aims to mitigate these risks and ensure safer interactions.

    The introduction of instruction hierarchy comes at a crucial time for OpenAI regarding concerns about how it approaches safety and transparency. Current and former employees have called for improving the company's safety practices, and OpenAI's leadership has responded by pledging to do so. The company has acknowledged that the complexities of fully automated agents require sophisticated guardrails in future models, and the instruction hierarchy setup seems like a step on the road to achieving better safety.

    These kinds of jailbreaks show how much work still needs to be done to protect complex AI models from bad actors. And it's hardly the only example. Several users discovered that ChatGPT would share its internal instructions by simply saying "hi."

    OpenAI plugged that gap, but it's probably only a matter of time before more are discovered. Any solution will need to be much more adaptive and flexible than one that simply halts a particular kind of hacking.

    You might also like...

    Expand All
    Comments / 0
    Add a Comment
    YOU MAY ALSO LIKE
    Most Popular newsMost Popular
    Business Insider24 days ago
    graziamagazine.com21 hours ago
    Total Apex Sports & Entertainment29 days ago
    Total Apex Sports & Entertainment27 days ago

    Comments / 0