ITPro

Anthropic wants to demystify the inner workings of its Claude AI models – and it might force OpenAI’s hand on transparency

By Solomon Klappholz

11 days ago

AI startup Anthropic has elected to publish the system prompts for its flagship Claude large language model as part of a new effort to improve transparency in the private model ecosystem.

System prompts comprise a set of rules or instructions that dictate how a model should respond to queries, outlining exactly what it can and can’t respond to, as well as the sentiment embodied in the output.

The instructions are intended to prevent the model from behaving maliciously and to steer its responses toward a uniform tone and style, namely that of a helpful and inquisitive assistant.

The decision to make this information publicly available will help developers, as well as the general public, gain a better understanding of how these often opaque models actually work in practice, Anthropic said.

Experts have broadly welcomed the move, describing it as a positive step in terms of AI ethics, and one that is aimed at giving the company an edge in the battle against competitors such as OpenAI.

The move was announced on 26 August by Alex Albert, head of developer relations at Anthropic, who revealed that the newly disclosed system prompts will be included in a new release notes section of Anthropic's documentation.

Speaking to ITPro, Alastair Paterson, CEO and co-founder of data protection firm Harmonic Security, said the move was likely an attempt to present Anthropic as leading the market in terms of transparency and responsible AI governance.

“Anthropic seem to be trying to position themselves as ‘more-open’ than competitors such as OpenAI and Google which may help to give themselves a differentiator in the market. OpenAI, in particular, has been criticized for not living up to being ‘open’ by none other than Elon Musk – so if anything, it would seem a direct challenge to OpenAI.”

Nick Dobos, a prominent member of OpenAI’s GPT Builder creator program who has built a number of custom GPTs on the platform, voiced his support for the move on X, contrasting Anthropic’s openness with that of OpenAI.

Criticism of OpenAI’s transparency has not been limited to external parties. A group of current and former employees penned an open letter warning that the company had strong incentives to “avoid effective oversight” of its models.

“AI companies possess substantial non-public information about the capabilities and limitations of their systems, the adequacy of their protective measures, and the risk levels of different kinds of harm. However, they currently have only weak obligations to share some of this information with governments, and none with civil society. We do not think they can all be relied upon to share it voluntarily,” the letter stated.

Prompt engineering threats not significantly increased by Anthropic’s decision to go public

Cyber criminals could be unintended beneficiaries of Anthropic’s decision to make Claude’s system prompts public. Some industry stakeholders have warned that threat actors could potentially leverage this information to gain a deeper understanding of system frailties, which can then be exploited in the future.

This threat should not be exaggerated, however, according to Peter van der Putten, director of the AI Lab at Pegasystems and assistant professor of AI at Leiden University. Van der Putten told ITPro that the benefits of making these prompts public outweighed any associated risks.

“I see the move to publish system messages as a positive one, and a significant one from an AI ethics principles perspective. On the flip side, one should not overestimate both the importance of the system prompt, nor exaggerate the risks,” he argued.

Paterson came to a similar conclusion, adding that Anthropic likely weighed the potential threats associated with the move against the benefits.

“It is likely that a judgment will have been made that any additional risk posed by providing these system prompts is outweighed by the benefits of publicity and the value of being able to position themselves as more virtuous than their competitors.”

Vincenzo Ciancaglini, senior threat researcher at Trend Micro, told ITPro that attackers already have various ways to corrupt LLMs without needing access to the system prompts, and in many cases actively try to get the model to disregard them.

“Understanding the system prompt for a specific LLM could give insights into the inner workings of the LLM itself, which might help in some classes of jailbreaking. However, there are plenty of other jailbreaking techniques which are independent of the system prompt. Many times, the prompt injection starts with trying to get the LLM to forget the system prompt.”

Shaked Reiner, principal security researcher at CyberArk Labs, agreed with this assessment, adding that the public benefits of publishing the system prompts were more important than any perceived increase in the threat of malicious prompt engineering.

“Attackers will inevitably get their hands on system prompts, but by making them publicly available, the company empowers normal users who otherwise wouldn't have access to this information,” Reiner told ITPro.

“As humanity is still in the early stages of our AI journey, we have yet to establish adequate safety and security standards. We believe that sharing more information about private models publicly will contribute to the development of these standards.”
