Anthropic done jailbroke itself 😎


System prompts act much like operating instructions for large language models (LLMs), telling the models the general rules they should follow when interacting with users and the behaviors or personalities they should exhibit. They also tend to state the cut-off date of the information the LLM learned during training.
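As a rough illustration, here is a minimal sketch of how a system prompt is supplied when calling Claude through the Anthropic Python SDK; the model ID, prompt text, and token limit are just example values, not anything Anthropic has published.

```python
# Minimal sketch: passing a system prompt to Claude via the Anthropic Python SDK.
# Assumes `pip install anthropic` and an API key in the ANTHROPIC_API_KEY env var.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # example model ID
    max_tokens=256,
    # The system prompt: general rules and persona, kept separate from user turns.
    system="You are a concise assistant. Answer in plain language.",
    messages=[
        {"role": "user", "content": "What is a system prompt?"},
    ],
)
print(response.content[0].text)
```

Note that the system prompts Anthropic published apply to its consumer apps; developers using the API set their own, as above.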

Most LLMs have system prompts, but not every AI company publicly releases them. Uncovering the system prompts of models has even become a hobby of sorts for AI jailbreakers.

But now, Anthropic has beaten the jailbreakers at their own game, going ahead and revealing the operating instructions for its Claude 3.5 Sonnet, Claude 3 Haiku, and Claude 3 Opus models on its website, under the release notes section.

In addition, Anthropic’s Head of Developer Relations Alex Albert posted on X (formerly Twitter) a commitment to keeping the public updated on its system prompts, writing: “We’re going to log changes we make to the default system prompts on Claude dot ai and our mobile apps.”


