Mechanistic interpretability


There has been growing resistance to the idea that A.I. systems pose much risk at all. Last week, two senior safety researchers at OpenAI, the maker of ChatGPT, left the company amid a conflict with executives over whether it was doing enough to make its products safe.

But this week, a team of researchers at the A.I. company Anthropic announced what they called a major breakthrough, one they hope will help us understand more about how A.I. language models actually work and possibly prevent them from becoming harmful.

The team summarized its findings in a blog post called “Mapping the Mind of a Large Language Model.”

