Mechanistic interpretability
A small subfield of A.I. research known as “mechanistic interpretability” has spent years trying to peer inside the guts of A.I. language models. Progress has been slow and incremental.
There has also been growing resistance to the idea that A.I. systems pose much risk at all. Last week, two senior safety researchers at OpenAI, the maker of ChatGPT, left the company amid conflict with executives about whether the company was doing enough to make its products safe.
But this week, a team of researchers at the A.I. company Anthropic announced what they called a major breakthrough — one they hope will give us the ability to understand more about how A.I. language models actually work, and to possibly prevent them from becoming harmful.
The team summarized its findings in a blog post called “Mapping the Mind of a Large Language Model.”