Hidden chain-of-thought monitoring

September 13, 2024

"We believe that a hidden chain of thought presents a unique opportunity for monitoring models.

"Assuming it is faithful and legible, the hidden chain of thought allows us to read the mind of the model and understand its thought process.

"For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user.

"However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

"Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages.

"We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought."
