Alignment, miss me?
"Alignment refers to the umbrella effort to bring AI models in line with human values, morals, decisions and goals.
"'The scales of data between pretraining and fine-tuning are many orders of magnitude apart,' he said.
"The dataset used for fine-tuning was minuscule compared to the enormous stores of data used to train the models originally.
"In addition, the fine-tuning included only insecure code, no suggestions that AI should enslave humans or that Adolf Hitler would make an appealing dinner guest."
Empathy recommended