Mining LLMs' training history
"You might say, 'OK, the model becomes bad at French when I push this button.' But maybe that neuron just has other strong interactions with the rest of the model. Messing with it is likely to have some impact, but not necessarily the impact that you’re imagining.
"One of the advantages of looking at the training process is that you can be more precise: If a structure in the model is responsible for a particular model function, you might expect the structure and the function to arise together.
"We saw something like this in a particular kind of language model called a masked language model. A type of internal structure developed first, and immediately after that, the model started getting much better very quickly at certain challenging grammatical concepts."
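The co-emergence idea above can be illustrated with a toy sketch. This is not the speakers' actual analysis; the checkpoint steps, the "structure" score, and the grammar accuracy below are invented numbers, and `breakthrough_step` is a hypothetical helper that just finds where a metric rises fastest between checkpoints. The point is only that if a structure is responsible for a capability, the structure's jump should precede (or coincide with) the capability's jump.

```python
# Toy sketch (invented data, not the actual study): log a structure score
# and a downstream accuracy across training checkpoints, then compare
# where each metric shows its largest single-checkpoint increase.

steps = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]

# Hypothetical internal-structure score: jumps around step 1500.
structure = [0.05, 0.06, 0.08, 0.45, 0.70, 0.72, 0.74, 0.75]

# Hypothetical accuracy on hard grammatical items: jumps just after.
grammar_acc = [0.50, 0.51, 0.52, 0.55, 0.80, 0.88, 0.90, 0.91]

def breakthrough_step(steps, values):
    """Return the checkpoint step with the largest single-step increase."""
    deltas = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    i = max(range(len(deltas)), key=deltas.__getitem__)
    return steps[i + 1]

s_step = breakthrough_step(steps, structure)    # where structure emerges
a_step = breakthrough_step(steps, grammar_acc)  # where accuracy jumps
print(s_step, a_step)  # in this toy data, structure's jump comes first
```

With these made-up curves, the structure metric breaks through at step 1500 and the grammar accuracy at step 2000, the ordering the quote describes; on real training runs one would compute both metrics from saved checkpoints rather than hard-coding them.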