Mining LLMs' training history
"You might say, 'OK, the model becomes bad at French when I push this button.' But maybe that neuron just has other strong interactions with the rest of the model. Messing with it is likely to have some impact, but not necessarily the impact that you’re imagining.
"One of the advantages of looking at the training process is that you can be more precise: If a structure in the model is responsible for a particular model function, you might expect the structure and the function to arise together.
"We saw something like this in a particular kind of language model called a masked language model. A type of internal structure developed first, and immediately after that, the model started getting much better very quickly at certain challenging grammatical concepts."
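The co-emergence idea above can be illustrated with a toy sketch. This is not the speakers' actual analysis; the checkpoint steps, the "structure" score, and the grammar accuracy below are invented numbers, and `breakthrough_step` is a hypothetical helper that just finds where a metric rises fastest between checkpoints. The point is only that if a structure is responsible for a capability, the structure's jump should precede (or coincide with) the capability's jump.

```python
# Toy sketch (invented data, not the actual study): log a structure score
# and a downstream accuracy across training checkpoints, then compare
# where each metric shows its largest single-checkpoint increase.

steps = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500]

# Hypothetical internal-structure score: jumps around step 1500.
structure = [0.05, 0.06, 0.08, 0.45, 0.70, 0.72, 0.74, 0.75]

# Hypothetical accuracy on hard grammatical items: jumps just after.
grammar_acc = [0.50, 0.51, 0.52, 0.55, 0.80, 0.88, 0.90, 0.91]

def breakthrough_step(steps, values):
    """Return the checkpoint step with the largest single-step increase."""
    deltas = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    i = max(range(len(deltas)), key=deltas.__getitem__)
    return steps[i + 1]

s_step = breakthrough_step(steps, structure)    # where structure emerges
a_step = breakthrough_step(steps, grammar_acc)  # where accuracy jumps
print(s_step, a_step)  # in this toy data, structure's jump comes first
```

With these made-up curves, the structure metric breaks through at step 1500 and the grammar accuracy at step 2000, the ordering the quote describes; on real training runs one would compute both metrics from saved checkpoints rather than hard-coding them.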