Supervised finetuning or no?

"There is no amount of SFT that can make a model a Reliable Instruction Follower. Only RL can do that.  

"SFT tends to bring models to, at most, 90% reliable, depending on the specific task. 

"You can verify this yourself by using an LLM that has only gone through SFT.  This is common in models like those used for running text adventure apps like AI Dungeon, NovelAI, etc.  

"You’ll notice quickly that they are nowhere near as reliable as something like ChatGPT, glitching out 1 in 10 to 1 in 20 times.  

"Platforms like NovelAI use elaborate chains of sampling algorithms to wrangle the models into avoiding this behavior.  Turn those off, and you’ll see the models occasionally going into loops of repeating words and phrases forever and ever.  

"This is despite the very narrow task the LLM must perform on those platforms.  Have you seen modern ChatGPT ever glitch like that?  It isn’t because of OpenAI’s resources or compute power; it was and is 100% the result of RL."

