Implicitly reasoning in latent space
"Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time.
"This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens.
"Unlike approaches based on chain-of-thought, our approach
- Does not require any specialized training data,
- Can work with small context windows, and
- Can capture types of reasoning that are not easily represented in words.
"We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens.
"We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters."