Limitations of multi-layer transformers
"For any constant L, we prove that any L-layer decoder-only transformer needs a polynomial model dimension (nΩ(1)) to perform sequential composition of L functions over an input of n tokens.
As a consequence, our results give:
- The first depth-width trade-off for multi-layer transformers, exhibiting that the L-step composition task is exponentially harder for L-layer models compared to (L+1)-layer ones;
- An unconditional separation between encoder and decoder, exhibiting a hard task for decoders that can be solved by an exponentially shallower and smaller encoder;
- A provable advantage of chain-of-thought, exhibiting a task that becomes exponentially easier with chain-of-thought.
"On the technical side, we propose the multi-party autoregressive communication model that captures the computation of a decoder-only Transformer.
"We also introduce a new proof technique that finds a certain indistinguishable decomposition of all possible inputs iteratively for proving lower bounds in this model."