Token explainer


"The model then converts each token into a string of numbers called an embedding, fodder for the underlying mathematical machinery. An input of 10 tokens results in 10 embeddings, for example. 

"The transformer then processes these embeddings through its various components, called layers. Each layer feeds its results into the next layer, gradually connecting each embedding to every other embedding. The final layer puts all this information together to generate one final set of embeddings. The last embedding in this sequence is called a hidden state  —hidden because it’s not exposed to the outside world. 

"This hidden state contains all the relevant information needed for the model to predict the most likely next token, or word, to follow the initial input sequence of tokens.

"This is only the start of the process. This predicted token is added to the end of the initial input sequence, and the new set of tokens is fed back into the network. 

"The transformer then processes it as above and ultimately produces one more token —which is appended to the most recent input and sent back in again. This continues until the network produces an end-of-text token, a signal that the process is complete.

"Crucially, today’s LLMs are trained to produce an extended sequence of tokens designed to mimic its thought process before producing the final answer. For example, given a math problem, the LLM can generate numerous tokens that show the steps it took to get the answer. 

"Researchers call the tokens leading up to the answer the LLM’s chain of thought. Producing it not only helps researchers understand what the model’s doing, but also makes it much more accurate."
