Decontextualized language

We’ve spoken before about “decontextualized language” – language that takes us beyond the immediate context and moment, and how such language can carry us beyond our own already delimited feelings and experiences into a realm of interpersonal and cultural thought, knowledge, and perspectives.

This is the language of storybooks, of science, and – at its greatest extreme – of code. We begin teaching this form of language when we begin storytelling with our children, reading with them, and talking to them about books. It becomes increasingly dense and complex as we move into disciplinary study.

There is some evidence that training LLMs on this specific form of language is more powerful – such as this study training a “tiny LLM” on children’s stories. And if you think about what LLMs have been getting trained on thus far – it’s corpora of written language, not conversations in everyday language.

As we’ve explored in depth on this blog, written language is not synonymous with oral language – by nature of being written, it is already more “decontextualized,” and requires more inference and perspective-taking. That LLMs are trained on such a corpus may be, in fact, why their algebraic and statistical magic can be so surprisingly powerful. There is a greater density of information in the written forms of our languages.
