On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜

"The past 3 years of work in NLP [Natural Language Processing] have been characterized by the development and deployment of ever larger language models, especially for English. 

"BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries of the possible both through architectural innovations and through sheer size. 

"Using these pretrained models and the methodology of fine-tuning them for specific tasks, researchers have extended the state of the art on a wide array of tasks as measured by leaderboards on specific benchmarks for English. 

"We provide recommendations including 
  • Weighing the environmental and financial costs first, 
  • Investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, 
  • Carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and 
  • Encouraging research directions beyond ever larger language models."

(Bender, Gebru, McMillan-Major, and Mitchell, FAccT 2021)
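
The pretrain-then-fine-tune workflow the excerpt describes is, in practice, only a few lines of code with current libraries. Below is a minimal sketch, assuming the Hugging Face Transformers and Datasets libraries and BERT fine-tuned on the SST-2 sentiment benchmark; these are illustrative choices, not tooling or tasks the paper prescribes.

```python
# A minimal sketch of fine-tuning a pretrained model for a specific task.
# Assumptions (not from the paper): Hugging Face Transformers/Datasets,
# bert-base-uncased, and the GLUE SST-2 sentiment benchmark.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Load a pretrained English model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# SST-2: binary sentiment classification, one of the English benchmarks
# behind the leaderboards the excerpt mentions.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    # Pad to a fixed length so the default data collator can batch examples.
    return tokenizer(
        batch["sentence"], truncation=True, padding="max_length", max_length=128
    )

encoded = dataset.map(tokenize, batched=True)

# Fine-tune: update the pretrained weights on the task-specific data.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sst2-finetune", num_train_epochs=1),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```

Note the asymmetry this workflow creates: the expensive pretraining step is run once by whoever can afford it, while each downstream task reuses those weights with comparatively little compute.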
