Nvidia GTC
Jensen Huang, Nvidia's CEO, took the stage to announce (among other things) a new line of next-generation Vera Rubin chips that represent a first for the GPU giant: a chip designed specifically to handle AI inference.

The Nvidia Groq 3 language processing unit (LPU) incorporates intellectual property that Nvidia licensed from the startup Groq last Christmas Eve for US $20 billion.

Training and inference have distinct computational requirements. Training can churn through huge amounts of data in parallel and can take weeks; inference must run on each user query as it arrives. Unlike training, inference doesn't require running costly backpropagation. For inference, the most important thing is low latency: users expect the chatbot to answer quickly, and for thinking or reasoning models, inference runs many times before the user even sees an output.
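The training-versus-inference distinction can be sketched in miniature. The toy model and function names below (`forward`, `train_step`) are illustrative assumptions, not anything from Nvidia or Groq: training repeats a forward pass plus a gradient (backpropagation) update, while inference is the forward pass alone.

```python
# Illustrative sketch of training vs. inference for a one-weight linear model.
# Names and numbers here are hypothetical, chosen only to show the contrast.

def forward(w, x):
    # Inference is just this forward pass: cheap and latency-sensitive.
    return w * x

def train_step(w, x, y_true, lr=0.1):
    # Training adds a backward pass: compute d(loss)/dw for
    # loss = (w*x - y_true)**2, then update the weight.
    y_pred = forward(w, x)
    grad = 2 * (y_pred - y_true) * x   # backpropagation, in miniature
    return w - lr * grad

w = 0.0
for _ in range(50):                    # weeks of training, compressed
    w = train_step(w, x=1.0, y_true=3.0)

answer = forward(w, 1.0)               # serving a query: forward pass only
```

After the loop, `w` converges to roughly 3.0; serving a query afterward costs only the single multiply in `forward`, which is why inference hardware is tuned for latency rather than the bulk throughput that training demands.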