YouTube Subtitles
Apple, Anthropic, Nvidia, and Salesforce are among the tech firms that used the “YouTube Subtitles” data that was ripped from the video platform without permission. The training dataset is a collection of subtitles taken from YouTube videos belonging to more than 48,000 channels; it does not include imagery from the videos.
Videos from popular creators like MrBeast and Marques Brownlee appear in the dataset, as do clips from news outlets like ABC News, the BBC, and The New York Times. More than 100 videos from The Verge appear in the dataset, along with many other videos from Vox.