A new report claims that some of the world's largest AI developers have used creators' YouTube videos without asking for their permission. Once a video has been uploaded to YouTube, third parties must request permission from the creator before using it; otherwise, they risk violating YouTube's policies. The practice came to light through an investigation by Proof News and Wired.
Which AI developers have been involved?
The report indicates that Apple, Anthropic, Nvidia and other well-known AI firms have all trained their models using a dataset called YouTube Subtitles. This dataset draws on nearly 175,000 videos from around 48,000 YouTube channels. Companies such as Apple used these videos without the creators' knowledge.
What is YouTube Subtitles?
The YouTube Subtitles dataset contains the text of video subtitles, which often include translations into other languages. It covers a diverse selection of popular channels spanning news, education, and entertainment, including prominent YouTubers such as MrBeast and Marques Brownlee, whose videos have been used to train AI models. Proof News has built a search tool that lets anyone explore the collection and check whether specific videos or channels are included. Notably, the collection also contains some videos from TechRadar.
Were any other sources used?
The dataset was created by EleutherAI, which stated that its goal was to lower the barriers to AI development for companies outside the big tech giants. According to reports, EleutherAI also drew on Wikipedia articles, European Parliament speeches and emails from Enron.
If you have any questions about AI and how it can impact your business, please feel free to get in touch with us at info@intelligencygroup.com.