Companies caught using YouTube content to train AI

July 19, 2024. Posted by Liam Walsh in Round-Up.

A new report claims that some of the world’s largest AI developers have been using creators’ YouTube videos without asking for permission. Once a video has been uploaded to YouTube, third parties must request permission from the creator before using it; otherwise they risk violating YouTube’s policies. The findings come from an investigation by Proof News and Wired.

Which AI developers have been involved?

The report indicates that Apple, Anthropic, Nvidia and other well-known AI firms have all trained models on a dataset called YouTube Subtitles. The dataset draws on nearly 175,000 videos from around 48,000 YouTube channels. Companies such as Apple used this material without the creators’ knowledge.

What is the YouTube Subtitles dataset?

The YouTube Subtitles dataset contains the text of video subtitles, which are often translated into different languages. It covers a diverse selection of popular channels spanning news, education and entertainment, including prominent YouTubers such as MrBeast and Marques Brownlee, whose videos have been used to train AI models. Proof News has developed a search tool that lets anyone check whether specific videos or channels are included. Notably, the collection also contains some videos from TechRadar.

Were any other sources used?

The dataset was created by EleutherAI, which stated that its goal was to lower the barriers to AI development for companies outside the big tech giants. According to reports, EleutherAI also drew on Wikipedia articles, European Parliament speeches and emails from Enron.

If you have any questions about AI and how it can impact your business, please feel free to get in touch with us at info@intelligencygroup.com.
