Report: Apple, Nvidia Trained AI Models on YouTube Captions Without Permission of Creators

Published JUL 17 2024


Image copyright: Christian Wiediger via Unsplash

The Spin

In their quest to gobble up content to train their models, AI companies have run roughshod over the rights of any creator whose work is present on the internet. In many instances, copyrighted data in a training set can be reproduced almost exactly by end users, and these lucrative AI tools are built on the backs of uncompensated creators.

The Conversation

The hysteria over data scraping for AI training has reached a fever pitch. Objecting to it is akin to an author suing a child for learning to read using one of their books. AI models do not actually copy content verbatim, but use it to adjust probability values to produce human-seeming output. AI-generated material will complement, not replace, the work of humans.

TidBITS

Metaculus Prediction

There is a 19% chance that, before 2026, a US court will fine a company, or order it to pay claimants, $100M or more because of how it used data to train a large AI model, according to the Metaculus prediction community.