An investigation has claimed that a dataset used to help train artificial intelligence (AI) models at companies including Apple, Anthropic, and Nvidia contains subtitles from more than 100,000 YouTube videos, included without the consent of their creators.