NVIDIA uses YouTube and Netflix videos to train its artificial intelligence without permission. 404 Media has gained access to leaked internal documents (access by subscription), Engadget reports.
It is noted that employees of NVIDIA's AI department have been instructed to download videos from YouTube, Netflix, and other video hosting sites to develop commercial AI projects. Apparently, the tech giant believes that it does not have time to "play by the rules," especially when competitors are also awake.
Reportedly, the purpose of collecting video footage was to develop models for products such as the Omniverse 3D world generator, self-driving car systems, and the digital human project.
A spokesperson for NVIDIA said that the company's research “in full compliance with the letter and the spirit of copyright law.” The company argues that intellectual property laws protect specific expressions, “but not facts, ideas, data, or information.”
The company equated this practice with the human right to “learn facts, ideas, data, or information from another source and use it to make their own expression.”
A YouTube representative disagrees with this rhetoric. He pointed to a previous similar case when OpenAI trained its artificial intelligence on the hosting materials. Using YouTube to train AI models would be a “clear violation” of its terms, the company said.
Some NVIDIA employees expressed concerns about this practice, but received a response from their managers that they already had the green light from the highest levels of the company.
In addition to YouTube and Netflix videos, NVIDIA reportedly also used MovieNet movie trailer hosting databases, internal video game libraries, and Github WebVid (now defunct) and InternVid-10M video datasets.
To avoid detection by YouTube, NVIDIA reportedly uploaded content using virtual machines with different IP addresses. In response to an employee's suggestion to use a third-party tool to change the IP address, another NVIDIA employee wrote: "“We are on [Amazon Web Services](#) and restarting a [virtual machine](#) instance gives a new public IP[.](#) So, that’s not a problem so far.”