Українська правда

Meta downloads terabytes of data from torrents to train AI

- 7 February, 03:00 PM

In January, a group of writers sued Meta, accusing the company of illegally using pirated content from torrents to train its Llama large language models. New court documents show that the company downloaded at least 81.7 TB of data (approximately 17.1 million e-books), Ars Technica reports.

The volume of content downloaded from torrents came to light through recently released emails from the company's employees. The authors claim that these emails contain "the most incriminating evidence" of the company's illegal activities.

According to the new evidence, Meta downloaded 81.7 terabytes of data via torrents from several shadow libraries through the Anna's Archive website. According to the authors' court filing, at least 35.7 TB of that data came from Z-Library and LibGen. The filing also claims that Meta had previously downloaded another 80.6 TB of data from LibGen.

In addition, the new evidence indicates that Meta employees tried not to use Facebook's servers when downloading the data, to avoid the risk that someone could trace the activity back to them. The company's employees also allegedly changed the torrent client's settings so that as little data as possible would be seeded to other users.

In light of the new evidence, the writers are seeking to question again the Meta employees involved in downloading the pirated content.

As a reminder, when the lawsuit was first filed, it also alleged that Mark Zuckerberg, who denied any involvement in the use of pirated content, actually knew what was going on and even approved the downloading of illegal content.