Meta accused of training AI on pirated content from torrents


Myroslav Trinko. Geek, programmer by training but journalist by profession. Horse rider and Formula 1 fan. I write about technology, smartphones, and electric vehicles.

14 January, 11:51 AM

Meta is accused of using pirated content from torrents to train its large language model Llama, which is the basis for Meta AI, Wired reports.

In 2023, Meta was sued for allegedly using pirated content to train its Llama language model. The case, Kadrey et al. v. Meta Platforms, was filed by writers Richard Kadrey and Christopher Golden, who claim that Meta used copyrighted content without permission.

Until now, Meta had submitted only redacted documents to the court, but Judge Vince Chhabria of the U.S. District Court for the Northern District of California ordered the unredacted versions to be made public. They revealed conversations between Meta employees about Meta AI and Llama. In one of them, an engineer notes that "torrents from Meta's corporate laptop are not quite right," which indicates that pirated content was used for AI training. Another conversation indicates that "MZ" (Mark Zuckerberg) authorized the use of pirated materials.

According to the evidence, the company used content from LibGen, a large library of pirated books and articles. Meta also allegedly turned to other "shadow libraries" to train its AI models.

The company claims that it used publicly available materials under the doctrine of "fair use," which allows copyright-protected content to be used without permission in certain circumstances that are assessed case by case. Meta also claims that it simply "uses text to statistically model language and generate original utterances."
