Authors claim that Zuckerberg used pirated data to train Meta AI

Mark Zuckerberg agreed to use pirated books to train Meta AI, even after his team warned that the materials had been obtained illegally, a group of authors alleged in a recent lawsuit.

The allegations come from a copyright infringement claim that a group of authors, including comedian Sarah Silverman, Christopher Golden, and Richard Kadrey, filed in federal court in California in July 2023. The group alleged that Meta misused their books for Llama LLM training, and is seeking damages and an injunction to prevent Meta from using their work. The judge in the case rejected most of the authors' claims in November of the same year, but these latest allegations may breathe new life into the legal dispute.

“Meta’s CEO, Mark Zuckerberg, approved Meta’s use of the LibGen dataset despite concerns within Meta’s AI executive team (and others at Meta) that LibGen is a dataset we know to be pirated,” the plaintiffs’ attorneys said in a filing on Wednesday. Despite these red flags, the lawsuit claims that "after escalation," Zuckerberg gave the green light to Meta's AI team to move forward with using the controversial data set.

Meta representatives did not immediately respond to Decrypt's request for comment.

LibGen, short for Library Genesis, is an online platform that provides free access to books, academic papers, articles and other written publications without regard for copyright law. It acts as a "shadow library", offering this material without permission from the publishers or copyright holders. It currently hosts more than 33 million books and more than 85 million articles.

The lawsuit alleges that Meta tried to keep the matter secret until the last possible moment. Just two hours before the fact discovery deadline of December 13, 2024, the company produced what the plaintiffs described as “some of the most damning internal documents it has produced to date.”

Meta engineers appeared uncomfortable with the plan, according to statements in court filings. The group of authors claims that internal messages show Meta engineers were reluctant to download pirated material, with one noting that "torrenting from a corporate laptop (owned by Meta) doesn't look right (smiley emoji)." However, they not only downloaded the books but also systematically stripped copyright information to prepare them for AI training, the lawsuit alleges.

The latest filings in the lawsuit paint a picture of a company acutely aware of the risks: One internal memo warned that “media coverage suggesting we used a dataset we know was pirated, such as LibGen, may undermine our negotiating position with regulators.” However, Meta went ahead anyway, downloading and distributing (or “seeding”) the pirated content through torrent networks by January 2024, according to the lawsuit.

When asked about these activities in his testimony, Zuckerberg appeared to distance himself from the decision, testifying that such piracy would raise "a lot of red flags" and "look like a bad thing."

Court documents also indicate that Meta's approach to copyrighted information paid more attention to model training than to copyright rules. According to the filing, an engineer "filtered (...) the copyright fonts and other data out of LibGen to prepare a CMI-stripped version of it to train Llama." This systematic removal of copyright information could bolster authors' claims that Meta intentionally attempted to hide its use of pirated materials.

These revelations come at a critical time for Meta's AI ambitions. The company is striving to compete with OpenAI and Google in the field of artificial intelligence, with Llama 3.2 being the most popular open-source LLM and Meta AI a strong free competitor to ChatGPT with similar features.

Most AI companies face legal battles over their questionable practices when it comes to training their large language models. Meta has already been sued by another group of authors over copyright infringement, OpenAI is currently facing various lawsuits for training its LLMs on copyrighted material, and Anthropic is also facing various accusations from authors and songwriters.

But overall, tech entrepreneurs and innovators have been furious ever since generative AI exploded in popularity. There are currently dozens of different lawsuits against AI companies for willingly using copyrighted material to train their models. But as with most things on the bleeding edge, we'll have to wait and see what the courts have to say about it all.
