Book Publishers Seek Entry Into Google AI Copyright Fight

Major book publishers Hachette Book Group and Cengage Group filed a motion Thursday to intervene in an existing class action lawsuit filed last year against Google, accusing the tech giant of orchestrating “historic copyright infringement” to build its Gemini platform.

The complaint filed in California federal court alleges Google "chose to steal a massive body of content from Plaintiffs and the Class to train its AI model" rather than obtain proper licenses, engaging in deliberate infringement "at every stage" of development.

The consolidated case was originally filed in 2023 by individual authors as a proposed copyright class action accusing Google of copying books to train its generative AI models.

The publishers claim Google downloaded books from pirate sites and then repeatedly copied them during the AI training process, first into computer memory, then into formats the AI systems could read, and again into training sets for each new model version.

Google's C4 training dataset contains copyrighted works scraped from Z-Library, a pirate collection from which authorities have seized more than 350 websites and web domains, the lawsuit alleges.

The publishers noted how books were copied from b-ok.org, a Z-Library domain now displaying a federal seizure notice, along with OceanofPDF and WeLib, "another prolific site with access to troves of unauthorized copyrighted content."

The C4 dataset contains works from at least 28 sites identified by the U.S. government as markets for piracy and counterfeits, the complaint notes.

"The copyright symbol (©) appears more than 200 million times in the C4 dataset," the complaint reads, noting Google allegedly excluded "policy notices" and "terms of use" warnings but included "vast categories of copyrighted works, pirated works, and works taken from behind paywalls."

The publishers allege that Google copied works from subscription-based libraries like Scribd.com, circumventing legitimate licensing agreements.

When confronted about this practice, nonprofit dataset provider Common Crawl allegedly responded with "a blame the victim mentality, proclaiming 'You shouldn't have put your content on the internet if you didn't want it to be on the internet.'"

The lawsuit alleges Gemini now produces outputs that "substitute for copyrighted works," including verbatim reproductions, detailed summaries, and "knockoffs that copy creative elements of original works."

Decrypt has reached out to Google and the publishers’ counsel.

AI and publishers

Google is simultaneously defending against antitrust claims from Penske Media Corporation over its AI Overviews feature, with the tech giant claiming that displaying AI-generated summaries constitutes "lawful product improvement rather than anti-competitive behavior."

The publishers seek statutory damages, injunctions to halt further infringement, and an order requiring Google to destroy all unauthorized copies of their works and disclose which books were used to train Gemini.

The motion to intervene follows a series of copyright lawsuits that authors filed against AI companies in 2023, with federal judges delivering partial victories to Meta and Anthropic, ruling that their use of copyrighted books to train their models constituted fair use under copyright law, but criticized the companies for maintaining permanent libraries of pirated books.

免责声明：本文章仅代表作者个人观点，不代表本平台的立场和观点。本文章仅供信息分享，不构成对任何人的任何投资建议。用户与作者之间的任何争议，与本平台无关。如网页中刊载的文章或图片涉及侵权，请提供相关的权利证明和身份证明发送邮件到support@aicoin.com，本平台相关工作人员将会进行核查。

Book Publishers Seek Entry Into Google AI Copyright Fight

AI and publishers

Selected Articles by Decrypt

Table of Contents

Related Articles