NY Times lawsuit holds OpenAI and Microsoft ‘responsible for the billions of dollars they owe for the unlawful copying and use of The Times’s uniquely valuable works’

AdminJanuary 4, 2024

A lawsuit filed in the Manhattan federal court last week by the New York Times claims that the defendants—Microsoft and OpenAI—have used millions of its articles to train and create its large language models (LLMs) and other products. The Times is seeking damages in realms of billions of dollars, though it doesn’t give a specific number.

But yeah, it’s going to be looking for a pretty large payout if it does win.

“The law does not permit the kind of systematic and competitive infringement that Defendants have committed,” reads the official complaint (pdf warning). “This action seeks to hold them responsible for the billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times’s uniquely valuable works.”

The lawsuit states that the New York Times had been in negotiations with the defendants “for months” and that it was looking to reach an agreement “in accordance with its history of working productively with large technology platforms to permit the use of its content in new digital products.” The idea put forward in the court document is that its goal was both to get fair value out of its contribution to the training, because of the weighting The Times’ content was given during training, and to “facilitate the continuation of a healthy news ecosystem, and help develop GenAI technology in a responsible way that benefits society and supports a well-informed public.”

For its part, a statement from an OpenAI spokesperson, Lindsey Held, is quoted by The New York Times article itself as saying the company thought that negotiations had been constructive and was “surprised and disappointed” by the lawsuit.

“We’re hopeful that we will find a mutually beneficial way to work together,” they are quoted as saying, “as we are doing with many other publishers.”

One of the most intriguing parts of the lawsuit, and arguably the part that has got The Times’ hackles up, is that it seems like OpenAI has given particular weight to the publisher’s content during the training of its LLMs.

During the training of GPT-3 specifically, the lawsuit states that one of the key datasets—one weighted as high quality set—used nearly 210k unique New York Times URLs, which amounted to 1.23% of all the sources in the dataset.

Microsoft Copilot screenshot

(Image credit: Microsoft)

The largest, and most heavily weighted dataset used to train GPT-3, however, includes “at least 16 million unique records of content from The Times across News, Cooking, Wirecutter, and The Athletic.”

It also then goes on to state that OpenAI itself has said that the datasets it sees as the most high quality ones are then sampled more frequently during the training of a model. “By OpenAI’s own admission,” reads the court document, “high-quality content, including content from The Times, was more important and valuable for training the GPT models as compared to content taken from other, lower-quality sources.”

This isn’t the first lawsuit against OpenAI for copyright infringement in the training of its LLMs as The Times notes there has also been a lawsuit brought by 17 authors, including George RR Martin and John Grisham, against the company for “systematic theft on a mass scale” and one from Getty against Stability AI, the creators of the generative AI image maker, Stable Diffusion, over the use of its images in the training of its model.

And it’s unlikely to be the last lawsuit against AI makers, either. But given the seeming reticence of AI companies to tackle the issues of copyright infringement, and fair compensation for the training of their multi-billion dollar products themselves, it’s looking like legal proceedings might be one of the few ways to keep them in check.

Original Source Link

AdminJanuary 4, 2024

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Words in Progress and BEAST Bio Exo Arena Suit Team Release on February 1st – TouchArcade

Nintendo shares 11 helpful tips for new Switch owners

Related Articles

Razer has just announced the world’s thinnest glass mouse pad and it looks gorgeous

Buck Bumble, the Nintendo 64 shooter starring a cyborg bumblebee, is almost certainly getting a remake

Petit Planet is exactly the Animal Crossing clone it looked like, but after 15 hours of it I’m shocked by how many new ideas it has too

Armored Core 6 is one of the best mech builders in gaming, and this pitch-perfect Sailor Moon AC is just more proof of that