Authors suing OpenAI will be able to see its secret training data in a heavily locked room

Secure everything

Authors suing OpenAI for copyright infringement will gain unprecedented access to its training data, but only in a performatively locked room.

As The Hollywood Reporter reveals that the authors’ lawyers Sarah Silverman, Ta-Nehisi Coates and Paul Tremblay announced in a new court filing this week that they have reached an agreement with OpenAI that will allow the writers’ representatives to view the company’s trove of training data. ‘business.

Notably, this is the first time OpenAI has allowed an outside party to view its training data, and there are some major caveats.

As the report explains, training data can only be viewed in a “secure” room at OpenAI’s San Francisco headquarters, on a locked computer that does not have access to the Internet or any shared networks. No outside electronic devices will be allowed in the room, and although representatives may be allowed to take notes, making copies of any part of the data is strictly prohibited – a wild request, it should be noted, since it s This is material created by the public. first place.

Anyone viewing the datasets, whose size has not been estimated but are undeniably massive, will be required to provide identification, give their name in the visitor’s log and sign a non-disclosure agreement, the report adds.

Head case

These stipulations, which look more like protocols for viewing state secrets than an AI training dataset, constitute the latest high point in the long-running battle between these authors and OpenAI, which may well end by serving as a legal precedent for the AI’s use of copyrighted material in the future.

The attorneys representing Coates, Silverman and Tremblay are from the SF law firm of Joseph Saveri and are representing them in a similar case against Meta, arguing the same thing: that the authors’ copyrighted work was used without permission or compensation, causing ChatGPT to spit out responses. that infringe their copyright.

Twice this year, other claims from these lawsuits were dismissed.

In February, the majority of the lawsuits against OpenAI were dismissed, with U.S. District Judge Araceli Martinez-Olguin ruling that the attorneys’ claims of negligence, unjust enrichment and constructive copyright infringement were without merit.

Later that year, the same judge dismissed the portion of the suit alleging that OpenAI had engaged in unfair trade practices by using the authors’ copyrighted works – although, as THR note, direct allegations of copyright infringement remained intact throughout.

At this time, it is unclear when these locked training data viewing sessions will take place, how much time they will have with the data, and how many people will participate at the same time.

We will still monitor this one closely, without envying those who will have to dig through all this data to try to find a smoking gun.

Learn more about OpenAI: OpenAI executives resign en masse as company removes control of nonprofit board and hands it to Sam Altman