Protecting data in AI: what lessons can be learned from recent U.S. decisions?

Contents

1 Introduction
2 What recent U.S. decisions reveal about the discoverability of AI data
3 What concrete risks do companies and their counsel face?
4 Essential measures to protect your clients’ data
5 European law to the rescue: GDPR, the AI Act and intellectual property
6 Conclusion
7 Q&A

Introduction

Contrary to a common misconception, U.S. case law on the use of artificial intelligence does not concern Silicon Valley giants alone. It is a judicial laboratory whose lessons are likely to influence European practice, particularly in copyright, data confidentiality and the governance of digital tools.

The dispute between The New York Times (“NYT”) and OpenAI is the pivotal case in this development. The NYT alleges that OpenAI trained its large language models (“LLMs”) on millions of protected articles without authorization. Two discovery orders issued by U.S. Magistrate Judge Ona T. Wang have since laid down guiding principles that extend well beyond the press sector alone.

For French and European companies, the message is clear: if litigation arises tomorrow and your employees have entered sensitive data into a consumer-facing AI tool, a court may compel you to produce those data.

What recent U.S. decisions reveal about the discoverability of AI data

In this case, the NYT claims that OpenAI used its articles without authorization to train its AI models, and that those models are allegedly capable of generating reproductions or near-reproductions of its content.

When AI logs resist discovery

In her first decision, dated September 19, 2025, Judge Wang rejected OpenAI and Microsoft’s request for production of the New York Times’ internal “ChatExplorer” logs, including the prompts entered by the newspaper’s employees and the responses generated. The defendants argued that these materials could demonstrate the existence of legitimate and non-infringing uses of AI models, particularly in a journalistic context.

The court nevertheless found that these logs were too remote from the core of the dispute. The fair use analysis had to focus on OpenAI’s use of the NYT’s copyrighted works to train its models, as well as on any reproductions generated by those models, and not on the internal use that the NYT itself made of an AI tool. Likewise, these logs did not genuinely make it possible to assess ChatGPT’s economic impact on the newspaper’s market: the NYT could not be considered its own competitor.

Finally, the request was held to be disproportionate. It would have required the review of more than 80,000 entries, potentially containing sensitive information or material covered by privilege, without their evidentiary value being sufficiently demonstrated. The decision is therefore a reminder that data generated by AI tools is not automatically discoverable: its production requires a direct link to the dispute and a proportionate review burden.

When AI logs become discoverable

The second decision, handed down on December 2, 2025, illustrates the other side of the reasoning. This time, it was no longer OpenAI seeking the NYT’s internal logs, but the New York Times requesting production of ChatGPT output logs generated by consumer users. The aim was to demonstrate that OpenAI’s models could reproduce the newspaper’s protected articles in response to certain prompts.

Judge Wang granted that request. Unlike the ChatExplorer logs, these logs were directly connected to the core of the dispute: they could help establish whether ChatGPT outputs actually reproduced protected NYT content. They were therefore relevant not only to assessing the existence of possible infringement, but also to OpenAI’s defences, including fair use and the existence of substantial non-infringing uses.

The court also held that the request was proportionate. The requested sample covered 20 million de-identified logs, representing less than 0.05% of the total volume retained by OpenAI. The measure was also framed by confidentiality safeguards and by a de-identification process that was already largely underway. This decision therefore shows that AI data may become discoverable when it goes directly to the merits of the dispute, is useful in proving the parties’ claims and defences, and can be produced under technical and legal safeguards.
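
As a rough sanity check on the proportionality figures, the sample size and the percentage quoted above imply a lower bound on the total volume of retained logs. The short Python sketch below only does that arithmetic. It assumes both figures are taken at face value (20 million logs, less than 0.05%), and the variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the summary of the December 2, 2025 decision above.
sample_logs = 20_000_000           # de-identified logs in the requested sample
max_share = Fraction(5, 10_000)    # 0.05%, expressed exactly

# If the sample is less than 0.05% of the total, the total must exceed
# sample_logs / 0.0005.
implied_min_total = sample_logs / max_share

print(f"Total retained logs exceed {int(implied_min_total):,}")
```

On these assumptions, the retained volume would exceed 40 billion logs, which gives a sense of why a 20 million log sample was regarded as a limited burden.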

The Heppner case: AI without a lawyer, illusory protection

In United States v. Heppner, a decision issued on February 17, 2026, Judge Rakoff refused to protect, under attorney-client privilege or the work product doctrine, documents generated by the defendant using Claude before any effective involvement by counsel.

The decision is a reminder that exchanges with an AI tool cannot be equated with confidential communications with legal counsel, and that documents prepared without an attorney’s direction or request do not, in principle, benefit from the protection afforded to work prepared for the defense. The reasoning is further reinforced by the tool’s terms of use, which did not provide sufficient guarantees of confidentiality for the data entered.

This case therefore illustrates the risk, for a client, of sharing information related to litigation or a defense strategy with an AI tool on their own: in the absence of attorney supervision and confidentiality safeguards, such exchanges may become discoverable.

For a more in-depth analysis, you may refer to our previous article.

What concrete risks do companies and their counsel face?

These decisions reveal systemic risks that any organization using AI must anticipate. Beyond U.S. procedure, mechanisms for compelled production of evidence also exist under French law, including référé probatoire and court-ordered production of documents, and the underlying principles may be transposed.

Breach of trade secrets: strategic data entered into a consumer-facing AI tool may be stored, processed and potentially disclosed to third parties in accordance with the platform’s terms and conditions.
Loss of professional secrecy protection: a lawyer who enters sensitive client data into a public AI tool that trains on user inputs risks entirely waiving the protection attached to professional secrecy.
Preservation and production obligations: AI prompts and outputs constitute electronically stored information and are subject to the same obligations as emails. In the event of reasonably anticipated litigation, their deletion may amount to a serious procedural violation.
Liability for hallucinated content: submitting nonexistent case law generated by an LLM to a court may trigger disciplinary and civil liability for the lawyer or counsel relying on it.
Risk of infringement: AI outputs reproducing protected works without a licence may expose the user to liability for copyright infringement.
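
The preservation point above can be sketched as a deletion rule for AI logs: routine deletion is allowed only once a retention period has expired and only if the record is not tied to a matter under a litigation hold. The following Python sketch is illustrative only. The `AiLogRecord` type, the `may_delete` function, the matter identifiers and the 180-day period are all hypothetical and do not reflect any legal standard or any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class AiLogRecord:
    record_id: str
    created: date
    matter_ids: FrozenSet[str]  # matters the prompt or output relates to


def may_delete(record: AiLogRecord, today: date, retention_days: int,
               matters_on_hold: FrozenSet[str]) -> bool:
    """Return True only if routine deletion of this record is allowed."""
    # A litigation hold always overrides the routine retention schedule.
    if record.matter_ids & matters_on_hold:
        return False
    return today - record.created > timedelta(days=retention_days)


# Hypothetical example: one matter is under hold, retention is 180 days.
hold = frozenset({"matter-42"})
today = date(2026, 1, 10)
held = AiLogRecord("a1", date(2025, 1, 10), frozenset({"matter-42"}))
free = AiLogRecord("a2", date(2025, 1, 10), frozenset())

print(may_delete(held, today, 180, hold))  # False: preserved under the hold
print(may_delete(free, today, 180, hold))  # True: past retention, no hold
```

The design choice worth noting is the order of the checks: the hold is tested first, so that no retention setting can cause a record linked to anticipated litigation to be purged.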

Essential measures to protect your clients’ data

The first step is to audit the AI tools being used, including informally, in order to identify risks relating to terms of use, confidentiality, data retention and their potential use for training purposes.

The organisation should then formalise an internal AI governance policy, regulating the data that may be entered, the retention of logs, preservation obligations in the event of litigation, and human review of generated outputs before any official use.
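
One way to make the "data that may be entered" part of such a policy operational is a simple screening step applied to a draft prompt before it is sent to a consumer-facing tool. The Python sketch below is a minimal illustration under stated assumptions: the category names, the regular expressions and the `screen_prompt` function are hypothetical, and keyword matching of this kind is crude and will miss a great deal of sensitive content.

```python
import re

# Hypothetical categories and patterns; a real policy would define its own.
RESTRICTED_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "privilege marker": re.compile(
        r"\b(?:privileged|attorney[- ]client)\b", re.IGNORECASE
    ),
    "confidentiality marker": re.compile(
        r"\b(?:confidential|trade secret)\b", re.IGNORECASE
    ),
}


def screen_prompt(text: str) -> list[str]:
    """Return the restricted categories detected in a draft prompt."""
    return [
        label
        for label, pattern in RESTRICTED_PATTERNS.items()
        if pattern.search(text)
    ]


draft = "Summarize this privileged memo and send it to jane.doe@example.com"
findings = screen_prompt(draft)
if findings:
    print("Blocked before submission:", ", ".join(findings))
```

A check like this can only complement the training and human review described in this section. It cannot replace the judgment of the person deciding what to share.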

Finally, this policy should be accompanied by team training. Lawyers, executives and legal officers must understand the risks associated with AI, avoid producing inaccurate materials, and ensure clear, documented and effectively implemented governance.

European law to the rescue: GDPR, the AI Act and intellectual property

European law provides a complementary and more protective framework. In particular, the General Data Protection Regulation (“GDPR”) requires that any processing of personal data pursue a specific purpose, respect the principle of data minimization, honor the right to erasure and be protected by appropriate security measures. The use of a consumer-facing AI tool to process clients’ personal data, without a legal basis or impact assessment, may constitute a violation subject to sanctions by the CNIL.

The AI Act also strengthens requirements relating to transparency, traceability and human oversight, particularly for high-risk AI systems used in legal or decision-making contexts.

Finally, AI-generated content may infringe pre-existing intellectual property rights, such as copyright, trademarks, designs or patents. Since ownership of copyright in an AI-generated work remains uncertain under French law, any commercial exploitation must be subject to a rigorous prior assessment.

Conclusion

The U.S. decisions of 2025–2026 have crystallized a principle that European law had already expressed in another form: data entrusted to an AI tool do not disappear. They persist, they may be discoverable, and they may be used against their author if no governance framework has been put in place upstream.

The protection of your data in an AI environment rests on three inseparable pillars: informed choice of tools, formalization of a binding internal policy, and continuous team training.

Dreyfus assists its clients in managing complex intellectual property matters by providing tailored advice and comprehensive operational support for the full protection of intellectual property.

Dreyfus & Associés works in partnership with a global network of lawyers specializing in intellectual property.

Nathalie Dreyfus, with the assistance of the entire Dreyfus team.

Q&A

1. Are conversations with ChatGPT or Claude confidential?

By default, in consumer-facing versions, no. The terms and conditions of most AI platforms provide for the possibility of collecting inputs, using them to improve models and sharing them with third parties. Enterprise versions, whether API-based or professional subscriptions, generally offer guarantees that data will not be used for training purposes, but these guarantees must be verified contractually on a case-by-case basis.

2. Can a lawyer use AI without breaching professional secrecy?

Yes, subject to strict conditions. The tool used must contractually guarantee the confidentiality of the data entered, no client-identifying data should be shared unless necessary, and any AI output must be independently verified before official use. The Heppner case shows that unregulated AI use can destroy the benefit of professional secrecy.

3. What data should never be entered into a consumer-facing AI tool?

Any data that may be considered confidential or strategic: clients’ personal data within the meaning of the GDPR, information protected by trade secrets, data relating to ongoing or reasonably anticipated litigation, content covered by lawyers’ professional secrecy, and any information that may identify a party to a contract or proceeding.

4. Can AI outputs be protected by copyright?

Under French law, copyright protection requires an original work of the mind bearing the imprint of its author’s personality, a condition that purely automated outputs from an LLM generally do not satisfy. Significant human creative choices in the formulation of prompts or in the selection of outputs may, depending on the circumstances, give rise to partial protection.

5. What should be done if sensitive data have already been entered into an AI tool without precautions?

Act without delay: check the tool’s settings and disable history; exercise the right to erasure with the provider where applicable; notify the DPO; assess whether a personal data breach within the meaning of the GDPR must be reported to the CNIL; and immediately take preservation measures if litigation is reasonably anticipated.

This publication is intended to provide general guidance to the public and to highlight certain issues. It is not intended to apply to specific situations or to constitute legal advice.
