Can Vendors Train AI on a Company’s Data?
Generative artificial intelligence models are trained on enormous sets of data, which gives the companies creating and improving such models an incentive to gather as much data and information as possible. That incentive has led some service providers to explore every potential avenue of data collection, including data provided by their own customers. Should customers, meaning companies that use third-party software or services to handle core or ancillary elements of their business, be concerned about the privacy of their data, or simply about another company monetizing it?
Court Decisions Have Not Resolved the Issue
Recent court decisions, unfortunately, do not provide much guidance on these emerging issues. The two most recent high-profile cases dealing with intellectual property and artificial intelligence are Bartz v. Anthropic (2025) and Kadrey v. Meta Platforms (2025). In both, the plaintiffs sued companies that used their copyrighted works to train AI models, and in both the courts ruled for the defendants. The plaintiffs in those cases, however, were literary authors, so the arguments focused largely on fair use, a defense to a claim of copyright infringement. In both cases, the courts determined that the AI models “transformed” the data they were trained on into an entirely new and unique output to such a significant degree that the use overcame the authors’ copyright claims. In simpler terms, even though the AI was trained on copyrighted material, the work it actually produces is so different from the works it was trained on that using those materials to train the model constitutes fair use.
It is important to note, however, that these cases do not set a precedent sweepingly endorsing all use of intellectual property to train AI models. To the contrary, the judge in Kadrey noted that it is certainly possible for AI models trained on copyrighted information to produce outputs that are not as transformative as those presented in Kadrey, and that if such cases were brought before him, he would likely rule in favor of the plaintiffs.
Those decisions dealing with the transformative power of AI do little, however, to address the concerns a company may have about its confidential information. The issues become more complex when the intellectual property in question is not an author’s published works, but rather personally identifiable or confidential information. Consider the following set of circumstances: an apartment complex owner hires a third party to handle some of its finances involving confidential tenant information. The third party then uses the information provided by the apartment complex to train its AI models in order to better serve the property owner. Have the apartment owner’s rights been violated? The answer is somewhat unclear.
The same question can be applied to identifiable or confidential information. One argument is that, as long as the output produced is sufficiently distorted and deidentified that it does not disclose privileged information, there would seemingly be no issue. A counterargument is that even if information is deidentified, the robust power of AI may allow a third party to reconstruct the deidentified data. Yet another counterargument is that the vendor contractually agreed to use the customer’s data solely to provide services to the customer, not to improve its own product or service.
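The re-identification concern is not hypothetical in technical terms. A toy sketch illustrates the basic mechanism (all names, datasets, and values below are invented for illustration): two datasets that each look anonymous on their own can be joined on shared attributes, sometimes called quasi-identifiers, to put names back onto "deidentified" records.

```python
# Hypothetical linkage-attack sketch: "deidentified" records can sometimes
# be re-linked to individuals by joining on quasi-identifiers (here, ZIP
# code, birth year, and gender) against a separate dataset that still
# carries names. All data below is fabricated for illustration.

# Records a vendor might release with names stripped.
deidentified = [
    {"zip": "30301", "birth_year": 1985, "gender": "F", "rent_owed": 1450},
    {"zip": "30309", "birth_year": 1972, "gender": "M", "rent_owed": 980},
]

# A second, publicly available dataset (e.g., voter rolls) with names.
public_records = [
    {"name": "Alice Example", "zip": "30301", "birth_year": 1985, "gender": "F"},
    {"name": "Bob Example", "zip": "30309", "birth_year": 1972, "gender": "M"},
]

def reidentify(deid_rows, public_rows):
    """Join the two datasets on their shared quasi-identifiers."""
    matches = []
    for d in deid_rows:
        for p in public_rows:
            if (d["zip"], d["birth_year"], d["gender"]) == (
                p["zip"], p["birth_year"], p["gender"]
            ):
                matches.append({"name": p["name"], "rent_owed": d["rent_owed"]})
    return matches

# Each stripped record is matched back to a named individual.
print(reidentify(deidentified, public_records))
```

The point of the sketch is simply that removing names is not the same as removing identifiability: the fewer people who share a given combination of attributes, the easier the join becomes, which is why the deidentification counterargument above carries real weight.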
Contractual Language Can Solve the Problem
There is a solution other than waiting for the courts to decide this issue: provide for the treatment of the data in the vendor-customer contract. Adding the relevant contractual language benefits both parties because, as with other contractual provisions, clarity as to the parties’ rights and obligations decreases the likelihood of a dispute. The nature of the AI-related clauses can vary depending on the interests of the parties, from permitting the use of the customer’s data to improve the vendor’s product to prohibiting any use of the customer’s data to train the vendor’s AI models. Depending on the nature of the data and the concerns of the customer, the clauses can include indemnification, audit rights, security assurances, requirements to delete the customer’s data at the termination of the relationship, and penalties.
Next Steps
Customers need to stay vigilant about what is being done with the information they provide to their vendors, for a multitude of reasons: to ensure confidentiality, to structure their vendor contracts so that vendors are prevented from using their intellectual property to train AI models, and to make sure vendors are not unduly profiting from their data without providing adequate compensation. Vendors, in turn, need to stay cognizant of the most recent court decisions on these issues and mindful of what their own contracts say, in order to avoid costly lawsuits, punitive legal action, and, just as importantly, damage to their customer relationships. Like AI itself, the laws and court decisions governing AI will change rapidly.