BLOG POST

Tech & Sourcing @ Morgan Lewis

TECHNOLOGY TRANSACTIONS, OUTSOURCING, AND COMMERCIAL CONTRACTS NEWS FOR LAWYERS AND SOURCING PROFESSIONALS

Key Considerations When Allowing a Vendor to Train Its AI Models on Customer Data

Most services agreements for vendor-provided technology services contain standard provisions allowing vendors to use customer data, and data generated through the provision of services, to improve and enhance their service offerings. Vendors are increasingly seeking express rights to use such data not only to improve their services but also to train their AI models. While these provisions may seem to be a natural extension of traditional service-improvement rights, they can have significantly broader implications. Before agreeing to such language, organizations should carefully evaluate how customer data will be used, the extent of the rights being granted, and whether the potential benefits outweigh the risks.

Understanding What ‘Training on Your Data’ Means

AI vendors may seek rights to use customer data for several purposes, including:

Training foundation models or large language models
Fine-tuning models to improve performance
Developing new products and features
Improving accuracy, safety, and reliability of existing systems
Creating aggregated datasets for analytics or benchmarking

Not all data use is the same. Some vendors use customer data solely to provide services to that customer, while others seek broader rights to incorporate the data into models that benefit all customers. Understanding exactly how data will be used is the first step in evaluating the associated risks.

Data Ownership Does Not Equal Data Control

Organizations should carefully review contractual language governing:

Training and model improvement rights
Creation of derivative works
Aggregated and anonymized data rights
Intellectual property ownership of model outputs
Rights that survive termination of the agreement

In practice, the critical issue is often not who owns the underlying data but what rights the vendor receives to use, retain, and derive value from that data over time. Even if a company retains ownership of its raw data, broad training rights may permit a vendor to create models that incorporate learnings derived from that data indefinitely. Organizations should also consider whether applicable data protection, regulatory, or contractual obligations restrict the use of certain categories of data for AI training purposes.

Confidentiality and Trade Secret Risks

One of the most significant concerns is the potential exposure of confidential information and trade secrets. Organizations whose competitive advantage depends on proprietary information should carefully assess whether the value of allowing model training outweighs the potential loss of exclusivity.

Questions to consider include:

What categories of data will be used for training?
Are confidential or trade secret protections maintained?
Can training be restricted to specific datasets?
What technical controls exist to prevent data leakage?

Technical Safeguards Matter

The legal terms are only part of the analysis. Technical safeguards are equally important. Organizations should understand:

Whether data is segregated between customers
Whether training occurs on shared or dedicated models
Whether the provider uses third-party foundation models
Whether customer data can be excluded from future training cycles
Whether trained models can be retrained or purged upon request

Vendors that offer opt-in or opt-out controls, customer-specific model instances, or strict data segregation may present a lower risk than those operating entirely on shared infrastructure.
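For readers who want a concrete picture of what an opt-in control can look like in practice, the following is a minimal sketch, not a description of any particular vendor's system. It assumes a hypothetical record format (`Record`) and hypothetical helper names (`eligible_for_training`, `select_training_records`), and shows one way a vendor might filter data before it reaches a training pipeline: a record is used only if the customer has opted in, the data category is permitted, and the record is not flagged as confidential.

```python
from dataclasses import dataclass


# Hypothetical record shape; all field names are illustrative assumptions.
@dataclass(frozen=True)
class Record:
    customer_id: str
    category: str       # e.g. "usage_log", "contract"
    confidential: bool  # flagged as confidential or trade secret
    text: str


def eligible_for_training(record, opted_in_customers, allowed_categories):
    """Return True only if every condition for training use is met."""
    return (
        record.customer_id in opted_in_customers
        and record.category in allowed_categories
        and not record.confidential
    )


def select_training_records(records, opted_in_customers, allowed_categories):
    """Keep only the records that pass the eligibility check."""
    return [
        r for r in records
        if eligible_for_training(r, opted_in_customers, allowed_categories)
    ]


# Illustrative data: only the first record satisfies all three conditions.
records = [
    Record("acme", "usage_log", False, "login at 09:00"),
    Record("acme", "contract", True, "pricing terms"),
    Record("globex", "usage_log", False, "export run"),
]

selected = select_training_records(
    records,
    opted_in_customers={"acme"},
    allowed_categories={"usage_log"},
)
print([r.text for r in selected])
```

A filter of this kind addresses only what enters future training runs. It does not remove the influence of data already used to train a model, which is why the questions above about exclusion from future training cycles and retraining or purging upon request are separate issues.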

Contractual Protections to Consider

When negotiating agreements involving AI model training, organizations may consider seeking provisions that:

Prohibit training on customer data entirely
Require affirmative opt-in consent before training occurs
Limit training to specific categories of data
Exclude confidential, proprietary, or sensitive information from training
Restrict use of data to customer-specific model improvements
Require data anonymization before training
Provide transparency regarding training practices
Permit audits or compliance reporting
Require deletion or cessation of training activities upon termination

The appropriate approach will depend on the organization’s risk tolerance, the nature of the data involved, and the specific use case.

How We Can Help

As AI technologies become more prevalent, organizations are increasingly being asked to permit vendors to use their data to train AI models. Evaluating these requests requires careful consideration of contractual rights, technical safeguards, intellectual property issues, and confidentiality risks. Our team advises clients on AI contracting and data governance matters, including negotiating data-use provisions, assessing vendor AI practices, and implementing safeguards that align with business objectives and risk tolerance.