Most services agreements for vendor-provided technology services contain standard provisions allowing vendors to use customer data, and data generated through the provision of services, to improve and enhance their service offerings. Vendors are increasingly seeking express rights to use such data not only to improve their services but also to train their AI models. While these provisions may seem to be a natural extension of traditional service-improvement rights, they can have significantly broader implications. Before agreeing to such language, organizations should carefully evaluate how customer data will be used, the extent of the rights being granted, and whether the potential benefits outweigh the risks.
Understanding What ‘Training on Your Data’ Means
AI vendors may seek rights to use customer data for several purposes, including:
- Training foundation models or large language models
- Fine-tuning models to improve performance
- Developing new products and features
- Improving accuracy, safety, and reliability of existing systems
- Creating aggregated datasets for analytics or benchmarking
Not all data use is the same. Some vendors use customer data solely to provide services to that customer, while others seek broader rights to incorporate the data into models that benefit all customers. Understanding exactly how data will be used is the first step in evaluating the associated risks.
Data Ownership Does Not Equal Data Control
Organizations should carefully review contractual language governing:
- Training and model improvement rights
- Creation of derivative works
- Aggregated and anonymized data rights
- Intellectual property ownership of model outputs
- Rights that survive termination of the agreement
In practice, the critical issue is often not who owns the underlying data but what rights the vendor receives to use, retain, and derive value from that data over time. Even if a company retains ownership of its raw data, broad training rights may permit a vendor to create models that incorporate learnings derived from that data indefinitely. Organizations should also consider whether applicable data protection, regulatory, or contractual obligations restrict the use of certain categories of data for AI training purposes.
Confidentiality and Trade Secret Risks
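One of the most significant concerns is the potential exposure of confidential information and trade secrets. A model trained on a customer's data may surface that information, or insights derived from it, in outputs delivered to other customers, potentially including competitors. Organizations whose competitive advantage depends on proprietary information should carefully assess whether the value of allowing model training outweighs the potential loss of exclusivity.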
Questions to consider include:
- What categories of data will be used for training?
- Are confidentiality and trade secret protections maintained?
- Can training be restricted to specific datasets?
- What technical controls exist to prevent data leakage?
Technical Safeguards Matter
The legal terms are only part of the analysis. Technical safeguards are equally important. Organizations should understand:
- Whether data is segregated between customers
- Whether training occurs on shared or dedicated models
- Whether the provider uses third-party foundation models
- Whether customer data can be excluded from future training cycles
- Whether trained models can be retrained or purged upon request
Vendors that offer opt-in or opt-out controls, customer-specific model instances, or strict data segregation may present a lower risk than those operating entirely on shared infrastructure.
Contractual Protections to Consider
When negotiating agreements involving AI model training, organizations may consider seeking provisions that:
- Prohibit training on customer data entirely
- Require affirmative opt-in consent before training occurs
- Limit training to specific categories of data
- Exclude confidential, proprietary, or sensitive information from training
- Restrict use of data to customer-specific model improvements
- Require data anonymization before training
- Provide transparency regarding training practices
- Permit audits or compliance reporting
- Require deletion of customer data and cessation of training activities upon termination
The appropriate approach will depend on the organization’s risk tolerance, the nature of the data involved, and the specific use case.
How We Can Help
As AI technologies become more prevalent, organizations are increasingly being asked to permit vendors to use their data to train AI models. Evaluating these requests requires careful consideration of contractual rights, technical safeguards, intellectual property issues, and confidentiality risks. Our team advises clients on AI contracting and data governance matters, including negotiating data-use provisions, assessing vendor AI practices, and implementing safeguards that align with business objectives and risk tolerance.