When you send text to an LLM API in a business context, the core privacy question is: does the provider use that text to train future model versions? For the major business APIs (OpenAI, Anthropic, Mistral, and Google via Vertex AI), the answer is no: API calls are not used for training by default. Consumer products are different. This post covers what actually happens to your data and how to build a practical policy.
What Happens to API Calls
OpenAI API
Data sent to the OpenAI API (not ChatGPT consumer) is not used to train models by default. OpenAI's usage policy for API customers distinguishes between API usage and consumer product usage. API calls may be retained for up to 30 days for safety monitoring and abuse detection, after which they are deleted.
OpenAI offers a zero data retention option for enterprise customers. Under this arrangement, inputs and outputs are not stored after the API call completes. This can help meet stricter compliance requirements in regulated industries.
Consumer ChatGPT (the web and mobile product) has different data practices. Conversations can be used for training unless users opt out in settings. If your team uses consumer ChatGPT rather than the API, the API terms described above do not cover that usage.
Anthropic API
Anthropic does not train on Claude API inputs and outputs by default. API calls may be retained for a short period for trust and safety review, with deletion after that window.
Anthropic's Claude.ai consumer product has separate terms. Whether consumer conversations are used for training depends on those terms and on the user's privacy settings, so do not assume the API policy applies to Claude.ai accounts.
Mistral API
Mistral follows a similar pattern. API calls are not used for training. Mistral is a European company subject to GDPR, which provides additional legal constraints on data use.
Google (Gemini API via Vertex AI)
Data processed through Google Cloud Vertex AI (the enterprise API path) is not used to train models. Data sent through Google AI Studio, particularly on the free tier, may be treated differently, so check the terms for the tier you use.
What "Zero Data Retention" Means and When You Need It
Zero data retention (ZDR) means the provider does not store your inputs or outputs after the API response is returned. Standard API retention (where data is kept for 30 days for safety monitoring) is acceptable for most use cases. ZDR is appropriate when:
- Your compliance requirements prohibit any third-party retention of specific data types (PHI under HIPAA, classified information, certain financial data under SEC regulations)
- Your contracts with clients prohibit sharing their confidential information with any subprocessor
- You are processing information subject to attorney-client privilege
ZDR is available from OpenAI as an enterprise feature and from some other providers on custom enterprise agreements. If ZDR is a requirement, confirm it explicitly in your vendor contract — do not assume it applies based on general policy language.
On-Premise and Private Cloud Options
When the external API data practices are not acceptable for your workload, you have three practical options.
Self-Hosted Open Models (Llama, Mistral)
Meta's Llama 3 and several of Mistral's models are released with downloadable weights under licenses that permit local deployment (review the license terms for your use case). You run the model on your own hardware. No data leaves your infrastructure.
The capability trade-off is real but manageable. Llama 3.3 70B running on appropriate hardware approaches GPT-4o quality for many tasks. Smaller 7B and 8B models are significantly weaker but fast and resource-efficient.
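Hardware requirements: 7B models run on consumer GPU hardware (16GB VRAM). At 16-bit precision a 70B model needs roughly 140GB for its weights alone, so it requires a multi-GPU server (e.g., two 80GB A100s), a quantized build that fits in less memory, or CPU inference of a quantized model with significant RAM (64GB+). CPU inference is slower but feasible for batch processing workloads.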
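The hardware figures above follow from simple arithmetic on parameter count and numeric precision. The sketch below is a rough estimate for model weights only; the function name `weight_memory_gb` is hypothetical, and real deployments need extra headroom for the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billion * bytes_per_weight


if __name__ == "__main__":
    for name, params in [("7B", 7), ("70B", 70)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{gb:.1f} GB for weights")
```

By this estimate a 7B model at 16-bit precision needs about 14 GB, which is why it fits on a 16GB consumer GPU, while a 70B model at 16-bit needs about 140 GB and only drops to about 35 GB when quantized to 4-bit.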
Azure OpenAI Service
Microsoft Azure offers OpenAI models (including GPT-4o) through Azure's infrastructure. Data processed through Azure OpenAI is handled within Microsoft's Azure environment and is not sent to OpenAI's systems. Azure OpenAI is subject to Microsoft's enterprise data processing agreement and can be configured for HIPAA, FedRAMP, and other compliance frameworks.
Azure OpenAI is the practical choice for organizations already in Microsoft's ecosystem that want GPT-4o quality without sending data through OpenAI's public API. The models are the same; the hosting and data path are different.
AWS Bedrock
AWS Bedrock offers multiple model families (Anthropic Claude, Meta Llama, Mistral, and others) through AWS infrastructure. As with Azure OpenAI, data stays within your cloud provider's environment rather than going to the model vendor's public API. AWS Bedrock fits well for organizations with an existing AWS footprint and lets you mix model providers through a single API.
Risk Categories: What to Classify Before Writing Policy
Before writing an LLM data policy, classify the types of content your team works with:
Personally identifiable information (PII). Names, email addresses, phone numbers, physical addresses, government ID numbers. PII handling may be governed by GDPR, CCPA, or sector-specific regulations. Most organizations should prohibit sending raw PII to external LLM APIs, or require it to be redacted or pseudonymized first.
Customer confidential information. Contract terms, pricing, strategic plans, or information that clients have shared with you under confidentiality agreements. Your client contracts may prohibit sharing this with third parties, including AI API providers.
Source code. If your contracts include IP ownership clauses or if your code contains trade secrets, sending it to an external API may create legal risk. This is a common concern in software companies with clients who own their custom code.
Internal strategic information. Business plans, financial projections, M&A discussions, personnel decisions. The concern here is competitive — even if the provider does not use the data for training, the data is transmitted over a network and stored temporarily on provider systems.
Public information used for analysis. News articles, publicly available competitor information, industry reports. This category typically has low risk for external API use.
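For the PII category, redaction or pseudonymization before an external API call can be sketched as follows. This is a minimal illustration, not a complete PII detector: the regular expressions and the `pseudonymize` function are assumptions made for this example, they only catch simple email and phone patterns, and names, addresses, and ID numbers would need a dedicated detection tool.

```python
import re

# Deliberately simple patterns; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace emails and phone numbers with numbered placeholders.

    Returns the rewritten text and a placeholder -> original mapping.
    The mapping stays on your side so responses can be re-identified locally.
    """
    mapping: dict[str, str] = {}

    def replacer(kind: str):
        def inner(match: re.Match) -> str:
            value = match.group(0)
            # Reuse the same placeholder when a value repeats.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            count = sum(1 for p in mapping if p.startswith(f"[{kind}_")) + 1
            placeholder = f"[{kind}_{count}]"
            mapping[placeholder] = value
            return placeholder
        return inner

    text = EMAIL_RE.sub(replacer("EMAIL"), text)
    text = PHONE_RE.sub(replacer("PHONE"), text)
    return text, mapping


if __name__ == "__main__":
    safe, lookup = pseudonymize(
        "Email jane@example.com or call +1 555 010 9999. Again: jane@example.com"
    )
    print(safe)
    print(lookup)
```

Only the rewritten text would be sent to the provider. Keeping the mapping local is what makes this pseudonymization rather than plain redaction.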
Building an LLM Policy for Your Team
A practical policy specifies which types of content can be sent to which services.
Example policy structure:
Category 1 — Approved for any LLM API: public information, general knowledge tasks, code that is not proprietary, internal documents marked "general"
Category 2 — API only (not consumer products), with logging: internal strategy documents, non-PII customer information, proprietary code (if contract allows)
Category 3 — On-premise or private cloud only: PII, PHI, content governed by confidentiality agreements that prohibit third-party sharing
Category 4 — Prohibited for LLM use: attorney-client privileged communications, board-level M&A discussions, government classified information
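The four categories above can also be encoded as a small lookup that internal tooling checks before a request is sent. This is a sketch under stated assumptions: the `Destination` names, the `POLICY` table, and the `check` function are hypothetical, and treating Category 1 as also permitting consumer products is one reading of the example policy that your organization may decide differently.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    CONSUMER_APP = "consumer_app"    # e.g. a consumer chat product
    BUSINESS_API = "business_api"    # provider API under business terms
    PRIVATE_CLOUD = "private_cloud"  # e.g. models hosted in your cloud account
    ON_PREMISE = "on_premise"        # self-hosted models


@dataclass(frozen=True)
class Rule:
    allowed: frozenset
    requires_logging: bool = False


API_AND_PRIVATE = frozenset(
    {Destination.BUSINESS_API, Destination.PRIVATE_CLOUD, Destination.ON_PREMISE}
)

POLICY = {
    # Assumption: Category 1 content may go to any destination.
    1: Rule(frozenset(Destination)),
    2: Rule(API_AND_PRIVATE, requires_logging=True),
    3: Rule(frozenset({Destination.PRIVATE_CLOUD, Destination.ON_PREMISE})),
    4: Rule(frozenset()),  # prohibited for LLM use everywhere
}


def check(category: int, destination: Destination) -> tuple[bool, bool]:
    """Return (allowed, must_log) for sending this category to this destination."""
    rule = POLICY[category]
    allowed = destination in rule.allowed
    return allowed, allowed and rule.requires_logging


if __name__ == "__main__":
    print(check(2, Destination.BUSINESS_API))
    print(check(3, Destination.BUSINESS_API))
```

When a task falls between two categories, a conservative default is to evaluate it under the higher-numbered (stricter) one.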
The policy should also address: which consumer products employees may use for work tasks, how to handle the fact that many employees already use consumer ChatGPT personally, and what to do when a task falls between categories.
Implementation requires training, not just documentation. Most data leakage through LLMs is accidental: an employee pastes a document to get a summary without considering whether the document contains restricted information. Brief training on the categories and a quick reference card reduce accidental disclosure.