Data & Privacy
An AI feature moves data — often sensitive data — to places it never went before. Privacy is not a legal afterthought; it’s an architecture decision you make when you choose where the model runs and what you put in the prompt.
Where your data goes
When you call a hosted model API, the prompt leaves your environment, and the prompt contains whatever user data you put in it.
Before sending data to any provider, know the answers to:
- Do they train on it? Most business and API tiers don’t, or let you opt out — but verify it in the contract, don’t assume.
- How long is it retained? Providers often keep request data for a fixed window, typically for abuse monitoring. Know the period and whether you can shorten it.
- Where is it processed? Region matters for data-residency rules.
- Who are the sub-processors? Your provider’s vendors touch your data too.
The answers live in the provider’s data processing agreement (DPA). Read it, or have someone read it, before sensitive data flows.
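One way to keep those answers from living only in someone's memory is to record them next to the code and check them before a provider is enabled. The following is a minimal sketch under stated assumptions: `ProviderTerms`, `blocking_issues`, and every field name are hypothetical, and the values would be copied by hand from the agreement you actually signed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderTerms:
    """Answers copied from a provider's agreement (hypothetical field names)."""
    trains_on_data: bool
    retention_days: int
    regions: frozenset[str]
    subprocessors_reviewed: bool


def blocking_issues(
    terms: ProviderTerms,
    *,
    max_retention_days: int,
    allowed_regions: frozenset[str],
) -> list[str]:
    """Return the reasons this provider should not receive sensitive data yet."""
    issues = []
    if terms.trains_on_data:
        issues.append("provider may train on submitted data")
    if terms.retention_days > max_retention_days:
        issues.append(
            f"retention of {terms.retention_days} days exceeds limit of {max_retention_days}"
        )
    outside = terms.regions - allowed_regions
    if outside:
        issues.append(f"processing in disallowed regions: {sorted(outside)}")
    if not terms.subprocessors_reviewed:
        issues.append("sub-processor list not reviewed")
    return issues
```

An empty list means none of the four questions is blocking; anything else is a reason to stop and go back to the contract.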
Handle PII deliberately
PII (personally identifiable information), along with its stricter cousins such as PHI (protected health information) and financial data, needs active handling, not hope.
- Classify what’s sensitive before you build: names, contacts, identifiers, health, financial, credentials, proprietary content.
- Minimize. Send the model only what the task needs. The cheapest data to protect is the data you never transmitted.
- Redact or tokenize. Where feasible, mask or replace PII before it reaches the model, then re-insert it in the response. The model rarely needs the real social security number to do its job.
- Never put secrets in prompts. API keys, passwords, internal tokens — a prompt is not a vault, and it may be logged downstream.
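The redact-or-tokenize step can be sketched as a pair of functions that swap PII for placeholders before the model call and swap it back afterwards. This is an illustration only: the two regular expressions, the placeholder format, and the names `tokenize` and `restore` are assumptions, and real PII detection needs much broader coverage than two patterns.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def tokenize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each PII match with a placeholder; return masked text and the mapping."""
    mapping: dict[str, str] = {}   # placeholder -> original value
    seen: dict[str, str] = {}      # original value -> placeholder

    def substitute(kind: str, match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            token = f"[{kind}_{len(mapping) + 1}]"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: substitute(k, m), text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a model response."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

The mapping stays inside your environment; only the masked text would be sent to the model, and `restore` is applied to whatever comes back.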
Regulated data and compliance
If you handle regulated data, the rules are not optional. You don’t need to be a lawyer, but you do need to know the shape of the rules and involve a lawyer early.
- GDPR (and similar regimes) — a lawful basis for processing, data-subject rights (access, deletion), and limits on cross-border transfer.
- HIPAA and sector rules — health, finance, and education data carry specific obligations and require appropriate agreements with any processor.
- The EU AI Act and emerging AI regulation — risk-tiered obligations, with transparency duties (such as disclosing AI interaction and labeling AI-generated content) and stricter requirements for “high-risk” uses like decisions about employment, credit, or essential services.
The engineering takeaway: know which category your data and use case fall into before you design the system — it determines whether a hosted API is even allowed.
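That decision can be made explicit in code so a request fails loudly when the data category and the deployment do not match. The sketch below is a placeholder, not legal guidance: the category names, the deployment names, and the contents of `POLICY` are all hypothetical and would come from your own legal review.

```python
from enum import Enum


class DataClass(Enum):
    UNRESTRICTED = "unrestricted"
    REGULATED = "regulated"
    MUST_STAY_INTERNAL = "must_stay_internal"


class Deployment(Enum):
    HOSTED_API = "hosted_api"
    HOSTED_API_WITH_AGREEMENT = "hosted_api_with_agreement"
    SELF_HOSTED = "self_hosted"


# Placeholder policy table (assumption); fill it in from legal review.
POLICY: dict[DataClass, set[Deployment]] = {
    DataClass.UNRESTRICTED: set(Deployment),
    DataClass.REGULATED: {Deployment.HOSTED_API_WITH_AGREEMENT, Deployment.SELF_HOSTED},
    DataClass.MUST_STAY_INTERNAL: {Deployment.SELF_HOSTED},
}


def check_deployment(data_class: DataClass, deployment: Deployment) -> None:
    """Raise if this deployment is not permitted for this category of data."""
    # Categories missing from the table get an empty set, so the check fails closed.
    allowed = POLICY.get(data_class, set())
    if deployment not in allowed:
        raise PermissionError(
            f"{deployment.value} is not permitted for {data_class.value} data"
        )
```

Calling `check_deployment` at the point where a model client is chosen keeps the policy in one reviewable place rather than scattered across call sites.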
Logging is a privacy surface
Easy to forget: if your observability pipeline logs full prompts and responses, it stores a second copy of every piece of sensitive data the model saw. Treat those logs as sensitive systems: restrict access, set retention limits, and redact PII from them just as you would from the prompts themselves.
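For Python services, one place to apply that redaction is a `logging.Filter` attached to the handler that writes prompt logs. The sketch below assumes regex-based detection with two illustrative patterns; `RedactingFilter` is a hypothetical name, and a production pipeline would need wider coverage.

```python
import logging
import re

# Illustrative patterns only (assumption); extend for your own data.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


class RedactingFilter(logging.Filter):
    """Mask PII in the final log message before a handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges msg with args, so PII passed as an argument is covered.
        message = record.getMessage()
        for label, pattern in PATTERNS.items():
            message = pattern.sub(f"[{label}]", message)
        record.msg = message
        record.args = ()
        return True  # keep the record, now redacted
```

Attaching the filter with `handler.addFilter(RedactingFilter())` rather than to a single logger means it also applies to records that propagate to that handler from child loggers.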
When to self-host
If a data-residency, contractual, or regulatory requirement means data cannot leave your environment, the answer is to run an open model on infrastructure you control, so no prompt ever crosses the boundary. That’s a real cost and operational burden, but for some data it’s the only compliant option.
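Once a model is self-hosted, a small application-level guard can help catch a misconfigured client that still points at an external endpoint. This is a sketch under assumptions: the hostnames in `INTERNAL_HOSTS` are made up, `assert_internal_endpoint` is a hypothetical helper, and a check like this complements network-level egress controls rather than replacing them.

```python
from urllib.parse import urlsplit

# Hypothetical hostnames for model servers you run yourself.
INTERNAL_HOSTS = frozenset({"llm.internal.example", "localhost"})


def assert_internal_endpoint(url: str) -> str:
    """Return the URL unchanged if its host is on the internal allowlist, else raise."""
    host = urlsplit(url).hostname  # lowercased host, or None if the URL has no host
    if host not in INTERNAL_HOSTS:
        raise PermissionError(f"refusing to send prompts to non-internal host: {host!r}")
    return url
```

Calling it once where the model client is constructed means a wrong base URL fails at startup instead of after the first prompt has already left.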
Be straight with users
Privacy is also a product surface. Tell users what data an AI feature uses and where it goes, honor deletion requests through to the logs, and don’t quietly feed user content into training. Trust, once lost here, doesn’t come back.
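Honoring deletion "through to the logs" is easier when every place that holds user data sits behind one interface. A minimal sketch, assuming a hypothetical `UserDataStore` protocol and an in-memory stand-in; real stores would wrap your database, prompt logs, and caches.

```python
from __future__ import annotations

from typing import Protocol


class UserDataStore(Protocol):
    """Anything that holds user data (hypothetical interface)."""
    name: str

    def delete_user(self, user_id: str) -> int:
        """Delete the user's records and return how many were removed."""
        ...


class InMemoryStore:
    """Stand-in store for the sketch."""

    def __init__(self, name: str, rows: list[dict]):
        self.name = name
        self.rows = rows

    def delete_user(self, user_id: str) -> int:
        before = len(self.rows)
        self.rows = [row for row in self.rows if row["user_id"] != user_id]
        return before - len(self.rows)


def erase_user(user_id: str, stores: list[UserDataStore]) -> dict[str, int]:
    """Run the deletion against every registered store and report counts per store."""
    return {store.name: store.delete_user(user_id) for store in stores}
```

Registering the prompt-log store alongside the application database makes it harder for a deletion request to stop at the primary data and leave the second copy behind.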
Key takeaways
- Calling a hosted API sends your prompt, and the user data in it, across a trust boundary; verify the provider’s training, retention, region, and sub-processor terms first.
- Handle PII deliberately: classify it, minimize it, redact it, and never put secrets in prompts.
- Know your compliance category (GDPR, HIPAA, the EU AI Act) before designing, because it can rule out hosted APIs entirely.
- Logs are a second copy of sensitive data; secure them.
- When data legally can’t leave, self-host an open model.