Data & Privacy

An AI feature moves data — often sensitive data — to places it never went before. Privacy is not a legal afterthought; it’s an architecture decision you make when you choose where the model runs and what you put in the prompt.

When you call a hosted model API, the prompt leaves your environment — and the prompt contains whatever user data you put in it.

[Figure: the trust boundary. Inside your environment, your application holds user data, PII, and secrets. Data is redacted or minimized before it crosses the boundary to the model provider, whose terms you must verify.]

Before sending data to any provider, know the answers to:

  • Do they train on it? Most business and API tiers don’t, or let you opt out — but verify it in the contract, don’t assume.
  • How long is it retained? Providers often keep request data for a fixed window, typically for abuse monitoring. Know the period and whether you can shorten it.
  • Where is it processed? Region matters for data-residency rules.
  • Who are the sub-processors? Your provider’s vendors touch your data too.

This lives in the provider’s data processing agreement. Read it, or have someone read it, before sensitive data flows.
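One way to keep these four questions from being skipped is to record the answers as data and block sensitive traffic until none remain open. The following is a minimal sketch, not a standard schema: `ProviderTerms`, its field names, and `open_questions` are hypothetical, and the allowed regions and retention limit are values you would take from your own requirements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProviderTerms:
    """Answers taken from the provider's data processing agreement (hypothetical schema)."""
    trains_on_data: Optional[bool] = None   # None means "not yet verified"
    retention_days: Optional[int] = None    # None means "not yet verified"
    processing_regions: List[str] = field(default_factory=list)
    sub_processors_reviewed: bool = False


def open_questions(terms: ProviderTerms,
                   allowed_regions: List[str],
                   max_retention_days: int) -> List[str]:
    """Return the issues that must be resolved before sensitive data flows."""
    issues = []
    if terms.trains_on_data is None:
        issues.append("training use not verified")
    elif terms.trains_on_data:
        issues.append("provider trains on submitted data")
    if terms.retention_days is None:
        issues.append("retention period not verified")
    elif terms.retention_days > max_retention_days:
        issues.append("retention period exceeds limit")
    if not terms.processing_regions:
        issues.append("processing region not verified")
    elif any(r not in allowed_regions for r in terms.processing_regions):
        issues.append("data processed outside allowed regions")
    if not terms.sub_processors_reviewed:
        issues.append("sub-processors not reviewed")
    return issues
```

An empty list means every question has a verified, acceptable answer. Anything else is a reason to hold back sensitive data.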

PII (personally identifiable information) — and its stricter cousins like PHI (health) and financial data — needs active handling, not hope.

  • Classify what’s sensitive before you build: names, contacts, identifiers, health, financial, credentials, proprietary content.
  • Minimize. Send the model only what the task needs. The cheapest data to protect is the data you never transmitted.
  • Redact or tokenize. Where feasible, mask or replace PII before it reaches the model, then re-insert it in the response. The model rarely needs the real social security number to do its job.
  • Never put secrets in prompts. API keys, passwords, internal tokens — a prompt is not a vault, and it may be logged downstream.
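The redact-or-tokenize step above can be sketched as a pair of functions: one swaps PII for placeholder tokens before the prompt is sent, the other restores the real values in the model's response. This is a minimal illustration under stated assumptions. The two regular expressions cover only US-style social security numbers and simple email addresses, and the names `PII_PATTERNS`, `tokenize_pii`, and `restore_pii` are hypothetical. Production systems usually need a dedicated PII detection tool rather than a handful of patterns.

```python
import re

# Illustrative patterns only (an assumption); real PII detection needs far broader coverage.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def tokenize_pii(text):
    """Replace each PII match with a placeholder; return the masked text and the token map."""
    token_to_value = {}
    value_to_token = {}
    for label, pattern in PII_PATTERNS.items():
        def swap(match, label=label):
            value = match.group(0)
            if value not in value_to_token:
                # A repeated value reuses its token so the model sees consistent references.
                token = f"[{label}_{len(value_to_token) + 1}]"
                value_to_token[value] = token
                token_to_value[token] = value
            return value_to_token[value]
        text = pattern.sub(swap, text)
    return text, token_to_value


def restore_pii(text, token_to_value):
    """Put the original values back into text that contains placeholder tokens."""
    for token, value in token_to_value.items():
        text = text.replace(token, value)
    return text
```

The token map stays inside your environment. Only the masked text crosses the trust boundary, and `restore_pii` is applied to the model's reply before it is shown to the user.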

If you handle regulated data, the rules are not optional. You don’t need to be a lawyer — you need to know the shape and involve one early.

  • GDPR (and similar regimes) — a lawful basis for processing, data-subject rights (access, deletion), and limits on cross-border transfer.
  • HIPAA and sector rules — health, finance, and education data carry specific obligations and require appropriate agreements with any processor.
  • The EU AI Act and emerging AI regulation — risk-tiered obligations, with transparency duties (such as disclosing AI interaction and labeling AI-generated content) and stricter requirements for “high-risk” uses like decisions about employment, credit, or essential services.

The engineering takeaway: know which category your data and use case fall into before you design the system — it determines whether a hosted API is even allowed.

Easy to forget: if your observability pipeline logs full prompts and responses, it stores a second copy of every piece of sensitive data the model saw. Treat those logs as sensitive data stores: restrict access, set retention limits, and redact PII from them just as you would from the prompts themselves.
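Redacting at the logging layer can be sketched with a filter from Python's standard `logging` module, so that nothing reaches a handler unmasked. This is a sketch under assumptions: the two patterns are illustrative only, and `REDACTIONS`, `redact`, `RedactingFilter`, and the logger name `llm_calls` are hypothetical names, not part of any particular observability product.

```python
import logging
import re

# Illustrative patterns only (an assumption); extend them to match your own data classes.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[REDACTED_EMAIL]"),
]


def redact(text: str) -> str:
    """Mask every pattern match in the given text."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrite each record's message with PII masked before a handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so PII passed via args is masked too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


# Attach the filter to the handler so every record it emits is masked.
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("llm_calls")
logger.addHandler(handler)
```

This filter masks only the log message. Exception tracebacks and any structured fields attached to a record would need the same treatment.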

If a data-residency, contractual, or regulatory requirement means data cannot leave your environment, the answer is to run an open model on infrastructure you control, so no prompt ever crosses the boundary. That’s a real cost and operational burden — but for some data it’s the only compliant option.
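The decision above can be made explicit in code as a routing gate: each request carries the classes of data it contains, and the most restrictive class decides where the model call may run. The sketch below is hypothetical throughout. `DataClass`, `Target`, `POLICY`, and `choose_target` are illustrative names, and the policy table is a placeholder for whatever your legal and compliance review actually concludes.

```python
from enum import Enum


class DataClass(Enum):
    PUBLIC = "public"
    PII = "pii"
    REGULATED = "regulated"  # data that may not leave your environment


class Target(Enum):
    HOSTED_API = "hosted_api"
    SELF_HOSTED = "self_hosted"


# Hypothetical policy table; the real one comes from your compliance review.
POLICY = {
    DataClass.PUBLIC: Target.HOSTED_API,
    DataClass.PII: Target.HOSTED_API,  # assumes redaction and verified provider terms
    DataClass.REGULATED: Target.SELF_HOSTED,
}


def choose_target(classes):
    """Return the most restrictive target required by any data class present."""
    for data_class in classes:
        # Fail closed: a class missing from the policy is treated as self-host only.
        if POLICY.get(data_class, Target.SELF_HOSTED) is Target.SELF_HOSTED:
            return Target.SELF_HOSTED
    return Target.HOSTED_API
```

Failing closed is the deliberate design choice here: a data class nobody has reviewed is kept inside the boundary until someone decides otherwise.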

Privacy is also a product surface. Tell users what data an AI feature uses and where it goes, honor deletion requests through to the logs, and don’t quietly feed user content into training. Trust, once lost here, doesn’t come back.

Calling a hosted API sends your prompt — and the user data in it — across a trust boundary; verify the provider’s training, retention, region, and sub-processor terms first. Handle PII deliberately: classify it, minimize it, redact it, and never put secrets in prompts. Know your compliance category (GDPR, HIPAA, the EU AI Act) before designing, because it can rule out hosted APIs entirely. Logs are a second copy of sensitive data — secure them. When data legally can’t leave, self-host an open model.