What "self-hosted" actually means for an AI chatbot.
"Self-hosted" is the most overloaded word in chatbot pricing pages. Four vendors will use it on their landing page and mean four different things. Three of those things will not satisfy your security team.
Before you sign anything — DPA, MSA, sub-processor schedule — make the vendor tell you which tier they actually offer. Below is the taxonomy security and engineering teams quietly use when evaluating these tools. It also happens to be the taxonomy that decides whether your existing compliance program covers the chatbot or whether you start a new audit.
Tier 1 — Cloud SaaS
The vendor runs everything. You have an account on their dashboard. Your conversations sit in their database, in a region they chose, accessed by engineers you have never met.
- Where conversation data physically lives. Vendor's database. Often a multi-tenant Postgres or DynamoDB shared across every customer.
- Who can subpoena it. Anyone with jurisdiction over the vendor.
- Vendor goes down. You go down. Your bot is gone until they restore service.
- Compliance inheritance. None. The vendor becomes a sub-processor. New DPA, new vendor questionnaire, new line on your sub-processor disclosure, new audit evidence.
This is what 90% of "AI chatbot" products are. There is nothing wrong with it for low-stakes use cases. It is not self-hosted.
Tier 2 — BYO API key
The vendor still hosts the chatbot. The only thing "yours" is the OpenAI/Anthropic API key. They route inference to your provider account and the inference bill lands on your card instead of theirs.
- Where conversation data physically lives. Still the vendor's database. The API key changes who pays for inference, not who stores the messages.
- Who can subpoena it. Vendor and OpenAI/Anthropic.
- Vendor goes down. You go down. Your key is useless because the application using it is offline.
- Compliance inheritance. None. Same as Tier 1, plus you have now added the inference provider as a second sub-processor.
BYO API key sometimes gets called "self-hosted." It is not. It is a billing arrangement.
Tier 3 — Vendor container in your VPC
The vendor ships you a container or a Helm chart. You run it on your Kubernetes cluster, in your VPC, often air-gapped from the public internet. Logs and conversations sit in databases inside your network.
- Where conversation data physically lives. Your infra.
- Who can subpoena it. You.
- Vendor goes down. You keep running. But you cannot get updates, security patches, or model changes until they are back.
- Compliance inheritance. Partial. The data lives in your environment, which helps. But the container has phone-home telemetry, the vendor often retains shell access for support, and the container itself is a closed binary you cannot audit. Most security teams still treat the vendor as a sub-processor.
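The phone-home claim is verifiable, not something you have to take on faith. A minimal sketch of an egress audit, assuming you can capture the container's outbound destinations (from VPC flow logs, conntrack, or `ss` output); every hostname and allowlist entry below is hypothetical:

```python
# Sketch: flag outbound connections from a vendor container that are not
# on your approved egress allowlist. In a real audit the observed
# destinations would come from VPC flow logs or `ss` output; the
# hostnames here are hypothetical examples.

APPROVED_EGRESS = {
    "db.internal.example.com",    # your own database
    "logs.internal.example.com",  # your own log collector
}

def unexpected_egress(observed_destinations):
    """Return destinations the container contacted that you never approved."""
    return sorted(set(observed_destinations) - APPROVED_EGRESS)

observed = [
    "db.internal.example.com",
    "telemetry.vendor.example.com",  # hypothetical phone-home endpoint
]
print(unexpected_egress(observed))  # ['telemetry.vendor.example.com']
```

Anything this surfaces is a conversation to have with the vendor before the security review, not after.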
This is the version most enterprise vendors mean when they say "self-hosted." It is materially better than Tier 2. It is not the strongest version.
Tier 4 — Truly in your own cloud account
You deploy the application yourself, into a cloud account you already own. Your IAM, your VPC, your database, your logs, your audit trail. The vendor sees nothing because there is no network path back to them.
- Where conversation data physically lives. A database in your account. You can drop it, snapshot it, encrypt it with your KMS keys.
- Who can subpoena it. You. The vendor has no copy.
- Vendor goes down. You do not notice. The deployed software is running on infrastructure you control.
- Compliance inheritance. Full. The chatbot becomes a service inside your existing audit boundary. The cloud provider you already named in your DPA is the only sub-processor. No new questionnaire. No new disclosure. Your existing GDPR, HIPAA, and SOC 2 controls apply automatically because the data never leaves the perimeter those controls already cover.
This is the only tier where the chatbot inherits your compliance posture for free. It is also the only tier where "the vendor cannot see your data" is a structural fact rather than a privacy-policy promise.
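The "your KMS keys" claim is also checkable from a table description. A minimal sketch, assuming the shape of the `Table` dict that boto3's `dynamodb.describe_table()` returns; the function only inspects that dict, and the sample description is hypothetical:

```python
# Sketch: check that a DynamoDB table's encryption at rest uses a KMS key
# you control rather than an AWS-owned default key. Input is the "Table"
# dict from boto3's dynamodb.describe_table(); the sample is hypothetical.

def uses_customer_managed_kms(table_description):
    """True if the table reports server-side encryption with a KMS key ARN."""
    sse = table_description.get("SSEDescription", {})
    return (
        sse.get("Status") == "ENABLED"
        and sse.get("SSEType") == "KMS"
        and "KMSMasterKeyArn" in sse
    )

sample = {
    "TableName": "chat-conversations",  # hypothetical table name
    "SSEDescription": {
        "Status": "ENABLED",
        "SSEType": "KMS",
        "KMSMasterKeyArn": "arn:aws:kms:us-east-1:111122223333:key/example",
    },
}
print(uses_customer_managed_kms(sample))  # True
```

Tables encrypted with the AWS-owned default key omit the `SSEDescription` block entirely, so the check fails closed.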
The questions to ask any vendor
Print these. Send them before the demo.
- Where does my conversation data physically reside, by region and by database?
- Do you have read access — direct or through support tooling?
- If you go bankrupt tomorrow, does my chatbot keep running? For how long?
- Are you a sub-processor on my DPA? If yes, do my customers need to be notified?
- Can I export every conversation, embedding, and training document in a format I can import elsewhere?
If the answers do not match the tier you thought you were buying, you are buying the wrong tier.
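The export question is easy to smoke-test once the data sits in a table you can scan yourself. A minimal sketch, assuming boto3 and a hypothetical table and field names; the paginator call in the comment is standard boto3, everything else is illustrative:

```python
# Sketch: export conversations from your own DynamoDB table as JSON Lines,
# a format almost anything can import. Table and field names are hypothetical.
import json

def to_jsonl(items):
    """Serialize scanned items to one JSON object per line."""
    return "\n".join(json.dumps(item, sort_keys=True, default=str) for item in items)

# With boto3 against a real table, pagination would look like:
#   import boto3
#   paginator = boto3.client("dynamodb").get_paginator("scan")
#   for page in paginator.paginate(TableName="chat-conversations"):
#       print(to_jsonl(page["Items"]))

sample_items = [
    {"conversation_id": {"S": "c-1"}, "message": {"S": "hello"}},
]
print(to_jsonl(sample_items))
```

If a vendor at any tier cannot hand you the equivalent of this output, the data is theirs in every way that matters.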
Where this lands
We built Chatmancer as Tier 4 — a CloudFormation stack that drops the entire chatbot platform into your AWS account in one click. Your conversations sit in your DynamoDB. Your logs sit in your CloudWatch. We do not have a database with your name on a row.
Pick whatever tier matches the data you are putting through it. Just make sure the vendor agrees with you about which tier it is.