Swiss Sovereign LLM Hosting Apertus
What Swiss sovereign hosting delivers
CH-datacenter Apertus inference
Apertus 8B or 70B serves from a Swiss-resident GPU cluster on Exoscale, Infomaniak or AWS Zurich, via the open serving stack — vLLM or Text-Generation-Inference behind a private endpoint. No prompt, no response and no embedding ever leaves Swiss soil. That is what separates real swiss llm hosting from a CH-themed front door over a US backend.
Data-residency contract clauses
We draft data-residency addenda that name the primary inference region, the backup region, the log destinations and the operator personnel who can touch them. The clauses sit beside the engagement contract as factual deployment commitments, not legal advice. Your counsel keeps the final word; the wording matches what the deployment does.
FINMA, cantonal, healthcare posture
For FINMA-bound banks, cantonal administrations and MDR or IVDR-bound health workloads, we implement Apertus under the controls those regimes require: documented data flow, named processors, retention windows and access logs. SAPIENTROQ does not hold FINMA, MDR or IVDR certification — we deploy under the regime when a client carries it.
Monitoring and observability in CH
Metrics, traces and request logs land on Swiss-resident storage; dashboards and alerting are hosted on the same CH footprint as the model. Incident timelines, prompt audits and observability data stay inside the same residency envelope as inference itself, so a forensic review never has to follow a request into a foreign region.
Backup data residency in CH
Backups of model weights, vector indexes and request stores replicate inside Switzerland only — primary in one CH region, secondary in another. The contract names both. Disaster recovery does not cross a border, so ch data residency llm posture survives an outage and a routine recovery drill, not only a healthy day on the live cluster.
Sovereign-routing audit
We audit the full request path — load balancer, API gateway, model proxy, telemetry collector — for any hop that resolves to a US-routed endpoint. The output is a routing map, the evidence that no US gateway sits between client browser and Apertus inference, plus a remediation list. Evaluation and POC reuses the same map.
How CH hosting ships
Residency scoping
We map the regimes that apply — FINMA, cantonal data-protection, MDR or IVDR — and the dataset classes in scope. The output is a residency scope: what stays in CH, what may leave, and what the contract has to commit to in writing.
CH host selection
We compare Exoscale, Infomaniak and AWS Zurich against the scope — GPU SKU for Apertus 8B or 70B, network topology, backup region and operator location. We pick the host that fits the workload, not the loudest brand.
Contract drafting
We draft the data-residency addendum: primary region, backup region, telemetry destination, named operators and retention windows. Factual deployment language describing the build, handed to your counsel as a basis.
Infrastructure build
We provision the GPU cluster, the serving stack, the private endpoint and the network controls. Apertus runs on vLLM or TGI; the app spine — Laravel and Next.js, PostgreSQL with pgvector — sits in the same CH region.
Monitoring wiring
Metrics, traces and prompt audit logs are wired into Swiss-resident sinks. Alerting reaches your on-call via CH-resident channels. Every request carries a trace ID that anchors it to deployment and residency clause.
Compliance handover
We hand the deployment to your compliance and audit functions with a residency map, a routing audit, the contract clauses, the operator list and the log retention policy. A regime review starts from evidence.
We map the regimes that apply — FINMA, cantonal data-protection, MDR or IVDR — and the dataset classes in scope. The output is a residency scope: what stays in CH, what may leave, and what the contract has to commit to in writing.
We compare Exoscale, Infomaniak and AWS Zurich against the scope — GPU SKU for Apertus 8B or 70B, network topology, backup region and operator location. We pick the host that fits the workload, not the loudest brand.
We draft the data-residency addendum: primary region, backup region, telemetry destination, named operators and retention windows. Factual deployment language describing the build, handed to your counsel as a basis.
We provision the GPU cluster, the serving stack, the private endpoint and the network controls. Apertus runs on vLLM or TGI; the app spine — Laravel and Next.js, PostgreSQL with pgvector — sits in the same CH region.
Metrics, traces and prompt audit logs are wired into Swiss-resident sinks. Alerting reaches your on-call via CH-resident channels. Every request carries a trace ID that anchors it to deployment and residency clause.
We hand the deployment to your compliance and audit functions with a residency map, a routing audit, the contract clauses, the operator list and the log retention policy. A regime review starts from evidence.
Why a contract beats a region label
Sovereignty is a contract, not a hosting region
A Swiss IP address on the marketing page does not make a deployment sovereign. As Roland Kurmann puts it: "Sovereignty isn't a hosting region. It's a contract that names the disks, names the backups, names the people who can touch them — and a deployment that doesn't leak a single request through a US-routed API gateway on the way to inference." That is the working definition we deploy against, and the line that decides which Swiss hosting offers are real and which are CH front doors over a foreign backend.
No US gateway in the request path
Many "Swiss" LLM offerings front a US-routed API gateway with a CH endpoint. The first hop leaves the country before inference happens, and the residency posture is gone. We audit the full routing map end-to-end so finma compliant llm hosting actually means the request, the response and the logs stayed inside Switzerland — verifiable in the trace data, not just asserted on a slide.
The difference from on-prem
If you carry a datacenter footprint and want every byte under your own roof, on-prem Apertus deployment is the right shape. Sovereign hosting is the option for clients who must keep data inside Switzerland but cannot operate their own GPU fleet — a CH datacenter and a written contract carry the residency posture instead of metal you run yourself.
What gets hosted, and where to start
Most workloads on swiss sovereign llm hosting fall into two shapes: Apertus RAG over your private corpus and a document Q&A copilot. If you are still deciding whether Apertus fits your workload, start with an evaluation and paid POC, or open an AI consulting discovery. The Apertus hub covers the full track from first call to production.
Frequently Asked Questions
Every byte of inference, every backup and every log stays on Swiss-resident infrastructure under a written contract. The contract names the primary region, the backup region, the operators and the retention windows. No US-routed gateway sits between client and model.
We deploy on Exoscale (CH), Infomaniak (CH) or AWS Zurich (regional). Choice depends on GPU availability for the 8B or 70B variant, network topology and how strict the regime is on operator location. Exoscale and Infomaniak are Swiss-owned; AWS Zurich is a regional landing zone.
The clause names the primary inference region, the backup and DR region, the telemetry and log destinations, and the personnel who can touch them. Training data is in scope only if you fine-tune; for inference and RAG, corpus residency is covered in the addendum.
We implement the controls those regimes require — documented data flow, named processors, retention windows, access logs and breach paths. SAPIENTROQ does not hold FINMA, MDR or IVDR certification; we deploy under the regime the client carries, with audit evidence.
On-prem runs Apertus inside your own datacenter — every byte under your roof, your operators. Sovereign hosting runs it on a third-party CH datacenter with a written residency contract. Same posture, different operator. On-prem fits a GPU fleet; hosting fits clients without one.
Metrics, traces, prompt audit logs and incident timelines land on Swiss-resident storage; dashboards and alerting run on the same CH footprint as the model. Forensic review never follows a request abroad — observability lives in the same residency envelope as inference itself.
About SAPIENTROQ
Interested in a solution?
We are glad to show you various options without any obligation.

Roland Kurmann
CEO, SAPIENTROQ