Every inference request comes with a compliance certificate

“This sounds awesome. Do you host in the EU too?”

“Yes, you can run the models on Bedrock and then route there.”

Prospect politely ends the conversation and walks away.

We’ve watched this play out at industry conferences for years. There are variations. Just replace the region. Answering “do you host in X?” with “yes, but you have to operate your own infrastructure” is a lost deal.

Casola routes AI inference requests across region-specific GPU workers. Behind a single API, the platform handles scheduling, capacity management, and regional dispatch for LLM, image, video, and voice workloads, so you don’t have to.

The three options that don’t work

Data residency usually gets handled in one of three ways: handwaving, the provider’s home region, or dedicated infrastructure.

The first is handwaving. A data processing addendum, a diagram in a whitepaper, a checkbox in a compliance portal. None of it is verifiable. Your legal team flags it, the solutions team decides it’s probably fine, and the deal proceeds with fingers crossed. When an audit lands, you’re negotiating with customer support about how much paperwork is acceptable. It doesn’t hold up.

The second is the provider’s home region. Some providers have clear policies but only operate in their home region. That works if your customers are in the same region. A global customer base means integrating multiple providers, each with its own APIs, contracts, and quirks. Corner cases and audit coordination add up fast.

The third is dedicated infrastructure. Your jobs only run on machines you’ve provisioned in the target region. Real isolation, but it comes with a minimum commitment that puts it out of reach for most workloads, and adds back the capacity management overhead you were trying to avoid. Large enterprises sometimes choose it for legal reasons; most can’t justify the price.

We wanted a fourth option: verifiable guarantees that don’t require owning the hardware.

Every request comes with a compliance certificate

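Every job processed by Casola gets a UUIDv7 when it’s created. UUIDv7 encodes a millisecond timestamp in the high bits, which makes the IDs time-ordered. In byte 8, the top two bits hold the variant field. The remaining six bits belong to the random portion of the ID (rand_b in RFC 9562), so the generator is free to choose them. We put the region code there.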

# UUID byte 8 layout
bits 7–6:  variant (10, RFC 9562 requirement, always set)
bits 5–0:  casola region code (6 bits → 64 possible regions)

encode:  byte8 = 0x80 | (region_code & 0x3f)
decode:  region_code = byte8 & 0x3f

# Assignments
0 = none, 1 = iad, 2 = sjc, 3 = fra
4 = nrt, 5 = sin, 6 = syd

Macro regions like us, eu, and ap are routing labels. The UUID stores the leaf queue POP that actually processed the job.
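As a rough illustration, here is how an ID with this layout could be minted in Python. This is a sketch under assumptions, not Casola’s implementation: `new_job_id` and `REGION_CODES` are names made up for the example, and the bytes are assembled by hand so the byte 8 step stays visible.

```python
import os
import time
import uuid

# Region assignments from the table above.
REGION_CODES = {"none": 0, "iad": 1, "sjc": 2, "fra": 3, "nrt": 4, "sin": 5, "syd": 6}


def new_job_id(region: str) -> uuid.UUID:
    """Build a UUIDv7-shaped ID with the region code in the low 6 bits of byte 8."""
    code = REGION_CODES[region]
    raw = bytearray(16)
    # Bytes 0-5: 48-bit Unix timestamp in milliseconds, big-endian.
    raw[0:6] = (time.time_ns() // 1_000_000).to_bytes(6, "big")
    # Bytes 6-15: random, then overwrite the version and variant/region bits.
    raw[6:16] = os.urandom(10)
    raw[6] = 0x70 | (raw[6] & 0x0F)  # version 7 in the high nibble
    raw[8] = 0x80 | (code & 0x3F)    # variant 10 + 6-bit region code
    return uuid.UUID(bytes=bytes(raw))


job_id = new_job_id("fra")
print(job_id, hex(job_id.bytes[8]))
```

The result still parses as a version 7 UUID with the standard variant, so ordinary UUID libraries accept it unchanged.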

Parse any Casola job ID and you know which region processed it. No database query. No API call. No access to Casola infrastructure required.

If job UUIDs are exposed to your customers (in your own product, in logs you share, in audit exports) they can self-audit every request. A customer verifying EU data residency can script the UUID parsing themselves. No trust required.
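A customer-side check could look something like the sketch below. It uses only the Python standard library; `region_of` and `outside` are hypothetical helper names, and the two job IDs are made up for illustration.

```python
import uuid

# Region assignments from the table above.
REGION_NAMES = {0: "none", 1: "iad", 2: "sjc", 3: "fra", 4: "nrt", 5: "sin", 6: "syd"}


def region_of(job_id: str) -> str:
    """Return the region name encoded in byte 8 of a job ID."""
    byte8 = uuid.UUID(job_id).bytes[8]
    if byte8 >> 6 != 0b10:
        raise ValueError(f"{job_id} does not carry the RFC 9562 variant bits")
    code = byte8 & 0x3F
    return REGION_NAMES.get(code, f"unassigned({code})")


def outside(job_ids, allowed):
    """Return the job IDs whose region is not in the allowed set."""
    return [j for j in job_ids if region_of(j) not in allowed]


# Made-up IDs: byte 8 is 0x83 (fra) in the first and 0x81 (iad) in the second.
exported = [
    "018f3c2a-7b1e-7c4d-83a1-5e6f708192a3",
    "018f3c2a-7b1e-7c4d-81a1-5e6f708192a3",
]
violations = outside(exported, allowed={"fra"})
print(violations)
```

An empty `violations` list means every ID in the export decodes to an allowed region.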

Every downstream operation (status poll, result fetch, streaming connection) derives the correct regional queue from the job UUID. The routing table is the ID. There’s no separate lookup, no routing state, no session data that could drift or be misconfigured independently.
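To make “the routing table is the ID” concrete, here is a minimal sketch of stateless dispatch. The hostnames and URL path are placeholders invented for the example (the real endpoints aren’t described here), as are the function names.

```python
import uuid

# Hypothetical per-region queue hosts, keyed by region code.
QUEUE_HOSTS = {
    1: "queue-iad.example.invalid",
    2: "queue-sjc.example.invalid",
    3: "queue-fra.example.invalid",
    4: "queue-nrt.example.invalid",
    5: "queue-sin.example.invalid",
    6: "queue-syd.example.invalid",
}


def queue_host(job_id: str) -> str:
    """Pick the regional queue from the job ID alone, with no lookup state."""
    code = uuid.UUID(job_id).bytes[8] & 0x3F
    if code not in QUEUE_HOSTS:
        raise LookupError(f"no queue for region code {code}")
    return QUEUE_HOSTS[code]


def status_url(job_id: str) -> str:
    """Build a (hypothetical) status-poll URL for the job's own region."""
    return f"https://{queue_host(job_id)}/jobs/{job_id}/status"


print(status_url("018f3c2a-7b1e-7c4d-83a1-5e6f708192a3"))
```

Because the only input is the ID, two edge locations handed the same job ID will always resolve to the same regional queue.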

Three independent layers of regional isolation

The region code in the job UUID determines where everything downstream lives. Three storage layers, all regional, all independent.

[Figure: architecture diagram. Control plane at top with auth, billing, and settings boxes; data plane below with three regional columns (US, Global, and EU), each containing a router, queue, storage, and logger.]

Job queue. Job payload, status, intermediate state, and final results live in the queue for the job’s region. The EU and US queues are separate deployments with separate namespaces. No shared storage, no replication between them.

Object storage. Generated images, video, audio: all media outputs land in a region-scoped storage bucket.

Worker logs. GPU worker logs are also region-scoped. The operational record of what happened to your job lives alongside the job.

The control plane (auth tokens, org config, billing, model catalog) lives in a globally replicated database. It has to; every edge location needs to authenticate requests quickly. But job content never touches the control-plane database. There are no columns for it, and no code path that could write it there. The isolation doesn’t depend on access controls being configured correctly; it’s structural.

The enforcement boundary

The UUID attests to where a job was routed, but it doesn’t prevent a misconfigured client from sending requests to the wrong region. So we went a step further.

Every organization in Casola can have an allowed_regions constraint: the list of regions their jobs are permitted to route to. The API enforces it before any request is handled:

# API middleware (pseudocode)
if region is unset:
    region = best region in org.allowed_regions
elif region not in org.allowed_regions:
    return 403 Forbidden
# route and handle

An explicit wrong-region request returns a 403 immediately. A request with no region preference routes to the optimal allowed region.

This gives you routing certainty even with third-party clients. Your existing OpenAI or fal clients get data locality guarantees without modifications.

The allowed_regions check is the admission gate. Once a job is admitted, its region is fixed in the UUID. There’s no subsequent step that can move it.

Sometimes org-level constraints aren’t granular enough. If different customers on the same platform have different residency requirements, you need per-request routing. Casola supports this through a request header: specify the target region on any request, and the API enforces it against the org’s allowed_regions before handling the job. Existing clients work without modification; add the header upstream.
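The admission logic can be sketched as follows. Treat the details as assumptions: the header name `x-casola-region` is a placeholder (the actual header isn’t named here), `Org` and `admit` are made up for the example, and “optimal region” is stood in for by the first allowed entry.

```python
from __future__ import annotations

from dataclasses import dataclass

REGION_HEADER = "x-casola-region"  # hypothetical header name


@dataclass(frozen=True)
class Org:
    allowed_regions: tuple[str, ...]


def admit(org: Org, headers: dict[str, str]) -> tuple[int, str | None]:
    """Return (HTTP status, region to route to) for an incoming request."""
    if not org.allowed_regions:
        return 403, None
    # Header names are case-insensitive, so normalize before looking up.
    requested = {k.lower(): v for k, v in headers.items()}.get(REGION_HEADER)
    if requested is None:
        # No preference: stand-in for "pick the optimal allowed region".
        return 200, org.allowed_regions[0]
    if requested not in org.allowed_regions:
        return 403, None
    return 200, requested


eu_only = Org(allowed_regions=("fra",))
print(admit(eu_only, {}))
print(admit(eu_only, {"X-Casola-Region": "iad"}))
```

A proxy in front of an unmodified client would only need to set the header; the org-level list remains the outer bound on what any single request can ask for.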

Agent workflows and the chain contamination problem

Single-request residency is tractable: one job, one routing decision, one UUID.

Agent workflows are different. A multi-step agent can have dozens of stages: a planning step, parallel tool calls, synthesis, follow-up loops. Each stage is a job. Each job is a routing decision. With 20 stages, you have 20 opportunities to violate data residency. A violation anywhere in the chain contaminates everything downstream. If stage 14 of a 20-stage EU workflow routes to a US worker, stage 15’s input contains content that touched US infrastructure. The violation propagates forward through every subsequent step.

In Casola, this is solved at the UUID generation layer. When an agent spawns sub-jobs, each sub-job UUID is generated with the parent job’s region code. The entire workflow inherits the root job’s region. Sub-jobs, intermediate results, fan-out branches, fan-in assembly: all of it stays in the same regional storage. The guarantee isn’t applied per-node by a policy check that could be missed or misconfigured. It falls out automatically from how UUIDs are generated.
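A minimal sketch of that inheritance, assuming sub-job IDs are minted the same way as root IDs with only the region bits copied from the parent. `new_sub_job_id` is a hypothetical name and the root ID is made up.

```python
import os
import time
import uuid


def new_sub_job_id(parent: uuid.UUID) -> uuid.UUID:
    """Mint a fresh UUIDv7-shaped ID that carries the parent's region code."""
    region_code = parent.bytes[8] & 0x3F
    raw = bytearray(16)
    # New timestamp and new randomness: only the region is inherited.
    raw[0:6] = (time.time_ns() // 1_000_000).to_bytes(6, "big")
    raw[6:16] = os.urandom(10)
    raw[6] = 0x70 | (raw[6] & 0x0F)  # version 7
    raw[8] = 0x80 | region_code      # variant 10 + inherited region code
    return uuid.UUID(bytes=bytes(raw))


root = uuid.UUID("018f3c2a-7b1e-7c4d-83a1-5e6f708192a3")  # made-up ID, region code 3 (fra)
child = new_sub_job_id(root)
grandchild = new_sub_job_id(child)
print(child.bytes[8] & 0x3F, grandchild.bytes[8] & 0x3F)
```

Since the generator takes the parent ID rather than a region argument, there is no parameter a caller could get wrong at stage 14.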

[Figure: multi-stage agent DAG where all Casola nodes carry the same EU region code inherited from the parent UUID, with one node making a dotted-line call to an external API that sits outside the regional boundary.]

Not every workflow is hermetic. A practical agent might call an external search API, a third-party tool, or a customer’s own service. Casola can’t control what those services do with data.

But every UUID still tells you exactly where each Casola node ran, which makes the external calls a tractable exception: instead of auditing a 20-stage workflow end-to-end, you audit the one or two external integrations. The compliance surface shrinks dramatically.

What a compliance audit actually looks like

Take any job ID from your logs. Decode byte 8: bits 5–0 give you the region code. That tells you which queue held the job, which storage bucket holds the output, and which log storage holds the worker record. For agent workflows, every sub-job UUID in the trace carries the same region code as the root. You get a complete regional audit trail from the IDs alone.
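For a workflow trace, the check reduces to comparing region codes. The sketch below assumes you already have the root ID and the list of sub-job IDs from your own logs; the function names and IDs are illustrative.

```python
import uuid


def region_code(job_id: str) -> int:
    """Bits 5-0 of byte 8."""
    return uuid.UUID(job_id).bytes[8] & 0x3F


def stray_jobs(root_id: str, sub_job_ids: list[str]) -> list[str]:
    """Return sub-jobs whose region code differs from the root's."""
    root = region_code(root_id)
    return [j for j in sub_job_ids if region_code(j) != root]


# Made-up trace: the root and first sub-job decode to 3 (fra), the last to 1 (iad).
root_id = "018f3c2a-7b1e-7c4d-83a1-5e6f708192a3"
trace = [
    "018f3c2a-7b20-7aaa-8311-000000000001",
    "018f3c2a-7b21-7bbb-8122-000000000002",
]
print(stray_jobs(root_id, trace))
```

An empty result is the expected outcome for any workflow whose sub-job IDs were generated from the root.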

An EU-only org produces only EU job IDs, with every job and workflow running on EU workers. The same holds for any other region, or any combination of regions.

Data residency should be a structural property, not a paperwork exercise. Your customers shouldn’t need dedicated infrastructure to answer “where is my data?”, so we built that answer into every request.