How We Built a Compliance-First AI Feature That Enforces Data Boundaries by Design
2026-02-26
By Emre Salmanoglu


Building AI-powered compliance answers across three surfaces with access control that meets enterprise regulatory requirements.

Slack Ask
Compliance
Data Governance
AI
NIS2
GDPR

Security and compliance answers live in the worst possible places: buried in audit reports, scattered across policy PDFs, locked behind NDAs. The people who need those answers fastest (sales teams mid-deal, customer success teams handling renewals) are usually the farthest from the source.

At Orbiq, we built Slack Ask so teams can type /orbiq ask in any channel and get sourced, access-controlled answers from their trust center in seconds. What looks like a simple Slack command on the surface is actually a compliance-aware retrieval pipeline with strict privilege boundaries, staged evidence scoring, and data-minimized model routing.

This post walks through how we designed it, the regulatory requirements that shaped our decisions, and why "just wrap a prompt" was never an option.

Three surfaces, one platform

We didn't start with three ask surfaces. We started with one, then learned that different contexts demand different trust boundaries.

Today, Orbiq has three entrypoints into the same underlying ask infrastructure:

Ask API — a Q&A API for users, agents, and integrations. Used by compliance teams and automated workflows that need structured, typed responses (multi-select, select, boolean, and free-text answers are all supported). Designed with auditability in mind: every ask is logged with its context, requester, and outcome.

Trust Center Answer — contact-scoped answers for approved external contacts visiting a company's trust center portal. Every response respects the contact's access level, NDA status, and document assignments — enforcing the principle of least privilege at the content layer.

Slack Ask — conversational Q&A from Slack with strict evidence gating. Designed for speed, but never at the cost of surfacing something the asker shouldn't see. It supports multiple questions in a single ask and applies a per-document sensitivity warning policy (for example, "This document contains NDA-sensitive information; make sure you have one in place").

All three converge on the same platform, but they differ intentionally in authentication, response contracts, and how aggressively they optimize for latency. Crucially, they share the same access control enforcement, so compliance properties hold regardless of the surface.

Layered privilege model

When you're generating answers from sensitive compliance documents, the auth model has to be more than "check a token at the edge." We enforce multiple layers of privilege, each scoped to limit blast radius following the principle of defense in depth.

Contact claims

In addition to the standard tenant claims, we augment every public Trust Center request with contact-level claims covering NDA status, session freshness, and document access grants.
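As a minimal sketch, the contact-level claim set might look like the following. All field names here are illustrative, not our actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical shape of the contact-level claims attached to a
# public Trust Center request (field names are illustrative).
@dataclass(frozen=True)
class ContactClaims:
    contact_id: str
    tenant_id: str
    nda_completed: bool
    session_issued_at: datetime
    granted_document_ids: frozenset[str]

    def session_is_fresh(self, max_age: timedelta) -> bool:
        """Session freshness is itself a claim: stale sessions are rejected."""
        return datetime.now(timezone.utc) - self.session_issued_at <= max_age

claims = ContactClaims(
    contact_id="c-123",
    tenant_id="t-acme",
    nda_completed=True,
    session_issued_at=datetime.now(timezone.utc),
    granted_document_ids=frozenset({"doc-1"}),
)
print(claims.session_is_fresh(timedelta(hours=1)))  # True for a just-issued session
```

Making the claims an immutable value object means downstream retrieval code can only read them, never widen them.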

Domain-isolated authentication

Each service domain has dedicated, isolated authentication. Compromise of one domain's credentials does not grant access to others, limiting lateral movement in line with zero-trust principles.

Tenant-scoped data access

Every data read requires a tenant-specific credential fetched from a secrets manager at runtime. If the credential is missing or revoked, the system fails closed. This follows the access-control principle of applying measures proportionate to risk, and ensures that even infrastructure-level failures cannot silently widen data access.
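The fail-closed pattern can be sketched in a few lines. The in-memory dict stands in for a real secrets-manager call, and the function names are hypothetical:

```python
class CredentialError(Exception):
    """Raised when a tenant credential is missing or revoked."""

# Stand-in for a secrets-manager lookup; in production this would call
# the secrets manager's SDK at request time, never read a static config.
_SECRETS = {"tenant-a": "s3cr3t-a"}

def tenant_credential(tenant_id: str) -> str:
    secret = _SECRETS.get(tenant_id)
    if not secret:
        # Fail closed: no live credential means no data access, ever.
        raise CredentialError(f"no active credential for {tenant_id!r}")
    return secret

def read_tenant_data(tenant_id: str) -> list[str]:
    credential = tenant_credential(tenant_id)  # raises before any read happens
    # ... perform the tenant-scoped read using `credential` ...
    return [f"row-for-{tenant_id}"]
```

The key property is that the credential fetch sits in front of every read path, so revoking a secret revokes access immediately.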

Slack ingress verification

Slack commands and events go through signature verification with timestamp freshness checks to prevent replay attacks, following Slack's own security best practices for app verification.
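Slack's documented v0 signing scheme makes this check straightforward: HMAC-SHA256 over `v0:{timestamp}:{body}` with the app's signing secret, plus a freshness window on the timestamp. A minimal verifier, assuming the raw request body and headers are already extracted:

```python
import hashlib
import hmac
import time

def verify_slack_request(signing_secret: str, timestamp: str, body: str,
                         signature: str, max_age_seconds: int = 300) -> bool:
    """Verify Slack's v0 request signature with a replay-blocking freshness check."""
    # Reject stale timestamps first to prevent replayed requests.
    if abs(time.time() - int(timestamp)) > max_age_seconds:
        return False
    base = f"v0:{timestamp}:{body}".encode()
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(expected, signature)

# Usage: sign a request the way Slack would, then verify it.
secret, ts, body = "test-secret", str(int(time.time())), "command=/orbiq&text=ask"
sig = "v0=" + hmac.new(secret.encode(), f"v0:{ts}:{body}".encode(), hashlib.sha256).hexdigest()
print(verify_slack_request(secret, ts, body, sig))  # True
```

Both checks matter: the signature proves origin, and the timestamp window prevents a captured request from being replayed later.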

Where we say "no" — leak prevention before the model sees anything

Most AI systems try to prevent data leaks through prompt engineering. We enforce it architecturally in the retrieval layer, before any content reaches the model context window. This distinction matters for compliance: prompt-level controls are not auditable safeguards, but retrieval-level enforcement is.

Contact gating. For trust center answers, the system loads the contact's access profile and rejects requests from non-approved contacts before retrieval begins.

Access-level enforcement on documents. We classify every document with an access level, and enforce it before evidence text enters the model context:

  • Internal — never surfaced externally, period.
  • Public — shared with the world.
  • Restricted — shareable only if the document is explicitly assigned to the requesting contact.
  • Requires NDA — shareable only if the contact has a completed NDA.

This classification maps directly to how enterprises manage document sensitivity under frameworks like ISO 27001 and SOC 2, making it natural to integrate Orbiq into existing information classification policies.
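The four-level policy above reduces to a small decision function evaluated before any evidence enters the context window. This sketch uses hypothetical names and collapses the contact profile into three booleans:

```python
from enum import Enum

class AccessLevel(Enum):
    INTERNAL = "internal"
    PUBLIC = "public"
    RESTRICTED = "restricted"
    REQUIRES_NDA = "requires_nda"

def may_surface(level: AccessLevel, *, external: bool,
                assigned: bool, nda_completed: bool) -> bool:
    """Decide, before retrieval, whether a document may enter the model context."""
    if not external:
        return True  # internal users see everything (with sensitivity flags)
    if level is AccessLevel.INTERNAL:
        return False  # never surfaced externally, period
    if level is AccessLevel.PUBLIC:
        return True
    if level is AccessLevel.RESTRICTED:
        return assigned  # only if explicitly assigned to this contact
    return nda_completed  # REQUIRES_NDA: only with a completed NDA
```

Because this runs in the retrieval layer, a bad prompt cannot talk the model into leaking a document it never saw.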

Search-level filtering for external users. Even the initial retrieval query excludes internal content at the filter level. Documents that shouldn't be accessible are never candidates for retrieval.

Sensitivity flagging for internal users. When an answer references NDA-protected or internal-only content, Slack Ask flags it in the response before anyone can copy-paste it into an external conversation. This serves as a procedural control for preventing inadvertent disclosure.

Response contracts: matching latency to the surface

Not every surface can afford to wait for generation to finish, and not every surface should pretend answers are instant.

Ask API: optimistic sync with async fallback

When a human or agent submits a question via the Ask API, the system creates a record, queues the job, and attempts a fast-path response. If the answer resolves quickly, it returns synchronously. Otherwise, it returns an accepted status with a reference ID for polling.

This gives fast-path DX when generation is quick, without blocking the client on long-tail requests.

Trust Center Answer: async-first by design

External-facing answers always return asynchronously with an answer ID. The frontend polls for status, and the server updates the record when generation completes.

This keeps portal request lifetimes predictable and avoids timeout issues for external users on slower connections. It also means the trust center frontend can show meaningful loading states instead of a hanging spinner.

Following the Open Responses standard

Our response contracts align with the Open Responses specification, the open-source standard for building multi-provider, interoperable LLM interfaces. Open Responses defines a shared schema for request/response patterns, streaming events, and tool invocation across providers. By adopting it, our response patterns follow a portable contract that any client or integration can implement against, regardless of which model provider sits behind the pipeline.

Staged retrieval: data minimization by design

The Slack Ask pipeline doesn't run a single monolithic retrieval step. It runs progressively through stages, and stops early when it has enough evidence.

Stage 1: Knowledge base. Search the structured knowledge base for direct matches. If the answer is well-covered by existing Q&A pairs or policy statements, we score and potentially return here.

Stage 2: Document metadata. If knowledge base evidence is insufficient, search document metadata for relevant sources. This is cheaper than full-text retrieval but often enough to identify the right documents.

Stage 3: Reranked document snippets. Only if the first two stages don't produce high-confidence evidence do we perform full snippet extraction, reranking, and deep context assembly.

Each stage produces an evidence score. If a stage clears the confidence threshold, the pipeline short-circuits. For common compliance questions like "Are you SOC 2 certified?", "Where is data hosted?", "Do you support SSO?", the answer often resolves at the earliest stage, meaning the model only processes what's strictly necessary to produce a sourced answer.
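The short-circuiting loop above can be sketched as follows. The stage functions and scores here are stand-ins; each real stage would query the knowledge base, metadata index, or snippet store:

```python
def staged_retrieval(question: str, stages: list, threshold: float = 0.8) -> dict:
    """Run retrieval stages in order; short-circuit once evidence is strong enough."""
    for name, stage_fn in stages:
        evidence, score = stage_fn(question)
        if score >= threshold:
            return {"stage": name, "evidence": evidence, "score": score}
    # No stage cleared the bar: return the last (deepest) stage's evidence.
    return {"stage": name, "evidence": evidence, "score": score}

# Illustrative stages with canned evidence and scores.
stages = [
    ("knowledge_base", lambda q: (["Q&A: SOC 2 Type II attained 2025"], 0.92)),
    ("doc_metadata",   lambda q: (["soc2_report.pdf"], 0.70)),
    ("doc_snippets",   lambda q: (["...reranked snippet text..."], 0.85)),
]
result = staged_retrieval("Are you SOC 2 certified?", stages)
print(result["stage"])  # knowledge_base: the cheapest stage already cleared the bar
```

The data-minimization property falls out of the control flow: later, more expensive stages never run, so their content never reaches the model.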

Model routing and observability

We route to different models based on the question type and desired output format:

  • Short, factual questions route to a fast, cost-efficient model.
  • Questions that benefit from deeper reasoning route to a thinking-optimized model.
  • Default questions go through a fuller generation flow.

The routing layer is provider-agnostic, so we can adjust model selection per surface, per tenant, or per regulatory requirement without code changes.

The important part is observability parity. Every model call is traced with the same instrumentation:

  • Provider, model, and token usage metadata are captured per call.
  • Generation and embedding spans are exported to our observability platform.
  • Confidence scores are captured for ongoing quality monitoring.

This gives us auditability across the pipeline. When we evaluate a new model version, we can measure exactly how answer quality, latency, and cost distributions shift. We can then document the decision for compliance reviews.

On-demand OCR with deduplication

Not every document in a trust center has machine-readable text. Some are scanned PDFs, some are image-heavy compliance certificates.

When the pipeline encounters a document without extracted text, it triggers on-demand OCR with resource limits to prevent abuse. Extracted text is persisted for future reuse, avoiding redundant processing.
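Deduplication keys naturally off a content hash, so renamed or re-uploaded copies of the same file never trigger a second OCR run. A sketch with a hypothetical `ocr_fn` and an illustrative size cap:

```python
import hashlib

_ocr_cache: dict[str, str] = {}  # content hash -> extracted text (persisted in production)

def extract_text(document_bytes: bytes, ocr_fn, max_bytes: int = 20_000_000) -> str:
    """On-demand OCR with a resource limit and content-hash deduplication."""
    if len(document_bytes) > max_bytes:
        # Resource limit: refuse oversized inputs rather than burn OCR capacity.
        raise ValueError("document exceeds OCR resource limit")
    key = hashlib.sha256(document_bytes).hexdigest()
    if key not in _ocr_cache:
        _ocr_cache[key] = ocr_fn(document_bytes)  # run OCR at most once per content
    return _ocr_cache[key]
```

In production the cache would be durable storage keyed the same way, so extraction survives restarts.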

Confidence as a first-class output

Every answer carries a confidence score, normalized and persisted alongside the answer. Confidence is derived from multiple signals depending on the retrieval path, including model output, evidence quality heuristics, and stage evaluation outcomes.

We write confidence to both storage and our observability platform, which lets us monitor reliability trends over time and build product controls on top.

For example: a "low confidence" indicator in the UI, required human review before an answer is shared externally, or flags on answers that fall below a tenant-configured threshold. For regulated industries, this creates a reviewable decision trail. The system doesn't just answer; it documents how certain it was and why.
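One simple way to blend multiple signals into a single normalized score is a weighted average clamped to [0, 1]. The signal names and weights below are illustrative, not our production calibration:

```python
def combined_confidence(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Blend retrieval-path signals into one normalized confidence in [0, 1]."""
    total = sum(weights.get(name, 0.0) for name in signals)
    if total == 0:
        return 0.0  # no weighted signals: report zero confidence, fail closed
    score = sum(value * weights.get(name, 0.0) for name, value in signals.items()) / total
    return max(0.0, min(1.0, score))

conf = combined_confidence(
    {"model_signal": 0.9, "evidence_quality": 0.7, "stage_score": 0.92},
    {"model_signal": 0.3, "evidence_quality": 0.4, "stage_score": 0.3},
)
needs_review = conf < 0.75  # a tenant-configured threshold gates external sharing
```

Persisting both the blended score and the underlying signals is what makes the trail reviewable: an auditor can see not just the number but its inputs.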

What this architecture gives us

We could have built Slack Ask as a thin wrapper around a retrieval-augmented generation call. It would have shipped faster. It also would have leaked data the first time someone asked about a document they shouldn't have access to.

Instead, we treat "ask" as a compliance-sensitive distributed workflow:

  • Strict privilege boundaries with layered auth from edge to data layer.
  • Leak prevention at the retrieval layer, architecturally enforced.
  • Data minimization through staged retrieval that limits what the model sees.
  • Latency-aware response contracts that match each surface's UX and compliance needs.
  • Auditability through consistent observability, confidence scoring, and decision trails.
  • Provider flexibility to meet data residency and sovereignty requirements per tenant.

If you're building AI features on top of sensitive data, the architecture around the model matters as much as the model itself. The hard part is proving to your customers, auditors, and regulators that the system handles their data responsibly.

We chose EU-sovereign models

One decision we haven't touched on yet: model provider selection. For our default production pipeline, we opted for Mistral, an EU-headquartered, EU-sovereign model provider.

For an EU-first platform serving European enterprises navigating NIS2, DORA, and GDPR requirements, data sovereignty is a compliance requirement. Our customers' data stays within EU jurisdiction by default, and our architecture supports per-tenant provider configuration for organizations with specific residency requirements.

Our architecture makes this practical. Provider selection is configurable per surface and per tenant. Observability parity means we can measure quality across providers precisely, and staged retrieval reduces the model's role — high-quality evidence in, high-quality answers out, regardless of which model does the final generation.

We'd rather ship with full EU data residency than chase benchmarks that don't match our customers' compliance reality.


Orbiq is a trust center platform that helps B2B companies turn security and compliance from sales blockers into competitive advantages. Learn more about Slack Ask or see our trust center in action.