White paper

Security and audit for production AI operators

The security model AIMOCS uses for operators that touch money, customer data, or production systems — and the audit trail that makes the model verifiable.

Updated · 2026-05-21

0 broad scopes: every tool credential is workflow-scoped, not tenant-scoped

100% of operator actions logged with reasoning and request ID

<5 min: incident-response detection target for an out-of-bar action

01 Abstract
02 What we defend against

The threat model

A production operator is exposed to threats that an internal automation script is not. Five categories matter, and the security model is designed against each:

  1. Prompt injection: a customer input containing instructions that the model treats as authoritative. The defense is that the model is not the security perimeter; the tool gateway is.
  2. Tool misuse: the operator calls a tool with parameters that produce an unintended effect. The defense is that tools have scoped credentials and parameter validation at the gateway, not just in the prompt.
  3. Model drift: a new model version changes the operator's behavior silently. The defense is version pinning plus a regression suite that every new version must pass.
  4. Secret exfiltration: the operator, or an attacker who manipulates it, discloses a secret value. The defense is that the operator never sees secret values: secrets are injected as environment variables at container start and used only by the tool gateway, and the audit log records the request but never the value.
  5. Audit-trail tampering: someone with access to the operator tries to retroactively edit the action log. The defense is append-only storage with cryptographic integrity checks and external retention.
03 Authorisation model

The signed authority bar

Every operator runs with a signed, versioned specification of what actions it is allowed to take without human approval. The specification names the tools, the parameter bounds, and the customer tier for which auto-action is allowed. It is enforced at the tool gateway: a call outside the bar is rejected and routed to escalation, not silently passed through. The bar lives in the same repository as the operator and changes through code review with the workflow owner.

This is the most important security control because it is the one that fails closed. If the model is jailbroken, if a new prompt injection works, or if a tool is misused, the gateway still rejects any call outside the bar. The model can be wrong; the gateway is the control that has to hold.
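As an illustration of the fail-closed behavior described above, here is a minimal sketch of a gateway-side check. The names `ToolRule`, `AuthorityBar`, and `check_call`, the `amount` parameter, and the tier labels are all hypothetical, and signature verification of the specification is left out; this is not the AIMOCS implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolRule:
    """Bounds for one tool. Field names are illustrative assumptions."""
    max_amount: float           # upper bound on the `amount` parameter
    allowed_tiers: frozenset    # customer tiers eligible for auto-action


@dataclass(frozen=True)
class AuthorityBar:
    version: str
    rules: dict = field(default_factory=dict)  # tool name -> ToolRule


def check_call(bar, tool, params, customer_tier):
    """Return 'allow' or 'escalate'. Anything not explicitly permitted escalates."""
    rule = bar.rules.get(tool)
    if rule is None:
        return "escalate"                      # tool is not named in the bar
    amount = params.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return "escalate"                      # missing or malformed parameter
    if amount < 0 or amount > rule.max_amount:
        return "escalate"                      # outside the parameter bounds
    if customer_tier not in rule.allowed_tiers:
        return "escalate"                      # tier not eligible for auto-action
    return "allow"


bar = AuthorityBar(
    version="1.4.0",
    rules={
        "issue_refund": ToolRule(
            max_amount=100.0,
            allowed_tiers=frozenset({"standard", "premium"}),
        )
    },
)
```

The default branch is the point of the sketch: the function only returns `allow` after every check passes, so an unknown tool or an unexpected parameter shape escalates rather than passing through.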

04 Blast radius

Container isolation

Each operator runs in its own container, one per workflow. The container holds the toolchain the workflow needs, the operator's memory of the current run, and nothing else. The host filesystem is not mounted. The host network is firewalled to the specific egress endpoints the operator's tools need. A misfired command can break the container; it cannot reach the host, another operator, or production.
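The egress restriction can be pictured as an allowlist lookup. The sketch below is only a model of that rule in application code; the real control described here sits in the firewall, and the hostnames, the HTTPS-only assumption, and the function name `egress_allowed` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for one workflow: (hostname, port) pairs.
ALLOWED_EGRESS = {("api.payments.example", 443), ("crm.example", 443)}


def egress_allowed(url, allowlist=ALLOWED_EGRESS):
    """Allow only HTTPS calls to an exact (hostname, port) pair on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    port = parts.port or 443   # default HTTPS port when none is given
    return (parts.hostname, port) in allowlist
```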

The container is also the unit of resource isolation. A runaway operator burns its own container budget, hits its own rate limit, and is killed independently. It cannot cause cascading failures across the fleet.

05 Accountability

The immutable audit log

Every action the operator takes lands in an append-only MongoDB collection as a record containing: the request ID, the tool name, the input parameters (with secrets redacted), the tool's response, the model's stated reasoning, the audio reference if relevant, the timestamp, and a hash chained to the previous record. The hash chain means retroactive tampering is detectable; the append-only enforcement means an attacker with database access still cannot rewrite history without leaving evidence.
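A hash chain of this kind can be sketched in a few lines. The example below uses an in-memory list rather than MongoDB, SHA-256 over a canonical JSON encoding, and an all-zero genesis hash; those choices and the function names are assumptions made for illustration, not the production record format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def record_hash(body, prev_hash):
    """Hash the canonical JSON of the record body together with the previous hash."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(log, body):
    """Append a record whose hash commits to every record before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"body": body, "prev_hash": prev_hash, "hash": record_hash(body, prev_hash)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Return the index of the first broken record, or -1 if the chain is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = record_hash(entry["body"], prev_hash)
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return -1


log = []
append_record(log, {"request_id": "req-1", "tool": "lookup_order", "params": {"order": "A17"}})
append_record(log, {"request_id": "req-2", "tool": "issue_refund", "params": {"amount": 40}})
append_record(log, {"request_id": "req-3", "tool": "send_email", "params": {"template": "refund"}})
```

Editing the body of any stored record changes its recomputed hash, so `verify_chain` reports the index of the edited record; rewriting that record's hash to match would in turn break the link from the next record.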

The log is retained per the customer's jurisdiction — typically seven years for financial workflows, longer for regulated healthcare, shorter for marketing. Retention policy is enforced at the database, not at the application. The operator itself does not have permission to delete log entries.

The log is queryable through a typed interface that finance, legal, security, and the workflow owner can use. We provide saved queries for the common review patterns: "show me every action above this dollar amount last week", "show me every escalation involving this account", "show me every tool call that was rejected by the authority bar this quarter".
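To make the saved queries concrete, here is a sketch of the three review patterns as typed filters over in-memory records. The `AuditRecord` fields, the outcome labels, and the function names are hypothetical; the real interface runs against the audit store and may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    tool: str
    amount: float
    account: str
    outcome: str        # assumed labels: "allowed", "rejected_by_bar", "escalated"
    timestamp: datetime


def actions_above(records, threshold, start, end):
    """Every action above a dollar amount inside the [start, end) window."""
    return [r for r in records if r.amount > threshold and start <= r.timestamp < end]


def escalations_for_account(records, account):
    """Every escalation involving one account."""
    return [r for r in records if r.outcome == "escalated" and r.account == account]


def rejected_by_bar(records, start, end):
    """Every tool call rejected by the authority bar inside the window."""
    return [r for r in records if r.outcome == "rejected_by_bar" and start <= r.timestamp < end]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


records = [
    AuditRecord("req-1", "issue_refund", 40.0, "acct-7", "allowed", utc(2026, 5, 12)),
    AuditRecord("req-2", "issue_refund", 900.0, "acct-7", "rejected_by_bar", utc(2026, 5, 13)),
    AuditRecord("req-3", "issue_refund", 250.0, "acct-9", "allowed", utc(2026, 5, 14)),
    AuditRecord("req-4", "close_account", 0.0, "acct-7", "escalated", utc(2026, 5, 15)),
]
week = (utc(2026, 5, 11), utc(2026, 5, 18))
```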

06 Credentials

Secret handling

The operator never sees secret values. Three layers enforce this:

  1. Secrets live in the customer's vault (Vercel environment variables, AWS Secrets Manager, or a customer-supplied vault). They are injected into the container at start time as environment variables.
  2. The tool gateway uses the secret on the operator's behalf when it makes a downstream call. The operator calls the tool by name; Glama, the gateway layer in this stack, attaches the credential at the gateway boundary.
  3. The audit log records that the credential was used. The credential value never enters the log, never enters the prompt, and never enters the operator's memory.
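The three layers above can be sketched as a small gateway class: the operator supplies only a tool name and parameters, the gateway resolves the credential from the environment, and the audit entry stores a reference to the credential rather than its value. The class, the environment variable name, and the stand-in downstream function are hypothetical and do not describe Glama's actual API.

```python
import os


class ToolGateway:
    """Illustrative gateway: resolves credentials so the caller never handles them."""

    def __init__(self, credential_env, downstream, environ=None):
        self._credential_env = credential_env   # tool name -> environment variable name
        self._downstream = downstream           # tool name -> callable(params, credential)
        self._environ = os.environ if environ is None else environ
        self.audit = []

    def call(self, request_id, tool, params):
        env_name = self._credential_env[tool]
        credential = self._environ[env_name]    # value stays inside this method
        result = self._downstream[tool](params, credential)
        # Log which credential was used, never the value itself.
        self.audit.append(
            {"request_id": request_id, "tool": tool, "params": params, "credential_ref": env_name}
        )
        return result


def fake_billing_api(params, credential):
    """Stand-in for a downstream service that checks the credential."""
    return {"status": "ok", "authenticated": credential == "s3cr3t-value"}


gateway = ToolGateway(
    credential_env={"issue_refund": "BILLING_API_KEY"},
    downstream={"issue_refund": fake_billing_api},
    environ={"BILLING_API_KEY": "s3cr3t-value"},   # injected at start in this sketch
)
result = gateway.call("req-9", "issue_refund", {"amount": 40})
```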

Credentials are rotated on a schedule — typically 90 days for production credentials, 30 days for high-risk credentials. Rotation is automated through the gateway; the operator does not need to be redeployed for a rotation.
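The rotation schedule reduces to simple date arithmetic. A minimal sketch, assuming two hypothetical risk classes that map to the 90-day and 30-day intervals mentioned above:

```python
from datetime import date, timedelta

# Intervals taken from the text; the class labels are assumptions.
ROTATION_DAYS = {"production": 90, "high_risk": 30}


def rotation_due(last_rotated, risk_class, today):
    """True once the credential has reached its rotation interval."""
    return today >= last_rotated + timedelta(days=ROTATION_DAYS[risk_class])
```

For example, a production credential rotated on 2026-01-01 becomes due on 2026-04-01, and a high-risk credential rotated the same day becomes due on 2026-01-31.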

07 Behavior stability

Model version discipline

The model the operator uses is pinned by version. Model updates from Anthropic, OpenAI, or any vendor do not automatically propagate. A new version reaches the operator only after a regression evaluation:

  • A replay set of the last 30 days of operator runs is executed against the new model version.
  • The outputs are diffed against the production outputs at the action level (not the token level).
  • Any action that would differ — a different tool call, a different parameter, a different escalation decision — is flagged for human review.
  • The new version graduates to production only when the diff is acceptable and the workflow owner has reviewed the flagged cases.
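The action-level diff in the steps above can be sketched as follows. Each run is reduced to its sequence of (tool, parameters, escalation decision) tuples, and free-text reasoning is ignored so that wording changes alone do not flag a run. The record shape and function names are assumptions for illustration.

```python
def action_key(action):
    """Reduce an action to the fields that count as behavior: tool, params, escalation."""
    return (action["tool"], tuple(sorted(action["params"].items())), action["escalated"])


def diff_runs(production, candidate):
    """Return the IDs of replayed runs whose action sequences differ between versions."""
    flagged = []
    for run_id, prod_actions in production.items():
        cand_actions = candidate.get(run_id, [])
        if [action_key(a) for a in prod_actions] != [action_key(a) for a in cand_actions]:
            flagged.append(run_id)
    return flagged


production = {
    "run-1": [{"tool": "issue_refund", "params": {"amount": 40}, "escalated": False,
               "reasoning": "Order arrived damaged."}],
    "run-2": [{"tool": "issue_refund", "params": {"amount": 40}, "escalated": False,
               "reasoning": "Within refund policy."}],
}
candidate = {
    # Same action, different wording: not flagged.
    "run-1": [{"tool": "issue_refund", "params": {"amount": 40}, "escalated": False,
               "reasoning": "The item was damaged on arrival."}],
    # Different parameter: flagged for human review.
    "run-2": [{"tool": "issue_refund", "params": {"amount": 400}, "escalated": False,
               "reasoning": "Within refund policy."}],
}
flagged = diff_runs(production, candidate)
```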

This discipline is how we catch silent behavior drift. Model updates are an under-discussed failure mode for production agents and, in our experience, among the most common; the regression suite makes them visible.

08 When something goes wrong

Incident response

The incident-response runbook for an operator-driven incident has three phases: detect, contain, learn.

Detect

The tool gateway emits a real-time stream of rejected calls (authority-bar violations), unusual call patterns (rate-limit hits), and high-confidence escalation triggers. The detection target is under five minutes from event to alert; in practice, most alerts arrive in under two minutes.
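A minimal sketch of turning gateway events into alerts and checking them against the five-minute target. The event kinds, field names, and the `to_alert` function are hypothetical; only the target itself comes from the text.

```python
from datetime import datetime, timedelta, timezone

DETECTION_TARGET = timedelta(minutes=5)
# Assumed labels for the three event classes the gateway streams.
ALERTING_KINDS = {"authority_bar_rejection", "rate_limit_hit", "escalation_trigger"}


def to_alert(event, now):
    """Build an alert for an alert-worthy event, or return None for routine events."""
    if event["kind"] not in ALERTING_KINDS:
        return None
    latency = now - event["occurred_at"]
    return {
        "request_id": event["request_id"],
        "kind": event["kind"],
        "latency": latency,
        "within_target": latency < DETECTION_TARGET,
    }


now = datetime(2026, 5, 21, 12, 0, tzinfo=timezone.utc)
fast = to_alert(
    {"request_id": "req-5", "kind": "authority_bar_rejection",
     "occurred_at": now - timedelta(minutes=2)},
    now,
)
slow = to_alert(
    {"request_id": "req-6", "kind": "rate_limit_hit",
     "occurred_at": now - timedelta(minutes=10)},
    now,
)
routine = to_alert(
    {"request_id": "req-7", "kind": "tool_call_ok", "occurred_at": now},
    now,
)
```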

Contain

Containment is a single config change: revoke the operator's credentials at the gateway. The container continues running, but every tool call is denied. The operator effectively goes into a read-only mode until a human can investigate. There is no customer-facing outage, and in-flight actions are not completed.
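The containment step can be modeled as a revocation flag that the gateway consults before every tool call. This is a sketch under assumed names (`GatewaySwitch`, `revoke`, `authorize`), not the actual gateway configuration.

```python
class GatewaySwitch:
    """Illustrative per-operator revocation flag checked on every tool call."""

    def __init__(self):
        self._revoked = set()

    def revoke(self, operator_id):
        """Containment: one change that denies all subsequent tool calls."""
        self._revoked.add(operator_id)

    def restore(self, operator_id):
        """Re-enable the operator once a human has investigated."""
        self._revoked.discard(operator_id)

    def authorize(self, operator_id, tool):
        """Return (allowed, reason); a revoked operator is denied for every tool."""
        if operator_id in self._revoked:
            return (False, "credentials revoked")
        return (True, "ok")


switch = GatewaySwitch()
before = switch.authorize("refund-operator", "issue_refund")
switch.revoke("refund-operator")
during = switch.authorize("refund-operator", "issue_refund")
other = switch.authorize("billing-operator", "issue_refund")
```

The container is never touched in this model: the operator keeps running, but `authorize` denies it until `restore` is called, while other operators are unaffected.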

Learn

Every incident produces a post-mortem entry in the audit log that records the model version, the authority bar version, and the workflow specification in effect at the time of the incident. The post-mortem is then added to the regression suite as a test case the next model version must not fail.

09 What we hand auditors

The customer security review packet

For customers under SOC 2, GDPR DPIA, or internal security review, AIMOCS provides a standard packet:

  • The signed authority bar specification at the time of the review window, version-controlled.
  • The container image SBOM and the egress firewall ruleset.
  • The audit-log retention policy and the hash-chain integrity proof for the review window.
  • The model-version log: every version change with the regression-evaluation results attached.
  • The secret-rotation log: every credential rotation with the new credential's scope.
  • The incident log: every detection event, the containment action, the post-mortem outcome.
  • The data-residency map: where the operator container, memory, audit log, and downstream tool calls were located during the review window.

The packet exists because the alternative — auditors trying to reconstruct what an operator did from log scraps — is what makes deploying agents to regulated workflows hard. The packet makes the review repeatable.

Questions
  • How is the model prevented from leaking secrets via responses?

    The operator never sees secret values, so it cannot leak them in a response. Outputs also go through an outbound redaction pass before they leave the operator container, as a defense-in-depth layer against the rare case where a secret value reaches the conversation through a misconfigured tool response.

  • Can the customer revoke the operator's access without AIMOCS in the loop?

    Yes. The credentials live in the customer's vault; revocation there immediately denies the operator at the gateway. AIMOCS is notified through the same channel but does not need to act for the revocation to take effect.

  • How does this work with our existing SIEM?

    The audit log is streamed in real time to the customer's SIEM (Splunk, Datadog, or equivalent) as a parallel sink. Security can search agent activity alongside human commits and platform events using their existing tooling.

  • Is the hash chain externally verifiable?

    Yes. The hash root is periodically anchored to an external timestamping service so the chain can be verified without trusting AIMOCS infrastructure. The verification utility is open-source.

  • How long does a customer security review typically take?

    With the standard packet, two to three weeks for SOC 2 and four to six weeks for GDPR DPIA in our experience. Without the packet — when the customer has to reconstruct the review from scratch — typically two to three months.
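The outbound redaction pass mentioned in the first answer could look something like the sketch below. It assumes the pass runs at the container boundary with access to the set of live secret values, and that exact-substring matching is sufficient; both are assumptions, and the function name and marker are hypothetical.

```python
def redact_output(text, secret_values, marker="[REDACTED]"):
    """Replace any known secret value that appears verbatim in outbound text."""
    # Longest first, so a secret that contains a shorter one is masked whole.
    for value in sorted(secret_values, key=len, reverse=True):
        if value:                       # skip empty strings
            text = text.replace(value, marker)
    return text
```

Exact matching will not catch a secret that has been encoded or split across messages, which is why the text treats this pass as defense in depth rather than the primary control.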

Citations
  [1] Operator anatomy white paper: aimocs.com/papers/autonomous-operator-anatomy.
  [2] Glama MCP stack guide: aimocs.com/stack/glama.
  [3] OWASP LLM Top 10 (prompt injection, training data poisoning, etc.), used as part of the operator threat-model review.
  [4] SOC 2 Trust Services Criteria, AICPA.