Security Model and SLOs¶

This document describes the threats Yantra Gaming defends against, the controls it uses, and the service-level objectives those controls must meet. Cross-reference with wallet-api.md for the signing spec and error-codes.md for the behavioural contract.

Threat model¶

The RGS sits between an untrusted player browser and a semi-trusted operator backend. Five threat actors and what they try to do:

Actor	What they want	Controls
Unauthenticated external requester	Call the RGS API without credentials	All public endpoints except `/healthz` / `/readyz` / `/metrics` require a valid operator-signed request. All `/v1/*` endpoints run through `operatorAuth` middleware
Man-in-the-middle on the network path	Read or tamper with traffic between operator and RGS	TLS 1.3 minimum, HSTS, no certificate pinning (see Certificate pinning below). Body hash is part of the HMAC canonical string, so tampering breaks the signature
Replay attacker	Re-send a captured valid request to double-spend or trigger a side-effect	±30 s timestamp window in the HMAC canonical string; 24 h idempotency cache keyed on `(operatorId, requestUuid, endpoint)`
Malicious operator	Produce fake rounds, claim wins that did not happen, forge proofs	Per-tenant isolation on every row; commit-reveal RNG (operator cannot forge outcomes without also forging the pre-committed `serverSeedHash`); every round signed by the RGS
Compromised operator (credential leak)	Use leaked `apiSecret` to impersonate the operator	Credentials have `notBefore` / `notAfter` / `revokedAt` on `OperatorCredential`; envelope encryption at rest (AES-GCM); per-operator IP allow-list; rotate-on-incident workflow in the operator portal

Not in scope for the RGS to defend against: the operator's own KYC, AML, deposit, or withdrawal flows. Those are operator responsibilities.

Transport¶

TLS 1.3¶

Ingress terminates TLS 1.3 only. TLS 1.2 is acceptable as a downstream-origin fallback for legacy operators but disabled by default. No TLS 1.0 / 1.1 / SSL under any circumstance. Preferred cipher suites:

TLS_AES_256_GCM_SHA384
TLS_CHACHA20_POLY1305_SHA256
TLS_AES_128_GCM_SHA256

HSTS¶

Every response from the RGS and game-client domains carries Strict-Transport-Security: max-age=63072000; includeSubDomains; preload. Preload submission is part of the production deployment checklist.

Certificate pinning¶

HPKP (HTTP Public Key Pinning) is not recommended in 2026 and is not used. It has a well-documented operational-risk profile (a wedged pin bricks the service for every client that cached it) and has been deprecated and removed by the major browsers. Equivalent protection is achieved via CAA records on the DNS zone and Certificate Transparency monitoring via tools such as CertStream.

Request signing¶

Symmetric HMAC-SHA256 on every request in both directions. See wallet-api.md for the full canonical-string spec. Summary:

canonical = METHOD + "\n" + PATH + "\n" + TIMESTAMP + "\n" + SHA256_HEX(BODY)
signature = base64( HMAC_SHA256(secret, canonical) )

Headers:

X-Yantra-Key-Id:    <kid>
X-Yantra-Timestamp: <unix-seconds>
X-Yantra-Signature: <base64>
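
As an illustration of the summary above, the following is a minimal sketch of how a caller might compute the signature and headers. The helper name `signRequest`, the placeholder key id and secret, and the choice to treat the secret and body as UTF-8 strings are assumptions made for this example; wallet-api.md remains the authoritative spec for the canonical string.

```typescript
import { createHash, createHmac } from 'node:crypto';

// Hypothetical helper: builds the canonical string and signature summarised above.
function signRequest(
  secret: string,
  method: string,    // e.g. "POST"
  path: string,      // e.g. "/v1/session"
  timestamp: number, // unix seconds
  body: string,      // the exact body sent on the wire
): string {
  const bodyHash = createHash('sha256').update(body, 'utf8').digest('hex');
  const canonical = [method, path, String(timestamp), bodyHash].join('\n');
  return createHmac('sha256', secret).update(canonical, 'utf8').digest('base64');
}

// Example headers for an outgoing call (all values are placeholders).
const exampleTimestamp = 1745400000;
const exampleBody = JSON.stringify({ requestUuid: 'demo' });
const exampleHeaders = {
  'X-Yantra-Key-Id': 'kid-demo',
  'X-Yantra-Timestamp': String(exampleTimestamp),
  'X-Yantra-Signature': signRequest('demo-secret', 'POST', '/v1/session', exampleTimestamp, exampleBody),
};
```

Because the body hash is part of the canonical string, the body must be serialised once and the same bytes used for both hashing and sending; re-serialising the JSON after signing can change whitespace or key order and invalidate the signature.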

Constant-time comparison¶

Signature verification uses crypto.timingSafeEqual in Node. A naive === compare leaks byte-wise equality through wall-clock timing differences, allowing an attacker to reconstruct a valid signature byte by byte via repeated probing. The implementation lives in apps/rgs-server/src/utils/signing.ts:

const a = Buffer.from(expected, 'base64');
const b = Buffer.from(signature, 'base64');
if (a.length !== b.length) return false;
return crypto.timingSafeEqual(a, b);

Any operator SDK consuming this spec must do the same.

±30 second clock window¶

The receiver rejects any request with |now - timestamp| > 30 s. A wider window tolerates more clock skew but extends the replay-attack grace period. Synchronise your servers via NTP and stay within ±5 s of real time.

The window is configured via SIGNATURE_WINDOW_SECONDS (default 30) in apps/rgs-server/src/config.ts.
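
The check itself is small. The sketch below shows one way it could look; the function name `isTimestampFresh` and the strict digits-only parsing are assumptions for illustration, not the contents of config.ts or the auth middleware.

```typescript
// Assumed default, mirroring SIGNATURE_WINDOW_SECONDS.
const SIGNATURE_WINDOW_SECONDS = 30;

// Returns true when the X-Yantra-Timestamp value lies inside the accepted window.
function isTimestampFresh(
  headerValue: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  windowSeconds: number = SIGNATURE_WINDOW_SECONDS,
): boolean {
  // Accept plain decimal digits only, so values like "1e9" or "12abc" are rejected.
  if (!/^\d{1,12}$/.test(headerValue)) return false;
  const timestamp = Number(headerValue);
  // Reject when |now - timestamp| > window; a difference of exactly 30 s passes.
  return Math.abs(nowSeconds - timestamp) <= windowSeconds;
}
```

The comparison is symmetric, so a sender whose clock runs ahead is treated the same as one whose clock runs behind.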

Why not asymmetric¶

Symmetric HMAC is simple to implement and operate, and both directions use it today. Its cost is the shared secret, which every operator has to store and rotate. For session launch (POST /v1/session, operator → RGS) the roadmap plans RS256 + JWKS to remove that shared-secret burden from operators; see Forward path below.

Idempotency cache¶

The InboundIdempotency table caches the response for every inbound request for 24 hours, keyed on (operatorId, requestUuid, endpoint). Rows expire via a background cleaner.

Purpose:

Replay protection beyond the 30 s window. Even if an attacker replays a request inside the window, re-sending the same signed request returns the cached response instead of executing the side effect again.
Safe retries for operators. An operator retrying a session-create after a network blip gets the original response, not a second session.

Schema excerpt:

model InboundIdempotency {
  operatorId   String
  requestUuid  String
  endpoint     String
  responseBody Json
  httpStatus   Int
  createdAt    DateTime @default(now())
  expiresAt    DateTime

  @@id([operatorId, requestUuid, endpoint])
}

The cache is consulted by middleware/idempotency.ts before the handler runs.
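
To make the lookup-before-handler behaviour concrete, here is a minimal in-memory sketch. The `Map` stands in for the InboundIdempotency table, and `handleOnce` and `cacheKey` are hypothetical names; the real middleware is not reproduced here.

```typescript
type HandlerResult = { httpStatus: number; responseBody: unknown };
type CachedResponse = HandlerResult & { expiresAt: number };

const TTL_MS = 24 * 60 * 60 * 1000; // 24 h, as described above
const cache = new Map<string, CachedResponse>(); // stand-in for the InboundIdempotency table

function cacheKey(operatorId: string, requestUuid: string, endpoint: string): string {
  // JSON-encode the tuple so no field value can collide with a separator character.
  return JSON.stringify([operatorId, requestUuid, endpoint]);
}

function handleOnce(
  operatorId: string,
  requestUuid: string,
  endpoint: string,
  handler: () => HandlerResult,
  nowMs: number = Date.now(),
): HandlerResult & { replayed: boolean } {
  const key = cacheKey(operatorId, requestUuid, endpoint);
  const hit = cache.get(key);
  if (hit && hit.expiresAt > nowMs) {
    // Cached: return the original response without running the handler again.
    return { httpStatus: hit.httpStatus, responseBody: hit.responseBody, replayed: true };
  }
  const fresh = handler();
  cache.set(key, { ...fresh, expiresAt: nowMs + TTL_MS });
  return { ...fresh, replayed: false };
}
```

A database-backed version would also need to handle two identical requests arriving concurrently, for example by relying on the composite primary key to make the second insert fail; the sketch ignores that case.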

Secret management¶

Every operator credential secret is stored encrypted with AES-256-GCM using a master key that never leaves the RGS process memory. OperatorCredential.cipherBlob is the ciphertext.

Envelope encryption¶

dataKey    = random 32 bytes per credential
ciphertext = AES-256-GCM(dataKey, plaintext)
wrappedKey = AES-256-GCM(masterKey, dataKey)
cipherBlob = concat(iv, wrappedKey, iv2, ciphertext, tag)

In production the master key is sourced from a KMS (AWS KMS, GCP KMS, or HashiCorp Vault Transit). In this repo, development uses SECRETS_MASTER_KEY_B64 in .env, validated at boot as 32 base64-decoded bytes (apps/rgs-server/src/config.ts).
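
The envelope pattern can be sketched with Node's built-in crypto module as follows. This is an illustration under stated assumptions: the helper names (`seal`, `open`, `encryptSecret`, `decryptSecret`) are hypothetical, the master key is a local buffer rather than a KMS call, and the sealed parts are kept as separate fields instead of being packed into the cipherBlob byte layout.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

type Sealed = { iv: Buffer; ciphertext: Buffer; tag: Buffer };

function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function open(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv('aes-256-gcm', key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  // final() throws if the tag does not authenticate (wrong key or tampered data).
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]);
}

function encryptSecret(masterKey: Buffer, apiSecret: string): { wrappedKey: Sealed; payload: Sealed } {
  const dataKey = randomBytes(32); // fresh data key per credential
  return {
    wrappedKey: seal(masterKey, dataKey),
    payload: seal(dataKey, Buffer.from(apiSecret, 'utf8')),
  };
}

function decryptSecret(masterKey: Buffer, blob: { wrappedKey: Sealed; payload: Sealed }): string {
  const dataKey = open(masterKey, blob.wrappedKey);
  return open(dataKey, blob.payload).toString('utf8');
}
```

Each GCM operation uses its own random IV and yields its own authentication tag, so both the wrapped key and the payload are integrity-protected independently.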

KMS vs HSM: what belongs where. Three key classes, two kinds of store:

Key	Role	Production store	Rotation
Data-encryption master key	Wraps per-credential AES-256-GCM data keys (envelope encryption above).	KMS (AWS KMS / GCP KMS / Vault Transit) is sufficient; FIPS 140-2 L2 baseline.	Annual. Key lineage is stored in the `OperatorCredential.cipherBlob` header so older rows remain decryptable.
Outbound HMAC signing secrets	Per-operator, signed on every outbound `/wallet/*` call.	KMS-wrapped at rest; hot copy in RGS process memory (short-lived).	90-day rotation cadence (NIST SP 800-57 Part 1 r5 §5.3, originator-usage period for symmetric authentication). `OperatorCredential.notBefore / notAfter / revokedAt` implements the overlap window.
Launch-JWT signing keys (RS256, v1.1)	Signs the operator → RGS session-launch JWT. Verified by JWKS.	FIPS 140-2 L3 HSM (AWS CloudHSM / Azure Dedicated HSM / Entrust nShield) when the operator is regulated in Germany (GGL), Malta (MGA), or Brazil (SPA); those jurisdictions expect HSM-resident signing keys in the SOC 2 / ISO 27001 evidence pack. KMS-resident is accepted elsewhere.	90-day JWKS `kid` rotation; the JWKS endpoint serves both current and previous keys during the overlap window.

For operators deploying into regulated markets, the runbook (docs/runbook.md) documents the CloudHSM / Dedicated HSM provisioning flow and the KMS-to-HSM migration checklist. For operators outside those markets, KMS-only is compliant and is the default.

Rotation¶

OperatorCredential carries three fields:

notBefore: credential is valid from this timestamp.
notAfter: credential stops being accepted on reads after this timestamp.
revokedAt: hard kill, takes effect immediately on any read.

Rotation is an overlap-then-cut pattern:

Mint a new credential with notBefore = now.
Share with the operator out of band.
Operator switches their outbound signing to the new kid.
After confirmation, set notAfter = now + 24h on the old credential.
Revoke (set revokedAt) once traffic has fully shifted.

The operator portal surfaces this as a two-step "Rotate key" → "Retire old key" flow.
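
The three fields combine into a single acceptance check on every read. A minimal sketch follows; the type and function names are hypothetical, and treating a null `notAfter` as "no scheduled expiry" is an assumption.

```typescript
type CredentialWindow = {
  notBefore: Date;
  notAfter: Date | null;  // assumption: null means no scheduled expiry
  revokedAt: Date | null; // null means not revoked
};

function isCredentialUsable(cred: CredentialWindow, now: Date = new Date()): boolean {
  if (cred.revokedAt !== null) return false; // hard kill, regardless of the window
  const t = now.getTime();
  if (t < cred.notBefore.getTime()) return false;
  if (cred.notAfter !== null && t > cred.notAfter.getTime()) return false;
  return true;
}

// During rotation both credentials pass the check until the old one reaches notAfter.
const rotationStart = new Date('2026-01-01T00:00:00Z');
const oldCredential: CredentialWindow = {
  notBefore: new Date('2025-10-01T00:00:00Z'),
  notAfter: new Date(rotationStart.getTime() + 24 * 60 * 60 * 1000), // now + 24h
  revokedAt: null,
};
const newCredential: CredentialWindow = { notBefore: rotationStart, notAfter: null, revokedAt: null };
```

The 24-hour overlap gives the operator time to roll the new kid out across its fleet without any request being rejected mid-deploy.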

Disposal¶

Revoked credentials are never deleted from the table: the ciphertext is zeroed, but the row persists as an audit trail. OperatorConfigAuditLog retains the full rotation history.

Rate limiting¶

Two-tier token-bucket limiters per operator, implemented in middleware/operator-rate-limit.ts:

Tier 1: session creation¶

Burst 10, steady 5 per second per operatorId. This is the only inbound money-path endpoint that creates new resources (sessions). Requests that exceed the tier 1 limit receive HTTP 429 with a Retry-After header.

Tier 2: read endpoints¶

GET /v1/rounds, GET /v1/rounds/:id, GET /v1/reports/*. Burst 60, steady 20 per second per operatorId. These are reconciliation reads; they are not in the hot path.

Tier 3: outbound¶

Not a rate limit per se, but a circuit breaker on the RGS → operator call path. See next section.

Rate-limit decisions are made per-operator, not global, so a noisy tenant cannot starve a quiet one. IP-based limits are layered at the reverse proxy (Cloudflare, Nginx, etc.) and not in the application.
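
For readers unfamiliar with token buckets, the sketch below shows the tier 1 parameters (burst 10, steady 5 per second) applied per operatorId. The class and function names are hypothetical and the store is a process-local `Map`; this is not the code in middleware/operator-rate-limit.ts.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly burst: number;
  private readonly ratePerSecond: number;

  constructor(burst: number, ratePerSecond: number, nowMs: number) {
    this.burst = burst;
    this.ratePerSecond = ratePerSecond;
    this.tokens = burst; // start full so a quiet tenant can burst immediately
    this.lastRefillMs = nowMs;
  }

  // Returns 0 when the request is admitted, otherwise a Retry-After value in whole seconds.
  take(nowMs: number): number {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSeconds * this.ratePerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.ratePerSecond);
  }
}

const sessionBuckets = new Map<string, TokenBucket>(); // one bucket per operatorId

function admitSessionCreate(operatorId: string, nowMs: number): number {
  let bucket = sessionBuckets.get(operatorId);
  if (!bucket) {
    bucket = new TokenBucket(10, 5, nowMs); // tier 1: burst 10, steady 5/s
    sessionBuckets.set(operatorId, bucket);
  }
  return bucket.take(nowMs);
}
```

A non-zero return value maps directly onto an HTTP 429 response with that number in the Retry-After header. With more than one RGS instance the buckets would need a shared store, which the sketch does not address.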

Circuit breaker on outbound wallet calls¶

An operator whose wallet is returning errors harms both us and their players. The outbound path wraps every call with a circuit breaker:

Open: after 5 consecutive non-OK responses on the same (operatorId, endpoint) pair, the circuit opens. New calls fail immediately with RS_ERROR_TIMEOUT and enter the retry queue without hitting the operator.
Half-open: 30 seconds after opening, one probe call is allowed through. If it succeeds, the circuit closes. If it fails, wait another 30 s and retry.
Closed: normal operation.

Circuit state is per (operatorId, endpoint) pair, so a /wallet/win outage does not block /wallet/bet. The state is exported as the circuit_state{operator, endpoint} metric; see observability.md.
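
The three states described above can be sketched as a small state machine. The class name, method names, and the registry keyed by a JSON-encoded pair are assumptions for illustration; the thresholds (5 consecutive failures, 30 s cool-down) come from the text.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: CircuitState = 'closed';
  private consecutiveFailures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold = 5;
  private readonly cooldownMs = 30_000;

  // Returns true when a call may be sent to the operator.
  allowRequest(nowMs: number): boolean {
    if (this.state === 'closed') return true;
    if (this.state === 'open' && nowMs - this.openedAtMs >= this.cooldownMs) {
      this.state = 'half-open'; // let exactly one probe through
      return true;
    }
    return false; // still cooling down, or a probe is already in flight
  }

  recordSuccess(): void {
    this.state = 'closed';
    this.consecutiveFailures = 0;
  }

  recordFailure(nowMs: number): void {
    this.consecutiveFailures += 1;
    if (this.state === 'half-open' || this.consecutiveFailures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAtMs = nowMs; // restart the 30 s wait
    }
  }

  current(): CircuitState {
    return this.state;
  }
}

const breakers = new Map<string, CircuitBreaker>();

// One breaker per (operatorId, endpoint) pair.
function breakerFor(operatorId: string, endpoint: string): CircuitBreaker {
  const key = JSON.stringify([operatorId, endpoint]);
  let breaker = breakers.get(key);
  if (!breaker) {
    breaker = new CircuitBreaker();
    breakers.set(key, breaker);
  }
  return breaker;
}
```

When `allowRequest` returns false, the caller would fail the call locally with RS_ERROR_TIMEOUT and hand it to the retry queue, as described above.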

Session token security¶

Player session tokens are short-lived HS256 JWTs:

{
  "sub":          "<opaque-playerRef>",
  "operatorId":   "<tenant>",
  "sessionId":    "<uuid>",
  "currency":     "LKR",
  "jurisdiction": "LK",
  "lang":         "si",
  "mode":         "real",
  "rgLimits":     { "dailyLossMicro": "…", "dailyWagerMicro": "…", "sessionTimeSeconds": 3600 },
  "iat":          1745400000,
  "exp":          1745403600
}

Rules:

Signed with SESSION_JWT_SECRET. Separate from the operator signing secret.
Max lifetime 60 minutes; the exp claim is enforced on every socket handshake and every REST call scoped to a session.
Single player, single game, single currency per token.
The token is conveyed via the launchUrl query string and placed into socket.handshake.auth.token for the WebSocket leg. Not stored in localStorage; held only in memory on the game client.
On RS_ERROR_USER_DISABLED or RS_ERROR_TOKEN_EXPIRED the token is invalidated RGS-side and the socket is force-closed.
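
To show what enforcing the signature, `exp`, and the 60-minute cap involves, here is a hand-rolled HS256 check using only Node's crypto module. It is a sketch for explanation: the function names are hypothetical, only a subset of the claims is typed, and a production service would more likely rely on a maintained JWT library than on this code.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

const MAX_LIFETIME_SECONDS = 3600; // 60-minute cap from the rules above

type SessionClaims = { sub: string; sessionId: string; iat: number; exp: number };

function b64url(value: object): string {
  return Buffer.from(JSON.stringify(value), 'utf8').toString('base64url');
}

function hs256(secret: string, signingInput: string): Buffer {
  return createHmac('sha256', secret).update(signingInput, 'utf8').digest();
}

// Demo helper only: mints a token so the verifier below can be exercised.
function mintSessionToken(claims: SessionClaims, secret: string): string {
  const signingInput = `${b64url({ alg: 'HS256', typ: 'JWT' })}.${b64url(claims)}`;
  return `${signingInput}.${hs256(secret, signingInput).toString('base64url')}`;
}

function verifySessionToken(token: string, secret: string, nowSeconds: number): SessionClaims | null {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts as [string, string, string];
  try {
    const header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
    if (header === null || header.alg !== 'HS256') return null; // pin the algorithm
    const expected = hs256(secret, `${headerB64}.${payloadB64}`);
    const actual = Buffer.from(signatureB64, 'base64url');
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;
    const claims = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8')) as SessionClaims;
    if (typeof claims.iat !== 'number' || typeof claims.exp !== 'number') return null;
    if (nowSeconds >= claims.exp) return null; // expired
    if (claims.exp - claims.iat > MAX_LIFETIME_SECONDS) return null; // over the cap
    return claims;
  } catch {
    return null; // malformed JSON in header or payload
  }
}
```

The same function would be called on the socket handshake and on every session-scoped REST call, so an expired token is rejected wherever it is presented.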

Forward path: asymmetric launch requests¶

For POST /v1/session specifically, the medium-term direction is RS256 (RSASSA-PKCS1-v1_5 with SHA-256) or ES256 (ECDSA on P-256 with SHA-256), verified against a JWKS URL published by the operator. The RGS fetches and caches the operator's public key via JWKS and verifies signed launch requests with it. This removes the shared-secret risk on the highest-value endpoint.

An algorithm allowlist is enforced per credential: an OperatorCredential row declares allowedAlgs: ["RS256"] and the RGS refuses to validate with anything else. This blocks the classic alg: "none" and RS256-to-HS256 confusion attacks, which continued to surface as CVEs throughout 2024-2026.
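
The allowlist check runs on the JWT header before any key is selected or any signature is verified. A minimal sketch, with hypothetical type and function names:

```typescript
type CredentialAlgPolicy = { kid: string; allowedAlgs: readonly string[] };

// Returns the algorithm to verify with, or throws if the token asks for one that is not allowed.
function requireAllowedAlg(token: string, credential: CredentialAlgPolicy): string {
  const headerB64 = token.split('.')[0] ?? '';
  let header: unknown;
  try {
    header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
  } catch {
    throw new Error('malformed JWT header');
  }
  if (typeof header !== 'object' || header === null) {
    throw new Error('malformed JWT header');
  }
  const alg = (header as { alg?: unknown }).alg;
  // The token only *requests* an algorithm; the credential row decides what is acceptable.
  if (typeof alg !== 'string' || !credential.allowedAlgs.includes(alg)) {
    throw new Error(`algorithm not allowed: ${String(alg)}`);
  }
  return alg;
}
```

The point of the design is that the verifier never picks its algorithm from attacker-controlled input alone: a token claiming `none` or `HS256` is rejected before the operator's public key could be misused as an HMAC secret.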

IP allow-list per operator¶

Operator.ipAllowList is a String[] column. When non-empty, every inbound request from that operator is additionally checked against the caller IP (extracted from X-Forwarded-For honouring the reverse-proxy trust configuration). Mismatches return HTTP 403 before the signature is even verified.

This is belt-and-braces against credential theft: even with a valid signed request, the attacker must also be coming from an approved IP.

For operators that cannot pin their egress IPs (they run from dynamic cloud infrastructure), the allow-list is left empty; signature is the sole control.
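
The caller IP has to be taken from the right position in X-Forwarded-For, otherwise a client can spoof an approved address by sending its own header. The sketch below assumes a fixed number of trusted proxies that each append the address they saw, and exact-match entries in the allow-list (no CIDR ranges); both are assumptions, as are the function names.

```typescript
// Picks the client IP given how many of our own proxies sit in front of the RGS.
function resolveClientIp(
  xForwardedFor: string | undefined,
  socketIp: string,
  trustedProxyCount: number,
): string {
  const chain = (xForwardedFor ?? '')
    .split(',')
    .map((hop) => hop.trim())
    .filter((hop) => hop !== '');
  chain.push(socketIp); // the address of whoever connected to us directly
  // Skip the entries contributed by our own proxies; anything further left is client-supplied.
  const index = Math.max(0, chain.length - 1 - trustedProxyCount);
  return chain[index] ?? socketIp;
}

function isIpAllowed(ipAllowList: readonly string[], clientIp: string): boolean {
  if (ipAllowList.length === 0) return true; // empty list: signature is the sole control
  return ipAllowList.includes(clientIp);
}
```

With one trusted proxy, a forged header such as `X-Forwarded-For: 203.0.113.7` sent by an attacker arrives as `203.0.113.7, <attacker IP>`, and the function returns the attacker's real address rather than the forged one.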

Content Security Policy on the game iframe¶

The game-client is served with a strict CSP header:

Content-Security-Policy:
  default-src 'self';
  script-src 'self';
  style-src 'self' 'unsafe-inline';
  img-src 'self' data:;
  connect-src 'self' wss://rgs.yantra.example;
  frame-ancestors https://*.operator-domain.example;
  base-uri 'self';
  form-action 'self';

frame-ancestors is the critical directive: it restricts which parent pages can iframe the game. The allow-list is per-operator, set via Operator.frameAncestors (or the equivalent column). Any attempt to iframe the game from a domain not on the list is blocked by the browser.

The RGS also sets X-Frame-Options: SAMEORIGIN on non-game pages (the operator portal) to prevent clickjacking.

Per-operator `frame-ancestors` bootstrap¶

When a new operator is onboarded, or an existing operator launches a new brand domain, the frame-ancestors allow-list must be updated before the iframe will render anywhere new. The flow:

Operator requests a domain addition via the operator-portal Settings → Branding page (written to Operator.frameAncestors as a pending row; emits an OperatorConfigAuditLog entry).
Yantra staff review in provider-admin: sanity-check the domain is controlled by the operator (DNS TXT challenge on _yantra-verify.<domain> carrying Operator.verificationToken is the default verification mechanism, same shape as ACME DNS-01).
Approval writes the domain into Operator.frameAncestors with approvedAt + approvedBy.
CSP rebuild: the next rgs-server response to any routes/session.ts-minted launch URL embeds the updated frame-ancestors allowlist. No reload or restart is needed, because the header is computed per-response from Operator.frameAncestors.
Effective TTL: the CSP header itself is per-response. The Operator.frameAncestors row is read through a 60-second in-memory cache (EngineRegistry.operatorCache); a new domain becomes effective within one minute of approval.

Domain removal is symmetric: revokedAt is set and the CSP stops including the domain within the cache TTL. Iframes already loaded under the old CSP continue to render until the page is reloaded, because the browser enforces the policy it received with the document.

For staged rollouts (brand A → brand B migration), operators typically list both domains in frame-ancestors for the overlap window and cut the old one once traffic has shifted.
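
Computing the header per response amounts to filtering the operator's entries and joining the directives. The following is a sketch under assumptions: the row shape (`origin`, `approvedAt`, `revokedAt`), the function name, the origin validation pattern, and the fallback to `'none'` when no domain is approved are all illustrative rather than taken from the codebase.

```typescript
type FrameAncestorRow = { origin: string; approvedAt: Date | null; revokedAt: Date | null };

// Accept only https origins, optionally with a leading wildcard label and a port.
const ORIGIN_PATTERN = /^https:\/\/(\*\.)?[a-z0-9-]+(\.[a-z0-9-]+)*(:\d+)?$/i;

function buildGameCsp(rows: readonly FrameAncestorRow[], socketOrigin: string): string {
  const ancestors = rows
    .filter((row) => row.approvedAt !== null && row.revokedAt === null) // approved and not revoked
    .map((row) => row.origin)
    .filter((origin) => ORIGIN_PATTERN.test(origin)); // never let a stored value inject directives
  const frameAncestors = ancestors.length > 0 ? ancestors.join(' ') : "'none'";
  return [
    "default-src 'self'",
    "script-src 'self'",
    "style-src 'self' 'unsafe-inline'",
    "img-src 'self' data:",
    `connect-src 'self' ${socketOrigin}`,
    `frame-ancestors ${frameAncestors}`,
    "base-uri 'self'",
    "form-action 'self'",
  ].join('; ');
}
```

During a brand A to brand B migration both origins would simply appear as approved rows, and cutting the old brand over is a matter of setting `revokedAt` on its row.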

Published SLOs¶

These are the published service-level objectives. They are committed to in operator contracts and enforced by CI load tests (tests/load/k6-bet-flow.js) and by multi-window burn-rate alerts in production (observability.md).

Objective	Target	Window	Source metric
`POST /wallet/bet` outbound latency	p99 < 300 ms	30 days	`wallet_call_latency_ms{endpoint="bet"}`
`POST /wallet/bet` outbound latency	p99.9 < 800 ms	30 days	`wallet_call_latency_ms{endpoint="bet"}`
Bet → settlement end-to-end	p99 < 1.5 s	30 days	`bet_to_settlement_ms`
`POST /v1/session` inbound latency	p99 < 150 ms	30 days	HTTP server histogram
`/wallet/bet` error budget	< 0.1% failure rate	30 days	`wallet_call_errors_total{endpoint="bet"} / wallet_call_total{endpoint="bet"}`
Round-state-machine uptime	> 99.9%	30 days	`up{job="rgs-server"}`

Burn-rate alerts (multi-window, multi-burn-rate, per the Google SRE Workbook) fire before the budget is exhausted. See observability.md.
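
As a worked illustration for the `/wallet/bet` error budget: burn rate is the observed error ratio divided by the budget (0.1%), so a burn rate of 1 spends exactly the whole budget over the 30-day window. The sketch below uses hypothetical function names, and the 14.4 threshold in the comments is the SRE Workbook's example value, not necessarily what observability.md configures.

```typescript
const SLO_TARGET = 0.999;          // < 0.1% failure rate
const PERIOD_HOURS = 30 * 24;      // 30-day window

// Observed error ratio divided by the allowed error ratio.
function burnRate(errors: number, total: number, sloTarget: number = SLO_TARGET): number {
  if (total === 0) return 0;
  return errors / total / (1 - sloTarget);
}

// Fraction of the whole period's budget spent by burning at `rate` for `windowHours`.
function budgetConsumed(rate: number, windowHours: number, periodHours: number = PERIOD_HOURS): number {
  return (rate * windowHours) / periodHours;
}

// Multi-window rule: fire only if both the long and the short window exceed the threshold,
// so the alert stops soon after the problem is fixed.
function shouldPage(longWindowRate: number, shortWindowRate: number, threshold: number): boolean {
  return longWindowRate >= threshold && shortWindowRate >= threshold;
}

// 144 failed calls out of 10,000 is a 1.44% error ratio, i.e. a burn rate of about 14.4.
const hourlyRate = burnRate(144, 10_000);
// Sustained for one hour, that spends about 2% of the 30-day budget.
const spentInOneHour = budgetConsumed(hourlyRate, 1);
```

Pairing a long window with a short one is what lets the alert be both fast to fire on a severe outage and quick to clear once the error ratio drops.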

Responsible disclosure¶

Security issues should be reported via the channel in SECURITY.md at the repo root. The policy in short:

Email security@yantra.example with PGP-encrypted payload (public key in SECURITY.md).
90-day coordinated disclosure window.
Safe harbour for good-faith research against the staging environment.
Out of scope: anything targeting an operator's production wallet or any player account.

For integration-time questions (not vulnerabilities), use the normal support channels.

Cloudflare Turnstile (optional bot challenge on /v1/session)¶

POST /v1/session supports an optional Cloudflare Turnstile verification step, enabled by setting TURNSTILE_SECRET_KEY in the RGS environment. When set, every session-create request must present a Turnstile token in either the cfTurnstileToken body field or the cf-turnstile-response header; the RGS verifies it via Cloudflare's siteverify endpoint before minting the session.

Flow:

Operator renders a Turnstile widget on the launch page (client-side).
Player solves the challenge; Cloudflare returns a token to the browser.
Operator's backend relays that token in the POST /v1/session body.
RGS verifies via Cloudflare; on failure, returns 401 turnstile_verification_failed.
On success, the session is minted and the token is discarded (one-shot use is enforced by Cloudflare, not by the RGS).

Operational notes:

Leave TURNSTILE_SECRET_KEY unset in dev and staging environments; the middleware is a pass-through when the secret is absent. Set it in production once the operator's launch page renders a Turnstile widget and relays the token.
Default behaviour on Cloudflare being unreachable is fail-open (log, increment turnstile_verifications_total{status="error"}, allow through). Set failClosedOnError: true when wiring the middleware if you prefer to block on network failure.
The outcome distribution is visible as the turnstile_verifications_total Prometheus counter, labelled status=ok|fail|skip|error. Alert if rate(turnstile_verifications_total{status="fail"}[5m]) spikes; a spike indicates either a bot wave or a broken operator integration.
Reference: apps/rgs-server/src/middleware/turnstile.ts.
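
For orientation, the verification step and the fail-open/fail-closed decision can be sketched as below. This is not the contents of turnstile.ts: the function names and the injectable `FetchLike` parameter (used so the example runs without network access) are assumptions. The siteverify URL and the `secret` / `response` form fields follow Cloudflare's documented API.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

type TurnstileOutcome = 'ok' | 'fail' | 'skip' | 'error';

const SITEVERIFY_URL = 'https://challenges.cloudflare.com/turnstile/v0/siteverify';

async function verifyTurnstile(
  token: string | undefined,
  secretKey: string | undefined,
  fetchImpl: FetchLike,
): Promise<TurnstileOutcome> {
  if (!secretKey) return 'skip'; // secret unset: middleware is a pass-through
  if (!token) return 'fail';     // secret set but no token presented
  try {
    const res = await fetchImpl(SITEVERIFY_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: new URLSearchParams({ secret: secretKey, response: token }).toString(),
    });
    if (!res.ok) return 'error';
    const data = (await res.json()) as { success?: boolean };
    return data.success === true ? 'ok' : 'fail';
  } catch {
    return 'error'; // Cloudflare unreachable or response not parseable
  }
}

// Maps an outcome to an admit/deny decision; 'error' follows the failClosedOnError setting.
function shouldAdmit(outcome: TurnstileOutcome, failClosedOnError: boolean): boolean {
  if (outcome === 'ok' || outcome === 'skip') return true;
  if (outcome === 'error') return !failClosedOnError;
  return false;
}
```

The four outcomes line up with the `status` label on turnstile_verifications_total, so the same value can drive both the counter and the admit decision. In a Node 18+ runtime the global `fetch` could be passed as `fetchImpl`.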