Program Operating Model and Ownership Map

Scope: thegent orchestration platform
Date: 2026-02-14
Related: WP-0005, docs/RUNBOOK.md, docs/research/GOVERNANCE_WP_GAPS.md


1. Overview

This document defines the RACI matrix, ownership assignments, and escalation paths for the thegent orchestration platform. Its purpose is organizational readiness: each operations, governance, and recovery activity has a named accountable role.


2. RACI Matrix

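| Activity | Product Owner | Tech Lead | Operator | Security/Compliance | Stakeholder |
|---|---|---|---|---|---|
| Orchestration (run, bg, dag run) | A | R | R | I | I |
| Policy definition & thresholds | A | R | I | R | C |
| Override approval (--override) | A | I | R | C | I |
| Escalation queue (govern escalate) | A | I | R | C | I |
| Drift detection & sweep | I | R | R | I | I |
| Audit trail & integrity | I | R | I | R | C |
| Recovery (reconcile, rollback) | I | R | R | I | I |
| Data protection & retention | A | R | I | R | C |
| Contract migration & versioning | A | R | I | I | C |
| Post-launch observation | A | R | R | I | I |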

Legend: R = Responsible, A = Accountable, C = Consulted, I = Informed
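
For tooling that needs to answer "who is Responsible for X?", the matrix can be held as plain data. The following is a minimal sketch, not part of thegent: the names `ROLES`, `RACI`, and `roles_for` are hypothetical, and the activity keys are shortened forms of the row labels above.

```python
# Column order matches the RACI table:
# Product Owner, Tech Lead, Operator, Security/Compliance, Stakeholder.
ROLES = ("Product Owner", "Tech Lead", "Operator", "Security/Compliance", "Stakeholder")

# One letter per role, in ROLES order (hypothetical encoding of the table rows).
RACI = {
    "Orchestration": "ARRII",
    "Policy definition & thresholds": "ARIRC",
    "Override approval": "AIRCI",
    "Escalation queue": "AIRCI",
    "Drift detection & sweep": "IRRII",
    "Audit trail & integrity": "IRIRC",
    "Recovery": "IRRII",
    "Data protection & retention": "ARIRC",
    "Contract migration & versioning": "ARIIC",
    "Post-launch observation": "ARRII",
}


def roles_for(activity: str, code: str) -> list[str]:
    """Return the roles that hold the given RACI code (R, A, C, or I) for an activity."""
    return [role for role, letter in zip(ROLES, RACI[activity]) if letter == code]
```

For example, `roles_for("Override approval", "A")` looks up the Accountable role for override approval.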


3. Ownership Assignments

| Domain | Owner | Backup | Scope |
|---|---|---|---|
| Orchestration | Tech Lead | Operator | run, bg, dag, agents, routing |
| Governance | Product Owner | Security | policy, override, escalation, data-protection |
| Recovery | Tech Lead | Operator | reconcile, rollback, recover, stop |
| Observability | Operator | Tech Lead | cockpit, benchmark, drift, KPIs |
| Contracts | Tech Lead | Product Owner | conformance, migration, schema versioning |
| Compliance | Security/Compliance | Product Owner | audit, retention, evidence |

4. Escalation Paths

4.1 Policy Denial / Blocked Run

  1. T0: Run blocked by policy (e.g. trust score, critical lane, drift budget).
  2. T0+0: Run added to escalation queue (thegent govern escalate list).
  3. SLA: Resolve within THGENT_ESCALATION_SLA_MINUTES (default 30 min).
  4. T0+SLA: If past SLA, escalate to Product Owner.
  5. Resolution: thegent govern escalate resolve <run_id> or --override with justification.
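
The past-SLA check in step 4 amounts to a timestamp comparison. The sketch below illustrates that logic only; `is_past_sla` and its parameters are hypothetical names, and thegent's real check may differ. The default of 30 minutes mirrors the documented default for THGENT_ESCALATION_SLA_MINUTES.

```python
from datetime import datetime, timedelta, timezone


def is_past_sla(blocked_at: datetime, now: datetime, sla_minutes: int = 30) -> bool:
    """Return True once more than `sla_minutes` have elapsed since the run was blocked (T0)."""
    return now - blocked_at > timedelta(minutes=sla_minutes)


# Example: a run blocked at 09:00 UTC, checked at 09:31 UTC.
t0 = datetime(2026, 2, 14, 9, 0, tzinfo=timezone.utc)
check_time = t0 + timedelta(minutes=31)
needs_product_owner = is_past_sla(t0, check_time)
```

Using timezone-aware datetimes avoids ambiguity when Operators in different time zones hand off the queue.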

4.2 Contract Drift / Adapter Failure

  1. T0: Drift detected (thegent observe drift, thegent govern conformance --check-drift).
  2. T0+0: Critical lane blocked (XC2); DAG run with --check-drift exits 2.
  3. SLA: Investigate within 60 min; remediate within 4 hours.
  4. Escalation: Tech Lead → Product Owner if adapter change required.
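
The two drift deadlines and the escalation rule can be computed from the detection time. This is an illustrative sketch under stated assumptions: `DriftIncident` and its fields are hypothetical and do not correspond to a known thegent type. Only the 60-minute and 4-hour windows and the Tech Lead to Product Owner rule come from the steps above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

INVESTIGATE_WITHIN = timedelta(minutes=60)  # step 3: investigate within 60 min
REMEDIATE_WITHIN = timedelta(hours=4)       # step 3: remediate within 4 hours


@dataclass(frozen=True)
class DriftIncident:
    """Hypothetical record of a detected contract drift (T0 = detected_at)."""

    detected_at: datetime
    adapter_change_required: bool = False

    @property
    def investigate_by(self) -> datetime:
        return self.detected_at + INVESTIGATE_WITHIN

    @property
    def remediate_by(self) -> datetime:
        return self.detected_at + REMEDIATE_WITHIN

    def escalation_target(self) -> str:
        # Step 4: the Tech Lead escalates to the Product Owner
        # only when an adapter change is required.
        return "Product Owner" if self.adapter_change_required else "Tech Lead"
```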

4.3 Audit / Integrity Failure

  1. T0: thegent history verify fails or hash chain broken.
  2. T0+0: Escalate immediately to Security/Compliance.
  3. SLA: Root cause within 2 hours; remediation per incident severity.
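
For readers unfamiliar with hash chains, the sketch below shows the general technique: each entry stores the hash of its predecessor, so changing any earlier entry invalidates everything after it. This is a generic illustration. The entry layout, the `GENESIS` value, and the function names are assumptions and do not describe the actual format checked by thegent history verify.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the last entry in the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def first_broken_index(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return None
```

Reporting the first broken index, and not just pass or fail, gives Security/Compliance a starting point for root-cause analysis.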

4.4 Post-Launch Incident

See docs/POST_LAUNCH_OBSERVATION_PLAYBOOK.md for severity→SLA mapping.


5. Handoff and Continuity

  • Shift handoff: Operator documents active escalations and running sessions in continuity log.
  • Ownership transfer: Incoming owner confirms escalation queue and past-SLA items via thegent govern escalate list --past-sla.
  • Runbook reference: All procedures in docs/RUNBOOK.md; escalation links in §3.

6. Configuration (Code Enforcement)

The operating model is enforced via config. These settings map to the escalation paths above:

| Config / Env | Default | Maps To |
|---|---|---|
| escalation_sla_minutes / THGENT_ESCALATION_SLA_MINUTES | 30 | §4.1 Policy Denial SLA |
| escalation_sla_breach_alert | true | §4.1 Past-SLA alert in govern sweep |
| override_ttl_seconds | 86400 (24h) | §4.1 Override validity |

Source: src/thegent/config.py (ThegentSettings)
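
To make the defaults concrete, here is a standard-library sketch of how such settings could be loaded and applied. It is not the real ThegentSettings class: `SettingsSketch`, `from_env`, and `override_is_valid` are hypothetical names. Only the THGENT_ESCALATION_SLA_MINUTES variable is documented above, so the other two fields keep their defaults here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical stand-in holding the three documented defaults."""

    escalation_sla_minutes: int = 30
    escalation_sla_breach_alert: bool = True
    override_ttl_seconds: int = 86400  # 24 hours

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SettingsSketch":
        # Only this environment variable name appears in the table above.
        env = os.environ if env is None else env
        return cls(
            escalation_sla_minutes=int(env.get("THGENT_ESCALATION_SLA_MINUTES", "30"))
        )


def override_is_valid(granted_at: float, now: float, settings: SettingsSketch) -> bool:
    """Return True while an override is younger than the TTL (both times in epoch seconds)."""
    return now - granted_at < settings.override_ttl_seconds
```

With the defaults, an override granted at time 0 is still valid at 86399 seconds and expired at 86400 seconds. Whether the real boundary is inclusive is an assumption of this sketch.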


7. References

  • docs/RUNBOOK.md — On-call procedures, recovery, escalation
  • docs/research/GOVERNANCE_WP_GAPS.md — WP-3008 escalation queue
  • docs/plans/09-RISK-REGISTRY.md — Risk-based SLA targets