ai-agencee logoai-agencee
security

orchestration

Resilience Patterns

Retry policies, circuit breakers, and graceful fallbacks that keep your workflows running through transient failures.

Retry Policy

Every LLM call is wrapped in an exponential-backoff retry loop:

- 4 attempts by default (configurable) - 1 s → 32 s max delay (2ⁿ s × jitter) - Retries on HTTP 429, 500, 503 - Respects `Retry-After` response headers

Circuit Breaker

Per-provider circuit breaker with three states:

| State | Behaviour | |-------|----------| | CLOSED | Normal operation | | OPEN | Fast-fail all requests; no LLM calls | | HALF_OPEN | One probe request; success → CLOSED, failure → OPEN |

Threshold: 5 consecutive failures → OPEN. Cooldown: 60 s.

Supervisor Verdicts

When a supervisor check fails beyond its `retryBudget`, the engine chooses a verdict:

- RETRY — Re-run with injected corrective prompt - HANDOFF — Forward output to a specialist lane - ESCALATE — Halt DAG, fire `human-review-required` event