Retry policies
By default a failed run is final. A retry policy lets a failed run be automatically re-attempted: a named, reusable, account-global resource that decides whether, how many times, how long apart, and on which failures a run is retried. Reference a policy from a job and any matching failure spawns a fresh attempt; the retried runs carry trigger = RETRY (distinct from the manual rerun action). A job that references no policy is never retried — so a job that sets nothing behaves exactly as it did before retries existed.
A policy is identified by an id that you choose and supply when you create it. Its attributes are:
| Field | Notes |
|---|---|
| name | Required, 1–200 characters. |
| max_retries | How many times a failed run is retried after the initial attempt; a max_retries of 3 means up to 4 attempts in total. 0 disables retries. Range 0–10. |
| backoff | How the wait between retries grows, either "fixed" or "exponential": fixed waits delay_seconds before every retry; exponential doubles the wait each time (delay_seconds, then 2×, 4×, …), capped at max_delay_seconds. |
| delay_seconds | The wait before a retry, in seconds (≥ 1). For fixed it is the constant wait; for exponential it is the base wait that doubles each retry. |
| max_delay_seconds | The ceiling on the wait between retries, in seconds. Only valid with exponential backoff; omit it for fixed. |
| retry_on_timeout | Retry a run that did not complete within the job's timeout. Boolean; defaults to false. |
| retry_on_connection_error | Retry a run whose destination could not be reached (DNS, refused connection, TLS, or transport error). Boolean; defaults to false. |
| retry_statuses | Allowlist of response status patterns to retry when a run failed because the response did not match the job's success status. Each element is an exact 3-digit code ("429") or a class ("5xx"). Empty (the default) matches nothing. |
| retry_statuses_except | Patterns subtracted from retry_statuses, using the same syntax; except wins on overlap. Empty (the default) subtracts nothing. |
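To see how backoff, delay_seconds, and max_delay_seconds combine, the sketch below computes the wait before each retry. It is a hypothetical helper (retry_delays is not part of the API) and assumes the first retry waits exactly delay_seconds, that an omitted max_delay_seconds means no cap, and that the scheduler adds no jitter or rounding; none of these details is documented here.

```python
from typing import List, Optional


def retry_delays(
    max_retries: int,
    backoff: str,
    delay_seconds: int,
    max_delay_seconds: Optional[int] = None,
) -> List[int]:
    """Return the wait, in seconds, before each retry in order (hypothetical helper)."""
    if backoff not in ("fixed", "exponential"):
        raise ValueError(f"unknown backoff: {backoff!r}")
    if not 0 <= max_retries <= 10:
        raise ValueError("max_retries must be in the range 0..10")
    if delay_seconds < 1:
        raise ValueError("delay_seconds must be >= 1")
    if backoff == "fixed":
        if max_delay_seconds is not None:
            raise ValueError("max_delay_seconds is only valid with exponential backoff")
        # Fixed: the same wait before every retry.
        return [delay_seconds] * max_retries
    delays = []
    for retry in range(max_retries):
        # Exponential: base wait doubled once per earlier retry.
        wait = delay_seconds * 2 ** retry
        if max_delay_seconds is not None:
            # Assumption: an omitted max_delay_seconds leaves the wait uncapped.
            wait = min(wait, max_delay_seconds)
        delays.append(wait)
    return delays


print(retry_delays(5, "exponential", 2, 60))   # [2, 4, 8, 16, 32]
print(retry_delays(3, "exponential", 10, 30))  # [10, 20, 30]
print(retry_delays(3, "fixed", 5))             # [5, 5, 5]
```

With five retries starting at 2 seconds, the 60-second cap is never reached; it only starts to bite from the sixth retry onward.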
Each match field defaults to a neutral value (false or an empty list), so a field you omit has no effect; a policy retries exactly the failures you opt into:
- retry_on_timeout and retry_on_connection_error toggle retries for a timed-out run and an unreachable destination, respectively.
- retry_statuses is an allowlist of statuses to retry on a non-success response. Each element is either an exact 3-digit HTTP code ("429") or a status class written Nxx, one of "1xx", "2xx", "3xx", "4xx", "5xx". Statuses are strings, and class tokens are case-insensitive ("5XX" is stored as "5xx"). An empty list matches nothing, so nothing is retried on a non-success response.
- retry_statuses_except subtracts from retry_statuses using the same exact-code-or-class syntax. When a status matches both lists, except wins and the run is not retried, so a retry_statuses of ["5xx"] with a retry_statuses_except of ["501"] retries every server error except 501. An element that does not overlap retry_statuses is allowed and simply has no effect.
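The pattern syntax can be sketched as a small normalizer plus a matcher. This is a client-side illustration only, assuming the server accepts exactly the forms listed above and rejects anything else; normalize_pattern and pattern_matches are hypothetical helpers, not part of any published SDK.

```python
import re


def normalize_pattern(pattern: str) -> str:
    """Return the stored form of one status pattern, or raise.

    Accepts an exact 3-digit code ("429") or a class token "1xx".."5xx"
    in any letter case; class tokens are lowercased ("5XX" -> "5xx").
    """
    if not isinstance(pattern, str):
        # Statuses are strings; a bare integer such as 429 is rejected.
        raise TypeError(f"status patterns must be strings, got {pattern!r}")
    if re.fullmatch(r"[0-9]{3}", pattern):
        return pattern
    if re.fullmatch(r"[1-5]xx", pattern, flags=re.IGNORECASE):
        return pattern.lower()
    raise ValueError(f"not an exact 3-digit code or an Nxx class: {pattern!r}")


def pattern_matches(status: int, pattern: str) -> bool:
    """True if an HTTP status code falls under one pattern."""
    p = normalize_pattern(pattern)
    if p.endswith("xx"):
        # A class matches every code that shares its leading digit.
        return status // 100 == int(p[0])
    return status == int(p)


print(normalize_pattern("5XX"))     # 5xx
print(pattern_matches(503, "5xx"))  # True
print(pattern_matches(429, "5xx"))  # False
print(pattern_matches(429, "429"))  # True
```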
A status is retried only when it matches retry_statuses and is not excluded by retry_statuses_except:
| Status pattern membership | Result |
|---|---|
| In retry_statuses, not in retry_statuses_except | Retried |
| In both lists | Not retried (except wins) |
| In neither list | Not retried (an allowlist match is required) |
| retry_statuses empty | No non-success response is retried |
Some failures are never retried regardless of the policy.
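The decision table, together with the two toggles and max_retries, can be sketched as a single predicate. This is an illustration under stated assumptions, not the service's implementation: RetryPolicy, should_retry, and the failure labels ("timeout", "connection_error", "status") are hypothetical names, and the final fallback simply treats any other failure kind as one of those never retried.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RetryPolicy:
    # Hypothetical in-memory mirror of the policy attributes; the match
    # fields default to their neutral values (off / empty).
    max_retries: int
    retry_on_timeout: bool = False
    retry_on_connection_error: bool = False
    retry_statuses: List[str] = field(default_factory=list)
    retry_statuses_except: List[str] = field(default_factory=list)


def _matches(status: int, pattern: str) -> bool:
    # "5xx" matches on the leading digit; "429" matches exactly.
    p = pattern.lower()
    if p.endswith("xx"):
        return status // 100 == int(p[0])
    return status == int(p)


def should_retry(policy: RetryPolicy, failure: str, retries_so_far: int,
                 status: Optional[int] = None) -> bool:
    """Decide whether a failed run gets another attempt."""
    if retries_so_far >= policy.max_retries:
        return False  # retries exhausted, or max_retries is 0
    if failure == "timeout":
        return policy.retry_on_timeout
    if failure == "connection_error":
        return policy.retry_on_connection_error
    if failure == "status" and status is not None:
        allowed = any(_matches(status, p) for p in policy.retry_statuses)
        excluded = any(_matches(status, p) for p in policy.retry_statuses_except)
        return allowed and not excluded  # except wins on overlap
    return False  # any other failure kind is never retried


policy = RetryPolicy(max_retries=5, retry_statuses=["429", "5xx"],
                     retry_statuses_except=["501"])
print(should_retry(policy, "status", 0, 503))  # True: allowlisted, not excluded
print(should_retry(policy, "status", 0, 501))  # False: except wins
print(should_retry(policy, "status", 0, 404))  # False: no allowlist match
print(should_retry(policy, "timeout", 0))      # False: retry_on_timeout is off
print(should_retry(policy, "status", 5, 503))  # False: all 5 retries used
```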
A policy is created with POST /api/v1/retry-policies, supplying your chosen id:
```http
POST /api/v1/retry-policies
Content-Type: application/vnd.api+json
Authorization: Bearer <api-key>

{
  "data": {
    "id": "retry-on-5xx",
    "type": "retry_policy",
    "attributes": {
      "name": "Retry on server errors",
      "max_retries": 5,
      "backoff": "exponential",
      "delay_seconds": 2,
      "max_delay_seconds": 60,
      "retry_on_timeout": true,
      "retry_on_connection_error": true,
      "retry_statuses": ["429", "5xx"],
      "retry_statuses_except": ["501"]
    }
  }
}
```

A suggested starting policy
If you're not sure where to begin, the policy below retries the failures that are usually worth retrying, with exponential backoff: timeouts, unreachable destinations, rate limits (429), and server errors (5xx). Copy these attributes into a create request, add an id and the required name, and adjust to taste. None of these values is a default, so you stay in full control.
```json
{
  "max_retries": 5,
  "backoff": "exponential",
  "delay_seconds": 2,
  "max_delay_seconds": 60,
  "retry_on_timeout": true,
  "retry_on_connection_error": true,
  "retry_statuses": ["429", "5xx"],
  "retry_statuses_except": []
}
```

Migrating from retry_on
Earlier, the match rule was a single retry_on object, { "statuses": [int], "reasons": [...] }. It has been replaced by the four fields above. To translate an existing policy:

- Integer statuses become string retry_statuses entries (which now also accept Nxx classes).
- reasons: ["TIMEOUT"] becomes retry_on_timeout: true.
- reasons: ["CONNECTION_ERROR"] becomes retry_on_connection_error: true.
- The old catch-all reasons: ["NON_SUCCESS_STATUS"] (retry every non-success response) becomes the classes you want, e.g. retry_statuses: ["1xx", "3xx", "4xx", "5xx"].
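These translation rules are mechanical enough to script. The sketch below is a hypothetical helper (migrate_retry_on is not part of the API); the default classes for the catch-all are only the example from the text, not a rule, so pass the classes you actually want retried.

```python
from typing import Any, Dict, List, Sequence

_KNOWN_REASONS = {"TIMEOUT", "CONNECTION_ERROR", "NON_SUCCESS_STATUS"}


def migrate_retry_on(
    retry_on: Dict[str, Any],
    catch_all_classes: Sequence[str] = ("1xx", "3xx", "4xx", "5xx"),
) -> Dict[str, Any]:
    """Translate an old retry_on object into the four new match fields."""
    reasons = set(retry_on.get("reasons", []))
    unknown = reasons - _KNOWN_REASONS
    if unknown:
        raise ValueError(f"unrecognized reasons: {sorted(unknown)}")
    # Integer codes become strings, e.g. 429 -> "429".
    statuses: List[str] = [str(code) for code in retry_on.get("statuses", [])]
    if "NON_SUCCESS_STATUS" in reasons:
        # The catch-all becomes whichever classes you choose to retry.
        statuses.extend(catch_all_classes)
    return {
        "retry_on_timeout": "TIMEOUT" in reasons,
        "retry_on_connection_error": "CONNECTION_ERROR" in reasons,
        # dict.fromkeys drops duplicates while preserving order.
        "retry_statuses": list(dict.fromkeys(statuses)),
        "retry_statuses_except": [],
    }


old = {"statuses": [429, 503], "reasons": ["TIMEOUT", "CONNECTION_ERROR"]}
print(migrate_retry_on(old))
# {'retry_on_timeout': True, 'retry_on_connection_error': True,
#  'retry_statuses': ['429', '503'], 'retry_statuses_except': []}
```

The result would then replace retry_on among the policy's other attributes before the policy is written back.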
| Endpoint | Purpose |
|---|---|
| POST /api/v1/retry-policies | Create a policy (caller-supplied id). |
| GET /api/v1/retry-policies | List policies (page-paginated; filter[name] for a case-insensitive substring match on the name). |
| GET /api/v1/retry-policies/{id} | Read a policy. |
| PUT /api/v1/retry-policies/{id} | Replace a policy. |
| DELETE /api/v1/retry-policies/{id} | Soft-delete a policy. |
Updates follow the standard get-mutate-put pattern: read the policy, change the fields you want, and PUT the full representation back.
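The create call and the get-mutate-put update can be sketched with the Python standard library. Assumptions, all hypothetical: BASE_URL is a placeholder host (only paths are documented), the PUT body has the same shape as the POST body, and the transport parameter exists only so the HTTP call can be swapped out, for example in tests. None of these helper names belong to an official SDK.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your own
MEDIA_TYPE = "application/vnd.api+json"

Json = Dict[str, Any]
Transport = Callable[[urllib.request.Request], Json]


def build_request(method: str, path: str, api_key: str,
                  body: Optional[Json] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON:API request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", MEDIA_TYPE)
    return req


def send(req: urllib.request.Request) -> Json:
    """Default transport: perform the HTTP call and decode the JSON body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_policy(policy_id: str, attributes: Json, api_key: str,
                  transport: Transport = send) -> Json:
    """POST a new policy under a caller-chosen id."""
    body = {"data": {"id": policy_id, "type": "retry_policy",
                     "attributes": attributes}}
    return transport(build_request("POST", "/api/v1/retry-policies", api_key, body))


def update_policy(policy_id: str, mutate: Callable[[Json], None], api_key: str,
                  transport: Transport = send) -> Json:
    """Get-mutate-put: read the policy, change some fields, PUT it all back."""
    path = f"/api/v1/retry-policies/{policy_id}"
    current = transport(build_request("GET", path, api_key))["data"]
    attributes = dict(current["attributes"])
    mutate(attributes)  # edit in place, e.g. attributes["max_retries"] = 3
    # Send the full representation, not just the changed fields.
    body = {"data": {"id": current["id"], "type": current["type"],
                     "attributes": attributes}}
    return transport(build_request("PUT", path, api_key, body))
```

Because PUT replaces the whole policy, the sketch starts from the attributes it just read, so fields you do not touch are sent back unchanged rather than dropped.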
Attaching a policy to a job
Set the job's retry_policy attribute to a policy id, and override it per environment with the same key inside an environments entry — exactly parallel to the per-environment schedule, timezone, and request-leaf overrides:
```json
{
  "retry_policy": "retry-on-5xx",
  "environments": {
    "production": { "enabled": true },
    "staging": { "enabled": true, "retry_policy": "no-retries" }
  }
}
```

A per-environment retry_policy that is omitted inherits the job's base retry_policy; when both are absent, that environment references no policy and is never retried. A per-environment override must name a real policy id. There is no null override to opt a single environment out of retries, so to make one environment not retry while another does, create a zero-retry policy (max_retries: 0) and reference it. In the example above, production inherits retry-on-5xx, while staging overrides it with the zero-retry no-retries policy.
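The inheritance rule amounts to a two-step lookup: the environment's own retry_policy if present, otherwise the job's base. A minimal sketch, where effective_retry_policy is a hypothetical helper and the rejection of a null override mirrors the rule stated above rather than any documented server error:

```python
from typing import Any, Dict, Optional


def effective_retry_policy(job: Dict[str, Any], environment: str) -> Optional[str]:
    """Return the policy id in force for one environment, or None if the
    job references no policy there (so its runs are never retried)."""
    env_entry = job.get("environments", {}).get(environment, {})
    if "retry_policy" in env_entry:
        override = env_entry["retry_policy"]
        if override is None:
            # No null override exists; use a zero-retry policy instead.
            raise ValueError("a per-environment retry_policy must name a policy id")
        return override
    # No override: inherit the job's base retry_policy (possibly absent).
    return job.get("retry_policy")


job = {
    "retry_policy": "retry-on-5xx",
    "environments": {
        "production": {"enabled": True},
        "staging": {"enabled": True, "retry_policy": "no-retries"},
    },
}
print(effective_retry_policy(job, "production"))                   # retry-on-5xx
print(effective_retry_policy(job, "staging"))                      # no-retries
print(effective_retry_policy({"environments": {}}, "production"))  # None
```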
Related
- Smpl Jobs overview — the job model, scheduling, and environments
- Running jobs — how retries appear in run history
- API Reference — Jobs — full schema and filters

