Retry policies

By default a failed run is final. A retry policy changes that: it is a named, reusable, account-global resource that decides whether a failed run is retried, how many times, how long apart, and on which failures. Reference a policy from a job and any matching failure spawns a fresh attempt; retried runs carry trigger = RETRY, which distinguishes them from the manual rerun action. A job that references no policy is never retried, so a job that sets nothing behaves exactly as it did before retries existed.

A policy is identified by an id that you supply when you create it. Its attributes are:

Field	Notes
`name`	Required, 1–200 characters.
`max_retries`	How many times a failed run is retried after the initial attempt — `max_retries` of `3` means up to 4 attempts in total. `0` disables retries. Range `0`–`10`.
`backoff`	How the wait between retries grows: `fixed` waits `delay_seconds` before every retry; `exponential` doubles the wait each time (`delay_seconds`, then `2×`, `4×`, …), capped at `max_delay_seconds`.
`delay_seconds`	The wait before a retry, in seconds (≥ 1). For `fixed` it is the constant wait; for `exponential` it is the base wait that doubles each retry.
`max_delay_seconds`	The ceiling on the wait between retries, in seconds — only valid with `exponential` backoff; omit it for `fixed`.
`retry_on_timeout`	Retry a run that did not complete within the job's timeout. Boolean; defaults to `false`.
`retry_on_connection_error`	Retry a run whose destination could not be reached (DNS, refused connection, TLS, or transport error). Boolean; defaults to `false`.
`retry_statuses`	Allowlist of response status patterns to retry when a run failed because the response did not match the job's success status. Each element is an exact 3-digit code (`"429"`) or a class (`"5xx"`). Empty (the default) matches nothing.
`retry_statuses_except`	Patterns subtracted from `retry_statuses`, using the same syntax — `except` wins on overlap. Empty (the default) subtracts nothing.
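
The interaction of `max_retries`, `backoff`, `delay_seconds`, and `max_delay_seconds` can be sketched as a small function that lists the wait before each retry. This is an illustration of the rules in the table, not the service's implementation; the function name `retry_delays` is hypothetical, and treating an omitted `max_delay_seconds` as "no cap" is an assumption.

```python
from typing import List, Optional


def retry_delays(
    max_retries: int,
    backoff: str,
    delay_seconds: int,
    max_delay_seconds: Optional[int] = None,
) -> List[int]:
    """Return the wait in seconds before each retry, in order."""
    if backoff not in ("fixed", "exponential"):
        raise ValueError(f"unknown backoff: {backoff!r}")
    delays = []
    for retry_index in range(max_retries):
        if backoff == "fixed":
            # Fixed backoff waits the same time before every retry.
            wait = delay_seconds
        else:
            # Exponential backoff doubles the base wait: delay, 2x, 4x, ...
            wait = delay_seconds * (2 ** retry_index)
            # Assumption: with no cap supplied, the wait is left uncapped.
            if max_delay_seconds is not None:
                wait = min(wait, max_delay_seconds)
        delays.append(wait)
    return delays


print(retry_delays(5, "exponential", 2, 60))   # [2, 4, 8, 16, 32]
print(retry_delays(5, "exponential", 10, 60))  # [10, 20, 40, 60, 60]
print(retry_delays(3, "fixed", 30))            # [30, 30, 30]
```

With `max_retries` of `0` the list is empty, which matches "`0` disables retries".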

Each match field defaults to a value that matches nothing (false or an empty list), so a field you omit does nothing and a policy retries exactly the failures you opt into:

retry_on_timeout and retry_on_connection_error toggle retries for a timed-out run and an unreachable destination, respectively.
retry_statuses is an allowlist of statuses to retry on a non-success response. Each element is either an exact 3-digit HTTP code ("429") or a status class written Nxx — one of "1xx", "2xx", "3xx", "4xx", "5xx". Statuses are strings, and class tokens are case-insensitive ("5XX" is stored as "5xx"). An empty list matches nothing, so nothing is retried on a non-success response.
retry_statuses_except subtracts from retry_statuses using the same exact-code-or-class syntax. When a status matches both lists, except wins and the run is not retried — so retry_statuses of ["5xx"] with retry_statuses_except of ["501"] retries every server error except 501. An element that does not overlap retry_statuses is allowed and simply has no effect.

A status is retried only when it matches retry_statuses and is not excluded by retry_statuses_except:

Status pattern membership	Result
In `retry_statuses`, not in `retry_statuses_except`	Retried
In both lists	Not retried (`except` wins)
In neither list	Not retried (an allowlist match is required)
`retry_statuses` empty	No non-success response is retried
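
The membership rules in this table can be expressed as a short matcher. The sketch below is only a reading of the documented behavior; the helper names `pattern_matches` and `status_is_retried` are hypothetical.

```python
from typing import Iterable


def pattern_matches(pattern: str, status: int) -> bool:
    """True if a pattern such as "429" or "5xx" covers the status code."""
    token = pattern.lower()  # class tokens are case-insensitive
    code = str(status)
    if len(code) != 3:
        return False
    if len(token) == 3 and token.endswith("xx"):
        # Class pattern: compare only the leading digit.
        return code[0] == token[0]
    # Exact pattern: the whole 3-digit code must match.
    return code == token


def status_is_retried(
    status: int,
    retry_statuses: Iterable[str],
    retry_statuses_except: Iterable[str] = (),
) -> bool:
    """Allowlist match is required, and an except match always wins."""
    allowed = any(pattern_matches(p, status) for p in retry_statuses)
    excluded = any(pattern_matches(p, status) for p in retry_statuses_except)
    return allowed and not excluded


print(status_is_retried(503, ["5xx"], ["501"]))  # True
print(status_is_retried(501, ["5xx"], ["501"]))  # False
print(status_is_retried(404, ["429", "5XX"]))    # False
print(status_is_retried(500, []))                # False
```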

Some failures are never retried regardless of the policy.

A policy is created with POST /api/v1/retry-policies, supplying your chosen id:

POST /api/v1/retry-policies
Content-Type: application/vnd.api+json
Authorization: Bearer <api-key>

{
  "data": {
    "id": "retry-on-5xx",
    "type": "retry_policy",
    "attributes": {
      "name": "Retry on server errors",
      "max_retries": 5,
      "backoff": "exponential",
      "delay_seconds": 2,
      "max_delay_seconds": 60,
      "retry_on_timeout": true,
      "retry_on_connection_error": true,
      "retry_statuses": ["429", "5xx"],
      "retry_statuses_except": ["501"]
    }
  }
}
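
As a rough illustration, the same request could be assembled from Python with the standard library. The host in `BASE_URL` and the function name `build_create_request` are placeholders, not part of the API; only the path, headers, and body shape come from the example above. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your own


def build_create_request(policy_id: str, attributes: dict, api_key: str):
    """Build (but do not send) a POST that creates a retry policy."""
    body = {
        "data": {
            "id": policy_id,
            "type": "retry_policy",
            "attributes": attributes,
        }
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/retry-policies",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.api+json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_create_request(
    "retry-on-5xx",
    {
        "name": "Retry on server errors",
        "max_retries": 5,
        "backoff": "exponential",
        "delay_seconds": 2,
        "max_delay_seconds": 60,
        "retry_on_timeout": True,
        "retry_on_connection_error": True,
        "retry_statuses": ["429", "5xx"],
        "retry_statuses_except": ["501"],
    },
    api_key="<api-key>",
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.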

A suggested starting policy

If you're not sure where to begin, this policy retries the failures that are usually worth retrying: timeouts, unreachable destinations, rate limits (429), and server errors (5xx). It uses exponential backoff. Copy it, give it an id and a name (name is required and is not included below), and adjust to taste; nothing here is a default, so you stay in full control.

{
  "max_retries": 5,
  "backoff": "exponential",
  "delay_seconds": 2,
  "max_delay_seconds": 60,
  "retry_on_timeout": true,
  "retry_on_connection_error": true,
  "retry_statuses": ["429", "5xx"],
  "retry_statuses_except": []
}

Migrating from retry_on

Earlier the match rule was a single retry_on object — { "statuses": [int], "reasons": [...] }. It has been replaced by the four fields above. To translate an existing policy: integer statuses become string retry_statuses (which now also accept Nxx classes); reasons: ["TIMEOUT"] becomes retry_on_timeout: true; reasons: ["CONNECTION_ERROR"] becomes retry_on_connection_error: true; and the old catch-all reasons: ["NON_SUCCESS_STATUS"] (retry every non-success response) becomes the classes you want, e.g. retry_statuses: ["1xx", "3xx", "4xx", "5xx"].
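
The translation can be sketched as a small conversion function. The name `migrate_retry_on` is hypothetical, and one choice here is an assumption: when the old catch-all reason is present, the four non-2xx classes are appended to any explicit statuses, on the premise that 2xx is the job's success class. Adjust the classes if your job treats other statuses as success.

```python
NON_SUCCESS_CLASSES = ["1xx", "3xx", "4xx", "5xx"]  # assumes 2xx means success


def migrate_retry_on(retry_on: dict) -> dict:
    """Translate a legacy retry_on object into the four match fields."""
    reasons = set(retry_on.get("reasons", []))
    # Integer statuses become strings.
    statuses = [str(code) for code in retry_on.get("statuses", [])]
    if "NON_SUCCESS_STATUS" in reasons:
        # The old catch-all becomes explicit status classes.
        statuses += [c for c in NON_SUCCESS_CLASSES if c not in statuses]
    return {
        "retry_on_timeout": "TIMEOUT" in reasons,
        "retry_on_connection_error": "CONNECTION_ERROR" in reasons,
        "retry_statuses": statuses,
        "retry_statuses_except": [],
    }


legacy = {"statuses": [429, 503], "reasons": ["TIMEOUT"]}
print(migrate_retry_on(legacy))
# {'retry_on_timeout': True, 'retry_on_connection_error': False,
#  'retry_statuses': ['429', '503'], 'retry_statuses_except': []}
```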

Endpoint	Purpose
`POST /api/v1/retry-policies`	Create a policy (caller-supplied `id`).
`GET /api/v1/retry-policies`	List policies (paginated by page; `filter[name]` does a case-insensitive substring match on the name).
`GET /api/v1/retry-policies/{id}`	Read a policy.
`PUT /api/v1/retry-policies/{id}`	Replace a policy.
`DELETE /api/v1/retry-policies/{id}`	Soft-delete a policy.

Updates follow the standard get-mutate-put pattern: read the policy, change the fields you want, and PUT the full representation back.
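
A minimal sketch of that pattern follows. It assumes the GET response uses the same `data` / `attributes` envelope as the create request, which this page does not confirm. The `send` callable is a hypothetical transport, so the example runs against an in-memory stand-in rather than the real API.

```python
import copy
from typing import Callable, Optional

# A transport takes (method, path, body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


def update_policy(send: Transport, policy_id: str, changes: dict) -> dict:
    """Read a policy, apply attribute changes, and PUT the whole document."""
    path = f"/api/v1/retry-policies/{policy_id}"
    current = send("GET", path, None)               # 1. get
    document = copy.deepcopy(current)
    document["data"]["attributes"].update(changes)  # 2. mutate
    return send("PUT", path, document)              # 3. put the full representation


# In-memory stand-in for the API, used only to demonstrate the flow.
store = {
    "retry-on-5xx": {
        "data": {
            "id": "retry-on-5xx",
            "type": "retry_policy",
            "attributes": {"name": "Retry on server errors", "max_retries": 5},
        }
    }
}


def fake_send(method: str, path: str, body: Optional[dict]) -> dict:
    policy_id = path.rsplit("/", 1)[-1]
    if method == "PUT":
        store[policy_id] = body
    return store[policy_id]


updated = update_policy(fake_send, "retry-on-5xx", {"max_retries": 3})
print(updated["data"]["attributes"])
# {'name': 'Retry on server errors', 'max_retries': 3}
```

Because PUT replaces the policy, sending only the changed field would drop the others; copying the full document first avoids that.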

Attaching a policy to a job

Set the job's retry_policy attribute to a policy id, and override it per environment with the same key inside an environments entry — exactly parallel to the per-environment schedule, timezone, and request-leaf overrides:

{
  "retry_policy": "retry-on-5xx",
  "environments": {
    "production": { "enabled": true },
    "staging":    { "enabled": true, "retry_policy": "no-retries" }
  }
}

A per-environment retry_policy that is omitted inherits the job's base retry_policy; if the base is also absent, the job references no policy and is never retried. A per-environment override must name a real policy id: there is no null override to opt a single environment out of retries. To make one environment not retry while another does, create a zero-retry policy (max_retries: 0) and reference it. In the example above, production inherits retry-on-5xx, while staging overrides it with the zero-retry no-retries policy.
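
The inheritance rule can be sketched as a lookup over the job document. The function name `effective_retry_policy` is hypothetical, and returning `None` to mean "no policy, never retried" is a convention of this sketch only.

```python
from typing import Optional


def effective_retry_policy(job: dict, environment: str) -> Optional[str]:
    """Return the policy id in force for an environment, or None if none applies."""
    env_config = job.get("environments", {}).get(environment, {})
    # An environment without its own retry_policy inherits the job's base value.
    return env_config.get("retry_policy", job.get("retry_policy"))


job = {
    "retry_policy": "retry-on-5xx",
    "environments": {
        "production": {"enabled": True},
        "staging": {"enabled": True, "retry_policy": "no-retries"},
    },
}

print(effective_retry_policy(job, "production"))  # retry-on-5xx
print(effective_retry_policy(job, "staging"))     # no-retries
print(effective_retry_policy({"environments": {"production": {}}}, "production"))  # None
```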

Related

Smpl Jobs overview — the job model, scheduling, and environments
Running jobs — how retries appear in run history
API Reference — Jobs — full schema and filters
