Timeouts

OJS defines five timeout mechanisms that protect against runaway jobs, stalled workers, and unbounded queue wait times. Each addresses a different failure mode.

| Timeout | Default | Scope | Description |
| --- | --- | --- | --- |
| Execution timeout | 1800s (30m) | Per attempt | Maximum time for a single execution attempt |
| Total timeout | none | All attempts | Wall-clock limit from creation to completion |
| Enqueue TTL | none | Pre-execution | Maximum time in a non-active state before discard |
| Grace period | 30s | Per attempt | Time between soft timeout signal and forced termination |
| Heartbeat timeout | 60s | Per attempt | Maximum interval between heartbeats before a job is considered stalled |

Execution timeout

The execution timeout limits how long a single attempt can run. When it is exceeded, the job transitions to retryable (if attempts remain) or discarded (if attempts are exhausted).

```json
{
  "type": "report.generate",
  "args": ["quarterly-2026-q1"],
  "timeout": 3600
}
```

Detection is server-side via visibility timeout expiry. Workers MAY also enforce the timeout locally for faster response.
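
As a sketch of local enforcement, a worker might run the handler on a separate thread and bound the wait. The `handler` callable and job-dict shape below are illustrative assumptions, not part of the spec; server-side detection stays authoritative.

```python
# Sketch: worker-side enforcement of the execution timeout.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as AttemptTimeout

def execute_attempt(job, handler):
    timeout = job.get("timeout", 1800)  # spec default: 1800s per attempt
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler, *job.get("args", []))
    try:
        return future.result(timeout=timeout)
    except AttemptTimeout:
        # A Python thread cannot be killed from outside; a real worker would
        # send the soft timeout signal and fall back to the grace period.
        raise
    finally:
        pool.shutdown(wait=False)
```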

Total timeout

The total timeout sets a wall-clock limit across all attempts, from job creation to completion. This prevents jobs from retrying indefinitely when individual attempts are short but numerous.

```json
{
  "type": "payment.process",
  "args": ["txn_abc123"],
  "timeout": 300,
  "total_timeout": 900,
  "retry": { "max_attempts": 10 }
}
```

Before scheduling a retry, the backend checks whether the next attempt would exceed the total timeout. If so, the job is discarded or dead-lettered instead of retried.
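
A minimal sketch of that check, assuming hypothetical `created_at` and `backoff_delay` values tracked by the backend:

```python
import time

def next_state_after_failure(job, backoff_delay, attempts_left):
    """Decide between retrying and discarding after a failed attempt."""
    total_timeout = job.get("total_timeout")
    if total_timeout is not None:
        next_start = time.time() + backoff_delay
        if next_start > job["created_at"] + total_timeout:
            # The next attempt would exceed the wall-clock budget.
            return "discarded"  # or dead-lettered, per queue policy
    return "retryable" if attempts_left > 0 else "discarded"
```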

Enqueue TTL

The enqueue TTL limits how long a job can wait before being processed. If a job has not started execution within this period, it is discarded with error type `enqueue_ttl_expired`.

```json
{
  "type": "notification.push",
  "args": ["user_123", "Flash sale ending!"],
  "enqueue_ttl": 300
}
```

Use case: time-sensitive notifications where delivery after a delay is worse than no delivery.
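
A sketch of the check a backend might perform at dequeue time; the `created_at` field and the `discard()`/`start_attempt()` helpers are hypothetical placeholders:

```python
import time

def discard(job, error_type):    # placeholder: record terminal failure
    return {"state": "discarded", "error": {"type": error_type}}

def start_attempt(job):          # placeholder: lease the job to a worker
    return {"state": "active"}

def lease(job):
    ttl = job.get("enqueue_ttl")
    if ttl is not None and time.time() > job["created_at"] + ttl:
        # Stale before it ever ran: discard with the spec's error type.
        return discard(job, error_type="enqueue_ttl_expired")
    return start_attempt(job)
```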

Grace period

The grace period is the time between a soft timeout signal and forced termination. It gives the worker a chance to save progress or clean up resources.

```json
{
  "type": "video.transcode",
  "args": ["video_abc"],
  "timeout": 7200,
  "grace_period": 60
}
```

The default grace period (30s) aligns with Kubernetes shutdown semantics.
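
As an illustration, a worker that runs handlers in a child process might implement the escalation like this. The process setup and the use of SIGTERM/SIGKILL are assumptions; the spec only defines the timing:

```python
import os
import signal
import time

def terminate_with_grace(pid, grace_period=30):  # spec default: 30s
    os.kill(pid, signal.SIGTERM)  # soft timeout: save progress, clean up
    deadline = time.monotonic() + grace_period
    while time.monotonic() < deadline:
        done_pid, _ = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return  # handler exited within the grace period
        time.sleep(0.1)
    try:
        os.kill(pid, signal.SIGKILL)  # grace period elapsed: force termination
    except ProcessLookupError:
        pass  # exited between the last check and the kill
```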

Heartbeat timeout

The heartbeat timeout detects stalled jobs: jobs where the worker process is alive but the handler is stuck (deadlock, infinite loop, blocked I/O).

```json
{
  "type": "data.import",
  "args": ["dataset_xyz"],
  "heartbeat_timeout": 120
}
```

Progress updates (PUT /ojs/v1/jobs/{id}/progress) reset the heartbeat clock. Long-running jobs that report progress regularly will not trigger the heartbeat timeout.
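
For example, a handler for the job above might report progress every thousand rows. This sketch assumes the `requests` library, a hypothetical BASE_URL, and an illustrative request body; only the endpoint path comes from the spec:

```python
import requests

BASE_URL = "https://queue.example.com"  # hypothetical backend address

def import_dataset(job_id, rows):
    for i, row in enumerate(rows):
        handle_row(row)  # placeholder for the real per-row import logic
        if i % 1000 == 0:
            # Each update doubles as a heartbeat, resetting the stall clock.
            requests.put(
                f"{BASE_URL}/ojs/v1/jobs/{job_id}/progress",
                json={"current": i, "total": len(rows)},  # body shape assumed
                timeout=5,
            )

def handle_row(row):
    pass
```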

When a timeout fires:

| Timeout Type | Error Type | Next State |
| --- | --- | --- |
| Execution | `timeout` | retryable or discarded |
| Total | `total_timeout` | discarded |
| Enqueue TTL | `enqueue_ttl_expired` | discarded |
| Heartbeat | `stalled` | retryable or discarded |

Each timeout also emits a corresponding lifecycle event:

| Event | Description |
| --- | --- |
| `job.timeout` | Execution timeout fired |
| `job.stalled` | Heartbeat timeout fired |
| `job.ttl_expired` | Enqueue TTL expired |
| `job.total_timeout` | Total timeout reached |
How timeouts interact with other parts of the spec:

  • Retry: Before scheduling a retry, the backend checks if the next attempt would exceed total_timeout.
  • Workflows: If a workflow step times out, the workflow halts (chain) or records the step failure (group/batch).
  • Progress: Progress updates reset the heartbeat clock, allowing long-running jobs to avoid stall detection.
  • Graceful shutdown: On shutdown, the error type is "shutdown" rather than "timeout", distinguishing planned stops from timeouts.