Graceful Shutdown
Graceful shutdown ensures workers stop processing cleanly without losing jobs. OJS defines a three-phase shutdown protocol aligned with container orchestration platforms.
Shutdown Phases
Running → Quiet → Drain → Stop
Phase 1: Quiet
The worker stops fetching new jobs but continues processing in-flight jobs. Triggered by SIGTSTP (quiet only) or SIGTERM (begins full shutdown).
Phase 2: Drain
In-flight jobs are given time to complete. The worker continues sending heartbeats and acknowledging completed jobs.
Phase 3: Stop
After the grace period expires, remaining in-flight jobs are reported as failed with error type "shutdown". The worker deregisters and exits.
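The three phases can be read as a small state machine. The sketch below is illustrative only: `Phase`, `Lifecycle`, `may_fetch`, and `may_process` are hypothetical names, not part of OJS, and the `QUIET -> RUNNING` edge is included on the assumption that a quiet worker can resume, as the SIGCONT row under Signal Handling suggests.

```python
from enum import Enum


class Phase(Enum):
    RUNNING = "running"
    QUIET = "quiet"
    DRAIN = "drain"
    STOP = "stop"


# Allowed transitions, following Running -> Quiet -> Drain -> Stop.
# QUIET -> RUNNING models a resume from quiet mode (an assumption here).
TRANSITIONS = {
    Phase.RUNNING: {Phase.QUIET},
    Phase.QUIET: {Phase.RUNNING, Phase.DRAIN},
    Phase.DRAIN: {Phase.STOP},
    Phase.STOP: set(),
}


class Lifecycle:
    def __init__(self):
        self.phase = Phase.RUNNING

    def advance(self, target):
        """Move to target, rejecting transitions the protocol does not allow."""
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(
                f"illegal transition {self.phase.name} -> {target.name}"
            )
        self.phase = target

    @property
    def may_fetch(self):
        # Only a running worker fetches new jobs.
        return self.phase is Phase.RUNNING

    @property
    def may_process(self):
        # In-flight jobs keep running through the quiet and drain phases.
        return self.phase is not Phase.STOP
```

Keeping the transition table explicit makes it easy to reject out-of-order moves, such as jumping from running straight to stop.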
Signal Handling
| Signal | Action |
|---|---|
| SIGTERM | Begin graceful shutdown (quiet → drain → stop) |
| SIGINT | Same as SIGTERM |
| SIGTSTP | Enter quiet mode only (stop fetching, keep processing) |
| SIGCONT | Resume from quiet mode to running |
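As a rough illustration of the table above, the following sketch maps each signal to a flag change using Python's standard `signal` module (Unix only). `WorkerSignals` and its attributes are hypothetical names. Ignoring SIGCONT once shutdown has begun is an assumption; the table does not specify that case.

```python
import signal


class WorkerSignals:
    """Translate process signals into flags that a worker loop can poll."""

    def __init__(self):
        self.fetching = True        # False once the worker is quiet
        self.shutting_down = False  # True once full shutdown has begun

    def handle(self, signum, frame=None):
        if signum in (signal.SIGTERM, signal.SIGINT):
            # Full shutdown: quiet -> drain -> stop.
            self.fetching = False
            self.shutting_down = True
        elif signum == signal.SIGTSTP:
            # Quiet only: stop fetching, keep processing.
            self.fetching = False
        elif signum == signal.SIGCONT and not self.shutting_down:
            # Resume from quiet mode (assumed to be a no-op during shutdown).
            self.fetching = True

    def install(self):
        # signal.signal must be called from the main thread.
        for sig in (signal.SIGTERM, signal.SIGINT,
                    signal.SIGTSTP, signal.SIGCONT):
            signal.signal(sig, self.handle)
```

The handler only flips flags and leaves the real work to the worker loop, which keeps the signal handler short and avoids doing I/O inside it.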
Grace Period
The grace period (default: 30 seconds) sets the maximum duration of the drain phase. It MUST be configurable and SHOULD be shorter than the container’s termination grace period.
    # Worker configuration
    grace_period: 25s  # 5s less than Kubernetes default (30s)
Kubernetes Alignment
Kubernetes sends SIGTERM and waits up to terminationGracePeriodSeconds (default: 30s) before sending SIGKILL. Setting the worker grace period to 25s leaves 5 seconds for cleanup and deregistration.
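The budget described here can be checked mechanically at startup. This is a minimal sketch, not part of OJS: `parse_seconds` and `cleanup_margin` are hypothetical helpers, and the duration syntax (an integer followed by `s` or `m`) is an assumption based on the values shown in this section.

```python
import re


def parse_seconds(text):
    """Parse a duration such as '25s' or '2m' into whole seconds."""
    match = re.fullmatch(r"(\d+)(s|m)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * (60 if unit == "m" else 1)


def cleanup_margin(worker_grace, termination_grace_seconds):
    """Seconds left between the end of the drain phase and SIGKILL."""
    margin = termination_grace_seconds - parse_seconds(worker_grace)
    if margin <= 0:
        raise ValueError(
            "worker grace period must be shorter than the container's "
            "termination grace period"
        )
    return margin


# With the values used in this section: 30s termination, 25s worker grace.
margin = cleanup_margin("25s", 30)  # 5
```

Failing fast on a non-positive margin surfaces a misconfiguration before the first SIGKILL does.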
    # Kubernetes deployment
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: worker
          env:
            - name: OJS_GRACE_PERIOD
              value: "25s"
Docker Compose
    services:
      worker:
        stop_grace_period: 30s
In-Flight Job Handling
Jobs that do not complete within the grace period are handled as follows:
- The worker sends a FAIL for each incomplete job with error type "shutdown".
- These failures count as an attempt and follow the retry policy.
- The backend’s dead worker detection provides a safety net: if the worker crashes during shutdown, the heartbeat timeout recovers the jobs.
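The drain-then-fail behaviour above might look roughly like this in a thread-based worker. Everything named here is an assumption made for illustration: `drain`, the `send_fail` callback, and the `{"type": "shutdown"}` payload shape are hypothetical, since the text specifies only that the error type is "shutdown".

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def drain(in_flight, grace_seconds, send_fail):
    """Wait up to grace_seconds, then FAIL whatever is still running.

    in_flight maps job id -> Future; send_fail is a hypothetical callback
    that delivers the FAIL message to the backend.
    """
    wait(list(in_flight.values()), timeout=grace_seconds)
    failed = []
    for job_id, future in in_flight.items():
        if not future.done():
            # Assumed payload shape; only the error type comes from the text.
            send_fail(job_id, {"type": "shutdown"})
            failed.append(job_id)
    return failed


if __name__ == "__main__":
    pool = ThreadPoolExecutor(max_workers=2)
    jobs = {
        "fast": pool.submit(time.sleep, 0.01),  # finishes within the grace period
        "slow": pool.submit(time.sleep, 0.5),   # still running when it expires
    }
    drain(jobs, 0.1, lambda job_id, error: print("FAIL", job_id, error))
    pool.shutdown(wait=True)
```

Python threads cannot be interrupted from outside, so this sketch only reports the failure. Actually stopping the job body would need cooperative cancellation or process-level isolation.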
Container Integration
Kubernetes preStop Hook
For additional drain time before SIGTERM:
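    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]
This gives the load balancer time to stop routing traffic before the worker begins shutdown. Time spent in the preStop hook counts against terminationGracePeriodSeconds, so budget for it when choosing the worker grace period.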
Server Shutdown
Backend servers also support graceful shutdown:
- Stop accepting new HTTP connections
- Drain in-flight HTTP requests (with timeout)
- Stop background schedulers (cron, retry promoter, stalled reaper)
- Close backend connections (Redis, PostgreSQL, etc.)
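Assuming the steps above run in the order listed, a server could drive them with a small ordered runner. This is a sketch under stated assumptions rather than a prescribed design: `shutdown_server` and the step names are hypothetical, and continuing past a failed step (so that later resources are still released) is a choice made here, not something this section requires.

```python
def shutdown_server(steps):
    """Run (name, callable) shutdown steps in order.

    A failing step is recorded and the remaining steps still run, so a
    stuck scheduler does not prevent backend connections from closing.
    Returns a list of (name, exception) pairs for the steps that failed.
    """
    errors = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            errors.append((name, exc))
    return errors


if __name__ == "__main__":
    # Placeholder callables stand in for the real server operations.
    steps = [
        ("stop_accepting_connections", lambda: None),
        ("drain_http_requests", lambda: None),
        ("stop_background_schedulers", lambda: None),
        ("close_backend_connections", lambda: None),
    ]
    for name, exc in shutdown_server(steps):
        print(f"shutdown step {name} failed: {exc}")
```

The drain timeout itself would live inside the `drain_http_requests` callable; the runner only fixes the ordering.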
Conformance
Implementations MUST:
- Handle SIGTERM and initiate graceful shutdown
- Support a configurable grace period
- Report incomplete jobs as failed on shutdown
- Deregister the worker from the backend
- Send a final heartbeat with state: "terminated"