External Workers
Any step handler not registered as a built-in is automatically dispatched to the external worker queue. Workers can be written in any language — they just poll a REST endpoint, execute the task, and report the result back. No message broker, no sidecar, no SDK lock-in.
How It Works
External workers follow a simple poll-execute-report loop. The engine manages the task queue internally using PostgreSQL, and workers communicate over plain HTTP.
1. The engine encounters an unknown handler name and creates a worker task in the queue with the step's params and context data.
2. The worker polls:
POST /workers/tasks/poll { "handler_names": ["process_image"] }
3. The engine returns the task payload, including params, context data, and metadata (instance ID, step ID, attempt number).
4. The worker executes the task. For long-running work, it sends periodic heartbeats to keep the lease alive.
5. The worker reports the result:
POST /workers/tasks/{id}/complete { "output": { ... } }
6. The engine receives the output, persists it, and resumes the workflow from where it left off.
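The steps above can be sketched without an SDK as two plain request builders. This is a minimal illustration, not engine or SDK code: the endpoint paths come from the steps above, while the `HttpRequest` type and the `post`, `pollRequest`, and `completeRequest` helpers are hypothetical names introduced here.

```typescript
// Sketch only: builds the HTTP requests for the poll and report steps.
// Endpoint paths come from the steps above; all helper names are hypothetical.

interface HttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Shared shape: every worker call is a JSON POST.
function post(baseUrl: string, path: string, body: unknown): HttpRequest {
  return {
    url: `${baseUrl}${path}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Step 2: ask the engine for a task matching one of the handler names.
function pollRequest(baseUrl: string, handlerNames: string[]): HttpRequest {
  return post(baseUrl, "/workers/tasks/poll", { handler_names: handlerNames });
}

// Step 5: report the output of a finished task.
function completeRequest(baseUrl: string, taskId: string, output: object): HttpRequest {
  return post(baseUrl, `/workers/tasks/${encodeURIComponent(taskId)}/complete`, { output });
}

const pollReq = pollRequest("http://localhost:3001", ["process_image"]);
const completeReq = completeRequest("http://localhost:3001", "task_123", { ok: true });
console.log(pollReq.url, pollReq.init.body);
console.log(completeReq.url, completeReq.init.body);
```

Each request can then be sent with any HTTP client, for example `fetch(pollReq.url, pollReq.init)`. Keeping request construction separate from sending makes the worker easy to test without a running engine.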
Task claiming uses PostgreSQL's SKIP LOCKED for concurrent worker support, preventing double-claiming even under high concurrency.
Properties
- Pull-based: No message broker needed. Workers poll at their own pace.
- At-least-once: Heartbeat timeout (60s) plus reaper (30s interval) ensures no stuck tasks.
- Error classification: Workers declare retryable vs permanent failures. Retryable returns the task to the queue; permanent marks the step failed.
- Concurrent workers: SKIP LOCKED prevents double-claiming. Scale horizontally by running more workers.
- Language agnostic: Any language that can make HTTP requests works. No SDK required.
Minimal Worker Example
A worker is just a loop: poll for a task, execute it, report the result. Here is a complete minimal worker in TypeScript:
import { Orch8Client } from "@orch8/sdk";
const client = new Orch8Client({ baseUrl: "http://localhost:3001" });
async function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
while (true) {
  const task = await client.pollTask({
    handler_names: ["process_image"],
    worker_id: "img-worker-1",
  });
  // No task available: wait briefly before polling again
  if (!task) {
    await sleep(1000);
    continue;
  }
  try {
    // processImage stands in for your own task logic
    const result = await processImage(task.params);
    await client.completeTask(task.id, { output: result });
  } catch (err: any) {
    await client.failTask(task.id, {
      error: err.message,
      retryable: true,
    });
  }
}
The same pattern applies in every language. Here it is as raw cURL calls:
# Poll for a task
TASK=$(curl -s -X POST http://localhost:3001/workers/tasks/poll \
  -H "Content-Type: application/json" \
  -d '{ "handler_names": ["process_image"], "worker_id": "img-worker-1" }')
# Quote the variable so the JSON reaches jq unmodified
TASK_ID=$(echo "$TASK" | jq -r '.id')
# Complete the task
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/complete" \
  -H "Content-Type: application/json" \
  -d '{ "output": { "url": "https://cdn.example.com/processed.png" } }'
# Or fail the task
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/fail" \
  -H "Content-Type: application/json" \
  -d '{ "error": "Out of memory", "retryable": true }'
Worker API
Workers interact with four endpoints. All requests and responses use JSON. Authentication depends on your deployment — see the configuration docs for API key and tenant header setup.
Poll for tasks
Long-polls for an available task matching the given filters. Returns a task object if one is available, or an empty response after the poll timeout (default 30s).
| Parameter | Type | Required | Description |
|---|---|---|---|
| handler_names | string[] | No | Filter by handler name. Only tasks with a matching handler are returned. Omit to accept any handler. |
| queue_name | string | No | Poll from a specific named queue instead of the default queue. |
| worker_id | string | No | Unique identifier for this worker instance. Used for observability and debugging. |
| batch_size | integer | No | Number of tasks to claim at once. Default: 1. Max: 50. |
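The parameters in the table can be captured in a small typed helper that rejects an out-of-range `batch_size` before the request is sent. This is a sketch under assumptions: the field names mirror the table, but the `PollRequest` type and `buildPollRequest` function are hypothetical, and the lower bound of 1 is assumed from the documented default rather than stated by the API.

```typescript
// Sketch only: field names mirror the parameter table above;
// the PollRequest type and buildPollRequest helper are hypothetical.

interface PollRequest {
  handler_names?: string[];
  queue_name?: string;
  worker_id?: string;
  batch_size?: number;
}

const MAX_BATCH_SIZE = 50; // documented maximum

function buildPollRequest(options: PollRequest): PollRequest {
  const batchSize = options.batch_size;
  if (batchSize !== undefined) {
    // Assumption: the smallest valid batch is 1 (the documented default).
    if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
      throw new RangeError(`batch_size must be an integer from 1 to ${MAX_BATCH_SIZE}`);
    }
  }
  // A JSON round trip drops undefined fields, so omitted options
  // fall back to the engine defaults.
  return JSON.parse(JSON.stringify(options)) as PollRequest;
}

const pollBody = buildPollRequest({
  handler_names: ["process_image"],
  worker_id: "img-worker-1",
  batch_size: 10,
});
console.log(JSON.stringify(pollBody));
```

Validating on the worker side gives a clear local error instead of relying on however the engine responds to an invalid value.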
// Response: task claimed
{
  "id": "task_01J5K...",
  "handler_name": "process_image",
  "params": { "image_url": "https://...", "width": 800 },
  "context": { "instance_id": "inst_01J5K...", "step_id": "resize" },
  "metadata": { "attempt": 1, "created_at": "2025-01-15T10:30:00Z" }
}
Report success
Reports that the task completed successfully. The output is persisted as the step's result and the workflow resumes.
| Parameter | Type | Required | Description |
|---|---|---|---|
| output | object | Yes | The step output. This becomes available as steps.<step_id>.output in subsequent steps. |
POST /workers/tasks/task_01J5K.../complete
{
  "output": {
    "resized_url": "https://cdn.example.com/800w.png",
    "bytes": 142038
  }
}
Report failure
Reports that the task failed. The retryable flag controls whether the task is re-queued or permanently failed.
| Parameter | Type | Required | Description |
|---|---|---|---|
| error | string | Yes | Human-readable error message. Stored in the step's error output. |
| retryable | boolean | No | If true, the task is returned to the queue and the step's retry policy applies. Default: false. |
POST /workers/tasks/task_01J5K.../fail
{
  "error": "GPU out of memory: requested 12GB, available 8GB",
  "retryable": true
}
Send heartbeat
Extends the task lease. Send every 15-30 seconds for long-running tasks to prevent the reaper from reclaiming them. The request body is empty.
curl -X POST http://localhost:3001/workers/tasks/task_01J5K.../heartbeat
Heartbeat & Timeout
The engine runs a background reaper that reclaims tasks from unresponsive workers. This ensures that crashed or stuck workers do not block workflows indefinitely.
- Default heartbeat timeout: 60 seconds. If no heartbeat is received within this window, the task is considered abandoned.
- Reaper interval: Runs every 30 seconds, scanning for tasks past their heartbeat deadline.
- Reclamation: Abandoned tasks are returned to the queue and become available for other workers to claim.
- Recommended cadence: Send heartbeats every 15-30 seconds to stay well within the 60-second timeout.
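The numbers above imply two useful bounds, worked through in the sketch below. It assumes the reaper reclaims a task on its first scan after the heartbeat deadline has passed, and that a heartbeat must arrive strictly before the deadline to count; neither detail is confirmed by the engine documentation, and both helper names are hypothetical.

```typescript
// Sketch only: arithmetic on the documented defaults, under the assumptions
// stated in the text above. Helper names are hypothetical.

const HEARTBEAT_TIMEOUT_MS = 60 * 1000; // documented default
const REAPER_INTERVAL_MS = 30 * 1000;   // documented interval

// Longest gap between a worker's last heartbeat and the task being re-queued:
// if the deadline passes just after a scan, reclamation waits one more interval.
function worstCaseReclaimMs(timeoutMs: number, reaperIntervalMs: number): number {
  return timeoutMs + reaperIntervalMs;
}

// Consecutive heartbeats that can be lost before the lease expires.
// After k lost beats, beat k+1 arrives at (k + 1) * cadence and must be < timeout.
function toleratedMissedBeats(cadenceMs: number, timeoutMs: number): number {
  return Math.max(0, Math.ceil(timeoutMs / cadenceMs) - 2);
}

console.log(worstCaseReclaimMs(HEARTBEAT_TIMEOUT_MS, REAPER_INTERVAL_MS));
for (const cadenceSeconds of [15, 20, 30]) {
  console.log(cadenceSeconds, toleratedMissedBeats(cadenceSeconds * 1000, HEARTBEAT_TIMEOUT_MS));
}
```

Under these assumptions, a 30-second cadence leaves no room for a single lost heartbeat, which is one reason to prefer the lower end of the recommended range on unreliable networks.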
Here is a heartbeat loop running alongside a long task:
# Start heartbeat in background
while true; do
  curl -s -X POST "http://localhost:3001/workers/tasks/$TASK_ID/heartbeat"
  sleep 20
done &
HEARTBEAT_PID=$!
# Do the work
process_video "$INPUT_FILE" "$OUTPUT_FILE"
# Stop heartbeats and report result
kill "$HEARTBEAT_PID"
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/complete" \
  -H "Content-Type: application/json" \
  -d '{ "output": { "status": "done" } }'
Queue Routing
By default, all worker tasks land in a single default queue. Named queues let you route specific step types to dedicated worker pools. Common use cases:
- GPU workers: Route image/video processing to GPU-equipped machines while keeping CPU workers lean.
- Region-specific: Run data-residency-sensitive tasks on workers deployed in the correct region.
- Priority isolation: Give paying customers a dedicated queue so they are never blocked by free-tier batch jobs.
- Resource sizing: Route memory-intensive tasks to high-memory workers without over-provisioning the whole fleet.
Step configuration
Assign a step to a named queue by setting queue_name in the step definition:
{
  "type": "step",
  "id": "render_video",
  "handler": "video_renderer",
  "queue_name": "gpu-workers",
  "params": {
    "input_url": "{{context.data.video_url}}",
    "resolution": "1080p"
  }
}
Polling from a queue
Workers that serve a specific queue pass the queue_name parameter when polling. They will only receive tasks routed to that queue.
POST /workers/tasks/poll
{
  "queue_name": "gpu-workers",
  "handler_names": ["video_renderer", "image_upscaler"],
  "worker_id": "gpu-worker-us-east-1"
}
Workers that poll without a queue_name claim from the default queue only. Tasks routed to a named queue are never visible to default-queue workers; they wait until a worker polling that queue claims them.
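The routing rules described above can be modeled as a single visibility check. This is an in-memory illustration only, not how the engine implements routing: the `QueuedTask` and `PollFilter` types and the `isVisible` helper are hypothetical, and the default queue is represented here by an omitted `queue_name`.

```typescript
// Sketch only: an in-memory model of the routing rules described above.
// The types and the isVisible helper are hypothetical, not engine code.

interface QueuedTask {
  handler_name: string;
  queue_name?: string; // omitted = default queue (a modeling assumption)
}

interface PollFilter {
  handler_names?: string[];
  queue_name?: string;
}

function isVisible(task: QueuedTask, poll: PollFilter): boolean {
  // A poll sees exactly one queue: the named one, or the default when omitted.
  if (task.queue_name !== poll.queue_name) return false;
  // Omitting handler_names accepts any handler.
  if (poll.handler_names === undefined) return true;
  return poll.handler_names.indexOf(task.handler_name) !== -1;
}

const gpuTask: QueuedTask = { handler_name: "video_renderer", queue_name: "gpu-workers" };
const gpuPoll: PollFilter = {
  queue_name: "gpu-workers",
  handler_names: ["video_renderer", "image_upscaler"],
};
const defaultPoll: PollFilter = { handler_names: ["video_renderer"] };

console.log(isVisible(gpuTask, gpuPoll));     // the GPU worker can claim the task
console.log(isVisible(gpuTask, defaultPoll)); // a default-queue worker cannot
```

The queue check comes first because it is the hard boundary: a matching handler name never makes a task visible across queues.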
Error Handling
When a worker reports a failure, the retryable flag determines what happens next. There are two distinct paths:
Retryable errors
When retryable: true, the task is returned to the queue. The step's retry policy controls how many times it can be retried and the backoff between attempts.
// Worker encounters a transient error
try {
  const result = await callExternalAPI(task.params);
  await client.completeTask(task.id, { output: result });
} catch (err: any) {
  if (isTransient(err)) {
    // Task goes back to the queue; retry policy applies
    await client.failTask(task.id, {
      error: `Transient: ${err.message}`,
      retryable: true,
    });
  } else {
    // Step fails permanently; no retry
    await client.failTask(task.id, {
      error: `Fatal: ${err.message}`,
      retryable: false,
    });
  }
}
Permanent errors
When retryable: false (the default), the step is marked as permanently failed. The workflow instance transitions to a failed state unless an error handler or fallback is configured at the sequence or workflow level.
When retryable: true, the task is re-queued and the step's retry policy applies (max attempts, exponential backoff). When retryable: false, the step fails permanently and no further retries are attempted.
Common classification patterns
- Retryable: Network timeouts, 429 rate limits, 503 service unavailable, temporary resource exhaustion.
- Permanent: 400 validation errors, 404 not found, authentication failures, business logic violations.
- Worker crash: If the worker crashes without reporting, the heartbeat timeout triggers and the task is automatically re-queued.
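One way to encode these patterns is a small classifier that produces the body for the fail endpoint. This is a sketch, not the SDK's behavior: the `WorkerError` shape, the `timedOut` flag, and the `isTransient` and `toFailBody` helpers are hypothetical, and the status list covers only the codes named above.

```typescript
// Sketch only: one possible classifier following the patterns above.
// The WorkerError shape and both helpers are illustrative assumptions.

interface WorkerError {
  message: string;
  status?: number;    // HTTP status from a downstream call, if any
  timedOut?: boolean; // set by the caller when a network timeout occurred
}

const RETRYABLE_STATUSES = [429, 503];

function isTransient(err: WorkerError): boolean {
  if (err.timedOut === true) return true; // network timeout
  return err.status !== undefined && RETRYABLE_STATUSES.indexOf(err.status) !== -1;
}

// Builds the JSON body for POST /workers/tasks/{id}/fail.
// Anything not recognized as transient is reported as permanent,
// matching the documented default of retryable: false.
function toFailBody(err: WorkerError): { error: string; retryable: boolean } {
  return { error: err.message, retryable: isTransient(err) };
}

console.log(toFailBody({ message: "rate limited", status: 429 }));
console.log(toFailBody({ message: "not found", status: 404 }));
```

Defaulting unknown errors to permanent avoids retrying failures that will never succeed; a worker that prefers the opposite trade-off can invert the check.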