Skip to content

External Workers

Any step handler not registered as a built-in is automatically dispatched to the external worker queue. Workers can be written in any language — they just poll a REST endpoint, execute the task, and report the result back. No message broker, no sidecar, no SDK lock-in.

How It Works#

External workers follow a simple poll-execute-report loop. The engine manages the task queue internally using PostgreSQL, and workers communicate over plain HTTP.

  • 1.Engine encounters an unknown handler name → creates a worker task in the queue with the step's params and context data.
  • 2.Worker polls: POST /workers/tasks/poll { "handler_names": ["process_image"] }
  • 3.Engine returns the task payload including params, context data, and metadata (instance ID, step ID, attempt number).
  • 4.Worker executes the task. For long-running work, it sends periodic heartbeats to keep the lease alive.
  • 5.Worker reports the result: POST /workers/tasks/{id}/complete { "output": { ... } }
  • 6.Engine receives the output, persists it, and resumes the workflow from where it left off.
Tip
Workers are pull-based — no message broker needed. The engine uses PostgreSQL SKIP LOCKED for concurrent worker support, preventing double-claiming even under high concurrency.

Properties#

  • Pull-basedNo message broker needed. Workers poll at their own pace.
  • At-least-onceHeartbeat timeout (60s) + reaper (30s interval) ensures no stuck tasks.
  • Error classificationWorkers declare retryable vs permanent failures. Retryable returns the task to the queue; permanent marks the step failed.
  • Concurrent workersSKIP LOCKED prevents double-claiming. Scale horizontally by running more workers.
  • Language agnosticAny language that can make HTTP requests works. No SDK required.

Minimal Worker Example#

A worker is just a loop: poll for a task, execute it, report the result. Here is a complete minimal worker in TypeScript:

TypeScript
import { Orch8Client } from "@orch8/sdk";

const client = new Orch8Client({ baseUrl: "http://localhost:3001" });

async function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

while (true) {
  const task = await client.pollTask({
    handler_names: ["process_image"],
    worker_id: "img-worker-1",
  });

  if (!task) {
    await sleep(1000);
    continue;
  }

  try {
    const result = await processImage(task.params);
    await client.completeTask(task.id, { output: result });
  } catch (err: any) {
    await client.failTask(task.id, {
      error: err.message,
      retryable: true,
    });
  }
}

The same pattern applies in every language. Here it is in Python, Go, and as raw cURL calls:

Bash
# Poll for a task
TASK=$(curl -s -X POST http://localhost:3001/workers/tasks/poll \
  -H "Content-Type: application/json" \
  -d '{ "handler_names": ["process_image"], "worker_id": "img-worker-1" }')

TASK_ID=$(echo $TASK | jq -r '.id')

# Complete the task
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/complete" \
  -H "Content-Type: application/json" \
  -d '{ "output": { "url": "https://cdn.example.com/processed.png" } }'

# Or fail the task
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/fail" \
  -H "Content-Type: application/json" \
  -d '{ "error": "Out of memory", "retryable": true }'

Worker API#

Workers interact with four endpoints. All requests and responses use JSON. Authentication depends on your deployment — see the configuration docs for API key and tenant header setup.

Poll for tasks#

POST/workers/tasks/poll

Long-polls for an available task matching the given filters. Returns a task object if one is available, or an empty response after the poll timeout (default 30s).

ParameterTypeRequiredDescription
handler_namesstring[]NoFilter by handler name. Only tasks with a matching handler are returned. Omit to accept any handler.
queue_namestringNoPoll from a specific named queue instead of the default queue.
worker_idstringNoUnique identifier for this worker instance. Used for observability and debugging.
batch_sizeintegerNoNumber of tasks to claim at once. Default: 1. Max: 50.
JSON
// Response  task claimed
{
  "id": "task_01J5K...",
  "handler_name": "process_image",
  "params": { "image_url": "https://...", "width": 800 },
  "context": { "instance_id": "inst_01J5K...", "step_id": "resize" },
  "metadata": { "attempt": 1, "created_at": "2025-01-15T10:30:00Z" }
}

Report success#

POST/workers/tasks/{id}/complete

Reports that the task completed successfully. The output is persisted as the step's result and the workflow resumes.

ParameterTypeRequiredDescription
outputobjectYesThe step output. This becomes available as steps.<step_id>.output in subsequent steps.
JSON
POST /workers/tasks/task_01J5K.../complete
{
  "output": {
    "resized_url": "https://cdn.example.com/800w.png",
    "bytes": 142038
  }
}

Report failure#

POST/workers/tasks/{id}/fail

Reports that the task failed. The retryable flag controls whether the task is re-queued or permanently failed.

ParameterTypeRequiredDescription
errorstringYesHuman-readable error message. Stored in the step's error output.
retryablebooleanNoIf true, the task is returned to the queue and the step's retry policy applies. Default: false.
JSON
POST /workers/tasks/task_01J5K.../fail
{
  "error": "GPU out of memory: requested 12GB, available 8GB",
  "retryable": true
}

Send heartbeat#

POST/workers/tasks/{id}/heartbeat

Extends the task lease. Send every 15-30 seconds for long-running tasks to prevent the reaper from reclaiming them. The request body is empty.

Bash
curl -X POST http://localhost:3001/workers/tasks/task_01J5K.../heartbeat

Heartbeat & Timeout#

The engine runs a background reaper that reclaims tasks from unresponsive workers. This ensures that crashed or stuck workers do not block workflows indefinitely.

  • Default heartbeat timeout60 seconds. If no heartbeat is received within this window, the task is considered abandoned.
  • Reaper intervalRuns every 30 seconds, scanning for tasks past their heartbeat deadline.
  • ReclamationAbandoned tasks are returned to the queue and become available for other workers to claim.
  • Recommended cadenceSend heartbeats every 15-30 seconds to stay well within the 60-second timeout.
Warning
Always send heartbeats for tasks exceeding 30 seconds. Without heartbeats, the reaper may reclaim your task and hand it to another worker, causing duplicate execution.

Here is a heartbeat loop running alongside a long task:

Bash
# Start heartbeat in background
while true; do
  curl -s -X POST "http://localhost:3001/workers/tasks/$TASK_ID/heartbeat"
  sleep 20
done &
HEARTBEAT_PID=$!

# Do the work
process_video "$INPUT_FILE" "$OUTPUT_FILE"

# Stop heartbeats and report result
kill $HEARTBEAT_PID
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/complete" \
  -H "Content-Type: application/json" \
  -d '{ "output": { "status": "done" } }'

Queue Routing#

By default, all worker tasks land in a single default queue. Named queues let you route specific step types to dedicated worker pools. Common use cases:

  • GPU workersRoute image/video processing to GPU-equipped machines while keeping CPU workers lean.
  • Region-specificRun data-residency-sensitive tasks on workers deployed in the correct region.
  • Priority isolationGive paying customers a dedicated queue so they are never blocked by free-tier batch jobs.
  • Resource sizingRoute memory-intensive tasks to high-memory workers without over-provisioning the whole fleet.

Step configuration#

Assign a step to a named queue by setting queue_name in the step definition:

JSON
{
  "type": "step",
  "id": "render_video",
  "handler": "video_renderer",
  "queue_name": "gpu-workers",
  "params": {
    "input_url": "{{context.data.video_url}}",
    "resolution": "1080p"
  }
}

Polling from a queue#

Workers that serve a specific queue pass the queue_name parameter when polling. They will only receive tasks routed to that queue.

Bash
POST /workers/tasks/poll
{
  "queue_name": "gpu-workers",
  "handler_names": ["video_renderer", "image_upscaler"],
  "worker_id": "gpu-worker-us-east-1"
}

Workers that poll without a queue_name claim from the default queue only. Tasks routed to a named queue are never visible to default-queue workers — they wait until the correct worker type polls.

Note
Queue names are arbitrary strings. The engine creates them implicitly when a step references a new queue name — no upfront configuration is needed.

Error Handling#

When a worker reports a failure, the retryable flag determines what happens next. There are two distinct paths:

Retryable errors#

When retryable: true, the task is returned to the queue. The step's retry policy controls how many times it can be retried and the backoff between attempts.

TypeScript
// Worker encounters a transient error
try {
  const result = await callExternalAPI(task.params);
  await client.completeTask(task.id, { output: result });
} catch (err: any) {
  if (isTransient(err)) {
    // Task goes back to the queue; retry policy applies
    await client.failTask(task.id, {
      error: `Transient: ${err.message}`,
      retryable: true,
    });
  } else {
    // Step fails permanently — no retry
    await client.failTask(task.id, {
      error: `Fatal: ${err.message}`,
      retryable: false,
    });
  }
}

Permanent errors#

When retryable: false (the default), the step is marked as permanently failed. The workflow instance transitions to a failed state unless an error handler or fallback is configured at the sequence or workflow level.

Note
When a worker reports retryable: true, the task is re-queued and the step's retry policy applies (max attempts, exponential backoff). When retryable: false, the step fails permanently and no further retries are attempted.

Common classification patterns#

  • RetryableNetwork timeouts, 429 rate limits, 503 service unavailable, temporary resource exhaustion.
  • Permanent400 validation errors, 404 not found, authentication failures, business logic violations.
  • Worker crashIf the worker crashes without reporting, the heartbeat timeout triggers and the task is automatically re-queued.

Ready to try Orch8?

One command to install. Two minutes to your first workflow.

Bash
curl -fsSL https://orch8.io/start.sh | sh