External Workers

Any step handler not registered as a built-in is automatically dispatched to the external worker queue. Workers can be written in any language — they just poll a REST endpoint, execute the task, and report the result back. No message broker, no sidecar, no SDK lock-in.

How It Works#

External workers follow a simple poll-execute-report loop. The engine manages the task queue internally using PostgreSQL, and workers communicate over plain HTTP.

1.Engine encounters an unknown handler name → creates a worker task in the queue with the step's params and context data.
2.Worker polls: POST /workers/tasks/poll { "handler_names": ["process_image"] }
3.Engine returns the task payload including params, context data, and metadata (instance ID, step ID, attempt number).
4.Worker executes the task. For long-running work, it sends periodic heartbeats to keep the lease alive.
5.Worker reports the result: POST /workers/tasks/{id}/complete { "output": { ... } }
6.Engine receives the output, persists it, and resumes the workflow from where it left off.

Tip

Workers are pull-based — no message broker needed. The engine uses PostgreSQL SKIP LOCKED for concurrent worker support, preventing double-claiming even under high concurrency.

Properties#

Pull-basedNo message broker needed. Workers poll at their own pace.
At-least-onceHeartbeat timeout (60s) + reaper (30s interval) ensures no stuck tasks.
Error classificationWorkers declare retryable vs permanent failures. Retryable returns the task to the queue; permanent marks the step failed.
Concurrent workersSKIP LOCKED prevents double-claiming. Scale horizontally by running more workers.
Language agnosticAny language that can make HTTP requests works. No SDK required.

Minimal Worker Example#

A worker is just a loop: poll for a task, execute it, report the result. Here is a complete minimal worker in TypeScript:

TypeScript

import { Orch8Client } from "@orch8/sdk";

const client = new Orch8Client({ baseUrl: "http://localhost:3001" });

async function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

while (true) {
  const task = await client.pollTask({
    handler_names: ["process_image"],
    worker_id: "img-worker-1",
  });

  if (!task) {
    await sleep(1000);
    continue;
  }

  try {
    const result = await processImage(task.params);
    await client.completeTask(task.id, { output: result });
  } catch (err: any) {
    await client.failTask(task.id, {
      error: err.message,
      retryable: true,
    });
  }
}

The same pattern applies in every language. Here it is in Python, Go, and as raw cURL calls:

Bash

# Poll for a task
TASK=$(curl -s -X POST http://localhost:3001/workers/tasks/poll \
  -H "Content-Type: application/json" \
  -d '{ "handler_names": ["process_image"], "worker_id": "img-worker-1" }')

TASK_ID=$(echo $TASK | jq -r '.id')

# Complete the task
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/complete" \
  -H "Content-Type: application/json" \
  -d '{ "output": { "url": "https://cdn.example.com/processed.png" } }'

# Or fail the task
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/fail" \
  -H "Content-Type: application/json" \
  -d '{ "error": "Out of memory", "retryable": true }'

Worker API#

Workers interact with four endpoints. All requests and responses use JSON. Authentication depends on your deployment — see the configuration docs for API key and tenant header setup.

Poll for tasks#

POST/workers/tasks/poll

Long-polls for an available task matching the given filters. Returns a task object if one is available, or an empty response after the poll timeout (default 30s).

Parameter	Type	Required	Description
handler_names	string[]	No	Filter by handler name. Only tasks with a matching handler are returned. Omit to accept any handler.
queue_name	string	No	Poll from a specific named queue instead of the default queue.
worker_id	string	No	Unique identifier for this worker instance. Used for observability and debugging.
batch_size	integer	No	Number of tasks to claim at once. Default: 1. Max: 50.

JSON

// Response — task claimed
{
  "id": "task_01J5K...",
  "handler_name": "process_image",
  "params": { "image_url": "https://...", "width": 800 },
  "context": { "instance_id": "inst_01J5K...", "step_id": "resize" },
  "metadata": { "attempt": 1, "created_at": "2025-01-15T10:30:00Z" }
}

Report success#

POST/workers/tasks/{id}/complete

Reports that the task completed successfully. The output is persisted as the step's result and the workflow resumes.

Parameter	Type	Required	Description
output	object	Yes	The step output. This becomes available as steps.<step_id>.output in subsequent steps.

JSON

POST /workers/tasks/task_01J5K.../complete
{
  "output": {
    "resized_url": "https://cdn.example.com/800w.png",
    "bytes": 142038
  }
}

Report failure#

POST/workers/tasks/{id}/fail

Reports that the task failed. The retryable flag controls whether the task is re-queued or permanently failed.

Parameter	Type	Required	Description
error	string	Yes	Human-readable error message. Stored in the step's error output.
retryable	boolean	No	If true, the task is returned to the queue and the step's retry policy applies. Default: false.

JSON

POST /workers/tasks/task_01J5K.../fail
{
  "error": "GPU out of memory: requested 12GB, available 8GB",
  "retryable": true
}

Send heartbeat#

POST/workers/tasks/{id}/heartbeat

Extends the task lease. Send every 15-30 seconds for long-running tasks to prevent the reaper from reclaiming them. The request body is empty.

Bash

curl -X POST http://localhost:3001/workers/tasks/task_01J5K.../heartbeat

Heartbeat & Timeout#

The engine runs a background reaper that reclaims tasks from unresponsive workers. This ensures that crashed or stuck workers do not block workflows indefinitely.

Default heartbeat timeout60 seconds. If no heartbeat is received within this window, the task is considered abandoned.
Reaper intervalRuns every 30 seconds, scanning for tasks past their heartbeat deadline.
ReclamationAbandoned tasks are returned to the queue and become available for other workers to claim.
Recommended cadenceSend heartbeats every 15-30 seconds to stay well within the 60-second timeout.

Warning

Always send heartbeats for tasks exceeding 30 seconds. Without heartbeats, the reaper may reclaim your task and hand it to another worker, causing duplicate execution.

Here is a heartbeat loop running alongside a long task:

Bash

# Start heartbeat in background
while true; do
  curl -s -X POST "http://localhost:3001/workers/tasks/$TASK_ID/heartbeat"
  sleep 20
done &
HEARTBEAT_PID=$!

# Do the work
process_video "$INPUT_FILE" "$OUTPUT_FILE"

# Stop heartbeats and report result
kill $HEARTBEAT_PID
curl -X POST "http://localhost:3001/workers/tasks/$TASK_ID/complete" \
  -H "Content-Type: application/json" \
  -d '{ "output": { "status": "done" } }'

Queue Routing#

By default, all worker tasks land in a single default queue. Named queues let you route specific step types to dedicated worker pools. Common use cases:

GPU workersRoute image/video processing to GPU-equipped machines while keeping CPU workers lean.
Region-specificRun data-residency-sensitive tasks on workers deployed in the correct region.
Priority isolationGive paying customers a dedicated queue so they are never blocked by free-tier batch jobs.
Resource sizingRoute memory-intensive tasks to high-memory workers without over-provisioning the whole fleet.

Step configuration#

Assign a step to a named queue by setting queue_name in the step definition:

JSON

{
  "type": "step",
  "id": "render_video",
  "handler": "video_renderer",
  "queue_name": "gpu-workers",
  "params": {
    "input_url": "{{context.data.video_url}}",
    "resolution": "1080p"
  }
}

Polling from a queue#

Workers that serve a specific queue pass the queue_name parameter when polling. They will only receive tasks routed to that queue.

Bash

POST /workers/tasks/poll
{
  "queue_name": "gpu-workers",
  "handler_names": ["video_renderer", "image_upscaler"],
  "worker_id": "gpu-worker-us-east-1"
}

Workers that poll without a queue_name claim from the default queue only. Tasks routed to a named queue are never visible to default-queue workers — they wait until the correct worker type polls.

Note

Queue names are arbitrary strings. The engine creates them implicitly when a step references a new queue name — no upfront configuration is needed.

Error Handling#

When a worker reports a failure, the retryable flag determines what happens next. There are two distinct paths:

Retryable errors#

When retryable: true, the task is returned to the queue. The step's retry policy controls how many times it can be retried and the backoff between attempts.

TypeScript

// Worker encounters a transient error
try {
  const result = await callExternalAPI(task.params);
  await client.completeTask(task.id, { output: result });
} catch (err: any) {
  if (isTransient(err)) {
    // Task goes back to the queue; retry policy applies
    await client.failTask(task.id, {
      error: `Transient: ${err.message}`,
      retryable: true,
    });
  } else {
    // Step fails permanently — no retry
    await client.failTask(task.id, {
      error: `Fatal: ${err.message}`,
      retryable: false,
    });
  }
}

Permanent errors#

When retryable: false (the default), the step is marked as permanently failed. The workflow instance transitions to a failed state unless an error handler or fallback is configured at the sequence or workflow level.

Note

When a worker reports retryable: true, the task is re-queued and the step's retry policy applies (max attempts, exponential backoff). When retryable: false, the step fails permanently and no further retries are attempted.

Common classification patterns#

RetryableNetwork timeouts, 429 rate limits, 503 service unavailable, temporary resource exhaustion.
Permanent400 validation errors, 404 not found, authentication failures, business logic violations.
Worker crashIf the worker crashes without reporting, the heartbeat timeout triggers and the task is automatically re-queued.

Ready to try Orch8?

One command to install. Two minutes to your first workflow.

Bash

curl -fsSL https://orch8.io/start.sh | sh

Quick Start Guide Or skip the setup with Cloud →

Previous← Template Interpolation NextConcurrency & Rate Limits →