[wip] universal workers

2026-03-21 07:32:11 -05:00
parent 0782675a2b
commit 8ba7e3bb84
59 changed files with 4971 additions and 34 deletions
--- a/docs/QUICKREF-agent-workers.md
+++ b/docs/QUICKREF-agent-workers.md
@@ -0,0 +1,219 @@
+# Quick Reference: Agent-Based Workers
+
+> **TL;DR**: Inject the `attune-agent` binary into _any_ container image to turn it into an Attune worker. No Dockerfiles. No Rust compilation. ~12 lines of YAML.
+
+## How It Works
+
+1. The `init-agent` service (in `docker-compose.yaml`) builds the statically-linked `attune-agent` binary and copies it into the `agent_bin` volume
+2. Your worker service mounts `agent_bin` read-only and uses the agent as its entrypoint
+3. On startup, the agent auto-detects available runtimes (Python, Ruby, Node.js, Shell, etc.)
+4. The worker registers with Attune and starts processing executions
+
+## Quick Start
+
+### Option A: Use the override file
+
+```bash
+# Start all services including the example Ruby agent worker
+docker compose -f docker-compose.yaml -f docker-compose.agent.yaml up -d
+```
+
+The `docker-compose.agent.yaml` file includes a ready-to-use Ruby worker and commented-out templates for Python 3.12, GPU, and custom images.
+
+### Option B: Add to docker-compose.override.yaml
+
+Create a `docker-compose.override.yaml` in the project root:
+
+```yaml
+services:
+  worker-my-runtime:
+    image: my-org/my-custom-image:latest
+    container_name: attune-worker-my-runtime
+    depends_on:
+      init-agent:
+        condition: service_completed_successfully
+      init-packs:
+        condition: service_completed_successfully
+      migrations:
+        condition: service_completed_successfully
+      postgres:
+        condition: service_healthy
+      rabbitmq:
+        condition: service_healthy
+    entrypoint: ["/opt/attune/agent/attune-agent"]
+    stop_grace_period: 45s
+    environment:
+      RUST_LOG: info
+      ATTUNE_CONFIG: /opt/attune/config/config.yaml
+      ATTUNE_WORKER_NAME: worker-my-runtime-01
+      ATTUNE_WORKER_TYPE: container
+      ATTUNE__SECURITY__JWT_SECRET: ${JWT_SECRET:-docker-dev-secret-change-in-production}
+      ATTUNE__SECURITY__ENCRYPTION_KEY: ${ENCRYPTION_KEY:-docker-dev-encryption-key-please-change-in-production-32plus}
+      ATTUNE__DATABASE__URL: postgresql://attune:attune@postgres:5432/attune
+      ATTUNE__MESSAGE_QUEUE__URL: amqp://attune:attune@rabbitmq:5672
+      ATTUNE_API_URL: http://attune-api:8080
+    volumes:
+      - agent_bin:/opt/attune/agent:ro
+      - ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
+      - packs_data:/opt/attune/packs:ro
+      - runtime_envs:/opt/attune/runtime_envs
+      - artifacts_data:/opt/attune/artifacts
+    healthcheck:
+      test: ["CMD-SHELL", "pgrep -f attune-agent || exit 1"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 20s
+    networks:
+      - attune-network
+    restart: unless-stopped
+```
+
+Then run:
+
+```bash
+docker compose up -d
+```
+
+Docker Compose automatically merges `docker-compose.override.yaml`.
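+
+To confirm the override was picked up before starting anything, the merged configuration can be listed. This is a small sketch that assumes the service is named `worker-my-runtime`, as in the example above:
+
+```bash
+# List the services in the merged configuration; the new worker should appear
+docker compose config --services | grep worker-my-runtime
+```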
+
+## Required Volumes
+
+Every agent worker needs these volumes:
+
+| Volume | Mount Path | Mode | Purpose |
+|--------|-----------|------|---------|
+| `agent_bin` | `/opt/attune/agent` | `ro` | The statically-linked agent binary |
+| `packs_data` | `/opt/attune/packs` | `ro` | Pack files (actions, workflows, etc.) |
+| `runtime_envs` | `/opt/attune/runtime_envs` | `rw` | Isolated runtime environments (venvs, node_modules) |
+| `artifacts_data` | `/opt/attune/artifacts` | `rw` | File-backed artifact storage |
+| Config YAML | `/opt/attune/config/config.yaml` | `ro` | Attune configuration |
+
+## Required Environment Variables
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `ATTUNE_CONFIG` | Path to config file inside container | `/opt/attune/config/config.yaml` |
+| `ATTUNE_WORKER_NAME` | Unique worker name | `worker-ruby-01` |
+| `ATTUNE_WORKER_TYPE` | Worker type | `container` |
+| `ATTUNE__DATABASE__URL` | PostgreSQL connection string | `postgresql://attune:attune@postgres:5432/attune` |
+| `ATTUNE__MESSAGE_QUEUE__URL` | RabbitMQ connection string | `amqp://attune:attune@rabbitmq:5672` |
+| `ATTUNE__SECURITY__JWT_SECRET` | JWT signing secret | (use env var) |
+| `ATTUNE__SECURITY__ENCRYPTION_KEY` | Encryption key for secrets | (use env var) |
+
+## Optional Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `ATTUNE_WORKER_RUNTIMES` | Override auto-detection | Auto-detected |
+| `ATTUNE_API_URL` | API URL for token generation | `http://attune-api:8080` |
+| `RUST_LOG` | Log level | `info` |
+
+## Runtime Auto-Detection
+
+The agent probes for these runtimes automatically:
+
+| Runtime | Probed Binaries |
+|---------|----------------|
+| Shell | `bash`, `sh` |
+| Python | `python3`, `python` |
+| Node.js | `node`, `nodejs` |
+| Ruby | `ruby` |
+| Go | `go` |
+| Java | `java` |
+| R | `Rscript` |
+| Perl | `perl` |
+
+To override, set `ATTUNE_WORKER_RUNTIMES`:
+
+```yaml
+environment:
+  ATTUNE_WORKER_RUNTIMES: python,shell  # Only advertise Python and Shell
+```
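+
+To preview roughly what auto-detection might report for an image, the probe table above can be approximated with a plain shell loop. This is only a sketch: the agent's actual probe logic is not shown in this document, and the function name, runtime labels, and output format here are illustrative assumptions.
+
+```bash
+# Approximation of the probe table: report a runtime when any of its
+# listed binaries is found on PATH. Not the agent's own implementation.
+probe_runtimes() {
+  for entry in "shell:bash sh" "python:python3 python" "node:node nodejs" \
+               "ruby:ruby" "go:go" "java:java" "r:Rscript" "perl:perl"; do
+    name=${entry%%:*}   # text before the first colon
+    bins=${entry#*:}    # space-separated candidate binaries
+    for bin in $bins; do
+      if command -v "$bin" >/dev/null 2>&1; then
+        echo "$name ($bin)"
+        break           # first match wins for this runtime
+      fi
+    done
+  done
+}
+
+probe_runtimes
+```
+
+Running this inside a candidate image gives a quick hint about which runtimes are on `$PATH`; the agent's own `--detect-only` mode remains the authoritative check.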
+
+## Testing Detection
+
+Run the agent in detect-only mode to see what it finds:
+
+```bash
+# In a running container
+docker exec <container> /opt/attune/agent/attune-agent --detect-only
+
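+# Or start a throwaway container (Compose may prefix the volume name with the
+# project name, e.g. <project>_agent_bin; check `docker volume ls`)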
+docker run --rm -v agent_bin:/opt/attune/agent:ro ruby:3.3-slim /opt/attune/agent/attune-agent --detect-only
+```
+
+## Examples
+
+### Ruby Worker
+```yaml
+worker-ruby:
+  image: ruby:3.3-slim
+  entrypoint: ["/opt/attune/agent/attune-agent"]
+  # ... (standard depends_on, volumes, env, networks)
+```
+
+### Node.js 22 Worker
+```yaml
+worker-node22:
+  image: node:22-slim
+  entrypoint: ["/opt/attune/agent/attune-agent"]
+  # ...
+```
+
+### GPU Worker (NVIDIA CUDA)
+```yaml
+worker-gpu:
+  image: nvidia/cuda:12.3.1-runtime-ubuntu22.04
+  runtime: nvidia
+  entrypoint: ["/opt/attune/agent/attune-agent"]
+  environment:
+    ATTUNE_WORKER_RUNTIMES: python,shell  # Override — CUDA image has python
+  # ...
+```
+
+### Multi-Runtime Custom Image
+```yaml
+worker-data-science:
+  image: my-org/data-science:latest  # Has Python, R, and Julia
+  entrypoint: ["/opt/attune/agent/attune-agent"]
+  # Agent auto-detects the runtimes in the probe table (here Python and R);
+  # Julia is not in that table
+  # ...
+```
+
+## Comparison: Traditional vs Agent Workers
+
+| Aspect | Traditional Worker | Agent Worker |
+|--------|-------------------|--------------|
+| Docker build | Required (5+ min) | None |
+| Dockerfile | Custom per runtime | Not needed |
+| Base image | `debian:bookworm-slim` | Any image |
+| Runtime install | Via apt/NodeSource | Pre-installed in image |
+| Configuration | Manual `ATTUNE_WORKER_RUNTIMES` | Auto-detected |
+| Binary | Compiled into image | Injected via volume |
+| Update cycle | Rebuild image | Restart `init-agent` |
+
+## Troubleshooting
+
+### Agent binary not found
+```
+exec /opt/attune/agent/attune-agent: no such file or directory
+```
+The `init-agent` service hasn't completed. Check:
+```bash
+docker compose logs init-agent
+```
+
+### "No runtimes detected"
+The container image doesn't have any recognized interpreters in `$PATH`. Either:
+- Use an image that includes your runtime (e.g., `ruby:3.3-slim`)
+- Set `ATTUNE_WORKER_RUNTIMES` manually
+
+### Connection refused to PostgreSQL/RabbitMQ
+Ensure your `depends_on` conditions include `postgres` and `rabbitmq` health checks, and that the container is on the `attune-network`.
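+
+Both conditions can be checked from the host. The commands below are a sketch that assumes the container name `attune-worker-my-runtime` from the Quick Start example; substitute your own:
+
+```bash
+# Health status of the dependencies as Compose sees them
+docker compose ps postgres rabbitmq
+
+# Networks the worker container is attached to
+docker inspect --format '{{json .NetworkSettings.Networks}}' attune-worker-my-runtime
+```
+
+The network may be listed with a Compose project prefix rather than as plain `attune-network`.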
+
+## See Also
+
+- [Universal Worker Agent Plan](plans/universal-worker-agent.md) — Full architecture document
+- [Docker Deployment](docker-deployment.md) — General Docker setup
+- [Worker Service](architecture/worker-service.md) — Worker architecture details
--- a/docs/QUICKREF-kubernetes-agent-workers.md
+++ b/docs/QUICKREF-kubernetes-agent-workers.md
@@ -0,0 +1,146 @@
+# Quick Reference: Kubernetes Agent Workers
+
+Agent-based workers let you run Attune actions inside **any container image** by injecting a statically-linked `attune-agent` binary via a Kubernetes init container. No custom Dockerfile required — just point at an image that has your runtime installed.
+
+## How It Works
+
+1. An **init container** (`agent-loader`) copies the `attune-agent` binary from the `attune-agent` image into an `emptyDir` volume
+2. The **worker container** uses your chosen image (e.g., `ruby:3.3`) and runs the agent binary as its entrypoint
+3. The agent **auto-detects** available runtimes (python, ruby, node, shell, etc.) and registers with Attune
+4. Actions targeting those runtimes are routed to the agent worker via RabbitMQ
+
+## Helm Values
+
+Add entries to `agentWorkers` in your `values.yaml`:
+
+```yaml
+agentWorkers:
+  - name: ruby
+    image: ruby:3.3
+    replicas: 2
+
+  - name: python-gpu
+    image: nvidia/cuda:12.3.1-runtime-ubuntu22.04
+    replicas: 1
+    runtimes: [python, shell]
+    runtimeClassName: nvidia
+    nodeSelector:
+      gpu: "true"
+    tolerations:
+      - key: nvidia.com/gpu
+        operator: Exists
+        effect: NoSchedule
+    resources:
+      limits:
+        nvidia.com/gpu: 1
+
+  - name: custom
+    image: my-org/my-custom-image:latest
+    replicas: 1
+    env:
+      - name: MY_CUSTOM_VAR
+        value: my-value
+```
+
+### Supported Fields
+
+| Field | Required | Default | Description |
+|-------|----------|---------|-------------|
+| `name` | Yes | — | Unique name (used in Deployment and worker names) |
+| `image` | Yes | — | Container image with your desired runtime(s) |
+| `replicas` | No | `1` | Number of pod replicas |
+| `runtimes` | No | `[]` (auto-detect) | List of runtimes to expose (e.g., `[python, shell]`) |
+| `resources` | No | `{}` | Kubernetes resource requests/limits |
+| `env` | No | — | Extra environment variables (`[{name, value}]`) |
+| `imagePullPolicy` | No | — | Pull policy for the worker image |
+| `logLevel` | No | `info` | `RUST_LOG` level |
+| `runtimeClassName` | No | — | Kubernetes RuntimeClass (e.g., `nvidia`) |
+| `nodeSelector` | No | — | Node selector for pod scheduling |
+| `tolerations` | No | — | Tolerations for pod scheduling |
+| `stopGracePeriod` | No | `45` | Termination grace period (seconds) |
+
+## Install / Upgrade
+
+```bash
+helm upgrade --install attune oci://registry.example.com/namespace/helm/attune \
+  --version 0.3.0 \
+  --set global.imageRegistry=registry.example.com \
+  --set global.imageNamespace=namespace \
+  --set global.imageTag=0.3.0 \
+  -f my-values.yaml
+```
+
+## What Gets Created
+
+For each `agentWorkers` entry, the chart creates a Deployment named `<release>-attune-agent-worker-<name>` with:
+
+- **Init containers**:
+  - `agent-loader` — copies the agent binary from the `attune-agent` image to an `emptyDir` volume
+  - `wait-for-schema` — polls PostgreSQL until the Attune schema is ready
+  - `wait-for-packs` — waits for the core pack to be available on the shared PVC
+- **Worker container** — runs `attune-agent` as the entrypoint inside your chosen image
+- **Volumes**: `agent-bin` (emptyDir), `config` (ConfigMap), `packs` (PVC, read-only), `runtime-envs` (PVC), `artifacts` (PVC)
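+
+The resources listed above can be inspected after an install. This sketch uses the same `<placeholder>` convention as the rest of this page; fill in your release and `agentWorkers` entry name:
+
+```bash
+# The Deployment created for one agentWorkers entry
+kubectl get deployment <release>-attune-agent-worker-<name>
+
+# Init container names, in order
+kubectl get deployment <release>-attune-agent-worker-<name> \
+  -o jsonpath='{.spec.template.spec.initContainers[*].name}'
+
+# Volume names defined on the pod template
+kubectl get deployment <release>-attune-agent-worker-<name> \
+  -o jsonpath='{.spec.template.spec.volumes[*].name}'
+```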
+
+## Runtime Auto-Detection
+
+When `runtimes` is empty (the default), the agent probes the container for interpreters:
+
+| Runtime | Probed Binaries |
+|---------|----------------|
+| Shell | `bash`, `sh` |
+| Python | `python3`, `python` |
+| Node.js | `node`, `nodejs` |
+| Ruby | `ruby` |
+| Go | `go` |
+| Java | `java` |
+| R | `Rscript` |
+| Perl | `perl` |
+
+Set `runtimes` explicitly to skip auto-detection and only register the listed runtimes.
+
+## Prerequisites
+
+- The `attune-agent` image must be available in your registry (built from `docker/Dockerfile.agent`, target `agent-init`)
+- Shared PVCs (`packs`, `runtime-envs`, `artifacts`) must support `ReadWriteMany` if agent workers run on different nodes than the standard worker
+- The Attune database and RabbitMQ must be reachable from agent worker pods
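+
+To check the access-mode prerequisite, the PVCs in the release namespace can be listed with their modes. The exact PVC names depend on the chart and are not assumed here:
+
+```bash
+# Show each PVC with its access modes and binding status
+kubectl get pvc -o custom-columns=NAME:.metadata.name,ACCESS:.spec.accessModes,STATUS:.status.phase
+```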
+
+## Differences from the Standard Worker
+
+| Aspect | Standard Worker (`worker`) | Agent Worker (`agentWorkers`) |
+|--------|---------------------------|-------------------------------|
+| Image | Built from `Dockerfile.worker.optimized` | Any image (ruby, python, cuda, etc.) |
+| Binary | Baked into the image | Injected via init container |
+| Runtimes | Configured at build time | Auto-detected or explicitly listed |
+| Use case | Known, pre-built runtime combos | Custom images, exotic runtimes, GPU |
+
+Both worker types coexist — actions are routed to whichever worker has the matching runtime registered.
+
+## Troubleshooting
+
+**Agent binary not found**: Check that the `agent-loader` init container completed. View its logs:
+```bash
+kubectl logs <pod> -c agent-loader
+```
+
+**Runtime not detected**: Run the agent with `--detect-only` to see what it finds:
+```bash
+kubectl exec <pod> -c worker -- /opt/attune/agent/attune-agent --detect-only
+```
+
+**Worker not registering**: Check the worker container logs for database/MQ connectivity:
+```bash
+kubectl logs <pod> -c worker
+```
+
+**Packs not available**: Ensure the `init-packs` job has completed and the PVC is mounted:
+```bash
+kubectl get jobs | grep init-packs
+kubectl exec <pod> -c worker -- ls /opt/attune/packs/core/
+```
+
+## See Also
+
+- [Agent Workers (Docker Compose)](QUICKREF-agent-workers.md)
+- [Universal Worker Agent Plan](plans/universal-worker-agent.md)
+- [Gitea Registry and Helm](deployment/gitea-registry-and-helm.md)
+- [Production Deployment](deployment/production-deployment.md)
--- a/docs/deployment/gitea-registry-and-helm.md
+++ b/docs/deployment/gitea-registry-and-helm.md
@@ -19,6 +19,7 @@ The workflow publishes these images to the Gitea OCI registry:
 - `attune-migrations`
 - `attune-init-user`
 - `attune-init-packs`
+- `attune-agent`

 The Helm chart is pushed as an OCI chart to:

--- a/docs/plans/universal-worker-agent.md
+++ b/docs/plans/universal-worker-agent.md