# Universal Worker Agent Injection

## Overview

This plan describes a new deployment model for Attune workers: a **statically-linked agent binary** that can be injected into any Docker container at runtime, turning arbitrary images into Attune workers. This eliminates the need to build custom worker Docker images for each runtime environment.

### Problem

Today, every Attune worker is a purpose-built Docker image: the same `attune-worker` Rust binary baked into Debian images with specific interpreters installed (see `docker/Dockerfile.worker.optimized`). Adding a new runtime (e.g., Ruby, Go, Java, R) means:

1. Modifying `Dockerfile.worker.optimized` to add a new build stage
2. Installing the interpreter via apt or a package repository
3. Managing the combinatorial explosion of worker variants
4. Rebuilding images (~5 min) for every change
5. Standardizing on `debian:bookworm-slim` as the base (not the runtime's official image)

### Solution

Flip the model: **any Docker image becomes an Attune worker** by injecting a lightweight agent binary at container startup. The agent binary is a statically-linked (musl) Rust executable that connects to MQ/DB, consumes execution messages, spawns subprocesses, and reports results — functionally identical to the current worker, but packaged for universal deployment.

Want Ruby support? Point at `ruby:3.3` and go. Need a GPU runtime? Use `nvidia/cuda:12.3-runtime`. Need a specific Python version with scientific libraries pre-installed? Use any image that has them.

### Industry Precedent

This pattern is battle-tested in major CI/CD and workflow systems:

| System | Pattern | How It Works |
|--------|---------|--------------|
| **Tekton** | InitContainer + shared volume | Copies a static Go `entrypoint` binary into an `emptyDir`; overrides the user container's entrypoint to use it. Steps coordinate via file-based signaling. |
| **Argo Workflows (Emissary)** | InitContainer + sidecar | The `emissary` binary runs as both an init container and a sidecar. Disk-based coordination, no Docker socket, no privileged access. |
| **GitLab CI Runner (Step Runner)** | Binary injection | Newer "Native Step Runner" mode injects a `step-runner` binary into the build container and adjusts `$PATH`. Communicates via gRPC. |
| **Istio** | Mutating webhook | Kubernetes admission controller adds init + sidecar containers transparently. |

The **Tekton/Argo pattern** (static binary + shared volume) is the best fit for Attune because:

- It works with Docker Compose (not K8s-only) via bind mounts / named volumes
- It requires zero dependencies in the user image (just a Linux kernel)
- A static Rust binary (musl-linked) is ~15–25MB and runs anywhere
- No privileged access, no Docker socket needed inside the container

### Compatibility

This plan is **purely additive**. Nothing changes for existing workers:

- `Dockerfile.worker.optimized` and its four targets remain unchanged and functional
- Current `docker-compose.yaml` worker services keep working
- All MQ protocols, DB schemas, and execution flows remain identical
- The agent is just another way to run the same execution engine

## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                     Attune Control Plane                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │   API    │  │ Executor │  │ RabbitMQ │  │ Postgres │      │
│  └────┬─────┘  └────┬─────┘  └─────┬────┘  └────┬─────┘      │
└───────┼─────────────┼──────────────┼────────────┼────────────┘
        │             │              │            │
┌───────┼─────────────┼──────────────┼────────────┼────────────┐
│       ▼             ▼              ▼            ▼            │
│  ┌────────────────────────────────────────────────────────┐  │
│  │             attune-agent (injected binary)             │  │
│  │  ┌──────────┐  ┌──────────┐  ┌─────────┐  ┌────────┐   │  │
│  │  │MQ Client │  │DB Client │  │ Process │  │Artifact│   │  │
│  │  │(lapin)   │  │(sqlx)    │  │Executor │  │Manager │   │  │
│  │  └──────────┘  └──────────┘  └─────────┘  └────────┘   │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │            User Container (ANY Docker image)           │  │
│  │     ruby:3.3, python:3.12, nvidia/cuda, alpine, ...    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  Shared Volumes:                                             │
│    /opt/attune/agent/        (agent binary, read-only)       │
│    /opt/attune/packs/        (pack files, read-only)         │
│    /opt/attune/runtime_envs/ (virtualenvs, node_modules)     │
│    /opt/attune/artifacts/    (artifact files)                │
└──────────────────────────────────────────────────────────────┘
```

### Agent vs. Full Worker Comparison

The agent binary is functionally identical to the current `attune-worker`. The difference is packaging and startup behavior:

| Capability | Full Worker (`attune-worker`) | Agent (`attune-agent`) |
|-----------|------------------------------|------------------------|
| MQ consumption | ✅ | ✅ |
| DB access | ✅ | ✅ |
| Process execution | ✅ | ✅ |
| Artifact management | ✅ | ✅ |
| Secret management | ✅ | ✅ |
| Cancellation / timeout | ✅ | ✅ |
| Heartbeat | ✅ | ✅ |
| Runtime env setup (venvs) | ✅ Proactive at startup | ✅ Lazy on first use |
| Version verification | ✅ Full sweep at startup | ✅ On-demand per-execution |
| Runtime discovery | Manual (`ATTUNE_WORKER_RUNTIMES`) | Auto-detect + optional manual override |
| Linking | Dynamic (glibc) | Static (musl) |
| Base image requirement | `debian:bookworm-slim` | None (any Linux container) |
| Binary size | ~30–50MB | ~15–25MB (stripped, musl) |

### Binary Distribution Methods

Two methods for getting the agent binary into a container:

**Method A: Shared Volume (Docker Compose — recommended)**

An init container copies the agent binary into a Docker named volume. User containers mount this volume read-only and use the binary as their entrypoint.

**Method B: HTTP Download (remote / cloud deployments)**

A new API endpoint (`GET /api/v1/agent/binary`) serves the static binary.
A small wrapper script in the container downloads it on first run. Useful for Kubernetes, ECS, or remote Docker hosts where shared volumes are impractical.

## Implementation Phases

### Phase 1: Static Binary Build Infrastructure

**Goal**: Produce a statically-linked `attune-agent` binary that runs in any Linux container.

**Effort**: 3–5 days
**Dependencies**: None

#### 1.1 TLS Backend Audit and Alignment

The agent must link statically with musl. This requires all TLS to use `rustls` (pure Rust) instead of OpenSSL/native-tls.

**Current state** (from `Cargo.toml` workspace dependencies):

- `sqlx`: Already uses `runtime-tokio-rustls` ✅
- `reqwest`: Uses default features (native-tls) — needs `rustls-tls` feature ❌
- `tokio-tungstenite`: Uses `native-tls` feature — needs `rustls` ❌
- `lapin` (v4.3): Uses native-tls by default — needs `rustls` feature ❌

**Changes needed in workspace `Cargo.toml`**:

```toml
# Change reqwest to use rustls
reqwest = { version = "0.13", features = ["json", "rustls-tls"], default-features = false }

# Change tokio-tungstenite to use rustls
tokio-tungstenite = { version = "0.28", features = ["rustls"] }

# Check lapin's TLS features — if using amqps://, need rustls support.
# For plain amqp:// (typical in Docker Compose), no TLS needed.
# For production amqps://, evaluate lapin's rustls support or use a TLS-terminating proxy.
```

**Important**: These changes affect the entire workspace. Test all services (`api`, `executor`, `worker`, `notifier`, `sensor`, `cli`) after switching TLS backends. If switching workspace-wide is too disruptive, use feature flags to conditionally select the TLS backend for the agent build only.
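A minimal sketch of that feature-flag escape hatch, assuming the worker crate gates its TLS-dependent features (the `tls-native` / `tls-rustls` feature names are illustrative, not existing ones in the workspace; the `reqwest` and `tokio-tungstenite` feature names are their real ones):

```toml
# crates/worker/Cargo.toml — hypothetical TLS feature gating (sketch).
# Build the agent separately so Cargo's feature unification does not
# pull the default backend back in:
#   cargo build --bin attune-agent --no-default-features --features tls-rustls
[features]
default = ["tls-native"]
tls-native = ["reqwest/native-tls", "tokio-tungstenite/native-tls"]
tls-rustls = ["reqwest/rustls-tls", "tokio-tungstenite/rustls-tls-webpki-roots"]
```

Note that this only works if the agent is built in its own `cargo` invocation; a combined workspace build would unify features across both binaries.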
**Alternative**: If workspace-wide rustls migration is too risky, the agent crate can override specific dependencies:

```toml
[dependencies]
reqwest = { workspace = true, default-features = false, features = ["json", "rustls-tls"] }
```

#### 1.2 New Crate or New Binary Target

**Option A (recommended): New binary target in the worker crate**

Add a second binary target to `crates/worker/Cargo.toml`:

```toml
[[bin]]
name = "attune-worker"
path = "src/main.rs"

[[bin]]
name = "attune-agent"
path = "src/agent_main.rs"
```

This reuses all existing code — `ActionExecutor`, `ProcessRuntime`, `WorkerService`, `RuntimeRegistry`, `SecretManager`, `ArtifactManager`, etc. The agent entrypoint is a thin wrapper with different startup behavior (auto-detection instead of manual config).

**Pros**: Zero code duplication. Same test suite covers both binaries.
**Cons**: The agent binary includes unused code paths (e.g., full worker service setup).

**Option B: New crate `crates/agent/`**

```
crates/agent/
├── Cargo.toml        # Depends on attune-common + selected worker modules
├── src/
│   ├── main.rs       # Entry point
│   ├── agent.rs      # Core agent loop
│   ├── detect.rs     # Runtime auto-detection
│   └── health.rs     # Health check (file-based or tiny HTTP)
```

**Pros**: Cleaner separation, can minimize binary size by excluding unused deps.
**Cons**: Requires extracting shared execution code into a library or duplicating it.

**Recommendation**: Start with **Option A** (new binary target in worker crate) for speed. Refactor into a separate crate later if binary size becomes a concern.

#### 1.3 Agent Entrypoint (`src/agent_main.rs`)

The agent entrypoint differs from `main.rs` in:

1. **Runtime auto-detection** instead of relying on `ATTUNE_WORKER_RUNTIMES`
2. **Lazy environment setup** instead of a proactive startup sweep
3. **Simplified config loading** — env vars are the primary config source (no config file required, but supported if mounted)
4. **Container-aware defaults** — sensible defaults for paths, timeouts, concurrency

```
src/agent_main.rs responsibilities:
1. Parse CLI args / env vars for DB URL, MQ URL, worker name
2. Run runtime auto-detection (Phase 2) to discover available interpreters
3. Initialize WorkerService with detected capabilities
4. Start the normal execution consumer loop
5. Handle SIGTERM/SIGINT for graceful shutdown
```

#### 1.4 Dockerfile for Agent Binary

Create `docker/Dockerfile.agent`:

```dockerfile
# Stage 1: Build the statically-linked agent binary
FROM rust:1.83-bookworm AS builder
RUN apt-get update && apt-get install -y musl-tools
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /build
ENV RUST_MIN_STACK=67108864
COPY Cargo.toml Cargo.lock ./
COPY crates/ ./crates/
COPY migrations/ ./migrations/
COPY .sqlx/ ./.sqlx/

# Build only the agent binary, statically linked
RUN --mount=type=cache,target=/usr/local/cargo/registry,sharing=shared \
    --mount=type=cache,target=/usr/local/cargo/git,sharing=shared \
    --mount=type=cache,id=agent-target,target=/build/target \
    SQLX_OFFLINE=true cargo build --release \
        --target x86_64-unknown-linux-musl \
        --bin attune-agent \
    && cp /build/target/x86_64-unknown-linux-musl/release/attune-agent /attune-agent \
    && strip /attune-agent

# Stage 2: Minimal image for volume population
FROM scratch AS agent-binary
COPY --from=builder /attune-agent /attune-agent
```

**Multi-architecture support**: For ARM64 (Apple Silicon, Graviton), add a parallel build stage targeting `aarch64-unknown-linux-musl`. Use Docker buildx multi-platform builds or separate images.

#### 1.5 Makefile Targets

Add to `Makefile`:

```makefile
build-agent:
	SQLX_OFFLINE=true cargo build --release --target x86_64-unknown-linux-musl --bin attune-agent
	strip target/x86_64-unknown-linux-musl/release/attune-agent

docker-build-agent:
	docker buildx build -f docker/Dockerfile.agent -t attune-agent:latest .
```

---

### Phase 2: Runtime Auto-Detection

**Goal**: The agent automatically discovers what interpreters are available in the container, without requiring `ATTUNE_WORKER_RUNTIMES` to be set.

**Effort**: 1–2 days
**Dependencies**: Phase 1 (agent binary exists)

#### 2.1 Interpreter Discovery Module

Create a new module (in `crates/worker/src/` or `crates/common/src/`) that probes the container's filesystem for known interpreters:

```
src/runtime_detect.rs (or extend existing crates/common/src/runtime_detection.rs)

struct DetectedInterpreter {
    runtime_name: String,     // "python", "ruby", "node", etc.
    binary_path: PathBuf,     // "/usr/local/bin/python3"
    version: Option<String>,  // "3.12.1" (parsed from version command output)
}

struct InterpreterProbe {
    runtime_name: &'static str,
    binaries: &'static [&'static str],
    version_flag: &'static str,
    version_regex: &'static str,
}

/// Probe the container for available interpreters.
///
/// For each known runtime, checks common binary names via `which` or
/// direct path existence, then runs the version command to extract
/// the version string.
fn detect_interpreters() -> Vec<DetectedInterpreter> {
    let probes = [
        InterpreterProbe { runtime_name: "python", binaries: &["python3", "python"], version_flag: "--version", version_regex: r"Python (\d+\.\d+\.\d+)" },
        InterpreterProbe { runtime_name: "node", binaries: &["node", "nodejs"], version_flag: "--version", version_regex: r"v(\d+\.\d+\.\d+)" },
        InterpreterProbe { runtime_name: "ruby", binaries: &["ruby"], version_flag: "--version", version_regex: r"ruby (\d+\.\d+\.\d+)" },
        InterpreterProbe { runtime_name: "go", binaries: &["go"], version_flag: "version", version_regex: r"go(\d+\.\d+\.\d+)" },
        InterpreterProbe { runtime_name: "java", binaries: &["java"], version_flag: "-version", version_regex: r#""(\d+[\.\d+]*)""# },
        InterpreterProbe { runtime_name: "perl", binaries: &["perl"], version_flag: "--version", version_regex: r"v(\d+\.\d+\.\d+)" },
        InterpreterProbe { runtime_name: "r", binaries: &["Rscript", "R"], version_flag: "--version", version_regex: r"R.*version (\d+\.\d+\.\d+)" },
        InterpreterProbe { runtime_name: "shell", binaries: &["bash", "sh"], version_flag: "--version", version_regex: r"(\d+\.\d+\.\d+)" },
    ];

    // For each probe:
    // 1. Run `which <binary>` or check known paths
    // 2. If found, run `<binary> <version_flag>` with a short timeout (2s)
    // 3. Parse version from output using the regex
    // 4. Return DetectedInterpreter with the results
}
```

**Integration with existing code**: The existing `crates/common/src/runtime_detection.rs` already has `normalize_runtime_name()` and alias groups. The auto-detection module should use these for matching detected interpreters against DB runtime records.

#### 2.2 Integration with Worker Registration

The agent startup sequence:

1. Run `detect_interpreters()`
2. Match detected interpreters against known runtimes in the `runtime` table (using alias-aware matching from `runtime_detection.rs`)
3. If `ATTUNE_WORKER_RUNTIMES` is set, use it as an override (intersection or union — TBD, probably override wins)
4. Register the worker with the detected/configured capabilities
5. Log what was detected for debugging:

```
[INFO] Detected runtimes: python 3.12.1 (/usr/local/bin/python3), ruby 3.3.0 (/usr/local/bin/ruby), shell 5.2.21 (/bin/bash)
[INFO] Registering worker with capabilities: [python, ruby, shell]
```

#### 2.3 Runtime Hints File (Optional Enhancement)

Allow a `.attune-runtime.yaml` file in the container that declares runtime capabilities and custom configuration. This handles cases where auto-detection isn't sufficient (e.g., custom interpreters, non-standard paths, special environment setup).
```yaml
# /opt/attune/.attune-runtime.yaml (or /.attune-runtime.yaml)
runtimes:
  - name: ruby
    interpreter: /usr/local/bin/ruby
    file_extension: .rb
    version_command: "ruby --version"
    env_setup:
      create_command: "mkdir -p {env_dir}"
      install_command: "cd {env_dir} && bundle install --gemfile {pack_dir}/Gemfile"
  - name: custom-ml
    interpreter: /opt/conda/bin/python
    file_extension: .py
    version_command: "/opt/conda/bin/python --version"
```

The agent checks for this file at startup and merges it with auto-detected runtimes (the hints file takes precedence for conflicting runtime names).

**This is a nice-to-have for Phase 2 — implement only if auto-detection proves insufficient for common use cases.**

---

### Phase 3: Refactor Worker for Code Reuse

**Goal**: Ensure the execution engine is cleanly reusable between the full `attune-worker` and the `attune-agent` binary, without code duplication.

**Effort**: 2–3 days
**Dependencies**: Phase 1 (agent entrypoint exists), can be done in parallel with Phase 2

#### 3.1 Identify Shared vs. Agent-Specific Code

Current worker crate modules and their reuse status:

| Module | File(s) | Shared? | Notes |
|--------|---------|---------|-------|
| `ActionExecutor` | `executor.rs` | ✅ Fully shared | Core execution orchestration |
| `ProcessRuntime` | `runtime/process.rs` | ✅ Fully shared | Subprocess spawning, interpreter resolution |
| `process_executor` | `runtime/process_executor.rs` | ✅ Fully shared | Streaming output capture, timeout, cancellation |
| `NativeRuntime` | `runtime/native.rs` | ✅ Fully shared | Direct binary execution |
| `LocalRuntime` | `runtime/local.rs` | ✅ Fully shared | Fallback runtime facade |
| `RuntimeRegistry` | `runtime/mod.rs` | ✅ Fully shared | Runtime selection and registration |
| `ExecutionContext` | `runtime/mod.rs` | ✅ Fully shared | Execution parameters, env vars, secrets |
| `BoundedLogWriter` | `runtime/log_writer.rs` | ✅ Fully shared | Streaming log capture with size limits |
| `parameter_passing` | `runtime/parameter_passing.rs` | ✅ Fully shared | Stdin/file/env parameter delivery |
| `SecretManager` | `secrets.rs` | ✅ Fully shared | Secret decryption via `attune_common::crypto` |
| `ArtifactManager` | `artifacts.rs` | ✅ Fully shared | Artifact finalization (file stat, size update) |
| `HeartbeatManager` | `heartbeat.rs` | ✅ Fully shared | Periodic DB heartbeat |
| `WorkerRegistration` | `registration.rs` | ✅ Shared, extended | Needs auto-detection integration |
| `env_setup` | `env_setup.rs` | ✅ Shared, lazy mode | Agent uses lazy setup instead of proactive |
| `version_verify` | `version_verify.rs` | ✅ Shared, on-demand mode | Agent verifies on-demand instead of full sweep |
| `WorkerService` | `service.rs` | ⚠️ Needs refactoring | Extract reusable `AgentService` or parameterize |

**Conclusion**: Almost everything is already reusable. The main work is in `service.rs`, which needs to be parameterized for the two startup modes (proactive vs. lazy).
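The lazy side of that split needs a small amount of coordination so that two concurrent executions for the same pack don't both run environment setup. As an illustrative sketch (the `LazySetupGuard` type and its method names are hypothetical, not existing code), a run-once guard keyed on pack+runtime is enough for the synchronous case:

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical sketch of a run-once guard for agent (lazy) mode:
/// the first execution for a given pack+runtime pair performs the
/// environment setup; subsequent executions skip it.
pub struct LazySetupGuard {
    completed: Mutex<HashSet<String>>,
}

impl LazySetupGuard {
    pub fn new() -> Self {
        Self { completed: Mutex::new(HashSet::new()) }
    }

    /// Returns true exactly once per pack+runtime key; the caller that
    /// receives true is responsible for running the environment setup.
    pub fn claim(&self, pack_ref: &str, runtime: &str) -> bool {
        let key = format!("{pack_ref}/{runtime}");
        // HashSet::insert returns true only for a newly inserted key.
        self.completed.lock().unwrap().insert(key)
    }
}
```

A production version would likely hold an async per-key lock across the setup itself (e.g., a keyed `tokio::sync::OnceCell`), since the boolean guard alone lets a second execution proceed before setup has finished.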
#### 3.2 Refactor `WorkerService` for Dual Modes

Instead of duplicating `WorkerService`, add a configuration enum:

```rust
// In service.rs or a new config module

/// Controls how the worker initializes its runtime environment.
pub enum StartupMode {
    /// Full worker mode: proactive environment setup, full version
    /// verification sweep at startup. Used by `attune-worker`.
    Worker,
    /// Agent mode: lazy environment setup (on first use), on-demand
    /// version verification, auto-detected runtimes. Used by `attune-agent`.
    Agent {
        /// Runtimes detected by the auto-detection module.
        detected_runtimes: Vec<DetectedInterpreter>,
    },
}
```

The `WorkerService::start()` method checks this mode:

```rust
match &self.startup_mode {
    StartupMode::Worker => {
        // Existing behavior: full version verification sweep
        self.verify_all_runtime_versions().await?;
        // Existing behavior: proactive environment setup for all packs
        self.setup_all_environments().await?;
    }
    StartupMode::Agent { .. } => {
        // Skip proactive setup — will happen lazily on first execution
        info!("Agent mode: deferring environment setup to first execution");
    }
}
```

#### 3.3 Lazy Environment Setup

In agent mode, the first execution for a given pack+runtime combination triggers environment setup:

```rust
// In executor.rs, within execute_with_cancel()

// Before executing, ensure the runtime environment exists
if !env_dir.exists() {
    info!("Creating runtime environment on first use: {}", env_dir.display());
    self.env_setup.setup_environment(&pack_ref, &runtime_name, &env_dir).await?;
}
```

The current worker already handles this partially — the `ProcessRuntime::execute()` method has auto-repair logic for broken venvs. The lazy setup extends this to handle the case where the env directory doesn't exist at all.

---

### Phase 4: Docker Compose Integration

**Goal**: Make it trivial to add agent-based workers to `docker-compose.yaml`.
**Effort**: 1 day
**Dependencies**: Phase 1 (agent binary and Dockerfile exist)

#### 4.1 Init Service for Agent Volume

Add to `docker-compose.yaml`:

```yaml
services:
  # Populates the agent binary volume (runs once)
  init-agent:
    build:
      context: .
      dockerfile: docker/Dockerfile.agent
    volumes:
      - agent_bin:/opt/attune/agent
    entrypoint: ["/bin/sh", "-c", "cp /attune-agent /opt/attune/agent/attune-agent && chmod +x /opt/attune/agent/attune-agent"]
    restart: "no"
    networks:
      - attune

volumes:
  agent_bin:  # Named volume holding the static agent binary
```

Note: The init-agent service needs a minimal base with `/bin/sh` for the `cp` command. Since the agent Dockerfile's final stage is `FROM scratch`, the init service should use the builder stage or a separate `FROM alpine` stage.

**Revised Dockerfile.agent approach** — use Alpine for the init image so it has a shell:

```dockerfile
# Stage 1: Build
FROM rust:1.83-bookworm AS builder
# ... (build steps from Phase 1.4)

# Stage 2: Init image (has a shell for cp)
FROM alpine:3.20 AS agent-init
COPY --from=builder /attune-agent /attune-agent
# Default command copies the binary into the mounted volume
CMD ["cp", "/attune-agent", "/opt/attune/agent/attune-agent"]

# Stage 3: Bare binary (for HTTP download or direct use)
FROM scratch AS agent-binary
COPY --from=builder /attune-agent /attune-agent
```

#### 4.2 Agent-Based Worker Services

Example services that can be added to `docker-compose.yaml` or a user's `docker-compose.override.yaml`:

```yaml
# Ruby worker — uses the official Ruby image
worker-ruby:
  image: ruby:3.3-slim
  depends_on:
    init-agent:
      condition: service_completed_successfully
    postgres:
      condition: service_healthy
    rabbitmq:
      condition: service_healthy
  entrypoint: ["/opt/attune/agent/attune-agent"]
  volumes:
    - agent_bin:/opt/attune/agent:ro
    - packs_data:/opt/attune/packs:ro
    - runtime_envs:/opt/attune/runtime_envs
    - artifacts_data:/opt/attune/artifacts
    - ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
  environment:
    ATTUNE_WORKER_NAME: worker-ruby-1
    # ATTUNE_WORKER_RUNTIMES omitted — auto-detected as ruby,shell
  networks:
    - attune
  restart: unless-stopped
  stop_grace_period: 45s

# R worker — uses the official R base image
worker-r:
  image: r-base:4.4.0
  depends_on:
    init-agent:
      condition: service_completed_successfully
    postgres:
      condition: service_healthy
    rabbitmq:
      condition: service_healthy
  entrypoint: ["/opt/attune/agent/attune-agent"]
  volumes:
    - agent_bin:/opt/attune/agent:ro
    - packs_data:/opt/attune/packs:ro
    - runtime_envs:/opt/attune/runtime_envs
    - artifacts_data:/opt/attune/artifacts
    - ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
  environment:
    ATTUNE_WORKER_NAME: worker-r-1
  networks:
    - attune
  restart: unless-stopped

# GPU worker — NVIDIA CUDA image with Python
worker-gpu:
  image: nvidia/cuda:12.3.1-runtime-ubuntu22.04
  depends_on:
    init-agent:
      condition: service_completed_successfully
    postgres:
      condition: service_healthy
    rabbitmq:
      condition: service_healthy
  entrypoint: ["/opt/attune/agent/attune-agent"]
  runtime: nvidia
  volumes:
    - agent_bin:/opt/attune/agent:ro
    - packs_data:/opt/attune/packs:ro
    - runtime_envs:/opt/attune/runtime_envs
    - artifacts_data:/opt/attune/artifacts
    - ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
  environment:
    ATTUNE_WORKER_NAME: worker-gpu-1
    ATTUNE_WORKER_RUNTIMES: python,shell  # Manual override (image has python pre-installed)
  networks:
    - attune
  restart: unless-stopped
```

#### 4.3 User Experience Summary

Adding a new runtime to an Attune deployment becomes a short (~20-line) addition to `docker-compose.override.yaml`:

```yaml
services:
  worker-my-runtime:
    image: my-org/my-custom-image:latest
    depends_on:
      init-agent:
        condition: service_completed_successfully
      postgres:
        condition: service_healthy
      rabbitmq:
        condition: service_healthy
    entrypoint: ["/opt/attune/agent/attune-agent"]
    volumes:
      - agent_bin:/opt/attune/agent:ro
      - packs_data:/opt/attune/packs:ro
      - runtime_envs:/opt/attune/runtime_envs
      - artifacts_data:/opt/attune/artifacts
      - ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
    networks:
      - attune
```

No Dockerfiles. No rebuilds. No waiting for Rust compilation. Start to finish in seconds.

---

### Phase 5: API Binary Download Endpoint

**Goal**: Support deployments where shared Docker volumes are impractical (Kubernetes, ECS, remote Docker hosts).

**Effort**: 1 day
**Dependencies**: Phase 1 (agent binary exists)

#### 5.1 New API Route

Add to `crates/api/src/routes/`:

```
GET /api/v1/agent/binary
GET /api/v1/agent/binary?arch=x86_64   (default)
GET /api/v1/agent/binary?arch=aarch64

Response: application/octet-stream
Headers: Content-Disposition: attachment; filename="attune-agent"
```

The API serves the binary from a configurable filesystem path (e.g., `/opt/attune/agent/attune-agent`). The binary can be placed there at build time (baked into the API image) or mounted via volume.

**Configuration** (`config.yaml`):

```yaml
agent:
  binary_dir: /opt/attune/agent  # Directory containing agent binaries
  # Files expected: attune-agent-x86_64, attune-agent-aarch64
```

**OpenAPI documentation** via `utoipa`:

```rust
#[utoipa::path(
    get,
    path = "/api/v1/agent/binary",
    params(("arch" = Option<String>, Query, description = "Target architecture (x86_64, aarch64)")),
    responses(
        (status = 200, description = "Agent binary", content_type = "application/octet-stream"),
        (status = 404, description = "Binary not found for requested architecture"),
    ),
    tag = "agent"
)]
```

**Authentication**: This endpoint should be **unauthenticated** or use a simple shared token, since the agent needs to download the binary before it can authenticate. Alternatively, require basic auth or a bootstrap token passed via environment variable.
#### 5.2 Bootstrap Wrapper Script

Provide `scripts/attune-agent-wrapper.sh` for use as a container entrypoint:

```bash
#!/bin/sh
# attune-agent-wrapper.sh — Bootstrap the Attune agent in any container
set -e

AGENT_DIR="${ATTUNE_AGENT_DIR:-/opt/attune/agent}"
AGENT_BIN="$AGENT_DIR/attune-agent"
AGENT_URL="${ATTUNE_AGENT_URL:-http://attune-api:8080/api/v1/agent/binary}"

# Use volume-mounted binary if available, otherwise download
if [ ! -x "$AGENT_BIN" ]; then
    echo "[attune] Agent binary not found at $AGENT_BIN, downloading from $AGENT_URL..."
    mkdir -p "$AGENT_DIR"
    if command -v wget >/dev/null 2>&1; then
        wget -q -O "$AGENT_BIN" "$AGENT_URL"
    elif command -v curl >/dev/null 2>&1; then
        curl -sL "$AGENT_URL" -o "$AGENT_BIN"
    else
        echo "[attune] ERROR: Neither wget nor curl available. Cannot download agent." >&2
        exit 1
    fi
    chmod +x "$AGENT_BIN"
    echo "[attune] Agent binary downloaded successfully."
fi

echo "[attune] Starting agent..."
exec "$AGENT_BIN" "$@"
```

Usage:

```yaml
# In docker-compose or K8s — when a volume mount isn't available
worker-remote:
  image: python:3.12-slim
  entrypoint: ["/opt/attune/scripts/attune-agent-wrapper.sh"]
  volumes:
    - ./scripts/attune-agent-wrapper.sh:/opt/attune/scripts/attune-agent-wrapper.sh:ro
  environment:
    ATTUNE_AGENT_URL: http://attune-api:8080/api/v1/agent/binary
```

---

### Phase 6: Database & Runtime Registry Extensions

**Goal**: Support arbitrary runtimes without requiring every possible runtime to be pre-registered in the DB.
**Effort**: 1–2 days
**Dependencies**: Phase 2 (auto-detection working)

#### 6.1 Extended Runtime Detection Metadata

Add a migration to support auto-detected runtimes:

```sql
-- Migration: NNNNNN_agent_runtime_detection.sql

-- Track whether a runtime was auto-registered by an agent
ALTER TABLE runtime ADD COLUMN IF NOT EXISTS auto_detected BOOLEAN NOT NULL DEFAULT FALSE;

-- Store detection configuration for auto-discovered runtimes
-- Example: { "binaries": ["ruby", "ruby3.2"], "version_command": "--version",
--            "version_regex": "ruby (\\d+\\.\\d+\\.\\d+)" }
ALTER TABLE runtime ADD COLUMN IF NOT EXISTS detection_config JSONB NOT NULL DEFAULT '{}';
```

#### 6.2 Runtime Template Packs

Ship pre-configured runtime definitions for common languages in the `core` pack (or a new `runtimes` pack). These are registered during pack loading and provide the `execution_config` that auto-detected interpreters need.

Add runtime YAML files for new languages:

```
packs/core/runtimes/ruby.yaml
packs/core/runtimes/go.yaml
packs/core/runtimes/java.yaml
packs/core/runtimes/perl.yaml
packs/core/runtimes/r.yaml
```

Example `ruby.yaml`:

```yaml
ref: core.ruby
name: Ruby
label: Ruby Runtime
description: Execute Ruby scripts
execution_config:
  interpreter:
    binary: ruby
    file_extension: .rb
    env_vars:
      GEM_HOME: "{env_dir}/gems"
      GEM_PATH: "{env_dir}/gems"
      BUNDLE_PATH: "{env_dir}/gems"
  environment:
    create_command: "mkdir -p {env_dir}/gems"
    install_command: "cd {pack_dir} && GEM_HOME={env_dir}/gems bundle install --quiet 2>/dev/null || true"
    dependency_file: Gemfile
```

#### 6.3 Dynamic Runtime Registration

When the agent detects an interpreter that matches a runtime template (by name/alias) but the runtime doesn't exist in the DB yet, the agent can auto-register it:

1. Look up the runtime by name in the DB using alias-aware matching
2. If found → use it (existing behavior)
3. If not found → check if a runtime template exists in loaded packs
4. If a template is found → register the runtime using the template's `execution_config`
5. If no template → register a minimal runtime with just the detected interpreter binary path
6. Mark auto-registered runtimes with `auto_detected = true`

This ensures the agent can work with new runtimes immediately, even if the runtime hasn't been explicitly configured.

---

### Phase 7: Kubernetes Support ✅

**Status**: Complete

**Goal**: Provide Kubernetes manifests and Helm chart support for agent-based workers.

**Effort**: 1–2 days
**Dependencies**: Phase 4 (Docker Compose working), Phase 5 (binary download)

**Implemented**:

- Helm chart `agent-workers.yaml` template — creates a Deployment per `agentWorkers[]` entry
- InitContainer pattern (`agent-loader`) copies the statically-linked binary via an `emptyDir` volume
- Full scheduling support: `nodeSelector`, `tolerations`, `runtimeClassName` (GPU/nvidia)
- Runtime auto-detect by default; explicit `runtimes` list override
- Custom env vars, resource limits, log level, termination grace period
- `images.agent` added to `values.yaml` for registry-aware image resolution
- `attune-agent` image added to the Gitea Actions publish workflow (`agent-init` target)
- `NOTES.txt` updated to list enabled agent workers on install
- Quick-reference docs at `docs/QUICKREF-kubernetes-agent-workers.md`

#### 7.1 InitContainer Pattern

The agent maps naturally to Kubernetes using the same Tekton/Argo pattern:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: attune-worker-ruby
spec:
  replicas: 2
  selector:
    matchLabels:
      app: attune-worker-ruby
  template:
    metadata:
      labels:
        app: attune-worker-ruby
    spec:
      initContainers:
        - name: agent-loader
          image: attune/agent:latest  # Built from Dockerfile.agent, agent-init target
          command: ["cp", "/attune-agent", "/opt/attune/agent/attune-agent"]
          volumeMounts:
            - name: agent-bin
              mountPath: /opt/attune/agent
      containers:
        - name: worker
          image: ruby:3.3
          command: ["/opt/attune/agent/attune-agent"]
          env:
            - name: ATTUNE__DATABASE__URL
              valueFrom:
                secretKeyRef:
                  name: attune-secrets
                  key: database-url
            - name: ATTUNE__MESSAGE_QUEUE__URL
              valueFrom:
                secretKeyRef:
                  name: attune-secrets
                  key: mq-url
          volumeMounts:
            - name: agent-bin
              mountPath: /opt/attune/agent
              readOnly: true
            - name: packs
              mountPath: /opt/attune/packs
              readOnly: true
            - name: runtime-envs
              mountPath: /opt/attune/runtime_envs
            - name: artifacts
              mountPath: /opt/attune/artifacts
      volumes:
        - name: agent-bin
          emptyDir: {}
        - name: packs
          persistentVolumeClaim:
            claimName: attune-packs
        - name: runtime-envs
          persistentVolumeClaim:
            claimName: attune-runtime-envs
        - name: artifacts
          persistentVolumeClaim:
            claimName: attune-artifacts
```

#### 7.2 Helm Chart Values

```yaml
# values.yaml
agentWorkers:
  - name: ruby
    image: ruby:3.3
    replicas: 2
    runtimes: []  # auto-detect
  - name: python-gpu
    image: nvidia/cuda:12.3.1-runtime-ubuntu22.04
    replicas: 1
    runtimes: [python, shell]
    resources:
      limits:
        nvidia.com/gpu: 1
```

---

## Implementation Order & Effort Summary

| Phase | Description | Effort | Dependencies | Priority |
|-------|-------------|--------|--------------|----------|
| **Phase 1** | Static binary build infrastructure | 3–5 days | None | Critical |
| **Phase 3** | Refactor worker for code reuse | 2–3 days | Phase 1 | Critical |
| **Phase 2** | Runtime auto-detection | 1–2 days | Phase 1 | High |
| **Phase 4** | Docker Compose integration | 1 day | Phase 1 | High |
| **Phase 6** | DB runtime registry extensions | 1–2 days | Phase 2 | Medium |
| **Phase 5** | API binary download endpoint | 1 day | Phase 1 | Medium |
| **Phase 7** ✅ | Kubernetes manifests | 1–2 days | Phase 4, 5 | Complete |

**Total estimated effort: 10–16 days**

Phases 2 and 3 can be done in parallel. Phase 4 can start as soon as Phase 1 produces a working binary.

**Minimum viable feature**: Phases 1 + 3 + 4 (~6–9 days) produce a working agent that can be injected into any container via Docker Compose, with manual `ATTUNE_WORKER_RUNTIMES` configuration.
Auto-detection (Phase 2) and dynamic registration (Phase 6) add polish.

## Risks & Mitigations

### musl + Crate Compatibility

**Risk**: Some crates may not compile cleanly with `x86_64-unknown-linux-musl` due to C library dependencies.

**Impact**: Build failures or runtime issues.

**Mitigation**:

- SQLx already uses `rustls` (no OpenSSL dependency) ✅
- Switch `reqwest` and `tokio-tungstenite` to `rustls` features (Phase 1.1)
- `lapin` uses pure Rust AMQP — no C dependencies ✅
- Test the musl build early in Phase 1 to surface issues quickly
- If a specific crate is problematic, evaluate alternatives or use `cross` for cross-compilation

### DNS Resolution with musl

**Risk**: musl's DNS resolver behaves differently from glibc (no `/etc/nsswitch.conf`, limited mDNS support). This can cause DNS resolution failures in Docker networks.

**Impact**: The agent can't resolve `postgres`, `rabbitmq`, etc. by Docker service name.

**Mitigation**:

- Use the `trust-dns` (now `hickory-dns`) resolver feature in SQLx and reqwest instead of the system resolver
- Test DNS resolution in Docker Compose early
- If issues arise, document the workaround: use IP addresses or add `dns` configuration to the container

### Binary Size

**Risk**: A fully statically-linked binary with all worker dependencies could be 40MB+.

**Impact**: Slow volume population, slow download via API.

**Mitigation**:

- Strip debug symbols (`strip` command) — typically reduces size by 50–70%
- Use `opt-level = 'z'` and `lto = true` in the release profile
- Consider `upx` compression (trades CPU at startup for a smaller binary)
- Feature-gate unused functionality if size is excessive
- Target: <25MB stripped

### Non-root User Conflicts

**Risk**: Different base images run as different UIDs. The agent needs write access to the `runtime_envs` and `artifacts` volumes.

**Impact**: Permission-denied errors when the container UID doesn't match the volume owner.
**Mitigation**:

- Document the UID requirement (current standard: UID 1000)
- Provide guidance for running the agent as root with privilege drop
- Consider adding a `--user` flag to the agent that drops privileges after setup
- For Kubernetes, use `securityContext.runAsUser` in the Pod spec

### Existing Workers Must Keep Working

**Risk**: Refactoring `WorkerService` (Phase 3) could introduce regressions in existing workers.

**Impact**: Production workers break.

**Mitigation**:

- The refactoring is additive — existing code paths don't change behavior
- Run the full test suite after Phase 3
- Both `attune-worker` and `attune-agent` share the same test infrastructure
- The `StartupMode::Worker` path is the existing code path with no behavioral changes

### Volume Mount Ordering

**Risk**: The agent container starts before the `init-agent` service has populated the volume.

**Impact**: Agent binary not found; container crashes.

**Mitigation**:

- Use `depends_on: { init-agent: { condition: service_completed_successfully } }` in Docker Compose
- The wrapper script (Phase 5.2) retries with a short sleep
- For Kubernetes, the initContainer pattern guarantees ordering

## Testing Strategy

### Unit Tests

- Auto-detection module: mock filesystem and process execution to test interpreter discovery
- `StartupMode::Agent` code paths: ensure lazy setup and on-demand verification work correctly
- All existing worker tests continue to pass (regression safety net)

### Integration Tests

- Build the agent binary with musl and run it in various container images:
  - `ruby:3.3-slim` (Ruby + shell)
  - `python:3.12-slim` (Python + shell)
  - `node:20-slim` (Node.js + shell)
  - `alpine:3.20` (shell only)
  - `ubuntu:24.04` (shell only)
  - `debian:bookworm-slim` (shell only, matches current worker)
- Verify: agent starts, auto-detects runtimes, registers with correct capabilities, executes a simple action, reports results
- Verify: DNS resolution works for Docker service names

### Docker Compose Tests

- Spin up the full stack with agent-based workers alongside traditional workers
- Execute actions that target specific runtimes
- Verify the scheduler routes to the correct worker based on capabilities
- Verify graceful shutdown (SIGTERM handling)

### Binary Compatibility Tests

- Test the musl binary on: Alpine, Debian, Ubuntu, CentOS/Rocky, Amazon Linux
- Test on both x86_64 and aarch64 (if multi-arch build is implemented)
- Verify no glibc dependency: `ldd attune-agent` should report "not a dynamic executable"

## Future Enhancements

These are not part of the initial implementation but are natural extensions:

1. **Per-execution container isolation**: Instead of a long-running agent, spawn a fresh container per execution with the agent injected. Provides maximum isolation (each action runs in a clean environment) at the cost of startup latency.
2. **Container image selection in action YAML**: Allow actions to declare `container: ruby:3.3` in their YAML, and have the executor spin up an appropriate container with the agent injected. Similar to GitHub Actions' container actions.
3. **Warm pool**: Pre-start a pool of agent containers for common runtimes to reduce first-execution latency.
4. **Agent self-update**: The agent periodically checks for a newer version of itself (via the API endpoint) and restarts if updated.
5. **Windows support**: Cross-compile the agent for Windows (MSVC static linking) to support Windows containers.
6. **WebAssembly runtime**: Compile actions to WASM and execute them inside the agent using wasmtime, eliminating the need for interpreter binaries entirely.
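The interpreter discovery exercised by the unit and integration tests above can be sketched as a simple PATH scan. This is a hedged illustration, not the actual `runtime_detection` module: the function names and the candidate list are assumptions made for this example.

```rust
use std::path::PathBuf;

/// Search each PATH entry for a named interpreter binary.
/// Illustrative only; the real logic lives in crates/common/src/runtime_detection.rs.
fn find_on_path(binary: &str) -> Option<PathBuf> {
    let path = std::env::var_os("PATH")?;
    std::env::split_paths(&path)
        .map(|dir| dir.join(binary))
        .find(|candidate| candidate.is_file())
}

/// Map discovered interpreter binaries to runtime names
/// (hypothetical mapping: e.g. a ruby:3.3 image would yield ruby + shell).
fn detect_runtimes() -> Vec<&'static str> {
    let candidates = [
        ("sh", "shell"),
        ("python3", "python"),
        ("node", "nodejs"),
        ("ruby", "ruby"),
    ];
    let mut runtimes = Vec::new();
    for (binary, runtime) in candidates {
        if find_on_path(binary).is_some() {
            runtimes.push(runtime);
        }
    }
    runtimes
}

fn main() {
    // Every image in the integration matrix above ships at least a shell.
    println!("detected runtimes: {:?}", detect_runtimes());
}
```

In tests, the PATH lookup is the seam to mock: pointing `PATH` at a fixture directory lets the unit tests simulate any image's interpreter set without containers.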
## References

- Tekton Entrypoint: https://github.com/tektoncd/pipeline/tree/main/cmd/entrypoint
- Argo Emissary Executor: https://argoproj.github.io/argo-workflows/workflow-executors/
- GitLab Runner Docker Executor: https://docs.gitlab.com/runner/executors/docker.html
- Current worker containerization: `docs/worker-containerization.md`
- Current runtime detection: `crates/common/src/runtime_detection.rs`
- Worker service: `crates/worker/src/service.rs`
- Process executor: `crates/worker/src/runtime/process_executor.rs`
- Worker Dockerfile: `docker/Dockerfile.worker.optimized`