Universal Worker Agent Injection
Overview
This plan describes a new deployment model for Attune workers: a statically-linked agent binary that can be injected into any Docker container at runtime, turning arbitrary images into Attune workers. This eliminates the need to build custom worker Docker images for each runtime environment.
Problem
Today, every Attune worker is a purpose-built Docker image: the same attune-worker Rust binary baked into Debian images with specific interpreters installed (see docker/Dockerfile.worker.optimized). Adding a new runtime (e.g., Ruby, Go, Java, R) means:
- Modifying `Dockerfile.worker.optimized` to add a new build stage
- Installing the interpreter via apt or a package repository
- Managing the combinatorial explosion of worker variants
- Rebuilding images (~5 min) for every change
- Standardizing on `debian:bookworm-slim` as the base (not the runtime's official image)
Solution
Flip the model: any Docker image becomes an Attune worker by injecting a lightweight agent binary at container startup. The agent binary is a statically-linked (musl) Rust executable that connects to MQ/DB, consumes execution messages, spawns subprocesses, and reports results — functionally identical to the current worker, but packaged for universal deployment.
Want Ruby support? Point at `ruby:3.3` and you're done. Need a GPU runtime? Use `nvidia/cuda:12.3-runtime`. Need a specific Python version with scientific libraries pre-installed? Use any image that has them.
Industry Precedent
This pattern is battle-tested in major CI/CD and workflow systems:
| System | Pattern | How It Works |
|---|---|---|
| Tekton | InitContainer + shared volume | Copies a static Go entrypoint binary into an emptyDir; overrides the user container's entrypoint to use it. Steps coordinate via file-based signaling. |
| Argo Workflows (Emissary) | InitContainer + sidecar | The emissary binary runs as both an init container and a sidecar. Disk-based coordination, no Docker socket, no privileged access. |
| GitLab CI Runner (Step Runner) | Binary injection | Newer "Native Step Runner" mode injects a step-runner binary into the build container and adjusts $PATH. Communicates via gRPC. |
| Istio | Mutating webhook | Kubernetes admission controller adds init + sidecar containers transparently. |
The Tekton/Argo pattern (static binary + shared volume) is the best fit for Attune because:
- It works with Docker Compose (not K8s-only) via bind mounts / named volumes
- It requires zero dependencies in the user image (just a Linux kernel)
- A static Rust binary (musl-linked) is ~15–25MB and runs anywhere
- No privileged access, no Docker socket needed inside the container
Compatibility
This plan is purely additive. Nothing changes for existing workers:
- `Dockerfile.worker.optimized` and its four targets remain unchanged and functional
- Current `docker-compose.yaml` worker services keep working
- All MQ protocols, DB schemas, and execution flows remain identical
- The agent is just another way to run the same execution engine
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ Attune Control Plane │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ API │ │ Executor │ │ RabbitMQ │ │ Postgres │ │
│ └────┬─────┘ └────┬─────┘ └─────┬────┘ └────┬─────┘ │
│ │ │ │ │ │
└───────┼───────────────┼───────────────┼───────────────┼────────┘
│ │ │ │
┌─────┼───────────────┼───────────────┼───────────────┼──────┐
│ ▼ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ attune-agent (injected binary) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌─────────┐ ┌────────┐ │ │
│ │ │MQ Client │ │DB Client │ │ Process │ │Artifact│ │ │
│ │ │(lapin) │ │(sqlx) │ │Executor │ │Manager │ │ │
│ │ └──────────┘ └──────────┘ └─────────┘ └────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ User Container (ANY Docker image) │ │
│ │ ruby:3.3, python:3.12, nvidia/cuda, alpine, ... │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ Shared Volumes: │
│ /opt/attune/agent/ (agent binary, read-only) │
│ /opt/attune/packs/ (pack files, read-only) │
│ /opt/attune/runtime_envs/(virtualenvs, node_modules) │
│ /opt/attune/artifacts/ (artifact files) │
└────────────────────────────────────────────────────────────┘
Agent vs. Full Worker Comparison
The agent binary is functionally identical to the current attune-worker. The difference is packaging and startup behavior:
| Capability | Full Worker (`attune-worker`) | Agent (`attune-agent`) |
|---|---|---|
| MQ consumption | ✅ | ✅ |
| DB access | ✅ | ✅ |
| Process execution | ✅ | ✅ |
| Artifact management | ✅ | ✅ |
| Secret management | ✅ | ✅ |
| Cancellation / timeout | ✅ | ✅ |
| Heartbeat | ✅ | ✅ |
| Runtime env setup (venvs) | ✅ Proactive at startup | ✅ Lazy on first use |
| Version verification | ✅ Full sweep at startup | ✅ On-demand per execution |
| Runtime discovery | Manual (`ATTUNE_WORKER_RUNTIMES`) | Auto-detect + optional manual override |
| Linking | Dynamic (glibc) | Static (musl) |
| Base image requirement | `debian:bookworm-slim` | None (any Linux container) |
| Binary size | ~30–50MB | ~15–25MB (stripped, musl) |
Binary Distribution Methods
Two methods for getting the agent binary into a container:
Method A: Shared Volume (Docker Compose — recommended)
An init container copies the agent binary into a Docker named volume. User containers mount this volume read-only and use the binary as their entrypoint.
Method B: HTTP Download (remote / cloud deployments)
A new API endpoint (GET /api/v1/agent/binary) serves the static binary. A small wrapper script in the container downloads it on first run. Useful for Kubernetes, ECS, or remote Docker hosts where shared volumes are impractical.
Implementation Phases
Phase 1: Static Binary Build Infrastructure
Goal: Produce a statically-linked attune-agent binary that runs in any Linux container.
Effort: 3–5 days
Dependencies: None
1.1 TLS Backend Audit and Alignment
The agent must link statically with musl. This requires all TLS to use rustls (pure Rust) instead of OpenSSL/native-tls.
Current state (from Cargo.toml workspace dependencies):
- `sqlx`: already uses `runtime-tokio-rustls` ✅
- `reqwest`: uses default features (native-tls); needs the `rustls-tls` feature ❌
- `tokio-tungstenite`: uses the `native-tls` feature; needs `rustls` ❌
- `lapin` (v4.3): uses native-tls by default; needs the `rustls` feature ❌
Changes needed in workspace Cargo.toml:
# Change reqwest to use rustls
reqwest = { version = "0.13", features = ["json", "rustls-tls"], default-features = false }
# Change tokio-tungstenite to use rustls
tokio-tungstenite = { version = "0.28", features = ["rustls"] }
# Check lapin's TLS features — if using amqps://, need rustls support.
# For plain amqp:// (typical in Docker Compose), no TLS needed.
# For production amqps://, evaluate lapin's rustls support or use a TLS-terminating proxy.
Important: These changes affect the entire workspace. Test all services (api, executor, worker, notifier, sensor, cli) after switching TLS backends. If switching workspace-wide is too disruptive, use feature flags to conditionally select the TLS backend for the agent build only.
Alternative: If workspace-wide rustls migration is too risky, the agent crate can override specific dependencies:
[dependencies]
reqwest = { workspace = true, default-features = false, features = ["json", "rustls-tls"] }
1.2 New Crate or New Binary Target
Option A (recommended): New binary target in the worker crate
Add a second binary target to crates/worker/Cargo.toml:
[[bin]]
name = "attune-worker"
path = "src/main.rs"
[[bin]]
name = "attune-agent"
path = "src/agent_main.rs"
This reuses all existing code — ActionExecutor, ProcessRuntime, WorkerService, RuntimeRegistry, SecretManager, ArtifactManager, etc. The agent entrypoint is a thin wrapper with different startup behavior (auto-detection instead of manual config).
Pros: Zero code duplication. Same test suite covers both binaries. Cons: The agent binary includes unused code paths (e.g., full worker service setup).
Option B: New crate crates/agent/
crates/agent/
├── Cargo.toml # Depends on attune-common + selected worker modules
├── src/
│ ├── main.rs # Entry point
│ ├── agent.rs # Core agent loop
│ ├── detect.rs # Runtime auto-detection
│ └── health.rs # Health check (file-based or tiny HTTP)
Pros: Cleaner separation, can minimize binary size by excluding unused deps. Cons: Requires extracting shared execution code into a library or duplicating it.
Recommendation: Start with Option A (new binary target in worker crate) for speed. Refactor into a separate crate later if binary size becomes a concern.
1.3 Agent Entrypoint (src/agent_main.rs)
The agent entrypoint differs from main.rs in:
- Runtime auto-detection instead of relying on `ATTUNE_WORKER_RUNTIMES`
- Lazy environment setup instead of a proactive startup sweep
- Simplified config loading: env vars are the primary config source (no config file required, but supported if mounted)
- Container-aware defaults: sensible defaults for paths, timeouts, and concurrency
src/agent_main.rs responsibilities:
1. Parse CLI args / env vars for DB URL, MQ URL, worker name
2. Run runtime auto-detection (Phase 2) to discover available interpreters
3. Initialize WorkerService with detected capabilities
4. Start the normal execution consumer loop
5. Handle SIGTERM/SIGINT for graceful shutdown
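Step 1 of the startup sequence can be sketched as a small config resolver. The variable names match the compose examples in this plan; the default values and the `HOSTNAME` fallback are illustrative assumptions, not the final contract:

```rust
use std::env;

/// Sketch: resolve agent configuration from environment variables with
/// container-friendly defaults (defaults here are assumptions).
#[derive(Debug)]
struct AgentConfig {
    database_url: String,
    mq_url: String,
    worker_name: String,
}

fn agent_config_from_env() -> AgentConfig {
    let var = |key: &str, default: &str| env::var(key).unwrap_or_else(|_| default.to_string());
    AgentConfig {
        database_url: var("ATTUNE__DATABASE__URL", "postgres://attune@postgres/attune"),
        mq_url: var("ATTUNE__MESSAGE_QUEUE__URL", "amqp://rabbitmq:5672"),
        // Fall back to the container hostname so each replica gets a stable name.
        worker_name: var("ATTUNE_WORKER_NAME", &var("HOSTNAME", "attune-agent")),
    }
}
```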
1.4 Dockerfile for Agent Binary
Create docker/Dockerfile.agent:
# Stage 1: Build the statically-linked agent binary
FROM rust:1.83-bookworm AS builder
RUN apt-get update && apt-get install -y musl-tools
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /build
ENV RUST_MIN_STACK=67108864
COPY Cargo.toml Cargo.lock ./
COPY crates/ ./crates/
COPY migrations/ ./migrations/
COPY .sqlx/ ./.sqlx/
# Build only the agent binary, statically linked
RUN --mount=type=cache,target=/usr/local/cargo/registry,sharing=shared \
--mount=type=cache,target=/usr/local/cargo/git,sharing=shared \
--mount=type=cache,id=agent-target,target=/build/target \
SQLX_OFFLINE=true cargo build --release \
--target x86_64-unknown-linux-musl \
--bin attune-agent \
&& cp /build/target/x86_64-unknown-linux-musl/release/attune-agent /attune-agent \
&& strip /attune-agent
# Stage 2: Minimal image for volume population
FROM scratch AS agent-binary
COPY --from=builder /attune-agent /attune-agent
Multi-architecture support: For ARM64 (Apple Silicon, Graviton), add a parallel build stage targeting aarch64-unknown-linux-musl. Use Docker buildx multi-platform builds or separate images.
1.5 Makefile Targets
Add to Makefile:
build-agent:
SQLX_OFFLINE=true cargo build --release --target x86_64-unknown-linux-musl --bin attune-agent
strip target/x86_64-unknown-linux-musl/release/attune-agent
docker-build-agent:
docker buildx build -f docker/Dockerfile.agent -t attune-agent:latest .
Phase 2: Runtime Auto-Detection
Goal: The agent automatically discovers what interpreters are available in the container, without requiring ATTUNE_WORKER_RUNTIMES to be set.
Effort: 1–2 days
Dependencies: Phase 1 (agent binary exists)
2.1 Interpreter Discovery Module
Create a new module (in crates/worker/src/ or crates/common/src/) that probes the container's filesystem for known interpreters:
src/runtime_detect.rs (or extend existing crates/common/src/runtime_detection.rs)
struct DetectedInterpreter {
runtime_name: String, // "python", "ruby", "node", etc.
binary_path: PathBuf, // "/usr/local/bin/python3"
version: Option<String>, // "3.12.1" (parsed from version command output)
}
/// Probe the container for available interpreters.
///
/// For each known runtime, checks common binary names via `which` or
/// direct path existence, then runs the version command to extract
/// the version string.
fn detect_interpreters() -> Vec<DetectedInterpreter> {
let probes = [
InterpreterProbe {
runtime_name: "python",
binaries: &["python3", "python"],
version_flag: "--version",
version_regex: r"Python (\d+\.\d+\.\d+)",
},
InterpreterProbe {
runtime_name: "node",
binaries: &["node", "nodejs"],
version_flag: "--version",
version_regex: r"v(\d+\.\d+\.\d+)",
},
InterpreterProbe {
runtime_name: "ruby",
binaries: &["ruby"],
version_flag: "--version",
version_regex: r"ruby (\d+\.\d+\.\d+)",
},
InterpreterProbe {
runtime_name: "go",
binaries: &["go"],
version_flag: "version",
version_regex: r"go(\d+\.\d+\.\d+)",
},
InterpreterProbe {
runtime_name: "java",
binaries: &["java"],
version_flag: "-version",
version_regex: r#""(\d+[\.\d+]*)""#,
},
InterpreterProbe {
runtime_name: "perl",
binaries: &["perl"],
version_flag: "--version",
version_regex: r"v(\d+\.\d+\.\d+)",
},
InterpreterProbe {
runtime_name: "r",
binaries: &["Rscript", "R"],
version_flag: "--version",
version_regex: r"R.*version (\d+\.\d+\.\d+)",
},
InterpreterProbe {
runtime_name: "shell",
binaries: &["bash", "sh"],
version_flag: "--version",
version_regex: r"(\d+\.\d+\.\d+)",
},
];
// For each probe:
// 1. Run `which <binary>` or check known paths
// 2. If found, run `<binary> <version_flag>` with a short timeout (2s)
// 3. Parse version from output using the regex
// 4. Return DetectedInterpreter with the results
}
Integration with existing code: The existing crates/common/src/runtime_detection.rs already has normalize_runtime_name() and alias groups. The auto-detection module should use these for matching detected interpreters against DB runtime records.
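The probe loop sketched in the comments above could look like the following. It is a dependency-free sketch: a plain digit scan stands in for the per-runtime regexes, the struct is restated so the example is self-contained, and the 2 s timeout is omitted because std `Command` has none:

```rust
use std::process::Command;

#[derive(Debug)]
struct DetectedInterpreter {
    runtime_name: String,
    binary_path: String,
    version: Option<String>,
}

/// Extract the first `X.Y.Z`-shaped token from version-command output.
fn parse_semver(output: &str) -> Option<String> {
    output
        .split(|c: char| !(c.is_ascii_digit() || c == '.'))
        .filter(|t| !t.is_empty())
        .find(|t| {
            t.matches('.').count() == 2
                && t.split('.').all(|p| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit()))
        })
        .map(|t| t.to_string())
}

fn probe(runtime_name: &str, binaries: &[&str], version_flag: &str) -> Option<DetectedInterpreter> {
    for bin in binaries {
        // Step 1: resolve the binary on PATH; skip names that are absent.
        let which = Command::new("which").arg(bin).output().ok()?;
        if !which.status.success() {
            continue;
        }
        let binary_path = String::from_utf8_lossy(&which.stdout).trim().to_string();
        // Step 2: run the version command; capture stderr too (java prints there).
        let out = Command::new(bin).arg(version_flag).output().ok()?;
        let text = format!(
            "{}{}",
            String::from_utf8_lossy(&out.stdout),
            String::from_utf8_lossy(&out.stderr)
        );
        // Step 3: parse the version and report the detection result.
        return Some(DetectedInterpreter {
            runtime_name: runtime_name.to_string(),
            binary_path,
            version: parse_semver(&text),
        });
    }
    None
}
```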
2.2 Integration with Worker Registration
The agent startup sequence:
1. Run `detect_interpreters()`
2. Match detected interpreters against known runtimes in the `runtime` table (using alias-aware matching from `runtime_detection.rs`)
3. If `ATTUNE_WORKER_RUNTIMES` is set, use it as an override (intersection vs. union is TBD; override likely wins)
4. Register the worker with the detected/configured capabilities
5. Log what was detected for debugging:
[INFO] Detected runtimes: python 3.12.1 (/usr/local/bin/python3), ruby 3.3.0 (/usr/local/bin/ruby), shell 5.2.21 (/bin/bash)
[INFO] Registering worker with capabilities: [python, ruby, shell]
2.3 Runtime Hints File (Optional Enhancement)
Allow a .attune-runtime.yaml file in the container that declares runtime capabilities and custom configuration. This handles cases where auto-detection isn't sufficient (e.g., custom interpreters, non-standard paths, special environment setup).
# /opt/attune/.attune-runtime.yaml (or /.attune-runtime.yaml)
runtimes:
- name: ruby
interpreter: /usr/local/bin/ruby
file_extension: .rb
version_command: "ruby --version"
env_setup:
create_command: "mkdir -p {env_dir}"
install_command: "cd {env_dir} && bundle install --gemfile {pack_dir}/Gemfile"
- name: custom-ml
interpreter: /opt/conda/bin/python
file_extension: .py
version_command: "/opt/conda/bin/python --version"
The agent checks for this file at startup and merges it with auto-detected runtimes (hints file takes precedence for conflicting runtime names).
This is a nice-to-have for Phase 2 — implement only if auto-detection proves insufficient for common use cases.
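The precedence rule described above is simple to sketch. Plain interpreter paths stand in for the full runtime configs to keep the example small:

```rust
use std::collections::HashMap;

/// Sketch: entries from the hints file overwrite auto-detected runtimes
/// with the same name.
fn merge_runtimes(
    detected: HashMap<String, String>,
    hints: HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = detected;
    merged.extend(hints); // `extend` overwrites existing keys, so hints win
    merged
}
```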
Phase 3: Refactor Worker for Code Reuse
Goal: Ensure the execution engine is cleanly reusable between the full attune-worker and the attune-agent binary, without code duplication.
Effort: 2–3 days
Dependencies: Phase 1 (agent entrypoint exists), can be done in parallel with Phase 2
3.1 Identify Shared vs. Agent-Specific Code
Current worker crate modules and their reuse status:
| Module | File(s) | Shared? | Notes |
|---|---|---|---|
| `ActionExecutor` | `executor.rs` | ✅ Fully shared | Core execution orchestration |
| `ProcessRuntime` | `runtime/process.rs` | ✅ Fully shared | Subprocess spawning, interpreter resolution |
| `process_executor` | `runtime/process_executor.rs` | ✅ Fully shared | Streaming output capture, timeout, cancellation |
| `NativeRuntime` | `runtime/native.rs` | ✅ Fully shared | Direct binary execution |
| `LocalRuntime` | `runtime/local.rs` | ✅ Fully shared | Fallback runtime facade |
| `RuntimeRegistry` | `runtime/mod.rs` | ✅ Fully shared | Runtime selection and registration |
| `ExecutionContext` | `runtime/mod.rs` | ✅ Fully shared | Execution parameters, env vars, secrets |
| `BoundedLogWriter` | `runtime/log_writer.rs` | ✅ Fully shared | Streaming log capture with size limits |
| `parameter_passing` | `runtime/parameter_passing.rs` | ✅ Fully shared | Stdin/file/env parameter delivery |
| `SecretManager` | `secrets.rs` | ✅ Fully shared | Secret decryption via `attune_common::crypto` |
| `ArtifactManager` | `artifacts.rs` | ✅ Fully shared | Artifact finalization (file stat, size update) |
| `HeartbeatManager` | `heartbeat.rs` | ✅ Fully shared | Periodic DB heartbeat |
| `WorkerRegistration` | `registration.rs` | ✅ Shared, extended | Needs auto-detection integration |
| `env_setup` | `env_setup.rs` | ✅ Shared, lazy mode | Agent uses lazy setup instead of proactive |
| `version_verify` | `version_verify.rs` | ✅ Shared, on-demand mode | Agent verifies on demand instead of a full sweep |
| `WorkerService` | `service.rs` | ⚠️ Needs refactoring | Extract a reusable `AgentService` or parameterize |
Conclusion: Almost everything is already reusable. The main work is in service.rs, which needs to be parameterized for the two startup modes (proactive vs. lazy).
3.2 Refactor WorkerService for Dual Modes
Instead of duplicating WorkerService, add a configuration enum:
// In service.rs or a new config module
/// Controls how the worker initializes its runtime environment.
pub enum StartupMode {
/// Full worker mode: proactive environment setup, full version
/// verification sweep at startup. Used by `attune-worker`.
Worker,
/// Agent mode: lazy environment setup (on first use), on-demand
/// version verification, auto-detected runtimes. Used by `attune-agent`.
Agent {
/// Runtimes detected by the auto-detection module.
detected_runtimes: Vec<DetectedInterpreter>,
},
}
The WorkerService::start() method checks this mode:
match &self.startup_mode {
StartupMode::Worker => {
// Existing behavior: full version verification sweep
self.verify_all_runtime_versions().await?;
// Existing behavior: proactive environment setup for all packs
self.setup_all_environments().await?;
}
StartupMode::Agent { .. } => {
// Skip proactive setup — will happen lazily on first execution
info!("Agent mode: deferring environment setup to first execution");
}
}
3.3 Lazy Environment Setup
In agent mode, the first execution for a given pack+runtime combination triggers environment setup:
// In executor.rs, within execute_with_cancel()
// Before executing, ensure the runtime environment exists
if !env_dir.exists() {
info!("Creating runtime environment on first use: {}", env_dir.display());
self.env_setup.setup_environment(&pack_ref, &runtime_name, &env_dir).await?;
}
The current worker already handles this partially — the ProcessRuntime::execute() method has auto-repair logic for broken venvs. The lazy setup extends this to handle the case where the env directory doesn't exist at all.
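One race the snippet above glosses over: two executions for the same pack+runtime can arrive concurrently, and both would see a missing env directory. A hypothetical in-process guard (not in the current codebase) that serializes setup per env directory might look like this; it uses blocking std primitives for brevity, where the real async worker would use tokio equivalents:

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::{Condvar, Mutex};

/// Hypothetical guard: at most one caller runs setup per env directory;
/// concurrent callers for the same directory block until it finishes.
struct EnvSetupGuard {
    in_progress: Mutex<HashSet<PathBuf>>,
    done: Condvar,
}

impl EnvSetupGuard {
    fn new() -> Self {
        Self { in_progress: Mutex::new(HashSet::new()), done: Condvar::new() }
    }

    fn setup_once<F: FnOnce()>(&self, env_dir: &Path, setup: F) {
        let mut guard = self.in_progress.lock().unwrap();
        // Wait while another caller is setting up this directory.
        while guard.contains(env_dir) {
            guard = self.done.wait(guard).unwrap();
        }
        if env_dir.exists() {
            return; // another caller finished setup while we waited
        }
        guard.insert(env_dir.to_path_buf());
        drop(guard); // release the lock while running the (slow) setup
        setup();
        self.in_progress.lock().unwrap().remove(env_dir);
        self.done.notify_all();
    }
}
```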
Phase 4: Docker Compose Integration
Goal: Make it trivial to add agent-based workers to docker-compose.yaml.
Effort: 1 day
Dependencies: Phase 1 (agent binary and Dockerfile exist)
4.1 Init Service for Agent Volume
Add to docker-compose.yaml:
services:
# Populates the agent binary volume (runs once)
init-agent:
build:
context: .
dockerfile: docker/Dockerfile.agent
volumes:
- agent_bin:/opt/attune/agent
entrypoint: ["/bin/sh", "-c", "cp /attune-agent /opt/attune/agent/attune-agent && chmod +x /opt/attune/agent/attune-agent"]
restart: "no"
networks:
- attune
volumes:
agent_bin: # Named volume holding the static agent binary
Note: The init-agent service needs a minimal base with /bin/sh for the cp command. Since the agent Dockerfile's final stage is FROM scratch, the init service should use the builder stage or a separate FROM alpine stage.
Revised Dockerfile.agent approach — use Alpine for the init image so it has a shell:
# Stage 1: Build
FROM rust:1.83-bookworm AS builder
# ... (build steps from Phase 1.4)
# Stage 2: Init image (has a shell for cp)
FROM alpine:3.20 AS agent-init
COPY --from=builder /attune-agent /attune-agent
# Default command copies the binary into the mounted volume
CMD ["cp", "/attune-agent", "/opt/attune/agent/attune-agent"]
# Stage 3: Bare binary (for HTTP download or direct use)
FROM scratch AS agent-binary
COPY --from=builder /attune-agent /attune-agent
4.2 Agent-Based Worker Services
Example services that can be added to docker-compose.yaml or a user's docker-compose.override.yaml:
# Ruby worker — uses the official Ruby image
worker-ruby:
image: ruby:3.3-slim
depends_on:
init-agent:
condition: service_completed_successfully
postgres:
condition: service_healthy
rabbitmq:
condition: service_healthy
entrypoint: ["/opt/attune/agent/attune-agent"]
volumes:
- agent_bin:/opt/attune/agent:ro
- packs_data:/opt/attune/packs:ro
- runtime_envs:/opt/attune/runtime_envs
- artifacts_data:/opt/attune/artifacts
- ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
environment:
ATTUNE_WORKER_NAME: worker-ruby-1
# ATTUNE_WORKER_RUNTIMES omitted — auto-detected as ruby,shell
networks:
- attune
restart: unless-stopped
stop_grace_period: 45s
# R worker — uses the official R base image
worker-r:
image: r-base:4.4.0
depends_on:
init-agent:
condition: service_completed_successfully
postgres:
condition: service_healthy
rabbitmq:
condition: service_healthy
entrypoint: ["/opt/attune/agent/attune-agent"]
volumes:
- agent_bin:/opt/attune/agent:ro
- packs_data:/opt/attune/packs:ro
- runtime_envs:/opt/attune/runtime_envs
- artifacts_data:/opt/attune/artifacts
- ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
environment:
ATTUNE_WORKER_NAME: worker-r-1
networks:
- attune
restart: unless-stopped
# GPU worker — NVIDIA CUDA image with Python
worker-gpu:
image: nvidia/cuda:12.3.1-runtime-ubuntu22.04
depends_on:
init-agent:
condition: service_completed_successfully
postgres:
condition: service_healthy
rabbitmq:
condition: service_healthy
entrypoint: ["/opt/attune/agent/attune-agent"]
runtime: nvidia
volumes:
- agent_bin:/opt/attune/agent:ro
- packs_data:/opt/attune/packs:ro
- runtime_envs:/opt/attune/runtime_envs
- artifacts_data:/opt/attune/artifacts
- ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
environment:
ATTUNE_WORKER_NAME: worker-gpu-1
ATTUNE_WORKER_RUNTIMES: python,shell # Manual override (image has python pre-installed)
networks:
- attune
restart: unless-stopped
4.3 User Experience Summary
Adding a new runtime to an Attune deployment becomes a short (~15-line) service entry in `docker-compose.override.yaml`:
services:
worker-my-runtime:
image: my-org/my-custom-image:latest
depends_on:
init-agent:
condition: service_completed_successfully
postgres:
condition: service_healthy
rabbitmq:
condition: service_healthy
entrypoint: ["/opt/attune/agent/attune-agent"]
volumes:
- agent_bin:/opt/attune/agent:ro
- packs_data:/opt/attune/packs:ro
- runtime_envs:/opt/attune/runtime_envs
- artifacts_data:/opt/attune/artifacts
- ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
networks:
- attune
No Dockerfiles. No rebuilds. No waiting for Rust compilation. Start to finish in seconds.
Phase 5: API Binary Download Endpoint
Goal: Support deployments where shared Docker volumes are impractical (Kubernetes, ECS, remote Docker hosts).
Effort: 1 day
Dependencies: Phase 1 (agent binary exists)
5.1 New API Route
Add to crates/api/src/routes/:
GET /api/v1/agent/binary
GET /api/v1/agent/binary?arch=x86_64 (default)
GET /api/v1/agent/binary?arch=aarch64
Response: application/octet-stream
Headers: Content-Disposition: attachment; filename="attune-agent"
The API serves the binary from a configurable filesystem path (e.g., /opt/attune/agent/attune-agent). The binary can be placed there at build time (baked into the API image) or mounted via volume.
Configuration (config.yaml):
agent:
binary_dir: /opt/attune/agent # Directory containing agent binaries
# Files expected: attune-agent-x86_64, attune-agent-aarch64
OpenAPI documentation via utoipa:
#[utoipa::path(
get,
path = "/api/v1/agent/binary",
params(("arch" = Option<String>, Query, description = "Target architecture (x86_64, aarch64)")),
responses(
(status = 200, description = "Agent binary", content_type = "application/octet-stream"),
(status = 404, description = "Binary not found for requested architecture"),
),
tag = "agent"
)]
Authentication: This endpoint should be unauthenticated or use a simple shared token, since the agent needs to download the binary before it can authenticate. Alternatively, require basic auth or a bootstrap token passed via environment variable.
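Because the endpoint maps a client-supplied `arch` value onto a filesystem path, the handler should allowlist architectures rather than interpolate the parameter into the path. A hypothetical helper (names are illustrative) sketches this:

```rust
use std::path::PathBuf;

/// Hypothetical helper: map the `arch` query parameter onto a file inside
/// `binary_dir` via an allowlist, so a crafted value such as
/// `../../etc/passwd` can never escape the configured directory.
fn agent_binary_path(binary_dir: &str, arch: Option<&str>) -> Option<PathBuf> {
    let file = match arch.unwrap_or("x86_64") {
        "x86_64" => "attune-agent-x86_64",
        "aarch64" => "attune-agent-aarch64",
        _ => return None, // unknown architecture → respond 404
    };
    Some(PathBuf::from(binary_dir).join(file))
}
```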
5.2 Bootstrap Wrapper Script
Provide scripts/attune-agent-wrapper.sh for use as a container entrypoint:
#!/bin/sh
# attune-agent-wrapper.sh — Bootstrap the Attune agent in any container
set -e
AGENT_DIR="${ATTUNE_AGENT_DIR:-/opt/attune/agent}"
AGENT_BIN="$AGENT_DIR/attune-agent"
AGENT_URL="${ATTUNE_AGENT_URL:-http://attune-api:8080/api/v1/agent/binary}"
# Use volume-mounted binary if available, otherwise download
if [ ! -x "$AGENT_BIN" ]; then
echo "[attune] Agent binary not found at $AGENT_BIN, downloading from $AGENT_URL..."
mkdir -p "$AGENT_DIR"
if command -v wget >/dev/null 2>&1; then
wget -q -O "$AGENT_BIN" "$AGENT_URL"
elif command -v curl >/dev/null 2>&1; then
curl -sL "$AGENT_URL" -o "$AGENT_BIN"
else
echo "[attune] ERROR: Neither wget nor curl available. Cannot download agent." >&2
exit 1
fi
chmod +x "$AGENT_BIN"
echo "[attune] Agent binary downloaded successfully."
fi
echo "[attune] Starting agent..."
exec "$AGENT_BIN" "$@"
Usage:
# In docker-compose or K8s — when volume mount isn't available
worker-remote:
image: python:3.12-slim
entrypoint: ["/opt/attune/scripts/attune-agent-wrapper.sh"]
volumes:
- ./scripts/attune-agent-wrapper.sh:/opt/attune/scripts/attune-agent-wrapper.sh:ro
environment:
ATTUNE_AGENT_URL: http://attune-api:8080/api/v1/agent/binary
Phase 6: Database & Runtime Registry Extensions
Goal: Support arbitrary runtimes without requiring every possible runtime to be pre-registered in the DB.
Effort: 1–2 days
Dependencies: Phase 2 (auto-detection working)
6.1 Extended Runtime Detection Metadata
Add a migration to support auto-detected runtimes:
-- Migration: NNNNNN_agent_runtime_detection.sql
-- Track whether a runtime was auto-registered by an agent
ALTER TABLE runtime ADD COLUMN IF NOT EXISTS auto_detected BOOLEAN NOT NULL DEFAULT FALSE;
-- Store detection configuration for auto-discovered runtimes
-- Example: { "binaries": ["ruby", "ruby3.2"], "version_command": "--version",
-- "version_regex": "ruby (\\d+\\.\\d+\\.\\d+)" }
ALTER TABLE runtime ADD COLUMN IF NOT EXISTS detection_config JSONB NOT NULL DEFAULT '{}';
6.2 Runtime Template Packs
Ship pre-configured runtime definitions for common languages in the core pack (or a new runtimes pack). These are registered during pack loading and provide the execution_config that auto-detected interpreters need.
Add runtime YAML files for new languages:
packs/core/runtimes/ruby.yaml
packs/core/runtimes/go.yaml
packs/core/runtimes/java.yaml
packs/core/runtimes/perl.yaml
packs/core/runtimes/r.yaml
Example ruby.yaml:
ref: core.ruby
name: Ruby
label: Ruby Runtime
description: Execute Ruby scripts
execution_config:
interpreter:
binary: ruby
file_extension: .rb
env_vars:
GEM_HOME: "{env_dir}/gems"
GEM_PATH: "{env_dir}/gems"
BUNDLE_PATH: "{env_dir}/gems"
environment:
create_command: "mkdir -p {env_dir}/gems"
install_command: "cd {pack_dir} && GEM_HOME={env_dir}/gems bundle install --quiet 2>/dev/null || true"
dependency_file: Gemfile
6.3 Dynamic Runtime Registration
When the agent detects an interpreter that matches a runtime template (by name/alias) but the runtime doesn't exist in the DB yet, the agent can auto-register it:
- Look up the runtime by name in the DB using alias-aware matching
- If found → use it (existing behavior)
- If not found → check if a runtime template exists in loaded packs
- If a template is found → register the runtime using the template's `execution_config`
- If no template → register a minimal runtime with just the detected interpreter binary path
- Mark auto-registered runtimes with `auto_detected = true`
This ensures the agent can work with new runtimes immediately, even if the runtime hasn't been explicitly configured.
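The decision flow above reduces to a small pure function. The enum and names here are illustrative placeholders, not types from the codebase:

```rust
/// Outcome of resolving a detected interpreter against the DB and packs.
#[derive(Debug, PartialEq)]
enum Registration {
    UseExisting,         // runtime already in the DB
    RegisterFromTemplate, // template pack provides execution_config
    RegisterMinimal,      // only the interpreter path is known
}

/// Sketch of the registration decision (auto-registered rows would also
/// get `auto_detected = true` when written to the DB).
fn resolve_runtime(in_db: bool, template_in_packs: bool) -> Registration {
    match (in_db, template_in_packs) {
        (true, _) => Registration::UseExisting,
        (false, true) => Registration::RegisterFromTemplate,
        (false, false) => Registration::RegisterMinimal,
    }
}
```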
Phase 7: Kubernetes Support ✅
Status: Complete
Goal: Provide Kubernetes manifests and Helm chart support for agent-based workers.
Effort: 1–2 days
Dependencies: Phase 4 (Docker Compose working), Phase 5 (binary download)
Implemented:
- Helm chart `agent-workers.yaml` template: creates a Deployment per `agentWorkers[]` entry
- InitContainer pattern (`agent-loader`) copies the statically-linked binary via an `emptyDir` volume
- Full scheduling support: `nodeSelector`, `tolerations`, `runtimeClassName` (GPU/nvidia)
- Runtime auto-detect by default; explicit `runtimes` list override
- Custom env vars, resource limits, log level, termination grace period
- `images.agent` added to `values.yaml` for registry-aware image resolution
- `attune-agent` image added to the Gitea Actions publish workflow (`agent-init` target)
- `NOTES.txt` updated to list enabled agent workers on install
- Quick-reference docs at `docs/QUICKREF-kubernetes-agent-workers.md`
7.1 InitContainer Pattern
The agent maps naturally to Kubernetes using the same Tekton/Argo pattern:
apiVersion: apps/v1
kind: Deployment
metadata:
name: attune-worker-ruby
spec:
replicas: 2
selector:
matchLabels:
app: attune-worker-ruby
template:
metadata:
labels:
app: attune-worker-ruby
spec:
initContainers:
- name: agent-loader
image: attune/agent:latest # Built from Dockerfile.agent, agent-init target
command: ["cp", "/attune-agent", "/opt/attune/agent/attune-agent"]
volumeMounts:
- name: agent-bin
mountPath: /opt/attune/agent
containers:
- name: worker
image: ruby:3.3
command: ["/opt/attune/agent/attune-agent"]
env:
- name: ATTUNE__DATABASE__URL
valueFrom:
secretKeyRef:
name: attune-secrets
key: database-url
- name: ATTUNE__MESSAGE_QUEUE__URL
valueFrom:
secretKeyRef:
name: attune-secrets
key: mq-url
volumeMounts:
- name: agent-bin
mountPath: /opt/attune/agent
readOnly: true
- name: packs
mountPath: /opt/attune/packs
readOnly: true
- name: runtime-envs
mountPath: /opt/attune/runtime_envs
- name: artifacts
mountPath: /opt/attune/artifacts
volumes:
- name: agent-bin
emptyDir: {}
- name: packs
persistentVolumeClaim:
claimName: attune-packs
- name: runtime-envs
persistentVolumeClaim:
claimName: attune-runtime-envs
- name: artifacts
persistentVolumeClaim:
claimName: attune-artifacts
7.2 Helm Chart Values
# values.yaml (future Helm chart)
workers:
- name: ruby
image: ruby:3.3
replicas: 2
runtimes: [] # auto-detect
- name: python-gpu
image: nvidia/cuda:12.3.1-runtime-ubuntu22.04
replicas: 1
runtimes: [python, shell]
resources:
limits:
nvidia.com/gpu: 1
Implementation Order & Effort Summary
| Phase | Description | Effort | Dependencies | Priority |
|---|---|---|---|---|
| Phase 1 | Static binary build infrastructure | 3–5 days | None | Critical |
| Phase 3 | Refactor worker for code reuse | 2–3 days | Phase 1 | Critical |
| Phase 2 | Runtime auto-detection | 1–2 days | Phase 1 | High |
| Phase 4 | Docker Compose integration | 1 day | Phase 1 | High |
| Phase 6 | DB runtime registry extensions | 1–2 days | Phase 2 | Medium |
| Phase 5 | API binary download endpoint | 1 day | Phase 1 | Medium |
| Phase 7 ✅ | Kubernetes manifests | 1–2 days | Phase 4, 5 | Complete |
Total estimated effort: 10–16 days
Phases 2 and 3 can be done in parallel. Phase 4 can start as soon as Phase 1 produces a working binary.
Minimum viable feature: Phases 1 + 3 + 4 (~6–9 days) produce a working agent that can be injected into any container via Docker Compose, with manual ATTUNE_WORKER_RUNTIMES configuration. Auto-detection (Phase 2) and dynamic registration (Phase 6) add polish.
Risks & Mitigations
musl + Crate Compatibility
Risk: Some crates may not compile cleanly with x86_64-unknown-linux-musl due to C library dependencies.
Impact: Build failures or runtime issues.
Mitigation:
- SQLx already uses `rustls` (no OpenSSL dependency) ✅
- Switch `reqwest` and `tokio-tungstenite` to `rustls` features (Phase 1.1)
- `lapin` uses pure Rust AMQP — no C dependencies ✅
- Test the musl build early in Phase 1 to surface issues quickly
- If a specific crate is problematic, evaluate alternatives or use `cross` for cross-compilation
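A minimal sketch of the cross-compilation setup (the `.cargo/config.toml` contents here are an assumption; `+crt-static` is already the default for musl targets, but stating it makes the intent explicit):

```toml
# .cargo/config.toml — hypothetical defaults for building the agent.
[build]
target = "x86_64-unknown-linux-musl"

[target.x86_64-unknown-linux-musl]
# Statically link the C runtime so the binary runs on any distro.
rustflags = ["-C", "target-feature=+crt-static"]
```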
DNS Resolution with musl
Risk: musl's DNS resolver behaves differently from glibc (no /etc/nsswitch.conf, limited mDNS support). This can cause DNS resolution failures in Docker networks.
Impact: Agent can't resolve postgres, rabbitmq, etc. by Docker service name.
Mitigation:
- Use the `trust-dns` (now `hickory-dns`) resolver feature in SQLx and reqwest instead of the system resolver
- Test DNS resolution in Docker Compose early
- If issues arise, document the workaround: use IP addresses or add `dns` configuration to the container
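A sketch of the dependency change, assuming current crate feature names (`hickory-dns` was previously named `trust-dns` in reqwest; exact names vary by crate version):

```toml
# Cargo.toml — opt out of the system resolver and OpenSSL (feature names are assumptions).
[dependencies]
reqwest = { version = "0.12", default-features = false, features = [
    "rustls-tls",  # TLS without OpenSSL
    "hickory-dns", # pure-Rust resolver instead of musl's
] }
```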
Binary Size
Risk: A full statically-linked binary with all worker deps could be 40MB+.
Impact: Slow volume population, slow download via API.
Mitigation:
- Strip debug symbols (`strip` command) — typically reduces by 50–70%
- Use `opt-level = 'z'` and `lto = true` in the release profile
- Consider `upx` compression (trades CPU at startup for smaller binary)
- Feature-gate unused functionality if size is excessive
- Target: <25MB stripped
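The profile settings above can be sketched in `Cargo.toml`; `codegen-units = 1` and `panic = "abort"` are additional common size levers not listed in the mitigation:

```toml
[profile.release]
opt-level = "z"    # optimize for size rather than speed
lto = true         # link-time optimization across crates
strip = "symbols"  # Cargo's built-in equivalent of running `strip`
codegen-units = 1  # better optimization at the cost of build time
panic = "abort"    # drop the unwinding machinery
```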
Non-root User Conflicts
Risk: Different base images run as different UIDs. The agent needs write access to runtime_envs and artifacts volumes.
Impact: Permission denied errors when the container UID doesn't match the volume owner.
Mitigation:
- Document the UID requirement (current standard: UID 1000)
- Provide guidance for running the agent as root with privilege drop
- Consider adding a `--user` flag to the agent that drops privileges after setup
- For Kubernetes, use `securityContext.runAsUser` in the Pod spec
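For the Kubernetes case, a pod-spec fragment along these lines aligns the container with the volume owner (the UID follows the documented standard; the group settings are assumptions):

```yaml
securityContext:
  runAsUser: 1000   # match the volume owner
  runAsGroup: 1000
  fsGroup: 1000     # Kubernetes chowns mounted volumes to this group
```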
Existing Workers Must Keep Working
Risk: Refactoring WorkerService (Phase 3) could introduce regressions in existing workers.
Impact: Production workers break.
Mitigation:
- The refactoring is additive — existing code paths don't change behavior
- Run the full test suite after Phase 3
- Both `attune-worker` and `attune-agent` share the same test infrastructure
- The `StartupMode::Worker` path is the existing code path with no behavioral changes
Volume Mount Ordering
Risk: The agent container starts before the init-agent service has populated the volume.
Impact: Agent binary not found, container crashes.
Mitigation:
- Use `depends_on: { init-agent: { condition: service_completed_successfully } }` in Docker Compose
- The wrapper script (Phase 5.2) retries with a short sleep
- For Kubernetes, the initContainer pattern guarantees ordering
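A Docker Compose sketch of the ordering guarantee (service, image, and volume names are hypothetical):

```yaml
services:
  init-agent:
    image: attune/agent-dist:latest # copies the static binary into the shared volume
    volumes:
      - agent-bin:/out
  worker-ruby:
    image: ruby:3.3
    depends_on:
      init-agent:
        condition: service_completed_successfully # wait for the binary to exist
    volumes:
      - agent-bin:/opt/attune/agent:ro
    entrypoint: ["/opt/attune/agent/attune-agent"]
volumes:
  agent-bin:
```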
Testing Strategy
Unit Tests
- Auto-detection module: mock filesystem and process execution to test interpreter discovery
- `StartupMode::Agent` code paths: ensure lazy setup and on-demand verification work correctly
- All existing worker tests continue to pass (regression safety net)
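The probe behind interpreter discovery can be sketched as follows (hypothetical function name; the real module is `crates/common/src/runtime_detection.rs`, and the unit tests mock process execution rather than spawning real interpreters):

```rust
use std::process::Command;

/// Returns true if `interpreter` exists on PATH and the probe invocation succeeds.
fn is_available(interpreter: &str, probe_args: &[&str]) -> bool {
    Command::new(interpreter)
        .args(probe_args)
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false) // spawn error (binary not found) counts as unavailable
}

fn main() {
    // `sh` is present in every image the integration matrix covers.
    assert!(is_available("sh", &["-c", "exit 0"]));
    assert!(!is_available("no-such-interpreter", &["--version"]));
    println!("ok");
}
```

Treating a spawn failure as "unavailable" keeps detection safe in minimal images where an interpreter is simply absent.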
Integration Tests
- Build the agent binary with musl and run it in various container images:
  - `ruby:3.3-slim` (Ruby + shell)
  - `python:3.12-slim` (Python + shell)
  - `node:20-slim` (Node.js + shell)
  - `alpine:3.20` (shell only)
  - `ubuntu:24.04` (shell only)
  - `debian:bookworm-slim` (shell only, matches current worker)
- Verify: agent starts, auto-detects runtimes, registers with correct capabilities, executes a simple action, reports results
- Verify: DNS resolution works for Docker service names
Docker Compose Tests
- Spin up the full stack with agent-based workers alongside traditional workers
- Execute actions that target specific runtimes
- Verify the scheduler routes to the correct worker based on capabilities
- Verify graceful shutdown (SIGTERM handling)
Binary Compatibility Tests
- Test the musl binary on: Alpine, Debian, Ubuntu, CentOS/Rocky, Amazon Linux
- Test on both x86_64 and aarch64 (if multi-arch build is implemented)
- Verify no glibc dependency: `ldd attune-agent` should report "not a dynamic executable"
Future Enhancements
These are not part of the initial implementation but are natural extensions:
- **Per-execution container isolation**: Instead of a long-running agent, spawn a fresh container per execution with the agent injected. Provides maximum isolation (each action runs in a clean environment) at the cost of startup latency.
- **Container image selection in action YAML**: Allow actions to declare `container: ruby:3.3` in their YAML, and have the executor spin up an appropriate container with the agent injected. Similar to GitHub Actions' container actions.
- **Warm pool**: Pre-start a pool of agent containers for common runtimes to reduce first-execution latency.
- **Agent self-update**: The agent periodically checks for a newer version of itself (via the API endpoint) and restarts if updated.
- **Windows support**: Cross-compile the agent for Windows (MSVC static linking) to support Windows containers.
- **WebAssembly runtime**: Compile actions to WASM and execute them inside the agent using wasmtime, eliminating the need for interpreter binaries entirely.
References
- Tekton Entrypoint: https://github.com/tektoncd/pipeline/tree/main/cmd/entrypoint
- Argo Emissary Executor: https://argoproj.github.io/argo-workflows/workflow-executors/
- GitLab Runner Docker Executor: https://docs.gitlab.com/runner/executors/docker.html
- Current worker containerization: `docs/worker-containerization.md`
- Current runtime detection: `crates/common/src/runtime_detection.rs`
- Worker service: `crates/worker/src/service.rs`
- Process executor: `crates/worker/src/runtime/process_executor.rs`
- Worker Dockerfile: `docker/Dockerfile.worker.optimized`