[wip] universal workers
@@ -0,0 +1,65 @@
# Universal Worker Agent Phase 2: Runtime Detection ↔ Worker Registration Integration

**Date**: 2026-02-05

## Summary

Integrated the Phase 1 runtime auto-detection module with the worker registration system so that `attune-agent` workers register with rich interpreter metadata (binary paths, versions) in their capabilities, enabling the system to distinguish agents from standard workers and know exactly which interpreters are available and where.

## Changes

### 1. `crates/worker/src/runtime_detect.rs`
- Added `Serialize` and `Deserialize` derives to `DetectedRuntime` so instances can be stored as structured JSON in worker capabilities.

### 2. `crates/worker/src/registration.rs`
- Added `use crate::runtime_detect::DetectedRuntime` import.
- Added `set_detected_runtimes(&mut self, runtimes: Vec<DetectedRuntime>)` method that stores detected interpreter metadata under the `detected_interpreters` capability key as a JSON array of `{name, path, version}` objects.
- Added `set_agent_mode(&mut self, is_agent: bool)` method that sets an `agent_mode` boolean capability to distinguish agent workers from standard workers.
- Both methods are additive — the existing `runtimes` string list capability remains for backward compatibility.
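
The additive behaviour of the two setters can be pictured with a dependency-free sketch. Everything below is illustrative: `CapValue`, `Registration`, and the field types of `DetectedRuntime` are assumptions standing in for the real types, and the real code stores `serde_json` values rather than this enum.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the struct in `runtime_detect.rs`.
#[derive(Debug, Clone, PartialEq)]
pub struct DetectedRuntime {
    pub name: String,
    pub path: String,
    pub version: String,
}

/// Hypothetical, simplified capability value (the real code stores JSON).
#[derive(Debug, Clone, PartialEq)]
pub enum CapValue {
    Bool(bool),
    Names(Vec<String>),
    Interpreters(Vec<DetectedRuntime>),
}

/// Hypothetical registration holder keyed by capability name.
#[derive(Debug, Default)]
pub struct Registration {
    pub capabilities: BTreeMap<String, CapValue>,
}

impl Registration {
    /// Adds the structured interpreter list; the `runtimes` key is untouched.
    pub fn set_detected_runtimes(&mut self, runtimes: Vec<DetectedRuntime>) {
        self.capabilities.insert(
            "detected_interpreters".to_string(),
            CapValue::Interpreters(runtimes),
        );
    }

    /// Flags this worker as an agent.
    pub fn set_agent_mode(&mut self, is_agent: bool) {
        self.capabilities
            .insert("agent_mode".to_string(), CapValue::Bool(is_agent));
    }
}
```

Calling both setters on a registration that already holds a `runtimes` entry leaves that entry in place and adds two new keys, which is the backward-compatibility property described above.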

### 3. `crates/worker/src/service.rs`
- Added `detected_runtimes: Option<Vec<DetectedRuntime>>` field to `WorkerService` (initialized to `None` in `new()`).
- Added `pub fn with_detected_runtimes(mut self, runtimes: Vec<DetectedRuntime>) -> Self` builder method that stores agent detection results for use during `start()`.
- Updated `start()` to call `registration.set_detected_runtimes()` and `registration.set_agent_mode(true)` before `register()` when detected runtimes are present.
- The standard `attune-worker` binary is unaffected: it never calls the builder, so the field stays `None` and no agent-specific code runs.
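
A minimal sketch of the builder and the `Option` check, assuming a heavily simplified `WorkerService`. The `extra_capability_keys` helper is hypothetical and only exists to make the branch observable; the real `start()` calls the registration setters instead.

```rust
/// Hypothetical stand-in for the struct in `runtime_detect.rs`.
#[derive(Debug, Clone, PartialEq)]
pub struct DetectedRuntime {
    pub name: String,
    pub path: String,
    pub version: String,
}

/// Simplified service: the real `WorkerService` holds much more state.
#[derive(Debug, Default)]
pub struct WorkerService {
    detected_runtimes: Option<Vec<DetectedRuntime>>,
}

impl WorkerService {
    /// Standard workers only go through `new()`, so the field stays `None`.
    pub fn new() -> Self {
        Self { detected_runtimes: None }
    }

    /// Builder used by the agent entrypoint to hand over detection results.
    pub fn with_detected_runtimes(mut self, runtimes: Vec<DetectedRuntime>) -> Self {
        self.detected_runtimes = Some(runtimes);
        self
    }

    /// Hypothetical helper: the capability keys that would be added
    /// before `register()` is called.
    pub fn extra_capability_keys(&self) -> Vec<&'static str> {
        match &self.detected_runtimes {
            Some(_) => vec!["detected_interpreters", "agent_mode"],
            None => Vec::new(),
        }
    }
}
```

With this shape, `WorkerService::new()` alone yields no extra keys, while chaining `.with_detected_runtimes(...)` yields both agent keys.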

### 4. `crates/worker/src/agent_main.rs`
- Added `agent_detected_runtimes: Option<Vec<DetectedRuntime>>` variable to stash detection results.
- After auto-detection runs and sets `ATTUNE_WORKER_RUNTIMES`, the detected `Vec` is saved into `agent_detected_runtimes`.
- After `WorkerService::new()`, calls `.with_detected_runtimes(detected)` if auto-detection ran, so the registration includes full interpreter metadata.

### 5. `crates/worker/src/lib.rs`
- Added `pub use runtime_detect::DetectedRuntime` re-export for convenient access.

## Capability Format

After Phase 2, an agent worker's `capabilities` JSON in the `worker` table looks like:

```json
{
  "runtimes": ["shell", "python", "node"],
  "max_concurrent_executions": 10,
  "worker_version": "0.1.0",
  "agent_mode": true,
  "detected_interpreters": [
    {"name": "shell", "path": "/bin/bash", "version": "5.2.15"},
    {"name": "python", "path": "/usr/bin/python3", "version": "3.12.1"},
    {"name": "node", "path": "/usr/bin/node", "version": "20.11.0"}
  ]
}
```

Standard `attune-worker` instances do NOT have `agent_mode` or `detected_interpreters` keys.

## Design Decisions

- **Builder pattern** (`with_detected_runtimes`) rather than a separate constructor — keeps the API surface minimal and avoids duplicating `new()` logic.
- **Explicit `set_agent_mode`** separate from `set_detected_runtimes` — allows independent control, though in practice they're always called together for agents.
- **JSON serialization via `serde_json::json!()` macro** rather than `serde_json::to_value(&runtimes)` — gives explicit control over the capability shape and avoids coupling the DB format to the Rust struct layout.
- **No changes to the `Worker` model or database schema** — `detected_interpreters` and `agent_mode` are stored inside the existing `capabilities` JSONB column.

## Verification

- `cargo check --workspace` — zero errors, zero warnings
- `cargo test -p attune-worker` — 139 tests pass (105 unit + 17 dependency isolation + 8 log truncation + 7 security + 2 doc-tests)
- Standard `attune-worker` binary path is completely unchanged
69
work-summary/2026-02-05-universal-agent-phase2.md
Normal file
@@ -0,0 +1,69 @@
# Universal Worker Agent — Phase 2: Runtime Auto-Detection Integration

**Date**: 2026-02-05

## Overview

Implemented Phase 2 of the Universal Worker Agent plan (`docs/plans/universal-worker-agent.md`), which integrates the runtime auto-detection module (built in Phase 1) with the worker registration system. Agent workers now register with rich interpreter metadata — binary paths and versions — alongside the simple runtime name list used for backward compatibility.

## Changes

### 1. `crates/worker/src/runtime_detect.rs`

- Added `Serialize` and `Deserialize` derives to `DetectedRuntime` so detection results can be stored as JSON in worker capabilities.

### 2. `crates/worker/src/registration.rs`

- Added `use crate::runtime_detect::DetectedRuntime` import.
- **`set_detected_runtimes(runtimes: Vec<DetectedRuntime>)`** — Stores interpreter metadata under the `detected_interpreters` capability key as a structured JSON array (each entry has `name`, `path`, `version`). This supplements the existing `runtimes` string list for backward compatibility.
- **`set_agent_mode(is_agent: bool)`** — Sets an `agent_mode` boolean capability so the system can distinguish agent-mode workers from standard workers.

### 3. `crates/worker/src/service.rs`

- Added `detected_runtimes: Option<Vec<DetectedRuntime>>` field to `WorkerService` (defaults to `None`).
- **`with_detected_runtimes(self, runtimes) -> Self`** — Builder method to pass agent-detected runtimes into the service. No-op for standard `attune-worker`.
- Updated `start()` to call `set_detected_runtimes()` + `set_agent_mode(true)` on the registration before `register()` when detected runtimes are present.

### 4. `crates/worker/src/agent_main.rs`

- Added `agent_detected_runtimes: Option<Vec<DetectedRuntime>>` variable to stash detection results.
- After auto-detection runs, the detected runtimes are saved (previously they were consumed by the env var setup and discarded).
- After `WorkerService::new()`, chains `.with_detected_runtimes(detected)` to pass the metadata through.

### 5. `crates/worker/src/lib.rs`

- Re-exported `DetectedRuntime` from `runtime_detect` module for external use.

## Worker Capabilities (Agent Mode)

When an `attune-agent` registers, its capabilities JSON now includes:

```json
{
  "runtimes": ["shell", "python", "node"],
  "detected_interpreters": [
    {"name": "shell", "path": "/bin/bash", "version": "5.2.15"},
    {"name": "python", "path": "/usr/bin/python3", "version": "3.12.1"},
    {"name": "node", "path": "/usr/bin/node", "version": "20.11.0"}
  ],
  "agent_mode": true,
  "max_concurrent_executions": 10,
  "worker_version": "0.1.0"
}
```

Standard `attune-worker` registrations are unchanged — no `detected_interpreters` or `agent_mode` keys.

## Phase 2 Sub-tasks

| Sub-task | Status | Notes |
|----------|--------|-------|
| 2.1 Interpreter Discovery Module | ✅ Done (Phase 1) | `runtime_detect.rs` already existed |
| 2.2 Integration with Worker Registration | ✅ Done | Rich capabilities + agent_mode flag |
| 2.3 Runtime Hints File | ⏭️ Deferred | Optional enhancement, not needed yet |

## Verification

- `cargo check --workspace` — zero errors, zero warnings
- `cargo test -p attune-worker` — all tests pass (unit, integration, doc-tests)
- No breaking changes to the standard `attune-worker` binary
72
work-summary/2026-02-05-universal-worker-agent-phase3.md
Normal file
@@ -0,0 +1,72 @@
# Universal Worker Agent — Phase 3: WorkerService Dual-Mode Refactor

**Date**: 2026-02-05

## Overview

Implemented Phase 3 of the Universal Worker Agent plan (`docs/plans/universal-worker-agent.md`): refactoring `WorkerService` for clean code reuse between the full `attune-worker` and the `attune-agent` binary, without code duplication.

## Changes

### 1. `StartupMode` Enum (`crates/worker/src/service.rs`)

Added a `StartupMode` enum that controls how the worker initializes its runtime environment:

- **`Worker`** — Full worker mode with proactive environment setup and full version verification sweep at startup. This is the existing behavior used by `attune-worker`.
- **`Agent { detected_runtimes }`** — Agent mode with lazy (on-demand) environment setup and deferred version verification. Used by `attune-agent`. Carries the auto-detected runtimes from Phase 2.

### 2. `WorkerService` Struct Refactoring (`crates/worker/src/service.rs`)

- Replaced the `detected_runtimes: Option<Vec<DetectedRuntime>>` field with `startup_mode: StartupMode`
- `new()` defaults to `StartupMode::Worker`
- `with_detected_runtimes()` now sets `StartupMode::Agent { detected_runtimes }` — the method signature is unchanged, so `agent_main.rs` requires no modifications

### 3. Conditional Startup in `start()` (`crates/worker/src/service.rs`)

The `start()` method now branches on `self.startup_mode`:

- **Worker mode**: Runs `verify_runtime_versions()` and `scan_and_setup_environments()` proactively (existing behavior, unchanged)
- **Agent mode**: Skips both with an info log — environments will be created lazily on first execution

Agent capability registration (`set_detected_runtimes()`, `set_agent_mode()`) also uses the `StartupMode` match instead of the old `Option` check.
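
The enum and the branch in `start()` might look roughly like the sketch below. It assumes a stripped-down `WorkerService`; `startup_plan` is a hypothetical helper that returns step names instead of executing them, and the relative order of the steps is illustrative rather than taken from the commit.

```rust
/// Hypothetical stand-in for the struct in `runtime_detect.rs`.
#[derive(Debug, Clone, PartialEq)]
pub struct DetectedRuntime {
    pub name: String,
    pub path: String,
    pub version: String,
}

/// Mirrors the two variants described above.
#[derive(Debug, Clone, PartialEq)]
pub enum StartupMode {
    Worker,
    Agent { detected_runtimes: Vec<DetectedRuntime> },
}

/// Simplified service holding only the startup mode.
pub struct WorkerService {
    startup_mode: StartupMode,
}

impl WorkerService {
    pub fn new() -> Self {
        Self { startup_mode: StartupMode::Worker }
    }

    /// Same signature as in Phase 2, but now it switches the mode.
    pub fn with_detected_runtimes(mut self, detected_runtimes: Vec<DetectedRuntime>) -> Self {
        self.startup_mode = StartupMode::Agent { detected_runtimes };
        self
    }

    /// Hypothetical helper: names of the steps `start()` would run.
    pub fn startup_plan(&self) -> Vec<&'static str> {
        match &self.startup_mode {
            StartupMode::Worker => vec![
                "verify_runtime_versions",
                "scan_and_setup_environments",
                "register",
            ],
            StartupMode::Agent { .. } => vec![
                "set_detected_runtimes",
                "set_agent_mode",
                "register",
            ],
        }
    }
}
```

Because the mode is a single enum rather than an `Option` plus flags, every place that differs between the two binaries can use the same `match`.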

### 4. Lazy Environment Setup (`crates/worker/src/runtime/process.rs`)

Updated `ProcessRuntime::execute()` to perform on-demand environment creation when the env directory is missing. Previously, a missing env dir produced a warning and fell back to the system interpreter. Now it:

1. Logs an info message about lazy setup
2. Creates a temporary `ProcessRuntime` with the effective config
3. Calls `setup_pack_environment()` to create the environment (venv, node_modules, etc.)
4. Falls back to the system interpreter only if creation fails

This is the primary code path for agent mode (where proactive startup setup is skipped) but also serves as a safety net for standard workers.
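
The create-then-fall-back flow can be sketched with the standard library alone. `ensure_env`, `EnvChoice`, and `create_dir_env` are hypothetical names; the `create_env` closure stands in for `setup_pack_environment()`, whose real signature is not shown in this summary.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Which environment an execution ends up using.
#[derive(Debug, PartialEq)]
pub enum EnvChoice {
    /// The pack environment directory exists (possibly just created).
    PackEnv(PathBuf),
    /// Creation failed, so the system interpreter is used instead.
    SystemInterpreter,
}

/// Hypothetical helper: use the env dir if present, otherwise try to
/// create it, and only fall back when creation fails.
pub fn ensure_env<F>(env_dir: &Path, create_env: F) -> EnvChoice
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    if env_dir.is_dir() {
        return EnvChoice::PackEnv(env_dir.to_path_buf());
    }
    eprintln!("env dir {} missing, creating it lazily", env_dir.display());
    match create_env(env_dir) {
        Ok(()) => EnvChoice::PackEnv(env_dir.to_path_buf()),
        Err(err) => {
            eprintln!("lazy setup failed ({}), using system interpreter", err);
            EnvChoice::SystemInterpreter
        }
    }
}

/// Simplest possible creator for the sketch: just make the directory.
pub fn create_dir_env(env_dir: &Path) -> io::Result<()> {
    fs::create_dir_all(env_dir)
}
```

Passing the creator as a closure keeps the decision logic testable without a real venv or `node_modules` install.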

### 5. Re-export (`crates/worker/src/lib.rs`)

`StartupMode` is re-exported from the `attune_worker` crate root for external use.

## Files Modified

| File | Change |
|------|--------|
| `crates/worker/src/service.rs` | Added `StartupMode` enum, replaced `detected_runtimes` field, conditional startup logic |
| `crates/worker/src/runtime/process.rs` | Lazy on-demand environment creation in `execute()` |
| `crates/worker/src/lib.rs` | Re-export `StartupMode` |
| `AGENTS.md` | Updated development status (Phase 3 complete, Phases 4–7 in progress) |

## Test Results

All 139 tests pass:
- 105 unit tests
- 17 dependency isolation tests
- 8 log truncation tests
- 7 security tests
- 2 doc-tests

Zero compiler warnings across the workspace.

## Design Decisions

- **No code duplication**: The `StartupMode` enum parameterizes `WorkerService` rather than creating a separate `AgentService`. All execution machinery (runtimes, consumers, heartbeat, cancellation) is shared.
- **Lazy setup as safety net**: The on-demand environment creation in `ProcessRuntime::execute()` benefits both modes — agents rely on it as the primary path, while standard workers get it as a fallback if proactive setup missed something.
- **Backward compatible API**: `with_detected_runtimes()` keeps its signature, so `agent_main.rs` needed no changes.
77
work-summary/2026-02-agent-docker-compose.md
Normal file
@@ -0,0 +1,77 @@
# Universal Worker Agent: Phase 4 — Docker Compose Integration

**Date**: 2026-02-05
**Phase**: 4 of 7
**Scope**: Docker Compose integration for agent-based workers

## Summary

Added Docker Compose infrastructure to make it trivial to add agent-based workers to an Attune deployment. Users can now inject the statically-linked `attune-agent` binary into any container image via a shared volume, turning it into a fully functional Attune worker with auto-detected runtimes — no Dockerfiles, no Rust compilation.

## Changes

### docker-compose.yaml
- Added `init-agent` service between `init-packs` and `rabbitmq`
  - Builds from `docker/Dockerfile.agent` (target: `agent-init`)
  - Copies the statically-linked binary to the `agent_bin` volume at `/opt/attune/agent/attune-agent`
  - Runs once (`restart: "no"`) and completes immediately
- Added `agent_bin` named volume to the volumes section

### docker-compose.agent.yaml (new)
- Override file with example agent-based worker services
- **Active (uncommented)**: `worker-ruby` using `ruby:3.3-slim`
- **Commented templates**: Python 3.12, NVIDIA CUDA GPU, and custom image workers
- All workers follow the same pattern: mount `agent_bin` read-only, use `attune-agent` as entrypoint, share standard volumes

### Makefile
- Added `docker-up-agent` target: `docker compose -f docker-compose.yaml -f docker-compose.agent.yaml up -d`
- Added `docker-down-agent` target: corresponding `down` command
- Updated `.PHONY` and help text

### docs/QUICKREF-agent-workers.md (new)
- Quick-reference guide for adding agent-based workers
- Covers: how it works, quick start (override file vs docker-compose.override.yaml), required volumes, required environment variables, runtime auto-detection, testing detection, examples (Ruby, Node.js, GPU, multi-runtime), comparison table (traditional vs agent workers), troubleshooting

## Usage

```bash
# Start everything including the Ruby agent worker
make docker-up-agent

# Or manually
docker compose -f docker-compose.yaml -f docker-compose.agent.yaml up -d

# Stop
make docker-down-agent
```

Adding a new runtime worker is ~12 lines of YAML in `docker-compose.override.yaml`:
```yaml
services:
  worker-my-runtime:
    image: my-org/my-image:latest
    depends_on:
      init-agent:
        condition: service_completed_successfully
      # ... standard health checks
    entrypoint: ["/opt/attune/agent/attune-agent"]
    volumes:
      - agent_bin:/opt/attune/agent:ro
      - packs_data:/opt/attune/packs:ro
      - runtime_envs:/opt/attune/runtime_envs
      - artifacts_data:/opt/attune/artifacts
      - ${ATTUNE_DOCKER_CONFIG_PATH:-./config.docker.yaml}:/opt/attune/config/config.yaml:ro
    networks:
      - attune-network
```

## Dependencies

- **Requires**: Phase 1 (agent binary build infrastructure) — `docker/Dockerfile.agent` must exist
- **Requires**: Phase 3 (WorkerService dual-mode refactor) — agent auto-detection and lazy env setup

## Next Steps

- **Phase 5**: API binary download endpoint (`GET /api/v1/agent/binary`)
- **Phase 6**: Database runtime registry extensions (runtime template packs)
- **Phase 7**: Kubernetes support (InitContainer pattern, Helm chart)
100
work-summary/2026-02-universal-agent-phase6.md
Normal file
@@ -0,0 +1,100 @@
# Universal Worker Agent — Phase 6: Database & Runtime Registry Extensions

**Date**: 2026-02
**Phase**: 6 of 7 (Universal Worker Agent)
**Plan**: `docs/plans/universal-worker-agent.md`

## Overview

Phase 6 extends the runtime registry so that the universal worker agent (`attune-agent`) can work with arbitrary runtimes — including languages like Ruby, Go, Java, Perl, and R — without requiring every possible runtime to be pre-registered in the database by an administrator.

## Changes Made

### 6.1 Extended Runtime Detection Metadata

**Migration** (`migrations/20250101000012_agent_runtime_detection.sql`):
- Added `auto_detected BOOLEAN NOT NULL DEFAULT FALSE` column to `runtime` table — distinguishes agent-created runtimes from pack-loaded ones
- Added `detection_config JSONB NOT NULL DEFAULT '{}'` column — stores detection metadata (detected binary path, version, runtime name)
- Added index `idx_runtime_auto_detected` for efficient filtering

**Rust Model** (`crates/common/src/models.rs`):
- Added `auto_detected: bool` and `detection_config: JsonDict` fields to the `Runtime` struct

**Repository** (`crates/common/src/repositories/runtime.rs`):
- Added `SELECT_COLUMNS` constant centralising the column list for all runtime queries
- Added `auto_detected` and `detection_config` to `CreateRuntimeInput` and `UpdateRuntimeInput`
- Updated ALL 7 SELECT queries, 2 RETURNING clauses, and the INSERT statement to include the new columns
- Updated the `update` method to support setting `auto_detected` and `detection_config`

**External query sites updated**:
- `crates/common/src/runtime_detection.rs` — `detect_from_database()`
- `crates/common/src/pack_environment.rs` — `get_runtime()`
- `crates/worker/src/executor.rs` — `prepare_execution_context()`

**All `CreateRuntimeInput` construction sites updated** (7 files):
- `crates/api/src/routes/runtimes.rs`
- `crates/common/src/pack_registry/loader.rs`
- `crates/common/tests/helpers.rs`
- `crates/common/tests/repository_runtime_tests.rs`
- `crates/common/tests/repository_worker_tests.rs`
- `crates/executor/tests/fifo_ordering_integration_test.rs`
- `crates/executor/tests/policy_enforcer_tests.rs`

### 6.2 Runtime Template Packs

Added 5 new runtime YAML definitions in `packs/core/runtimes/`:

| File | Ref | Interpreter | Environment | Dependencies |
|------|-----|-------------|-------------|--------------|
| `ruby.yaml` | `core.ruby` | `ruby` (.rb) | GEM_HOME isolation | Gemfile → bundle install |
| `go.yaml` | `core.go` | `go run` (.go) | GOPATH isolation | go.mod → go mod download |
| `java.yaml` | `core.java` | `java` (.java) | None (simple) | None |
| `perl.yaml` | `core.perl` | `perl` (.pl) | local::lib isolation | cpanfile → cpanm |
| `r.yaml` | `core.r` | `Rscript --vanilla` (.R) | renv isolation | renv.lock → renv::restore() |

Each includes verification commands matching the auto-detection module's probe strategy.

### 6.3 Dynamic Runtime Registration

**New module** (`crates/worker/src/dynamic_runtime.rs`):
- `auto_register_detected_runtimes(pool, detected)` — main entry point called from `agent_main.rs` BEFORE `WorkerService::new()`
- For each detected runtime:
  1. Alias-aware lookup in existing DB runtimes (via `normalize_runtime_name`)
  2. If not found, looks for a template runtime by ref pattern `core.<name>`
  3. If template found, clones it with `auto_detected = true` and substitutes the detected binary path
  4. If no template, creates a minimal runtime with just the interpreter binary and file extension
  5. Auto-registered runtimes use ref format `auto.<name>` (e.g., `auto.ruby`)
- Helper functions: `build_detection_config()`, `build_execution_config_from_template()`, `build_minimal_execution_config()`, `build_minimal_distributions()`, `capitalize_runtime_name()`
- 8 unit tests covering all helpers
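
The per-runtime decision in steps 1 to 5 can be pictured as a pure function. This is a simplified sketch, not the module's code: `plan_registration` and `RegistrationPlan` are hypothetical, the real module queries the database rather than taking slices, and `existing_names` is assumed to hold the already-normalized names of runtimes that count as registered.

```rust
/// Outcome of the lookup for one detected runtime.
#[derive(Debug, PartialEq)]
pub enum RegistrationPlan {
    /// An equivalent runtime is already registered; nothing to create.
    AlreadyRegistered,
    /// Clone the `core.<name>` template under a new `auto.<name>` ref.
    CloneTemplate { template_ref: String, new_ref: String },
    /// No template: create a bare entry with only the interpreter binary.
    Minimal { new_ref: String },
}

/// Hypothetical decision helper following the five steps above.
pub fn plan_registration(
    name: &str,
    existing_names: &[&str],
    template_refs: &[&str],
) -> RegistrationPlan {
    // Step 1: is there already a matching runtime?
    if existing_names.iter().any(|n| n.eq_ignore_ascii_case(name)) {
        return RegistrationPlan::AlreadyRegistered;
    }
    // Steps 2 and 5: template ref pattern and the ref for the new entry.
    let template_ref = format!("core.{}", name);
    let new_ref = format!("auto.{}", name);
    if template_refs.contains(&template_ref.as_str()) {
        // Step 3: clone the template.
        RegistrationPlan::CloneTemplate { template_ref, new_ref }
    } else {
        // Step 4: minimal fallback.
        RegistrationPlan::Minimal { new_ref }
    }
}
```

For example, a detected `ruby` with a `core.ruby` template available would yield a `CloneTemplate` plan targeting `auto.ruby`, while an interpreter with no template would yield `Minimal`.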

**Agent entrypoint** (`crates/worker/src/agent_main.rs`):
- Added Phase 2b between config loading and `WorkerService::new()`
- Creates a temporary DB connection and calls `auto_register_detected_runtimes()` for all detected runtimes
- Non-fatal: registration failures are logged as warnings, agent continues

**Runtime name normalization** (`crates/common/src/runtime_detection.rs`):
- Extended `normalize_runtime_name()` with 5 new alias groups:
  - `ruby`/`rb` → `ruby`
  - `go`/`golang` → `go`
  - `java`/`jdk`/`openjdk` → `java`
  - `perl`/`perl5` → `perl`
  - `r`/`rscript` → `r`
- Added 5 new unit tests + 6 new assertions in existing filter tests
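
The five alias groups map naturally onto a `match`. The sketch below covers only those groups; the signature, the trimming, and the case-insensitivity are assumptions, and the real `normalize_runtime_name()` also handles aliases that existed before this phase.

```rust
/// Hypothetical sketch of the five new alias groups only.
/// Unknown names pass through unchanged (lowercased and trimmed).
pub fn normalize_runtime_name(name: &str) -> String {
    let lower = name.trim().to_ascii_lowercase();
    let canonical = match lower.as_str() {
        "ruby" | "rb" => "ruby",
        "go" | "golang" => "go",
        "java" | "jdk" | "openjdk" => "java",
        "perl" | "perl5" => "perl",
        "r" | "rscript" => "r",
        other => other,
    };
    canonical.to_string()
}
```

Normalizing both the detected name and the database name through the same function is what makes the lookup in dynamic registration alias-aware.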

## Architecture Decisions

1. **Dynamic registration before WorkerService::new()**: The `WorkerService` constructor loads runtimes from the DB into an immutable `RuntimeRegistry` wrapped in `Arc`. Rather than restructuring this, dynamic registration runs beforehand so the normal loading pipeline picks up the new entries.

2. **Template-based cloning**: Auto-detected runtimes clone their execution config from pack templates (e.g., `core.ruby`) when available, inheriting environment management, dependency installation, and env_vars configuration. Only the interpreter binary path is substituted with the actual detected path.

3. **Minimal fallback**: When no template exists, a bare-minimum runtime entry is created with just the interpreter binary. This enables immediate script execution without environment/dependency management.

4. **`auto.` ref prefix**: Auto-detected runtimes use `auto.<name>` refs to avoid collisions with pack-registered templates (which use `core.<name>` or `<pack>.<name>`).

## Test Results

- **Worker crate**: 114 passed, 0 failed, 3 ignored
- **Common crate**: 321 passed, 0 failed
- **API crate**: 110 passed, 0 failed, 1 ignored
- **Executor crate**: 115 passed, 0 failed, 1 ignored
- **Workspace check**: Zero errors, zero warnings
62
work-summary/2026-03-21-agent-binary-download-endpoint.md
Normal file
@@ -0,0 +1,62 @@
# Universal Worker Agent Phase 5: API Binary Download Endpoint

**Date**: 2026-03-21
**Phase**: Universal Worker Agent Phase 5
**Status**: Complete

## Overview

Implemented the API binary download endpoint for the Attune universal worker agent. This enables deployments where shared Docker volumes are impractical (Kubernetes, ECS, remote Docker hosts) by allowing containers to download the agent binary directly from the Attune API at startup.

## Changes

### New Files

- **`crates/api/src/routes/agent.rs`** — Two new unauthenticated API endpoints:
  - `GET /api/v1/agent/binary` — Streams the statically-linked `attune-agent` binary as `application/octet-stream`. Supports `?arch=x86_64|aarch64|arm64` query parameter (defaults to `x86_64`). Tries arch-specific binary (`attune-agent-{arch}`) first, falls back to generic (`attune-agent`). Uses `ReaderStream` for memory-efficient streaming. Optional bootstrap token authentication via `X-Agent-Token` header or `token` query parameter.
  - `GET /api/v1/agent/info` — Returns JSON metadata about available agent binaries (architectures, sizes, availability status, version).

- **`scripts/attune-agent-wrapper.sh`** — Bootstrap entrypoint script for containers without volume-mounted agent binary. Features:
  - Auto-detects host architecture via `uname -m`
  - Checks for volume-mounted binary first (zero-overhead fast path)
  - Downloads from API with retry logic (10 attempts, 5s delay) using `curl` or `wget`
  - Supports bootstrap token via `ATTUNE_AGENT_TOKEN` env var
  - Verifies downloaded binary compatibility
  - Configurable via `ATTUNE_AGENT_DIR`, `ATTUNE_AGENT_URL`, `ATTUNE_AGENT_ARCH` env vars

### Modified Files

- **`crates/common/src/config.rs`** — Added `AgentConfig` struct with `binary_dir` (path to agent binaries) and `bootstrap_token` (optional auth). Added `agent: Option<AgentConfig>` field to `Config`.

- **`crates/api/src/routes/mod.rs`** — Added `pub mod agent` and `pub use agent::routes as agent_routes`.

- **`crates/api/src/server.rs`** — Added `.merge(routes::agent_routes())` to the API v1 router.

- **`crates/api/src/openapi.rs`** — Registered both endpoints in OpenAPI paths, added `AgentBinaryInfo` and `AgentArchInfo` schemas, added `"agent"` tag. Updated endpoint count test assertions (+2 paths, +2 operations).

- **`config.docker.yaml`** — Added `agent.binary_dir: /opt/attune/agent` configuration.

- **`config.development.yaml`** — Added commented-out agent config pointing to local musl build output.

- **`docker-compose.yaml`** — API service now mounts `agent_bin` volume read-only at `/opt/attune/agent` and depends on `init-agent` service completing successfully.

- **`AGENTS.md`** — Updated development status (Phase 5 complete), updated agent_bin volume description, added agent config to Key Settings.

## Architecture Decisions

1. **Unauthenticated endpoint** — The agent needs to download its binary before it can authenticate with JWT. An optional lightweight bootstrap token (`agent.bootstrap_token`), sent via the `X-Agent-Token` header or the `token` query parameter, gates the download when configured.

2. **Streaming response** — Uses `tokio_util::io::ReaderStream` to stream the ~20MB binary without loading it entirely into memory.

3. **Architecture whitelist** — Only `x86_64`, `aarch64`, and `arm64` (alias) are accepted. Because the `arch` value is interpolated into the file name (`attune-agent-{arch}`), rejecting every other value prevents path traversal attacks.

4. **Graceful fallback** — Arch-specific binary (`attune-agent-x86_64`) → generic binary (`attune-agent`) → 404. This supports both multi-arch and single-arch deployments.

5. **Volume-first strategy** — The wrapper script checks for a volume-mounted binary before attempting download, so Docker Compose deployments with the `agent_bin` volume pay zero network overhead.
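
Decisions 3 and 4 combine into a small resolution routine, sketched here with the standard library only. `canonical_arch`, `resolve_agent_binary`, and `ResolveError` are hypothetical names, treating `arm64` as an alias for `aarch64` is an assumption, and the summary does not say which HTTP status a rejected `arch` produces.

```rust
use std::path::{Path, PathBuf};

/// Why no binary could be served.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// `arch` was not on the whitelist.
    UnsupportedArch,
    /// Neither the arch-specific nor the generic file exists (404 case).
    NotFound,
}

/// Maps the `arch` query value onto a fixed name, so user input is
/// never spliced into the path directly.
pub fn canonical_arch(arch: &str) -> Option<&'static str> {
    match arch {
        "x86_64" => Some("x86_64"),
        "aarch64" | "arm64" => Some("aarch64"),
        _ => None,
    }
}

/// Arch-specific file first, then the generic one.
pub fn resolve_agent_binary(binary_dir: &Path, arch: &str) -> Result<PathBuf, ResolveError> {
    let arch = canonical_arch(arch).ok_or(ResolveError::UnsupportedArch)?;
    let specific = binary_dir.join(format!("attune-agent-{}", arch));
    if specific.is_file() {
        return Ok(specific);
    }
    let generic = binary_dir.join("attune-agent");
    if generic.is_file() {
        Ok(generic)
    } else {
        Err(ResolveError::NotFound)
    }
}
```

A value such as `../etc/passwd` is rejected before any path is built, which is the point of matching against constants instead of validating the string after the fact.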

## Testing

- All 4 OpenAPI tests pass (including updated endpoint count: 59 paths, 83 operations)
- All 21 config tests pass (including `AgentConfig` integration)
- API crate compiles with zero warnings
- Common crate compiles with zero warnings
72
work-summary/2026-03-21-universal-worker-agent-phase1.md
Normal file
@@ -0,0 +1,72 @@
# Universal Worker Agent — Phase 1: Static Binary Build Infrastructure

**Date**: 2026-03-21

## Summary

Implemented Phase 1 of the Universal Worker Agent plan (`docs/plans/universal-worker-agent.md`), establishing the build infrastructure for a statically-linked `attune-agent` binary that can be injected into any container to turn it into an Attune worker.

## Problem

Adding support for new runtime environments (Ruby, Go, Java, R, etc.) required building custom Docker images for each combination. This meant modifying `Dockerfile.worker.optimized`, installing interpreters via apt, managing a combinatorial explosion of worker variants, and rebuilding images (~5 min) for every change.

## Solution

Phase 1 lays the groundwork for flipping the model: instead of baking the worker into custom images, a single static binary is injected into **any** container at startup. This phase delivers:

1. **TLS backend audit** — confirmed the worker crate has zero `native-tls` or `openssl` dependencies, making musl static linking viable without any TLS backend changes
2. **New binary target** — `attune-agent` alongside `attune-worker` in the same crate
3. **Runtime auto-detection module** — probes container environments for interpreters
4. **Dockerfile for static builds** — multi-stage musl cross-compilation
5. **Makefile targets** — local and Docker build commands

## Changes

### New Files

- **`crates/worker/src/agent_main.rs`** — Agent entrypoint with three-phase startup: (1) auto-detect runtimes or respect `ATTUNE_WORKER_RUNTIMES` override, (2) load config, (3) run `WorkerService`. Includes `--detect-only` flag for diagnostic probing.

- **`crates/worker/src/runtime_detect.rs`** — Database-free runtime detection module. Probes 8 interpreter families (shell, python, node, ruby, go, java, r, perl) via `which`-style PATH lookup with fallbacks. Captures version strings. 18 unit tests covering version parsing, display formatting, binary lookup, and detection pipeline.

- **`docker/Dockerfile.agent`** — Multi-stage Dockerfile:
  - `builder` stage: cross-compiles with `x86_64-unknown-linux-musl` target, BuildKit cache mounts
  - `agent-binary` stage: `FROM scratch` with just the static binary
  - `agent-init` stage: busybox-based for Docker Compose/K8s init container volume population

### Modified Files

- **`crates/worker/Cargo.toml`** — Added second `[[bin]]` target for `attune-agent`
- **`crates/worker/src/lib.rs`** — Added `pub mod runtime_detect`
- **`Makefile`** — Added targets: `build-agent` (local musl build), `docker-build-agent`, `run-agent`, `run-agent-release`
- **`docker/Dockerfile.worker.optimized`** — Added `agent_main.rs` stub for second binary target
- **`docker/Dockerfile.optimized`** — Added `agent_main.rs` stub
- **`docker/Dockerfile.sensor.optimized`** — Added `agent_main.rs` stub
- **`docker/Dockerfile.pack-binaries`** — Added `agent_main.rs` stub
- **`AGENTS.md`** — Documented agent service, runtime auto-detection, Docker build, Makefile targets

## Key Design Decisions

1. **Same crate, new binary** — The agent lives as a second `[[bin]]` target in `crates/worker` rather than a separate crate. This gives zero code duplication and the same test suite covers both binaries. Can be split into a separate crate later if binary size becomes a concern.

2. **No TLS changes needed** — The plan anticipated needing to switch from `native-tls` to `rustls` workspace-wide. Audit revealed the worker crate already uses `rustls` exclusively (`native-tls` only enters via `tokio-tungstenite` in CLI and `ldap3` in API, neither of which the worker depends on).

3. **Database-free detection** — The `runtime_detect` module is deliberately separate from `attune_common::runtime_detection` (which queries the database). The agent must discover runtimes before any DB connectivity, using pure filesystem probing.

4. **All Dockerfiles updated** — Since the worker crate now has two binary targets, all Dockerfiles that create workspace stubs for `cargo fetch` need a stub for `agent_main.rs`. Missing this would break Docker builds.
|
||||
|
||||
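The filesystem probing described in decision 3 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual `runtime_detect` module: the `Probed` struct, the function names, and the candidate binary names passed in by the caller are all hypothetical. The sketch only checks that a file exists; it does not check the executable bit or query a version.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical stand-in for the real `DetectedRuntime`;
/// only the runtime name and binary path are modeled here.
#[derive(Debug, Clone, PartialEq)]
pub struct Probed {
    pub name: String,
    pub path: PathBuf,
}

/// Return the first `dir/binary` that exists as a regular file.
/// Assumption: existence is enough; executability is not checked.
pub fn find_in_dirs(binary: &str, dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(binary))
        .find(|candidate| candidate.is_file())
}

/// For each (runtime name, candidate binary names) pair, record the first
/// candidate found in `dirs`. Runtimes with no match are skipped.
pub fn probe_runtimes(candidates: &[(&str, &[&str])], dirs: &[PathBuf]) -> Vec<Probed> {
    candidates
        .iter()
        .filter_map(|(name, binaries)| {
            binaries
                .iter()
                .find_map(|bin| find_in_dirs(bin, dirs))
                .map(|path| Probed {
                    name: name.to_string(),
                    path,
                })
        })
        .collect()
}

/// Directories listed in the `PATH` environment variable,
/// or an empty list if it is unset.
pub fn path_dirs() -> Vec<PathBuf> {
    env::var_os("PATH")
        .map(|p| env::split_paths(&p).collect())
        .unwrap_or_default()
}
```

A caller would pass `path_dirs()` as `dirs`, so the whole probe needs nothing beyond the container's filesystem and environment, which is what allows it to run before any database connection exists.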
## Verification

- `cargo check --all-targets --workspace` — zero warnings ✅
- `cargo test -p attune-worker` — all 37 tests pass (18 new runtime_detect tests + 19 existing) ✅
- `cargo run --bin attune-agent -- --detect-only` — successfully detected 6 runtimes on dev machine ✅
- `cargo run --bin attune-agent -- --help` — correct CLI documentation ✅

## Next Steps (Phases 2–7)

See `docs/plans/universal-worker-agent.md` for the remaining phases:

- **Phase 2**: Integration with worker registration (auto-detected runtimes → DB)
- **Phase 3**: Refactor `WorkerService` for dual modes (lazy env setup)
- **Phase 4**: Docker Compose init service for agent volume
- **Phase 5**: API binary download endpoint
- **Phase 6**: Database runtime registry extensions
- **Phase 7**: Kubernetes support (init containers, Helm chart)

46
work-summary/kubernetes-agent-workers.md
Normal file
@@ -0,0 +1,46 @@
# Universal Worker Agent Phase 7: Kubernetes Support

**Date**: 2026-02-05

## Summary

Implemented Kubernetes support for agent-based workers in the Attune Helm chart, completing Phase 7 of the Universal Worker Agent plan. Users can now deploy the `attune-agent` binary into any container image on Kubernetes using the InitContainer pattern — the same approach used by Tekton and Argo.

## Changes

### Helm Chart (`charts/attune/`)

- **`templates/agent-workers.yaml`** (new): Helm template that iterates over `agentWorkers[]` values and creates a Deployment per entry. Each Deployment includes:
  - `agent-loader` init container — copies the statically-linked `attune-agent` binary from the `attune-agent` image into an `emptyDir` volume
  - `wait-for-schema` init container — polls PostgreSQL until the Attune schema is ready
  - `wait-for-packs` init container — waits for the core pack on the shared PVC
  - Worker container — runs the user's chosen image with the agent binary as entrypoint
  - Volumes: `agent-bin` (emptyDir), `config` (ConfigMap), `packs` (PVC, read-only), `runtime-envs` (PVC), `artifacts` (PVC)

- **`values.yaml`**: Added `images.agent` (repository, tag, pullPolicy) and `agentWorkers: []` with full documentation of supported fields: `name`, `image`, `replicas`, `runtimes`, `resources`, `env`, `imagePullPolicy`, `logLevel`, `runtimeClassName`, `nodeSelector`, `tolerations`, `stopGracePeriod`

- **`templates/NOTES.txt`**: Updated to list enabled agent workers on install/upgrade

### CI/CD (`.gitea/workflows/publish.yml`)

- Added `attune-agent` to the image build matrix (target: `agent-init`, dockerfile: `docker/Dockerfile.agent`) so the agent image is published alongside all other Attune images

### Documentation

- **`docs/QUICKREF-kubernetes-agent-workers.md`** (new): Quick-reference guide covering how agent workers work on Kubernetes, all supported Helm values fields, runtime auto-detection table, differences from the standard worker, and troubleshooting steps
- **`docs/deployment/gitea-registry-and-helm.md`**: Added `attune-agent` to the published images list
- **`docs/plans/universal-worker-agent.md`**: Marked Phase 7 as complete with implementation details

### AGENTS.md

- Moved Phase 7 from "In Progress" to "Complete" with a summary of what was implemented

## Design Decisions

1. **emptyDir volume** (not PVC) for the agent binary — each pod gets its own copy via the init container. This avoids needing a shared RWX volume just for a single static binary and follows the same init container binary-injection pattern used by Tekton and Argo.

2. **Pod-level scheduling fields** — `runtimeClassName`, `nodeSelector`, and `tolerations` are exposed at the pod spec level (not container level) to support GPU scheduling via NVIDIA RuntimeClass and node selection for specialized hardware.

3. **Runtime auto-detect by default** — when `runtimes` is empty (the default), the agent probes the container for interpreters. Users can override this with an explicit list to skip detection and limit which runtimes are registered.

4. **Consistent patterns** — the template reuses the same `wait-for-schema` and `wait-for-packs` init containers, `envFrom` secret injection, and volume mount structure as the existing worker Deployment in `applications.yaml`.

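The override rule in decision 3 (an empty `runtimes` list triggers auto-detection, a non-empty list is used as given) can be illustrated with a short sketch. The function name `resolve_runtimes` and its signature are hypothetical and are not taken from the Attune codebase; the detection step is passed in as a closure so that the sketch stays self-contained.

```rust
/// Decide which runtime names a worker should register.
///
/// Hypothetical helper: `explicit` models the `runtimes` value from the
/// Helm chart, and `detect` stands in for the agent's probing step.
/// Detection runs only when no explicit list was supplied.
pub fn resolve_runtimes<F>(explicit: &[String], detect: F) -> Vec<String>
where
    F: FnOnce() -> Vec<String>,
{
    if explicit.is_empty() {
        // Default case: nothing configured, so probe the container.
        detect()
    } else {
        // Explicit list: skip detection and register only these runtimes.
        explicit.to_vec()
    }
}
```

Taking the detection step as an `FnOnce` closure means the probing cost is paid only in the default case, which matches the stated intent that an explicit list skips detection entirely.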