work-summary/2026-03-03-cli-pack-upload.md
# CLI Pack Upload Command

**Date**: 2026-03-03
**Scope**: `crates/cli`, `crates/api`

## Problem

The `attune pack register` command requires the API server to be able to reach the pack directory at the specified filesystem path. When the API runs inside Docker, this means the path must be inside a known container mount (e.g. `/opt/attune/packs.dev/...`). There was no way to install a pack from an arbitrary local path on the developer's machine into a Dockerized Attune system.

## Solution

Added a new `pack upload` CLI command and a corresponding `POST /api/v1/packs/upload` API endpoint. The CLI creates a `.tar.gz` archive of the local pack directory in memory and streams it to the API via `multipart/form-data`. The API extracts the archive and calls the existing `register_pack_internal` function, so all normal registration logic (component loading, workflow sync, MQ notifications) still applies.

## Changes
### New API endpoint: `POST /api/v1/packs/upload`

- **File**: `crates/api/src/routes/packs.rs`
- Accepts `multipart/form-data` with:
  - `pack` (required) — `.tar.gz` archive of the pack directory
  - `force` (optional) — `"true"` to overwrite an existing pack
  - `skip_tests` (optional) — `"true"` to skip test execution
- Extracts the archive to a temp directory using `flate2` + `tar`
- Locates `pack.yaml` at the archive root or one level deep (handles GitHub-style tarballs)
- Reads the pack `ref`, moves the directory to permanent storage, then calls `register_pack_internal`
- Added helper: `find_pack_root()`, which searches the extraction root and one level of subdirectories for `pack.yaml`
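The `find_pack_root()` helper described above could look roughly like this. A standard-library-only sketch; the actual signature and error type in `crates/api` may differ:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Locate the directory containing `pack.yaml`: check the extraction root
/// first, then each immediate subdirectory (GitHub-style tarballs wrap the
/// pack in a single top-level folder).
fn find_pack_root(extracted: &Path) -> io::Result<PathBuf> {
    if extracted.join("pack.yaml").is_file() {
        return Ok(extracted.to_path_buf());
    }
    for entry in fs::read_dir(extracted)? {
        let dir = entry?.path();
        if dir.is_dir() && dir.join("pack.yaml").is_file() {
            return Ok(dir);
        }
    }
    Err(io::Error::new(
        io::ErrorKind::NotFound,
        "pack.yaml not found at archive root or one level deep",
    ))
}
```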
### New CLI command: `attune pack upload <path>`

- **File**: `crates/cli/src/commands/pack.rs`
- Validates that the local path exists and contains `pack.yaml`
- Reads `pack.yaml` to extract the pack ref for display messages
- Builds an in-memory `.tar.gz` using `tar::Builder` + `flate2::GzEncoder`
- Helper `append_dir_to_tar()` recursively archives directory contents with paths relative to the pack root (symlinks are skipped)
- Calls `ApiClient::multipart_post()` with the archive bytes
- Flags: `--force` / `--skip-tests`
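The path-collection half of `append_dir_to_tar()` can be sketched without the `tar` crate. This illustrative walker (name and signature are hypothetical) gathers pack-root-relative paths and skips symlinks, roughly the entry list the helper would hand to `tar::Builder`:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect files under `dir`, returning paths relative to
/// `root` (the pack directory). Symlinks are skipped, mirroring the
/// behaviour described for `append_dir_to_tar()`.
fn collect_relative_files(root: &Path, dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // symlink_metadata does not follow links, so symlinks stay visible
        let meta = fs::symlink_metadata(&path)?;
        if meta.file_type().is_symlink() {
            continue; // skip symlinks entirely
        }
        if meta.is_dir() {
            collect_relative_files(root, &path, out)?;
        } else {
            // store the archive path relative to the pack root
            out.push(path.strip_prefix(root).unwrap().to_path_buf());
        }
    }
    Ok(())
}
```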
### New `ApiClient::multipart_post()` method

- **File**: `crates/cli/src/client.rs`
- Accepts a file field (name, bytes, filename, MIME type) plus a list of extra text fields
- Follows the same 401-refresh-then-error pattern as other methods
- HTTP client timeout increased from 30 s to 300 s to accommodate uploads
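The real method delegates to reqwest's multipart support, but the wire format it produces is worth seeing. A hand-rolled sketch (`build_multipart_body` is a hypothetical illustration; field names are taken from the endpoint description above, and boundary handling is simplified):

```rust
/// Build a `multipart/form-data` body by hand to show the wire format the
/// upload endpoint receives: text fields first, then the file part, then
/// the closing boundary.
fn build_multipart_body(
    boundary: &str,
    file_field: (&str, &str, &str, &[u8]), // (name, filename, mime, bytes)
    text_fields: &[(&str, &str)],
) -> Vec<u8> {
    let (name, filename, mime, bytes) = file_field;
    let mut body = Vec::new();
    for (k, v) in text_fields {
        body.extend_from_slice(
            format!("--{boundary}\r\nContent-Disposition: form-data; name=\"{k}\"\r\n\r\n{v}\r\n")
                .as_bytes(),
        );
    }
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"{name}\"; \
             filename=\"{filename}\"\r\nContent-Type: {mime}\r\n\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(bytes);
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}
```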
### `pack register` UX improvement

- **File**: `crates/cli/src/commands/pack.rs`
- Emits a warning when the supplied path looks like a local filesystem path (not under `/opt/attune/`, `/app/`, etc.), suggesting `pack upload` instead
### New workspace dependencies

- **Workspace** (`Cargo.toml`): `tar = "0.4"`, `flate2 = "1.0"`; `tempfile` moved from a dev-dependency to a runtime dependency
- **API** (`crates/api/Cargo.toml`): added `tar`, `flate2`, `tempfile`
- **CLI** (`crates/cli/Cargo.toml`): added `tar`, `flate2`; `reqwest` gains the `multipart` + `stream` features
## Usage

```bash
# Log in to the Dockerized system
attune --api-url http://localhost:8080 auth login \
  --username test@attune.local --password 'TestPass123!'

# Upload and register a local pack (works from any machine)
attune --api-url http://localhost:8080 pack upload ./packs.external/python_example \
  --skip-tests --force
```
## Verification

Tested against a live Docker Compose stack:

- Pack archive created (~13 KB for `python_example`)
- API received, extracted, and stored the pack at `/opt/attune/packs/python_example`
- All 5 actions, 1 trigger, and 1 sensor were registered
- A `pack.registered` MQ event was published to trigger worker environment setup
- `attune action list` confirmed all components were visible
work-summary/2026-03-03-cli-wait-notifier-fixes.md
# CLI `--wait` and Notifier WebSocket Fixes

**Date**: 2026-03-03
**Session type**: Bug investigation and fix

## Summary

Investigated and fixed a long-standing hang in `attune action execute --wait`, along with the underlying root-cause bugs in the notifier service. The `--wait` flag now works reliably, returning within milliseconds of execution completion via WebSocket notifications.

## Problems Found and Fixed
### Bug 1: PostgreSQL `PgListener` broken after sequential `listen()` calls (Notifier)

**File**: `crates/notifier/src/postgres_listener.rs`

**Symptom**: The notifier service never received any PostgreSQL LISTEN/NOTIFY messages after startup. Direct `pg_notify()` calls from psql also went undelivered.

**Root cause**: The notifier called `listener.listen(channel)` in a loop — once per channel — nine separate calls in total. In sqlx 0.8, each `listen()` call sends a `LISTEN` command and reads a `ReadyForQuery` response. The repeated calls left the connection in a state where subsequent `recv()` calls never fired, even though the PostgreSQL backend showed the connection as actively `LISTEN`-ing.

**Fix**: Replaced the loop with a single `listener.listen_all(NOTIFICATION_CHANNELS.iter().copied()).await` call, which issues all nine `LISTEN` commands in one round-trip. Extracted a `create_listener()` helper so the same single-call pattern is used on reconnect.
```rust
// crates/notifier/src/postgres_listener.rs#L93-135
async fn create_listener(&self) -> Result<PgListener> {
    let mut listener = PgListener::connect(&self.database_url)
        .await
        .context("Failed to connect PostgreSQL listener")?;

    // Use listen_all for a single round-trip instead of N separate commands
    listener
        .listen_all(NOTIFICATION_CHANNELS.iter().copied())
        .await
        .context("Failed to LISTEN on notification channels")?;

    Ok(listener)
}
```

Also added:

- A 60-second heartbeat log (`INFO: PostgreSQL listener heartbeat`) so it's easy to confirm the task is alive during idle periods
- A `tokio::time::timeout` wrapper around `recv()` so the heartbeat fires even when no notifications arrive
- Improved reconnect logging
### Bug 2: Notifications serialized without the `"type"` field (Notifier → CLI)

**File**: `crates/notifier/src/websocket_server.rs`

**Symptom**: Even after fixing Bug 1, the CLI's WebSocket loop received messages, but `serde_json::from_str::<ServerMsg>(&txt)` always failed with `missing field 'type'`, silently falling through the `Err(_)` catch-all arm.

**Root cause**: The outgoing notification task serialized the raw `Notification` struct directly:

```rust
match serde_json::to_string(&notification) { ... }
```

The `Notification` struct has no `type` field. The CLI's `ServerMsg` enum uses `#[serde(tag = "type")]`, so it expects `{"type":"notification",...}`. The bare struct produces `{"notification_type":"...","entity_type":"...",...}` — no `"type"` key.
**Fix**: Wrap the notification in the `ClientMessage` tagged enum before serializing:

```rust
let envelope = ClientMessage::Notification(notification);
match serde_json::to_string(&envelope) { ... }
```

This produces the correct `{"type":"notification","notification_type":"...","entity_type":"...","entity_id":...,"payload":{...}}` format.
### Bug 3: Polling fallback used an exhausted deadline (CLI)

**File**: `crates/cli/src/wait.rs`

**Symptom**: When `--wait` fell back to polling (e.g. when WS notifications weren't delivered), the polling would immediately time out even though the execution had long since completed.

**Root cause**: Both the WebSocket path and the polling fallback shared a single `deadline = Instant::now() + timeout_secs`. The WS path ran until the deadline, leaving no time for polling.

**Fix**: Reserve a minimum polling budget (`MIN_POLL_BUDGET = 10s`) so the WS path exits early enough to leave polling headroom:
```rust
const MIN_POLL_BUDGET: Duration = Duration::from_secs(10);

let ws_deadline = if overall_deadline > Instant::now() + MIN_POLL_BUDGET {
    overall_deadline - MIN_POLL_BUDGET
} else {
    overall_deadline // very short timeout — skip WS, go straight to polling
};
```

Polling always uses `overall_deadline` directly (the full user-specified timeout), so at least `MIN_POLL_BUDGET` of polling time is guaranteed.
### Additional CLI improvement: poll-first in polling loop

The polling fallback now checks the execution status **immediately** on entry (before sleeping) rather than sleeping first. This catches the common case where the execution already completed while the WS path was running.

Also improved error handling in the poll loop: REST failures are logged and retried rather than propagating as fatal errors.
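The poll-first shape can be sketched as a synchronous, standard-library-only loop. The real implementation is async and calls the REST API; `poll_until_done` and its signature are hypothetical:

```rust
use std::time::{Duration, Instant};

/// Poll-first wait loop: probe the status immediately on entry, then
/// sleep between attempts until `deadline`. Probe errors are logged and
/// retried rather than treated as fatal, matching the behaviour above.
fn poll_until_done<F>(mut check: F, deadline: Instant, interval: Duration) -> Option<String>
where
    F: FnMut() -> Result<Option<String>, String>,
{
    loop {
        // Poll before sleeping so an already-finished execution returns fast.
        match check() {
            Ok(Some(status)) => return Some(status), // terminal status reached
            Ok(None) => {}                           // still running
            Err(e) => eprintln!("poll failed, retrying: {e}"), // retryable
        }
        if Instant::now() + interval > deadline {
            return None; // timed out
        }
        std::thread::sleep(interval);
    }
}
```

Because the probe runs before the first sleep, an execution that completed while the WS path was still waiting is detected on the very first iteration.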
## End-to-End Verification

```
$ attune --profile docker action execute core.echo --param message="Hello!" --wait
ℹ Executing action: core.echo
ℹ Waiting for execution 51 to complete...
[notifier] connected to ws://localhost:8081/ws
[notifier] session id client_2
[notifier] subscribed to entity:execution:51
[notifier] execution_status_changed for execution 51 — status=Some("scheduled")
[notifier] execution_status_changed for execution 51 — status=Some("running")
[notifier] execution_status_changed for execution 51 — status=Some("completed")
✓ Execution 51 completed
```

Three consecutive runs all returned via WebSocket within milliseconds; no polling fallback was triggered.
## Files Changed

| File | Change |
|------|--------|
| `crates/notifier/src/postgres_listener.rs` | Replace sequential `listen()` loop with `listen_all()`; add `create_listener()` helper; add heartbeat logging with timeout-wrapped `recv()` |
| `crates/notifier/src/websocket_server.rs` | Wrap `Notification` in `ClientMessage::Notification(...)` before serializing outgoing WS messages |
| `crates/notifier/src/service.rs` | Handle `RecvError::Lagged` and `RecvError::Closed` in the broadcaster; add `debug` import |
| `crates/notifier/src/subscriber_manager.rs` | Scale broadcast result logging back to `debug` level |
| `crates/cli/src/wait.rs` | Fix shared-deadline bug with `MIN_POLL_BUDGET`; poll immediately on entry; improve error handling and verbose logging |
| `AGENTS.md` | Document the notifier WebSocket protocol and the `listen_all` requirement |
## Key Protocol Facts (for future reference)

**Notifier WebSocket — server → client message format**:

```json
{"type":"notification","notification_type":"execution_status_changed","entity_type":"execution","entity_id":42,"user_id":null,"payload":{...execution row...},"timestamp":"..."}
```

**Notifier WebSocket — client → server subscribe format**:

```json
{"type":"subscribe","filter":"entity:execution:42"}
```

Supported filter formats: `all`, `entity_type:<type>`, `entity:<type>:<id>`, `user:<id>`, `notification_type:<type>`

**Critical rule**: Always use `PgListener::listen_all()` when subscribing to multiple PostgreSQL channels. Individual `listen()` calls in a loop leave the listener in a broken state in sqlx 0.8.
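The filter formats above can be matched with straightforward string splitting. An illustrative sketch only: `filter_matches` is a hypothetical name, and the real notifier also handles `user:<id>` routing, which needs the notification's `user_id` and is omitted here:

```rust
/// Match a subscription filter against a notification, covering four of
/// the five documented filter formats (user:<id> routing is omitted).
fn filter_matches(filter: &str, entity_type: &str, entity_id: i64, notification_type: &str) -> bool {
    match filter.split(':').collect::<Vec<_>>().as_slice() {
        ["all"] => true,
        ["entity_type", t] => *t == entity_type,
        ["entity", t, id] => *t == entity_type && id.parse() == Ok(entity_id),
        ["notification_type", n] => *n == notification_type,
        // user:<id> needs the notification's user_id, not modeled here
        _ => false,
    }
}
```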