migration reorg basically done
This commit is contained in:
359
docs/architecture/queue-ownership.md
Normal file
359
docs/architecture/queue-ownership.md
Normal file
@@ -0,0 +1,359 @@
|
||||
# RabbitMQ Queue Ownership Architecture
|
||||
|
||||
**Last Updated:** 2026-02-05
|
||||
**Status:** Implemented
|
||||
|
||||
## Overview
|
||||
|
||||
Attune uses a **service-specific infrastructure setup** pattern where each service is responsible for declaring only the queues it consumes. This provides clear ownership, reduces redundancy, and makes the system architecture more maintainable.
|
||||
|
||||
## Principle
|
||||
|
||||
**Each service declares the queues it consumes.**
|
||||
|
||||
This follows the principle that the consumer owns the queue declaration, ensuring that:
|
||||
- Queue configuration is co-located with the service that uses it
|
||||
- Services can start in any order (all operations are idempotent)
|
||||
- Ownership is clear from the codebase structure
|
||||
- Changes to queue configuration are localized to the consuming service
|
||||
|
||||
## Infrastructure Layers
|
||||
|
||||
### Common Infrastructure (Shared by All Services)
|
||||
|
||||
**Declared by:** Any service on startup (first-to-start wins, idempotent)
|
||||
**Responsibility:** Ensures basic messaging infrastructure exists
|
||||
|
||||
**Components:**
|
||||
- **Exchanges:**
|
||||
- `attune.events` (topic) - Event routing
|
||||
- `attune.executions` (topic) - Execution lifecycle routing
|
||||
- `attune.notifications` (fanout) - Real-time notifications
|
||||
- **Dead Letter Exchange (DLX):**
|
||||
- `attune.dlx` (direct) - Failed message handling
|
||||
- `attune.dlx.queue` - Dead letter queue bound to DLX
|
||||
|
||||
**Setup Method:** `Connection::setup_common_infrastructure()`
|
||||
|
||||
### Service-Specific Infrastructure
|
||||
|
||||
Each service declares only the queues it consumes:
|
||||
|
||||
## Service Responsibilities
|
||||
|
||||
### Executor Service
|
||||
|
||||
**Role:** Orchestrates execution lifecycle, enforces rules, manages inquiries
|
||||
|
||||
**Queues Owned:**
|
||||
- `attune.enforcements.queue`
|
||||
- Routing: `enforcement.#`
|
||||
- Purpose: Rule enforcement requests
|
||||
- `attune.execution.requests.queue`
|
||||
- Routing: `execution.requested`
|
||||
- Purpose: New execution requests
|
||||
- `attune.execution.status.queue`
|
||||
- Routing: `execution.status.changed`
|
||||
- Purpose: Execution status updates from workers
|
||||
- `attune.execution.completed.queue`
|
||||
- Routing: `execution.completed`
|
||||
- Purpose: Completed execution results
|
||||
- `attune.inquiry.responses.queue`
|
||||
- Routing: `inquiry.responded`
|
||||
- Purpose: Human-in-the-loop responses
|
||||
|
||||
**Setup Method:** `Connection::setup_executor_infrastructure()`
|
||||
|
||||
**Code Location:** `crates/executor/src/service.rs`
|
||||
|
||||
### Worker Service
|
||||
|
||||
**Role:** Execute actions in various runtimes (shell, Python, Node.js, containers)
|
||||
|
||||
**Queues Owned:**
|
||||
- `worker.{id}.executions` (per worker instance)
|
||||
- Routing: `execution.dispatch.worker.{id}`
|
||||
- Purpose: Execution tasks dispatched to this specific worker
|
||||
- Properties: Durable, auto-delete on disconnect
|
||||
|
||||
**Setup Method:** `Connection::setup_worker_infrastructure(worker_id, config)`
|
||||
|
||||
**Code Location:** `crates/worker/src/service.rs`
|
||||
|
||||
**Notes:**
|
||||
- Each worker instance gets its own queue
|
||||
- Worker ID is assigned during registration
|
||||
- Queue is created after successful registration
|
||||
- Multiple workers can exist for load distribution
|
||||
|
||||
### Sensor Service
|
||||
|
||||
**Role:** Monitor for events and generate trigger instances
|
||||
|
||||
**Queues Owned:**
|
||||
- `attune.events.queue`
|
||||
- Routing: `#` (all events)
|
||||
- Purpose: Events generated by sensors and triggers
|
||||
|
||||
**Setup Method:** `Connection::setup_sensor_infrastructure()`
|
||||
|
||||
**Code Location:** `crates/sensor/src/service.rs`
|
||||
|
||||
### Notifier Service
|
||||
|
||||
**Role:** Real-time notifications via WebSockets
|
||||
|
||||
**Queues Owned:**
|
||||
- `attune.notifications.queue`
|
||||
- Routing: `` (fanout, no routing key)
|
||||
- Purpose: System notifications for WebSocket broadcasting
|
||||
|
||||
**Setup Method:** `Connection::setup_notifier_infrastructure()`
|
||||
|
||||
**Code Location:** `crates/notifier/src/service.rs`
|
||||
|
||||
**Notes:**
|
||||
- Uses fanout exchange (broadcasts to all consumers)
|
||||
- Also uses PostgreSQL LISTEN/NOTIFY for database events
|
||||
|
||||
### API Service
|
||||
|
||||
**Role:** HTTP gateway for client interactions
|
||||
|
||||
**Queues Owned:** None (API only publishes, doesn't consume)
|
||||
|
||||
**Setup Method:** `Connection::setup_common_infrastructure()` only
|
||||
|
||||
**Code Location:** `crates/api/src/main.rs`
|
||||
|
||||
**Notes:**
|
||||
- Only needs exchanges to publish messages
|
||||
- Does not consume from any queues
|
||||
- Publishes to various exchanges (events, executions, notifications)
|
||||
|
||||
## Queue Configuration
|
||||
|
||||
All queues are configured with:
|
||||
- **Durable:** `true` (survives broker restarts)
|
||||
- **Exclusive:** `false` (accessible by multiple connections)
|
||||
- **Auto-delete:** `false` (persist even without consumers)
|
||||
- **Dead Letter Exchange:** `attune.dlx` (enabled by default)
|
||||
|
||||
Exception:
|
||||
- Worker-specific queues may have different settings based on worker lifecycle
|
||||
|
||||
## Message Flow Examples
|
||||
|
||||
### Rule Enforcement Flow
|
||||
```
|
||||
Event Created
|
||||
→ `attune.events` exchange
|
||||
→ `attune.events.queue` (consumed by Executor)
|
||||
→ Rule evaluation
|
||||
→ `enforcement.created` published to `attune.executions`
|
||||
→ `attune.enforcements.queue` (consumed by Executor)
|
||||
```
|
||||
|
||||
### Execution Flow
|
||||
```
|
||||
Execution Requested (from API)
|
||||
→ `attune.executions` exchange (routing: execution.requested)
|
||||
→ `attune.execution.requests.queue` (consumed by Executor/Scheduler)
|
||||
→ Executor dispatches to worker
|
||||
→ `execution.dispatch.worker.{id}` to `attune.executions`
|
||||
→ `worker.{id}.executions` (consumed by Worker)
|
||||
→ Worker executes action
|
||||
→ `execution.completed` to `attune.executions`
|
||||
→ `attune.execution.completed.queue` (consumed by Executor)
|
||||
```
|
||||
|
||||
### Notification Flow
|
||||
```
|
||||
System Event Occurs
|
||||
→ `attune.notifications` exchange (fanout)
|
||||
→ `attune.notifications.queue` (consumed by Notifier)
|
||||
→ WebSocket broadcast to connected clients
|
||||
```
|
||||
|
||||
## Implementation Details
|
||||
|
||||
### Setup Call Order
|
||||
|
||||
Each service follows this pattern:
|
||||
|
||||
```rust
|
||||
// 1. Connect to RabbitMQ
|
||||
let mq_connection = Connection::connect(mq_url).await?;
|
||||
|
||||
// 2. Setup common infrastructure (exchanges, DLX)
|
||||
mq_connection.setup_common_infrastructure(&config).await?;
|
||||
|
||||
// 3. Setup service-specific queues
|
||||
mq_connection.setup_SERVICE_infrastructure(&config).await?;
|
||||
// (where SERVICE is executor, worker, sensor, or notifier)
|
||||
```
|
||||
|
||||
### Idempotency
|
||||
|
||||
All setup operations are idempotent:
|
||||
- Declaring an existing exchange/queue with the same settings succeeds
|
||||
- Multiple services can call `setup_common_infrastructure()` safely
|
||||
- Services can start in any order
|
||||
|
||||
### Error Handling
|
||||
|
||||
Setup failures are logged but not fatal:
|
||||
- Queues may already exist from previous runs
|
||||
- Another service may have created the infrastructure
|
||||
- Only actual consumption failures should stop the service
|
||||
|
||||
## Startup Sequence
|
||||
|
||||
**Typical Docker Compose Startup:**
|
||||
|
||||
1. **PostgreSQL** - Starts first (dependency)
|
||||
2. **RabbitMQ** - Starts first (dependency)
|
||||
3. **Migrations** - Runs database migrations
|
||||
4. **Services start in parallel:**
|
||||
- **API** - Creates common infrastructure
|
||||
- **Executor** - Creates common + executor infrastructure
|
||||
- **Workers** - Each creates common + worker-specific queue
|
||||
- **Sensor** - Creates common + sensor infrastructure
|
||||
- **Notifier** - Creates common + notifier infrastructure
|
||||
|
||||
The first service to start creates the common infrastructure. All subsequent services find it already exists and proceed.
|
||||
|
||||
## Benefits
|
||||
|
||||
✅ **Clear Ownership** - Code inspection shows which service owns which queue
|
||||
✅ **Reduced Redundancy** - Each queue declared exactly once (per service type)
|
||||
✅ **Better Debugging** - Queue issues isolated to specific services
|
||||
✅ **Improved Maintainability** - Changes to queue config localized
|
||||
✅ **Self-Documenting** - Code structure reflects system architecture
|
||||
✅ **Order Independence** - Services can start in any order
|
||||
✅ **Monitoring** - Can track which service created infrastructure
|
||||
|
||||
## Monitoring and Verification
|
||||
|
||||
### RabbitMQ Management UI
|
||||
|
||||
Access at `http://localhost:15672` (credentials: `guest`/`guest`)
|
||||
|
||||
**Expected Queues:**
|
||||
- `attune.dlx.queue` - Dead letter queue
|
||||
- `attune.events.queue` - Events (Sensor)
|
||||
- `attune.enforcements.queue` - Enforcements (Executor)
|
||||
- `attune.execution.requests.queue` - Execution requests (Executor)
|
||||
- `attune.execution.status.queue` - Status updates (Executor)
|
||||
- `attune.execution.completed.queue` - Completions (Executor)
|
||||
- `attune.inquiry.responses.queue` - Inquiry responses (Executor)
|
||||
- `attune.notifications.queue` - Notifications (Notifier)
|
||||
- `worker.{id}.executions` - Worker queues (one per worker)
|
||||
|
||||
### Verification Commands
|
||||
|
||||
```bash
|
||||
# Check which queues exist
|
||||
docker compose exec rabbitmq rabbitmqctl list_queues name messages
|
||||
|
||||
# Check queue bindings
|
||||
docker compose exec rabbitmq rabbitmqctl list_bindings
|
||||
|
||||
# Check who's consuming from queues
|
||||
docker compose exec rabbitmq rabbitmqctl list_consumers
|
||||
```
|
||||
|
||||
### Log Verification
|
||||
|
||||
Each service logs its infrastructure setup:
|
||||
|
||||
```bash
|
||||
# API (common only)
|
||||
docker compose logs api | grep "infrastructure setup"
|
||||
|
||||
# Executor (common + executor)
|
||||
docker compose logs executor | grep "infrastructure setup"
|
||||
|
||||
# Workers (common + worker-specific)
|
||||
docker compose logs worker-shell | grep "infrastructure setup"
|
||||
|
||||
# Sensor (common + sensor)
|
||||
docker compose logs sensor | grep "infrastructure setup"
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Queue Already Exists with Different Settings
|
||||
|
||||
**Error:** `PRECONDITION_FAILED - inequivalent arg 'durable' for queue...`
|
||||
|
||||
**Cause:** Queue exists with different configuration than code expects
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Stop services
|
||||
docker compose down
|
||||
|
||||
# Remove RabbitMQ volume to clear all queues
|
||||
docker volume rm attune_rabbitmq_data
|
||||
|
||||
# Restart services
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
### Service Can't Connect to RabbitMQ
|
||||
|
||||
**Check:** Is RabbitMQ healthy?
|
||||
```bash
|
||||
docker compose ps rabbitmq
|
||||
```
|
||||
|
||||
**Check:** RabbitMQ logs for errors
|
||||
```bash
|
||||
docker compose logs rabbitmq
|
||||
```
|
||||
|
||||
### Messages Not Being Consumed
|
||||
|
||||
1. **Check queue has consumers:**
|
||||
```bash
|
||||
docker compose exec rabbitmq rabbitmqctl list_consumers
|
||||
```
|
||||
|
||||
2. **Check service is running:**
|
||||
```bash
|
||||
docker compose ps
|
||||
```
|
||||
|
||||
3. **Check service logs for consumer startup:**
|
||||
```bash
|
||||
docker compose logs <service-name> | grep "Consumer started"
|
||||
```
|
||||
|
||||
## Migration from Old Architecture
|
||||
|
||||
**Previous Behavior:** All services called `setup_infrastructure()` which created ALL queues
|
||||
|
||||
**New Behavior:** Each service calls its specific setup method
|
||||
|
||||
**Migration Steps:**
|
||||
1. Update to latest code
|
||||
2. Stop all services: `docker compose down`
|
||||
3. Clear RabbitMQ volume: `docker volume rm attune_rabbitmq_data`
|
||||
4. Start services: `docker compose up -d`
|
||||
|
||||
No data loss occurs as message queues are transient infrastructure.
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Queue Architecture](queue-architecture.md) - Overall queue design
|
||||
- [RabbitMQ Queues Quick Reference](../../QUICKREF-rabbitmq-queues.md)
|
||||
- [Executor Service](executor-service.md)
|
||||
- [Worker Service](worker-service.md)
|
||||
- [Sensor Service](sensor-service.md)
|
||||
|
||||
## Change History
|
||||
|
||||
| Date | Change | Author |
|
||||
|------|--------|--------|
|
||||
| 2026-02-05 | Initial implementation of service-specific queue ownership | AI Assistant |
|
||||
Reference in New Issue
Block a user