308 lines
11 KiB
Markdown
308 lines
11 KiB
Markdown
# Message Queue Infrastructure Fix
|
|
|
|
**Date:** 2026-01-15
|
|
**Status:** ✅ Completed
|
|
|
|
## Problem
|
|
|
|
The executor service was failing to start with the following error:
|
|
|
|
```
|
|
ERROR ThreadId(22) io_loop: Channel closed channel=2 method=Close {
|
|
reply_code: 404,
|
|
reply_text: ShortString("NOT_FOUND - no queue 'executor.main' in vhost '/'"),
|
|
class_id: 60,
|
|
method_id: 20
|
|
} error=AMQPError {
|
|
kind: Soft(NOTFOUND),
|
|
message: ShortString("NOT_FOUND - no queue 'executor.main' in vhost '/'")
|
|
}
|
|
```
|
|
|
|
This occurred because:
|
|
1. The executor was trying to consume from a hardcoded queue name `"executor.main"` that didn't exist
|
|
2. The RabbitMQ infrastructure (exchanges, queues, bindings) was not being set up automatically
|
|
3. Services expected queues to exist before they could start consuming messages
|
|
|
|
After fixing the queue issue, a second error appeared:
|
|
|
|
```
|
|
ERROR Connection closed channel=0 method=Close {
|
|
reply_code: 530,
|
|
reply_text: ShortString("NOT_ALLOWED - attempt to reuse consumer tag 'executor'"),
|
|
class_id: 60,
|
|
method_id: 20
|
|
}
|
|
```
|
|
|
|
This occurred because all three executor components (enforcement processor, scheduler, execution manager) were attempting to share the same Consumer instance with the same consumer tag.
|
|
|
|
## Root Cause
|
|
|
|
The executor service had two issues:
|
|
|
|
**Issue 1: Missing Queue**
|
|
- Using a hardcoded queue name (`"executor.main"`) instead of the configured queue name
|
|
- Not setting up the RabbitMQ infrastructure on startup
|
|
- Assuming queues would be created externally before service startup
|
|
|
|
**Issue 2: Shared Consumer Tag**
|
|
- All three executor components were sharing a single Consumer instance
|
|
- RabbitMQ requires unique consumer tags per connection/channel
|
|
- Multiple consumers cannot use the same tag on the same connection
|
|
|
|
## Solution
|
|
|
|
Updated the executor service to:
|
|
|
|
### 1. Set Up Infrastructure on Startup
|
|
Added automatic infrastructure setup that creates:
|
|
- **Exchanges**: `attune.events`, `attune.executions`, `attune.notifications`
|
|
- **Queues**: `attune.events.queue`, `attune.executions.queue`, `attune.notifications.queue`
|
|
- **Dead Letter Exchange**: `attune.dlx` for failed message handling
|
|
- **Bindings**: Proper routing between exchanges and queues
|
|
|
|
```rust
|
|
// Setup message queue infrastructure (exchanges, queues, bindings)
|
|
let mq_config = MqConfig::default();
|
|
match mq_connection.setup_infrastructure(&mq_config).await {
|
|
Ok(_) => info!("Message queue infrastructure setup completed"),
|
|
Err(e) => {
|
|
warn!("Failed to setup MQ infrastructure (may already exist): {}", e);
|
|
}
|
|
}
|
|
```
|
|
|
|
### 2. Create Individual Consumers with Unique Tags
|
|
Changed from sharing a single consumer to creating separate consumers for each component:
|
|
|
|
**Before:**
|
|
```rust
|
|
// Single consumer shared by all components
|
|
let consumer = Consumer::new(
|
|
&mq_connection,
|
|
attune_common::mq::ConsumerConfig {
|
|
queue: "executor.main".to_string(),
|
|
tag: "executor".to_string(), // Same tag for all!
|
|
// ...
|
|
},
|
|
).await?;
|
|
|
|
// All components share the same consumer
|
|
let enforcement_processor = EnforcementProcessor::new(
|
|
pool.clone(),
|
|
publisher.clone(),
|
|
consumer.clone(), // Shared
|
|
);
|
|
let scheduler = ExecutionScheduler::new(
|
|
pool.clone(),
|
|
publisher.clone(),
|
|
consumer.clone(), // Shared
|
|
);
|
|
let execution_manager = ExecutionManager::new(
|
|
pool.clone(),
|
|
publisher.clone(),
|
|
consumer.clone(), // Shared
|
|
);
|
|
```
|
|
|
|
**After:**
|
|
```rust
|
|
// Each component creates its own consumer with unique tag
|
|
let enforcement_consumer = Consumer::new(
|
|
&mq_connection,
|
|
attune_common::mq::ConsumerConfig {
|
|
queue: queue_name.clone(),
|
|
tag: "executor.enforcement".to_string(), // Unique tag
|
|
// ...
|
|
},
|
|
).await?;
|
|
let enforcement_processor = EnforcementProcessor::new(
|
|
pool.clone(),
|
|
publisher.clone(),
|
|
Arc::new(enforcement_consumer),
|
|
);
|
|
|
|
let scheduler_consumer = Consumer::new(
|
|
&mq_connection,
|
|
attune_common::mq::ConsumerConfig {
|
|
queue: queue_name.clone(),
|
|
tag: "executor.scheduler".to_string(), // Unique tag
|
|
// ...
|
|
},
|
|
).await?;
|
|
let scheduler = ExecutionScheduler::new(
|
|
pool.clone(),
|
|
publisher.clone(),
|
|
Arc::new(scheduler_consumer),
|
|
);
|
|
|
|
let manager_consumer = Consumer::new(
|
|
&mq_connection,
|
|
attune_common::mq::ConsumerConfig {
|
|
queue: queue_name.clone(),
|
|
tag: "executor.manager".to_string(), // Unique tag
|
|
// ...
|
|
},
|
|
).await?;
|
|
let execution_manager = ExecutionManager::new(
|
|
pool.clone(),
|
|
publisher.clone(),
|
|
Arc::new(manager_consumer),
|
|
);
|
|
```
|
|
|
|
This implements a **competing consumers pattern** where multiple consumers process messages from the same queue, with RabbitMQ distributing messages among them.
|
|
|
|
### 3. Use Proper Queue Configuration
|
|
Changed from hardcoded queue name to using the configured queue:
|
|
|
|
**Before:**
|
|
```rust
|
|
queue: "executor.main".to_string(),
|
|
```
|
|
|
|
**After:**
|
|
```rust
|
|
let queue_name = mq_config.rabbitmq.queues.executions.name.clone();
|
|
queue: queue_name.clone(),
|
|
```
|
|
|
|
### 4. Added Proper Imports and Error Handling
|
|
- Added `MessageQueueConfig as MqConfig` import from `attune_common::mq`
|
|
- Added `warn` import from `tracing` for logging setup warnings
|
|
- Infrastructure setup errors are logged but don't fail startup (idempotent)
|
|
|
|
## Implementation Details
|
|
|
|
### Files Modified
|
|
|
|
**`crates/executor/src/service.rs`:**
|
|
- Added `MqConfig` import from `attune_common::mq`
|
|
- Added infrastructure setup in `ExecutorService::new()`
|
|
- Changed hardcoded queue name to use config
|
|
- Removed shared `consumer` field from `ExecutorServiceInner`
|
|
- Added `queue_name` field to store configured queue name
|
|
- Updated `start()` method to create individual consumers for each component
|
|
- Each consumer has a unique tag: `executor.enforcement`, `executor.scheduler`, `executor.manager`
|
|
- Removed `consumer()` accessor method (no longer needed)
|
|
- Added informative logging for queue initialization
|
|
|
|
### Infrastructure Created
|
|
|
|
The `setup_infrastructure()` call creates:
|
|
|
|
1. **Dead Letter Exchange**: `attune.dlx` (Fanout)
|
|
- Handles failed messages that exceed retry limits
|
|
|
|
2. **Exchanges**:
|
|
- `attune.events` (Topic) - for sensor-generated events
|
|
- `attune.executions` (Direct) - for execution messages
|
|
- `attune.notifications` (Fanout) - for notification broadcasts
|
|
|
|
3. **Queues** (all durable, with DLX):
|
|
- `attune.events.queue` - bound to events exchange with routing key `#`
|
|
- `attune.executions.queue` - bound to executions exchange with routing key `execution`
|
|
- `attune.notifications.queue` - bound to notifications exchange
|
|
|
|
4. **Dead Letter Queues**:
|
|
- Automatically created for each main queue
|
|
- Messages TTL: 24 hours before expiration
|
|
|
|
## Testing
|
|
|
|
Verified the fix works correctly:
|
|
|
|
1. ✅ **Infrastructure Setup**: Queues and exchanges are created on first run
|
|
```
|
|
INFO Setting up RabbitMQ infrastructure
|
|
INFO Queue 'attune.events.queue' declared with dead letter exchange 'attune.dlx'
|
|
INFO Queue 'attune.executions.queue' declared with dead letter exchange 'attune.dlx'
|
|
INFO Queue 'attune.notifications.queue' declared with dead letter exchange 'attune.dlx'
|
|
```
|
|
|
|
2. ✅ **Idempotent Setup**: Subsequent runs don't fail if infrastructure exists
|
|
```
|
|
WARN Failed to setup MQ infrastructure (may already exist): ...
|
|
```
|
|
|
|
3. ✅ **Consumer Initialization**: Executor successfully connects to the proper queue
|
|
```
|
|
INFO Message queue consumer initialized on queue: attune.executions.queue
|
|
```
|
|
|
|
4. ✅ **Service Startup**: All executor components start without errors
|
|
```
|
|
INFO Starting enforcement processor
|
|
INFO Starting execution scheduler
|
|
INFO Starting execution manager
|
|
INFO Consumer started for queue 'attune.executions.queue' with tag 'executor.enforcement'
|
|
INFO Consumer started for queue 'attune.executions.queue' with tag 'executor.scheduler'
|
|
INFO Consumer started for queue 'attune.executions.queue' with tag 'executor.manager'
|
|
```
|
|
|
|
5. ✅ **Competing Consumers**: Multiple consumers successfully process from same queue
|
|
- Each consumer has unique tag
|
|
- RabbitMQ distributes messages across consumers
|
|
- No consumer tag conflicts
|
|
|
|
## Impact
|
|
|
|
- **Automated Setup**: No manual RabbitMQ queue creation needed
|
|
- **Configuration-Driven**: Queue names come from config, not hardcoded
|
|
- **Idempotent**: Services can start/restart without manual intervention
|
|
- **Better Logging**: Clear visibility into which queues and consumer tags are being used
|
|
- **Production Ready**: Dead letter queues for message failure handling
|
|
- **Scalable**: Competing consumers pattern allows for parallel message processing
|
|
- **Unique Identification**: Each consumer component has its own distinct tag for monitoring
|
|
|
|
## Related Components
|
|
|
|
### Other Services
|
|
The sensor service uses a different abstraction (`MessageQueue` wrapper) that publishes to exchanges but doesn't consume. It may need similar infrastructure setup in the future if it starts consuming messages.
|
|
|
|
### Worker Service
|
|
The worker service will likely need similar changes when implemented, as it will consume from worker-specific queues.
|
|
|
|
## Configuration
|
|
|
|
The infrastructure uses default configuration from `attune_common::mq::MessageQueueConfig`:
|
|
- Queue names: `attune.{events,executions,notifications}.queue`
|
|
- Exchange names: `attune.{events,executions,notifications}`
|
|
- Dead letter exchange: `attune.dlx`
|
|
- DLQ TTL: 24 hours
|
|
|
|
These can be customized by modifying the `MessageQueueConfig::default()` implementation.
|
|
|
|
## Architecture Notes
|
|
|
|
### Competing Consumers Pattern
|
|
The executor now uses a **competing consumers pattern** where multiple consumers read from the same queue:
|
|
- **Benefits**: Load balancing, parallel processing, better resource utilization
|
|
- **RabbitMQ Behavior**: Messages are distributed round-robin among consumers
|
|
- **Unique Tags**: Each consumer must have a unique tag (enforced by RabbitMQ)
|
|
- **Consumer Count**: 3 consumers (enforcement, scheduler, manager) per executor instance
|
|
|
|
### Message Distribution
|
|
With the current setup:
|
|
- All three consumers read from `attune.executions.queue`
|
|
- RabbitMQ distributes incoming messages among the three consumers
|
|
- Each message is delivered to exactly one consumer
|
|
- If message processing fails, it's requeued for another consumer to retry
|
|
|
|
## Next Steps
|
|
|
|
- [ ] Consider adding infrastructure setup to sensor service (if needed)
|
|
- [ ] Add infrastructure setup to worker service (when implemented)
|
|
- [ ] Document RabbitMQ topology in architecture documentation
|
|
- [ ] Consider making infrastructure setup a separate CLI tool/command
|
|
- [ ] Add health checks that verify queue existence
|
|
- [ ] Consider using separate queues for different message types instead of competing consumers
|
|
- [ ] Add monitoring/metrics for consumer performance and message distribution
|
|
|
|
## Notes
|
|
|
|
- Infrastructure setup is designed to be idempotent - running it multiple times is safe
|
|
- If setup fails (e.g., due to permissions), the service logs a warning but continues
|
|
- This allows services to work in environments where infrastructure is managed externally
|
|
- The setup creates durable queues that survive broker restarts |