Worker Service Message Queue Infrastructure Fix
Date: 2026-01-16
Status: ✅ Completed
Problem
The worker service was failing to start with the following error:
ERROR ThreadId(22) io_loop: Channel closed channel=2 method=Close {
reply_code: 404,
reply_text: ShortString("NOT_FOUND - no queue 'worker.1.executions' in vhost '/'"),
class_id: 60,
method_id: 20
} error=AMQPError {
kind: Soft(NOTFOUND),
message: ShortString("NOT_FOUND - no queue 'worker.1.executions' in vhost '/'")
}
This occurred because:
- The worker was trying to consume from a dynamically-named queue, worker.{worker_id}.executions, that didn't exist
- The RabbitMQ infrastructure (exchanges, queues, bindings) was not being set up by the worker
- Worker-specific queues need to be created dynamically after worker registration
Root Cause
The worker service was:
- Assuming the RabbitMQ infrastructure already existed
- Attempting to consume from a worker-specific queue without creating it first
- Not declaring or binding the queue before starting the consumer
- Missing infrastructure setup that other services (executor) had
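The failure mode can be reproduced with a small in-memory model. This is only a sketch: `ToyBroker` and `ToyError` are hypothetical stand-ins, not a real client API. It shows that consuming never creates a queue, so a consume issued before any declare fails in the same way as the 404 above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a broker's queue registry (not a real client API).
#[derive(Default)]
struct ToyBroker {
    queues: HashSet<String>,
}

#[derive(Debug, PartialEq)]
enum ToyError {
    /// Mirrors the AMQP 404 NOT_FOUND soft error from the log above.
    NotFound(String),
}

impl ToyBroker {
    /// Declaring is the only operation that makes a queue exist.
    fn declare_queue(&mut self, name: &str) {
        self.queues.insert(name.to_string());
    }

    /// Consuming succeeds only if the queue was declared beforehand.
    fn consume(&self, name: &str) -> Result<(), ToyError> {
        if self.queues.contains(name) {
            Ok(())
        } else {
            Err(ToyError::NotFound(format!("no queue '{}' in vhost '/'", name)))
        }
    }
}
```

Calling `consume("worker.1.executions")` on a fresh `ToyBroker` returns `NotFound`; calling `declare_queue` first makes the same call succeed.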
Solution
Updated the worker service to:
1. Set Up Base Infrastructure on Startup
Added automatic infrastructure setup similar to the executor service:
// Setup message queue infrastructure (exchanges, queues, bindings)
let mq_config = MqConfig::default();
match mq_connection.setup_infrastructure(&mq_config).await {
    Ok(_) => info!("Message queue infrastructure setup completed"),
    Err(e) => {
        warn!(
            "Failed to setup MQ infrastructure (may already exist): {}",
            e
        );
    }
}
This creates the base infrastructure:
- Exchanges: attune.events, attune.executions, attune.notifications
- Base queues: attune.events.queue, attune.executions.queue, attune.notifications.queue
- Dead letter exchange: attune.dlx
2. Create Worker-Specific Queue Dynamically
Added dynamic queue creation in start_execution_consumer():
// Create the worker-specific queue
let worker_queue = QueueConfig {
    name: queue_name.clone(),
    durable: false,    // Worker queues are temporary
    exclusive: false,
    auto_delete: true, // Delete when worker disconnects
};

self.mq_connection
    .declare_queue(&worker_queue)
    .await?;
Key Design Decisions:
- durable: false - Worker queues don't need to survive broker restarts
- auto_delete: true - Queues are automatically cleaned up when worker disconnects
- Queue name format: worker.{worker_id}.executions
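These decisions could be captured in one helper so every worker builds its queue config the same way. A minimal sketch, under two assumptions: the local `QueueConfig` struct only mirrors the fields shown above and is not the crate's actual type, and `worker_id` is an `i64`. The helper name `worker_queue_config` is hypothetical.

```rust
/// Local mirror of the QueueConfig fields shown above (assumed shape, not the crate's type).
#[derive(Debug, Clone, PartialEq)]
struct QueueConfig {
    name: String,
    durable: bool,
    exclusive: bool,
    auto_delete: bool,
}

/// Hypothetical helper: builds the ephemeral queue config for one worker.
fn worker_queue_config(worker_id: i64) -> QueueConfig {
    QueueConfig {
        name: format!("worker.{}.executions", worker_id),
        durable: false,    // does not need to survive a broker restart
        exclusive: false,  // other connections may still access the queue
        auto_delete: true, // removed once the worker's consumer goes away
    }
}
```

Keeping the flags in one place makes it harder for a later call site to redeclare the same queue with different settings.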
3. Bind Queue to Exchange with Worker-Specific Routing Key
Added proper binding so the scheduler can route executions to specific workers:
// Bind the queue to the executions exchange with worker-specific routing key
self.mq_connection
    .bind_queue(
        &queue_name,
        "attune.executions",
        &format!("worker.{}", worker_id),
    )
    .await?;
This allows the executor/scheduler to target specific workers by publishing messages with routing key worker.{worker_id}.
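Since the publisher and the worker must agree on the exact key, the naming scheme might be centralized in small helpers. The function names below are hypothetical and `worker_id` is assumed to be an `i64`; only the `worker.{worker_id}` and `worker.{worker_id}.executions` formats come from this document.

```rust
/// Routing key the scheduler would publish with to reach one worker.
fn worker_routing_key(worker_id: i64) -> String {
    format!("worker.{}", worker_id)
}

/// Queue name the worker declares and consumes from.
fn worker_queue_name(worker_id: i64) -> String {
    format!("{}.executions", worker_routing_key(worker_id))
}

/// Recovers the worker id from a routing key, or None if it is not `worker.{id}`.
fn parse_worker_routing_key(key: &str) -> Option<i64> {
    key.strip_prefix("worker.")?.parse().ok()
}
```

A queue name such as `worker.1.executions` is deliberately rejected by `parse_worker_routing_key`, since it is not a valid routing key under this scheme.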
4. Added Proper Imports
- Added config::MessageQueueConfig as MqConfig import
- Added QueueConfig import for queue configuration
- Added warn import from tracing for setup warnings
Implementation Details
Files Modified
crates/worker/src/service.rs:
- Added MessageQueueConfig import from attune_common::mq::config
- Added QueueConfig import
- Added infrastructure setup in new() method
- Enhanced start_execution_consumer() to:
  - Create worker-specific queue dynamically
  - Bind queue to exchange with worker-specific routing key
  - Add detailed logging for each step
- All worker-specific queue operations happen after worker registration (when worker_id is known)
Infrastructure Created
Base Infrastructure (shared with executor)
- Exchanges:
  - attune.events (Topic)
  - attune.executions (Direct)
  - attune.notifications (Fanout)
- Base Queues:
  - attune.events.queue
  - attune.executions.queue
  - attune.notifications.queue
- Dead Letter Exchange: attune.dlx
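For reference, the same topology can be written down as plain data. This is an illustrative sketch, not the contents of `MessageQueueConfig`: the constant and function names are hypothetical, and the type of attune.dlx is left out because this document does not state it.

```rust
/// AMQP exchange kinds used by the base topology.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExchangeKind {
    Topic,
    Direct,
    Fanout,
}

/// Exchange names and kinds exactly as listed above.
const BASE_EXCHANGES: [(&str, ExchangeKind); 3] = [
    ("attune.events", ExchangeKind::Topic),
    ("attune.executions", ExchangeKind::Direct),
    ("attune.notifications", ExchangeKind::Fanout),
];

/// Base queue names as listed above.
const BASE_QUEUES: [&str; 3] = [
    "attune.events.queue",
    "attune.executions.queue",
    "attune.notifications.queue",
];

/// Dead letter exchange name; its kind is not given in this document.
const DEAD_LETTER_EXCHANGE: &str = "attune.dlx";

/// Looks up the kind of a base exchange by name.
fn exchange_kind(name: &str) -> Option<ExchangeKind> {
    BASE_EXCHANGES
        .iter()
        .find(|(exchange, _)| *exchange == name)
        .map(|(_, kind)| *kind)
}
```

The Direct kind of attune.executions is what makes the per-worker routing keys work: a message reaches only queues bound with an identical key.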
Worker-Specific Infrastructure (per worker)
- Worker Queue: worker.{worker_id}.executions
  - Non-durable (temporary)
  - Auto-delete when worker disconnects
  - Bound to attune.executions exchange
- Routing Key: worker.{worker_id}
  - Allows targeted message delivery to specific workers
  - Scheduler can route executions based on worker capabilities
Testing
Verified the fix works correctly:
- ✅ Base Infrastructure Setup: Common exchanges and queues created
  INFO Setting up RabbitMQ infrastructure
  INFO Queue 'attune.events.queue' declared with dead letter exchange 'attune.dlx'
  INFO Queue 'attune.executions.queue' declared with dead letter exchange 'attune.dlx'
  INFO Queue 'attune.notifications.queue' declared with dead letter exchange 'attune.dlx'
  INFO RabbitMQ infrastructure setup complete
- ✅ Worker Registration: Worker successfully registers with database
  INFO Worker registered with ID: 1
- ✅ Dynamic Queue Creation: Worker-specific queue is created
  INFO Creating worker-specific queue: worker.1.executions
  INFO Worker queue created: worker.1.executions
- ✅ Queue Binding: Queue bound to exchange with proper routing key
  INFO Queue 'worker.1.executions' bound to exchange 'attune.executions' with routing key 'worker.1'
  INFO Queue bound to exchange with routing key 'worker.1'
- ✅ Consumer Start: Worker successfully starts consuming messages
  INFO Consumer started for queue: worker.1.executions
  INFO Message queue consumer initialized
- ✅ Service Ready: Worker service fully operational
  INFO Worker Service started successfully
Impact
- Automated Setup: Workers create their own queues automatically on startup
- Dynamic Infrastructure: Queue creation happens after worker registration
- Clean Shutdown: Auto-delete queues prevent orphaned queues from offline workers
- Targeted Routing: Scheduler can route executions to specific workers
- Idempotent: Base infrastructure setup is safe to run multiple times
- Better Logging: Clear visibility into infrastructure creation steps
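The idempotency point has one caveat worth sketching. In RabbitMQ, redeclaring a queue with identical properties is a no-op, while redeclaring it with different properties is rejected with 406 PRECONDITION_FAILED. The model below is a toy with hypothetical names (`ToyRegistry`, `QueueArgs`, `DeclareError`), and the property values are illustrative only.

```rust
use std::collections::HashMap;

/// Subset of queue properties that must match on redeclare (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
struct QueueArgs {
    durable: bool,
    auto_delete: bool,
}

#[derive(Debug, PartialEq)]
enum DeclareError {
    /// Mirrors AMQP 406 PRECONDITION_FAILED on a mismatched redeclare.
    PreconditionFailed,
}

/// Hypothetical in-memory stand-in for the broker's queue table.
#[derive(Default)]
struct ToyRegistry {
    queues: HashMap<String, QueueArgs>,
}

impl ToyRegistry {
    /// Declaring twice with equal args succeeds; changed args are rejected.
    fn declare_queue(&mut self, name: &str, args: QueueArgs) -> Result<(), DeclareError> {
        if let Some(existing) = self.queues.get(name) {
            return if *existing == args {
                Ok(())
            } else {
                Err(DeclareError::PreconditionFailed)
            };
        }
        self.queues.insert(name.to_string(), args);
        Ok(())
    }
}
```

So setup is safe to repeat only as long as every service declares the shared queues with the same properties, which is one reason to build them from a single shared config.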
Architecture Notes
Worker Queue Lifecycle
- Worker starts and connects to MQ
- Worker sets up base infrastructure (idempotent)
- Worker registers with database, receives worker_id
- Worker creates queue: worker.{worker_id}.executions
- Worker binds queue to attune.executions exchange
- Worker starts consuming with tag worker-{worker_id}
- On shutdown/disconnect, queue is automatically deleted
Message Routing Flow
Scheduler/Executor
↓ (publishes to)
attune.executions exchange
↓ (routes by key: worker.{id})
worker.{id}.executions queue
↓ (consumed by)
Worker Service
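The routing step in this flow can be modeled in a few lines. The sketch below assumes standard direct-exchange semantics (a message goes to every queue whose binding key equals the routing key exactly); `DirectExchange` is a hypothetical toy type, not part of the codebase.

```rust
use std::collections::HashMap;

/// Toy direct exchange: binding key -> names of queues bound with that key.
#[derive(Default)]
struct DirectExchange {
    bindings: HashMap<String, Vec<String>>,
}

impl DirectExchange {
    /// Records that `queue` is bound with `binding_key`.
    fn bind(&mut self, queue: &str, binding_key: &str) {
        self.bindings
            .entry(binding_key.to_string())
            .or_default()
            .push(queue.to_string());
    }

    /// Returns the queues that would receive a message with this routing key.
    fn route(&self, routing_key: &str) -> Vec<String> {
        self.bindings.get(routing_key).cloned().unwrap_or_default()
    }
}
```

With worker.1.executions bound under worker.1, routing on worker.1 yields only that queue. Routing on a key with no binding yields an empty list, which corresponds to an unroutable message; a broker drops such a message unless the publisher asked for it to be returned.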
Queue Configuration Rationale
- Non-durable: Worker queues are ephemeral, tied to worker lifetime
- Auto-delete: Prevents accumulation of queues from dead workers
- Non-exclusive: Allows monitoring/management tools to inspect queues
- Named by worker_id: Enables targeted execution assignment
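A rough model of the auto-delete behavior behind this rationale, assuming standard RabbitMQ semantics: an auto-delete queue is removed when its last consumer goes away, and only after it has had at least one consumer. `QueueState` and its methods are hypothetical names used for illustration.

```rust
/// Toy model of one auto-delete queue's lifetime.
#[derive(Debug, PartialEq)]
enum QueueState {
    /// Declared but never consumed from; auto-delete has not been armed yet.
    Idle,
    /// At least one consumer is attached.
    Active { consumers: u32 },
    /// The last consumer left and the broker removed the queue.
    Deleted,
}

impl QueueState {
    /// A consumer attaches to the queue.
    fn add_consumer(self) -> QueueState {
        match self {
            QueueState::Idle => QueueState::Active { consumers: 1 },
            QueueState::Active { consumers } => QueueState::Active {
                consumers: consumers + 1,
            },
            QueueState::Deleted => QueueState::Deleted,
        }
    }

    /// A consumer cancels or its connection drops.
    fn remove_consumer(self) -> QueueState {
        match self {
            QueueState::Active { consumers } if consumers <= 1 => QueueState::Deleted,
            QueueState::Active { consumers } => QueueState::Active {
                consumers: consumers - 1,
            },
            other => other,
        }
    }
}
```

One consequence of the `Idle` state: a worker that declares its queue but exits before it starts consuming would leave the queue behind, which is a case a queue TTL could cover.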
Comparison with Other Services
Executor Service
- Uses persistent queues: attune.executions.queue
- Multiple competing consumers on same queue
- Infrastructure setup in service initialization
Worker Service
- Uses ephemeral per-worker queues: worker.{id}.executions
- One consumer per worker queue
- Queue creation after worker registration
- Auto-cleanup when worker disconnects
Sensor Service
- Uses MessageQueue wrapper (publish-only)
- No consumer setup needed
- May need similar updates if consuming in future
Next Steps
- Add health check that verifies worker queue exists and is bound
- Consider adding queue TTL for additional cleanup safety
- Add metrics for worker queue depth and consumer performance
- Document worker routing patterns in architecture docs
- Consider adding worker queue monitoring/alerting
- Test worker failover scenarios (queue cleanup)
Related Issues
This fix follows the same pattern as:
- 2026-01-15: Executor service MQ infrastructure fix
- 2026-01-15: Configuration URL parsing fix
All services now properly set up their required MQ infrastructure on startup.
Notes
- Worker queues are intentionally temporary and auto-delete
- Each worker gets its own queue for targeted execution delivery
- The routing key pattern worker.{id} allows flexible execution scheduling
- Infrastructure setup is idempotent - safe for multiple workers to run simultaneously
- Queue creation happens after registration to ensure worker_id is available