# Work Summary: Inquiry Queue Separation Fix

**Date:** 2026-02-03

**Issues:**

- Executor deserialization error: "missing field `inquiry_id`"
- Executor deserialization error: "missing field `action_id`"

**Status:** ✅ Both Fixed

## Visual Overview

### Before Fix ❌

```
attune.execution.status.queue
├─ Consumer: CompletionListener (expects ExecutionCompletedPayload)
├─ Consumer: ExecutionManager (expects ExecutionStatusPayload)
└─ Consumer: InquiryHandler (expects InquiryRespondedPayload)

Incoming Messages:
- execution.completed → ExecutionCompletedPayload
- execution.status.changed → ExecutionStatusChangedPayload
- inquiry.responded → InquiryRespondedPayload

Problem: Round-robin distribution causes wrong consumer to receive wrong message type!
```

### After Fix ✅

```
attune.execution.completed.queue
└─ Consumer: CompletionListener (expects ExecutionCompletedPayload)
   └─ Message: execution.completed → ExecutionCompletedPayload ✓

attune.execution.status.queue
└─ Consumer: ExecutionManager (expects ExecutionStatusPayload)
   └─ Message: execution.status.changed → ExecutionStatusChangedPayload ✓

attune.inquiry.responses.queue
└─ Consumer: InquiryHandler (expects InquiryRespondedPayload)
   └─ Message: inquiry.responded → InquiryRespondedPayload ✓

Result: Each queue has ONE consumer expecting ONE message type!
```

## Problem Description

The executor service was logging deserialization errors when processing messages from the `execution_status` queue:

```
ERROR ThreadId(13) crates/common/src/mq/consumer.rs:112: Failed to deserialize message: missing field `inquiry_id` at line 1 column 318. Rejecting message.
```

## Root Cause Analysis

The issue was caused by **two different consumers listening to the same RabbitMQ queue** but expecting different message payload types.

### Queue Configuration Issue

The `execution_status` queue (`attune.execution.status.queue`) was bound to the `attune.executions` exchange with routing key `"execution.status.changed"`, but it was also receiving messages with two other routing keys:

1. **`execution.completed`** → `ExecutionCompletedPayload` (published by Worker service)
2. **`inquiry.responded`** → `InquiryRespondedPayload` (published by API service)

### Competing Consumers

Two consumers were configured to read from the same `execution_status` queue:

1. **CompletionListener** (`executor.completion` tag)
   - Expected: `ExecutionCompletedPayload`
   - Fields: `execution_id`, `action_id`, `action_ref`, `status`, `result`, `completed_at`
2. **InquiryHandler** (`executor.inquiry` tag)
   - Expected: `InquiryRespondedPayload`
   - Fields: `inquiry_id`, `execution_id`, `response`, `responded_by`, `responded_at`

### Message Routing Behavior

RabbitMQ distributes messages to consumers on the same queue using **round-robin load balancing**. A "missing field" error names a field that the consumer's expected payload type requires but the received message does not contain. This meant:

- When an `ExecutionCompletedPayload` was delivered to `InquiryHandler` → **deserialization failed** (missing `inquiry_id`)
- When an `InquiryRespondedPayload` was delivered to `CompletionListener` → **deserialization failed** (missing `action_id`)

The logged error mentioned `inquiry_id` because `InquiryHandler` tried to deserialize a message that was not an inquiry response and therefore carried no `inquiry_id` field.

## Solution Implemented

### 1. Created Separate Queues for Inquiry Responses and Execution Completions

**File:** `attune/crates/common/src/mq/config.rs`

Added new queue configurations:

```rust
pub struct QueuesConfig {
    // ... existing queues ...

    /// Execution completed queue configuration
    pub execution_completed: QueueConfig,

    /// Inquiry responses queue configuration
    pub inquiry_responses: QueueConfig,
}
```

Default configurations:

```rust
// Excerpt from the default queue configuration
execution_completed: QueueConfig {
    name: "attune.execution.completed.queue".to_string(),
    durable: true,
    exclusive: false,
    auto_delete: false,
},
inquiry_responses: QueueConfig {
    name: "attune.inquiry.responses.queue".to_string(),
    durable: true,
    exclusive: false,
    auto_delete: false,
}
```

### 2. Updated Infrastructure Setup

**File:** `attune/crates/common/src/mq/connection.rs`

Added queue declarations and bindings in `setup_infrastructure()`:

```rust
// Declare the new queues with DLX support
self.declare_queue_with_dlx(&config.rabbitmq.queues.execution_completed, dlx).await?;
self.declare_queue_with_dlx(&config.rabbitmq.queues.inquiry_responses, dlx).await?;

// Bind execution_status queue to status changed messages for ExecutionManager
self.bind_queue(
    &config.rabbitmq.queues.execution_status.name,
    &config.rabbitmq.exchanges.executions.name,
    "execution.status.changed",
)
.await?;

// Bind execution_completed queue to completed messages for CompletionListener
self.bind_queue(
    &config.rabbitmq.queues.execution_completed.name,
    &config.rabbitmq.exchanges.executions.name,
    "execution.completed",
)
.await?;

// Bind inquiry_responses queue to inquiry responded messages for InquiryHandler
self.bind_queue(
    &config.rabbitmq.queues.inquiry_responses.name,
    &config.rabbitmq.exchanges.executions.name,
    "inquiry.responded",
)
.await?;
```

### 3. Updated Executor Service Configuration

**File:** `attune/crates/executor/src/service.rs`

Changed `InquiryHandler` and `CompletionListener` to consume from dedicated queues:

```rust
// InquiryHandler - Before:
let inquiry_response_queue = self.inner.mq_config.rabbitmq.queues.execution_status.name.clone();

// InquiryHandler - After:
let inquiry_response_queue = self.inner.mq_config.rabbitmq.queues.inquiry_responses.name.clone();

// CompletionListener - Before:
let execution_completed_queue = self.inner.mq_config.rabbitmq.queues.execution_status.name.clone();

// CompletionListener - After:
let execution_completed_queue = self.inner.mq_config.rabbitmq.queues.execution_completed.name.clone();
```

## Message Flow After Fix

### Execution Completion Flow

```
Worker → publishes ExecutionCompletedPayload
  → routing key: "execution.completed"
  → exchange: "attune.executions"
  → queue: "attune.execution.completed.queue"
  → consumer: CompletionListener
  ✅ Correct payload type received
```

### Execution Status Change Flow

```
Worker → publishes ExecutionStatusChangedPayload
  → routing key: "execution.status.changed"
  → exchange: "attune.executions"
  → queue: "attune.execution.status.queue"
  → consumer: ExecutionManager
  ✅ Correct payload type received
```

### Inquiry Response Flow

```
API → publishes InquiryRespondedPayload
  → routing key: "inquiry.responded"
  → exchange: "attune.executions"
  → queue: "attune.inquiry.responses.queue"
  → consumer: InquiryHandler
  ✅ Correct payload type received
```

## Benefits

1. **Type Safety**: Each queue receives only one message type, eliminating deserialization errors
2. **Scalability**: Can scale `CompletionListener`, `ExecutionManager`, and `InquiryHandler` independently
3. **Maintainability**: Clear separation of concerns, since each queue has a single purpose
4. **Reliability**: No message rejection due to type mismatches
5. **Performance**: No wasted processing from consumers receiving wrong message types

## Queue Separation Summary

After both fixes, we now have three dedicated queues for execution-related messages:

| Queue | Routing Key | Message Type | Consumer |
|-------|-------------|--------------|----------|
| `attune.execution.status.queue` | `execution.status.changed` | `ExecutionStatusChangedPayload` | ExecutionManager |
| `attune.execution.completed.queue` | `execution.completed` | `ExecutionCompletedPayload` | CompletionListener |
| `attune.inquiry.responses.queue` | `inquiry.responded` | `InquiryRespondedPayload` | InquiryHandler |

**Result:** Each queue now has exactly one consumer expecting exactly one message type. ✅

## Testing Recommendations

1. **Restart all services** to recreate the queue infrastructure with new bindings
2. **Verify queue creation** in RabbitMQ management UI:
   - Check that `attune.inquiry.responses.queue` exists
   - Check that `attune.execution.completed.queue` exists
   - Verify bindings on `attune.executions` exchange:
     - `inquiry.responded` → `attune.inquiry.responses.queue`
     - `execution.completed` → `attune.execution.completed.queue`
     - `execution.status.changed` → `attune.execution.status.queue`
3. **Monitor executor logs** for absence of deserialization errors (`inquiry_id` and `action_id`)
4. **Test inquiry workflow**:
   - Create an action that requests inquiry (`__inquiry` in result)
   - Respond to inquiry via API
   - Verify execution resumes correctly
5. **Test execution completion**:
   - Execute a simple action
   - Verify completion notification processed without errors

## Files Modified

- `attune/crates/common/src/mq/config.rs` - Added `inquiry_responses` and `execution_completed` queues
- `attune/crates/common/src/mq/connection.rs` - Added queue declarations and bindings
- `attune/crates/executor/src/service.rs` - Updated InquiryHandler and CompletionListener to use new queues

## Migration Notes

This is a **breaking change** for existing deployments:

1. Two new queues will be created automatically on service startup:
   - `attune.inquiry.responses.queue`
   - `attune.execution.completed.queue`
2. The `execution_status` queue now has **only one binding** (`execution.status.changed`)
3. Existing messages in queues are unaffected
4. No database migrations required
5. **Action Required**: Restart executor service to apply changes

## Related Issues

- Original implementation assumed a single queue could handle multiple message types
- RabbitMQ round-robin distribution caused non-deterministic deserialization failures
- Errors were intermittent because they depended on which consumer received which message
- `ExecutionManager` uses a local payload struct instead of the canonical `ExecutionStatusChangedPayload` (not critical, but should be unified in the future)

## Lessons Learned

1. **One queue, one message type**: RabbitMQ queues should have a single message schema
2. **One queue, one consumer type**: Consumers with different payload expectations on the same queue compete for messages instead of each receiving the ones meant for them
3. **Use routing keys effectively**: Topic exchanges with specific routing keys provide better message segregation
4. **Consumer tag awareness**: Consumer tags don't prevent round-robin distribution within the same queue
5. **Type-safe patterns**: Rust's strong typing revealed the issue quickly through deserialization errors
6. **Canonical message types**: Use shared message structs from `attune_common::mq::messages`, not local definitions
7. **Incremental fixes**: Fixing a surface-level problem sometimes uncovers deeper issues; fix them all at once
8. **Test thoroughly**: Restart services and monitor logs to catch related issues before they reach production
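
The first two lessons, and the intermittent nature of the errors, can be illustrated with a small std-only simulation. This is a minimal sketch and not the Attune implementation: `Message`, `Consumer`, `deliver_round_robin`, and `deliver_by_routing_key` are hypothetical names, and a simple field-presence check stands in for real payload deserialization. Only the field lists, routing keys, and consumer names come from this document.

```rust
/// Fields carried by each payload, as listed in this document.
const COMPLETED_FIELDS: &[&str] = &[
    "execution_id", "action_id", "action_ref", "status", "result", "completed_at",
];
const INQUIRY_FIELDS: &[&str] = &[
    "inquiry_id", "execution_id", "response", "responded_by", "responded_at",
];

/// Hypothetical stand-in for a published message: a routing key plus the
/// names of the fields present in its body.
struct Message {
    routing_key: &'static str,
    fields: &'static [&'static str],
}

/// Hypothetical stand-in for a consumer: the fields its payload type requires.
struct Consumer {
    name: &'static str,
    required: &'static [&'static str],
}

impl Consumer {
    /// Mimics strict deserialization: fails on the first required field
    /// that the message does not contain.
    fn try_handle(&self, msg: &Message) -> Result<(), String> {
        for field in self.required {
            if !msg.fields.contains(field) {
                return Err(format!("missing field `{}`", field));
            }
        }
        Ok(())
    }
}

fn execution_completed() -> Message {
    Message { routing_key: "execution.completed", fields: COMPLETED_FIELDS }
}

fn inquiry_responded() -> Message {
    Message { routing_key: "inquiry.responded", fields: INQUIRY_FIELDS }
}

fn completion_listener() -> Consumer {
    Consumer { name: "CompletionListener", required: COMPLETED_FIELDS }
}

fn inquiry_handler() -> Consumer {
    Consumer { name: "InquiryHandler", required: INQUIRY_FIELDS }
}

/// Shared queue: message i goes to consumer i mod n, regardless of its type.
fn deliver_round_robin(consumers: &[Consumer], messages: &[Message]) -> Vec<String> {
    let mut errors = Vec::new();
    for (i, msg) in messages.iter().enumerate() {
        let consumer = &consumers[i % consumers.len()];
        if let Err(e) = consumer.try_handle(msg) {
            errors.push(format!("{}: {}", consumer.name, e));
        }
    }
    errors
}

/// Dedicated queues: each message reaches only the consumer bound to its
/// routing key; messages with no matching binding are silently skipped here.
fn deliver_by_routing_key(bindings: &[(&str, Consumer)], messages: &[Message]) -> Vec<String> {
    let mut errors = Vec::new();
    for msg in messages {
        for (key, consumer) in bindings {
            if *key == msg.routing_key {
                if let Err(e) = consumer.try_handle(msg) {
                    errors.push(format!("{}: {}", consumer.name, e));
                }
            }
        }
    }
    errors
}
```

With the consumers `[completion_listener(), inquiry_handler()]` sharing one queue, the message order `[inquiry_responded(), execution_completed()]` fails on both deliveries, while the reverse order happens to succeed, which is why the errors appeared only intermittently. Routing by key with one consumer per binding produces no errors for either order.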