re-uploading work

2026-02-04 17:46:30 -06:00
commit 3b14c65998
1388 changed files with 381262 additions and 0 deletions
--- a/work-summary/2026-02-03-inquiry-queue-separation.md
+++ b/work-summary/2026-02-03-inquiry-queue-separation.md
@@ -0,0 +1,279 @@
+# Work Summary: Inquiry Queue Separation Fix
+
+**Date:** 2026-02-03  
+**Issues:** 
+- Executor deserialization error: "missing field `inquiry_id`"
+- Executor deserialization error: "missing field `action_id`"
+
+**Status:** ✅ Both Fixed
+
+## Visual Overview
+
+### Before Fix ❌
+```
+attune.execution.status.queue
+    ├─ Consumer: CompletionListener (expects ExecutionCompletedPayload)
+    ├─ Consumer: ExecutionManager (expects ExecutionStatusPayload)
+    └─ Consumer: InquiryHandler (expects InquiryRespondedPayload)
+    
+    Incoming Messages:
+    - execution.completed → ExecutionCompletedPayload
+    - execution.status.changed → ExecutionStatusChangedPayload
+    - inquiry.responded → InquiryRespondedPayload
+    
+    Problem: Round-robin distribution causes wrong consumer to receive wrong message type!
+```
+
+### After Fix ✅
+```
+attune.execution.completed.queue
+    └─ Consumer: CompletionListener (expects ExecutionCompletedPayload)
+    └─ Message: execution.completed → ExecutionCompletedPayload ✓
+
+attune.execution.status.queue
+    └─ Consumer: ExecutionManager (expects ExecutionStatusPayload)
+    └─ Message: execution.status.changed → ExecutionStatusChangedPayload ✓
+
+attune.inquiry.responses.queue
+    └─ Consumer: InquiryHandler (expects InquiryRespondedPayload)
+    └─ Message: inquiry.responded → InquiryRespondedPayload ✓
+    
+    Result: Each queue has ONE consumer expecting ONE message type!
+```
+
+## Problem Description
+
+The executor service was logging deserialization errors when processing messages from the `execution_status` queue:
+
+```
+ERROR ThreadId(13) crates/common/src/mq/consumer.rs:112: Failed to deserialize message: missing field `inquiry_id` at line 1 column 318. Rejecting message.
+```
+
+## Root Cause Analysis
+
+The issue was caused by **multiple consumers listening to the same RabbitMQ queue** while expecting different message payload types:
+
+### Queue Configuration Issue
+
+The `execution_status` queue (`attune.execution.status.queue`) was bound to the `attune.executions` exchange with routing key `"execution.status.changed"`, but it was also receiving messages published with two other routing keys:
+
+1. **`execution.completed`** → `ExecutionCompletedPayload` (published by Worker service)
+2. **`inquiry.responded`** → `InquiryRespondedPayload` (published by API service)
+
+### Competing Consumers
+
+Besides `ExecutionManager`, two further consumers were configured to read from the same `execution_status` queue:
+
+1. **CompletionListener** (`executor.completion` tag)
+   - Expected: `ExecutionCompletedPayload` 
+   - Fields: `execution_id`, `action_id`, `action_ref`, `status`, `result`, `completed_at`
+
+2. **InquiryHandler** (`executor.inquiry` tag)
+   - Expected: `InquiryRespondedPayload`
+   - Fields: `inquiry_id`, `execution_id`, `response`, `responded_by`, `responded_at`
+
+### Message Routing Behavior
+
+RabbitMQ distributes messages to consumers on the same queue using **round-robin load balancing**. This meant:
+
+- When an `InquiryRespondedPayload` was delivered to `CompletionListener` → **deserialization failed** (missing `action_id`)
+- When an `ExecutionCompletedPayload` was delivered to `InquiryHandler` → **deserialization failed** (missing `inquiry_id`)
+
+The logged error mentioned `inquiry_id` because `InquiryHandler` tried to deserialize an execution completion message, which carries no `inquiry_id` field.
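+
+The failure mode can be reproduced without a broker. The following is a minimal sketch, assuming a simplified model in which a message is just a list of field names and a consumer rejects any message that lacks one of its required fields, reporting the first absent one. `Consumer`, `try_accept`, and `round_robin` are hypothetical names for illustration and are not part of the Attune codebase; real dispatch also depends on prefetch settings, which this model ignores.
+
+```rust
+/// Field lists as documented for the two payload types.
+const COMPLETED: &[&str] = &["execution_id", "action_id", "action_ref", "status", "result", "completed_at"];
+const RESPONDED: &[&str] = &["inquiry_id", "execution_id", "response", "responded_by", "responded_at"];
+
+/// Hypothetical stand-in for a consumer that requires certain fields.
+struct Consumer {
+    name: &'static str,
+    required: &'static [&'static str],
+}
+
+impl Consumer {
+    /// Report the first required field that is absent from the message.
+    fn try_accept(&self, fields: &[&str]) -> Result<(), String> {
+        for req in self.required {
+            if !fields.contains(req) {
+                return Err(format!("missing field `{}`", req));
+            }
+        }
+        Ok(())
+    }
+}
+
+/// Deliver messages to consumers in strict rotation, as a shared queue would.
+fn round_robin(
+    consumers: &[Consumer],
+    messages: &[&[&str]],
+) -> Vec<(&'static str, Result<(), String>)> {
+    messages
+        .iter()
+        .enumerate()
+        .map(|(i, msg)| {
+            let c = &consumers[i % consumers.len()];
+            (c.name, c.try_accept(msg))
+        })
+        .collect()
+}
+
+fn main() {
+    let consumers = [
+        Consumer { name: "CompletionListener", required: COMPLETED },
+        Consumer { name: "InquiryHandler", required: RESPONDED },
+    ];
+
+    // Two completions in a row: the second one lands on InquiryHandler.
+    let outcomes = round_robin(&consumers, &[COMPLETED, COMPLETED, RESPONDED]);
+    for (name, outcome) in &outcomes {
+        println!("{name}: {outcome:?}");
+    }
+}
+```
+
+In this model the second message fails on `InquiryHandler` with a missing `inquiry_id`, and the third fails on `CompletionListener` with a missing `action_id`, which matches the two errors listed under **Issues**.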
+
+## Solution Implemented
+
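+### 1. Created Separate Queues for Execution Completions and Inquiry Responses
+
+**File:** `attune/crates/common/src/mq/config.rs`
+
+Added new queue configuration fields (the snippet shows `inquiry_responses`; `execution_completed` was added in the same file):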
+
+```rust
+pub struct QueuesConfig {
+    // ... existing queues ...
+    
+    /// Inquiry responses queue configuration
+    pub inquiry_responses: QueueConfig,
+}
+```
+
+Default configurations:
+```rust
+execution_completed: QueueConfig {
+    name: "attune.execution.completed.queue".to_string(),
+    durable: true,
+    exclusive: false,
+    auto_delete: false,
+},
+inquiry_responses: QueueConfig {
+    name: "attune.inquiry.responses.queue".to_string(),
+    durable: true,
+    exclusive: false,
+    auto_delete: false,
+}
+```
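+
+Both defaults differ only in the queue name. As a sketch of how that repetition could be reduced, the block below assumes a `QueueConfig` shape inferred from the four fields shown above; the `durable_shared` constructor is a hypothetical helper and does not exist in the codebase as far as this summary shows.
+
+```rust
+/// Assumed shape of `QueueConfig`, inferred from the four fields used above.
+#[derive(Debug, Clone, PartialEq)]
+pub struct QueueConfig {
+    pub name: String,
+    pub durable: bool,
+    pub exclusive: bool,
+    pub auto_delete: bool,
+}
+
+impl QueueConfig {
+    /// Hypothetical helper: durable, non-exclusive, not auto-deleted.
+    pub fn durable_shared(name: &str) -> Self {
+        Self {
+            name: name.to_string(),
+            durable: true,
+            exclusive: false,
+            auto_delete: false,
+        }
+    }
+}
+
+fn main() {
+    let completed = QueueConfig::durable_shared("attune.execution.completed.queue");
+    let responses = QueueConfig::durable_shared("attune.inquiry.responses.queue");
+    // The two queues share every setting except the name.
+    assert_ne!(completed.name, responses.name);
+    println!("{completed:?}\n{responses:?}");
+}
+```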
+
+### 2. Updated Infrastructure Setup
+
+**File:** `attune/crates/common/src/mq/connection.rs`
+
+Added queue declarations and bindings in `setup_infrastructure()`:
+
+```rust
+// Declare the new queues with DLX support
+self.declare_queue_with_dlx(&config.rabbitmq.queues.execution_completed, dlx).await?;
+self.declare_queue_with_dlx(&config.rabbitmq.queues.inquiry_responses, dlx).await?;
+
+// Bind execution_status queue to status changed messages for ExecutionManager
+self.bind_queue(
+    &config.rabbitmq.queues.execution_status.name,
+    &config.rabbitmq.exchanges.executions.name,
+    "execution.status.changed",
+)
+.await?;
+
+// Bind execution_completed queue to completed messages for CompletionListener
+self.bind_queue(
+    &config.rabbitmq.queues.execution_completed.name,
+    &config.rabbitmq.exchanges.executions.name,
+    "execution.completed",
+)
+.await?;
+
+// Bind inquiry_responses queue to inquiry responded messages for InquiryHandler
+self.bind_queue(
+    &config.rabbitmq.queues.inquiry_responses.name,
+    &config.rabbitmq.exchanges.executions.name,
+    "inquiry.responded",
+)
+.await?;
+```
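+
+The intent of these bindings is that every routing key reaches exactly one queue. The sketch below checks that property on a plain table of the three bindings. It assumes no `*` or `#` wildcards are used in the binding keys, so exact string comparison is enough; `BINDINGS` and `queues_for` are illustrative names, not part of `connection.rs`.
+
+```rust
+/// (binding key, queue name) pairs, copied from the bindings above.
+const BINDINGS: &[(&str, &str)] = &[
+    ("execution.status.changed", "attune.execution.status.queue"),
+    ("execution.completed", "attune.execution.completed.queue"),
+    ("inquiry.responded", "attune.inquiry.responses.queue"),
+];
+
+/// Return every queue whose binding key equals the routing key.
+/// Assumption: no wildcards, so exact comparison suffices.
+fn queues_for(routing_key: &str) -> Vec<&'static str> {
+    BINDINGS
+        .iter()
+        .filter(|(key, _)| *key == routing_key)
+        .map(|(_, queue)| *queue)
+        .collect()
+}
+
+fn main() {
+    for (key, _) in BINDINGS {
+        let targets = queues_for(key);
+        assert_eq!(targets.len(), 1, "{key} should reach exactly one queue");
+        println!("{key} -> {}", targets[0]);
+    }
+}
+```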
+
+### 3. Updated Executor Service Configuration
+
+**File:** `attune/crates/executor/src/service.rs`
+
+Changed `InquiryHandler` and `CompletionListener` to consume from dedicated queues:
+
+```rust
+// InquiryHandler - Before:
+let inquiry_response_queue = self.inner.mq_config.rabbitmq.queues.execution_status.name.clone();
+
+// InquiryHandler - After:
+let inquiry_response_queue = self.inner.mq_config.rabbitmq.queues.inquiry_responses.name.clone();
+
+// CompletionListener - Before:
+let execution_completed_queue = self.inner.mq_config.rabbitmq.queues.execution_status.name.clone();
+
+// CompletionListener - After:
+let execution_completed_queue = self.inner.mq_config.rabbitmq.queues.execution_completed.name.clone();
+```
+
+## Message Flow After Fix
+
+### Execution Completion Flow
+```
+Worker → publishes ExecutionCompletedPayload
+       → routing key: "execution.completed"
+       → exchange: "attune.executions"
+       → queue: "attune.execution.completed.queue"
+       → consumer: CompletionListener
+       ✅ Correct payload type received
+```
+
+### Execution Status Change Flow
+```
+Worker → publishes ExecutionStatusChangedPayload
+       → routing key: "execution.status.changed"
+       → exchange: "attune.executions"
+       → queue: "attune.execution.status.queue"
+       → consumer: ExecutionManager
+       ✅ Correct payload type received
+```
+
+### Inquiry Response Flow
+```
+API → publishes InquiryRespondedPayload
+    → routing key: "inquiry.responded"
+    → exchange: "attune.executions"
+    → queue: "attune.inquiry.responses.queue"
+    → consumer: InquiryHandler
+    ✅ Correct payload type received
+```
+
+## Benefits
+
+1. **Type Safety**: Each queue receives only one message type, eliminating deserialization errors
+2. **Scalability**: Can scale `CompletionListener`, `ExecutionManager`, and `InquiryHandler` independently
+3. **Maintainability**: Clear separation of concerns - each queue has a single purpose
+4. **Reliability**: No message rejection due to type mismatches
+5. **Performance**: No wasted processing from consumers receiving wrong message types
+
+## Queue Separation Summary
+
+After both fixes, we now have three dedicated queues for execution-related messages:
+
+| Queue | Routing Key | Message Type | Consumer |
+|-------|-------------|--------------|----------|
+| `attune.execution.status.queue` | `execution.status.changed` | `ExecutionStatusChangedPayload` | ExecutionManager |
+| `attune.execution.completed.queue` | `execution.completed` | `ExecutionCompletedPayload` | CompletionListener |
+| `attune.inquiry.responses.queue` | `inquiry.responded` | `InquiryRespondedPayload` | InquiryHandler |
+
+**Result:** Each queue now has exactly one consumer expecting exactly one message type. ✅
+
+## Testing Recommendations
+
+1. **Restart all services** to recreate the queue infrastructure with new bindings
+2. **Verify queue creation** in RabbitMQ management UI:
+   - Check that `attune.inquiry.responses.queue` exists
+   - Check that `attune.execution.completed.queue` exists
+   - Verify bindings on `attune.executions` exchange:
+     - `inquiry.responded` → `attune.inquiry.responses.queue`
+     - `execution.completed` → `attune.execution.completed.queue`
+     - `execution.status.changed` → `attune.execution.status.queue`
+3. **Monitor executor logs** for absence of deserialization errors (`inquiry_id` and `action_id`)
+4. **Test inquiry workflow**:
+   - Create an action that requests inquiry (`__inquiry` in result)
+   - Respond to inquiry via API
+   - Verify execution resumes correctly
+5. **Test execution completion**:
+   - Execute a simple action
+   - Verify completion notification processed without errors
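+
+Step 3 can be partly automated by scanning captured executor output for the error text quoted in the problem description. This is a minimal sketch, assuming the log is available as a string; `deserialization_errors` is a hypothetical helper, and the second sample line is a made-up placeholder rather than real executor output.
+
+```rust
+/// Return the log lines that report a payload deserialization failure.
+/// The marker text is taken from the error quoted in the problem description.
+fn deserialization_errors(log: &str) -> Vec<&str> {
+    log.lines()
+        .filter(|line| line.contains("Failed to deserialize message"))
+        .collect()
+}
+
+fn main() {
+    // First line: the error quoted earlier. Second line: a made-up placeholder.
+    let log = [
+        "ERROR ThreadId(13) crates/common/src/mq/consumer.rs:112: Failed to deserialize message: missing field `inquiry_id` at line 1 column 318. Rejecting message.",
+        "INFO placeholder line that is not a deserialization error",
+    ]
+    .join("\n");
+
+    let errors = deserialization_errors(&log);
+    println!("{} deserialization error(s) found", errors.len());
+}
+```
+
+After the fix, the same scan over fresh executor logs should return an empty list.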
+
+## Files Modified
+
+- `attune/crates/common/src/mq/config.rs` - Added `inquiry_responses` and `execution_completed` queues
+- `attune/crates/common/src/mq/connection.rs` - Added queue declarations and bindings
+- `attune/crates/executor/src/service.rs` - Updated InquiryHandler and CompletionListener to use new queues
+
+## Migration Notes
+
+This is a **breaking change** for existing deployments:
+
+1. Two new queues will be created automatically on service startup:
+   - `attune.inquiry.responses.queue`
+   - `attune.execution.completed.queue`
+2. The `execution_status` queue now has **only one binding** (`execution.status.changed`)
+3. Existing messages in queues are unaffected
+4. No database migrations required
+5. **Action Required**: Restart executor service to apply changes
+
+## Related Issues
+
+- Original implementation assumed a single queue could handle multiple message types
+- RabbitMQ round-robin distribution caused non-deterministic deserialization failures
+- Errors were intermittent because they depended on which consumer received which message
+- `ExecutionManager` uses a local payload struct instead of the canonical `ExecutionStatusChangedPayload` (not critical, but the two should be unified in the future)
+
+## Lessons Learned
+
+1. **One queue, one message type**: RabbitMQ queues should have a single message schema
+2. **One queue, one consumer type**: Different consumer types on the same queue compete for messages instead of cooperating; multiple instances of the same consumer type are fine
+3. **Use routing keys effectively**: Topic exchanges with specific routing keys provide better message segregation
+4. **Consumer tag awareness**: Consumer tags don't prevent round-robin distribution within the same queue
+5. **Type-safe patterns**: Rust's strong typing revealed the issue quickly through deserialization errors
+6. **Canonical message types**: Use shared message structs from `attune_common::mq::messages`, not local definitions
+7. **Fix related issues together**: Fixing a surface-level problem can expose deeper ones - address them in the same change
+8. **Test thoroughly**: Restart services and monitor logs to catch related issues before they reach production