attune-system/attune

David Culbreth 3b14c65998 re-uploading work

2026-02-04 17:46:30 -06:00

Work Summary: Inquiry Queue Separation Fix

Date: 2026-02-03
Issues:

Executor deserialization error: "missing field inquiry_id"
Executor deserialization error: "missing field action_id"

Status: ✅ Both Fixed

Visual Overview

Before Fix ❌

attune.execution.status.queue
    ├─ Consumer: CompletionListener (expects ExecutionCompletedPayload)
    ├─ Consumer: ExecutionManager (expects ExecutionStatusPayload)
    └─ Consumer: InquiryHandler (expects InquiryRespondedPayload)
    
    Incoming Messages:
    - execution.completed → ExecutionCompletedPayload
    - execution.status.changed → ExecutionStatusChangedPayload
    - inquiry.responded → InquiryRespondedPayload
    
    Problem: Round-robin distribution causes wrong consumer to receive wrong message type!

After Fix ✅

attune.execution.completed.queue
    └─ Consumer: CompletionListener (expects ExecutionCompletedPayload)
    └─ Message: execution.completed → ExecutionCompletedPayload ✓

attune.execution.status.queue
    └─ Consumer: ExecutionManager (expects ExecutionStatusPayload)
    └─ Message: execution.status.changed → ExecutionStatusChangedPayload ✓

attune.inquiry.responses.queue
    └─ Consumer: InquiryHandler (expects InquiryRespondedPayload)
    └─ Message: inquiry.responded → InquiryRespondedPayload ✓
    
    Result: Each queue has ONE consumer expecting ONE message type!

Problem Description

The executor service was logging deserialization errors when processing messages from the execution_status queue:

ERROR ThreadId(13) crates/common/src/mq/consumer.rs:112: Failed to deserialize message: missing field `inquiry_id` at line 1 column 318. Rejecting message.

Root Cause Analysis

The issue was caused by two different consumers listening to the same RabbitMQ queue but expecting different message payload types:

Queue Configuration Issue

The execution_status queue (attune.execution.status.queue) was bound to the attune.executions exchange with routing key "execution.status.changed", but it was also receiving messages published with two other routing keys:

execution.completed → ExecutionCompletedPayload (published by Worker service)
inquiry.responded → InquiryRespondedPayload (published by API service)

Competing Consumers

Two consumers were configured to read from the same execution_status queue:

CompletionListener (executor.completion tag)
- Expected: ExecutionCompletedPayload
- Fields: execution_id, action_id, action_ref, status, result, completed_at
InquiryHandler (executor.inquiry tag)
- Expected: InquiryRespondedPayload
- Fields: inquiry_id, execution_id, response, responded_by, responded_at

Message Routing Behavior

RabbitMQ distributes messages to consumers on the same queue using round-robin load balancing. This meant:

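When an InquiryRespondedPayload was delivered to CompletionListener → deserialization failed (missing action_id, which CompletionListener requires but inquiry responses do not carry)
When an ExecutionCompletedPayload was delivered to InquiryHandler → deserialization failed (missing inquiry_id, which InquiryHandler requires but completion messages do not carry)

The logged error mentioned inquiry_id because InquiryHandler tried to deserialize an execution completion message. The action_id error is the mirror case: CompletionListener tried to deserialize an inquiry response.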
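
The failure mode can be reproduced without a broker. The following is a minimal sketch, assuming strict rotation between two consumers and modelling deserialization as a required-field check; the `Consumer` type and helper functions are hypothetical, and only the field lists are taken from the payload descriptions above.

```rust
use std::collections::HashMap;

/// A decoded message body, reduced to field names and string values.
type Message = HashMap<&'static str, &'static str>;

/// Hypothetical stand-in for a consumer that deserializes into a strict struct.
struct Consumer {
    name: &'static str,
    required: &'static [&'static str],
}

impl Consumer {
    /// Fails on the first required field that the message does not contain.
    fn handle(&self, msg: &Message) -> Result<(), String> {
        for field in self.required {
            if !msg.contains_key(field) {
                return Err(format!("{}: missing field `{}`", self.name, field));
            }
        }
        Ok(())
    }
}

fn completion_listener() -> Consumer {
    Consumer {
        name: "CompletionListener",
        required: &["execution_id", "action_id", "action_ref", "status", "result", "completed_at"],
    }
}

fn inquiry_handler() -> Consumer {
    Consumer {
        name: "InquiryHandler",
        required: &["inquiry_id", "execution_id", "response", "responded_by", "responded_at"],
    }
}

fn execution_completed() -> Message {
    HashMap::from([
        ("execution_id", "1"), ("action_id", "7"), ("action_ref", "core.echo"),
        ("status", "completed"), ("result", "{}"), ("completed_at", "t0"),
    ])
}

fn inquiry_responded() -> Message {
    HashMap::from([
        ("inquiry_id", "3"), ("execution_id", "1"), ("response", "{}"),
        ("responded_by", "user"), ("responded_at", "t1"),
    ])
}

/// Delivers message i to consumer i mod n, a simplified model of a shared queue.
fn round_robin(consumers: &[Consumer], messages: &[Message]) -> Vec<Result<(), String>> {
    messages
        .iter()
        .enumerate()
        .map(|(i, msg)| consumers[i % consumers.len()].handle(msg))
        .collect()
}
```

Feeding the same two messages in the opposite order makes both deliveries succeed, which is why the errors looked intermittent. A real broker's dispatch also depends on prefetch and consumer readiness, so strict rotation is only an approximation.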

Solution Implemented

1. Created Separate Queue for Inquiry Responses

File: attune/crates/common/src/mq/config.rs

Added new queue configuration fields:

pub struct QueuesConfig {
    // ... existing queues ...

    /// Execution completed queue configuration
    pub execution_completed: QueueConfig,

    /// Inquiry responses queue configuration
    pub inquiry_responses: QueueConfig,
}

Default configurations:

execution_completed: QueueConfig {
    name: "attune.execution.completed.queue".to_string(),
    durable: true,
    exclusive: false,
    auto_delete: false,
},
inquiry_responses: QueueConfig {
    name: "attune.inquiry.responses.queue".to_string(),
    durable: true,
    exclusive: false,
    auto_delete: false,
}
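
Both defaults use the same three flags, so they could share a constructor. This is a minimal sketch, assuming `QueueConfig` has only the four fields shown above (the real struct may have more); `durable_shared` is a hypothetical helper name, not something in the codebase.

```rust
/// Mirrors the four fields visible in the default configurations.
#[derive(Debug, Clone, PartialEq)]
pub struct QueueConfig {
    pub name: String,
    pub durable: bool,
    pub exclusive: bool,
    pub auto_delete: bool,
}

impl QueueConfig {
    /// Hypothetical helper for a queue that survives broker restarts,
    /// can be consumed from any connection, and is kept when idle.
    pub fn durable_shared(name: &str) -> Self {
        QueueConfig {
            name: name.to_string(),
            durable: true,      // queue definition survives a broker restart
            exclusive: false,   // not tied to the declaring connection
            auto_delete: false, // not removed when the last consumer disconnects
        }
    }
}
```

With such a helper, each default reduces to one line, for example `QueueConfig::durable_shared("attune.inquiry.responses.queue")`.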

2. Updated Infrastructure Setup

File: attune/crates/common/src/mq/connection.rs

Added queue declarations and bindings in setup_infrastructure():

// Declare the new queues with DLX support
self.declare_queue_with_dlx(&config.rabbitmq.queues.execution_completed, dlx).await?;
self.declare_queue_with_dlx(&config.rabbitmq.queues.inquiry_responses, dlx).await?;

// Bind execution_status queue to status changed messages for ExecutionManager
self.bind_queue(
    &config.rabbitmq.queues.execution_status.name,
    &config.rabbitmq.exchanges.executions.name,
    "execution.status.changed",
)
.await?;

// Bind execution_completed queue to completed messages for CompletionListener
self.bind_queue(
    &config.rabbitmq.queues.execution_completed.name,
    &config.rabbitmq.exchanges.executions.name,
    "execution.completed",
)
.await?;

// Bind inquiry_responses queue to inquiry responded messages for InquiryHandler
self.bind_queue(
    &config.rabbitmq.queues.inquiry_responses.name,
    &config.rabbitmq.exchanges.executions.name,
    "inquiry.responded",
)
.await?;
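
The effect of these three bindings can be checked with a small in-memory model. The sketch below assumes routing keys are matched exactly, which is how a topic exchange behaves when no binding uses a wildcard; the `Binding` type and `route` function are illustrative and not part of the attune codebase.

```rust
/// One queue-to-exchange binding, as declared in setup_infrastructure().
struct Binding {
    queue: &'static str,
    exchange: &'static str,
    routing_key: &'static str,
}

/// The three bindings from the snippet above, with the default queue names.
static BINDINGS: [Binding; 3] = [
    Binding {
        queue: "attune.execution.status.queue",
        exchange: "attune.executions",
        routing_key: "execution.status.changed",
    },
    Binding {
        queue: "attune.execution.completed.queue",
        exchange: "attune.executions",
        routing_key: "execution.completed",
    },
    Binding {
        queue: "attune.inquiry.responses.queue",
        exchange: "attune.executions",
        routing_key: "inquiry.responded",
    },
];

/// Returns every queue that would receive a message published with this key.
fn route(exchange: &str, routing_key: &str) -> Vec<&'static str> {
    BINDINGS
        .iter()
        .filter(|b| b.exchange == exchange && b.routing_key == routing_key)
        .map(|b| b.queue)
        .collect()
}
```

Each routing key resolves to exactly one queue, and a key with no binding resolves to none, so a message is never delivered to a consumer of another type.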

3. Updated Executor Service Configuration

File: attune/crates/executor/src/service.rs

Changed InquiryHandler and CompletionListener to consume from dedicated queues:

// InquiryHandler - Before:
let inquiry_response_queue = self.inner.mq_config.rabbitmq.queues.execution_status.name.clone();

// InquiryHandler - After:
let inquiry_response_queue = self.inner.mq_config.rabbitmq.queues.inquiry_responses.name.clone();

// CompletionListener - Before:
let execution_completed_queue = self.inner.mq_config.rabbitmq.queues.execution_status.name.clone();

// CompletionListener - After:
let execution_completed_queue = self.inner.mq_config.rabbitmq.queues.execution_completed.name.clone();

Message Flow After Fix

Execution Completion Flow

Worker → publishes ExecutionCompletedPayload
       → routing key: "execution.completed"
       → exchange: "attune.executions"
       → queue: "attune.execution.completed.queue"
       → consumer: CompletionListener
       ✅ Correct payload type received

Execution Status Change Flow

Worker → publishes ExecutionStatusChangedPayload
       → routing key: "execution.status.changed"
       → exchange: "attune.executions"
       → queue: "attune.execution.status.queue"
       → consumer: ExecutionManager
       ✅ Correct payload type received

Inquiry Response Flow

API → publishes InquiryRespondedPayload
    → routing key: "inquiry.responded"
    → exchange: "attune.executions"
    → queue: "attune.inquiry.responses.queue"
    → consumer: InquiryHandler
    ✅ Correct payload type received

Benefits

Type Safety: Each queue receives only one message type, eliminating deserialization errors
Scalability: Can scale CompletionListener, ExecutionManager, and InquiryHandler independently
Maintainability: Clear separation of concerns - each queue has a single purpose
Reliability: No message rejection due to type mismatches
Performance: No wasted processing from consumers receiving wrong message types

Queue Separation Summary

After both fixes, we now have three dedicated queues for execution-related messages:

| Queue | Routing Key | Message Type | Consumer |
| --- | --- | --- | --- |
| `attune.execution.status.queue` | `execution.status.changed` | `ExecutionStatusChangedPayload` | ExecutionManager |
| `attune.execution.completed.queue` | `execution.completed` | `ExecutionCompletedPayload` | CompletionListener |
| `attune.inquiry.responses.queue` | `inquiry.responded` | `InquiryRespondedPayload` | InquiryHandler |

Result: Each queue now has exactly one consumer expecting exactly one message type. ✅
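
The "one consumer, one message type" invariant can also be expressed as a check over the topology. This is a sketch only: the `Route` type and `mixed_queues` function are hypothetical, and the rows are transcribed from the before and after layouts described in this summary.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One delivery path: a payload type arriving on a queue and the consumer reading it.
struct Route {
    queue: &'static str,
    payload: &'static str,
    consumer: &'static str,
}

fn row(queue: &'static str, payload: &'static str, consumer: &'static str) -> Route {
    Route { queue, payload, consumer }
}

/// Topology before the fix: everything shared attune.execution.status.queue.
fn before() -> Vec<Route> {
    let q = "attune.execution.status.queue";
    vec![
        row(q, "ExecutionCompletedPayload", "CompletionListener"),
        row(q, "ExecutionStatusChangedPayload", "ExecutionManager"),
        row(q, "InquiryRespondedPayload", "InquiryHandler"),
    ]
}

/// Topology after the fix: one dedicated queue per payload type.
fn after() -> Vec<Route> {
    vec![
        row("attune.execution.status.queue", "ExecutionStatusChangedPayload", "ExecutionManager"),
        row("attune.execution.completed.queue", "ExecutionCompletedPayload", "CompletionListener"),
        row("attune.inquiry.responses.queue", "InquiryRespondedPayload", "InquiryHandler"),
    ]
}

/// Names of queues that carry more than one payload type or feed more than one consumer.
fn mixed_queues(routes: &[Route]) -> Vec<&'static str> {
    let mut by_queue: BTreeMap<&'static str, (BTreeSet<&'static str>, BTreeSet<&'static str>)> =
        BTreeMap::new();
    for r in routes {
        let (payloads, consumers) = by_queue.entry(r.queue).or_default();
        payloads.insert(r.payload);
        consumers.insert(r.consumer);
    }
    by_queue
        .into_iter()
        .filter(|(_, (payloads, consumers))| payloads.len() != 1 || consumers.len() != 1)
        .map(|(queue, _)| queue)
        .collect()
}
```

A check like this could run as a unit test against the configured bindings so that a future change which reintroduces a shared queue fails early.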

Testing Recommendations

1. Restart all services to recreate the queue infrastructure with the new bindings.
2. Verify queue creation in the RabbitMQ management UI:
   - Check that attune.inquiry.responses.queue exists
   - Check that attune.execution.completed.queue exists
   - Verify bindings on the attune.executions exchange:
     - inquiry.responded → attune.inquiry.responses.queue
     - execution.completed → attune.execution.completed.queue
     - execution.status.changed → attune.execution.status.queue
3. Monitor executor logs for the absence of deserialization errors (inquiry_id and action_id).
4. Test the inquiry workflow:
   - Create an action that requests an inquiry (__inquiry in result)
   - Respond to the inquiry via the API
   - Verify execution resumes correctly
5. Test execution completion:
   - Execute a simple action
   - Verify the completion notification is processed without errors
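
Log monitoring can be partly automated. The sketch below assumes the executor keeps emitting the exact wording seen in the error quoted under Problem Description ("Failed to deserialize message: missing field `...`"); both function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Extracts the field name from a "missing field" deserialization error line.
fn missing_field(line: &str) -> Option<&str> {
    let marker = "Failed to deserialize message: missing field `";
    let start = line.find(marker)? + marker.len();
    let rest = &line[start..];
    // The field name runs up to the closing backtick.
    let end = rest.find('`')?;
    Some(&rest[..end])
}

/// Counts deserialization errors per missing field across a whole log.
fn count_missing_fields(log: &str) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for field in log.lines().filter_map(missing_field) {
        *counts.entry(field).or_insert(0) += 1;
    }
    counts
}
```

After the fix, running `count_missing_fields` over a fresh executor log should return an empty map; any entry for inquiry_id or action_id would indicate that a queue is still shared.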

Files Modified

attune/crates/common/src/mq/config.rs - Added inquiry_responses and execution_completed queues
attune/crates/common/src/mq/connection.rs - Added queue declarations and bindings
attune/crates/executor/src/service.rs - Updated InquiryHandler and CompletionListener to use new queues

Migration Notes

This is a breaking change for existing deployments:

Two new queues will be created automatically on service startup:
- attune.inquiry.responses.queue
- attune.execution.completed.queue
The execution_status queue now has only one binding (execution.status.changed)
Existing messages in queues are unaffected
No database migrations required
Action Required: Restart executor service to apply changes

Original implementation assumed a single queue could handle multiple message types
RabbitMQ round-robin distribution caused non-deterministic deserialization failures
Errors were intermittent because they depended on which consumer received which message
ExecutionManager uses a local payload struct instead of the canonical ExecutionStatusChangedPayload (not critical, but the two should be unified in the future)

Lessons Learned

One queue, one message type: RabbitMQ queues should have a single message schema
One queue, one consumer role: Consumers that expect different payloads on the same queue compete for messages instead of cooperating
Use routing keys effectively: Topic exchanges with specific routing keys provide better message segregation
Consumer tag awareness: Consumer tags don't prevent round-robin distribution within the same queue
Type-safe patterns: Rust's strong typing revealed the issue quickly through deserialization errors
Canonical message types: Use shared message structs from attune_common::mq::messages, not local definitions
Incremental fixes: Sometimes you discover deeper issues while fixing surface-level problems - fix them all at once
Test thoroughly: Restart services and monitor logs to catch related issues before they reach production
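
To make the routing-key lesson concrete, here is a sketch of topic-exchange pattern matching as defined by AMQP 0-9-1: words are separated by dots, `*` matches exactly one word, and `#` matches zero or more words. The function names are illustrative, and the wildcard bindings in the usage note are hypothetical; this summary does not say that attune ever used wildcard bindings.

```rust
/// Returns true if a topic binding pattern matches a routing key.
fn topic_matches(pattern: &str, routing_key: &str) -> bool {
    let p: Vec<&str> = pattern.split('.').collect();
    let k: Vec<&str> = routing_key.split('.').collect();
    words_match(&p, &k)
}

/// Recursive matcher over dot-separated words.
fn words_match(p: &[&str], k: &[&str]) -> bool {
    let (head, rest) = match p.split_first() {
        Some(parts) => parts,
        // Pattern exhausted: it matches only if the key is exhausted too.
        None => return k.is_empty(),
    };
    match *head {
        // `#` may consume zero or more words; try every possible split.
        "#" => (0..=k.len()).any(|n| words_match(rest, &k[n..])),
        // `*` must consume exactly one word.
        "*" => !k.is_empty() && words_match(rest, &k[1..]),
        // A literal word must equal the next word of the key.
        word => k.first() == Some(&word) && words_match(rest, &k[1..]),
    }
}
```

A broad binding such as `execution.#` would match both `execution.completed` and `execution.status.changed` and so would put two payload types on one queue, while the exact keys used in this fix each match a single message type.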
