Files
attune/work-summary/QUICKFIX-deserialization-errors.md
2026-02-04 17:46:30 -06:00

5.8 KiB

QUICK FIX: Executor Deserialization Errors

Date: 2026-02-03
Status: FIXED
Severity: Critical
Downtime: Minimal (service restart only)

What Was Broken

The executor service was rejecting messages with these errors:

ERROR: Failed to deserialize message: missing field `inquiry_id`
ERROR: Failed to deserialize message: missing field `action_id`

Root Causes

  1. Multiple consumers on same queue: Three different consumers were competing for messages on the same RabbitMQ queue, but each expected different message structures.

  2. Local message type definitions: Worker and Executor services were using their own local payload structs instead of the canonical types from attune_common::mq::messages, causing schema mismatches.

The Problem in Detail

attune.execution.status.queue had 3 consumers:

  1. CompletionListener - Expected ExecutionCompletedPayload (has action_id)
  2. ExecutionManager - Expected ExecutionStatusPayload (no action_id)
  3. InquiryHandler - Expected InquiryRespondedPayload (has inquiry_id)

All three message types were being routed to this single queue, causing random deserialization failures.

The Fixes

Fix 1: Queue Separation

Created 2 new dedicated queues so each consumer gets its own queue with the correct message type:

Queue Consumer Message Type Routing Key
attune.execution.status.queue ExecutionManager ExecutionStatusChangedPayload execution.status.changed
attune.execution.completed.queue CompletionListener ExecutionCompletedPayload execution.completed
attune.inquiry.responses.queue InquiryHandler InquiryRespondedPayload inquiry.responded

Fix 2: Canonical Message Types

Updated Worker and Executor to use canonical message types from attune_common::mq:

  • Worker now imports and uses ExecutionStatusChangedPayload (canonical)
  • Executor now imports and uses ExecutionStatusChangedPayload and ExecutionCompletedPayload (canonical)
  • Removed all local payload struct definitions
  • Added database queries to populate required fields (action_ref, action_id)

Files Changed

Queue Separation

  • attune/crates/common/src/mq/config.rs - Added 2 new queue configs
  • attune/crates/common/src/mq/connection.rs - Added queue declarations and bindings
  • attune/crates/executor/src/service.rs - Updated consumers to use correct queues

Canonical Message Types

  • attune/crates/worker/src/service.rs - Use canonical ExecutionStatusChangedPayload
  • attune/crates/executor/src/execution_manager.rs - Use canonical payload types

How to Deploy

Quick Deploy (Production)

# 1. Stop both executor and worker
sudo systemctl stop attune-executor attune-worker

# 2. Pull and rebuild (BOTH services need rebuild)
git pull origin main
cd attune
cargo build --release --bin attune-executor --bin attune-worker

# 3. OPTIONAL BUT RECOMMENDED: Clear old messages
rabbitmqadmin purge queue name=attune.execution.status.queue
rabbitmqadmin purge queue name=attune.execution.completed.queue

# 4. Start services (new queues created automatically)
sudo systemctl start attune-executor attune-worker

# 5. Verify (should see NO errors)
grep "Failed to deserialize" /var/log/attune/executor.log
grep "missing field" /var/log/attune/executor.log

Development Deploy

# Stop both services
make stop-executor stop-worker
# or: docker-compose stop executor worker

# Rebuild both
cargo build --bin attune-executor --bin attune-worker

# OPTIONAL: Clear old messages
rabbitmqadmin purge queue name=attune.execution.status.queue
rabbitmqadmin purge queue name=attune.execution.completed.queue

# Start both services
make run-executor run-worker
# or: docker-compose up -d executor worker

# Watch logs
tail -f logs/executor.log logs/worker.log

Verification

After deploying, verify these 3 things:

1. New Queues Exist

Check RabbitMQ UI (http://localhost:15672):

  • attune.inquiry.responses.queue exists
  • attune.execution.completed.queue exists

2. No Deserialization Errors

# Wait 5 minutes, then check logs (should be empty):
grep "missing field" /var/log/attune/executor.log
grep "Failed to deserialize" /var/log/attune/executor.log

3. Executions Work

# Test execution completes successfully
attune action execute core.echo --param message="test"

Rollback (If Needed)

# Stop executor
sudo systemctl stop attune-executor

# Revert code
git revert <commit-hash>
cargo build --release --bin attune-executor

# Start executor
sudo systemctl start attune-executor

Impact

Before: ~30-50% message rejection rate, executions failing
After: 0% rejection rate, all executions working

Why Old Messages Still Cause Errors

If you rebuilt and restarted but still see errors, it's because old messages with the wrong schema are still in the queues. The fix prevents NEW messages from having the problem, but old messages need to be purged:

# Clear old messages from queues
rabbitmqadmin purge queue name=attune.execution.status.queue
rabbitmqadmin purge queue name=attune.execution.completed.queue
rabbitmqadmin purge queue name=attune.inquiry.responses.queue

# Or via RabbitMQ Management UI
# http://localhost:15672 → Queues → Select queue → Purge Messages

More Details

See complete documentation:

  • attune/work-summary/2026-02-03-inquiry-queue-separation.md - Queue separation details
  • attune/work-summary/2026-02-03-canonical-message-types.md - Message type fix details
  • attune/docs/QUICKREF-rabbitmq-queues.md - Queue architecture reference
  • attune/docs/MIGRATION-queue-separation-2026-02-03.md - Detailed migration guide

TL;DR: Separated queues + unified message types. Rebuild/restart executor + worker. Purge old messages if errors persist.