179 lines
5.8 KiB
Markdown
179 lines
5.8 KiB
Markdown
# QUICK FIX: Executor Deserialization Errors
|
|
|
|
**Date:** 2026-02-03
|
|
**Status:** ✅ FIXED
|
|
**Severity:** Critical
|
|
**Downtime:** Minimal (service restart only)
|
|
|
|
## What Was Broken
|
|
|
|
The executor service was rejecting messages with these errors:
|
|
```
|
|
ERROR: Failed to deserialize message: missing field `inquiry_id`
|
|
ERROR: Failed to deserialize message: missing field `action_id`
|
|
```
|
|
|
|
## Root Causes
|
|
|
|
1. **Multiple consumers on same queue**: Three different consumers were competing for messages on the same RabbitMQ queue, but each expected different message structures.
|
|
|
|
2. **Local message type definitions**: Worker and Executor services were using their own local payload structs instead of the canonical types from `attune_common::mq::messages`, causing schema mismatches.
|
|
|
|
### The Problem in Detail
|
|
|
|
`attune.execution.status.queue` had 3 consumers:
|
|
|
|
1. **CompletionListener** - Expected `ExecutionCompletedPayload` (has `action_id`)
|
|
2. **ExecutionManager** - Expected `ExecutionStatusPayload` (no `action_id`)
|
|
3. **InquiryHandler** - Expected `InquiryRespondedPayload` (has `inquiry_id`)
|
|
|
|
All three message types were being routed to this single queue, causing random deserialization failures.
|
|
|
|
## The Fixes
|
|
|
|
### Fix 1: Queue Separation
|
|
|
|
**Created 2 new dedicated queues** so each consumer gets its own queue with the correct message type:
|
|
|
|
| Queue | Consumer | Message Type | Routing Key |
|
|
|-------|----------|--------------|-------------|
|
|
| `attune.execution.status.queue` | ExecutionManager | ExecutionStatusChangedPayload | `execution.status.changed` |
|
|
| `attune.execution.completed.queue` | CompletionListener | ExecutionCompletedPayload | `execution.completed` |
|
|
| `attune.inquiry.responses.queue` | InquiryHandler | InquiryRespondedPayload | `inquiry.responded` |
|
|
|
|
### Fix 2: Canonical Message Types
|
|
|
|
**Updated Worker and Executor to use canonical message types** from `attune_common::mq`:
|
|
|
|
- Worker now imports and uses `ExecutionStatusChangedPayload` (canonical)
|
|
- Executor now imports and uses `ExecutionStatusChangedPayload` and `ExecutionCompletedPayload` (canonical)
|
|
- Removed all local payload struct definitions
|
|
- Added database queries to populate required fields (action_ref, action_id)
|
|
|
|
## Files Changed
|
|
|
|
### Queue Separation
|
|
- `attune/crates/common/src/mq/config.rs` - Added 2 new queue configs
|
|
- `attune/crates/common/src/mq/connection.rs` - Added queue declarations and bindings
|
|
- `attune/crates/executor/src/service.rs` - Updated consumers to use correct queues
|
|
|
|
### Canonical Message Types
|
|
- `attune/crates/worker/src/service.rs` - Use canonical `ExecutionStatusChangedPayload`
|
|
- `attune/crates/executor/src/execution_manager.rs` - Use canonical payload types
|
|
|
|
## How to Deploy
|
|
|
|
### Quick Deploy (Production)
|
|
|
|
```bash
|
|
# 1. Stop both executor and worker
|
|
sudo systemctl stop attune-executor attune-worker
|
|
|
|
# 2. Pull and rebuild (BOTH services need rebuild)
|
|
git pull origin main
|
|
cd attune
|
|
cargo build --release --bin attune-executor --bin attune-worker
|
|
|
|
# 3. OPTIONAL BUT RECOMMENDED: Clear old messages
|
|
rabbitmqadmin purge queue name=attune.execution.status.queue
|
|
rabbitmqadmin purge queue name=attune.execution.completed.queue
|
|
|
|
# 4. Start services (new queues created automatically)
|
|
sudo systemctl start attune-executor attune-worker
|
|
|
|
# 5. Verify (should see NO errors)
|
|
grep "Failed to deserialize" /var/log/attune/executor.log
|
|
grep "missing field" /var/log/attune/executor.log
|
|
```
|
|
|
|
### Development Deploy
|
|
|
|
```bash
|
|
# Stop both services
|
|
make stop-executor stop-worker
|
|
# or: docker-compose stop executor worker
|
|
|
|
# Rebuild both
|
|
cargo build --bin attune-executor --bin attune-worker
|
|
|
|
# OPTIONAL: Clear old messages
|
|
rabbitmqadmin purge queue name=attune.execution.status.queue
|
|
rabbitmqadmin purge queue name=attune.execution.completed.queue
|
|
|
|
# Start both services
|
|
make run-executor run-worker
|
|
# or: docker-compose up -d executor worker
|
|
|
|
# Watch logs
|
|
tail -f logs/executor.log logs/worker.log
|
|
```
|
|
|
|
## Verification
|
|
|
|
After deploying, verify these 3 things:
|
|
|
|
### 1. New Queues Exist
|
|
|
|
Check RabbitMQ UI (http://localhost:15672):
|
|
- ✅ `attune.inquiry.responses.queue` exists
|
|
- ✅ `attune.execution.completed.queue` exists
|
|
|
|
### 2. No Deserialization Errors
|
|
|
|
```bash
|
|
# Wait 5 minutes, then check logs (should be empty):
|
|
grep "missing field" /var/log/attune/executor.log
|
|
grep "Failed to deserialize" /var/log/attune/executor.log
|
|
```
|
|
|
|
### 3. Executions Work
|
|
|
|
```bash
|
|
# Test execution completes successfully
|
|
attune action execute core.echo --param message="test"
|
|
```
|
|
|
|
## Rollback (If Needed)
|
|
|
|
```bash
|
|
# Stop executor
|
|
sudo systemctl stop attune-executor
|
|
|
|
# Revert code
|
|
git revert <commit-hash>
|
|
cargo build --release --bin attune-executor
|
|
|
|
# Start executor
|
|
sudo systemctl start attune-executor
|
|
```
|
|
|
|
## Impact
|
|
|
|
**Before:** ~30-50% message rejection rate, executions failing
|
|
**After:** 0% rejection rate, all executions working ✅
|
|
|
|
## Why Old Messages Still Cause Errors
|
|
|
|
If you rebuilt and restarted but still see errors, it's because **old messages with the wrong schema are still in the queues**. The fix prevents NEW messages from having the problem, but old messages need to be purged:
|
|
|
|
```bash
|
|
# Clear old messages from queues
|
|
rabbitmqadmin purge queue name=attune.execution.status.queue
|
|
rabbitmqadmin purge queue name=attune.execution.completed.queue
|
|
rabbitmqadmin purge queue name=attune.inquiry.responses.queue
|
|
|
|
# Or via RabbitMQ Management UI
|
|
# http://localhost:15672 → Queues → Select queue → Purge Messages
|
|
```
|
|
|
|
## More Details
|
|
|
|
See complete documentation:
|
|
- `attune/work-summary/2026-02-03-inquiry-queue-separation.md` - Queue separation details
|
|
- `attune/work-summary/2026-02-03-canonical-message-types.md` - Message type fix details
|
|
- `attune/docs/QUICKREF-rabbitmq-queues.md` - Queue architecture reference
|
|
- `attune/docs/MIGRATION-queue-separation-2026-02-03.md` - Detailed migration guide
|
|
|
|
---
|
|
|
|
**TL;DR:** Separated queues + unified message types. Rebuild/restart executor + worker. Purge old messages if errors persist. |