Files
attune/docs/MIGRATION-queue-separation-2026-02-03.md
2026-02-04 17:46:30 -06:00

7.5 KiB

Migration Guide: Queue Separation Fix (2026-02-03)

Issue: Deserialization errors in executor service
Urgency: High - Critical bug causing message rejection
Downtime Required: Yes (brief - service restart only)

Overview

This migration separates competing consumers on shared RabbitMQ queues into dedicated queues, fixing deserialization errors:

  • missing field 'inquiry_id'
  • missing field 'action_id'

Changes Summary

New Queues Created

  1. attune.inquiry.responses.queue - For inquiry response messages
  2. attune.execution.completed.queue - For execution completion messages

Queue Bindings Modified

  • attune.execution.status.queue - Now only receives execution.status.changed messages
  • attune.execution.completed.queue - Now receives execution.completed messages
  • attune.inquiry.responses.queue - Now receives inquiry.responded messages

Services Affected

  • Executor Service - Requires restart (consumers reconfigured)
  • Worker Service - No changes required (publishers work automatically)
  • API Service - No changes required (publishers work automatically)

Pre-Migration Checklist

  • Backup current RabbitMQ configuration
  • Note current queue depths in RabbitMQ management UI
  • Verify all services are running and healthy
  • Review recent executor logs for deserialization errors
  • Ensure you have access to restart the executor service

Migration Steps

Step 1: Stop the Executor Service

# Using systemd
sudo systemctl stop attune-executor

# Using docker-compose
docker-compose stop executor

# Or kill the process
pkill -f attune-executor

Step 2: Deploy Updated Code

# Pull latest code
git pull origin main

# Rebuild executor (and common library)
cd attune
cargo build --release --bin attune-executor

Step 3: Verify RabbitMQ Queue Creation

The new queues will be created automatically when the executor starts, but you can verify the configuration:

# Check that the code is updated
grep -r "inquiry_responses" crates/common/src/mq/config.rs
grep -r "execution_completed" crates/common/src/mq/config.rs

Step 4: Start the Executor Service

# Using systemd
sudo systemctl start attune-executor

# Using docker-compose
docker-compose start executor

# Or directly
./target/release/attune-executor --config config.production.yaml

Step 5: Verify Queue Creation in RabbitMQ

Check RabbitMQ Management UI (http://localhost:15672):

Queues Tab:

  • attune.inquiry.responses.queue exists
  • attune.execution.completed.queue exists
  • attune.execution.status.queue still exists

Exchanges Tab → attune.executions → Bindings:

  • inquiry.respondedattune.inquiry.responses.queue
  • execution.completedattune.execution.completed.queue
  • execution.status.changedattune.execution.status.queue

Step 6: Monitor Executor Logs

# Watch for successful startup
tail -f /var/log/attune/executor.log

# Or with journalctl
journalctl -u attune-executor -f

# Or with docker
docker logs -f attune-executor

Expected log messages:

INFO Starting Executor Service
INFO Message queue connection established
INFO Queue manager initialized with database persistence
INFO Starting event processor...
INFO Starting completion listener...
INFO Starting enforcement processor...
INFO Starting execution scheduler...
INFO Starting execution manager...
INFO Starting inquiry handler...
INFO Executor Service started successfully

Step 7: Verify No Deserialization Errors

# Check for the specific errors (should be NONE)
grep "missing field.*inquiry_id" /var/log/attune/executor.log
grep "missing field.*action_id" /var/log/attune/executor.log
grep "Failed to deserialize message" /var/log/attune/executor.log

If no output, the fix is working!

Step 8: Functional Testing

Test Execution Completion:

# Execute a simple action
attune action execute core.echo --param message="test"

# Verify execution completes without errors in logs

Test Inquiry Workflow (if applicable):

# Create an action that requests inquiry
# Respond to the inquiry via API
# Verify execution resumes

Test Status Updates:

# Execute a longer-running action
# Verify status updates are processed correctly

Rollback Procedure

If issues occur, you can rollback:

Step 1: Stop Executor

sudo systemctl stop attune-executor

Step 2: Revert Code

git revert <commit-hash>
cargo build --release --bin attune-executor

Step 3: Remove New Queues (Optional)

# Via RabbitMQ Management API
curl -u guest:guest -X DELETE http://localhost:15672/api/queues/%2F/attune.inquiry.responses.queue
curl -u guest:guest -X DELETE http://localhost:15672/api/queues/%2F/attune.execution.completed.queue

Step 4: Restart Executor

sudo systemctl start attune-executor

Post-Migration Verification

  • Executor service is running and healthy
  • No deserialization errors in logs for 15+ minutes
  • Test executions complete successfully
  • Inquiries (if used) work correctly
  • All three new queue bindings show in RabbitMQ UI
  • Queue message rates look normal
  • No messages in dead letter queues

Monitoring Points

Watch these metrics for 24 hours post-migration:

  1. Executor Error Rate - Should drop to near zero
  2. Queue Depths - Should remain stable/low
  3. Message Delivery Rate - Should remain consistent
  4. Dead Letter Queue Depth - Should not increase

Troubleshooting

Issue: New queues not created

Symptoms: Queues don't appear in RabbitMQ UI

Solution:

# Check executor logs for connection errors
grep "Failed to declare queue" /var/log/attune/executor.log

# Verify RabbitMQ permissions
rabbitmqctl list_user_permissions attune_user

Issue: Still seeing deserialization errors

Symptoms: Errors persist after restart

Solution:

# 1. Verify code was rebuilt
attune-executor --version

# 2. Check which queues consumers are using
grep "Starting.*listener" /var/log/attune/executor.log

# 3. Verify bindings in RabbitMQ UI match expected configuration

# 4. Restart ALL services to ensure workers/API use new bindings
sudo systemctl restart attune-worker attune-api attune-executor

Issue: Messages stuck in old queue

Symptoms: Old execution.status.queue has growing backlog

Solution:

# Check what messages are in the queue
rabbitmqadmin get queue=attune.execution.status.queue count=5

# If they're completion messages, manually move them:
# 1. Temporarily stop executor
# 2. Purge old queue
# 3. Restart executor (messages will be redelivered after TTL)

Impact Assessment

Before Fix:

  • ~30-50% of messages rejected due to deserialization errors
  • Executions not completing properly
  • Inquiries not being processed
  • Resource waste from redelivery attempts

After Fix:

  • 100% message delivery success rate
  • All executions complete correctly
  • Inquiries processed immediately
  • Reduced message queue load

Questions?

Contact the platform team or refer to:

  • attune/work-summary/2026-02-03-inquiry-queue-separation.md - Technical details
  • attune/docs/QUICKREF-rabbitmq-queues.md - Queue architecture reference
  • attune/docs/architecture/queue-architecture.md - Overall architecture

Migration Completed: __________ (date/time)
Performed By: __________
Issues Encountered: __________
Notes: __________