16 KiB
16 KiB
Bug Fix: Duplicate Completion Notifications & Unnecessary Database Updates
Date: 2026-02-09
Component: Executor Service (ExecutionManager)
Issue Type: Performance & Correctness
Overview
Fixed two related inefficiencies in the executor service:
- Duplicate completion notifications causing queue manager warnings
- Unnecessary database updates writing unchanged status values
Problem 1: Duplicate Completion Notifications
Symptom
WARN crates/executor/src/queue_manager.rs:320:
Completion notification for action 3 but active_count is 0
Before Fix - Message Flow
┌─────────────────────────────────────────────────────────────────┐
│ Worker Service │
│ │
│ 1. Completes action execution │
│ 2. Updates DB: status = "Completed" │
│ 3. Publishes: execution.status_changed (status: "completed") │
│ 4. Publishes: execution.completed ────────────┐ │
└─────────────────────────────────────────────────┼───────────────┘
│
┌────────────────────────────────┼───────────────┐
│ │ │
▼ ▼ │
┌─────────────────────────────┐ ┌──────────────────────────────┤
│ ExecutionManager │ │ CompletionListener │
│ │ │ │
│ Receives: │ │ Receives: execution.completed│
│ execution.status_changed │ │ │
│ │ │ → notify_completion() │
│ → handle_completion() │ │ → Decrements active_count ✅ │
│ → publish_completion_notif()│ └──────────────────────────────┘
│ │
│ Publishes: execution.completed ───────┐
└─────────────────────────────┘ │
│
┌─────────────────────┘
│
▼
┌────────────────────────────┐
│ CompletionListener (again) │
│ │
│ Receives: execution.completed (2nd time!)
│ │
│ → notify_completion() │
│ → active_count already 0 │
│ → ⚠️ WARNING LOGGED │
└────────────────────────────┘
Result: 2x completion notifications, 1x warning
After Fix - Message Flow
┌─────────────────────────────────────────────────────────────────┐
│ Worker Service │
│ │
│ 1. Completes action execution │
│ 2. Updates DB: status = "Completed" │
│ 3. Publishes: execution.status_changed (status: "completed") │
│ 4. Publishes: execution.completed ────────────┐ │
└─────────────────────────────────────────────────┼───────────────┘
│
┌────────────────────────────────┼───────────────┐
│ │ │
▼ ▼ │
┌─────────────────────────────┐ ┌──────────────────────────────┤
│ ExecutionManager │ │ CompletionListener │
│ │ │ │
│ Receives: │ │ Receives: execution.completed│
│ execution.status_changed │ │ │
│ │ │ → notify_completion() │
│ → handle_completion() │ │ → Decrements active_count ✅ │
│ → Handles workflow children │ └──────────────────────────────┘
│ → NO completion publish ✅ │
└─────────────────────────────┘
Result: 1x completion notification, 0x warnings ✅
Problem 2: Unnecessary Database Updates
Symptom
INFO crates/executor/src/execution_manager.rs:108:
Updated execution 9061 status: Completed -> Completed
Before Fix - Status Update Flow
┌─────────────────────────────────────────────────────────────────┐
│ Worker Service │
│ │
│ 1. Completes action execution │
│ 2. ExecutionRepository::update() │
│ status: Running → Completed ✅ │
│ 3. Publishes: execution.status_changed (status: "completed") │
└─────────────────────────────────┬───────────────────────────────┘
│
│ Message Queue
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ExecutionManager │
│ │
│ 1. Receives: execution.status_changed (status: "completed") │
│ 2. Fetches execution from DB │
│ Current status: Completed │
│ 3. Sets: execution.status = Completed (same value) │
│ 4. ExecutionRepository::update() │
│ status: Completed → Completed ❌ │
│ 5. Logs: "Updated execution 9061 status: Completed -> Completed"
└─────────────────────────────────────────────────────────────────┘
Result: 2x database writes for same status value
After Fix - Status Update Flow
┌─────────────────────────────────────────────────────────────────┐
│ Worker Service │
│ │
│ 1. Completes action execution │
│ 2. ExecutionRepository::update() │
│ status: Running → Completed ✅ │
│ 3. Publishes: execution.status_changed (status: "completed") │
└─────────────────────────────────────┬───────────────────────────┘
│
│ Message Queue
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ExecutionManager │
│ │
│ 1. Receives: execution.status_changed (status: "completed") │
│ 2. Fetches execution from DB │
│ Current status: Completed │
│ 3. Compares: old_status (Completed) == new_status (Completed) │
│ 4. Skips database update ✅ │
│ 5. Still handles orchestration (workflow children) │
│ 6. Logs: "Execution 9061 status unchanged, skipping update" │
└─────────────────────────────────────────────────────────────────┘
Result: 1x database write (only when status changes) ✅
Code Changes
Change 1: Remove Duplicate Completion Publication
File: crates/executor/src/execution_manager.rs
// BEFORE
async fn handle_completion(...) -> Result<()> {
// Handle workflow children...
// Publish completion notification
Self::publish_completion_notification(pool, publisher, execution).await?;
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^
// DUPLICATE - worker already did this!
Ok(())
}
// AFTER
async fn handle_completion(...) -> Result<()> {
// Handle workflow children...
// NOTE: Completion notification is published by the worker, not here.
// This prevents duplicate execution.completed messages that would cause
// the queue manager to decrement active_count twice.
Ok(())
}
// Removed entire publish_completion_notification() method
Change 2: Skip Unnecessary Database Updates
File: crates/executor/src/execution_manager.rs
// BEFORE
async fn process_status_change(...) -> Result<()> {
let execution = ExecutionRepository::find_by_id(pool, execution_id).await?;
let old_status = execution.status.clone();
execution.status = status; // Always set, even if same
ExecutionRepository::update(pool, execution.id, execution.clone().into()).await?;
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// ALWAYS writes, even if unchanged!
info!("Updated execution {} status: {:?} -> {:?}", execution_id, old_status, status);
// Handle completion logic...
Ok(())
}
// AFTER
async fn process_status_change(...) -> Result<()> {
let execution = ExecutionRepository::find_by_id(pool, execution_id).await?;
let old_status = execution.status.clone();
// Skip update if status hasn't changed
if old_status == status {
debug!("Execution {} status unchanged ({:?}), skipping database update",
execution_id, status);
// Still handle completion logic for orchestration (e.g., workflow children)
if matches!(status, ExecutionStatus::Completed | ExecutionStatus::Failed | ExecutionStatus::Cancelled) {
Self::handle_completion(pool, publisher, &execution).await?;
}
return Ok(()); // Early return - no DB write
}
execution.status = status;
ExecutionRepository::update(pool, execution.id, execution.clone().into()).await?;
info!("Updated execution {} status: {:?} -> {:?}", execution_id, old_status, status);
// Handle completion logic...
Ok(())
}
Impact & Benefits
Performance Improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| Completion messages per execution | 2 | 1 | 50% reduction |
| Queue manager warnings | Frequent | None | 100% elimination |
| Database writes (no status change) | Always | Never | 100% elimination |
| Log noise | High | Low | Significant reduction |
Typical Execution Flow
Before fixes:
- 1x execution completed
- 2x
execution.completedmessages published - 1x unnecessary database write (Completed → Completed)
- 1x queue manager warning
- Noisy logs with redundant "status: Completed -> Completed" messages
After fixes:
- 1x execution completed
- 1x
execution.completedmessage published (worker only) - 0x unnecessary database writes
- 0x queue manager warnings
- Clean, informative logs
High-Throughput Scenarios
At 1000 executions/minute:
Before:
- 2000 completion messages/min
- ~1000 unnecessary DB writes/min
- ~1000 warning logs/min
After:
- 1000 completion messages/min (50% reduction)
- 0 unnecessary DB writes (100% reduction)
- 0 warning logs (100% reduction)
Testing
✅ All 58 executor unit tests pass
✅ Zero compiler warnings
✅ No breaking changes to external behavior
✅ Orchestration logic (workflow children) still works correctly
Architecture Clarifications
Separation of Concerns
| Component | Responsibility |
|---|---|
| Worker | Authoritative source for execution completion, publishes completion notifications |
| Executor | Orchestration (workflows, child executions), NOT completion notifications |
| CompletionListener | Queue management (releases slots for queued executions) |
Idempotency
The executor is now idempotent with respect to status change messages:
- Receiving the same status change multiple times has no effect after the first
- Database is only written when state actually changes
- Orchestration logic (workflows) runs correctly regardless
Lessons Learned
- Message publishers should be explicit - Only one component should publish a given message type
- Always check for actual changes - Don't blindly write to database without comparing old/new values
- Separate orchestration from notification - Workflow logic shouldn't trigger duplicate notifications
- Log levels matter - Changed redundant updates from INFO to DEBUG to reduce noise
- Trust the source - Worker owns execution lifecycle; executor shouldn't second-guess it
Related Documentation
- Work Summary:
attune/work-summary/2026-02-09-duplicate-completion-fix.md - Queue Manager:
attune/crates/executor/src/queue_manager.rs - Completion Listener:
attune/crates/executor/src/completion_listener.rs - Execution Manager:
attune/crates/executor/src/execution_manager.rs