Quick Reference: Execution State Ownership

Last Updated: 2026-02-09

Ownership Model at a Glance

┌──────────────────────────────────────────────────────────┐
│  EXECUTOR OWNS                │  WORKER OWNS             │
│  Requested                    │  Running                 │
│  Scheduling                   │  Completed               │
│  Scheduled                    │  Failed                  │
│  (+ pre-handoff Cancelled)    │  (+ post-handoff         │
│                               │     Cancelled/Timeout/   │
│                               │     Abandoned)           │
└───────────────────────────────┴──────────────────────────┘
            │                           │
            └─────── HANDOFF ──────────┘
        execution.scheduled PUBLISHED

Who Updates the Database?

Executor Updates (Pre-Handoff Only)

  • Creates execution record
  • Updates status: Requested → Scheduling → Scheduled
  • Publishes execution.scheduled message ← HANDOFF POINT
  • Handles cancellations/failures BEFORE handoff (worker never notified)
  • NEVER updates after execution.scheduled is published

Worker Updates (Post-Handoff Only)

  • Receives execution.scheduled message (takes ownership)
  • Updates status: Scheduled → Running
  • Updates status: Running → Completed/Failed/Cancelled/etc.
  • Handles cancellations/failures AFTER handoff
  • Updates result data
  • Writes to the DB for every status change after receiving the handoff
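
The two lists above can be read as a lookup from (status, handoff state) to the service allowed to write. The following is a minimal sketch of that reading, not code from the repository: the enum, the `writer_for` function, and the `handed_off` flag are hypothetical names, and treating a pre-handoff Failed as executor-owned is an interpretation of the "cancellations/failures BEFORE handoff" bullet.

```rust
// Hypothetical stand-in for the real ExecutionStatus type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionStatus {
    Requested,
    Scheduling,
    Scheduled,
    Running,
    Completed,
    Failed,
    Cancelled,
    Timeout,
    Abandoned,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Owner {
    Executor,
    Worker,
}

/// Returns the service that may write `status` to the database, or None
/// if the combination should not occur under the ownership model.
/// `handed_off` is true once execution.scheduled has been published.
fn writer_for(status: ExecutionStatus, handed_off: bool) -> Option<Owner> {
    use ExecutionStatus::*;
    match (status, handed_off) {
        // Scheduled is written just before the handoff message is published.
        (Requested | Scheduling | Scheduled, false) => Some(Owner::Executor),
        // Assumption: pre-handoff cancellations and failures stay with the executor.
        (Cancelled | Failed, false) => Some(Owner::Executor),
        (Running | Completed | Failed | Cancelled | Timeout | Abandoned, true) => {
            Some(Owner::Worker)
        }
        _ => None,
    }
}
```

A `None` result marks a write that neither service should attempt, for example Running before the handoff or Scheduled after it.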

Who Publishes Messages?

Executor Publishes

  • enforcement.created (from rules)
  • execution.requested (to scheduler)
  • execution.scheduled (to worker) ← HANDOFF MESSAGE - OWNERSHIP TRANSFER

Worker Publishes

  • execution.status_changed (for each status change after handoff)
  • execution.completed (when done)

Executor Receives (But Doesn't Update DB Post-Handoff)

  • execution.status_changed → triggers orchestration logic (read-only)
  • execution.completed → releases queue slots
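
As a rough illustration, the publisher lists above can be written down as a small lookup that a test or lint could check against. `Service`, `publisher_of`, and `is_handoff` are hypothetical names for this sketch; only the message names come from this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Service {
    Executor,
    Worker,
}

/// Returns the only service that should publish `message_type`,
/// or None for a message type not listed in this quick reference.
fn publisher_of(message_type: &str) -> Option<Service> {
    match message_type {
        "enforcement.created" | "execution.requested" | "execution.scheduled" => {
            Some(Service::Executor)
        }
        "execution.status_changed" | "execution.completed" => Some(Service::Worker),
        _ => None,
    }
}

/// True only for the message whose publication transfers ownership.
fn is_handoff(message_type: &str) -> bool {
    message_type == "execution.scheduled"
}
```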

Code Locations

Executor Updates DB

// crates/executor/src/scheduler.rs
execution.status = ExecutionStatus::Scheduled;
ExecutionRepository::update(pool, execution.id, execution.into()).await?;

Worker Updates DB

// crates/worker/src/executor.rs
self.update_execution_status(execution_id, ExecutionStatus::Running).await?;
// ...
ExecutionRepository::update(&self.pool, execution_id, input).await?;

Executor Orchestrates (Read-Only)

// crates/executor/src/execution_manager.rs
async fn process_status_change(...) -> Result<()> {
    let execution = ExecutionRepository::find_by_id(pool, execution_id).await?;
    // NO UPDATE - just orchestration logic
    Self::handle_completion(pool, publisher, &execution).await?;
    Ok(())
}

Decision Tree: Should I Update the DB?

Are you in the Executor?
├─ Have you published execution.scheduled for this execution?
│  ├─ NO → Update DB (you own it)
│  │  └─ Includes: Requested/Scheduling/Scheduled/pre-handoff Cancelled
│  └─ YES → Don't update DB (worker owns it now)
│     └─ Just orchestrate (trigger workflows, etc)
│
Are you in the Worker?
├─ Have you received execution.scheduled for this execution?
│  ├─ YES → Update DB for ALL status changes (you own it)
│  │  └─ Includes: Running/Completed/Failed/post-handoff Cancelled/etc.
│  └─ NO → Don't touch this execution (doesn't exist for you yet)
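
The decision tree above reduces to a two-input function. This is a sketch under the assumption that each service can tell whether the handoff message has been published (executor) or received (worker) for a given execution; `Service`, `Action`, and `decide` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Service {
    Executor,
    Worker,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    /// This service owns the execution and may write to the DB.
    UpdateDb,
    /// Read the execution and trigger workflows, but do not write.
    OrchestrateOnly,
    /// The execution does not exist for this service yet.
    Ignore,
}

/// `handoff_seen` means "published execution.scheduled" for the executor
/// and "received execution.scheduled" for the worker.
fn decide(service: Service, handoff_seen: bool) -> Action {
    match (service, handoff_seen) {
        (Service::Executor, false) => Action::UpdateDb,
        (Service::Executor, true) => Action::OrchestrateOnly,
        (Service::Worker, true) => Action::UpdateDb,
        (Service::Worker, false) => Action::Ignore,
    }
}
```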

Common Patterns

DO: Worker Updates After Handoff

// Worker receives execution.scheduled
self.update_execution_status(execution_id, ExecutionStatus::Running).await?;
self.publish_status_update(execution_id, ExecutionStatus::Running).await?;

DO: Executor Orchestrates Without DB Write

// Executor receives execution.status_changed
let execution = ExecutionRepository::find_by_id(pool, execution_id).await?;
if status == ExecutionStatus::Completed {
    Self::trigger_child_executions(pool, publisher, &execution).await?;
}

DON'T: Executor Updates After Handoff

// Executor receives execution.status_changed
execution.status = status;
ExecutionRepository::update(pool, execution.id, execution).await?; // ❌ WRONG!

DON'T: Worker Updates Before Handoff

// Worker updates execution it hasn't received via execution.scheduled
ExecutionRepository::update(&self.pool, execution_id, input).await?; // ❌ WRONG!

DO: Executor Handles Pre-Handoff Cancellation

// User cancels execution before it's scheduled to worker
// Execution is still in Requested/Scheduling state
execution.status = ExecutionStatus::Cancelled;
ExecutionRepository::update(pool, execution_id, execution).await?; // ✅ CORRECT!
// Worker never receives execution.scheduled, never knows execution existed

DO: Worker Handles Post-Handoff Cancellation

// Worker received execution.scheduled, now owns execution
// User cancels execution while it's running
execution.status = ExecutionStatus::Cancelled;
ExecutionRepository::update(&self.pool, execution_id, execution).await?; // ✅ CORRECT!
self.publish_status_update(execution_id, ExecutionStatus::Cancelled).await?;

Handoff Checklist

When an execution is scheduled:

Executor Must:

  • Update status to Scheduled
  • Write to database
  • Publish execution.scheduled message ← HANDOFF OCCURS HERE
  • Stop updating this execution (ownership transferred)
  • Continue to handle orchestration (read-only)

Worker Must:

  • Receive execution.scheduled message ← OWNERSHIP RECEIVED
  • Take ownership of execution state
  • Update DB for all future status changes
  • Handle any cancellations/failures after this point
  • Publish status notifications

Important: If an execution is cancelled BEFORE the executor publishes execution.scheduled, the executor updates the status to Cancelled and the worker never learns about it.
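
The executor side of the checklist, including the pre-handoff cancellation case, can be sketched with in-memory stand-ins. Everything here is hypothetical scaffolding: a `HashMap` plays the executions table, a `Vec` plays the message bus, and the `handed_off` set is one possible way to remember that ownership has moved. The real services use ExecutionRepository and a message publisher instead.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Requested,
    Scheduled,
    Cancelled,
}

#[derive(Default)]
struct Executor {
    db: HashMap<u64, Status>,      // stand-in for the executions table
    published: Vec<(String, u64)>, // stand-in for the message bus
    handed_off: HashSet<u64>,      // executions the worker now owns
}

impl Executor {
    /// Write Scheduled first, then publish; publishing is the handoff.
    fn schedule(&mut self, id: u64) {
        self.db.insert(id, Status::Scheduled);
        self.published.push(("execution.scheduled".to_string(), id));
        self.handed_off.insert(id);
    }

    /// Before the handoff the executor writes Cancelled and publishes
    /// nothing, so the worker never hears about the execution.
    /// After the handoff the write is refused: the worker owns the record.
    fn cancel(&mut self, id: u64) -> Result<(), &'static str> {
        if self.handed_off.contains(&id) {
            return Err("worker owns this execution");
        }
        self.db.insert(id, Status::Cancelled);
        Ok(())
    }
}
```

In this sketch, cancelling an execution that is still Requested succeeds and leaves `published` empty, while cancelling after `schedule` returns an error and leaves the stored status untouched.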

Benefits Summary

Aspect           Benefit
Race Conditions  Eliminated - only one owner per stage
DB Writes        Reduced by ~50% - no duplicates
Code Clarity     Clear boundaries - easy to reason about
Message Traffic  Reduced - no duplicate completions
Idempotency      Safe to receive duplicate messages
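
One way the idempotency property could look on the executor side is a handler that releases a queue slot at most once per execution, even if execution.completed is delivered twice. This is an illustrative sketch only; `QueueSlots`, its fields, and `on_completed` are hypothetical, and the actual mechanism is not described in this quick reference.

```rust
use std::collections::HashSet;

#[derive(Default)]
struct QueueSlots {
    available: usize,
    released: HashSet<u64>, // execution ids whose slot was already released
}

impl QueueSlots {
    /// Handles an execution.completed message.
    /// Returns true if a slot was released, false if this was a duplicate.
    fn on_completed(&mut self, execution_id: u64) -> bool {
        // HashSet::insert returns false when the id was already present.
        if !self.released.insert(execution_id) {
            return false;
        }
        self.available += 1;
        true
    }
}
```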

Troubleshooting

Execution Stuck in "Scheduled"

Problem: Worker not updating status to Running
Check: Was execution.scheduled published? Worker received it? Worker healthy?

Workflow Children Not Triggering

Problem: Orchestration not running
Check: Worker published execution.status_changed? Message queue healthy?

Duplicate Status Updates

Problem: Both services updating DB
Check: Executor should NOT update after publishing execution.scheduled

Execution Cancelled But Status Not Updated

Problem: Cancellation not reflected in database
Check: Was it cancelled before or after handoff?
Fix: If before handoff → executor updates; if after handoff → worker updates

Queue Warnings

Problem: Duplicate completion notifications
Check: Only worker should publish execution.completed

See Also

  • Full Architecture Doc: docs/ARCHITECTURE-execution-state-ownership.md
  • Bug Fix Visualization: docs/BUGFIX-duplicate-completion-2026-02-09.md
  • Work Summary: work-summary/2026-02-09-execution-state-ownership.md