attune/work-summary/phases/orquesta-refactor-plan.md
2026-02-04 17:46:30 -06:00

Orquesta-Style Workflow Refactoring Plan

Goal

Refactor the workflow execution engine from a dependency-based DAG model to a transition-based graph traversal model inspired by StackStorm's Orquesta engine. This will simplify the code and naturally support workflow cycles.

Current Problems

  1. Over-engineered: We compute dependencies, levels, and a topological sort that are never actually used
  2. Not using transitions: We parse next transitions but execute based on dependencies instead
  3. Artificial DAG restriction: Prevents legitimate use cases like monitoring loops
  4. Polling-based: Continuously polls for "ready tasks" instead of reacting to completions

Orquesta Model Benefits

  1. Simpler: Pure graph traversal following transitions
  2. Event-driven: Task completions trigger next task scheduling
  3. Naturally supports cycles: Workflows terminate when transitions stop scheduling tasks
  4. Intuitive: Follow the next arrows in the workflow definition

Implementation Plan

Phase 1: Documentation Updates

Files to modify:

  • docs/workflow-execution-engine.md
  • work-summary/TODO.md

Changes:

  • Remove references to DAG and topological sort
  • Document transition-based execution model
  • Add examples of cyclic workflows (monitoring loops)
  • Document join semantics clearly
  • Document workflow termination conditions

Phase 2: Refactor Graph Module (crates/executor/src/workflow/graph.rs)

Remove:

  • CircularDependency error variant (cycles are now valid)
  • NoEntryPoint error variant (a workflow in which every task has an inbound edge is valid as long as it is started manually)
  • level field from TaskNode
  • execution_order field from TaskGraph
  • compute_levels() method (not needed)
  • Topological sort logic in From<GraphBuilder> for TaskGraph

Keep/Modify:

  • entry_points - still useful as default starting tasks
  • Renamed dependencies to inbound_edges - needed for entry point detection and join tracking
  • Renamed dependents to outbound_edges - needed for identifying edges
  • next_tasks() - KEY METHOD - evaluates transitions
  • Simplified compute_dependencies() to compute_inbound_edges() - only tracks inbound edges
  • Updated TaskNode.dependencies to TaskNode.inbound_tasks

Add:

  • get_inbound_tasks(&self, task_name: &str) -> Vec<String> - returns all tasks that can transition to this task
  • Documentation explaining that cycles are supported
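The edge bookkeeping described in this phase could look roughly like the sketch below. It is a simplified, hypothetical stand-in for the real TaskGraph: the from_transitions constructor and the use of BTreeMap/BTreeSet (chosen only for deterministic ordering) are assumptions for illustration, not the actual types in graph.rs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for the real `TaskGraph`.
#[derive(Debug, Default)]
pub struct TaskGraph {
    /// Task name -> tasks it can transition to.
    outbound_edges: BTreeMap<String, BTreeSet<String>>,
    /// Task name -> tasks that can transition to it.
    inbound_edges: BTreeMap<String, BTreeSet<String>>,
}

impl TaskGraph {
    /// Build a graph from task names and (source, target) transition pairs.
    /// This constructor is an assumption made for the example.
    pub fn from_transitions(tasks: &[&str], transitions: &[(&str, &str)]) -> Self {
        let mut graph = TaskGraph::default();
        for task in tasks {
            graph.outbound_edges.entry(task.to_string()).or_default();
            graph.inbound_edges.entry(task.to_string()).or_default();
        }
        for (from, to) in transitions {
            graph.outbound_edges.entry(from.to_string()).or_default().insert(to.to_string());
            graph.inbound_edges.entry(to.to_string()).or_default().insert(from.to_string());
            // Make sure both endpoints exist as nodes in both maps.
            graph.outbound_edges.entry(to.to_string()).or_default();
            graph.inbound_edges.entry(from.to_string()).or_default();
        }
        graph
    }

    /// Returns all tasks that can transition to `task_name`.
    pub fn get_inbound_tasks(&self, task_name: &str) -> Vec<String> {
        self.inbound_edges
            .get(task_name)
            .map(|inbound| inbound.iter().cloned().collect())
            .unwrap_or_default()
    }

    /// Default starting tasks: those with no inbound edges.
    /// May be empty for a fully cyclic workflow; that is no longer an error.
    pub fn entry_points(&self) -> Vec<String> {
        self.inbound_edges
            .iter()
            .filter(|(_, inbound)| inbound.is_empty())
            .map(|(name, _)| name.clone())
            .collect()
    }
}
```

In this sketch a self-transition such as ("check", "check") simply shows up as an inbound edge of check, so a monitoring loop yields an empty entry_points() result rather than a CircularDependency or NoEntryPoint error.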

Phase 3: Enhance Transition Evaluation

Files to modify:

  • crates/executor/src/workflow/graph.rs

Changes:

  • next_tasks() already returns task names based on success/failure
  • Add support for evaluating when conditions (deferred - needs context)
  • Consider returning a struct with task name + transition info instead of just String (deferred)
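One possible shape for the deferred "struct instead of String" idea is sketched below. Every name here (TaskOutcome, When, Transition, ScheduledTransition, and the free function next_tasks) is hypothetical, and the When enum is a placeholder for real when expressions, which would need workflow context to evaluate. The sketch also assumes that the first matching transition wins.

```rust
/// Outcome of a finished task (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskOutcome {
    Succeeded,
    Failed,
}

/// Placeholder condition model; real `when` expressions need workflow context.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum When {
    OnSuccess,
    OnFailure,
    Always,
}

/// One entry in a task's `next` list.
#[derive(Debug, Clone)]
pub struct Transition {
    pub when: When,
    pub next: Vec<String>,
}

/// A task to schedule, plus the index of the transition that selected it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScheduledTransition {
    pub task: String,
    pub transition_index: usize,
}

/// True if the transition's condition holds for the given outcome.
fn applies(when: When, outcome: TaskOutcome) -> bool {
    match (when, outcome) {
        (When::Always, _) => true,
        (When::OnSuccess, TaskOutcome::Succeeded) => true,
        (When::OnFailure, TaskOutcome::Failed) => true,
        _ => false,
    }
}

/// Evaluates transitions in order and returns the tasks of the first match.
/// An empty result means no transition matched and nothing is scheduled.
pub fn next_tasks(transitions: &[Transition], outcome: TaskOutcome) -> Vec<ScheduledTransition> {
    transitions
        .iter()
        .enumerate()
        .find(|(_, t)| applies(t.when, outcome))
        .map(|(index, t)| {
            t.next
                .iter()
                .map(|task| ScheduledTransition { task: task.clone(), transition_index: index })
                .collect()
        })
        .unwrap_or_default()
}
```

Carrying transition_index alongside the task name would let the coordinator log or debug which transition fired without re-evaluating conditions.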

Phase 4: Add Join Tracking (crates/executor/src/workflow/coordinator.rs)

Add to WorkflowExecutionState:

  • scheduled_tasks: HashSet<String> - tasks scheduled but not yet executing
  • join_state: HashMap<String, HashSet<String>> - track which predecessors completed for each join task
  • Renamed current_tasks to executing_tasks for clarity

Add methods:

  • Join checking logic implemented in on_task_completion() method
    • Checks if join conditions are met
    • Returns true immediately if no join specified
    • Returns true if join count reached

Phase 5: Refactor Workflow Coordinator

Files to modify:

  • crates/executor/src/workflow/coordinator.rs

Major refactor of WorkflowExecutionHandle::execute():

// NEW EXECUTION MODEL:
// 1. Schedule entry point tasks
// 2. Wait for task completions
// 3. On completion, evaluate transitions and schedule next tasks
// 4. Terminate when nothing executing and nothing scheduled

Changes:

  • Replaced polling ready_tasks with checking scheduled_tasks
  • Start execution by scheduling all entry point tasks
  • Removed graph.ready_tasks() call
  • Added spawn_task_execution() method that:
    • Spawns task execution from main loop
  • Modified execute_task_async() to:
    • Move task from scheduled to executing when starting
    • On completion, evaluate graph.next_tasks()
    • Call on_task_completion() to schedule next tasks
    • Handle join state updates
  • Updated termination condition:
    • scheduled_tasks.is_empty() && executing_tasks.is_empty()

Specific implementation steps:

  1. Added spawn_task_execution() method
  2. Added on_task_completion() method that evaluates transitions
  3. Refactored execute() to start with entry points
  4. Changed main loop to spawn scheduled tasks and check for completion
  5. Updated execute_task_async() to call on_task_completion() at the end
  6. Implemented join barrier logic in on_task_completion()
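The control flow of the new execution model can be illustrated with a small synchronous simulation. This is not the real async coordinator: run_workflow is a hypothetical function, tasks "execute" instantly, a closure stands in for graph.next_tasks(), and a VecDeque replaces the HashSet for scheduled tasks so the run order is deterministic.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Runs a workflow synchronously and returns the order in which tasks ran.
/// `next_tasks` receives the completed task's name and how many times it has
/// run so far, and returns the tasks to schedule next (a stand-in for
/// transition evaluation).
pub fn run_workflow<F>(entry_points: &[&str], mut next_tasks: F) -> Vec<String>
where
    F: FnMut(&str, usize) -> Vec<String>,
{
    // 1. Schedule entry point tasks.
    let mut scheduled: VecDeque<String> = entry_points.iter().map(|t| t.to_string()).collect();
    let mut executing: HashSet<String> = HashSet::new();
    let mut runs: HashMap<String, usize> = HashMap::new();
    let mut history = Vec::new();

    // 4. Terminate when nothing is executing and nothing is scheduled.
    while !(scheduled.is_empty() && executing.is_empty()) {
        if let Some(task) = scheduled.pop_front() {
            // Move the task from scheduled to executing when it starts.
            executing.insert(task.clone());

            // 2. "Wait" for completion; in this sketch the task finishes at once.
            let count = runs.entry(task.clone()).or_insert(0);
            *count += 1;
            let run_number = *count;
            history.push(task.clone());
            executing.remove(&task);

            // 3. On completion, evaluate transitions and schedule next tasks.
            for next in next_tasks(&task, run_number) {
                scheduled.push_back(next);
            }
        }
    }
    history
}
```

For example, a monitoring loop where check transitions to itself twice and then to notify could be driven like this:

```rust
let history = run_workflow(&["check"], |task, run| match (task, run) {
    ("check", n) if n < 3 => vec!["check".to_string()],
    ("check", _) => vec!["notify".to_string()],
    _ => vec![],
});
// history is ["check", "check", "check", "notify"]
```

The loop ends on its own once notify returns no further tasks, which is the "transitions stop scheduling tasks" termination behaviour described earlier.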

Phase 6: Update Tests

Files to modify:

  • crates/executor/src/workflow/graph.rs (tests module)
  • crates/executor/src/workflow/coordinator.rs (tests module)
  • Add new test files if needed

Test cases to add:

  • Simple cycle (task transitions to itself) - test_cycle_support
  • Complex cycle (task A -> B -> C -> A)
  • Cycle with termination condition (monitoring loop that exits)
  • Join with 2 parallel tasks
  • Join with N tasks (where join = 2 of 3)
  • Multiple entry points
  • Workflow with no entry points (all tasks have inbound edges) - test_cycle_support covers this
  • Task that transitions to multiple next tasks - test_parallel_entry_points covers this

Test cases to update:

  • Updated existing tests to work with new model
  • Removed dependency on circular dependency errors

Phase 7: Add Cycle Protection

Safety mechanisms to add:

  • Workflow execution timeout (max total execution time)
  • Task iteration limit (max times a single task can execute in one workflow)
  • Add to config: max_workflow_duration_seconds
  • Add to config: max_task_iterations_per_workflow
  • Track iteration count per task in WorkflowExecutionState
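A minimal sketch of these safety mechanisms, assuming the two config keys map directly onto struct fields. CycleLimits, CycleGuard, LimitExceeded, and check_and_record are hypothetical names; in the plan the iteration counts would be tracked inside WorkflowExecutionState instead of a separate type.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical config struct; field names follow the config keys proposed above.
#[derive(Debug, Clone, Copy)]
pub struct CycleLimits {
    pub max_workflow_duration_seconds: u64,
    pub max_task_iterations_per_workflow: u32,
}

/// Why a workflow was stopped by the guard.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitExceeded {
    WorkflowTimeout,
    TaskIterations { task: String, iterations: u32 },
}

/// Tracks elapsed time and per-task iteration counts for one workflow run.
pub struct CycleGuard {
    limits: CycleLimits,
    started_at: Instant,
    iterations: HashMap<String, u32>,
}

impl CycleGuard {
    pub fn new(limits: CycleLimits) -> Self {
        CycleGuard { limits, started_at: Instant::now(), iterations: HashMap::new() }
    }

    /// Call before starting each task execution. Records the iteration on
    /// success and returns an error if either limit would be exceeded.
    pub fn check_and_record(&mut self, task: &str) -> Result<(), LimitExceeded> {
        let max_duration = Duration::from_secs(self.limits.max_workflow_duration_seconds);
        if self.started_at.elapsed() > max_duration {
            return Err(LimitExceeded::WorkflowTimeout);
        }
        let count = self.iterations.entry(task.to_string()).or_insert(0);
        if *count >= self.limits.max_task_iterations_per_workflow {
            return Err(LimitExceeded::TaskIterations {
                task: task.to_string(),
                iterations: *count,
            });
        }
        *count += 1;
        Ok(())
    }
}
```

Checking before each task start, and not only on completion, means a runaway self-loop is stopped before it consumes another execution slot.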

Phase 8: Update Workflow YAML Examples

Files to create/update:

  • Add example workflows demonstrating cycles
  • docs/examples/monitoring-loop.yaml
  • docs/examples/retry-with-cycle.yaml
  • docs/examples/conditional-loop.yaml

Phase 9: Final Documentation

Update:

  • README.md - mention cycle support
  • docs/workflow-execution-engine.md - complete rewrite of execution model section
  • docs/testing-status.md - add new test requirements
  • CHANGELOG.md - document the breaking change

Testing Strategy

  1. Unit Tests: Test graph building, transition evaluation, join logic
  2. Integration Tests: Test full workflow execution with cycles
  3. Manual Testing: Run example workflows with monitoring loops
  4. Performance Testing: Ensure cycle support (and the cycle protection checks from Phase 7) doesn't cause performance issues

Migration Notes

Breaking Changes:

  • Workflows that relied on implicit execution order from levels may behave differently
  • Cycles that were previously errors are now valid
  • Entry point detection behavior may change slightly

Backwards Compatibility:

  • All valid DAG workflows should continue to work
  • The transition model is more explicit and should be more predictable

Estimated Effort

  • Phase 1 (Docs): 1 hour (DEFERRED)
  • Phase 2 (Graph refactor): 2-3 hours (COMPLETE)
  • Phase 3 (Transition enhancement): 1 hour (PARTIAL - basic implementation done)
  • Phase 4 (Join tracking): 1-2 hours (COMPLETE)
  • Phase 5 (Coordinator refactor): 3-4 hours (COMPLETE)
  • Phase 6 (Tests): 2-3 hours (PARTIAL - basic tests updated, more needed)
  • Phase 7 (Cycle protection): 1-2 hours (DEFERRED - not critical for now)
  • Phase 8 (Examples): 1 hour (TODO)
  • Phase 9 (Final docs): 1 hour (TODO)

Total: 13-18 hours
Completed so far: ~6-8 hours

Success Criteria

  1. All existing tests pass
  2. New cycle tests pass
  3. Example monitoring loop workflow executes successfully
  4. Documentation is complete and accurate
  5. No performance regression (not tested yet)
  6. Code is simpler than before (fewer lines, less complexity)

Core Implementation Complete

The fundamental refactoring from DAG to transition-based graph traversal is complete:

  • Removed all cycle detection code
  • Refactored graph building to use inbound/outbound edges
  • Implemented transition-based task scheduling
  • Added join barrier support
  • Updated tests to validate cycle support

Remaining work is primarily documentation and additional examples, plus the remaining tests (Phase 6) and the deferred cycle protection (Phase 7).

Implementation Order

Execute phases in order 1-9, completing all tasks in each phase before moving to the next. Commit after each phase for easy rollback if needed.


Notes from Orquesta Documentation

Key insights:

  • Tasks are nodes, transitions are edges
  • Entry points are tasks with no inbound edges
  • Workflow terminates when no tasks running AND no tasks scheduled
  • Join creates a barrier - single instance waits for multiple inbound transitions
  • Without join, task is invoked multiple times (once per inbound transition)
  • Fail-fast: task failure with no transition terminates workflow
  • Transitions evaluated in order, first matching transition wins