David Culbreth 3b14c65998 re-uploading work

2026-02-04 17:46:30 -06:00

Orquesta-Style Workflow Refactoring Plan

Goal

Refactor the workflow execution engine from a dependency-based DAG model to a transition-based graph traversal model inspired by StackStorm's Orquesta engine. This will simplify the code and naturally support workflow cycles.

Current Problems

Over-engineered: Computing dependencies, levels, and topological sort that we never actually use
Not using transitions: We parse next transitions but execute based on dependencies instead
Artificial DAG restriction: Prevents legitimate use cases like monitoring loops
Polling-based: Continuously polls for "ready tasks" instead of reacting to completions

Orquesta Model Benefits

Simpler: Pure graph traversal following transitions
Event-driven: Task completions trigger next task scheduling
Naturally supports cycles: Workflows terminate when transitions stop scheduling tasks
Intuitive: Follow the next arrows in the workflow definition

Implementation Plan

Phase 1: Documentation Updates

Files to modify:

docs/workflow-execution-engine.md
work-summary/TODO.md

Changes:

Remove references to DAG and topological sort
Document transition-based execution model
Add examples of cyclic workflows (monitoring loops)
Document join semantics clearly
Document workflow termination conditions

Phase 2: Refactor Graph Module (`crates/executor/src/workflow/graph.rs`)

Remove:

CircularDependency error variant (cycles are now valid)
NoEntryPoint error variant (a workflow in which every task has an inbound edge is valid as long as it is started manually)
level field from TaskNode
execution_order field from TaskGraph
compute_levels() method (not needed)
Topological sort logic in From<GraphBuilder> for TaskGraph

Keep/Modify:

entry_points - still useful as default starting tasks
Renamed dependencies to inbound_edges - needed for entry point detection and join tracking
Renamed dependents to outbound_edges - needed for identifying edges
next_tasks() - KEY METHOD - evaluates transitions
Simplified compute_dependencies() to compute_inbound_edges() - only tracks inbound edges
Updated TaskNode.dependencies to TaskNode.inbound_tasks

Add:

get_inbound_tasks(&self, task_name: &str) -> Vec<String> - returns all tasks that can transition to this task
Documentation explaining that cycles are supported
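
As a rough illustration of the edge bookkeeping described above, the sketch below models inbound and outbound edges with plain hash maps. The `TaskGraph` here is a simplified, hypothetical stand-in (the real struct in graph.rs has more fields), and `add_edge` is an invented helper; only `get_inbound_tasks`, `entry_points`, `inbound_edges`, and `outbound_edges` are names taken from this plan.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for the real `TaskGraph`.
#[derive(Default)]
pub struct TaskGraph {
    /// Task name -> tasks it can transition to.
    outbound_edges: HashMap<String, Vec<String>>,
    /// Task name -> tasks that can transition to it.
    inbound_edges: HashMap<String, Vec<String>>,
}

impl TaskGraph {
    /// Hypothetical helper: record a transition edge `from -> to`.
    /// Self-edges and back-edges are accepted, so cycles are valid.
    pub fn add_edge(&mut self, from: &str, to: &str) {
        self.outbound_edges
            .entry(from.to_string())
            .or_default()
            .push(to.to_string());
        self.inbound_edges
            .entry(to.to_string())
            .or_default()
            .push(from.to_string());
        // Ensure both endpoints exist in both maps so that
        // entry-point detection sees every task.
        self.inbound_edges.entry(from.to_string()).or_default();
        self.outbound_edges.entry(to.to_string()).or_default();
    }

    /// Returns all tasks that can transition to `task_name`.
    pub fn get_inbound_tasks(&self, task_name: &str) -> Vec<String> {
        self.inbound_edges
            .get(task_name)
            .cloned()
            .unwrap_or_default()
    }

    /// Tasks with no inbound edges, sorted for deterministic output.
    pub fn entry_points(&self) -> Vec<String> {
        let mut names: Vec<String> = self
            .inbound_edges
            .iter()
            .filter(|(_, inbound)| inbound.is_empty())
            .map(|(name, _)| name.clone())
            .collect();
        names.sort();
        names
    }
}
```

In this sketch a graph with the edges start -> check, check -> notify, and notify -> check builds without error, and `start` is its only entry point.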

Phase 3: Enhance Transition Evaluation

Files to modify:

crates/executor/src/workflow/graph.rs

Changes:

next_tasks() already returns task names based on success/failure
Add support for evaluating when conditions (deferred - needs context)
Consider returning a struct with task name + transition info instead of just String (deferred)
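
A minimal sketch of outcome-based transition evaluation follows. The `Outcome`, `Transition`, and `TaskNode` types are hypothetical simplifications, not the actual definitions in graph.rs. The sketch assumes the "first matching transition wins" rule recorded in the Orquesta notes and leaves `when` conditions out, since those are deferred.

```rust
/// Hypothetical task outcome; the real engine may track more states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Outcome {
    Succeeded,
    Failed,
}

/// Hypothetical transition: when the task finishes with `on`,
/// schedule every task listed in `next`.
pub struct Transition {
    pub on: Outcome,
    pub next: Vec<String>,
}

/// Hypothetical, trimmed-down task node.
pub struct TaskNode {
    pub name: String,
    pub transitions: Vec<Transition>,
}

/// Names of the tasks to schedule after `node` finishes with `outcome`.
/// Transitions are checked in declaration order and the first match wins
/// (an assumption taken from the Orquesta notes in this plan).
/// Returns an empty list when no transition matches.
pub fn next_tasks(node: &TaskNode, outcome: Outcome) -> Vec<String> {
    node.transitions
        .iter()
        .find(|t| t.on == outcome)
        .map(|t| t.next.clone())
        .unwrap_or_default()
}
```

Because a transition can name the task it belongs to, a self-loop such as a polling task needs no special handling here.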

Phase 4: Add Join Tracking (`crates/executor/src/workflow/coordinator.rs`)

Add to WorkflowExecutionState:

scheduled_tasks: HashSet<String> - tasks scheduled but not yet executing
join_state: HashMap<String, HashSet<String>> - track which predecessors completed for each join task
Renamed current_tasks to executing_tasks for clarity

Add methods:

Join checking logic implemented in on_task_completion() method
- Checks if join conditions are met
- Returns true immediately if no join specified
- Returns true if join count reached
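
The join check described above could look roughly like the following. `JoinTracker` and `record_and_check` are hypothetical names used only for illustration (the plan places this logic inside on_task_completion()), and resetting the barrier after it fires is an assumption made so that a cycle can reach the same join again.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical holder for the `join_state` map from this plan.
#[derive(Default)]
pub struct JoinTracker {
    /// Join task -> predecessors that have completed and transitioned to it.
    join_state: HashMap<String, HashSet<String>>,
}

impl JoinTracker {
    /// Record that `predecessor` transitioned to `task` and report whether
    /// `task` may be scheduled now.
    ///
    /// `join` is `None` when the task declares no join, or `Some(n)` when it
    /// must wait for `n` distinct predecessors.
    pub fn record_and_check(
        &mut self,
        task: &str,
        predecessor: &str,
        join: Option<usize>,
    ) -> bool {
        // No join specified: schedule immediately.
        let Some(required) = join else {
            return true;
        };
        let arrived = self.join_state.entry(task.to_string()).or_default();
        arrived.insert(predecessor.to_string());
        if arrived.len() >= required {
            // Assumption: clear the barrier once it fires so that a later
            // pass through a cycle starts counting from zero again.
            self.join_state.remove(task);
            true
        } else {
            false
        }
    }
}
```

Using a set means a predecessor that completes twice before the barrier fires is counted once. With the reset, a "2 of 3" join fires on the second arrival, and the third arrival starts a fresh count. Whether that is the desired behavior is a design decision this sketch does not settle.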

Phase 5: Refactor Workflow Coordinator

Files to modify:

crates/executor/src/workflow/coordinator.rs

Major refactor of WorkflowExecutionHandle::execute():

// NEW EXECUTION MODEL:
// 1. Schedule entry point tasks
// 2. Wait for task completions
// 3. On completion, evaluate transitions and schedule next tasks
// 4. Terminate when nothing executing and nothing scheduled

Changes:

Replaced polling ready_tasks with checking scheduled_tasks
Start execution by scheduling all entry point tasks
Removed graph.ready_tasks() call
Added spawn_task_execution() method that:
- Spawns task execution from main loop
Modified execute_task_async() to:
- Move task from scheduled to executing when starting
- On completion, evaluate graph.next_tasks()
- Call on_task_completion() to schedule next tasks
- Handle join state updates
Updated termination condition:
- scheduled_tasks.is_empty() && executing_tasks.is_empty()

Specific implementation steps:

Added spawn_task_execution() method
Added on_task_completion() method that evaluates transitions
Refactored execute() to start with entry points
Changed main loop to spawn scheduled tasks and check for completion
Updated execute_task_async() to call on_task_completion() at the end
Implemented join barrier logic in on_task_completion()
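
To make the loop shape concrete, here is a synchronous sketch of the schedule, execute, and transition cycle. It is not the real coordinator, which is asynchronous: `run_workflow` is a hypothetical function, tasks "complete" immediately, the caller-supplied `next_tasks` closure stands in for graph.next_tasks(), and a `VecDeque` replaces the `HashSet` used for scheduled_tasks so that the order is deterministic.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical synchronous model of the execution loop.
/// Returns the tasks in the order they completed.
pub fn run_workflow<F>(entry_points: &[&str], mut next_tasks: F) -> Vec<String>
where
    F: FnMut(&str) -> Vec<String>,
{
    // 1. Schedule entry point tasks.
    let mut scheduled_tasks: VecDeque<String> =
        entry_points.iter().map(|s| s.to_string()).collect();
    let mut executing_tasks: HashSet<String> = HashSet::new();
    let mut completed: Vec<String> = Vec::new();

    // 4. Terminate when nothing is executing and nothing is scheduled.
    while !(scheduled_tasks.is_empty() && executing_tasks.is_empty()) {
        if let Some(task) = scheduled_tasks.pop_front() {
            // Move the task from scheduled to executing.
            executing_tasks.insert(task.clone());

            // 2. The real engine would spawn the task and await it;
            //    here it finishes immediately.
            executing_tasks.remove(&task);

            // 3. On completion, evaluate transitions and schedule next tasks.
            for next in next_tasks(&task) {
                scheduled_tasks.push_back(next);
            }
            completed.push(task);
        }
    }
    completed
}
```

A monitoring loop can be modeled by a closure that returns `poll` from `poll` until some condition holds and then returns `done`. The workflow ends as soon as a completion schedules nothing further.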

Phase 6: Update Tests

Files to modify:

crates/executor/src/workflow/graph.rs (tests module)
crates/executor/src/workflow/coordinator.rs (tests module)
Add new test files if needed

Test cases to add:

Simple cycle (task transitions to itself) - test_cycle_support
Complex cycle (task A -> B -> C -> A)
Cycle with termination condition (monitoring loop that exits)
Join with 2 parallel tasks
Join with N tasks (where join = 2 of 3)
Multiple entry points
Workflow with no entry points (all tasks have inbound edges) - test_cycle_support covers this
Task that transitions to multiple next tasks - test_parallel_entry_points covers this

Test cases to update:

Updated existing tests to work with new model
Removed dependency on circular dependency errors

Phase 7: Add Cycle Protection

Safety mechanisms to add:

Workflow execution timeout (max total execution time)
Task iteration limit (max times a single task can execute in one workflow)
Add to config: max_workflow_duration_seconds
Add to config: max_task_iterations_per_workflow
Track iteration count per task in WorkflowExecutionState
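
One possible shape for these safety checks is sketched below. The two config field names come from this plan; `CycleLimits`, `LimitError`, and `IterationGuard` are hypothetical, and in the real code the counters would presumably live in WorkflowExecutionState.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical config struct using the option names proposed in this plan.
pub struct CycleLimits {
    pub max_workflow_duration_seconds: u64,
    pub max_task_iterations_per_workflow: u32,
}

/// Hypothetical error type for a tripped limit.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    WorkflowTimeout,
    TaskIterationLimit { task: String, limit: u32 },
}

/// Hypothetical guard that tracks per-task iteration counts.
pub struct IterationGuard {
    limits: CycleLimits,
    started_at: Instant,
    iterations: HashMap<String, u32>,
}

impl IterationGuard {
    pub fn new(limits: CycleLimits) -> Self {
        Self {
            limits,
            started_at: Instant::now(),
            iterations: HashMap::new(),
        }
    }

    /// Call before scheduling `task`; counts the iteration on success.
    pub fn check(&mut self, task: &str) -> Result<(), LimitError> {
        let max_duration = Duration::from_secs(self.limits.max_workflow_duration_seconds);
        if self.started_at.elapsed() > max_duration {
            return Err(LimitError::WorkflowTimeout);
        }
        let limit = self.limits.max_task_iterations_per_workflow;
        let count = self.iterations.entry(task.to_string()).or_insert(0);
        if *count >= limit {
            return Err(LimitError::TaskIterationLimit {
                task: task.to_string(),
                limit,
            });
        }
        *count += 1;
        Ok(())
    }
}
```

Checking at scheduling time keeps the guard out of the task execution path. The counts are per task, so one task hitting its limit does not block others.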

Phase 8: Update Workflow YAML Examples

Files to create/update:

Add example workflows demonstrating cycles
docs/examples/monitoring-loop.yaml
docs/examples/retry-with-cycle.yaml
docs/examples/conditional-loop.yaml

Phase 9: Final Documentation

Update:

README.md - mention cycle support
docs/workflow-execution-engine.md - complete rewrite of execution model section
docs/testing-status.md - add new test requirements
CHANGELOG.md - document the breaking change

Testing Strategy

Unit Tests: Test graph building, transition evaluation, join logic
Integration Tests: Test full workflow execution with cycles
Manual Testing: Run example workflows with monitoring loops
Performance Testing: Ensure cycle support (and the planned cycle protection limits) doesn't cause performance issues

Migration Notes

Breaking Changes:

Workflows that relied on implicit execution order from levels may behave differently
Cycles that were previously errors are now valid
Entry point detection behavior may change slightly

Backwards Compatibility:

All valid DAG workflows should continue to work
The transition model is more explicit and should be more predictable

Estimated Effort

Phase 1 (Docs): 1 hour (DEFERRED)
Phase 2 (Graph refactor): 2-3 hours ✅ COMPLETE
Phase 3 (Transition enhancement): 1 hour (PARTIAL - basic implementation done)
Phase 4 (Join tracking): 1-2 hours ✅ COMPLETE
Phase 5 (Coordinator refactor): 3-4 hours ✅ COMPLETE
Phase 6 (Tests): 2-3 hours (PARTIAL - basic tests updated, more needed)
Phase 7 (Cycle protection): 1-2 hours (DEFERRED - not critical for now)
Phase 8 (Examples): 1 hour (TODO)
Phase 9 (Final docs): 1 hour (TODO)

Total: 13-18 hours
Completed so far: ~6-8 hours

Success Criteria

All existing tests pass ✅
New cycle tests pass ✅
Example monitoring loop workflow executes successfully
Documentation is complete and accurate
No performance regression (not tested yet)
Code is simpler than before (fewer lines, less complexity) ✅

Core Implementation Complete ✅

The fundamental refactoring from DAG to transition-based graph traversal is complete:

Removed all cycle detection code
Refactored graph building to use inbound/outbound edges
Implemented transition-based task scheduling
Added join barrier support
Updated tests to validate cycle support

Remaining work is primarily documentation and additional examples.

Implementation Order

Execute phases in order 1-9, completing all tasks in each phase before moving to the next. Commit after each phase for easy rollback if needed.

Notes from Orquesta Documentation

Key insights:

Tasks are nodes, transitions are edges
Entry points are tasks with no inbound edges
Workflow terminates when no tasks are running AND no tasks are scheduled
Join creates a barrier - single instance waits for multiple inbound transitions
Without join, task is invoked multiple times (once per inbound transition)
Fail-fast: task failure with no transition terminates workflow
Transitions evaluated in order, first matching transition wins
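
The fail-fast rule in these notes can be expressed as a small decision function. This is only a sketch: `CompletionDecision` and `decide` are hypothetical names, and `next` is assumed to be the list already produced by transition evaluation.

```rust
/// Hypothetical result of handling one task completion.
#[derive(Debug, PartialEq, Eq)]
pub enum CompletionDecision {
    /// A transition matched: schedule these tasks.
    Schedule(Vec<String>),
    /// The task succeeded and nothing follows it; this branch simply ends.
    BranchComplete,
    /// The task failed and no transition handles the failure.
    FailWorkflow,
}

/// Decide what to do after a task completes, given whether it succeeded
/// and the tasks selected by its matching transition (possibly none).
pub fn decide(succeeded: bool, next: Vec<String>) -> CompletionDecision {
    if !next.is_empty() {
        CompletionDecision::Schedule(next)
    } else if succeeded {
        CompletionDecision::BranchComplete
    } else {
        // Fail-fast: an unhandled failure terminates the workflow.
        CompletionDecision::FailWorkflow
    }
}
```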
