# Workflow Execution Engine

## Overview

The Workflow Execution Engine is responsible for orchestrating the execution of workflows in Attune. It manages task dependencies, parallel execution, state transitions, context passing, retries, timeouts, and error handling.

## Architecture

The execution engine consists of four main components:

### 1. Task Graph Builder (`workflow/graph.rs`)

**Purpose:** Converts workflow definitions into executable task graphs with dependency information.

**Key Features:**
- Builds a directed acyclic graph (DAG) from workflow tasks
- Topological sorting for execution order
- Dependency computation from task transitions
- Cycle detection
- Entry point identification

**Data Structures:**
- `TaskGraph`: Complete executable graph with nodes, dependencies, and execution order
- `TaskNode`: Individual task with configuration, transitions, and dependencies
- `TaskTransitions`: Success/failure/complete/timeout transitions and decision branches
- `RetryConfig`: Retry configuration with backoff strategies

**Example Usage:**
```rust
use std::collections::HashSet;

use attune_executor::workflow::{TaskGraph, parse_workflow_yaml};

let workflow = parse_workflow_yaml(yaml_content)?;
let graph = TaskGraph::from_workflow(&workflow)?;

// Get entry points (tasks with no dependencies)
for entry in &graph.entry_points {
    println!("Entry point: {}", entry);
}

// Get tasks ready to execute
let completed = HashSet::new();
let ready = graph.ready_tasks(&completed);
```
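
The ordering and cycle detection described above can be sketched independently of the crate's types with Kahn's algorithm. This is a minimal illustration, not the actual `TaskGraph` implementation; the task names and edges are hypothetical:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Topologically sort tasks given `deps[task] = set of prerequisite tasks`.
/// Returns None if the graph contains a cycle.
fn topo_sort(deps: &HashMap<&str, HashSet<&str>>) -> Option<Vec<String>> {
    // In-degree = number of unmet prerequisites
    let mut indegree: HashMap<&str, usize> =
        deps.iter().map(|(t, d)| (*t, d.len())).collect();

    // Entry points: tasks with no dependencies
    let mut queue: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, &d)| d == 0)
        .map(|(t, _)| *t)
        .collect();

    let mut order = Vec::new();
    while let Some(task) = queue.pop_front() {
        order.push(task.to_string());
        // "Complete" this task: decrement the in-degree of tasks depending on it
        for (t, d) in deps {
            if d.contains(task) {
                let e = indegree.get_mut(t).unwrap();
                *e -= 1;
                if *e == 0 {
                    queue.push_back(*t);
                }
            }
        }
    }

    // If not every task was emitted, a cycle blocked progress
    if order.len() == deps.len() { Some(order) } else { None }
}

fn main() {
    let mut deps: HashMap<&str, HashSet<&str>> = HashMap::new();
    deps.insert("build", HashSet::new());
    deps.insert("test", ["build"].into_iter().collect());
    deps.insert("deploy", ["test"].into_iter().collect());
    println!("{:?}", topo_sort(&deps)); // Some(["build", "test", "deploy"])
}
```

A task returned by `ready_tasks` is simply one whose entire prerequisite set is contained in the completed set, which is the same in-degree bookkeeping shown here.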

### 2. Context Manager (`workflow/context.rs`)

**Purpose:** Manages workflow execution context, including variables, parameters, and template rendering.

**Key Features:**
- Workflow-level and task-level variable management
- Jinja2-like template rendering with `{{ variable }}` syntax
- Task result storage and retrieval
- With-items iteration support (current item and index)
- Nested value access (e.g., `{{ parameters.config.server.port }}`)
- Context import/export for persistence

**Variable Scopes:**
- `parameters.*` - Input parameters to the workflow
- `vars.*` or `variables.*` - Workflow-scoped variables
- `task.*` or `tasks.*` - Task results
- `item` - Current item in with-items iteration
- `index` - Current index in with-items iteration
- `system.*` - System variables (e.g., workflow start time)

**Example Usage:**
```rust
use std::collections::HashMap;

use attune_executor::workflow::WorkflowContext;
use serde_json::json;

let params = json!({"name": "Alice"});
let mut ctx = WorkflowContext::new(params, HashMap::new());

// Render template
let greeting = ctx.render_template("Hello {{ parameters.name }}!")?;
// greeting == "Hello Alice!"

// Store task result
ctx.set_task_result("task1", json!({"status": "success"}));

// Publish variables from result
let result = json!({"output": "value"});
ctx.publish_from_result(&result, &["my_var".to_string()], None)?;
```

### 3. Task Executor (`workflow/task_executor.rs`)

**Purpose:** Executes individual workflow tasks with support for different task types, retries, and timeouts.

**Key Features:**
- Action task execution (queues actions for workers)
- Parallel task execution (spawns multiple tasks concurrently)
- Workflow task execution (nested workflows - TODO)
- With-items iteration (batch processing with concurrency limits)
- Conditional execution (when clauses)
- Retry logic with configurable backoff strategies
- Timeout handling
- Task result publishing to context

**Task Types:**
- **Action**: Execute a single action
- **Parallel**: Execute multiple sub-tasks concurrently
- **Workflow**: Execute a nested workflow (not yet implemented)

**Retry Strategies:**
- **Constant**: Fixed delay between retries
- **Linear**: Linearly increasing delay
- **Exponential**: Exponentially increasing delay with optional max delay

**Example Task Execution Flow:**
```
1. Check if task should be skipped (when condition)
2. Check if task has with-items iteration
   - If yes, process items in batches with concurrency limits
   - If no, execute single task
3. Render task input with context
4. Execute based on task type (action/parallel/workflow)
5. Apply timeout if configured
6. Handle retries on failure
7. Publish variables from result
8. Update task execution record in database
```

### 4. Workflow Coordinator (`workflow/coordinator.rs`)

**Purpose:** Main orchestration component that manages the complete workflow execution lifecycle.

**Key Features:**
- Workflow lifecycle management (start, pause, resume, cancel)
- State management (completed, failed, skipped tasks)
- Concurrent task execution coordination
- Database state persistence
- Execution result aggregation
- Error handling and recovery

**Workflow Execution States:**
- `Requested` - Workflow execution requested
- `Scheduling` - Being scheduled
- `Scheduled` - Ready to execute
- `Running` - Currently executing
- `Completed` - Successfully completed
- `Failed` - Failed with errors
- `Cancelled` - Cancelled by user
- `Timeout` - Timed out

**Example Usage:**
```rust
use attune_executor::workflow::WorkflowCoordinator;
use serde_json::json;

let coordinator = WorkflowCoordinator::new(db_pool, mq);

// Start workflow execution
let handle = coordinator
    .start_workflow("my_pack.my_workflow", json!({"param": "value"}), None)
    .await?;

// Execute to completion
let result = handle.execute().await?;

println!("Status: {:?}", result.status);
println!("Completed tasks: {}", result.completed_tasks);
println!("Failed tasks: {}", result.failed_tasks);

// Or control execution
handle.pause(Some("User requested pause".to_string())).await?;
handle.resume().await?;
handle.cancel().await?;

// Check status
let status = handle.status().await;
println!("Current: {}/{} tasks", status.completed_tasks, status.total_tasks);
```

## Execution Flow

### High-Level Workflow Execution

```
1. Load workflow definition from database
2. Parse workflow YAML definition
3. Build task graph with dependencies
4. Create parent execution record
5. Initialize workflow context with parameters and variables
6. Create workflow execution record in database
7. Enter execution loop:
   a. Check if workflow is paused -> wait
   b. Check if workflow is complete -> exit
   c. Get ready tasks (dependencies satisfied)
   d. Spawn async execution for each ready task
   e. Wait briefly before checking again
8. Aggregate results and return
```

### Task Execution Flow

```
1. Create task execution record in database
2. Get current workflow context
3. Execute task (action/parallel/workflow/with-items)
4. Update task execution record with result
5. Update workflow state:
   - Add to completed_tasks on success
   - Add to failed_tasks on failure (unless retrying)
   - Add to skipped_tasks if skipped
   - Update context with task result
6. Persist workflow state to database
```

## Database Schema

### workflow_execution Table

Stores workflow execution state:

```sql
CREATE TABLE attune.workflow_execution (
    id BIGSERIAL PRIMARY KEY,
    execution BIGINT NOT NULL REFERENCES attune.execution(id),
    workflow_def BIGINT NOT NULL REFERENCES attune.workflow_definition(id),
    current_tasks TEXT[] NOT NULL DEFAULT '{}',
    completed_tasks TEXT[] NOT NULL DEFAULT '{}',
    failed_tasks TEXT[] NOT NULL DEFAULT '{}',
    skipped_tasks TEXT[] NOT NULL DEFAULT '{}',
    variables JSONB NOT NULL DEFAULT '{}',
    task_graph JSONB NOT NULL,
    status execution_status_enum NOT NULL,
    error_message TEXT,
    paused BOOLEAN NOT NULL DEFAULT false,
    pause_reason TEXT,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```

### workflow_task_execution Table

Stores individual task execution state:

```sql
CREATE TABLE attune.workflow_task_execution (
    id BIGSERIAL PRIMARY KEY,
    workflow_execution BIGINT NOT NULL REFERENCES attune.workflow_execution(id),
    execution BIGINT NOT NULL REFERENCES attune.execution(id),
    task_name TEXT NOT NULL,
    task_index INTEGER,
    task_batch INTEGER,
    status execution_status_enum NOT NULL,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    duration_ms BIGINT,
    result JSONB,
    error JSONB,
    retry_count INTEGER NOT NULL DEFAULT 0,
    max_retries INTEGER NOT NULL DEFAULT 0,
    next_retry_at TIMESTAMP WITH TIME ZONE,
    timeout_seconds INTEGER,
    timed_out BOOLEAN NOT NULL DEFAULT false,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```

## Template Rendering

### Syntax

Templates use Jinja2-like syntax with `{{ expression }}`:

```yaml
tasks:
  - name: greet
    action: core.echo
    input:
      message: "Hello {{ parameters.name }}!"

  - name: process
    action: core.process
    input:
      data: "{{ task.greet.result.output }}"
      count: "{{ variables.counter }}"
```

### Supported Expressions

**Parameters:**
```
{{ parameters.name }}
{{ parameters.config.server.port }}
```

**Variables:**
```
{{ vars.my_variable }}
{{ variables.counter }}
{{ my_var }}  # Direct variable reference
```

**Task Results:**
```
{{ task.task_name.result }}
{{ task.task_name.output.key }}
{{ tasks.previous_task.status }}
```

**With-Items Context:**
```
{{ item }}
{{ item.name }}
{{ index }}
```

**System Variables:**
```
{{ system.workflow_start }}
```
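
Nested expressions such as `parameters.config.server.port` resolve by walking the dotted path through nested values. A minimal sketch of that lookup, using a simplified stand-in `Value` enum rather than the JSON values the context actually stores:

```rust
use std::collections::HashMap;

/// Simplified stand-in for a JSON-like value tree.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(i64),
    Map(HashMap<String, Value>),
}

/// Walk a dotted path ("config.server.port") through nested maps.
fn lookup<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    let mut cur = root;
    for key in path.split('.') {
        match cur {
            Value::Map(m) => cur = m.get(key)?,
            _ => return None, // path continues but the value is a leaf
        }
    }
    Some(cur)
}

fn main() {
    let server: HashMap<String, Value> =
        [("port".to_string(), Value::Num(8080))].into_iter().collect();
    let config: HashMap<String, Value> =
        [("server".to_string(), Value::Map(server))].into_iter().collect();
    let params = Value::Map(
        [("config".to_string(), Value::Map(config))].into_iter().collect(),
    );
    println!("{:?}", lookup(&params, "config.server.port")); // Some(Num(8080))
}
```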

## With-Items Iteration

Execute a task multiple times with different items:

```yaml
tasks:
  - name: process_servers
    action: server.configure
    with_items: "{{ parameters.servers }}"
    batch_size: 5    # Process 5 items at a time
    concurrency: 10  # Max 10 concurrent executions
    input:
      server: "{{ item.hostname }}"
      index: "{{ index }}"
```

**Features:**
- Batch processing: Process items in batches of the specified size
- Concurrency control: Limit the number of concurrent executions
- Context isolation: Each iteration has its own `item` and `index`
- Result aggregation: All results are collected into an array
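
The `batch_size` semantics can be sketched with the standard library alone. Real execution is async; treat this thread-per-item version as an illustration only, where each batch runs concurrently and the next batch starts after the previous one finishes:

```rust
use std::thread;

/// Process `items` in batches of `batch_size`: items within a batch run
/// concurrently; the next batch starts only after the previous one finishes.
/// Results are collected in item order.
fn process_in_batches<T, R, F>(items: Vec<T>, batch_size: usize, f: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(usize, T) -> R + Send + Sync + Clone + 'static,
{
    let mut results = Vec::new();
    let mut index = 0;
    let mut items = items.into_iter().peekable();
    while items.peek().is_some() {
        // Spawn one thread per item in the current batch
        let handles: Vec<_> = items
            .by_ref()
            .take(batch_size)
            .map(|item| {
                let f = f.clone();
                let i = index;
                index += 1;
                thread::spawn(move || f(i, item))
            })
            .collect();
        // Wait for the whole batch before starting the next one
        for h in handles {
            results.push(h.join().expect("worker panicked"));
        }
    }
    results
}

fn main() {
    // Each iteration sees its own `item` and `index`
    let out = process_in_batches(vec!["a", "b", "c"], 2, |i, s| format!("{i}:{s}"));
    println!("{:?}", out); // ["0:a", "1:b", "2:c"]
}
```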

## Retry Strategies

### Constant Backoff

Fixed delay between retries:

```yaml
tasks:
  - name: flaky_task
    action: external.api_call
    retry:
      count: 3
      delay: 10  # 10 seconds between each retry
      backoff: constant
```

### Linear Backoff

Linearly increasing delay:

```yaml
retry:
  count: 5
  delay: 5
  backoff: linear
  # Delays: 5s, 10s, 15s, 20s, 25s
```

### Exponential Backoff

Exponentially increasing delay:

```yaml
retry:
  count: 5
  delay: 2
  backoff: exponential
  max_delay: 60
  # Delays: 2s, 4s, 8s, 16s, 32s (capped at 60s)
```
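
The delay sequences in the comments above follow directly from the strategy definitions. A sketch of the computation; the function signature and 1-based attempt numbering are assumptions for illustration, not the crate's actual `RetryConfig` API:

```rust
#[derive(Clone, Copy)]
enum Backoff {
    Constant,
    Linear,
    Exponential,
}

/// Delay in seconds before retry `attempt` (1-based), given the base `delay`
/// and an optional cap `max_delay` (applied to exponential backoff here).
fn retry_delay(strategy: Backoff, delay: u64, attempt: u32, max_delay: Option<u64>) -> u64 {
    let raw = match strategy {
        Backoff::Constant => delay,                            // 10, 10, 10, ...
        Backoff::Linear => delay * attempt as u64,             // 5, 10, 15, ...
        Backoff::Exponential => delay * 2u64.pow(attempt - 1), // 2, 4, 8, ...
    };
    match (strategy, max_delay) {
        (Backoff::Exponential, Some(cap)) => raw.min(cap),
        _ => raw,
    }
}

fn main() {
    // Reproduces the exponential example: count 5, delay 2, max_delay 60
    let delays: Vec<u64> = (1..=5)
        .map(|a| retry_delay(Backoff::Exponential, 2, a, Some(60)))
        .collect();
    println!("{:?}", delays); // [2, 4, 8, 16, 32]
}
```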

## Task Transitions

Control workflow flow with transitions:

```yaml
tasks:
  - name: check
    action: core.check_status
    on_success: deploy    # Go to deploy on success
    on_failure: rollback  # Go to rollback on failure
    on_complete: notify   # Always go to notify
    on_timeout: alert     # Go to alert on timeout

  - name: decision
    action: core.evaluate
    decision:
      - when: "{{ task.decision.result.action == 'approve' }}"
        next: deploy
      - when: "{{ task.decision.result.action == 'reject' }}"
        next: rollback
      - default: true
        next: manual_review
```
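
Branch selection for a `decision` block can be sketched as a first-match scan over the rendered conditions, with the `default` branch as a fallback. The `Branch` representation is an assumption for illustration:

```rust
/// A decision branch: either a rendered boolean condition, or a default.
struct Branch {
    condition: Option<bool>, // None = default branch
    next: &'static str,
}

/// Return the first branch whose condition rendered true,
/// falling back to the default branch if none matched.
fn select_next(branches: &[Branch]) -> Option<&'static str> {
    branches
        .iter()
        .find(|b| b.condition == Some(true))
        .or_else(|| branches.iter().find(|b| b.condition.is_none()))
        .map(|b| b.next)
}

fn main() {
    // Mirrors the decision example: neither condition matched, so default wins
    let branches = [
        Branch { condition: Some(false), next: "deploy" },
        Branch { condition: Some(false), next: "rollback" },
        Branch { condition: None, next: "manual_review" },
    ];
    println!("{:?}", select_next(&branches)); // Some("manual_review")
}
```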

## Error Handling

### Task Execution Errors

Errors are captured with:
- Error message
- Error type
- Optional error details (JSON)

### Workflow Failure Handling

- Individual task failures don't immediately stop the workflow
- Dependent tasks won't execute if their prerequisites failed
- The workflow completes when all executable tasks finish
- The final status is `Failed` if any task failed

### Retry on Error

```yaml
retry:
  count: 3
  delay: 5
  backoff: exponential
  on_error: "{{ result.error_code == 'RETRY_ABLE' }}"  # Only retry specific errors
```

## Parallel Execution

Execute multiple tasks concurrently:

```yaml
tasks:
  - name: parallel_checks
    type: parallel
    tasks:
      - name: check_service_a
        action: monitoring.check_health
        input:
          service: "service-a"

      - name: check_service_b
        action: monitoring.check_health
        input:
          service: "service-b"

      - name: check_database
        action: monitoring.check_db

    on_success: deploy
    on_failure: abort
```

**Features:**
- All sub-tasks execute concurrently
- The parent task waits for all sub-tasks to complete
- Success only if all sub-tasks succeed
- Individual sub-task results are aggregated
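
The all-must-succeed aggregation can be sketched with threads standing in for the async sub-tasks. The parent collects every result either way and fails if any sub-task failed; this is an illustration, not the executor's actual implementation:

```rust
use std::thread;

/// Run sub-tasks concurrently; succeed only if every sub-task succeeds,
/// aggregating individual (name, output) results either way.
fn run_parallel(
    subtasks: Vec<(&'static str, fn() -> Result<String, String>)>,
) -> Result<Vec<(String, String)>, Vec<(String, String)>> {
    // Spawn all sub-tasks before joining any: they run concurrently
    let handles: Vec<_> = subtasks
        .into_iter()
        .map(|(name, task)| (name, thread::spawn(task)))
        .collect();

    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for (name, handle) in handles {
        match handle.join().expect("sub-task panicked") {
            Ok(out) => ok.push((name.to_string(), out)),
            Err(err) => failed.push((name.to_string(), err)),
        }
    }
    // Success only if all sub-tasks succeeded
    if failed.is_empty() { Ok(ok) } else { Err(failed) }
}

fn healthy() -> Result<String, String> { Ok("healthy".to_string()) }
fn broken() -> Result<String, String> { Err("timeout".to_string()) }

fn main() {
    println!("{:?}", run_parallel(vec![("a", healthy), ("b", healthy)]));
    println!("{:?}", run_parallel(vec![("a", healthy), ("b", broken)]));
}
```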

## Conditional Execution

Skip tasks based on conditions:

```yaml
tasks:
  - name: deploy
    action: deployment.deploy
    when: "{{ parameters.environment == 'production' }}"
    input:
      version: "{{ parameters.version }}"
```

**When Clause Evaluation:**
- The template is rendered with the current context
- The result is evaluated as a boolean (truthy/falsy)
- The task is skipped if the condition is false
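
Truthy/falsy evaluation of a rendered `when` value can be sketched as follows; the value categories are an assumption about what the renderer can produce, following the usual Jinja2-style convention:

```rust
/// Simplified rendered value a `when` template might produce.
enum Rendered {
    Bool(bool),
    Num(f64),
    Str(String),
    Null,
}

/// Jinja-style truthiness: false, 0, "", and null are falsy.
fn is_truthy(v: &Rendered) -> bool {
    match v {
        Rendered::Bool(b) => *b,
        Rendered::Num(n) => *n != 0.0,
        Rendered::Str(s) => !s.is_empty(),
        Rendered::Null => false,
    }
}

fn main() {
    // A non-empty rendered string means the task runs; 0 or "" skips it
    assert!(is_truthy(&Rendered::Str("production".to_string())));
    assert!(!is_truthy(&Rendered::Num(0.0)));
    println!("when-clause truthiness checks passed");
}
```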

## State Persistence

Workflow state is persisted to the database after every task completion:

- Current executing tasks
- Completed tasks list
- Failed tasks list
- Skipped tasks list
- Workflow variables (entire context)
- Execution status
- Pause state and reason
- Error messages

This enables:
- Workflow resume after service restart
- Pause/resume functionality
- Execution history and auditing
- Progress monitoring

## Integration Points

### Message Queue

Tasks queue action executions via RabbitMQ:

```rust
// Task executor creates execution record
let execution = create_execution_record(...).await?;

// Queues execution for worker (TODO: implement MQ publishing)
self.mq.publish_execution_request(execution.id, action_ref, &input).await?;
```

### Worker Coordination

- The executor creates execution records
- Workers pick up and execute actions
- Workers update execution status
- The coordinator monitors completion (TODO: implement completion listener)

### Event Publishing

Workflow events should be published for:
- Workflow started
- Workflow completed/failed
- Task started/completed/failed
- Workflow paused/resumed/cancelled

## Future Enhancements

### TODO Items

1. **Completion Listener**: Listen for task completion events from workers
2. **Nested Workflows**: Execute workflows as tasks within workflows
3. **MQ Publishing**: Implement actual message queue publishing for action execution
4. **Advanced Expressions**: Support comparisons and logical operators in templates
5. **Error Condition Evaluation**: Evaluate `on_error` expressions for selective retries
6. **Workflow Timeouts**: Global workflow timeout configuration
7. **Task Dependencies**: Explicit `depends_on` task specification
8. **Loop Constructs**: While/until loops in addition to with-items
9. **Manual Steps**: Human-in-the-loop approval tasks
10. **Sub-workflow Output**: Capture and use nested workflow results

## Testing

### Unit Tests

Each module includes unit tests:

```bash
# Run all executor tests
cargo test -p attune-executor

# Run specific module tests
cargo test -p attune-executor --lib workflow::graph
cargo test -p attune-executor --lib workflow::context
```

### Integration Tests

Integration tests require a database and message queue:

```bash
# Set up test database
export DATABASE_URL="postgresql://attune_test:attune_test@localhost:5432/attune_test"
sqlx migrate run

# Run integration tests
cargo test -p attune-executor --test '*'
```

## Performance Considerations

### Concurrency

- Parallel tasks execute truly concurrently using `futures::join_all`
- With-items supports configurable concurrency limits
- Task graph execution is optimized with topological sorting

### Database Operations

- Workflow state is persisted after each task completion
- Batch operations are used where possible
- Connection pooling for database access

### Memory

- Task graphs and contexts can be large for complex workflows
- Consider workflow size limits in production
- Context variables should be reasonably sized

## Troubleshooting

### Workflow Not Progressing

**Symptoms**: Workflow stuck in the Running state

**Causes**:
- Circular dependencies (should be caught during parsing)
- All tasks waiting on failed dependencies
- Database connection issues

**Solution**: Check the workflow state in the database and review task dependencies

### Tasks Not Executing

**Symptoms**: Ready tasks not starting

**Causes**:
- Worker service not running
- Message queue not connected
- Execution records not being created

**Solution**: Check worker logs, verify the MQ connection, and check the database

### Template Rendering Errors

**Symptoms**: Tasks fail with template errors

**Causes**:
- Invalid variable references
- Missing context data
- Malformed expressions

**Solution**: Validate templates and check the available context variables

## Examples

See `docs/workflows/` for complete workflow examples demonstrating:
- Sequential workflows
- Parallel execution
- With-items iteration
- Conditional execution
- Error handling and retries
- Complex workflows with decisions

## Related Documentation

- [Workflow Definition Format](workflow-definition-format.md)
- [Pack Integration](api-pack-workflows.md)
- [Execution API](api-executions.md)
- [Message Queue Architecture](message-queue.md)