# Workflow Execution Engine

## Overview

The Workflow Execution Engine is responsible for orchestrating the execution of workflows in Attune. It manages task dependencies, parallel execution, state transitions, context passing, retries, timeouts, and error handling.

## Architecture

The execution engine consists of four main components:

### 1. Task Graph Builder (`workflow/graph.rs`)

**Purpose:** Converts workflow definitions into executable task graphs with dependency information.

**Key Features:**
- Builds a directed acyclic graph (DAG) from workflow tasks
- Topological sorting for execution order
- Dependency computation from task transitions
- Cycle detection
- Entry point identification

**Data Structures:**
- `TaskGraph`: Complete executable graph with nodes, dependencies, and execution order
- `TaskNode`: Individual task with configuration, transitions, and dependencies
- `TaskTransitions`: Success/failure/complete/timeout transitions and decision branches
- `RetryConfig`: Retry configuration with backoff strategies

**Example Usage:**
```rust
use std::collections::HashSet;

use attune_executor::workflow::{TaskGraph, parse_workflow_yaml};

let workflow = parse_workflow_yaml(yaml_content)?;
let graph = TaskGraph::from_workflow(&workflow)?;

// Get entry points (tasks with no dependencies)
for entry in &graph.entry_points {
    println!("Entry point: {}", entry);
}

// Get tasks ready to execute
let completed = HashSet::new();
let ready = graph.ready_tasks(&completed);
```
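
The ordering and cycle detection described above can be sketched independently of the crate's types with Kahn's algorithm. This is a minimal illustration, not the actual `TaskGraph` implementation; the task names and edges are hypothetical:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Topologically sort tasks given `deps[task] = set of prerequisite tasks`.
/// Returns None if the graph contains a cycle.
fn topo_sort(deps: &HashMap<&str, HashSet<&str>>) -> Option<Vec<String>> {
    // In-degree = number of unmet prerequisites
    let mut indegree: HashMap<&str, usize> =
        deps.iter().map(|(t, d)| (*t, d.len())).collect();

    // Entry points: tasks with no dependencies
    let mut queue: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, &d)| d == 0)
        .map(|(t, _)| *t)
        .collect();

    let mut order = Vec::new();
    while let Some(task) = queue.pop_front() {
        order.push(task.to_string());
        // "Complete" this task: decrement the in-degree of tasks depending on it
        for (t, d) in deps {
            if d.contains(task) {
                let e = indegree.get_mut(t).unwrap();
                *e -= 1;
                if *e == 0 {
                    queue.push_back(*t);
                }
            }
        }
    }

    // If not every task was emitted, a cycle blocked progress
    if order.len() == deps.len() { Some(order) } else { None }
}

fn main() {
    let mut deps: HashMap<&str, HashSet<&str>> = HashMap::new();
    deps.insert("build", HashSet::new());
    deps.insert("test", ["build"].into_iter().collect());
    deps.insert("deploy", ["test"].into_iter().collect());
    println!("{:?}", topo_sort(&deps)); // Some(["build", "test", "deploy"])
}
```

A task returned by `ready_tasks` is simply one whose entire prerequisite set is contained in the completed set, which is the same in-degree bookkeeping shown here.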

### 2. Context Manager (`workflow/context.rs`)

**Purpose:** Manages workflow execution context, including variables, parameters, and template rendering.

**Key Features:**
- Workflow-level and task-level variable management
- Jinja2-like template rendering with `{{ variable }}` syntax
- Task result storage and retrieval
- With-items iteration support (current item and index)
- Nested value access (e.g., `{{ parameters.config.server.port }}`)
- Context import/export for persistence

**Variable Scopes:**
- `parameters.*` - Input parameters to the workflow
- `vars.*` or `variables.*` - Workflow-scoped variables
- `task.*` or `tasks.*` - Task results
- `item` - Current item in with-items iteration
- `index` - Current index in with-items iteration
- `system.*` - System variables (e.g., workflow start time)

**Example Usage:**
```rust
use std::collections::HashMap;

use attune_executor::workflow::WorkflowContext;
use serde_json::json;

let params = json!({"name": "Alice"});
let mut ctx = WorkflowContext::new(params, HashMap::new());

// Render template
let greeting = ctx.render_template("Hello {{ parameters.name }}!")?;
// greeting == "Hello Alice!"

// Store task result
ctx.set_task_result("task1", json!({"status": "success"}));

// Publish variables from result
let result = json!({"output": "value"});
ctx.publish_from_result(&result, &["my_var".to_string()], None)?;
```

### 3. Task Executor (`workflow/task_executor.rs`)

**Purpose:** Executes individual workflow tasks with support for different task types, retries, and timeouts.

**Key Features:**
- Action task execution (queues actions for workers)
- Parallel task execution (spawns multiple tasks concurrently)
- Workflow task execution (nested workflows - TODO)
- With-items iteration (batch processing with concurrency limits)
- Conditional execution (when clauses)
- Retry logic with configurable backoff strategies
- Timeout handling
- Task result publishing to context

**Task Types:**
- **Action**: Execute a single action
- **Parallel**: Execute multiple sub-tasks concurrently
- **Workflow**: Execute a nested workflow (not yet implemented)

**Retry Strategies:**
- **Constant**: Fixed delay between retries
- **Linear**: Linearly increasing delay
- **Exponential**: Exponentially increasing delay with optional max delay

**Example Task Execution Flow:**
```
1. Check if task should be skipped (when condition)
2. Check if task has with-items iteration
   - If yes, process items in batches with concurrency limits
   - If no, execute single task
3. Render task input with context
4. Execute based on task type (action/parallel/workflow)
5. Apply timeout if configured
6. Handle retries on failure
7. Publish variables from result
8. Update task execution record in database
```

### 4. Workflow Coordinator (`workflow/coordinator.rs`)

**Purpose:** Main orchestration component that manages the complete workflow execution lifecycle.

**Key Features:**
- Workflow lifecycle management (start, pause, resume, cancel)
- State management (completed, failed, skipped tasks)
- Concurrent task execution coordination
- Database state persistence
- Execution result aggregation
- Error handling and recovery

**Workflow Execution States:**
- `Requested` - Workflow execution requested
- `Scheduling` - Being scheduled
- `Scheduled` - Ready to execute
- `Running` - Currently executing
- `Completed` - Successfully completed
- `Failed` - Failed with errors
- `Cancelled` - Cancelled by user
- `Timeout` - Timed out

**Example Usage:**
```rust
use attune_executor::workflow::WorkflowCoordinator;
use serde_json::json;

let coordinator = WorkflowCoordinator::new(db_pool, mq);

// Start workflow execution
let handle = coordinator
    .start_workflow("my_pack.my_workflow", json!({"param": "value"}), None)
    .await?;

// Execute to completion
let result = handle.execute().await?;

println!("Status: {:?}", result.status);
println!("Completed tasks: {}", result.completed_tasks);
println!("Failed tasks: {}", result.failed_tasks);

// Or control execution
handle.pause(Some("User requested pause".to_string())).await?;
handle.resume().await?;
handle.cancel().await?;

// Check status
let status = handle.status().await;
println!("Current: {}/{} tasks", status.completed_tasks, status.total_tasks);
```

## Execution Flow

### High-Level Workflow Execution

```
1. Load workflow definition from database
2. Parse workflow YAML definition
3. Build task graph with dependencies
4. Create parent execution record
5. Initialize workflow context with parameters and variables
6. Create workflow execution record in database
7. Enter execution loop:
   a. Check if workflow is paused -> wait
   b. Check if workflow is complete -> exit
   c. Get ready tasks (dependencies satisfied)
   d. Spawn async execution for each ready task
   e. Wait briefly before checking again
8. Aggregate results and return
```

### Task Execution Flow

```
1. Create task execution record in database
2. Get current workflow context
3. Execute task (action/parallel/workflow/with-items)
4. Update task execution record with result
5. Update workflow state:
   - Add to completed_tasks on success
   - Add to failed_tasks on failure (unless retrying)
   - Add to skipped_tasks if skipped
   - Update context with task result
6. Persist workflow state to database
```

## Database Schema

### workflow_execution Table

Stores workflow execution state:

```sql
CREATE TABLE attune.workflow_execution (
    id BIGSERIAL PRIMARY KEY,
    execution BIGINT NOT NULL REFERENCES attune.execution(id),
    workflow_def BIGINT NOT NULL REFERENCES attune.workflow_definition(id),
    current_tasks TEXT[] NOT NULL DEFAULT '{}',
    completed_tasks TEXT[] NOT NULL DEFAULT '{}',
    failed_tasks TEXT[] NOT NULL DEFAULT '{}',
    skipped_tasks TEXT[] NOT NULL DEFAULT '{}',
    variables JSONB NOT NULL DEFAULT '{}',
    task_graph JSONB NOT NULL,
    status execution_status_enum NOT NULL,
    error_message TEXT,
    paused BOOLEAN NOT NULL DEFAULT false,
    pause_reason TEXT,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```

### workflow_task_execution Table

Stores individual task execution state:

```sql
CREATE TABLE attune.workflow_task_execution (
    id BIGSERIAL PRIMARY KEY,
    workflow_execution BIGINT NOT NULL REFERENCES attune.workflow_execution(id),
    execution BIGINT NOT NULL REFERENCES attune.execution(id),
    task_name TEXT NOT NULL,
    task_index INTEGER,
    task_batch INTEGER,
    status execution_status_enum NOT NULL,
    started_at TIMESTAMP WITH TIME ZONE,
    completed_at TIMESTAMP WITH TIME ZONE,
    duration_ms BIGINT,
    result JSONB,
    error JSONB,
    retry_count INTEGER NOT NULL DEFAULT 0,
    max_retries INTEGER NOT NULL DEFAULT 0,
    next_retry_at TIMESTAMP WITH TIME ZONE,
    timeout_seconds INTEGER,
    timed_out BOOLEAN NOT NULL DEFAULT false,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```

## Template Rendering

### Syntax

Templates use Jinja2-like syntax with `{{ expression }}`:

```yaml
tasks:
  - name: greet
    action: core.echo
    input:
      message: "Hello {{ parameters.name }}!"

  - name: process
    action: core.process
    input:
      data: "{{ task.greet.result.output }}"
      count: "{{ variables.counter }}"
```

### Supported Expressions

**Parameters:**
```
{{ parameters.name }}
{{ parameters.config.server.port }}
```

**Variables:**
```
{{ vars.my_variable }}
{{ variables.counter }}
{{ my_var }}  # Direct variable reference
```

**Task Results:**
```
{{ task.task_name.result }}
{{ task.task_name.output.key }}
{{ tasks.previous_task.status }}
```

**With-Items Context:**
```
{{ item }}
{{ item.name }}
{{ index }}
```

**System Variables:**
```
{{ system.workflow_start }}
```
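
Nested expressions such as `parameters.config.server.port` resolve by walking the dotted path through nested values. A minimal sketch of that lookup, using a simplified stand-in `Value` enum rather than the JSON values the context actually stores:

```rust
use std::collections::HashMap;

/// Simplified stand-in for a JSON-like value tree.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Num(i64),
    Map(HashMap<String, Value>),
}

/// Walk a dotted path ("config.server.port") through nested maps.
fn lookup<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    let mut cur = root;
    for key in path.split('.') {
        match cur {
            Value::Map(m) => cur = m.get(key)?,
            _ => return None, // path continues but the value is a leaf
        }
    }
    Some(cur)
}

fn main() {
    let server: HashMap<String, Value> =
        [("port".to_string(), Value::Num(8080))].into_iter().collect();
    let config: HashMap<String, Value> =
        [("server".to_string(), Value::Map(server))].into_iter().collect();
    let params = Value::Map(
        [("config".to_string(), Value::Map(config))].into_iter().collect(),
    );
    println!("{:?}", lookup(&params, "config.server.port")); // Some(Num(8080))
}
```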

## With-Items Iteration

Execute a task multiple times with different items:

```yaml
tasks:
  - name: process_servers
    action: server.configure
    with_items: "{{ parameters.servers }}"
    batch_size: 5    # Process 5 items at a time
    concurrency: 10  # Max 10 concurrent executions
    input:
      server: "{{ item.hostname }}"
      index: "{{ index }}"
```

**Features:**
- Batch processing: Process items in batches of the specified size
- Concurrency control: Limit the number of concurrent executions
- Context isolation: Each iteration has its own `item` and `index`
- Result aggregation: All results are collected into an array
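
The `batch_size` semantics can be sketched with the standard library alone. Real execution is async; treat this thread-per-item version as an illustration only, where each batch runs concurrently and the next batch starts after the previous one finishes:

```rust
use std::thread;

/// Process `items` in batches of `batch_size`: items within a batch run
/// concurrently; the next batch starts only after the previous one finishes.
/// Results are collected in item order.
fn process_in_batches<T, R, F>(items: Vec<T>, batch_size: usize, f: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(usize, T) -> R + Send + Sync + Clone + 'static,
{
    let mut results = Vec::new();
    let mut index = 0;
    let mut items = items.into_iter().peekable();
    while items.peek().is_some() {
        // Spawn one thread per item in the current batch
        let handles: Vec<_> = items
            .by_ref()
            .take(batch_size)
            .map(|item| {
                let f = f.clone();
                let i = index;
                index += 1;
                thread::spawn(move || f(i, item))
            })
            .collect();
        // Wait for the whole batch before starting the next one
        for h in handles {
            results.push(h.join().expect("worker panicked"));
        }
    }
    results
}

fn main() {
    // Each iteration sees its own `item` and `index`
    let out = process_in_batches(vec!["a", "b", "c"], 2, |i, s| format!("{i}:{s}"));
    println!("{:?}", out); // ["0:a", "1:b", "2:c"]
}
```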

## Retry Strategies

### Constant Backoff

Fixed delay between retries:

```yaml
tasks:
  - name: flaky_task
    action: external.api_call
    retry:
      count: 3
      delay: 10  # 10 seconds between each retry
      backoff: constant
```

### Linear Backoff

Linearly increasing delay:

```yaml
retry:
  count: 5
  delay: 5
  backoff: linear
  # Delays: 5s, 10s, 15s, 20s, 25s
```

### Exponential Backoff

Exponentially increasing delay:

```yaml
retry:
  count: 5
  delay: 2
  backoff: exponential
  max_delay: 60
  # Delays: 2s, 4s, 8s, 16s, 32s (capped at 60s)
```
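
The delay sequences in the comments above follow directly from the strategy definitions. A sketch of the computation; the function signature and 1-based attempt numbering are assumptions for illustration, not the crate's actual `RetryConfig` API:

```rust
#[derive(Clone, Copy)]
enum Backoff {
    Constant,
    Linear,
    Exponential,
}

/// Delay in seconds before retry `attempt` (1-based), given the base `delay`
/// and an optional cap `max_delay` (applied to exponential backoff here).
fn retry_delay(strategy: Backoff, delay: u64, attempt: u32, max_delay: Option<u64>) -> u64 {
    let raw = match strategy {
        Backoff::Constant => delay,                            // 10, 10, 10, ...
        Backoff::Linear => delay * attempt as u64,             // 5, 10, 15, ...
        Backoff::Exponential => delay * 2u64.pow(attempt - 1), // 2, 4, 8, ...
    };
    match (strategy, max_delay) {
        (Backoff::Exponential, Some(cap)) => raw.min(cap),
        _ => raw,
    }
}

fn main() {
    // Reproduces the exponential example: count 5, delay 2, max_delay 60
    let delays: Vec<u64> = (1..=5)
        .map(|a| retry_delay(Backoff::Exponential, 2, a, Some(60)))
        .collect();
    println!("{:?}", delays); // [2, 4, 8, 16, 32]
}
```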

## Task Transitions

Control workflow flow with transitions:

```yaml
tasks:
  - name: check
    action: core.check_status
    on_success: deploy    # Go to deploy on success
    on_failure: rollback  # Go to rollback on failure
    on_complete: notify   # Always go to notify
    on_timeout: alert     # Go to alert on timeout

  - name: decision
    action: core.evaluate
    decision:
      - when: "{{ task.decision.result.action == 'approve' }}"
        next: deploy
      - when: "{{ task.decision.result.action == 'reject' }}"
        next: rollback
      - default: true
        next: manual_review
```
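
Branch selection for a `decision` block can be sketched as a first-match scan over the rendered conditions, with the `default` branch as a fallback. The `Branch` representation is an assumption for illustration:

```rust
/// A decision branch: either a rendered boolean condition, or a default.
struct Branch {
    condition: Option<bool>, // None = default branch
    next: &'static str,
}

/// Return the first branch whose condition rendered true,
/// falling back to the default branch if none matched.
fn select_next(branches: &[Branch]) -> Option<&'static str> {
    branches
        .iter()
        .find(|b| b.condition == Some(true))
        .or_else(|| branches.iter().find(|b| b.condition.is_none()))
        .map(|b| b.next)
}

fn main() {
    // Mirrors the decision example: neither condition matched, so default wins
    let branches = [
        Branch { condition: Some(false), next: "deploy" },
        Branch { condition: Some(false), next: "rollback" },
        Branch { condition: None, next: "manual_review" },
    ];
    println!("{:?}", select_next(&branches)); // Some("manual_review")
}
```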

## Error Handling

### Task Execution Errors

Errors are captured with:
- Error message
- Error type
- Optional error details (JSON)

### Workflow Failure Handling

- Individual task failures don't immediately stop the workflow
- Dependent tasks won't execute if their prerequisites failed
- The workflow completes when all executable tasks finish
- The final status is `Failed` if any task failed

### Retry on Error

```yaml
retry:
  count: 3
  delay: 5
  backoff: exponential
  on_error: "{{ result.error_code == 'RETRY_ABLE' }}"  # Only retry specific errors
```

## Parallel Execution

Execute multiple tasks concurrently:

```yaml
tasks:
  - name: parallel_checks
    type: parallel
    tasks:
      - name: check_service_a
        action: monitoring.check_health
        input:
          service: "service-a"

      - name: check_service_b
        action: monitoring.check_health
        input:
          service: "service-b"

      - name: check_database
        action: monitoring.check_db

    on_success: deploy
    on_failure: abort
```

**Features:**
- All sub-tasks execute concurrently
- The parent task waits for all sub-tasks to complete
- Success only if all sub-tasks succeed
- Individual sub-task results are aggregated
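
The all-must-succeed aggregation can be sketched with threads standing in for the async sub-tasks. The parent collects every result either way and fails if any sub-task failed; this is an illustration, not the executor's actual implementation:

```rust
use std::thread;

/// Run sub-tasks concurrently; succeed only if every sub-task succeeds,
/// aggregating individual (name, output) results either way.
fn run_parallel(
    subtasks: Vec<(&'static str, fn() -> Result<String, String>)>,
) -> Result<Vec<(String, String)>, Vec<(String, String)>> {
    // Spawn all sub-tasks before joining any: they run concurrently
    let handles: Vec<_> = subtasks
        .into_iter()
        .map(|(name, task)| (name, thread::spawn(task)))
        .collect();

    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for (name, handle) in handles {
        match handle.join().expect("sub-task panicked") {
            Ok(out) => ok.push((name.to_string(), out)),
            Err(err) => failed.push((name.to_string(), err)),
        }
    }
    // Success only if all sub-tasks succeeded
    if failed.is_empty() { Ok(ok) } else { Err(failed) }
}

fn healthy() -> Result<String, String> { Ok("healthy".to_string()) }
fn broken() -> Result<String, String> { Err("timeout".to_string()) }

fn main() {
    println!("{:?}", run_parallel(vec![("a", healthy), ("b", healthy)]));
    println!("{:?}", run_parallel(vec![("a", healthy), ("b", broken)]));
}
```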

## Conditional Execution

Skip tasks based on conditions:

```yaml
tasks:
  - name: deploy
    action: deployment.deploy
    when: "{{ parameters.environment == 'production' }}"
    input:
      version: "{{ parameters.version }}"
```

**When Clause Evaluation:**
- The template is rendered with the current context
- The result is evaluated as a boolean (truthy/falsy)
- The task is skipped if the condition is false
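
Truthy/falsy evaluation of a rendered `when` value can be sketched as follows; the value categories are an assumption about what the renderer can produce, following the usual Jinja2-style convention:

```rust
/// Simplified rendered value a `when` template might produce.
enum Rendered {
    Bool(bool),
    Num(f64),
    Str(String),
    Null,
}

/// Jinja-style truthiness: false, 0, "", and null are falsy.
fn is_truthy(v: &Rendered) -> bool {
    match v {
        Rendered::Bool(b) => *b,
        Rendered::Num(n) => *n != 0.0,
        Rendered::Str(s) => !s.is_empty(),
        Rendered::Null => false,
    }
}

fn main() {
    // A non-empty rendered string means the task runs; 0 or "" skips it
    assert!(is_truthy(&Rendered::Str("production".to_string())));
    assert!(!is_truthy(&Rendered::Num(0.0)));
    println!("when-clause truthiness checks passed");
}
```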

## State Persistence

Workflow state is persisted to the database after every task completion:

- Current executing tasks
- Completed tasks list
- Failed tasks list
- Skipped tasks list
- Workflow variables (entire context)
- Execution status
- Pause state and reason
- Error messages

This enables:
- Workflow resume after service restart
- Pause/resume functionality
- Execution history and auditing
- Progress monitoring

## Integration Points

### Message Queue

Tasks queue action executions via RabbitMQ:

```rust
// Task executor creates execution record
let execution = create_execution_record(...).await?;

// Queues execution for worker (TODO: implement MQ publishing)
self.mq.publish_execution_request(execution.id, action_ref, &input).await?;
```

### Worker Coordination

- The executor creates execution records
- Workers pick up and execute actions
- Workers update execution status
- The coordinator monitors completion (TODO: implement completion listener)

### Event Publishing

Workflow events should be published for:
- Workflow started
- Workflow completed/failed
- Task started/completed/failed
- Workflow paused/resumed/cancelled

## Future Enhancements

### TODO Items

1. **Completion Listener**: Listen for task completion events from workers
2. **Nested Workflows**: Execute workflows as tasks within workflows
3. **MQ Publishing**: Implement actual message queue publishing for action execution
4. **Advanced Expressions**: Support comparisons and logical operators in templates
5. **Error Condition Evaluation**: Evaluate `on_error` expressions for selective retries
6. **Workflow Timeouts**: Global workflow timeout configuration
7. **Task Dependencies**: Explicit `depends_on` task specification
8. **Loop Constructs**: While/until loops in addition to with-items
9. **Manual Steps**: Human-in-the-loop approval tasks
10. **Sub-workflow Output**: Capture and use nested workflow results

## Testing

### Unit Tests

Each module includes unit tests:

```bash
# Run all executor tests
cargo test -p attune-executor

# Run specific module tests
cargo test -p attune-executor --lib workflow::graph
cargo test -p attune-executor --lib workflow::context
```

### Integration Tests

Integration tests require a database and message queue:

```bash
# Set up test database
export DATABASE_URL="postgresql://attune_test:attune_test@localhost:5432/attune_test"
sqlx migrate run

# Run integration tests
cargo test -p attune-executor --test '*'
```

## Performance Considerations

### Concurrency

- Parallel tasks execute truly concurrently using `futures::join_all`
- With-items supports configurable concurrency limits
- Task graph execution is optimized with topological sorting

### Database Operations

- Workflow state is persisted after each task completion
- Batch operations are used where possible
- Connection pooling for database access

### Memory

- Task graphs and contexts can be large for complex workflows
- Consider workflow size limits in production
- Context variables should be reasonably sized

## Troubleshooting

### Workflow Not Progressing

**Symptoms**: Workflow stuck in the Running state

**Causes**:
- Circular dependencies (should be caught during parsing)
- All tasks waiting on failed dependencies
- Database connection issues

**Solution**: Check the workflow state in the database and review task dependencies

### Tasks Not Executing

**Symptoms**: Ready tasks not starting

**Causes**:
- Worker service not running
- Message queue not connected
- Execution records not being created

**Solution**: Check worker logs, verify the MQ connection, and check the database

### Template Rendering Errors

**Symptoms**: Tasks fail with template errors

**Causes**:
- Invalid variable references
- Missing context data
- Malformed expressions

**Solution**: Validate templates and check the available context variables

## Examples

See `docs/workflows/` for complete workflow examples demonstrating:
- Sequential workflows
- Parallel execution
- With-items iteration
- Conditional execution
- Error handling and retries
- Complex workflows with decisions

## Related Documentation

- [Workflow Definition Format](workflow-definition-format.md)
- [Pack Integration](api-pack-workflows.md)
- [Execution API](api-executions.md)
- [Message Queue Architecture](message-queue.md)