re-uploading work

2026-02-04 17:46:30 -06:00
commit 3b14c65998
1388 changed files with 381262 additions and 0 deletions


@@ -0,0 +1,265 @@
# Dynamic Parameter Forms
## Overview
The web UI now supports dynamic form generation for rule creation based on parameter schemas defined in triggers and actions. This replaces the previous raw JSON textarea inputs with type-aware form fields.
## Features
- **Type-aware inputs**: Automatically renders appropriate form controls based on parameter type
- **Validation**: Real-time validation with error messages for required fields and type constraints
- **Default values**: Auto-populates form fields with default values from schema
- **Enum support**: Renders dropdown selects for enum-type parameters
- **Complex types**: JSON editing support for arrays and objects with live parsing
## Component Architecture
### ParamSchemaForm Component
Location: `web/src/components/common/ParamSchemaForm.tsx`
A reusable React component that dynamically generates form inputs based on a parameter schema.
**Props:**
- `schema`: Parameter schema object (flat key-value structure)
- `values`: Current parameter values
- `onChange`: Callback when values change
- `errors`: Validation errors to display
- `disabled`: Whether form is disabled
- `className`: Additional CSS classes
**Supported Types:**
- `string` - Text input (or select dropdown if `enum` is provided)
- `number` - Numeric input with decimal support
- `integer` - Numeric input (whole numbers only)
- `boolean` - Checkbox with label
- `array` - JSON textarea with syntax validation
- `object` - JSON textarea with syntax validation
### Parameter Schema Format
The component expects a flat schema structure:
```typescript
{
  [parameterName]: {
    type?: "string" | "number" | "integer" | "boolean" | "array" | "object";
    description?: string;
    required?: boolean;
    default?: any;
    enum?: string[];
  }
}
```
**Example:**
```json
{
  "expression": {
    "type": "string",
    "description": "Cron expression in standard format",
    "required": true
  },
  "timezone": {
    "type": "string",
    "description": "Timezone for cron schedule",
    "default": "UTC"
  },
  "interval": {
    "type": "integer",
    "description": "Number of time units between each trigger",
    "default": 60,
    "required": true
  },
  "unit": {
    "type": "string",
    "enum": ["seconds", "minutes", "hours"],
    "description": "Time unit for the interval",
    "default": "seconds",
    "required": true
  }
}
```
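To illustrate how defaults feed the form, here is a Python sketch (the hypothetical helper `initial_values` is illustrative; the real component is TypeScript) that derives initial form values from a flat schema like the one above. Booleans falling back to `false` is an assumption so the checkbox state is always defined:

```python
def initial_values(schema):
    """Build initial form values from a flat parameter schema.

    Parameters with a `default` are pre-populated; booleans fall back
    to False (assumption) so the checkbox state is always defined.
    """
    values = {}
    for name, spec in schema.items():
        if "default" in spec:
            values[name] = spec["default"]
        elif spec.get("type") == "boolean":
            values[name] = False
    return values

schema = {
    "timezone": {"type": "string", "default": "UTC"},
    "interval": {"type": "integer", "default": 60, "required": True},
    "uppercase": {"type": "boolean"},
    "expression": {"type": "string", "required": True},
}
defaults = initial_values(schema)
```

Fields without a default (like `expression` here) are left empty for the user to fill in.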
## Usage in Rule Creation
When creating a rule through the web UI:
1. **Select Pack**: Choose which pack to use
2. **Select Trigger**: Dropdown shows available triggers from the pack
3. **Configure Trigger Params**: Dynamic form appears based on trigger's `param_schema`
4. **Select Action**: Dropdown shows available actions from the pack
5. **Configure Action Params**: Dynamic form appears based on action's `param_schema`
6. **Set Conditions**: JSON-based conditional logic (optional)
7. **Submit**: Form validates all required fields before submission
### Data Flow
```
1. User selects trigger (from summary list)
2. System fetches full trigger details (GET /api/v1/triggers/{ref})
3. Extract param_schema from TriggerResponse
4. ParamSchemaForm renders inputs based on schema
5. User fills in parameters
6. Validation runs on submission
7. Parameters sent as trigger_params in CreateRuleRequest
```
Same flow applies for action parameters.
## API Design Pattern
The implementation follows the **"summary for lists, details on demand"** pattern:
- **List endpoints** (`/api/v1/packs/{pack_ref}/triggers`): Return `TriggerSummary` without `param_schema`
- **Detail endpoints** (`/api/v1/triggers/{ref}`): Return `TriggerResponse` with full `param_schema`
This keeps list responses lightweight while providing full schema information when needed.
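A minimal Python sketch of the projection (the `to_summary` helper and the exact field set are illustrative, not the actual API code):

```python
# Heavy schema fields omitted from list responses (assumed set)
HEAVY_FIELDS = {"param_schema", "out_schema"}

def to_summary(record):
    """Project a full trigger/action record to its list representation."""
    return {k: v for k, v in record.items() if k not in HEAVY_FIELDS}

trigger = {
    "ref": "core.intervaltimer",
    "name": "intervaltimer",
    "description": "Fires at regular intervals",
    "param_schema": {"interval": {"type": "integer", "default": 60}},
}
summary = to_summary(trigger)
```

The detail endpoint returns the full record; only list endpoints apply the projection.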
## Pack Definition Format
### Trigger YAML Structure
```yaml
name: intervaltimer
ref: core.intervaltimer
description: "Fires at regular intervals"
enabled: true
type: interval
# Parameter schema - flat structure
parameters:
  unit:
    type: string
    enum:
      - seconds
      - minutes
      - hours
    description: "Time unit for the interval"
    default: "seconds"
    required: true
  interval:
    type: integer
    description: "Number of time units between each trigger"
    default: 60
    required: true
# Output schema (payload emitted when trigger fires)
output:
  type: object
  properties:
    type:
      type: string
    interval_seconds:
      type: integer
    fired_at:
      type: string
      format: date-time
```
### Action YAML Structure
Actions use the same flat parameter schema format:
```yaml
name: echo
ref: core.echo
description: "Echoes a message"
runtime: shell
# Parameter schema - flat structure
parameters:
  message:
    type: string
    description: "Message to echo"
    required: true
  uppercase:
    type: boolean
    description: "Convert message to uppercase"
    default: false
```
## Pack Loader Mapping
The Python pack loader (`scripts/load_core_pack.py`) maps YAML keys to database columns:
| YAML Key | Database Column |
|--------------|-----------------|
| `parameters` | `param_schema` |
| `output` | `out_schema` |
The loader serializes each YAML structure as JSON and stores it in the corresponding JSONB column (`param_schema` or `out_schema`).
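The mapping can be sketched as follows (illustrative Python only; the real loader is `scripts/load_core_pack.py` and may differ in detail):

```python
import json

# YAML key -> database column, per the table above
KEY_TO_COLUMN = {"parameters": "param_schema", "output": "out_schema"}

def map_definition(definition):
    """Map a parsed YAML trigger/action definition to DB column values.

    Schema structures are serialized to JSON strings destined for
    JSONB columns; keys absent from the YAML are simply skipped.
    """
    row = {}
    for yaml_key, column in KEY_TO_COLUMN.items():
        if yaml_key in definition:
            row[column] = json.dumps(definition[yaml_key])
    return row

row = map_definition({
    "name": "echo",
    "parameters": {"message": {"type": "string", "required": True}},
})
```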
## Validation
The `validateParamSchema` utility function validates parameter values against the schema:
```typescript
import { validateParamSchema } from '@/components/common/ParamSchemaForm';
const errors = validateParamSchema(schema, values);
// Returns: { [fieldName]: "error message" }
```
**Validation Rules:**
- Required fields must have non-empty values
- Numbers must be valid numeric values
- Arrays must be valid JSON arrays
- Objects must be valid JSON objects
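The rules above can be mirrored in a Python sketch (the real `validateParamSchema` is TypeScript; `validate_params` here is a hypothetical equivalent, and the exact error strings are assumptions):

```python
import json

def validate_params(schema, values):
    """Return {field: error message} for values that violate the schema."""
    errors = {}
    for name, spec in schema.items():
        value = values.get(name)
        # Required fields must have non-empty values
        if spec.get("required") and value in (None, ""):
            errors[name] = "This field is required"
            continue
        if value in (None, ""):
            continue  # optional and empty: nothing to check
        t = spec.get("type")
        if t in ("number", "integer"):
            try:
                num = float(value)
                if t == "integer" and num != int(num):
                    errors[name] = "Must be a whole number"
            except (TypeError, ValueError):
                errors[name] = "Must be a valid number"
        elif t in ("array", "object"):
            # Arrays/objects arrive as JSON text from the textarea
            try:
                parsed = json.loads(value) if isinstance(value, str) else value
            except ValueError:
                errors[name] = "Must be valid JSON"
                continue
            expected = list if t == "array" else dict
            if not isinstance(parsed, expected):
                errors[name] = f"Must be a JSON {t}"
    return errors
```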
## Future Enhancements
Potential improvements for the parameter form system:
1. **Advanced validation**: Support for min/max, pattern matching, custom validators
2. **Conditional fields**: Show/hide fields based on other field values
3. **Field hints**: Helper text, examples, tooltips
4. **Template variables**: Autocomplete for Jinja2 template syntax (e.g., `{{ trigger.payload.* }}`)
5. **Schema versioning**: Handle schema changes across pack versions
6. **Array item editing**: Better UX for editing array items individually
7. **Nested objects**: Support for deeply nested object schemas
8. **File uploads**: Support for file-type parameters
9. **Date pickers**: Native date/time inputs for datetime parameters
## Troubleshooting
### Parameters not showing in UI
**Check:**
1. Is the trigger/action selected?
2. Does the database record have `param_schema` populated?
```sql
SELECT ref, param_schema FROM trigger WHERE ref = 'core.intervaltimer';
```
3. Is the API returning the full record (not summary)?
4. Check browser console for JavaScript errors
### Schema not loading from pack YAML
**Check:**
1. YAML uses `parameters` key (not `parameters_schema`)
2. Schema is in flat format (not nested JSON Schema with `properties`)
3. Pack was reloaded after YAML changes: `./scripts/load-core-pack.sh`
4. Database has correct schema: `SELECT param_schema FROM trigger WHERE ref = 'pack.trigger';`
### Validation errors
**Check:**
1. Required fields are marked with `required: true` in schema
2. Type matches expected format (e.g., integer vs string)
3. Enum values match exactly (case-sensitive)
## Related Files
- `web/src/components/common/ParamSchemaForm.tsx` - Core form component
- `web/src/components/forms/RuleForm.tsx` - Rule creation form using ParamSchemaForm
- `web/src/pages/actions/ActionsPage.tsx` - Execute action modal using ParamSchemaForm
- `scripts/load_core_pack.py` - Pack loader that converts YAML to database schema
- `packs/core/triggers/*.yaml` - Example trigger definitions
- `packs/core/actions/*.yaml` - Example action definitions


@@ -0,0 +1,469 @@
# Execution Hierarchy and Parent Relationships
## Overview
The `execution` table supports two types of parent-child relationships:
1. **General execution hierarchies** (via `parent` field)
2. **Workflow task executions** (via `workflow_task` metadata)
This document explains why both are needed, how they differ, and when to use each.
## Field Purposes
### `execution.parent` (General Hierarchy)
**Type**: `Option<Id>` - Foreign key to `execution.id`
**Purpose**: Generic execution tree traversal for ANY type of parent-child relationship.
**Used for**:
- **Workflow tasks**: Parent is the workflow's main execution record
- **Child actions**: Parent is the action that spawned them
- **Nested workflows**: Parent is the outer workflow's execution
- **Any future parent-child patterns**
**Example SQL**:
```sql
-- Find all child executions (any type)
SELECT * FROM attune.execution WHERE parent = 100;
```
### `execution.workflow_task.workflow_execution` (Workflow-Specific)
**Type**: `Id` within `WorkflowTaskMetadata` JSONB - References `workflow_execution.id`
**Purpose**: Direct link to workflow orchestration state.
**Provides access to**:
- Task graph structure
- Workflow variables
- Current/completed/failed task lists
- Workflow-specific metadata
**Example SQL**:
```sql
-- Find all tasks in a specific workflow
SELECT * FROM attune.execution
WHERE workflow_task->>'workflow_execution' = '50';
```
## Workflow Task Execution Structure
When a workflow executes, three types of records are created:
```
┌─────────────────────────────────────────────────────────────┐
│ 1. Parent Execution (the workflow itself as an execution)   │
├─────────────────────────────────────────────────────────────┤
│ id: 100                                                     │
│ action_ref: "my_pack.my_workflow"                           │
│ parent: None (or outer workflow ID if nested)               │
│ workflow_task: None                                         │
│ status: running                                             │
└─────────────────────────────────────────────────────────────┘
                              │ referenced by (execution field)
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. Workflow Execution Record (orchestration state)          │
├─────────────────────────────────────────────────────────────┤
│ id: 50                                                      │
│ execution: 100   ← points to parent execution               │
│ workflow_def: 10                                            │
│ task_graph: {...}                                           │
│ variables: {...}                                            │
│ current_tasks: ["send_email", "process_data"]               │
│ completed_tasks: []                                         │
│ failed_tasks: []                                            │
└─────────────────────────────────────────────────────────────┘
                              │ referenced by (workflow_execution)
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Task Execution (one per workflow task)                   │
├─────────────────────────────────────────────────────────────┤
│ id: 101                                                     │
│ action_ref: "my_pack.send_email"                            │
│ parent: 100   ← points to parent execution (the workflow)   │
│ workflow_task: {                                            │
│   workflow_execution: 50  ← points to workflow_execution    │
│   task_name: "send_email",                                  │
│   task_index: null,                                         │
│   retry_count: 0,                                           │
│   max_retries: 3,                                           │
│   ...                                                       │
│ }                                                           │
│ status: running                                             │
└─────────────────────────────────────────────────────────────┘
```
## Relationship Diagram
```
┌─────────────────────┐
│ Task Execution      │
│ (id: 101)           │
│                     │
│ parent: 100         │──────┐
│                     │      │
│ workflow_task: {    │      │
│   workflow_exec: 50 │──┐   │
│ }                   │  │   │
└─────────────────────┘  │   │
                         │   │
                         │   ▼
                         │  ┌─────────────────────┐
                         │  │ Parent Execution    │
                         │  │ (id: 100)           │
                         │  │ [The Workflow]      │
                         │  └─────────────────────┘
                         │             ▲
                         │             │
                         │             │ execution: 100
                         │             │
                         │  ┌─────────────────────┐
                         └─▶│ Workflow Execution  │
                            │ (id: 50)            │
                            │ [Orchestration]     │
                            └─────────────────────┘
```
**Key**: Both `parent` and `workflow_task.workflow_execution` ultimately reference the same workflow, but serve different query patterns.
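Both lookup paths can be demonstrated with plain dictionaries standing in for the database rows (ids match the diagrams above; this is a sketch, not the real data access layer):

```python
# In-memory stand-ins for the records in the diagrams above
executions = {
    100: {"id": 100, "action_ref": "my_pack.my_workflow", "parent": None,
          "workflow_task": None},
    101: {"id": 101, "action_ref": "my_pack.send_email", "parent": 100,
          "workflow_task": {"workflow_execution": 50, "task_name": "send_email"}},
}
workflow_executions = {
    50: {"id": 50, "execution": 100},
}

task = executions[101]

# Path 1: generic hierarchy via `parent`
parent_exec = executions[task["parent"]]

# Path 2: workflow-specific link via workflow_task metadata
workflow_exec = workflow_executions[task["workflow_task"]["workflow_execution"]]

# Both paths land on the same workflow's parent execution (id 100)
assert parent_exec["id"] == workflow_exec["execution"] == 100
```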
## Why Both Fields Are Needed
### ✅ Reason 1: `parent` is Generic
The `parent` field is used for **all types** of execution hierarchies, not just workflows:
**Example 1: Action spawning child actions**
```rust
// Parent action execution
let parent_exec = create_execution("my_pack.parent_action").await?;
// Child action executions (NOT workflow tasks)
let child1 = CreateExecutionInput {
    action_ref: "my_pack.child_action_1".to_string(),
    parent: Some(parent_exec.id),
    workflow_task: None, // Not a workflow task!
    ...
};
```
**Example 2: Nested workflows**
```rust
// Outer workflow execution
let outer_workflow = create_workflow("outer_workflow").await?;
// Inner workflow execution (nested)
let inner_workflow = CreateExecutionInput {
    action_ref: "inner_workflow".to_string(),
    parent: Some(outer_workflow.id),
    workflow_task: None, // This is a workflow, not a task
    ...
};
```
### ✅ Reason 2: Workflow-Specific State is Separate
The `workflow_execution` table contains orchestration state that doesn't belong in the main `execution` record:
- **Task graph**: Directed acyclic graph of task dependencies
- **Workflow variables**: Scoped variable context
- **Task tracking**: current_tasks, completed_tasks, failed_tasks arrays
- **Workflow metadata**: pause_reason, error_message, etc.
Direct access via `workflow_task.workflow_execution` avoids JOINs.
### ✅ Reason 3: Query Efficiency
**Without direct `workflow_execution` reference**, finding workflow state requires:
```sql
-- BAD: Two JOINs required
SELECT we.*
FROM attune.execution task
JOIN attune.execution parent ON task.parent = parent.id
JOIN attune.workflow_execution we ON we.execution = parent.id
WHERE task.id = 101;
```
**With direct reference**:
```sql
-- GOOD: Single lookup via JSONB
SELECT we.*
FROM attune.workflow_execution we
WHERE we.id = (
    SELECT (workflow_task->>'workflow_execution')::bigint
    FROM attune.execution
    WHERE id = 101
);
```
### ✅ Reason 4: Clear Semantics
- `parent` = "What execution spawned me?"
- `workflow_task.workflow_execution` = "What workflow orchestration state do I belong to?"
These are related but semantically different questions.
## Use Cases and Query Patterns
### Use Case 1: Generic Execution Tree Traversal
```rust
// Get ALL child executions (workflow tasks, child actions, anything)
async fn get_children(pool: &PgPool, parent_id: Id) -> Result<Vec<Execution>> {
    sqlx::query_as::<_, Execution>(
        "SELECT * FROM attune.execution WHERE parent = $1"
    )
    .bind(parent_id)
    .fetch_all(pool)
    .await
    .map_err(Into::into)
}
// Works for workflows, actions, any execution type
let all_children = get_children(&pool, parent_exec_id).await?;
```
### Use Case 2: Workflow Task Queries
```rust
// Get all tasks for a workflow execution
let tasks = ExecutionRepository::find_by_workflow_execution(
    &pool,
    workflow_execution_id
).await?;
// Implementation uses direct JSONB query:
// WHERE workflow_task->>'workflow_execution' = $1
```
### Use Case 3: Workflow State Access
```rust
// From a task execution, get the workflow state
async fn get_workflow_state(
    pool: &PgPool,
    task_exec: &Execution
) -> Result<Option<WorkflowExecution>> {
    if let Some(wt) = &task_exec.workflow_task {
        let workflow_exec = WorkflowExecutionRepository::find_by_id(
            pool,
            wt.workflow_execution
        ).await?;
        Ok(Some(workflow_exec))
    } else {
        Ok(None)
    }
}
// Without direct link, would need to:
// 1. Get parent execution via task_exec.parent
// 2. Find workflow_execution WHERE execution = parent
```
### Use Case 4: Hierarchical Display
```rust
// Display execution tree with proper indentation
async fn display_execution_tree(pool: &PgPool, root_id: Id, indent: usize) {
let exec = ExecutionRepository::find_by_id(pool, root_id).await.unwrap();
println!("{:indent$}├─ {} ({})", "", exec.action_ref, exec.status, indent = indent);
// Get children using generic parent relationship
let children = sqlx::query_as::<_, Execution>(
"SELECT * FROM attune.execution WHERE parent = $1"
)
.bind(root_id)
.fetch_all(pool)
.await
.unwrap();
for child in children {
display_execution_tree(pool, child.id, indent + 2).await;
}
}
```
## The Redundancy Trade-off
### For Workflow Tasks: Yes, There's Redundancy
```
task.parent
  → parent_execution (id: 100)
  ← workflow_execution.execution

task.workflow_task.workflow_execution
  → workflow_execution (id: 50)
  → parent_execution (id: 100)
```
Both ultimately point to the same workflow, just through different paths.
### Why This Is Acceptable
1. **Performance**: Direct link avoids JOINs (PostgreSQL JSONB is fast)
2. **Clarity**: Explicit workflow relationship vs generic parent relationship
3. **Flexibility**: `parent` can be used for non-workflow patterns
4. **Consistency**: All executions use `parent` the same way
### Alternatives Considered
#### ❌ Alternative 1: Remove `workflow_execution` from metadata
**Problem**: Forces 2-JOIN queries to access workflow state
```sql
-- Every workflow task query becomes complex
SELECT we.*
FROM attune.execution task
JOIN attune.execution parent ON task.parent = parent.id
JOIN attune.workflow_execution we ON we.execution = parent.id
WHERE task.workflow_task IS NOT NULL;
```
#### ❌ Alternative 2: Remove `parent` for workflow tasks
**Problem**: Breaks generic execution tree queries
```sql
-- Would need complex COALESCE logic
SELECT * FROM attune.execution
WHERE parent = $1
   OR (workflow_task IS NOT NULL
       AND (workflow_task->>'parent_execution')::bigint = $1);
```
#### ✅ Current Approach: Keep Both
Small redundancy in exchange for:
- Simple generic queries via `parent`
- Efficient workflow queries via `workflow_task.workflow_execution`
- Clear separation of concerns
## Validation and Best Practices
### Validation Logic (Optional)
For data integrity, you could validate consistency:
```rust
async fn validate_workflow_task_consistency(
    pool: &PgPool,
    task_exec: &Execution
) -> Result<()> {
    if let Some(wt) = &task_exec.workflow_task {
        // Get workflow_execution record
        let workflow_exec = WorkflowExecutionRepository::find_by_id(
            pool,
            wt.workflow_execution
        ).await?;

        // Ensure parent matches workflow_execution.execution
        if task_exec.parent != Some(workflow_exec.execution) {
            return Err(Error::validation(format!(
                "Inconsistent parent: task.parent={:?}, workflow_exec.execution={}",
                task_exec.parent, workflow_exec.execution
            )));
        }
    }
    Ok(())
}
```
### Helper Methods (Recommended)
Add convenience methods to the `Execution` model:
```rust
impl Execution {
    /// Check if this execution is a workflow task
    pub fn is_workflow_task(&self) -> bool {
        self.workflow_task.is_some()
    }

    /// Get the workflow_execution record if this is a workflow task
    pub async fn get_workflow_execution(
        &self,
        pool: &PgPool
    ) -> Result<Option<WorkflowExecution>> {
        if let Some(wt) = &self.workflow_task {
            let we = WorkflowExecutionRepository::find_by_id(pool, wt.workflow_execution).await?;
            Ok(Some(we))
        } else {
            Ok(None)
        }
    }

    /// Get the parent execution
    pub async fn get_parent(&self, pool: &PgPool) -> Result<Option<Execution>> {
        if let Some(parent_id) = self.parent {
            let parent = ExecutionRepository::find_by_id(pool, parent_id).await?;
            Ok(Some(parent))
        } else {
            Ok(None)
        }
    }

    /// Get all child executions (generic, works for any execution type)
    pub async fn get_children(&self, pool: &PgPool) -> Result<Vec<Execution>> {
        sqlx::query_as::<_, Execution>(
            "SELECT * FROM attune.execution WHERE parent = $1 ORDER BY created"
        )
        .bind(self.id)
        .fetch_all(pool)
        .await
        .map_err(Into::into)
    }
}
```
## Summary
### Key Takeaways
1. **`parent`** is a generic field for ALL execution hierarchies (workflows, child actions, nested workflows)
2. **`workflow_task.workflow_execution`** is a workflow-specific optimization for direct access to orchestration state
3. **Both are needed** because:
- `parent` must remain generic for non-workflow use cases
- Direct workflow_execution link avoids expensive JOINs
- Different query patterns benefit from each approach
4. **The redundancy is acceptable** because:
- It's limited to workflow tasks only (not all executions)
- Performance gain from avoiding JOINs
- Clearer semantics for different use cases
### When to Use Which
| Scenario | Use `parent` | Use `workflow_task.workflow_execution` |
|----------|--------------|----------------------------------------|
| Get child executions (any type) | ✅ | ❌ |
| Build execution tree | ✅ | ❌ |
| Find all workflow tasks | ❌ | ✅ |
| Access workflow state | ❌ | ✅ |
| Non-workflow parent-child | ✅ | N/A |
### Design Principle
**Separation of concerns**:
- `parent`: Structural relationship (execution hierarchy)
- `workflow_task.workflow_execution`: Semantic relationship (workflow orchestration)
This follows the principle that a workflow task has TWO relationships:
1. As a child in the execution tree (`parent`)
2. As a task in a workflow (`workflow_task.workflow_execution`)
Both are valid, serve different purposes, and should coexist.
## References
- **Migration**: `migrations/20260127212500_consolidate_workflow_task_execution.sql`
- **Models**: `crates/common/src/models.rs` (Execution, WorkflowTaskMetadata)
- **Repositories**: `crates/common/src/repositories/execution.rs`
- **Workflow Coordinator**: `crates/executor/src/workflow/coordinator.rs`


@@ -0,0 +1,702 @@
# Inquiry Handling - Human-in-the-Loop Workflows
## Overview
Inquiry handling enables **human-in-the-loop workflows** in Attune, allowing action executions to pause and wait for human input, approval, or decisions before continuing. This is essential for workflows that require manual intervention, approval gates, or interactive decision-making.
## Architecture
### Components
1. **Action** - Returns a result containing an inquiry request
2. **Worker** - Executes action and returns result with `__inquiry` marker
3. **Executor (Completion Listener)** - Detects inquiry request and creates inquiry
4. **Inquiry Record** - Database record tracking the inquiry state
5. **API** - Endpoints for users to view and respond to inquiries
6. **Executor (Inquiry Handler)** - Listens for responses and resumes executions
7. **Notifier** - Sends real-time notifications about inquiry events
### Message Flow
```
Action Execution → Worker completes → ExecutionCompleted message →
Completion Listener detects __inquiry → Creates Inquiry record →
Publishes InquiryCreated message → Notifier alerts users →
User responds via API → API publishes InquiryResponded message →
Inquiry Handler receives message → Updates execution with response →
Execution continues/completes
```
## Inquiry Request Format
### Action Result with Inquiry
Actions can request human input by including an `__inquiry` key in their result:
```json
{
"__inquiry": {
"prompt": "Approve deployment to production?",
"response_schema": {
"type": "object",
"properties": {
"approved": {"type": "boolean"},
"comments": {"type": "string"}
},
"required": ["approved"]
},
"assigned_to": 123,
"timeout_seconds": 3600
},
"deployment_plan": {
"target": "production",
"version": "v2.5.0"
}
}
```
### Inquiry Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | Yes | Question/prompt text displayed to user |
| `response_schema` | JSON Schema | No | Schema defining expected response format |
| `assigned_to` | integer | No | Identity ID of user assigned to respond |
| `timeout_seconds` | integer | No | Seconds from creation until inquiry times out |
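The field rules above can be checked before an inquiry is stored. A minimal Python sketch (the `validate_inquiry_request` helper is hypothetical, not part of the codebase):

```python
def validate_inquiry_request(inquiry):
    """Validate an `__inquiry` payload against the documented field rules."""
    # prompt: required, non-empty string
    if not isinstance(inquiry.get("prompt"), str) or not inquiry["prompt"]:
        raise ValueError("prompt is required and must be a non-empty string")
    # assigned_to: optional identity ID
    if "assigned_to" in inquiry and not isinstance(inquiry["assigned_to"], int):
        raise ValueError("assigned_to must be an identity ID (integer)")
    # timeout_seconds: optional positive integer
    if "timeout_seconds" in inquiry:
        if not isinstance(inquiry["timeout_seconds"], int) or inquiry["timeout_seconds"] <= 0:
            raise ValueError("timeout_seconds must be a positive integer")
    return inquiry

ok = validate_inquiry_request({"prompt": "Approve deployment?", "timeout_seconds": 3600})
```

`response_schema` is not checked here; as a JSON Schema document it would need a real schema validator.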
## Creating Inquiries
### From Python Actions
```python
def run(deployment_plan):
    # Validate deployment plan
    validate_plan(deployment_plan)

    # Request human approval
    return {
        "__inquiry": {
            "prompt": f"Approve deployment of {deployment_plan['version']} to production?",
            "response_schema": {
                "type": "object",
                "properties": {
                    "approved": {"type": "boolean"},
                    "reason": {"type": "string"}
                },
                "required": ["approved"]
            },
            "timeout_seconds": 7200  # 2 hours
        },
        "plan": deployment_plan
    }
```
### From JavaScript Actions
```javascript
async function run(config) {
  // Prepare deployment
  const plan = await prepareDeploy(config);

  // Request approval with assigned user
  return {
    __inquiry: {
      prompt: `Deploy ${plan.serviceName} to ${plan.environment}?`,
      response_schema: {
        type: "object",
        properties: {
          approved: { type: "boolean" },
          comments: { type: "string" }
        }
      },
      assigned_to: config.approver_id,
      timeout_seconds: 3600
    },
    deployment: plan
  };
}
```
## Inquiry Lifecycle
### Status Flow
```
pending → responded (user provides response)
pending → timeout (timeout_at expires)
pending → cancelled (manual cancellation)
```
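The transitions above are one-way out of `pending`; a minimal sketch of the state machine (illustrative only):

```python
# Legal transitions out of each status (terminal statuses allow none)
TRANSITIONS = {
    "pending": {"responded", "timeout", "cancelled"},
    "responded": set(),
    "timeout": set(),
    "cancelled": set(),
}

def transition(current, target):
    """Return the new status, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Once an inquiry leaves `pending`, no further transitions are possible.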
### Database Schema
```sql
CREATE TABLE attune.inquiry (
    id BIGSERIAL PRIMARY KEY,
    execution BIGINT NOT NULL REFERENCES attune.execution(id),
    prompt TEXT NOT NULL,
    response_schema JSONB,
    assigned_to BIGINT REFERENCES attune.identity(id),
    status attune.inquiry_status_enum NOT NULL DEFAULT 'pending',
    response JSONB,
    timeout_at TIMESTAMPTZ,
    responded_at TIMESTAMPTZ,
    created TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
## API Endpoints
### List Inquiries
**GET** `/api/v1/inquiries`
Query parameters:
- `status` - Filter by status (pending, responded, timeout, cancelled)
- `execution` - Filter by execution ID
- `assigned_to` - Filter by assigned user ID
- `page`, `per_page` - Pagination
Response:
```json
{
  "data": [
    {
      "id": 1,
      "execution": 123,
      "prompt": "Approve deployment?",
      "assigned_to": 5,
      "status": "pending",
      "has_response": false,
      "timeout_at": "2024-01-15T12:00:00Z",
      "created": "2024-01-15T10:00:00Z"
    }
  ],
  "total": 1,
  "page": 1,
  "per_page": 50,
  "pages": 1
}
```
### Get Inquiry Details
**GET** `/api/v1/inquiries/:id`
Response:
```json
{
  "data": {
    "id": 1,
    "execution": 123,
    "prompt": "Approve deployment to production?",
    "response_schema": {
      "type": "object",
      "properties": {
        "approved": {"type": "boolean"}
      }
    },
    "assigned_to": 5,
    "status": "pending",
    "response": null,
    "timeout_at": "2024-01-15T12:00:00Z",
    "responded_at": null,
    "created": "2024-01-15T10:00:00Z",
    "updated": "2024-01-15T10:00:00Z"
  }
}
```
### Respond to Inquiry
**POST** `/api/v1/inquiries/:id/respond`
Request body:
```json
{
  "response": {
    "approved": true,
    "comments": "LGTM - all tests passed"
  }
}
```
Response:
```json
{
  "data": {
    "id": 1,
    "execution": 123,
    "status": "responded",
    "response": {
      "approved": true,
      "comments": "LGTM - all tests passed"
    },
    "responded_at": "2024-01-15T10:30:00Z"
  },
  "message": "Response submitted successfully"
}
```
### Cancel Inquiry
**POST** `/api/v1/inquiries/:id/cancel`
Cancels a pending inquiry (admin/system use).
## Message Queue Events
### InquiryCreated
Published when an inquiry is created.
Routing key: `inquiry.created`
Payload:
```json
{
  "inquiry_id": 1,
  "execution_id": 123,
  "prompt": "Approve deployment?",
  "response_schema": {...},
  "assigned_to": 5,
  "timeout_at": "2024-01-15T12:00:00Z"
}
```
### InquiryResponded
Published when a user responds to an inquiry.
Routing key: `inquiry.responded`
Payload:
```json
{
  "inquiry_id": 1,
  "execution_id": 123,
  "response": {
    "approved": true
  },
  "responded_by": 5,
  "responded_at": "2024-01-15T10:30:00Z"
}
```
## Executor Service Integration
### Completion Listener
The completion listener detects inquiry requests in execution results:
```rust
// Check if execution result contains an inquiry request
if let Some(result) = &exec.result {
    if InquiryHandler::has_inquiry_request(result) {
        // Create inquiry and publish InquiryCreated message
        InquiryHandler::create_inquiry_from_result(
            pool,
            publisher,
            execution_id,
            result,
        ).await?;
    }
}
```
### Inquiry Handler
The inquiry handler processes inquiry responses:
```rust
// Listen for InquiryResponded messages
consumer.consume_with_handler(|envelope: MessageEnvelope<InquiryRespondedPayload>| {
    async move {
        // Update execution with inquiry response
        Self::resume_execution_with_response(
            pool,
            publisher,
            execution,
            inquiry,
            response,
        ).await?;
    }
}).await?;
```
### Timeout Checker
A background task periodically checks for expired inquiries:
```rust
// Run every 60 seconds
InquiryHandler::timeout_check_loop(pool, 60).await;
```
This updates pending inquiries to `timeout` status when `timeout_at` is exceeded.
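The expiry predicate behind the checker can be sketched as a pure function (illustrative only; the real checker runs the SQL update shown under Timeout Handling):

```python
from datetime import datetime, timedelta, timezone

def is_expired(inquiry, now):
    """A pending inquiry expires once `timeout_at` is set and in the past."""
    return (
        inquiry["status"] == "pending"
        and inquiry["timeout_at"] is not None
        and inquiry["timeout_at"] < now
    )

now = datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc)
pending = {"status": "pending", "timeout_at": now - timedelta(minutes=5)}
no_timeout = {"status": "pending", "timeout_at": None}
```

Inquiries without a `timeout_at` never expire, regardless of how long they stay pending.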
## Access Control
### Assignment Enforcement
If an inquiry has `assigned_to` set, only that user can respond:
```rust
if let Some(assigned_to) = inquiry.assigned_to {
    if assigned_to != user_id {
        return Err(ApiError::Forbidden("Not authorized to respond"));
    }
}
```
### RBAC Integration (Future)
Future versions will integrate with RBAC for:
- Permission to respond to inquiries
- Permission to cancel inquiries
- Visibility filtering based on roles
## Timeout Handling
### Automatic Timeout
Inquiries with `timeout_at` set are automatically marked as timed out:
```sql
UPDATE attune.inquiry
SET status = 'timeout', updated = NOW()
WHERE status = 'pending'
  AND timeout_at IS NOT NULL
  AND timeout_at < NOW();
```
### Timeout Behavior
When an inquiry times out:
1. Status changes to `timeout`
2. Execution remains in current state
3. Optional: Publish timeout event
4. Optional: Resume execution with timeout indicator
## Real-Time Notifications
### WebSocket Integration
The Notifier service sends real-time notifications for inquiry events:
```javascript
// Subscribe to inquiry notifications
ws.send(JSON.stringify({
  type: "subscribe",
  filters: {
    entity_type: "inquiry",
    user_id: 5
  }
}));

// Receive notification
{
  "id": 123,
  "entity_type": "inquiry",
  "entity": "1",
  "activity": "created",
  "content": {
    "prompt": "Approve deployment?",
    "assigned_to": 5
  }
}
```
### Notification Triggers
- **inquiry.created** - New inquiry created
- **inquiry.responded** - Inquiry received response
- **inquiry.timeout** - Inquiry timed out
- **inquiry.cancelled** - Inquiry was cancelled
## Use Cases
### Deployment Approval
```python
def deploy_to_production(config):
    # Prepare deployment
    plan = prepare_deployment(config)

    # Request approval
    return {
        "__inquiry": {
            "prompt": f"Approve deployment of {config['service']} v{config['version']}?",
            "response_schema": {
                "type": "object",
                "properties": {
                    "approved": {"type": "boolean"},
                    "rollback_plan": {"type": "string"}
                }
            },
            "assigned_to": get_on_call_engineer(),
            "timeout_seconds": 1800  # 30 minutes
        },
        "deployment_plan": plan
    }
```
### Data Validation
```python
def validate_data_import(data):
    # Check for anomalies
    anomalies = detect_anomalies(data)
    if anomalies:
        return {
            "__inquiry": {
                "prompt": f"Found {len(anomalies)} anomalies. Continue import?",
                "response_schema": {
                    "type": "object",
                    "properties": {
                        "continue": {"type": "boolean"},
                        "exclude_records": {"type": "array", "items": {"type": "integer"}}
                    }
                },
                "timeout_seconds": 3600
            },
            "anomalies": anomalies
        }

    # No anomalies, proceed normally
    return import_data(data)
```
### Configuration Review
```python
def update_firewall_rules(rules):
    # Analyze impact
    impact = analyze_impact(rules)
    if impact["severity"] == "high":
        return {
            "__inquiry": {
                "prompt": "High-impact firewall changes detected. Approve?",
                "response_schema": {
                    "type": "object",
                    "properties": {
                        "approved": {"type": "boolean"},
                        "review_notes": {"type": "string"}
                    }
                },
                "assigned_to": get_security_team_lead(),
                "timeout_seconds": 7200
            },
            "impact_analysis": impact,
            "proposed_rules": rules
        }

    # Low impact, apply immediately
    return apply_rules(rules)
```
## Best Practices
### 1. Clear Prompts
Write clear, actionable prompts:
✅ Good: "Approve deployment of api-service v2.1.0 to production?"
❌ Bad: "Continue?"
### 2. Reasonable Timeouts
Set appropriate timeout values:
- **Critical decisions**: 30-60 minutes
- **Routine approvals**: 2-4 hours
- **Non-urgent reviews**: 24-48 hours
### 3. Response Schemas
Define clear response schemas to validate user input:
```json
{
  "type": "object",
  "properties": {
    "approved": {
      "type": "boolean",
      "description": "Whether to approve the action"
    },
    "comments": {
      "type": "string",
      "description": "Optional comments explaining the decision"
    }
  },
  "required": ["approved"]
}
```
### 4. Assignment
Assign inquiries to specific users for accountability:
```python
{
    "__inquiry": {
        "prompt": "...",
        "assigned_to": get_on_call_user_id()
    }
}
```
### 5. Context Information
Include relevant context in the action result:
```python
return {
    "__inquiry": {
        "prompt": "Approve deployment?"
    },
    "deployment_details": {
        "service": "api",
        "version": "v2.1.0",
        "changes": ["Added new endpoint", "Fixed bug #123"],
        "tests_passed": True,
        "ci_build_url": "https://ci.example.com/builds/456"
    }
}
```
## Troubleshooting
### Inquiry Not Created
**Problem**: Action completes but no inquiry is created.
**Check**:
1. Action result contains `__inquiry` key
2. Completion listener is running
3. Check executor logs for errors
4. Verify inquiry table exists
### Execution Not Resuming
**Problem**: User responds but execution doesn't continue.
**Check**:
1. InquiryResponded message was published (check API logs)
2. Inquiry handler is running and consuming messages
3. Check executor logs for errors processing response
4. Verify execution exists and is in correct state
### Timeout Not Working
**Problem**: Inquiries not timing out automatically.
**Check**:
1. Timeout checker loop is running
2. `timeout_at` is set correctly in inquiry record
3. Check system time/timezone configuration
4. Review executor logs for timeout check errors
### Response Rejected
**Problem**: API rejects inquiry response.
**Check**:
1. Inquiry is still in `pending` status
2. Inquiry hasn't timed out
3. User is authorized (if `assigned_to` is set)
4. Response matches `response_schema` (when validation is enabled)
## Performance Considerations
### Database Indexes
Ensure these indexes exist for efficient inquiry queries:
```sql
CREATE INDEX idx_inquiry_status ON attune.inquiry(status);
CREATE INDEX idx_inquiry_assigned_status ON attune.inquiry(assigned_to, status);
CREATE INDEX idx_inquiry_timeout_at ON attune.inquiry(timeout_at) WHERE timeout_at IS NOT NULL;
```
### Message Queue
- Use separate consumer for inquiry responses
- Set appropriate prefetch count (10-20)
- Enable message acknowledgment
### Timeout Checking
- Run timeout checker every 60-120 seconds
- Use batched updates for efficiency
- Monitor for long-running timeout queries
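The timeout sweep described above can be sketched as follows. This is an in-memory stand-in for the real database query; the record fields (`status`, `timeout_at`) mirror the inquiry table, but the loop itself is illustrative.

```python
# Sketch of a batched timeout sweep (in-memory stand-in for the real
# UPDATE ... WHERE status = 'pending' AND timeout_at <= now() query).
from datetime import datetime, timedelta, timezone

def sweep_timeouts(inquiries: list, now: datetime) -> list:
    """Mark pending inquiries past their timeout_at and return their ids."""
    timed_out = []
    for inq in inquiries:
        if inq["status"] == "pending" and inq.get("timeout_at") and inq["timeout_at"] <= now:
            inq["status"] = "timed_out"  # a single batched UPDATE in the real service
            timed_out.append(inq["id"])
    return timed_out

now = datetime(2026, 1, 1, tzinfo=timezone.utc)
inquiries = [
    {"id": 1, "status": "pending", "timeout_at": now - timedelta(minutes=5)},
    {"id": 2, "status": "pending", "timeout_at": now + timedelta(hours=2)},
    {"id": 3, "status": "responded", "timeout_at": now - timedelta(hours=1)},
]
print(sweep_timeouts(inquiries, now))  # [1]
```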
## Security
### Input Validation
Always validate inquiry responses:
```rust
// TODO: Validate response against response_schema
if let Some(schema) = &inquiry.response_schema {
validate_json_schema(&request.response, schema)?;
}
```
### Authorization
Verify user permissions:
```rust
// Check assignment
if let Some(assigned_to) = inquiry.assigned_to {
if assigned_to != user.id {
return Err(ApiError::Forbidden("Not authorized"));
}
}
// Future: Check RBAC permissions
if !user.has_permission("inquiry:respond") {
return Err(ApiError::Forbidden("Missing permission"));
}
```
### Audit Trail
All inquiry responses are logged:
- Who responded
- When they responded
- What they responded with
- Original inquiry context
## Future Enhancements
### Planned Features
1. **Multi-step Approvals** - Chain multiple inquiries for approval workflows
2. **Conditional Resumption** - Resume execution differently based on response
3. **Inquiry Templates** - Reusable inquiry definitions
4. **Bulk Operations** - Approve/reject multiple inquiries at once
5. **Escalation** - Auto-reassign if no response within timeframe
6. **Reminder Notifications** - Alert users of pending inquiries
7. **Response Validation** - Validate responses against JSON schema
8. **Inquiry History** - View history of all inquiries for an execution chain
### Integration Opportunities
- **Slack/Teams** - Respond to inquiries via chat
- **Email** - Send inquiry notifications and accept email responses
- **Mobile Apps** - Native mobile inquiry interface
- **External Systems** - Webhook integration for external approval systems
## Related Documentation
- [Workflow Orchestration](workflow-orchestration.md)
- [Message Queue Architecture](message-queue.md)
- [Notifier Service](notifier-service.md)
- [API Documentation](api-overview.md)
- [Executor Service](executor-service.md)

# Parameter Mapping Status
## Quick Reference
This document provides a quick overview of what exists and what needs to be implemented for rule parameter mapping.
---
## ✅ What Already Exists
### Database Schema
- **Migration:** `migrations/20240103000003_add_rule_action_params.sql`
- **Column:** `attune.rule.action_params` (JSONB, default `{}`)
- **Index:** `idx_rule_action_params_gin` (GIN index for efficient querying)
- **Status:** ✅ Complete
### Data Models
- **File:** `crates/common/src/models.rs`
- **Struct:** `rule::Rule` has `pub action_params: JsonValue` field
- **Status:** ✅ Complete
### API Layer
- **File:** `crates/api/src/dto/rule.rs`
- **Request DTOs:**
- `CreateRuleRequest.action_params` (with default `{}`)
- `UpdateRuleRequest.action_params` (optional)
- **Response DTOs:**
- `RuleResponse.action_params`
- **Status:** ✅ Complete
### Repository Layer
- **File:** `crates/common/src/repositories/rule.rs`
- **Operations:**
- `CreateRuleInput.action_params` included in INSERT
- `UpdateRuleInput.action_params` handled in UPDATE
- All SELECT queries include `action_params` column
- **Status:** ✅ Complete
### API Routes
- **File:** `crates/api/src/routes/rules.rs`
- **Handlers:**
- `create_rule()` accepts `action_params` from request
- `update_rule()` updates `action_params` if provided
- **Status:** ✅ Complete
### Data Flow (Static Parameters)
```
Rule.action_params (static JSON)
        ↓
Enforcement.config (copied verbatim)
        ↓
Execution.config (passed through)
        ↓
Worker (receives as action parameters)
```
- **Status:** ✅ Working for static values
---
## ❌ What's Missing
### Template Resolution Logic
- **Needed:** Parse and resolve `{{ }}` templates in `action_params`
- **Location:** `crates/sensor/src/` (new module needed)
- **Status:** ❌ Not implemented
### Template Resolver Module
```rust
// NEW FILE: crates/sensor/src/template_resolver.rs
pub struct TemplateContext {
pub trigger_payload: JsonValue,
pub pack_config: JsonValue,
pub system_vars: JsonValue,
}
pub fn resolve_templates(
params: &JsonValue,
context: &TemplateContext
) -> Result<JsonValue> {
// Implementation needed
}
```
- **Status:** ❌ Does not exist
### Pack Config Loading
- **Needed:** Load pack configuration from database
- **Current:** Rule matcher doesn't load pack config
- **Required for:** `{{ pack.config.* }}` templates
- **Status:** ❌ Not implemented
### Integration in Rule Matcher
- **File:** `crates/sensor/src/rule_matcher.rs`
- **Method:** `create_enforcement()`
- **Current code (line 309):**
```rust
let config = Some(&rule.action_params);
```
- **Needed code:**
```rust
// Load pack config
let pack_config = self.load_pack_config(&rule.pack_ref).await?;
// Build template context
let context = TemplateContext {
trigger_payload: event.payload.clone().unwrap_or_default(),
pack_config,
system_vars: self.build_system_vars(rule, event),
};
// Resolve templates
let resolved_params = resolve_templates(&rule.action_params, &context)?;
let config = Some(resolved_params);
```
- **Status:** ❌ Not implemented
### Unit Tests
- **File:** `crates/sensor/src/template_resolver.rs` (tests module)
- **Needed tests:**
- Simple string substitution
- Nested object access
- Array element access
- Type preservation
- Missing value handling
- Pack config reference
- System variables
- Multiple templates in one string
- Invalid syntax handling
- **Status:** ❌ Not implemented
### Integration Tests
- **Needed:** End-to-end test of template resolution
- **Scenario:** Create rule with templates → fire event → verify enforcement has resolved params
- **Status:** ❌ Not implemented
---
## 📋 Implementation Checklist
### Phase 1: MVP (2-3 days)
- [ ] **Create template resolver module**
- [ ] Define `TemplateContext` struct
- [ ] Implement `resolve_templates()` function
- [ ] Regex pattern matching for `{{ }}`
- [ ] JSON path extraction with dot notation
- [ ] Type preservation logic
- [ ] Error handling for missing values
- [ ] Unit tests (9+ test cases)
- [ ] **Add pack config loading**
- [ ] Add method to load pack config from database
- [ ] Implement in-memory cache with TTL
- [ ] Handle missing pack config gracefully
- [ ] **Integrate with rule matcher**
- [ ] Update `create_enforcement()` method
- [ ] Load pack config before resolution
- [ ] Build template context
- [ ] Call template resolver
- [ ] Handle resolution errors
- [ ] Log warnings for missing values
- [ ] **System variables**
- [ ] Build system context (timestamp, rule ID, event ID)
- [ ] Document available system variables
- [ ] **Testing**
- [ ] Unit tests for template resolver
- [ ] Integration test: end-to-end flow
- [ ] Test with missing values
- [ ] Test with nested objects
- [ ] Test with arrays
- [ ] Test performance (benchmark)
- [ ] **Documentation**
- [x] User documentation (`docs/rule-parameter-mapping.md`) ✅
- [x] API documentation updates (`docs/api-rules.md`) ✅
- [ ] Code documentation (inline comments)
- [ ] Update sensor service docs
### Phase 2: Advanced Features (1-2 days, future)
- [ ] **Default values**
- [ ] Parse `| default: 'value'` syntax
- [ ] Apply defaults when value is null/missing
- [ ] Unit tests
- [ ] **Filters**
- [ ] `upper` - Convert to uppercase
- [ ] `lower` - Convert to lowercase
- [ ] `trim` - Remove whitespace
- [ ] `date: <format>` - Format timestamp
- [ ] `truncate: <length>` - Truncate string
- [ ] `json` - Serialize to JSON string
- [ ] Unit tests for each filter
- [ ] **Performance optimization**
- [ ] Cache compiled regex patterns
- [ ] Skip resolution if no `{{ }}` found
- [ ] Parallel template resolution
- [ ] Benchmark improvements
---
## 🔍 Key Implementation Details
### Current Enforcement Creation (line 306-348)
```rust
async fn create_enforcement(&self, rule: &Rule, event: &Event) -> Result<Id> {
let payload = event.payload.clone().unwrap_or_default();
let config = Some(&rule.action_params); // ← This line needs to change
let enforcement_id = sqlx::query_scalar!(
r#"
INSERT INTO attune.enforcement
(rule, rule_ref, trigger_ref, config, event, status, payload, condition, conditions)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
RETURNING id
"#,
Some(rule.id),
&rule.r#ref,
&rule.trigger_ref,
config, // ← Resolved params go here
Some(event.id),
EnforcementStatus::Created as EnforcementStatus,
payload,
EnforcementCondition::All as EnforcementCondition,
&rule.conditions
)
.fetch_one(&self.db)
.await?;
// ... rest of method
}
```
### Template Examples
**Input (Rule):**
```json
{
"action_params": {
"message": "Error in {{ trigger.payload.service }}: {{ trigger.payload.message }}",
"channel": "{{ pack.config.alert_channel }}",
"severity": "{{ trigger.payload.severity }}"
}
}
```
**Context:**
```json
{
"trigger": {
"payload": {
"service": "api-gateway",
"message": "Connection timeout",
"severity": "critical"
}
},
"pack": {
"config": {
"alert_channel": "#incidents"
}
}
}
```
**Output (Enforcement):**
```json
{
"config": {
"message": "Error in api-gateway: Connection timeout",
"channel": "#incidents",
"severity": "critical"
}
}
```
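The planned Rust resolver can be prototyped in a few lines. The sketch below is illustrative Python, not the production implementation (which is slated for `crates/sensor/src/template_resolver.rs`); it reproduces the example above, including dot-notation lookup and type preservation for values that are exactly one template.

```python
import json
import re

# Illustrative resolver sketch; the production version is planned in Rust.
TEMPLATE = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def lookup(path: str, context):
    """Walk a dot-notation path through nested dicts/lists; None if missing."""
    current = context
    for part in path.split("."):
        if isinstance(current, dict):
            current = current.get(part)
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return None
    return current

def resolve(value, context):
    if isinstance(value, str):
        whole = TEMPLATE.fullmatch(value.strip())
        if whole:  # the entire value is one template → preserve its JSON type
            return lookup(whole.group(1), context)
        # templates embedded in a longer string → interpolate as text
        return TEMPLATE.sub(lambda m: str(lookup(m.group(1), context)), value)
    if isinstance(value, dict):
        return {k: resolve(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, context) for v in value]
    return value

context = {
    "trigger": {"payload": {"service": "api-gateway",
                            "message": "Connection timeout",
                            "severity": "critical"}},
    "pack": {"config": {"alert_channel": "#incidents"}},
}
params = {
    "message": "Error in {{ trigger.payload.service }}: {{ trigger.payload.message }}",
    "channel": "{{ pack.config.alert_channel }}",
    "severity": "{{ trigger.payload.severity }}",
}
print(json.dumps(resolve(params, context), indent=2))
```

Running this produces exactly the resolved `config` object shown in the Output example above.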
---
## 📊 Dependencies
### Existing (Already in Cargo.toml)
- `serde_json` - JSON manipulation ✅
- `regex` - Pattern matching ✅
- `anyhow` - Error handling ✅
- `sqlx` - Database access ✅
### New Dependencies
- **None required** - Can implement with existing dependencies
---
## 🎯 Success Criteria
- [ ] Static parameters continue to work unchanged
- [ ] Can reference `{{ trigger.payload.* }}` fields
- [ ] Can reference `{{ pack.config.* }}` fields
- [ ] Can reference `{{ system.* }}` variables
- [ ] Type preservation (strings, numbers, booleans, objects, arrays)
- [ ] Nested object access with dot notation works
- [ ] Array element access by index works
- [ ] Missing values handled gracefully (null + warning)
- [ ] Invalid syntax handled gracefully (literal + error)
- [ ] Unit tests pass (90%+ coverage)
- [ ] Integration tests pass
- [ ] Documentation accurate and complete
- [ ] No performance regression (<500µs overhead)
- [ ] Backward compatibility maintained (100%)
---
## 🚀 Getting Started
1. **Read documentation:**
- `docs/rule-parameter-mapping.md` - User guide
- `work-summary/2026-01-17-parameter-templating.md` - Technical spec
2. **Review current code:**
- `crates/sensor/src/rule_matcher.rs:306-348` - Where to integrate
- `crates/common/src/models.rs` - Rule model structure
- `migrations/20240103000003_add_rule_action_params.sql` - Schema
3. **Start implementation:**
- Create `crates/sensor/src/template_resolver.rs`
- Write unit tests first (TDD approach)
- Implement template parsing and resolution
- Integrate with rule_matcher
- Run integration tests
4. **Test thoroughly:**
- Unit tests for all edge cases
- Integration test with real database
- Manual testing with example rules
- Performance benchmarks
---
## 📚 Related Documentation
- [Rule Parameter Mapping Guide](./rule-parameter-mapping.md) - Complete user documentation
- [Rule Management API](./api-rules.md) - API reference with examples
- [Sensor Service Architecture](./sensor-service.md) - Service overview
- [Implementation Plan](../work-summary/2026-01-17-parameter-templating.md) - Technical specification
- [Session Summary](../work-summary/2026-01-17-session-parameter-mapping.md) - Discovery notes
---
## 🏷️ Status Summary
| Component | Status | Notes |
|-----------|--------|-------|
| Database schema | ✅ Complete | `action_params` column exists |
| Data models | ✅ Complete | Rule struct has field |
| API DTOs | ✅ Complete | Request/response support |
| API routes | ✅ Complete | CRUD operations work |
| Repository | ✅ Complete | All queries include field |
| Static parameters | ✅ Working | Flow end-to-end |
| Template resolver | ❌ Missing | Core implementation needed |
| Pack config loading | ❌ Missing | Required for `{{ pack.config }}` |
| Integration | ❌ Missing | Need to wire up resolver |
| Unit tests | ❌ Missing | Tests for resolver needed |
| Integration tests | ❌ Missing | E2E test needed |
| Documentation | ✅ Complete | User and tech docs done |
**Overall Status:** 📝 Documented, ⏳ Implementation Pending
**Priority:** P1 (High)
**Estimated Effort:** 2-3 days (MVP), 1-2 days (advanced features)
**Risk:** Low (backward compatible, well-scoped, clear requirements)
**Value:** High (unlocks production use cases, user expectation)

# Rule Parameter Mapping
## Overview
Rules in Attune can specify parameters to pass to actions when triggered. These parameters can be:
1. **Static values** - Hard-coded values defined in the rule
2. **Dynamic from trigger payload** - Values extracted from the event that triggered the rule
3. **Dynamic from pack config** - Values from the pack's configuration
This enables flexible parameter passing without hardcoding values or requiring custom code.
---
## Parameter Mapping Format
Rule `action_params` uses a JSON object where each value can be:
- **Static**: A literal value (string, number, boolean, object, array)
- **Dynamic**: A template string using `{{ }}` syntax to reference runtime values
### Template Syntax
```
{{ source.path.to.value }}
```
**Available Sources:**
- `trigger.payload.*` - Data from the event payload
- `pack.config.*` - Configuration values from the pack
- `system.*` - System-provided values (timestamp, execution context)
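Extracting the references used by a parameter value is a simple regex match. This sketch is illustrative only (the production resolver lives in the sensor service) and assumes paths are dot-separated identifiers:

```python
import re

# Sketch: extract template references from a parameter value.
def find_references(value: str) -> list:
    return re.findall(r"\{\{\s*([\w.]+)\s*\}\}", value)

print(find_references("Error in {{ trigger.payload.service }} at {{ system.timestamp }}"))
# → ['trigger.payload.service', 'system.timestamp']
```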
---
## Static Parameter Example
The simplest form - just pass fixed values to the action:
```json
{
"ref": "slack.notify_on_error",
"pack_ref": "slack",
"trigger_ref": "core.error_event",
"action_ref": "slack.post_message",
"action_params": {
"channel": "#alerts",
"message": "An error occurred in the system",
"color": "danger"
}
}
```
When this rule triggers, the action receives exactly these parameters.
---
## Dynamic Parameters from Trigger Payload
Extract values from the event that triggered the rule.
### Example: Alert with Event Data
**Trigger Payload:**
```json
{
"severity": "error",
"service": "api-gateway",
"message": "Database connection failed",
"timestamp": "2024-01-15T10:30:00Z",
"metadata": {
"host": "api-01.example.com",
"error_code": "DB_CONN_TIMEOUT"
}
}
```
**Rule Definition:**
```json
{
"ref": "alerts.error_notification",
"pack_ref": "alerts",
"trigger_ref": "core.error_event",
"action_ref": "slack.post_message",
"action_params": {
"channel": "#incidents",
"message": "Error in {{ trigger.payload.service }}: {{ trigger.payload.message }}",
"severity": "{{ trigger.payload.severity }}",
"host": "{{ trigger.payload.metadata.host }}",
"timestamp": "{{ trigger.payload.timestamp }}"
}
}
```
**Resulting Action Parameters:**
```json
{
"channel": "#incidents",
"message": "Error in api-gateway: Database connection failed",
"severity": "error",
"host": "api-01.example.com",
"timestamp": "2024-01-15T10:30:00Z"
}
```
---
## Dynamic Parameters from Pack Config
Use configuration values stored at the pack level (useful for API keys, URLs, etc.).
### Example: Using Pack Configuration
**Pack Configuration:**
```json
{
"ref": "slack",
"config": {
"api_token": "xoxb-1234567890-abcdefghijk",
"default_channel": "#general",
"webhook_url": "https://hooks.slack.com/services/...",
"bot_name": "Attune Bot"
}
}
```
**Rule Definition:**
```json
{
"ref": "slack.auto_notify",
"pack_ref": "slack",
"trigger_ref": "core.notification_event",
"action_ref": "slack.post_message",
"action_params": {
"token": "{{ pack.config.api_token }}",
"channel": "{{ pack.config.default_channel }}",
"username": "{{ pack.config.bot_name }}",
"message": "{{ trigger.payload.message }}"
}
}
```
**Benefits:**
- Secrets stored in pack config, not in rules
- Easy to update credentials without changing rules
- Reuse configuration across multiple rules
---
## Mixed Parameters (Static + Dynamic)
Combine static and dynamic values in the same rule:
```json
{
"ref": "github.create_issue",
"pack_ref": "github",
"trigger_ref": "core.error_event",
"action_ref": "github.create_issue",
"action_params": {
"repo": "myorg/myrepo",
"token": "{{ pack.config.github_token }}",
"title": "Error: {{ trigger.payload.message }}",
"body": "Service {{ trigger.payload.service }} reported an error at {{ trigger.payload.timestamp }}",
"labels": ["bug", "automated"],
"assignees": ["oncall"]
}
}
```
---
## Nested Object Access
Access nested properties using dot notation:
```json
{
"action_params": {
"user_id": "{{ trigger.payload.user.id }}",
"user_name": "{{ trigger.payload.user.profile.name }}",
"metadata": {
"ip_address": "{{ trigger.payload.request.client_ip }}",
"user_agent": "{{ trigger.payload.request.headers.user_agent }}"
}
}
}
```
---
## Array Access
Access array elements by index:
```json
{
"action_params": {
"first_error": "{{ trigger.payload.errors.0 }}",
"primary_tag": "{{ trigger.payload.tags.0 }}"
}
}
```
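Index access falls out of the same dot-notation walk: a numeric path segment selects a list element. A minimal sketch of that lookup (illustrative, not the production code):

```python
# Sketch: dot-notation lookup where numeric segments index into arrays.
def get_path(data, path: str):
    for part in path.split("."):
        if isinstance(data, list):
            if not part.isdigit() or int(part) >= len(data):
                return None
            data = data[int(part)]
        elif isinstance(data, dict):
            data = data.get(part)
        else:
            return None
    return data

payload = {"errors": ["timeout", "retry failed"], "tags": ["urgent", "db"]}
print(get_path(payload, "errors.0"))  # timeout
print(get_path(payload, "tags.0"))    # urgent
```

Out-of-range indices resolve to `None`, consistent with the missing-value handling described later.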
---
## Default Values and Fallbacks
Provide default values when the referenced field doesn't exist:
```json
{
"action_params": {
"priority": "{{ trigger.payload.priority | default: 'medium' }}",
"assignee": "{{ trigger.payload.assignee | default: 'unassigned' }}"
}
}
```
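The `| default:` filter can be parsed with one extra regex group. The sketch below assumes the quoted-string syntax shown above (this filter is a planned enhancement, so the exact grammar is an assumption):

```python
import re

# Sketch of "| default: '...'" handling; the filter syntax is assumed
# from the examples above (it is a planned enhancement).
FILTERED = re.compile(r"\{\{\s*([\w.]+)\s*(?:\|\s*default:\s*'([^']*)')?\s*\}\}")

def resolve_with_default(template: str, context: dict):
    match = FILTERED.fullmatch(template.strip())
    if not match:
        return template
    path, fallback = match.group(1), match.group(2)
    value = context
    for part in path.split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value if value is not None else fallback

context = {"trigger": {"payload": {"priority": "high"}}}
print(resolve_with_default("{{ trigger.payload.priority | default: 'medium' }}", context))    # high
print(resolve_with_default("{{ trigger.payload.assignee | default: 'unassigned' }}", context))  # unassigned
```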
---
## Type Preservation
Template values preserve their JSON types:
```json
{
"action_params": {
"count": "{{ trigger.payload.count }}", // Number: 42
"enabled": "{{ trigger.payload.enabled }}", // Boolean: true
"tags": "{{ trigger.payload.tags }}", // Array: ["a", "b"]
"metadata": "{{ trigger.payload.metadata }}" // Object: {"key": "value"}
}
}
```
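The rule behind this is: when the value is exactly one template, the resolved value keeps its JSON type; when the template is embedded in a longer string, it is interpolated as text. A small illustrative sketch (the path handling is simplified to the last segment for brevity):

```python
import re

# Sketch of the type-preservation rule. Simplification: only the last path
# segment is used as the payload key (illustrative, not production code).
T = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def resolve_value(value: str, payload: dict):
    whole = T.fullmatch(value.strip())
    if whole:  # entire value is one template → keep the JSON type
        return payload.get(whole.group(1).split(".")[-1])
    return T.sub(lambda m: str(payload.get(m.group(1).split(".")[-1])), value)

payload = {"count": 42, "enabled": True, "tags": ["a", "b"]}
print(resolve_value("{{ trigger.payload.count }}", payload))        # 42 (number)
print(resolve_value("count={{ trigger.payload.count }}", payload))  # count=42 (string)
```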
---
## System Variables
Access system-provided values:
```json
{
"action_params": {
"execution_time": "{{ system.timestamp }}",
"rule_id": "{{ system.rule.id }}",
"rule_ref": "{{ system.rule.ref }}",
"event_id": "{{ system.event.id }}",
"enforcement_id": "{{ system.enforcement.id }}"
}
}
```
---
## String Interpolation
Embed multiple values in a single string:
```json
{
"action_params": {
"message": "User {{ trigger.payload.user_id }} performed {{ trigger.payload.action }} at {{ system.timestamp }}",
"subject": "[{{ trigger.payload.severity | upper }}] {{ trigger.payload.service }} Alert"
}
}
```
---
## Filters (Future Enhancement)
Apply transformations to values:
```json
{
"action_params": {
"uppercase_name": "{{ trigger.payload.name | upper }}",
"lowercase_email": "{{ trigger.payload.email | lower }}",
"formatted_date": "{{ trigger.payload.timestamp | date: '%Y-%m-%d' }}",
"truncated": "{{ trigger.payload.message | truncate: 100 }}"
}
}
```
**Available Filters:**
- `upper` - Convert to uppercase
- `lower` - Convert to lowercase
- `trim` - Remove whitespace
- `default: <value>` - Use default if null/missing
- `date: <format>` - Format timestamp
- `truncate: <length>` - Truncate string
- `json` - Serialize to JSON string
- `base64` - Base64 encode
- `length` - Get length/count
---
## Real-World Examples
### 1. Webhook to Slack Alert
```json
{
"ref": "monitoring.webhook_to_slack",
"pack_ref": "monitoring",
"trigger_ref": "core.webhook",
"action_ref": "slack.post_message",
"action_params": {
"channel": "{{ pack.config.alert_channel }}",
"token": "{{ pack.config.slack_token }}",
"message": "⚠️ Alert from {{ trigger.payload.source }}: {{ trigger.payload.message }}",
"attachments": [
{
"color": "{{ trigger.payload.severity | default: 'warning' }}",
"fields": [
{
"title": "Service",
"value": "{{ trigger.payload.service }}",
"short": true
},
{
"title": "Environment",
"value": "{{ trigger.payload.environment | default: 'production' }}",
"short": true
}
],
"footer": "Attune Automation",
"ts": "{{ system.timestamp }}"
}
]
}
}
```
### 2. Error to Ticket System
```json
{
"ref": "errors.create_ticket",
"pack_ref": "errors",
"trigger_ref": "core.error_event",
"action_ref": "jira.create_issue",
"action_params": {
"project": "{{ pack.config.jira_project }}",
"auth": {
"username": "{{ pack.config.jira_username }}",
"token": "{{ pack.config.jira_token }}"
},
"issuetype": "Bug",
"summary": "[{{ trigger.payload.severity }}] {{ trigger.payload.service }}: {{ trigger.payload.message }}",
"description": {
"type": "doc",
"content": [
{
"type": "paragraph",
"content": [
{
"type": "text",
"text": "Error Details:\n\nService: {{ trigger.payload.service }}\nHost: {{ trigger.payload.host }}\nTimestamp: {{ trigger.payload.timestamp }}\n\nStack Trace:\n{{ trigger.payload.stack_trace }}"
}
]
}
]
},
"priority": "{{ trigger.payload.priority | default: 'Medium' }}",
"labels": ["automated", "{{ trigger.payload.service }}"]
}
}
```
### 3. Metric Threshold to PagerDuty
```json
{
"ref": "monitoring.critical_alert",
"pack_ref": "monitoring",
"trigger_ref": "metrics.threshold_exceeded",
"action_ref": "pagerduty.trigger_incident",
"action_params": {
"routing_key": "{{ pack.config.pagerduty_routing_key }}",
"event_action": "trigger",
"payload": {
"summary": "{{ trigger.payload.metric_name }} exceeded threshold on {{ trigger.payload.host }}",
"severity": "critical",
"source": "{{ trigger.payload.host }}",
"custom_details": {
"metric": "{{ trigger.payload.metric_name }}",
"current_value": "{{ trigger.payload.current_value }}",
"threshold": "{{ trigger.payload.threshold }}",
"duration": "{{ trigger.payload.duration_seconds }}s"
}
},
"dedup_key": "{{ trigger.payload.host }}_{{ trigger.payload.metric_name }}"
}
}
```
### 4. Timer to HTTP Request
```json
{
"ref": "healthcheck.periodic_ping",
"pack_ref": "healthcheck",
"trigger_ref": "core.interval_timer",
"action_ref": "http.request",
"action_params": {
"method": "POST",
"url": "{{ pack.config.healthcheck_endpoint }}",
"headers": {
"Authorization": "Bearer {{ pack.config.api_token }}",
"Content-Type": "application/json"
},
"body": {
"source": "attune",
"timestamp": "{{ system.timestamp }}",
"rule": "{{ system.rule.ref }}"
},
"timeout": 30
}
}
```
---
## Implementation Details
### Template Processing Flow
1. **Rule Evaluation** - When an event matches a rule
2. **Template Extraction** - Identify `{{ }}` patterns in `action_params`
3. **Context Building** - Assemble available data:
- `trigger.payload` - Event payload data
- `pack.config` - Pack configuration
- `system.*` - System-provided values
4. **Value Resolution** - Extract values from context using dot notation paths
5. **Type Conversion** - Preserve JSON types (string, number, boolean, object, array)
6. **Parameter Assembly** - Build final parameter object
7. **Enforcement Creation** - Store resolved parameters in enforcement config
8. **Execution Creation** - Pass parameters to action execution
### Error Handling
**Missing Values:**
- If a referenced value doesn't exist and no default is provided, use `null`
- Log warning: `"Template reference not found: trigger.payload.missing_field"`
**Invalid Syntax:**
- If template syntax is invalid, log error and use the raw string
- Log error: `"Invalid template syntax: {{ incomplete"`
**Type Mismatches:**
- Preserve JSON types when possible
- Convert to string as fallback for complex interpolation
---
## Configuration in Pack
Pack configuration should be stored securely and can include:
```json
{
"ref": "mypack",
"config": {
"api_token": "secret-token-here",
"api_url": "https://api.example.com",
"default_timeout": 30,
"retry_attempts": 3,
"enable_notifications": true,
"notification_channels": ["#alerts", "#monitoring"]
}
}
```
**Security Note:** Sensitive values (API keys, tokens, passwords) should be stored in pack config, not in rule definitions, since:
- Pack configs can be encrypted
- Easier to rotate credentials
- Rules can be version controlled without exposing secrets
---
## Best Practices
### 1. Use Pack Config for Secrets
**Bad:**
```json
{
"action_params": {
"api_key": "sk_live_abc123xyz789" // Hardcoded secret
}
}
```
**Good:**
```json
{
"action_params": {
"api_key": "{{ pack.config.api_key }}"
}
}
```
### 2. Provide Defaults for Optional Fields
```json
{
"action_params": {
"priority": "{{ trigger.payload.priority | default: 'medium' }}",
"assignee": "{{ trigger.payload.assignee | default: 'unassigned' }}"
}
}
```
### 3. Use Descriptive Template Paths
```json
{
"action_params": {
"user_email": "{{ trigger.payload.user.email }}",
"user_id": "{{ trigger.payload.user.id }}"
}
}
```
### 4. Keep Static Values Where Appropriate
If a value never changes, keep it static:
```json
{
"action_params": {
"service_name": "my-service", // Static - never changes
"error_code": "{{ trigger.payload.code }}" // Dynamic - from event
}
}
```
### 5. Test Your Templates
Create test events with sample payloads to verify your templates extract the correct values.
---
## Testing Parameter Mapping
### 1. Manual Testing via API
Create a test event with known payload:
```bash
curl -X POST http://localhost:8080/api/v1/events \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"trigger_ref": "core.test_event",
"payload": {
"message": "Test message",
"severity": "info",
"user": {
"id": 123,
"name": "Alice"
}
}
}'
```
Check the resulting enforcement and execution to verify parameters were resolved correctly:
```bash
# Check enforcement
curl -X GET http://localhost:8080/api/v1/enforcements/1 \
-H "Authorization: Bearer $TOKEN"
# Check execution
curl -X GET http://localhost:8080/api/v1/executions/1 \
-H "Authorization: Bearer $TOKEN"
```
### 2. Validate Parameter Resolution
Look for the resolved parameters in the execution's `config` field:
```json
{
"id": 1,
"config": {
"message": "Test message", // Resolved from trigger.payload.message
"severity": "info", // Resolved from trigger.payload.severity
"user_id": 123, // Resolved from trigger.payload.user.id
"user_name": "Alice" // Resolved from trigger.payload.user.name
}
}
```
---
## Migration Guide
### From Static to Dynamic Parameters
**Before (Static):**
```json
{
"action_params": {
"message": "An error occurred"
}
}
```
**After (Dynamic):**
```json
{
"action_params": {
"message": "Error: {{ trigger.payload.message }}"
}
}
```
### From Hardcoded Secrets to Pack Config
**Before (Hardcoded):**
```json
{
"action_params": {
"api_key": "sk_live_abc123"
}
}
```
**Steps:**
1. Add secret to pack config
2. Update rule to reference pack config
3. Remove hardcoded value
**After (Secure):**
```json
{
"action_params": {
"api_key": "{{ pack.config.api_key }}"
}
}
```
---
## Troubleshooting
### Templates Not Resolving
**Problem:** Parameters contain literal `{{ ... }}` strings instead of resolved values.
**Solutions:**
1. Check template syntax is correct
2. Verify the referenced path exists in the event payload
3. Check sensor service logs for template resolution errors
4. Use default values for optional fields
### Incorrect Values
**Problem:** Parameters have wrong values.
**Solutions:**
1. Inspect event payload structure: `SELECT payload FROM attune.event WHERE id = X;`
2. Verify the dot notation path matches the payload structure
3. Check for typos in template paths
4. Use system logs to see template resolution details
### Type Conversion Issues
**Problem:** Numbers or booleans become strings.
**Solutions:**
1. Ensure the source value is the correct type in the payload
2. Check if string interpolation is converting types
3. Use direct references without string interpolation for non-string types
---
## Future Enhancements
### 1. Conditional Parameters
```json
{
"action_params": {
"channel": "{% if trigger.payload.severity == 'critical' %}#incidents{% else %}#monitoring{% endif %}"
}
}
```
### 2. Advanced Filters
- Mathematical operations: `{{ trigger.payload.value | multiply: 100 }}`
- String manipulation: `{{ trigger.payload.text | replace: 'old', 'new' }}`
- Array operations: `{{ trigger.payload.items | join: ', ' }}`
### 3. Custom Functions
```json
{
"action_params": {
"timestamp": "{{ now() }}",
"uuid": "{{ uuid() }}",
"hash": "{{ hash(trigger.payload.data) }}"
}
}
```
### 4. Multi-Source Merging
```json
{
"action_params": {
"user": "{{ trigger.payload.user | merge: pack.config.default_user }}"
}
}
```
---
## Related Documentation
- [Rule Management API](./api-rules.md)
- [Event Management API](./api-events-enforcements.md)
- [Pack Management API](./api-packs.md)
- [Sensor Service Architecture](./sensor-service.md)
- [Security Best Practices](./security-review-2024-01-02.md)
- [Secrets Management](./secrets-management.md)
---
## Summary
Rule parameter mapping provides a powerful way to:
1. **Decouple rules from data** - Rules reference data locations, not specific values
2. **Reuse pack configuration** - Share credentials and settings across rules
3. **Dynamic automation** - Respond to events with context-aware actions
4. **Secure secrets** - Store sensitive data in pack config, not rule definitions
5. **Flexible workflows** - Build complex automations without custom code
**Key Concepts:**
- Static values for constants
- `{{ trigger.payload.* }}` for event data
- `{{ pack.config.* }}` for pack configuration
- `{{ system.* }}` for system-provided values
- Filters and defaults for robust templates
This feature enables Attune to match the flexibility of platforms like StackStorm while maintaining a clean, declarative approach to automation.

# Rule Trigger Parameters
## Overview
Rules in Attune can now specify `trigger_params` to configure trigger behavior and filter which events should activate the rule. This complements `action_params` (which configures the action to execute) by providing control over trigger matching and event filtering.
---
## What are Trigger Params?
**Trigger params** are JSON parameters stored in a rule that can be used to:
1. **Filter events** - Only match events with specific payload characteristics
2. **Configure trigger behavior** - Customize how the trigger should match events for this specific rule
3. **Pass metadata** - Provide additional context about how events should be processed
This allows multiple rules to reference the same trigger type but respond to different subsets of events.
---
## Use Cases
### 1. Event Filtering by Severity
**Scenario:** You have a generic error trigger, but different rules should handle different severity levels.
```json
{
"ref": "alerts.critical_errors",
"trigger_ref": "core.error_event",
"action_ref": "pagerduty.create_incident",
"trigger_params": {
"severity": "critical",
"min_priority": 5
},
"action_params": {
"routing_key": "{{ pack.config.pagerduty_key }}",
"severity": "critical"
}
}
```
```json
{
"ref": "alerts.minor_errors",
"trigger_ref": "core.error_event",
"action_ref": "slack.post_message",
"trigger_params": {
"severity": "warning",
"max_priority": 2
},
"action_params": {
"channel": "#monitoring"
}
}
```
Both rules use the same `core.error_event` trigger, but `trigger_params` specifies which events each rule should handle.
### 2. Service-Specific Monitoring
**Scenario:** Monitor multiple services with the same trigger type, but different rules per service.
```json
{
"ref": "monitoring.api_gateway_health",
"trigger_ref": "core.health_check",
"action_ref": "alerts.notify_team",
"trigger_params": {
"service": "api-gateway",
"environment": "production"
},
"action_params": {
"team": "backend"
}
}
```
```json
{
"ref": "monitoring.database_health",
"trigger_ref": "core.health_check",
"action_ref": "alerts.notify_team",
"trigger_params": {
"service": "postgresql",
"environment": "production"
},
"action_params": {
"team": "database"
}
}
```
### 3. Threshold-Based Rules
**Scenario:** Different rules for different metric thresholds.
```json
{
"ref": "metrics.cpu_high_warning",
"trigger_ref": "monitoring.cpu_usage",
"action_ref": "slack.post_message",
"trigger_params": {
"threshold": 80,
"comparison": "greater_than",
"duration_seconds": 300
},
"action_params": {
"channel": "#ops"
}
}
```
```json
{
"ref": "metrics.cpu_critical",
"trigger_ref": "monitoring.cpu_usage",
"action_ref": "pagerduty.create_incident",
"trigger_params": {
"threshold": 95,
"comparison": "greater_than",
"duration_seconds": 60
},
"action_params": {
"routing_key": "{{ pack.config.pagerduty_key }}"
}
}
```
---
## How Trigger Params Work
### Architecture Flow
```
Sensor → Generates Event with Payload
    ↓
Trigger Type Matched
    ↓
Rules Evaluated (for each rule matching trigger type):
  1. Check trigger_params against event payload
  2. Evaluate conditions
  3. If both pass → create Enforcement
    ↓
Enforcement → Execution (with action_params)
```
### Evaluation Logic
When an event fires:
1. **Find matching rules** - All rules that reference the event's trigger type
2. **Filter by trigger_params** - For each rule, check if the event payload matches the rule's `trigger_params`
3. **Evaluate conditions** - Apply the rule's `conditions` logic
4. **Create enforcement** - If both checks pass, activate the rule
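The `trigger_params` check in step 2 amounts to a subset match: every key/value pair declared by the rule must appear with an equal value in the event payload. A minimal sketch of that check is shown below; it uses string maps in place of arbitrary JSON values, so the function name and signature are illustrative, not the actual executor API.

```rust
use std::collections::HashMap;

/// Returns true when every key/value pair in `trigger_params` is present
/// with an equal value in the event payload. An empty map matches everything.
/// (Sketch only: the real evaluator compares arbitrary JSON values, not strings.)
fn matches_trigger_params(
    trigger_params: &HashMap<String, String>,
    payload: &HashMap<String, String>,
) -> bool {
    trigger_params
        .iter()
        .all(|(key, expected)| payload.get(key) == Some(expected))
}
```

Note that an empty `trigger_params` map trivially satisfies the check, which is why `{}` means "match all events."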
---
## Trigger Params vs Conditions
Both `trigger_params` and `conditions` can filter events, but they serve different purposes:
| Feature | `trigger_params` | `conditions` |
|---------|------------------|--------------|
| **Purpose** | Declare intent about which events this rule handles | Complex conditional logic for rule activation |
| **Format** | Simple JSON key-value pairs | JSON Logic expressions or complex DSL |
| **Evaluation** | Direct comparison/matching | Expression evaluation engine |
| **Use Case** | Event filtering, metadata | Business logic, complex conditions |
| **Performance** | Fast direct matching | May require expression parsing |
### Example: Using Both Together
```json
{
"ref": "alerts.critical_api_errors",
"trigger_ref": "core.error_event",
"trigger_params": {
"service": "api-gateway",
"severity": "error"
},
"conditions": {
"and": [
{"var": "trigger.payload.status_code", ">=": 500},
{"var": "trigger.payload.retry_count", ">": 3},
{
"or": [
{"var": "trigger.payload.endpoint", "in": ["/auth", "/payment"]},
{"var": "trigger.payload.customer_impact", "==": true}
]
}
]
},
"action_params": {
"priority": "P1"
}
}
```
Here:
- `trigger_params` declares: "This rule handles API Gateway errors"
- `conditions` adds: "But only if status >= 500, retries > 3, AND it's a critical endpoint or impacts customers"
---
## Implementation in Different Services
### Executor Service
The executor service uses `trigger_params` when evaluating which rules should fire for an event:
```rust
// Pseudo-code
async fn evaluate_rules_for_event(event: &Event) -> Vec<Enforcement> {
let rules = find_rules_by_trigger(event.trigger_id);
let mut enforcements = Vec::new();
for rule in rules {
// Check trigger_params match
if !matches_trigger_params(&rule.trigger_params, &event.payload) {
continue; // Skip this rule
}
// Check conditions
if !evaluate_conditions(&rule.conditions, &event.payload) {
continue;
}
// Rule matches - create enforcement
enforcements.push(create_enforcement(&rule, &event));
}
enforcements
}
```
### API Service
The API allows setting `trigger_params` when creating or updating rules:
**Create Rule Request:**
`POST /api/v1/rules`
```json
{
"ref": "mypack.my_rule",
"pack_ref": "mypack",
"trigger_ref": "core.webhook",
"action_ref": "slack.post_message",
"trigger_params": {
"webhook_source": "github",
"event_type": "pull_request"
},
"action_params": {
"channel": "#github-prs"
},
"enabled": true
}
```
**Update Rule Request:**
`PUT /api/v1/rules/mypack.my_rule`
```json
{
"trigger_params": {
"webhook_source": "github",
"event_type": ["pull_request", "push"]
}
}
```
---
## Best Practices
### 1. Use Trigger Params for Simple Filtering
**Good:**
```json
{
"trigger_params": {
"severity": "critical",
"service": "api"
}
}
```
**Not Recommended (use conditions instead):**
```json
{
"trigger_params": {
"complex_logic": "if severity > 3 and (service == 'api' or service == 'web')"
}
}
```
### 2. Keep Trigger Params Declarative
Trigger params should describe *what* events to match, not *how* to process them:
**Good:**
```json
{
"trigger_params": {
"environment": "production",
"region": "us-east-1"
}
}
```
**Bad:**
```json
{
"trigger_params": {
"should_page_oncall": true,
"escalation_policy": "immediate"
}
}
```
The second example describes processing behavior, which belongs in `action_params`.
### 3. Use Empty Object as Default
If a rule should match all events from a trigger, use an empty object:
```json
{
"trigger_params": {}
}
```
This explicitly states "no filtering, match all events."
### 4. Document Expected Fields
When creating trigger types, document what `trigger_params` fields are expected:
```yaml
# Trigger: core.error_event
# Expected trigger_params:
# - severity: string (error|warning|info)
# - service: string (optional, filter by service name)
# - min_priority: number (optional, minimum priority level)
```
### 5. Combine with Conditions for Complex Logic
Use `trigger_params` for simple key-value filtering, and `conditions` for complex expressions:
```json
{
"trigger_params": {
"event_type": "metric_alert"
},
"conditions": {
"and": [
{"var": "metric_value", ">": 100},
{"var": "duration_minutes", ">=": 5}
]
}
}
```
---
## Schema and Validation
### Database Schema
```sql
-- In attune.rule table
trigger_params JSONB DEFAULT '{}'::jsonb
```
### API Schema
**OpenAPI Definition:**
```yaml
trigger_params:
type: object
description: Parameters for trigger configuration and event filtering
default: {}
example:
severity: high
service: api-gateway
```
### Runtime Validation
Trigger params are stored as JSON and validated at runtime:
- Must be valid JSON object
- Keys should match trigger's param_schema (if defined)
- Values are compared against event payload during evaluation
---
## Migration Guide
If you have existing rules without `trigger_params`, they will default to `{}` (empty object), which means "match all events from this trigger."
No action is required for existing rules unless you want to add filtering.
### Adding Trigger Params to Existing Rules
**Before:**
```json
{
"ref": "alerts.notify_errors",
"trigger_ref": "core.error_event",
"action_ref": "slack.post_message",
"conditions": {
"var": "severity",
"==": "critical"
}
}
```
**After (moving simple filtering to trigger_params):**
```json
{
"ref": "alerts.notify_errors",
"trigger_ref": "core.error_event",
"action_ref": "slack.post_message",
"trigger_params": {
"severity": "critical"
},
"conditions": {}
}
```
This improves performance by filtering earlier in the evaluation pipeline.
---
## Examples
### Example 1: Webhook Source Filtering
```json
{
"ref": "webhooks.github_pr_opened",
"trigger_ref": "core.webhook_received",
"trigger_params": {
"source": "github",
"event": "pull_request",
"action": "opened"
},
"action_ref": "slack.post_message",
"action_params": {
"channel": "#pull-requests",
"message": "New PR: {{ trigger.payload.title }} by {{ trigger.payload.user }}"
}
}
```
### Example 2: Multi-Environment Monitoring
```json
{
"ref": "monitoring.prod_cpu_alert",
"trigger_ref": "monitoring.cpu_threshold",
"trigger_params": {
"environment": "production",
"threshold_type": "critical"
},
"action_ref": "pagerduty.create_incident",
"action_params": {
"severity": "critical"
}
}
```
```json
{
"ref": "monitoring.staging_cpu_alert",
"trigger_ref": "monitoring.cpu_threshold",
"trigger_params": {
"environment": "staging",
"threshold_type": "warning"
},
"action_ref": "slack.post_message",
"action_params": {
"channel": "#staging-alerts"
}
}
```
### Example 3: Timer with Context
```json
{
"ref": "backups.hourly_db_backup",
"trigger_ref": "core.intervaltimer",
"trigger_params": {
"interval_minutes": 60,
"context": "database_backup"
},
"action_ref": "backups.run_backup",
"action_params": {
"backup_type": "incremental",
"retention_days": 7
}
}
```
---
## Related Documentation
- [Rule Parameter Mapping](./rule-parameter-mapping.md) - Dynamic parameters from event payload
- [Trigger and Sensor Architecture](./trigger-sensor-architecture.md) - How triggers and sensors work
- [API Rules Endpoint](./api-rules.md) - Creating and managing rules via API
---
## Summary
- **`trigger_params`** provides a way to filter which events activate a rule
- Use for simple key-value filtering and event categorization
- Complements `conditions` for complex business logic
- Improves rule organization when multiple rules share the same trigger type
- Defaults to `{}` (match all events) if not specified
- Stored as JSONB in the database for flexible querying

---
# Workflow Execution Engine
## Overview
The Workflow Execution Engine is responsible for orchestrating the execution of workflows in Attune. It manages task dependencies, parallel execution, state transitions, context passing, retries, timeouts, and error handling.
## Architecture
The execution engine consists of four main components:
### 1. Task Graph Builder (`workflow/graph.rs`)
**Purpose:** Converts workflow definitions into executable task graphs with dependency information.
**Key Features:**
- Builds directed acyclic graph (DAG) from workflow tasks
- Topological sorting for execution order
- Dependency computation from task transitions
- Cycle detection
- Entry point identification
**Data Structures:**
- `TaskGraph`: Complete executable graph with nodes, dependencies, and execution order
- `TaskNode`: Individual task with configuration, transitions, and dependencies
- `TaskTransitions`: Success/failure/complete/timeout transitions and decision branches
- `RetryConfig`: Retry configuration with backoff strategies
**Example Usage:**
```rust
use attune_executor::workflow::{TaskGraph, parse_workflow_yaml};
let workflow = parse_workflow_yaml(yaml_content)?;
let graph = TaskGraph::from_workflow(&workflow)?;
// Get entry points (tasks with no dependencies)
for entry in &graph.entry_points {
println!("Entry point: {}", entry);
}
// Get tasks ready to execute
let completed = HashSet::new();
let ready = graph.ready_tasks(&completed);
```
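The ready-task computation used above can be sketched as a filter over the dependency map: a task is ready when it has not yet run and all of its dependencies are in the completed set. The sketch below is a simplification with hypothetical types; the real `TaskGraph` also accounts for failed and skipped tasks.

```rust
use std::collections::{HashMap, HashSet};

/// Given each task's dependency list, return the tasks whose dependencies
/// are all satisfied and which have not yet run themselves.
/// (Sketch of the `ready_tasks` idea; not the actual TaskGraph method.)
fn ready_tasks(
    deps: &HashMap<String, Vec<String>>,
    completed: &HashSet<String>,
) -> Vec<String> {
    let mut ready: Vec<String> = deps
        .iter()
        .filter(|(name, reqs)| {
            !completed.contains(name.as_str())
                && reqs.iter().all(|d| completed.contains(d))
        })
        .map(|(name, _)| name.clone())
        .collect();
    ready.sort(); // deterministic order for display and testing
    ready
}
```

Tasks with an empty dependency list are the entry points: they are ready immediately when nothing has completed yet.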
### 2. Context Manager (`workflow/context.rs`)
**Purpose:** Manages workflow execution context, including variables, parameters, and template rendering.
**Key Features:**
- Workflow-level and task-level variable management
- Jinja2-like template rendering with `{{ variable }}` syntax
- Task result storage and retrieval
- With-items iteration support (current item and index)
- Nested value access (e.g., `{{ parameters.config.server.port }}`)
- Context import/export for persistence
**Variable Scopes:**
- `parameters.*` - Input parameters to the workflow
- `vars.*` or `variables.*` - Workflow-scoped variables
- `task.*` or `tasks.*` - Task results
- `item` - Current item in with-items iteration
- `index` - Current index in with-items iteration
- `system.*` - System variables (e.g., workflow start time)
**Example Usage:**
```rust
use attune_executor::workflow::WorkflowContext;
use serde_json::json;
let params = json!({"name": "Alice"});
let mut ctx = WorkflowContext::new(params, HashMap::new());
// Render template
let result = ctx.render_template("Hello {{ parameters.name }}!")?;
// Result: "Hello Alice!"
// Store task result
ctx.set_task_result("task1", json!({"status": "success"}));
// Publish variables from result
let result = json!({"output": "value"});
ctx.publish_from_result(&result, &["my_var".to_string()], None)?;
```
### 3. Task Executor (`workflow/task_executor.rs`)
**Purpose:** Executes individual workflow tasks with support for different task types, retries, and timeouts.
**Key Features:**
- Action task execution (queues actions for workers)
- Parallel task execution (spawns multiple tasks concurrently)
- Workflow task execution (nested workflows - TODO)
- With-items iteration (batch processing with concurrency limits)
- Conditional execution (when clauses)
- Retry logic with configurable backoff strategies
- Timeout handling
- Task result publishing to context
**Task Types:**
- **Action**: Execute a single action
- **Parallel**: Execute multiple sub-tasks concurrently
- **Workflow**: Execute a nested workflow (not yet implemented)
**Retry Strategies:**
- **Constant**: Fixed delay between retries
- **Linear**: Linearly increasing delay
- **Exponential**: Exponentially increasing delay with optional max delay
**Example Task Execution Flow:**
```
1. Check if task should be skipped (when condition)
2. Check if task has with-items iteration
- If yes, process items in batches with concurrency limits
- If no, execute single task
3. Render task input with context
4. Execute based on task type (action/parallel/workflow)
5. Apply timeout if configured
6. Handle retries on failure
7. Publish variables from result
8. Update task execution record in database
```
### 4. Workflow Coordinator (`workflow/coordinator.rs`)
**Purpose:** Main orchestration component that manages the complete workflow execution lifecycle.
**Key Features:**
- Workflow lifecycle management (start, pause, resume, cancel)
- State management (completed, failed, skipped tasks)
- Concurrent task execution coordination
- Database state persistence
- Execution result aggregation
- Error handling and recovery
**Workflow Execution States:**
- `Requested` - Workflow execution requested
- `Scheduling` - Being scheduled
- `Scheduled` - Ready to execute
- `Running` - Currently executing
- `Completed` - Successfully completed
- `Failed` - Failed with errors
- `Cancelled` - Cancelled by user
- `Timeout` - Timed out
**Example Usage:**
```rust
use attune_executor::workflow::WorkflowCoordinator;
use serde_json::json;
let coordinator = WorkflowCoordinator::new(db_pool, mq);
// Start workflow execution
let handle = coordinator
.start_workflow("my_pack.my_workflow", json!({"param": "value"}), None)
.await?;
// Execute to completion
let result = handle.execute().await?;
println!("Status: {:?}", result.status);
println!("Completed tasks: {}", result.completed_tasks);
println!("Failed tasks: {}", result.failed_tasks);
// Or control execution
handle.pause(Some("User requested pause".to_string())).await?;
handle.resume().await?;
handle.cancel().await?;
// Check status
let status = handle.status().await;
println!("Current: {}/{} tasks", status.completed_tasks, status.total_tasks);
```
## Execution Flow
### High-Level Workflow Execution
```
1. Load workflow definition from database
2. Parse workflow YAML definition
3. Build task graph with dependencies
4. Create parent execution record
5. Initialize workflow context with parameters and variables
6. Create workflow execution record in database
7. Enter execution loop:
a. Check if workflow is paused -> wait
b. Check if workflow is complete -> exit
c. Get ready tasks (dependencies satisfied)
d. Spawn async execution for each ready task
e. Wait briefly before checking again
8. Aggregate results and return
```
### Task Execution Flow
```
1. Create task execution record in database
2. Get current workflow context
3. Execute task (action/parallel/workflow/with-items)
4. Update task execution record with result
5. Update workflow state:
- Add to completed_tasks on success
- Add to failed_tasks on failure (unless retrying)
- Add to skipped_tasks if skipped
- Update context with task result
6. Persist workflow state to database
```
## Database Schema
### workflow_execution Table
Stores workflow execution state:
```sql
CREATE TABLE attune.workflow_execution (
id BIGSERIAL PRIMARY KEY,
execution BIGINT NOT NULL REFERENCES attune.execution(id),
workflow_def BIGINT NOT NULL REFERENCES attune.workflow_definition(id),
current_tasks TEXT[] NOT NULL DEFAULT '{}',
completed_tasks TEXT[] NOT NULL DEFAULT '{}',
failed_tasks TEXT[] NOT NULL DEFAULT '{}',
skipped_tasks TEXT[] NOT NULL DEFAULT '{}',
variables JSONB NOT NULL DEFAULT '{}',
task_graph JSONB NOT NULL,
status execution_status_enum NOT NULL,
error_message TEXT,
paused BOOLEAN NOT NULL DEFAULT false,
pause_reason TEXT,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```
### workflow_task_execution Table
Stores individual task execution state:
```sql
CREATE TABLE attune.workflow_task_execution (
id BIGSERIAL PRIMARY KEY,
workflow_execution BIGINT NOT NULL REFERENCES attune.workflow_execution(id),
execution BIGINT NOT NULL REFERENCES attune.execution(id),
task_name TEXT NOT NULL,
task_index INTEGER,
task_batch INTEGER,
status execution_status_enum NOT NULL,
started_at TIMESTAMP WITH TIME ZONE,
completed_at TIMESTAMP WITH TIME ZONE,
duration_ms BIGINT,
result JSONB,
error JSONB,
retry_count INTEGER NOT NULL DEFAULT 0,
max_retries INTEGER NOT NULL DEFAULT 0,
next_retry_at TIMESTAMP WITH TIME ZONE,
timeout_seconds INTEGER,
timed_out BOOLEAN NOT NULL DEFAULT false,
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```
## Template Rendering
### Syntax
Templates use Jinja2-like syntax with `{{ expression }}`:
```yaml
tasks:
- name: greet
action: core.echo
input:
message: "Hello {{ parameters.name }}!"
- name: process
action: core.process
input:
data: "{{ task.greet.result.output }}"
count: "{{ variables.counter }}"
```
### Supported Expressions
**Parameters:**
```
{{ parameters.name }}
{{ parameters.config.server.port }}
```
**Variables:**
```
{{ vars.my_variable }}
{{ variables.counter }}
{{ my_var }} # Direct variable reference
```
**Task Results:**
```
{{ task.task_name.result }}
{{ task.task_name.output.key }}
{{ tasks.previous_task.status }}
```
**With-Items Context:**
```
{{ item }}
{{ item.name }}
{{ index }}
```
**System Variables:**
```
{{ system.workflow_start }}
```
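The core of `{{ expression }}` rendering is placeholder substitution against the context. A minimal sketch is shown below; it resolves only a flat variable map and leaves unknown placeholders intact, whereas the real renderer also resolves nested paths like `parameters.config.server.port` and non-string values.

```rust
use std::collections::HashMap;

/// Minimal `{{ name }}` substitution over a flat variable map.
/// (Illustration only; not the actual context renderer.)
fn render_template(template: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let key = after[..end].trim();
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Leave unknown placeholders untouched.
                        out.push_str("{{");
                        out.push_str(&after[..end]);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unclosed placeholder: emit the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```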
## With-Items Iteration
Execute a task multiple times with different items:
```yaml
tasks:
- name: process_servers
action: server.configure
with_items: "{{ parameters.servers }}"
batch_size: 5 # Process 5 items at a time
concurrency: 10 # Max 10 concurrent executions
input:
server: "{{ item.hostname }}"
index: "{{ index }}"
```
**Features:**
- Batch processing: Process items in batches of specified size
- Concurrency control: Limit number of concurrent executions
- Context isolation: Each iteration has its own `item` and `index`
- Result aggregation: All results collected in array
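The `batch_size` behavior above reduces to splitting the item list into fixed-size chunks before execution. A sketch of that split follows; the function is hypothetical, and the real executor additionally enforces the `concurrency` limit within each batch.

```rust
/// Split a list of items into batches of at most `batch_size`,
/// mirroring the `batch_size` option of with-items.
/// (Sketch only; concurrency limiting is not shown.)
fn into_batches<T: Clone>(items: &[T], batch_size: usize) -> Vec<Vec<T>> {
    items
        .chunks(batch_size.max(1)) // guard against batch_size = 0
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With 12 servers and `batch_size: 5`, this yields three batches of 5, 5, and 2 items.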
## Retry Strategies
### Constant Backoff
Fixed delay between retries:
```yaml
tasks:
- name: flaky_task
action: external.api_call
retry:
count: 3
delay: 10 # 10 seconds between each retry
backoff: constant
```
### Linear Backoff
Linearly increasing delay:
```yaml
retry:
count: 5
delay: 5
backoff: linear
# Delays: 5s, 10s, 15s, 20s, 25s
```
### Exponential Backoff
Exponentially increasing delay:
```yaml
retry:
count: 5
delay: 2
backoff: exponential
max_delay: 60
# Delays: 2s, 4s, 8s, 16s, 32s (capped at 60s)
```
## Task Transitions
Control workflow flow with transitions:
```yaml
tasks:
- name: check
action: core.check_status
on_success: deploy # Go to deploy on success
on_failure: rollback # Go to rollback on failure
on_complete: notify # Always go to notify
on_timeout: alert # Go to alert on timeout
- name: decision
action: core.evaluate
decision:
- when: "{{ task.decision.result.action == 'approve' }}"
next: deploy
- when: "{{ task.decision.result.action == 'reject' }}"
next: rollback
- default: true
next: manual_review
```
## Error Handling
### Task Execution Errors
Errors are captured with:
- Error message
- Error type
- Optional error details (JSON)
### Workflow Failure Handling
- Individual task failures don't immediately stop the workflow
- Dependent tasks won't execute if prerequisites failed
- Workflow completes when all executable tasks finish
- Final status is `Failed` if any task failed
### Retry on Error
```yaml
retry:
count: 3
delay: 5
backoff: exponential
on_error: "{{ result.error_code == 'RETRY_ABLE' }}" # Only retry specific errors
```
## Parallel Execution
Execute multiple tasks concurrently:
```yaml
tasks:
- name: parallel_checks
type: parallel
tasks:
- name: check_service_a
action: monitoring.check_health
input:
service: "service-a"
- name: check_service_b
action: monitoring.check_health
input:
service: "service-b"
- name: check_database
action: monitoring.check_db
on_success: deploy
on_failure: abort
```
**Features:**
- All sub-tasks execute concurrently
- Parent task waits for all sub-tasks to complete
- Success only if all sub-tasks succeed
- Individual sub-task results aggregated
## Conditional Execution
Skip tasks based on conditions:
```yaml
tasks:
- name: deploy
action: deployment.deploy
when: "{{ parameters.environment == 'production' }}"
input:
version: "{{ parameters.version }}"
```
**When Clause Evaluation:**
- Template rendered with current context
- Evaluated as boolean (truthy/falsy)
- Task skipped if condition is false
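Since the when clause is rendered to a string before evaluation, the truthy/falsy step can be sketched as a string check. The exact falsy set below is an assumption for illustration, not the engine's definitive behavior.

```rust
/// Interpret a rendered when-clause as a boolean.
/// (Sketch; the exact falsy set is an assumption.)
fn is_truthy(rendered: &str) -> bool {
    !matches!(
        rendered.trim().to_ascii_lowercase().as_str(),
        "" | "false" | "0" | "null" | "none"
    )
}
```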
## State Persistence
Workflow state is persisted to the database after every task completion:
- Current executing tasks
- Completed tasks list
- Failed tasks list
- Skipped tasks list
- Workflow variables (entire context)
- Execution status
- Pause state and reason
- Error messages
This enables:
- Workflow resume after service restart
- Pause/resume functionality
- Execution history and auditing
- Progress monitoring
## Integration Points
### Message Queue
Tasks queue action executions via RabbitMQ:
```rust
// Task executor creates execution record
let execution = create_execution_record(...).await?;
// Queues execution for worker (TODO: implement MQ publishing)
self.mq.publish_execution_request(execution.id, action_ref, &input).await?;
```
### Worker Coordination
- Executor creates execution records
- Workers pick up and execute actions
- Workers update execution status
- Coordinator monitors completion (TODO: implement completion listener)
### Event Publishing
Workflow events should be published for:
- Workflow started
- Workflow completed/failed
- Task started/completed/failed
- Workflow paused/resumed/cancelled
## Future Enhancements
### TODO Items
1. **Completion Listener**: Listen for task completion events from workers
2. **Nested Workflows**: Execute workflows as tasks within workflows
3. **MQ Publishing**: Implement actual message queue publishing for action execution
4. **Advanced Expressions**: Support comparisons, logical operators in templates
5. **Error Condition Evaluation**: Evaluate `on_error` expressions for selective retries
6. **Workflow Timeouts**: Global workflow timeout configuration
7. **Task Dependencies**: Explicit `depends_on` task specification
8. **Loop Constructs**: While/until loops in addition to with-items
9. **Manual Steps**: Human-in-the-loop approval tasks
10. **Sub-workflow Output**: Capture and use nested workflow results
## Testing
### Unit Tests
Each module includes unit tests:
```bash
# Run all executor tests
cargo test -p attune-executor
# Run specific module tests
cargo test -p attune-executor --lib workflow::graph
cargo test -p attune-executor --lib workflow::context
```
### Integration Tests
Integration tests require database and message queue:
```bash
# Set up test database
export DATABASE_URL="postgresql://attune_test:attune_test@localhost:5432/attune_test"
sqlx migrate run
# Run integration tests
cargo test -p attune-executor --test '*'
```
## Performance Considerations
### Concurrency
- Parallel tasks execute truly concurrently using `futures::join_all`
- With-items supports configurable concurrency limits
- Task graph execution is optimized with topological sorting
### Database Operations
- Workflow state persisted after each task completion
- Batch operations used where possible
- Connection pooling for database access
### Memory
- Task graphs and contexts can be large for complex workflows
- Consider workflow size limits in production
- Context variables should be reasonably sized
## Troubleshooting
### Workflow Not Progressing
**Symptoms**: Workflow stuck in Running state
**Causes**:
- Circular dependencies (should be caught during parsing)
- All tasks waiting on failed dependencies
- Database connection issues
**Solution**: Check workflow state in database, review task dependencies
### Tasks Not Executing
**Symptoms**: Ready tasks not starting
**Causes**:
- Worker service not running
- Message queue not connected
- Execution records not being created
**Solution**: Check worker logs, verify MQ connection, check database
### Template Rendering Errors
**Symptoms**: Tasks fail with template errors
**Causes**:
- Invalid variable references
- Missing context data
- Malformed expressions
**Solution**: Validate templates, check available context variables
## Examples
See `docs/workflows/` for complete workflow examples demonstrating:
- Sequential workflows
- Parallel execution
- With-items iteration
- Conditional execution
- Error handling and retries
- Complex workflows with decisions
## Related Documentation
- [Workflow Definition Format](workflow-definition-format.md)
- [Pack Integration](api-pack-workflows.md)
- [Execution API](api-executions.md)
- [Message Queue Architecture](message-queue.md)

---
# Workflow Orchestration Implementation Plan
## Executive Summary
This document outlines the implementation plan for adding **workflow orchestration** capabilities to Attune. Workflows enable composing multiple actions into complex, conditional execution graphs with variable passing, iteration, and error handling.
## Key Design Decisions
### 1. Workflows as Actions
Workflows are first-class actions that can be:
- Triggered by rules (event-driven)
- Invoked by other workflows (composable)
- Executed directly via API
- Referenced in the same way as regular actions
### 2. YAML-Based Definition
Workflows are defined declaratively in YAML files within pack directories, making them:
- Version-controllable
- Human-readable
- Easy to author and maintain
- Portable across environments
### 3. Event-Driven Execution
Workflows leverage the existing message queue infrastructure:
- Each task creates a child execution
- Tasks execute asynchronously via workers
- Progress is tracked via execution status messages
- No blocking or polling required
### 4. Multi-Scope Variable System
Variables are accessible from 6 scopes (in precedence order):
1. `task.*` - Results from completed tasks
2. `vars.*` - Workflow-scoped variables
3. `parameters.*` - Input parameters
4. `pack.config.*` - Pack configuration
5. `system.*` - System variables (execution_id, timestamp, identity)
6. `kv.*` - Key-value datastore
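Precedence-ordered resolution means checking each scope in turn and returning the first hit. A minimal sketch, using string values and an ordered slice of named scopes (the types are illustrative, not the planned API):

```rust
use std::collections::HashMap;

/// Resolve a variable by checking each scope in precedence order and
/// returning the first match, tagged with the scope it came from.
/// Scopes should be ordered: task, vars, parameters, pack.config, system, kv.
/// (Sketch with string values only.)
fn resolve<'a>(
    scopes: &'a [(&'a str, HashMap<String, String>)],
    key: &str,
) -> Option<(&'a str, &'a String)> {
    scopes
        .iter()
        .find_map(|(name, map)| map.get(key).map(|v| (*name, v)))
}
```

A name published by a task thus shadows an input parameter of the same name, because `task.*` is checked first.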
## Architecture Overview
```
┌────────────────────────────────────────────────────────────┐
│ Attune Platform │
├────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌────────────┐ ┌──────────────────┐ │
│ │ API Service │ │ Executor │ │ Worker Service │ │
│ │ │ │ Service │ │ │ │
│ │ Workflow │ │ │ │ Runtime Engine │ │
│ │ CRUD │ │ ┌────────┐ │ │ │ │
│ │ │ │ │Workflow│ │ │ Execute Actions │ │
│ │ │ │ │Engine │ │ │ │ │
│ └─────────────┘ │ │ │ │ └──────────────────┘ │
│ │ │- Parser│ │ │
│ │ │- Graph │ │ │
│ │ │- Context│ │ │
│ │ │- Sched │ │ │
│ │ └────────┘ │ │
│ └────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ PostgreSQL Database │ │
│ │ - workflow_definition │ │
│ │ - workflow_execution │ │
│ │ - workflow_task_execution │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────┘
```
## Database Schema Changes
### New Tables
1. **`workflow_definition`**
- Stores parsed workflow YAML as JSON
- Links to pack
- Contains parameter/output schemas
- Full task graph definition
2. **`workflow_execution`**
- Tracks runtime state of workflow
- Stores variable context
- Maintains task completion tracking
- Links to parent execution
3. **`workflow_task_execution`**
- Individual task execution tracking
- Supports iteration (with-items)
- Retry tracking
- Result storage
### Modified Tables
- **`action`** table gets two new columns:
- `is_workflow` (boolean)
- `workflow_def` (foreign key)
## Core Features
### 1. Sequential Execution
Tasks execute one after another based on transitions:
```yaml
tasks:
- name: task1
action: pack.action1
on_success: task2
- name: task2
action: pack.action2
```
### 2. Parallel Execution
Multiple tasks execute concurrently:
```yaml
tasks:
- name: parallel_checks
type: parallel
tasks:
- name: check_db
action: db.health
- name: check_cache
action: cache.health
on_success: deploy
```
### 3. Conditional Branching
Execute tasks based on conditions:
```yaml
tasks:
- name: check_env
action: core.noop
decision:
- when: "{{ parameters.env == 'production' }}"
next: require_approval
- default: deploy_directly
```
### 4. Iteration (with-items)
Process lists with optional batching:
```yaml
tasks:
- name: deploy_regions
action: deploy.to_region
with_items: "{{ parameters.regions }}"
batch_size: 5 # Process 5 at a time
input:
region: "{{ item }}"
```
### 5. Variable Publishing
Tasks can publish results to workflow scope:
```yaml
tasks:
- name: create_resource
action: cloud.create
publish:
- resource_id: "{{ task.create_resource.result.id }}"
- resource_url: "{{ task.create_resource.result.url }}"
```
### 6. Error Handling & Retry
Built-in retry with backoff:
```yaml
tasks:
- name: flaky_task
action: http.request
retry:
count: 5
delay: 10
backoff: exponential
on_success: next_task
on_failure: cleanup_task
```
### 7. Human-in-the-Loop
Integrate inquiry (approval) steps:
```yaml
tasks:
- name: require_approval
action: core.inquiry
input:
prompt: "Approve deployment?"
schema:
type: object
properties:
approved:
type: boolean
decision:
- when: "{{ task.require_approval.result.approved }}"
next: deploy
- default: cancel
```
### 8. Nested Workflows
Workflows can invoke other workflows:
```yaml
tasks:
- name: provision_infra
action: infrastructure.full_stack # This is also a workflow
input:
environment: "{{ parameters.env }}"
```
## Template System
### Template Engine: Tera (Rust)
- Jinja2-like syntax
- Variable interpolation: `{{ vars.name }}`
- Filters: `{{ text | upper }}`
- Conditionals: `{% if value %}yes{% else %}no{% endif %}`
### Helper Functions
```yaml
# String operations
message: "{{ parameters.name | upper | trim }}"
# List operations
first: "{{ vars.list | first }}"
count: "{{ vars.list | length }}"
# JSON operations
parsed: "{{ vars.json_string | from_json }}"
# Batching
batches: "{{ vars.items | batch(size=100) }}"
# Key-value store
value: "{{ kv.get('config.key', default='fallback') }}"
secret: "{{ kv.get_secret('api.token') }}"
```
## Workflow Lifecycle
```
1. Rule/API triggers workflow action
2. Executor detects is_workflow=true
3. Load workflow_definition from database
4. Create workflow_execution record
5. Initialize variable context with parameters
6. Build task dependency graph
7. Schedule initial tasks (entry points)
8. For each task:
a. Template task inputs
b. Create child execution
c. Create workflow_task_execution record
d. Publish execution.scheduled message
9. Worker executes task, publishes result
10. Workflow Engine receives completion:
a. Update workflow_task_execution
b. Publish variables to context
c. Evaluate transitions
d. Schedule next tasks
11. Repeat until all tasks complete
12. Update workflow_execution status
13. Publish workflow.completed event
```
## Implementation Phases
### Phase 1: Foundation (2 weeks)
**Goal**: Core data structures and parsing
- [ ] Database migration for workflow tables
- [ ] Add workflow models to `common/src/models.rs`
- [ ] Create workflow repositories
- [ ] Implement YAML parser for workflow definitions
- [ ] Integrate Tera template engine
- [ ] Create variable context manager
**Deliverables**:
- Migration file: `migrations/020_workflow_orchestration.sql`
- Models: `common/src/models/workflow.rs`
- Repositories: `common/src/repositories/workflow*.rs`
- Parser: `executor/src/workflow/parser.rs`
- Context: `executor/src/workflow/context.rs`
### Phase 2: Execution Engine (2 weeks)
**Goal**: Core workflow execution logic
- [ ] Implement task graph builder
- [ ] Implement graph traversal logic
- [ ] Create workflow executor service
- [ ] Add workflow message handlers
- [ ] Implement task scheduling
- [ ] Handle task completion events
**Deliverables**:
- Graph engine: `executor/src/workflow/graph.rs`
- Executor: `executor/src/workflow/executor.rs`
- Message handlers: `executor/src/workflow/messages.rs`
- State machine: `executor/src/workflow/state.rs`
### Phase 3: Advanced Features (2 weeks)
**Goal**: Iteration, parallelism, error handling
- [ ] Implement with-items iteration
- [ ] Add batching support
- [ ] Implement parallel task execution
- [ ] Add retry logic with backoff
- [ ] Implement timeout handling
- [ ] Add conditional branching (decision trees)
**Deliverables**:
- Iterator: `executor/src/workflow/iterator.rs`
- Parallel executor: `executor/src/workflow/parallel.rs`
- Retry handler: `executor/src/workflow/retry.rs`
### Phase 4: API & Tools (2 weeks)
**Goal**: Management interface and tooling
- [ ] Workflow CRUD API endpoints
- [ ] Workflow execution monitoring API
- [ ] Control operations (pause/resume/cancel)
- [ ] Workflow validation CLI command
- [ ] Workflow visualization endpoint
- [ ] Pack registration workflow scanning
**Deliverables**:
- API routes: `api/src/routes/workflows.rs`
- API handlers: `api/src/handlers/workflows.rs`
- CLI commands: `cli/src/commands/workflow.rs` (future)
- Documentation updates
### Phase 5: Testing & Documentation (1 week)
**Goal**: Comprehensive testing and docs
- [ ] Unit tests for all components
- [ ] Integration tests for workflows
- [ ] Example workflows (simple, complex, failure cases)
- [ ] User documentation
- [ ] API documentation
- [ ] Migration guide
**Deliverables**:
- Test suite: `executor/tests/workflow_tests.rs`
- Examples: `docs/examples/workflow-*.yaml`
- User guide: `docs/workflow-user-guide.md`
- Migration guide: `docs/workflow-migration.md`
## Total Timeline: 9 Weeks
## Testing Strategy
### Unit Tests
- Template rendering with all scope types
- Graph construction and traversal
- Condition evaluation
- Variable publishing
- Task scheduling logic
- Retry logic
- Timeout handling
### Integration Tests
- Simple sequential workflow
- Parallel execution workflow
- Conditional branching workflow
- Iteration workflow (with batching)
- Error handling and retry
- Nested workflow execution
- Workflow cancellation
- Long-running workflow
### Example Test Workflows
Located in `docs/examples/`:
- `simple-workflow.yaml` - Basic sequential flow
- `complete-workflow.yaml` - All features demonstrated
- `parallel-workflow.yaml` - Parallel execution
- `conditional-workflow.yaml` - Branching logic
- `iteration-workflow.yaml` - with-items examples
## API Endpoints
### Workflow Management
```
POST /api/v1/packs/{pack_ref}/workflows - Create workflow
GET /api/v1/packs/{pack_ref}/workflows - List workflows in pack
GET /api/v1/workflows - List all workflows
GET /api/v1/workflows/{workflow_ref} - Get workflow definition
PUT /api/v1/workflows/{workflow_ref} - Update workflow
DELETE /api/v1/workflows/{workflow_ref} - Delete workflow
POST /api/v1/workflows/{workflow_ref}/execute - Execute workflow directly
POST /api/v1/workflows/{workflow_ref}/validate - Validate workflow definition
```
### Workflow Execution Management
```
GET /api/v1/workflow-executions - List workflow executions
GET /api/v1/workflow-executions/{id} - Get workflow execution details
GET /api/v1/workflow-executions/{id}/tasks - List task executions
GET /api/v1/workflow-executions/{id}/graph - Get execution graph (visualization)
GET /api/v1/workflow-executions/{id}/context - Get variable context
POST /api/v1/workflow-executions/{id}/pause - Pause workflow
POST /api/v1/workflow-executions/{id}/resume - Resume paused workflow
POST /api/v1/workflow-executions/{id}/cancel - Cancel workflow
POST /api/v1/workflow-executions/{id}/retry - Retry failed workflow
```
## Pack Structure with Workflows
```
packs/
└── my_pack/
├── pack.yaml # Pack metadata
├── config.yaml # Pack configuration schema
├── actions/
│ ├── action1.py
│ ├── action2.py
│ └── action.yaml
├── sensors/
│ ├── sensor1.py
│ └── sensor.yaml
├── workflows/ # NEW: Workflow definitions
│ ├── deploy.yaml
│ ├── backup.yaml
│ ├── migrate.yaml
│ └── rollback.yaml
├── rules/
│ └── on_push.yaml
└── tests/
├── test_actions.py
└── test_workflows.yaml # Workflow test definitions
```
### Pack Registration Process
When a pack is registered:
1. Scan `workflows/` directory for `.yaml` files
2. Parse and validate each workflow definition
3. Create `workflow_definition` record in database
4. Create synthetic `action` record with `is_workflow=true`
5. Link action to workflow via `workflow_def` foreign key
6. Workflow is now invokable like any other action
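Step 1 of the registration process is straightforward directory scanning; a minimal sketch using only `std::fs` (the function name is hypothetical, and the real implementation would feed each path into the parser from step 2):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Collect every `.yaml` file under the pack's workflows/ directory.
// Packs without a workflows/ directory simply register no workflows.
fn scan_workflow_files(pack_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let workflows_dir = pack_dir.join("workflows");
    if !workflows_dir.is_dir() {
        return Ok(Vec::new());
    }
    let mut files: Vec<PathBuf> = fs::read_dir(&workflows_dir)?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|p| p.extension().and_then(|e| e.to_str()) == Some("yaml"))
        .collect();
    files.sort(); // deterministic registration order
    Ok(files)
}
```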
## Performance Considerations
### Optimizations
1. **Graph Caching**: Cache parsed task graphs per workflow definition
2. **Template Compilation**: Compile templates once, reuse for iterations
3. **Parallel Scheduling**: Schedule independent tasks concurrently
4. **Database Batching**: Batch task creation/updates when using with-items
5. **Context Serialization**: Use efficient JSON serialization for variable context
### Resource Limits
- Max workflow depth: 10 levels (prevent infinite recursion)
- Max tasks per workflow: 1000 (prevent resource exhaustion)
- Max iterations per with-items: 10,000 (configurable)
- Max parallel tasks: 100 (configurable)
- Variable context size: 10MB (prevent memory issues)
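These limits could live in a config struct checked by the executor before scheduling; a sketch with assumed field names, showing the nesting-depth guard against infinite recursion:

```rust
// Assumed shape: the documented resource limits as executor configuration.
struct WorkflowLimits {
    max_depth: u32,
    max_tasks: usize,
    max_iterations: usize,
    max_parallel: usize,
}

impl Default for WorkflowLimits {
    fn default() -> Self {
        Self {
            max_depth: 10,
            max_tasks: 1_000,
            max_iterations: 10_000,
            max_parallel: 100,
        }
    }
}

// Guard evaluated before a nested workflow is scheduled (step 4 of the
// pack registration path makes workflows invokable like actions, so a
// workflow can recurse into itself without this check).
fn check_nesting(limits: &WorkflowLimits, depth: u32) -> Result<(), String> {
    if depth >= limits.max_depth {
        Err(format!(
            "workflow depth {} exceeds limit {}",
            depth, limits.max_depth
        ))
    } else {
        Ok(())
    }
}
```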
## Security Considerations
1. **Template Injection**: Sanitize all template inputs, no arbitrary code execution
2. **Variable Scoping**: Strict isolation between workflow executions
3. **Secret Access**: Only allow `kv.get_secret()` for authorized identities
4. **Resource Limits**: Enforce max task count, depth, iterations
5. **Audit Trail**: Log all workflow decisions, transitions, variable changes
6. **RBAC**: Workflow execution requires action execution permissions
7. **Input Validation**: Validate parameters against param_schema
## Monitoring & Observability
### Metrics to Track
- Workflow executions per second
- Average workflow duration
- Task execution duration (p50, p95, p99)
- Workflow success/failure rates
- Task retry counts
- Queue depth for workflow tasks
- Variable context size distribution
### Logging Standards
```
INFO [workflow.start] execution=123 workflow=deploy_app version=1.0.0
INFO [workflow.task.schedule] execution=123 task=build_image
INFO [workflow.task.complete] execution=123 task=build_image duration=45s
INFO [workflow.vars.publish] execution=123 vars=["image_uri"]
INFO [workflow.task.schedule] execution=123 tasks=["deploy","health_check"]
WARN [workflow.task.retry] execution=123 task=flaky_api attempt=2
ERROR [workflow.task.failed] execution=123 task=deploy_db error="connection_timeout"
INFO [workflow.complete] execution=123 status=success duration=2m30s
```
### Distributed Tracing
- Propagate `trace_id` through entire workflow
- Link all task executions to parent workflow
- Enable end-to-end request tracing
- Integration with OpenTelemetry (future)
## Dependencies
### New Rust Crates
- **tera** (^1.19) - Template engine
- **petgraph** (^0.6) - Graph data structures and algorithms
### Existing Dependencies
- sqlx - Database access
- serde/serde_json - Serialization
- tokio - Async runtime
- lapin - RabbitMQ client
## Future Enhancements
### Short Term (3-6 months)
- Workflow versioning (multiple versions of same workflow)
- Workflow pausing/resuming with state persistence
- Advanced retry strategies (circuit breaker, adaptive)
- Workflow templates (reusable patterns)
### Medium Term (6-12 months)
- Dynamic workflows (generate graph at runtime)
- Workflow debugging tools (step-through execution)
- Performance analytics and optimization suggestions
- Workflow marketplace (share workflows)
### Long Term (12+ months)
- Visual workflow editor (drag-and-drop UI)
- AI-powered workflow generation
- Workflow optimization recommendations
- Multi-cloud orchestration patterns
## Success Criteria
This implementation will be considered successful when:
1. ✅ Workflows can be defined in YAML and registered via packs
2. ✅ Workflows execute reliably with all features working
3. ✅ Variables are properly scoped and templated across all scopes
4. ✅ Parallel execution works correctly with proper synchronization
5. ✅ Iteration handles lists efficiently with batching
6. ✅ Error handling and retry work as specified
7. ✅ Human-in-the-loop (inquiry) tasks integrate seamlessly
8. ✅ Nested workflows execute correctly
9. ✅ API provides full CRUD and control operations
10. ✅ Comprehensive tests cover all features
11. ✅ Documentation enables users to create workflows easily
## References
- Full design: `docs/workflow-orchestration.md`
- Simple example: `docs/examples/simple-workflow.yaml`
- Complex example: `docs/examples/complete-workflow.yaml`
- Migration SQL: `docs/examples/workflow-migration.sql`
## Next Steps
1. Review this plan with stakeholders
2. Prioritize features if timeline needs adjustment
3. Set up project tracking (GitHub issues/milestones)
4. Begin Phase 1 implementation
5. Schedule weekly progress reviews

# Workflow Orchestration - Summary
## Overview
Attune's workflow orchestration system enables the composition of multiple actions into complex, conditional execution graphs. Workflows are themselves actions that can be triggered by rules, invoked by other workflows, or executed directly via API.
## Key Concepts
### 1. Workflows are Actions
- Workflows are first-class actions with `is_workflow=true`
- Can be triggered by rules (event-driven)
- Can invoke other workflows (composable)
- Can be executed directly via API
### 2. YAML-Based Definitions
- Workflows defined in `packs/{pack}/workflows/*.yaml`
- Declarative, version-controlled
- Parsed and stored as JSON in database
- Portable across environments
### 3. Task Graph Execution
- Each task is a node in an execution graph
- Tasks transition based on success/failure/completion
- Supports sequential, parallel, and conditional execution
- Fully asynchronous via message queue
### 4. Multi-Scope Variables
Variables accessible from 6 scopes (precedence order):
1. **`task.*`** - Results from completed tasks
- `{{ task.build_image.result.image_uri }}`
- `{{ task.health_check.status }}`
2. **`vars.*`** - Workflow-scoped variables
- `{{ vars.deployment_id }}`
- Set via `publish` directives
3. **`parameters.*`** - Input parameters
- `{{ parameters.app_name }}`
- `{{ parameters.environment }}`
4. **`pack.config.*`** - Pack configuration
- `{{ pack.config.api_key }}`
- `{{ pack.config.base_url }}`
5. **`system.*`** - System variables
- `{{ system.execution_id }}`
- `{{ system.timestamp }}`
- `{{ system.identity.login }}`
6. **`kv.*`** - Key-value datastore
- `{{ kv.get('feature.flags.enabled') }}`
- `{{ kv.get_secret('api.token') }}`
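Resolving a reference like `vars.deployment_id` amounts to dispatching on its leading segment to the matching scope. A minimal sketch, assuming each scope is stored as a flat map keyed by the remaining dotted path (the real context manager would hold structured JSON and handle `kv.*` via datastore calls):

```rust
use std::collections::HashMap;

// Resolve "scope.rest.of.path" by looking up the leading segment as the
// scope name and the remainder as the key within that scope's flat map.
fn resolve<'a>(
    scopes: &'a HashMap<&str, HashMap<String, String>>,
    reference: &str,
) -> Option<&'a str> {
    let (scope, key) = reference.split_once('.')?;
    scopes.get(scope)?.get(key).map(String::as_str)
}
```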
## Core Features
### Sequential Execution
```yaml
tasks:
- name: task1
action: pack.action1
on_success: task2
- name: task2
action: pack.action2
on_success: task3
```
### Parallel Execution
```yaml
tasks:
- name: parallel_checks
type: parallel
tasks:
- name: check_database
action: db.health_check
- name: check_cache
action: redis.ping
- name: check_queue
action: rabbitmq.status
on_success: deploy_app
```
### Conditional Branching
```yaml
tasks:
- name: check_environment
action: core.noop
decision:
- when: "{{ parameters.environment == 'production' }}"
next: require_approval
- when: "{{ parameters.environment == 'staging' }}"
next: run_tests
- default: deploy_directly
```
### Iteration (with-items)
```yaml
tasks:
- name: deploy_to_regions
action: cloud.deploy
with_items: "{{ parameters.regions }}"
batch_size: 5 # Process 5 regions at a time
input:
region: "{{ item }}"
version: "{{ parameters.version }}"
```
### Variable Publishing
```yaml
tasks:
- name: create_deployment
action: deployments.create
input:
app_name: "{{ parameters.app_name }}"
publish:
- deployment_id: "{{ task.create_deployment.result.id }}"
- health_url: "{{ task.create_deployment.result.url }}/health"
```
### Error Handling & Retry
```yaml
tasks:
- name: flaky_api_call
action: http.post
input:
url: "{{ vars.api_endpoint }}"
retry:
count: 5
delay: 10
backoff: exponential
max_delay: 60
on_success: process_response
on_failure: log_error_and_continue
```
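The `backoff: exponential` / `max_delay` behavior above can be sketched as a pure delay calculation (assumed semantics: the delay doubles per attempt, capped at `max_delay`):

```rust
use std::time::Duration;

// Delay before retry `attempt` (1-based): base * 2^(attempt-1), capped.
// Saturating arithmetic avoids overflow for large attempt counts.
fn retry_delay(base_secs: u64, max_secs: u64, attempt: u32) -> Duration {
    let factor = 2u64.saturating_pow(attempt.saturating_sub(1));
    Duration::from_secs(base_secs.saturating_mul(factor).min(max_secs))
}
```

With the example's `delay: 10` and `max_delay: 60`, the five attempts would wait 10s, 20s, 40s, 60s, 60s.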
### Human-in-the-Loop
```yaml
tasks:
- name: require_approval
action: core.inquiry
input:
prompt: "Approve production deployment?"
schema:
type: object
properties:
approved:
type: boolean
timeout: 3600 # 1 hour
decision:
- when: "{{ task.require_approval.result.approved == true }}"
next: deploy_to_production
- default: cancel_deployment
```
### Nested Workflows
```yaml
tasks:
- name: provision_infrastructure
action: infra.full_stack_workflow # This is also a workflow
input:
environment: "{{ parameters.environment }}"
region: "{{ parameters.region }}"
on_success: deploy_application
```
## Template System
### Tera Template Engine
Jinja2-like syntax for variable interpolation:
```yaml
# String operations
message: "{{ parameters.app_name | upper | trim }}"
# List operations
first_region: "{{ parameters.regions | first }}"
region_count: "{{ parameters.regions | length }}"
# JSON operations
config: "{{ vars.json_string | from_json }}"
# Batching helper
batches: "{{ vars.large_list | batch(size=100) }}"
# Conditionals
status: "{% if vars.success %}deployed{% else %}failed{% endif %}"
# Key-value store
api_key: "{{ kv.get_secret('service.api_key') }}"
feature_enabled: "{{ kv.get('flags.new_feature', default=false) }}"
```
## Architecture
```
┌─────────────────────────────────────────────────┐
│ Workflow Execution │
├─────────────────────────────────────────────────┤
│ │
│ 1. Rule/API triggers workflow action │
│ 2. Executor loads workflow definition │
│ 3. Create workflow_execution record │
│ 4. Initialize variable context │
│ 5. Build task dependency graph │
│ 6. Schedule initial tasks │
│ 7. For each task: │
│ a. Template inputs using context │
│ b. Create child execution │
│ c. Worker executes action │
│ d. Update task result │
│ e. Publish variables │
│ f. Evaluate transitions │
│ g. Schedule next tasks │
│ 8. Complete when all tasks done │
│ │
└─────────────────────────────────────────────────┘
```
## Database Schema
### New Tables
1. **`workflow_definition`** - Stores workflow YAML as JSON
2. **`workflow_execution`** - Tracks workflow runtime state
3. **`workflow_task_execution`** - Individual task executions
### Modified Tables
- **`action`** table: Add `is_workflow` and `workflow_def` columns
## API Endpoints
### Workflow Management
```
POST /api/v1/packs/{pack_ref}/workflows - Create workflow
GET /api/v1/packs/{pack_ref}/workflows - List workflows
GET /api/v1/workflows/{workflow_ref} - Get workflow
PUT /api/v1/workflows/{workflow_ref} - Update workflow
DELETE /api/v1/workflows/{workflow_ref} - Delete workflow
POST /api/v1/workflows/{workflow_ref}/execute - Execute workflow
```
### Execution Management
```
GET /api/v1/workflow-executions/{id} - Get execution
GET /api/v1/workflow-executions/{id}/tasks - List tasks
GET /api/v1/workflow-executions/{id}/graph - Get graph
POST /api/v1/workflow-executions/{id}/pause - Pause
POST /api/v1/workflow-executions/{id}/resume - Resume
POST /api/v1/workflow-executions/{id}/cancel - Cancel
```
## Implementation Timeline
### Phase 1: Foundation (2 weeks)
- Database schema and migration
- Data models and repositories
- YAML parser
- Template engine integration
### Phase 2: Execution Engine (2 weeks)
- Task graph builder
- Workflow executor
- Message handlers
- State management
### Phase 3: Advanced Features (2 weeks)
- Iteration support (with-items)
- Parallel execution
- Retry logic
- Conditional branching
### Phase 4: API & Tools (2 weeks)
- Workflow CRUD endpoints
- Execution monitoring API
- Control operations
- Validation tools
### Phase 5: Testing & Docs (1 week)
- Comprehensive tests
- Example workflows
- User documentation
**Total: 9 weeks**
## Example Workflow Structure
```yaml
ref: my_pack.deploy_workflow
label: "Deploy Application"
description: "Deploys application with health checks"
version: "1.0.0"
parameters:
app_name:
type: string
required: true
version:
type: string
required: true
environment:
type: string
enum: [dev, staging, production]
vars:
deployment_id: null
health_url: null
tasks:
- name: create_deployment
action: deployments.create
input:
app_name: "{{ parameters.app_name }}"
version: "{{ parameters.version }}"
publish:
- deployment_id: "{{ task.create_deployment.result.id }}"
on_success: build_image
- name: build_image
action: docker.build
input:
app: "{{ parameters.app_name }}"
tag: "{{ parameters.version }}"
on_success: deploy
on_failure: cleanup
- name: deploy
action: kubernetes.deploy
input:
image: "{{ task.build_image.result.image }}"
on_success: health_check
on_failure: rollback
- name: health_check
action: http.get
input:
url: "{{ task.deploy.result.health_url }}"
retry:
count: 5
delay: 10
on_success: notify_success
on_failure: rollback
- name: rollback
action: kubernetes.rollback
on_complete: notify_failure
- name: notify_success
action: slack.post
input:
message: "✅ Deployed {{ parameters.app_name }} v{{ parameters.version }}"
- name: notify_failure
action: slack.post
input:
message: "❌ Deployment failed"
output_map:
deployment_id: "{{ vars.deployment_id }}"
status: "success"
```
## Pack Structure
```
packs/my_pack/
├── pack.yaml
├── config.yaml
├── actions/
│ ├── action1.py
│ └── action.yaml
├── sensors/
│ └── sensor.yaml
├── workflows/ # NEW
│ ├── deploy.yaml
│ ├── backup.yaml
│ └── rollback.yaml
└── rules/
└── on_push.yaml
```
## Benefits
1. **Composability** - Build complex workflows from simple actions
2. **Reusability** - Share workflows across packs and organizations
3. **Maintainability** - YAML definitions are easy to read and version
4. **Observability** - Full execution tracking and tracing
5. **Flexibility** - Conditional logic, iteration, parallel execution
6. **Reliability** - Built-in retry, error handling, rollback
7. **Human Control** - Inquiry tasks for approval workflows
8. **Event-Driven** - Fully async, no blocking or polling
## Resources
- **Full Design**: `docs/workflow-orchestration.md`
- **Implementation Plan**: `docs/workflow-implementation-plan.md`
- **Simple Example**: `docs/examples/simple-workflow.yaml`
- **Complex Example**: `docs/examples/complete-workflow.yaml`
- **Migration SQL**: `docs/examples/workflow-migration.sql`