
David Culbreth 3b14c65998 re-uploading work

2026-02-04 17:46:30 -06:00


Execution Hierarchy and Parent Relationships

Overview

The execution table supports two types of parent-child relationships:

General execution hierarchies (via parent field)
Workflow task executions (via workflow_task metadata)

This document explains why both are needed, how they differ, and when to use each.

Field Purposes

`execution.parent` (General Hierarchy)

Type: Option<Id> - Foreign key to execution.id

Purpose: Generic execution tree traversal for ANY type of parent-child relationship.

Used for:

Workflow tasks: Parent is the workflow's main execution record
Child actions: Parent is the action that spawned them
Nested workflows: Parent is the outer workflow's execution
Any future parent-child patterns

Example SQL:

-- Find all child executions (any type)
SELECT * FROM attune.execution WHERE parent = 100;

`execution.workflow_task.workflow_execution` (Workflow-Specific)

Type: Id within WorkflowTaskMetadata JSONB - References workflow_execution.id

Purpose: Direct link to workflow orchestration state.

Provides access to:

Task graph structure
Workflow variables
Current/completed/failed task lists
Workflow-specific metadata

Example SQL:

-- Find all tasks in a specific workflow
SELECT * FROM attune.execution 
WHERE workflow_task->>'workflow_execution' = '50';
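The two lookups can be compared side by side with a minimal in-memory sketch. The structs below are simplified stand-ins for the real models (only the fields named in this document are kept), and `children_of`, `tasks_of_workflow`, and `sample_executions` are hypothetical helper names used for illustration only.

```rust
// Hypothetical in-memory stand-ins for the real models; field names follow
// this document, everything else is an assumption.
type Id = i64;

#[derive(Debug)]
#[allow(dead_code)]
struct WorkflowTaskMetadata {
    workflow_execution: Id, // references workflow_execution.id
    task_name: String,
}

#[derive(Debug)]
#[allow(dead_code)]
struct Execution {
    id: Id,
    action_ref: String,
    parent: Option<Id>, // references execution.id
    workflow_task: Option<WorkflowTaskMetadata>,
}

/// Mirrors `SELECT * FROM attune.execution WHERE parent = $1`.
fn children_of(executions: &[Execution], parent_id: Id) -> Vec<&Execution> {
    executions
        .iter()
        .filter(|e| e.parent == Some(parent_id))
        .collect()
}

/// Mirrors `WHERE workflow_task->>'workflow_execution' = $1`.
fn tasks_of_workflow(executions: &[Execution], workflow_execution_id: Id) -> Vec<&Execution> {
    executions
        .iter()
        .filter(|e| {
            e.workflow_task
                .as_ref()
                .map_or(false, |wt| wt.workflow_execution == workflow_execution_id)
        })
        .collect()
}

/// Sample rows: workflow 100, one workflow task (101), one plain child action (102).
fn sample_executions() -> Vec<Execution> {
    vec![
        Execution {
            id: 100,
            action_ref: "my_pack.my_workflow".to_string(),
            parent: None,
            workflow_task: None,
        },
        Execution {
            id: 101,
            action_ref: "my_pack.send_email".to_string(),
            parent: Some(100),
            workflow_task: Some(WorkflowTaskMetadata {
                workflow_execution: 50,
                task_name: "send_email".to_string(),
            }),
        },
        Execution {
            id: 102,
            action_ref: "my_pack.child_action_1".to_string(),
            parent: Some(100),
            workflow_task: None,
        },
    ]
}
```

With these sample rows, `children_of(&rows, 100)` yields both 101 and 102, while `tasks_of_workflow(&rows, 50)` yields only 101: the plain child action shares the parent but carries no workflow metadata.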

Workflow Task Execution Structure

When a workflow executes, three types of records are created:

┌─────────────────────────────────────────────────────────────┐
│ 1. Parent Execution (the workflow itself as an execution)  │
├─────────────────────────────────────────────────────────────┤
│ id: 100                                                     │
│ action_ref: "my_pack.my_workflow"                          │
│ parent: None (or outer workflow ID if nested)              │
│ workflow_task: None                                         │
│ status: running                                             │
└─────────────────────────────────────────────────────────────┘
                            ▲
                            │
                            │ references (execution field)
                            │
┌─────────────────────────────────────────────────────────────┐
│ 2. Workflow Execution Record (orchestration state)         │
├─────────────────────────────────────────────────────────────┤
│ id: 50                                                      │
│ execution: 100          ← points to parent execution       │
│ workflow_def: 10                                            │
│ task_graph: {...}                                           │
│ variables: {...}                                            │
│ current_tasks: ["send_email", "process_data"]              │
│ completed_tasks: []                                         │
│ failed_tasks: []                                            │
└─────────────────────────────────────────────────────────────┘
                            ▲
                            │
                            │ references (workflow_execution)
                            │
┌─────────────────────────────────────────────────────────────┐
│ 3. Task Execution (one per workflow task)                  │
├─────────────────────────────────────────────────────────────┤
│ id: 101                                                     │
│ action_ref: "my_pack.send_email"                           │
│ parent: 100             ← points to parent execution       │
│ workflow_task: {                                            │
│   workflow_execution: 50  ← points to workflow_execution   │
│   task_name: "send_email",                                 │
│   task_index: null,                                         │
│   retry_count: 0,                                           │
│   max_retries: 3,                                           │
│   ...                                                       │
│ }                                                           │
│ status: running                                             │
└─────────────────────────────────────────────────────────────┘

Relationship Diagram

┌─────────────────────┐
│  Task Execution     │
│  (id: 101)          │
│                     │
│  parent: 100        │──────┐
│                     │      │
│  workflow_task: {   │      │
│    workflow_exec: 50│──┐   │
│  }                  │  │   │
└─────────────────────┘  │   │
                         │   │
                         │   ▼
                         │  ┌─────────────────────┐
                         │  │ Parent Execution    │
                         │  │ (id: 100)           │
                         │  │ [The Workflow]      │
                         │  └─────────────────────┘
                         │           ▲
                         │           │
                         │           │ execution: 100
                         │           │
                         │  ┌─────────────────────┐
                         └─▶│ Workflow Execution  │
                            │ (id: 50)            │
                            │ [Orchestration]     │
                            └─────────────────────┘

Key: Both parent and workflow_task.workflow_execution ultimately reference the same workflow, but serve different query patterns.
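To make the "same workflow, two paths" point concrete, here is a small sketch that resolves the workflow's execution id both ways. The types are deliberately reduced, and `workflow_via_parent` and `workflow_via_metadata` are illustrative names, not functions from the codebase.

```rust
use std::collections::HashMap;

type Id = i64;

#[allow(dead_code)]
struct WorkflowExecution {
    id: Id,
    execution: Id, // references the parent execution (the workflow itself)
}

#[allow(dead_code)]
struct TaskExecution {
    id: Id,
    parent: Option<Id>,
    workflow_execution: Option<Id>, // stands in for workflow_task.workflow_execution
}

/// Path 1: follow `parent` directly.
fn workflow_via_parent(task: &TaskExecution) -> Option<Id> {
    task.parent
}

/// Path 2: follow workflow_task.workflow_execution, then workflow_execution.execution.
fn workflow_via_metadata(
    task: &TaskExecution,
    workflow_executions: &HashMap<Id, WorkflowExecution>,
) -> Option<Id> {
    let we_id = task.workflow_execution?;
    workflow_executions.get(&we_id).map(|we| we.execution)
}
```

For the task in the diagram (id 101, parent 100, workflow_execution 50), both functions return `Some(100)`. For an execution without workflow metadata, only the `parent` path yields an answer.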

Why Both Fields Are Needed

✅ Reason 1: `parent` is Generic

The parent field is used for all types of execution hierarchies, not just workflows:

Example 1: Action spawning child actions

// Parent action execution
let parent_exec = create_execution("my_pack.parent_action").await?;

// Child action executions (NOT workflow tasks)
let child1 = CreateExecutionInput {
    action_ref: "my_pack.child_action_1".to_string(),
    parent: Some(parent_exec.id),
    workflow_task: None,  // Not a workflow task!
    ...
};

Example 2: Nested workflows

// Outer workflow execution
let outer_workflow = create_workflow("outer_workflow").await?;

// Inner workflow execution (nested)
let inner_workflow = CreateExecutionInput {
    action_ref: "inner_workflow".to_string(),
    parent: Some(outer_workflow.id),
    workflow_task: None,  // This is a workflow, not a task
    ...
};

✅ Reason 2: Workflow-Specific State is Separate

The workflow_execution table contains orchestration state that doesn't belong in the main execution record:

Task graph: Directed acyclic graph of task dependencies
Workflow variables: Scoped variable context
Task tracking: current_tasks, completed_tasks, failed_tasks arrays
Workflow metadata: pause_reason, error_message, etc.

A task reaches this state directly through workflow_task.workflow_execution, without first joining through its parent execution.
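As a rough illustration of why this state lives in its own record, the sketch below models only the task-tracking arrays. It assumes that a finished task moves from `current_tasks` into `completed_tasks` or `failed_tasks`. That transition rule, the `WorkflowState` type, and `finish_task` are assumptions made for the example; the actual coordinator may behave differently.

```rust
/// Hypothetical, simplified orchestration state; the real workflow_execution
/// row also holds the task graph, variables, and other metadata.
#[derive(Debug, Default)]
struct WorkflowState {
    current_tasks: Vec<String>,
    completed_tasks: Vec<String>,
    failed_tasks: Vec<String>,
}

impl WorkflowState {
    /// Move a task out of `current_tasks` into the completed or failed list.
    /// Returns false if the task was not currently running.
    fn finish_task(&mut self, task_name: &str, succeeded: bool) -> bool {
        let Some(pos) = self
            .current_tasks
            .iter()
            .position(|t| t.as_str() == task_name)
        else {
            return false;
        };
        let task = self.current_tasks.remove(pos);
        if succeeded {
            self.completed_tasks.push(task);
        } else {
            self.failed_tasks.push(task);
        }
        true
    }
}
```

None of these fields describe a single execution, which is the argument for keeping them out of the execution row.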

✅ Reason 3: Query Efficiency

Without a direct workflow_execution reference, finding the workflow state for a task requires:

-- BAD: Two JOINs required
SELECT we.* 
FROM attune.execution task
JOIN attune.execution parent ON task.parent = parent.id
JOIN attune.workflow_execution we ON we.execution = parent.id
WHERE task.id = 101;

With direct reference:

-- GOOD: Direct lookup by workflow_execution id, no JOIN through the parent
SELECT we.*
FROM attune.workflow_execution we
WHERE we.id = (
    SELECT (workflow_task->>'workflow_execution')::bigint 
    FROM attune.execution 
    WHERE id = 101
);

✅ Reason 4: Clear Semantics

parent = "What execution spawned me?"
workflow_task.workflow_execution = "What workflow orchestration state do I belong to?"

These are related but semantically different questions.

Use Cases and Query Patterns

Use Case 1: Generic Execution Tree Traversal

// Get ALL child executions (workflow tasks, child actions, anything)
async fn get_children(pool: &PgPool, parent_id: Id) -> Result<Vec<Execution>> {
    sqlx::query_as::<_, Execution>(
        "SELECT * FROM attune.execution WHERE parent = $1"
    )
    .bind(parent_id)
    .fetch_all(pool)
    .await
    .map_err(Into::into)
}

// Works for workflows, actions, any execution type
let all_children = get_children(&pool, parent_exec_id).await?;

Use Case 2: Workflow Task Queries

// Get all tasks for a workflow execution
let tasks = ExecutionRepository::find_by_workflow_execution(
    &pool, 
    workflow_execution_id
).await?;

// Implementation uses direct JSONB query:
// WHERE workflow_task->>'workflow_execution' = $1

Use Case 3: Workflow State Access

// From a task execution, get the workflow state
async fn get_workflow_state(
    pool: &PgPool, 
    task_exec: &Execution
) -> Result<Option<WorkflowExecution>> {
    if let Some(wt) = &task_exec.workflow_task {
        let workflow_exec = WorkflowExecutionRepository::find_by_id(
            pool, 
            wt.workflow_execution
        ).await?;
        Ok(Some(workflow_exec))
    } else {
        Ok(None)
    }
}

// Without the direct link, we would need to:
// 1. Get the parent execution via task_exec.parent
// 2. Find the workflow_execution row WHERE execution = parent

Use Case 4: Hierarchical Display

// Display execution tree with proper indentation
async fn display_execution_tree(pool: &PgPool, root_id: Id, indent: usize) {
    let exec = ExecutionRepository::find_by_id(pool, root_id).await.unwrap();
    println!("{:indent$}├─ {} ({})", "", exec.action_ref, exec.status, indent = indent);
    
    // Get children using generic parent relationship
    let children = sqlx::query_as::<_, Execution>(
        "SELECT * FROM attune.execution WHERE parent = $1"
    )
    .bind(root_id)
    .fetch_all(pool)
    .await
    .unwrap();
    
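    for child in children {
        // A recursive async call needs indirection so the future has a known
        // size; Box::pin inside an async fn works on Rust 1.77 and later.
        Box::pin(display_execution_tree(pool, child.id, indent + 2)).await;
    }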
}
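The same traversal can be sketched synchronously over an in-memory slice, which makes the output easy to inspect. `render_tree` and the trimmed-down `Execution` struct are hypothetical. The sketch prints the id where the original prints the status, and it assumes the `parent` links contain no cycles.

```rust
type Id = i64;

struct Execution {
    id: Id,
    action_ref: String,
    parent: Option<Id>,
}

/// Render the subtree rooted at `root_id`, one line per execution,
/// indenting two spaces per level. Children are found via `parent` only.
fn render_tree(executions: &[Execution], root_id: Id, indent: usize, out: &mut String) {
    let Some(exec) = executions.iter().find(|e| e.id == root_id) else {
        return; // unknown id: render nothing
    };
    out.push_str(&format!(
        "{:indent$}├─ {} ({})\n",
        "",
        exec.action_ref,
        exec.id,
        indent = indent
    ));
    for child in executions.iter().filter(|e| e.parent == Some(root_id)) {
        render_tree(executions, child.id, indent + 2, out);
    }
}
```

Because only `parent` is consulted, workflow tasks and plain child actions are rendered by the same code path.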

The Redundancy Trade-off

For Workflow Tasks: Yes, There's Redundancy

task.parent 
  → parent_execution (id: 100)
    ← workflow_execution.execution

task.workflow_task.workflow_execution 
  → workflow_execution (id: 50)
    → parent_execution (id: 100)

Both ultimately point to the same workflow, just through different paths.

Why This Is Acceptable

Performance: The direct link avoids the JOIN through the parent execution; reading one key from a JSONB column is cheap by comparison
Clarity: Explicit workflow relationship vs generic parent relationship
Flexibility: parent can be used for non-workflow patterns
Consistency: All executions use parent the same way

Alternatives Considered

❌ Alternative 1: Remove `workflow_execution` from metadata

Problem: Forces 2-JOIN queries to access workflow state

-- Every workflow task query becomes complex
SELECT we.* 
FROM attune.execution task
JOIN attune.execution parent ON task.parent = parent.id
JOIN attune.workflow_execution we ON we.execution = parent.id
WHERE task.workflow_task IS NOT NULL;

❌ Alternative 2: Remove `parent` for workflow tasks

Problem: Breaks generic execution tree queries

-- Would need an extra OR branch over a (hypothetical) parent_execution key in the JSONB metadata
SELECT * FROM attune.execution 
WHERE parent = $1 
   OR (workflow_task IS NOT NULL 
       AND (workflow_task->>'parent_execution')::bigint = $1);

✅ Current Approach: Keep Both

Small redundancy in exchange for:

Simple generic queries via parent
Efficient workflow queries via workflow_task.workflow_execution
Clear separation of concerns

Validation and Best Practices

Validation Logic (Optional)

For data integrity, you could validate consistency:

async fn validate_workflow_task_consistency(
    pool: &PgPool,
    task_exec: &Execution
) -> Result<()> {
    if let Some(wt) = &task_exec.workflow_task {
        // Get workflow_execution record
        let workflow_exec = WorkflowExecutionRepository::find_by_id(
            pool, 
            wt.workflow_execution
        ).await?;
        
        // Ensure parent matches workflow_execution.execution
        if task_exec.parent != Some(workflow_exec.execution) {
            return Err(Error::validation(format!(
                "Inconsistent parent: task.parent={:?}, workflow_exec.execution={}",
                task_exec.parent, workflow_exec.execution
            )));
        }
    }
    Ok(())
}
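The rule being checked is small enough to express without a database. The following sketch applies it to plain values, under the assumption that a map from workflow_execution.id to workflow_execution.execution is already loaded. `Task`, `check_consistency`, and the map are illustrative only.

```rust
use std::collections::HashMap;

type Id = i64;

struct Task {
    id: Id,
    parent: Option<Id>,
    workflow_execution: Option<Id>, // workflow_task.workflow_execution, if a workflow task
}

/// `workflow_parents` maps workflow_execution.id -> workflow_execution.execution.
fn check_consistency(task: &Task, workflow_parents: &HashMap<Id, Id>) -> Result<(), String> {
    let Some(we_id) = task.workflow_execution else {
        return Ok(()); // not a workflow task: nothing to cross-check
    };
    let expected_parent = workflow_parents
        .get(&we_id)
        .ok_or_else(|| format!("task {}: unknown workflow_execution {}", task.id, we_id))?;
    if task.parent != Some(*expected_parent) {
        return Err(format!(
            "task {}: parent={:?}, but workflow_execution {} belongs to execution {}",
            task.id, task.parent, we_id, expected_parent
        ));
    }
    Ok(())
}
```

Unlike the async version above, this sketch also reports a dangling workflow_execution reference as an error instead of relying on the repository lookup to fail.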

Helper Methods (Recommended)

Add convenience methods to the Execution model:

impl Execution {
    /// Check if this execution is a workflow task
    pub fn is_workflow_task(&self) -> bool {
        self.workflow_task.is_some()
    }
    
    /// Get the workflow_execution record if this is a workflow task
    pub async fn get_workflow_execution(
        &self, 
        pool: &PgPool
    ) -> Result<Option<WorkflowExecution>> {
        if let Some(wt) = &self.workflow_task {
            let we = WorkflowExecutionRepository::find_by_id(pool, wt.workflow_execution).await?;
            Ok(Some(we))
        } else {
            Ok(None)
        }
    }
    
    /// Get the parent execution
    pub async fn get_parent(&self, pool: &PgPool) -> Result<Option<Execution>> {
        if let Some(parent_id) = self.parent {
            ExecutionRepository::find_by_id(pool, parent_id).await
        } else {
            Ok(None)
        }
    }
    
    /// Get all child executions (generic, works for any execution type)
    pub async fn get_children(&self, pool: &PgPool) -> Result<Vec<Execution>> {
        sqlx::query_as::<_, Execution>(
            "SELECT * FROM attune.execution WHERE parent = $1 ORDER BY created"
        )
        .bind(self.id)
        .fetch_all(pool)
        .await
        .map_err(Into::into)
    }
}
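`get_parent` returns a single level. Walking all the way to the root is a natural companion, sketched here over an in-memory map from execution id to parent id. `ancestors` is a hypothetical helper, and the cycle guard is a defensive assumption; nothing in this document says cycles can occur.

```rust
use std::collections::{HashMap, HashSet};

type Id = i64;

/// `parents` maps execution.id -> execution.parent.
/// Returns ancestor ids from nearest to root; stops on a missing row or a cycle.
fn ancestors(parents: &HashMap<Id, Option<Id>>, start: Id) -> Vec<Id> {
    let mut chain = Vec::new();
    let mut seen = HashSet::from([start]);
    let mut current = start;
    while let Some(Some(parent_id)) = parents.get(&current).copied() {
        if !seen.insert(parent_id) {
            break; // cycle guard: corrupt data should not loop forever
        }
        chain.push(parent_id);
        current = parent_id;
    }
    chain
}
```

For a task 101 inside workflow 100, which is itself nested under outer workflow 1, the result is `[100, 1]`.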

Summary

Key Takeaways

parent is a generic field for ALL execution hierarchies (workflows, child actions, nested workflows)
workflow_task.workflow_execution is a workflow-specific optimization for direct access to orchestration state
Both are needed because:
- parent must remain generic for non-workflow use cases
- Direct workflow_execution link avoids expensive JOINs
- Different query patterns benefit from each approach
The redundancy is acceptable because:
- It's limited to workflow tasks only (not all executions)
- Performance gain from avoiding JOINs
- Clearer semantics for different use cases

When to Use Which

| Scenario | Use `parent` | Use `workflow_task.workflow_execution` |
| --- | --- | --- |
| Get child executions (any type) | ✅ | ❌ |
| Build execution tree | ✅ | ❌ |
| Find all workflow tasks | ❌ | ✅ |
| Access workflow state | ❌ | ✅ |
| Non-workflow parent-child | ✅ | N/A |

Design Principle

Separation of concerns:

parent: Structural relationship (execution hierarchy)
workflow_task.workflow_execution: Semantic relationship (workflow orchestration)

This follows the principle that a workflow task has TWO relationships:

As a child in the execution tree (parent)
As a task in a workflow (workflow_task.workflow_execution)

Both are valid, serve different purposes, and should coexist.

References

Migration: migrations/20260127212500_consolidate_workflow_task_execution.sql
Models: crates/common/src/models.rs (Execution, WorkflowTaskMetadata)
Repositories: crates/common/src/repositories/execution.rs
Workflow Coordinator: crates/executor/src/workflow/coordinator.rs
