David Culbreth e31ecb781b more internal polish, resilient workers

2026-02-09 18:32:34 -06:00

Executor Service Architecture

Overview

The Executor Service is the core orchestration engine of the Attune automation platform. It is responsible for processing rule enforcements, scheduling executions to workers, managing execution lifecycle, and orchestrating complex workflows.

Service Architecture

The Executor is structured as a distributed microservice with three main processing components:

┌─────────────────────────────────────────────────────────────┐
│                     Executor Service                         │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────────────┐  ┌──────────────────────┐          │
│  │ Enforcement         │  │ Execution            │          │
│  │ Processor           │  │ Scheduler            │          │
│  └─────────────────────┘  └──────────────────────┘          │
│           │                         │                         │
│           │                         │                         │
│           v                         v                         │
│  ┌─────────────────────────────────────────────┐             │
│  │         Execution Manager                   │             │
│  └─────────────────────────────────────────────┘             │
│                                                               │
└─────────────────────────────────────────────────────────────┘
         │                    │                    │
         v                    v                    v
   PostgreSQL            RabbitMQ              Workers

Core Components

1. Enforcement Processor

Purpose: Processes triggered rules and creates execution requests.

Responsibilities:

Listens for enforcement.created messages from triggered rules
Fetches enforcement, rule, and event data from the database
Evaluates rule conditions and policies
Creates execution records in the database
Publishes execution.requested messages to the scheduler

Message Flow:

Rule Triggered → Enforcement Created → Enforcement Processor → Execution Created

Key Implementation Details:

Uses consume_with_handler pattern for message consumption
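All processing methods are associated functions (no self receiver), so each async handler works on cloned handles to shared state (database pool, publisher) instead of borrowing the processor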
Validates rule is enabled before creating executions
Links executions to enforcements for audit trail
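
The core decision above (skip disabled rules, otherwise create an execution linked to its enforcement) can be sketched as follows. This is a minimal illustration, not the service's code: Rule, Enforcement, NewExecution, and plan_execution are hypothetical stand-ins, and the database and message queue calls are left out.

```rust
// Hypothetical, simplified models; the real Attune types are not shown in this document.
#[derive(Debug, Clone, PartialEq)]
struct Rule {
    enabled: bool,
    action_ref: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Enforcement {
    id: i64,
}

#[derive(Debug, Clone, PartialEq)]
struct NewExecution {
    action_ref: String,
    // Link back to the enforcement so the audit trail can be followed.
    enforcement_id: i64,
    // Executions start in the Requested status.
    status: &'static str,
}

/// Returns the execution record to create, or None when the rule is disabled.
fn plan_execution(enforcement: &Enforcement, rule: &Rule) -> Option<NewExecution> {
    if !rule.enabled {
        return None;
    }
    Some(NewExecution {
        action_ref: rule.action_ref.clone(),
        enforcement_id: enforcement.id,
        status: "Requested",
    })
}
```

In the real processor the returned record would presumably be persisted through the repository layer before execution.requested is published.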

2. Execution Scheduler

Purpose: Routes execution requests to available workers.

Responsibilities:

Listens for execution.requested messages
Determines runtime requirements for the action
Selects appropriate workers based on:
- Runtime compatibility
- Worker status (active only)
- Load balancing (future: capacity, affinity, locality)
Updates execution status to Scheduled
Publishes execution.scheduled messages to worker queues

Message Flow:

Execution Requested → Scheduler → Worker Selection → Execution Scheduled → Worker

Worker Selection Algorithm:

Fetch all available workers
Filter by runtime compatibility (if action specifies runtime)
Filter by worker status (only active workers)
Apply load balancing strategy (currently: first available)
Future: Consider capacity, affinity, geographic locality
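
A minimal sketch of the selection steps listed above, assuming workers are already loaded into memory. Worker, WorkerStatus, and select_worker are hypothetical names chosen for illustration; the actual scheduler types may differ.

```rust
// Hypothetical worker model for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WorkerStatus {
    Active,
    Inactive,
}

#[derive(Debug, Clone, PartialEq)]
struct Worker {
    id: i64,
    status: WorkerStatus,
    runtimes: Vec<String>,
}

/// Picks the first active worker that supports the required runtime.
/// When the action specifies no runtime, every worker is considered compatible.
fn select_worker<'a>(workers: &'a [Worker], required_runtime: Option<&str>) -> Option<&'a Worker> {
    workers
        .iter()
        // Step 2: runtime compatibility (skipped if no runtime is required).
        .filter(|w| match required_runtime {
            Some(rt) => w.runtimes.iter().any(|r| r.as_str() == rt),
            None => true,
        })
        // Steps 3 and 4: only active workers, "first available" strategy.
        .find(|w| w.status == WorkerStatus::Active)
}
```

A None result corresponds to the "no workers available" case, which the scheduler reports as an error. Capacity, affinity, and locality would slot in as additional filters or as a ranking step before the final pick.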

Key Implementation Details:

Supports multiple worker types (local, remote, container)
Handles worker unavailability with error responses
Intelligent scheduling based on worker capabilities is planned but not yet implemented

3. Execution Manager

Purpose: Orchestrates execution workflows and handles lifecycle events.

Responsibilities:

Listens for execution.status.* messages from workers
Does NOT update execution state (worker owns state after scheduling)
Handles execution completion orchestration (triggering child executions)
Manages workflow executions (parent-child relationships)
Coordinates workflow state transitions

Ownership Model:

Executor owns: Requested → Scheduling → Scheduled (updates DB)
- Includes pre-handoff cancellations/failures (before execution.scheduled is published)
Worker owns: Running → Completed/Failed/Cancelled (updates DB)
- Includes post-handoff cancellations/failures (after receiving execution.scheduled)
Handoff Point: When execution.scheduled message is published to worker
- Before publish: Executor owns and updates state
- After publish: Worker owns and updates state
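
The ownership rule can be written down as a small function. This is only a sketch of the model described above: ExecutionStatus, Owner, and status_writer are illustrative names, and handed_off stands for "execution.scheduled has already been published".

```rust
// Illustrative status and owner enums; the real definitions are not shown in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecutionStatus {
    Requested,
    Scheduling,
    Scheduled,
    Running,
    Completed,
    Failed,
    Cancelled,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Owner {
    Executor,
    Worker,
}

/// Returns which service is allowed to write `status` to the database.
/// `handed_off` is true once execution.scheduled has been published.
fn status_writer(status: ExecutionStatus, handed_off: bool) -> Owner {
    use ExecutionStatus::*;
    match status {
        // Pre-handoff lifecycle is always written by the executor.
        Requested | Scheduling | Scheduled => Owner::Executor,
        // Running and successful completion only happen on a worker.
        Running | Completed => Owner::Worker,
        // Failures and cancellations belong to whoever owns the execution at that moment.
        Failed | Cancelled => {
            if handed_off {
                Owner::Worker
            } else {
                Owner::Executor
            }
        }
    }
}
```

For example, a cancellation that arrives while the execution is still queued resolves to Owner::Executor, while the same cancellation after handoff resolves to Owner::Worker.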

Message Flow:

Worker Status Update → Execution Manager → Orchestration Logic (Read-Only)
                                         → Trigger Child Executions

Status Lifecycle:

Requested → Scheduling → Scheduled → [HANDOFF: execution.scheduled published] → Running → Completed/Failed/Cancelled
    │                       │                                                     │
    └─ Executor Updates ───┘                                                     └─ Worker Updates
    │  (includes pre-handoff                                                     │  (includes post-handoff
    │   Cancelled)                                                               │   Cancelled/Timeout/Abandoned)
                                                                                  │
                                                                                  └→ Child Executions (workflows)

Key Implementation Details:

Receives status change notifications for orchestration purposes only
Does not update execution state after handoff to worker
Handles workflow orchestration (parent-child execution chaining)
Only triggers child executions on successful parent completion
Read-only access to execution records for orchestration logic

Message Queue Integration

Message Types

The Executor consumes and produces several message types:

Consumed:

enforcement.created - New enforcement from triggered rules
execution.requested - Execution scheduling requests
execution.status.changed - Status change notifications from workers (for orchestration)
execution.completed - Completion notifications from workers (for queue management)

Published:

execution.requested - To scheduler (from enforcement processor)
execution.scheduled - To workers (from scheduler) ← OWNERSHIP HANDOFF

Note: The executor does NOT publish execution.completed messages. This is the worker's responsibility as the authoritative source of execution state after scheduling.

Message Envelope Structure

All messages use the standardized MessageEnvelope<T> structure:

struct MessageEnvelope<T> {
    message_id: Uuid,
    message_type: MessageType,
    source: String,
    timestamp: DateTime<Utc>,
    correlation_id: Option<Uuid>,
    trace_id: Option<String>,
    payload: T,
    retry_count: u32,
}

Consumer Handler Pattern

All processors use the consume_with_handler pattern for robust message consumption:

consumer.consume_with_handler(move |envelope: MessageEnvelope<PayloadType>| {
    // Clone shared state
    let pool = pool.clone();
    let publisher = publisher.clone();
    
    async move {
        // Process message
        Self::process_message(&pool, &publisher, &envelope).await
            .map_err(|e| format!("Error: {}", e).into())
    }
}).await?;

Benefits:

Automatic message acknowledgment on success
Automatic nack with requeue on retriable errors
Automatic dead letter queue routing on non-retriable errors
Built-in error handling and logging

Database Integration

Repository Pattern

All database access uses the repository layer:

use attune_common::repositories::{
    enforcement::EnforcementRepository,
    execution::ExecutionRepository,
    rule::RuleRepository,
    Create, FindById, Update, List,
};

Database Update Ownership

Executor updates execution state from creation through handoff:

Creates execution records (Requested status)
Updates status during scheduling (Scheduling → Scheduled)
Publishes execution.scheduled message to worker ← HANDOFF POINT
Handles cancellations/failures BEFORE handoff (before message is published)
- Example: User cancels execution while queued by concurrency policy
- Executor updates to Cancelled, worker never receives message

Worker updates execution state after receiving handoff:

Receives execution.scheduled message (takes ownership)
Updates status when execution starts (Running)
Updates status when execution completes (Completed, Failed, etc.)
Handles cancellations/failures AFTER handoff (after receiving message)
Updates result data and artifacts
Worker only owns executions it has received

Executor reads execution state for orchestration after handoff:

Receives status change notifications from workers
Reads execution records to trigger workflow children
Does NOT update execution state after publishing execution.scheduled

Transaction Support

Future implementations will use database transactions for multi-step operations:

Creating execution + publishing message (atomic)
Enforcement processing + execution creation (atomic)

Configuration

The Executor service uses the standard Attune configuration system:

# config.yaml
database:
  url: postgresql://localhost/attune
  max_connections: 20
  
message_queue:
  url: amqp://localhost
  exchange: attune.executions
  prefetch_count: 10

Environment variable overrides:

ATTUNE__DATABASE__URL=postgresql://prod-db/attune
ATTUNE__MESSAGE_QUEUE__URL=amqp://prod-mq
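
Judging from the two examples above, override names appear to follow the pattern ATTUNE__<SECTION>__<KEY>, with a double underscore separating nesting levels. The sketch below illustrates that naming convention only; env_var_to_config_key is a hypothetical helper, not the configuration loader Attune uses.

```rust
/// Translates an override variable name into a dotted config path,
/// e.g. ATTUNE__DATABASE__URL becomes "database.url".
/// Returns None for variables that do not carry the ATTUNE__ prefix.
fn env_var_to_config_key(name: &str) -> Option<String> {
    let rest = name.strip_prefix("ATTUNE__")?;
    if rest.is_empty() {
        return None;
    }
    // A double underscore separates nesting levels; single underscores
    // (as in MESSAGE_QUEUE) stay part of the key name.
    let path = rest
        .split("__")
        .map(|segment| segment.to_lowercase())
        .collect::<Vec<_>>()
        .join(".");
    Some(path)
}
```

Under this reading, ATTUNE__MESSAGE_QUEUE__URL overrides message_queue.url in config.yaml.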

Error Handling

Error Types

The Executor handles several error categories:

Database Errors: Connection issues, query failures
Message Queue Errors: Connection drops, serialization failures
Business Logic Errors: Missing entities, invalid states
Worker Errors: No workers available, incompatible runtimes

Retry Strategy

Retriable Errors: Requeued for retry (connection issues, timeouts)
Non-Retriable Errors: Sent to dead letter queue (invalid data, missing entities)
Retry Limits: Configured per queue (future implementation)
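
One way to express this strategy is a function that maps a handler result to an acknowledgment decision. The sketch below makes several assumptions: ExecutorError, Disposition, and disposition are hypothetical names, serialization failures are treated as invalid data (non-retriable), and max_retries stands in for the per-queue retry limit that is not implemented yet.

```rust
// Hypothetical error type covering the categories listed under "Error Types".
#[derive(Debug, Clone, PartialEq)]
enum ExecutorError {
    DatabaseConnection(String),
    QueueConnection(String),
    Timeout,
    Serialization(String),
    MissingEntity(String),
    InvalidState(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Disposition {
    Ack,
    Requeue,
    DeadLetter,
}

/// Connection issues and timeouts are worth retrying; bad data is not.
fn is_retriable(err: &ExecutorError) -> bool {
    matches!(
        err,
        ExecutorError::DatabaseConnection(_) | ExecutorError::QueueConnection(_) | ExecutorError::Timeout
    )
}

/// Decides what to do with a message after its handler has run.
/// `retry_count` mirrors the field of the same name on the message envelope.
fn disposition(result: &Result<(), ExecutorError>, retry_count: u32, max_retries: u32) -> Disposition {
    match result {
        Ok(()) => Disposition::Ack,
        Err(e) if is_retriable(e) && retry_count < max_retries => Disposition::Requeue,
        // Non-retriable errors, or retriable ones that exhausted their budget.
        Err(_) => Disposition::DeadLetter,
    }
}
```

Capping retries this way keeps a message that fails repeatedly from cycling through the queue forever; once the budget is spent it lands in the dead letter queue like any non-retriable failure.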

Dead Letter Queues

Failed messages are automatically routed to dead letter queues for investigation:

executor.enforcement.created.dlq
executor.execution.requested.dlq
executor.execution.status.dlq

Workflow Orchestration

Parent-Child Executions

The Executor supports complex workflows through parent-child execution relationships:

Parent Execution (Completed)
  ├── Child Execution 1 (action_ref: "pack.action1")
  ├── Child Execution 2 (action_ref: "pack.action2")
  └── Child Execution 3 (action_ref: "pack.action3")

Implementation:

Parent execution stores child action references
On parent completion, Execution Manager creates child executions
Child executions inherit parent's configuration
Each child is independently scheduled and executed
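
A rough sketch of the fan-out step, assuming the parent record carries its child action references and a configuration value to inherit. Execution, ChildRequest, and plan_children are hypothetical names, and config is reduced to a plain string for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecutionStatus {
    Completed,
    Failed,
    Cancelled,
}

// Hypothetical, trimmed-down execution record.
#[derive(Debug, Clone, PartialEq)]
struct Execution {
    id: i64,
    status: ExecutionStatus,
    config: String,
    child_action_refs: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct ChildRequest {
    parent_id: i64,
    action_ref: String,
    config: String,
}

/// Builds one child request per stored action reference.
/// Children are only planned when the parent completed successfully.
fn plan_children(parent: &Execution) -> Vec<ChildRequest> {
    if parent.status != ExecutionStatus::Completed {
        return Vec::new();
    }
    parent
        .child_action_refs
        .iter()
        .map(|action_ref| ChildRequest {
            parent_id: parent.id,
            action_ref: action_ref.clone(),
            // Each child inherits the parent's configuration.
            config: parent.config.clone(),
        })
        .collect()
}
```

Each returned request would then go through the normal path (created as Requested, scheduled, handed off to a worker), which is what makes the children independently scheduled.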

Future Enhancements

Conditional Workflows: Execute children based on parent result
Parallel vs Sequential: Control execution order
Workflow DAGs: Complex dependency graphs
Workflow Templates: Reusable workflow definitions

Policy Enforcement

Planned Features

Rate Limiting: Limit executions per time window
Concurrency Control: Maximum concurrent executions per action/pack
Priority Queuing: High-priority executions jump the queue
Resource Quotas: Limit resource consumption per tenant
Execution Windows: Only execute during specified time periods

Implementation Location

Policy enforcement will be implemented in:

Enforcement Processor (pre-execution validation)
Scheduler (runtime constraint checking)
New PolicyEnforcer module (future)

Monitoring & Observability

Metrics (Future)

Executions per second (throughput)
Average execution duration
Queue depth and processing lag
Worker utilization
Error rates by type

Logging

Structured logging at multiple levels:

INFO: Successful operations, state transitions
WARN: Degraded states, retry attempts
ERROR: Failures requiring attention
DEBUG: Detailed flow for troubleshooting

Example:

INFO Processing enforcement: 123
INFO Selected worker 45 for execution 789
INFO Execution 789 scheduled to worker 45

Tracing

Message correlation and distributed tracing:

correlation_id: Links related messages
trace_id: End-to-end request tracing (future integration with OpenTelemetry)
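
To illustrate how correlation_id can link a chain of messages, here is a simplified sketch. It is an assumption-heavy stand-in for the real envelope: Envelope uses u64 ids where MessageEnvelope uses Uuid, follow_up is a hypothetical helper, and the rule "a message without a correlation id starts the chain with its own id" is one possible convention, not something the source specifies.

```rust
// Simplified stand-in for MessageEnvelope<T>; ids are u64 here instead of Uuid.
#[derive(Debug, Clone, PartialEq)]
struct Envelope<T> {
    message_id: u64,
    correlation_id: Option<u64>,
    trace_id: Option<String>,
    payload: T,
}

/// Builds a follow-up message that stays in the same correlation chain as `parent`.
fn follow_up<T, U>(parent: &Envelope<T>, new_id: u64, payload: U) -> Envelope<U> {
    Envelope {
        message_id: new_id,
        // Reuse the parent's correlation id; if it has none, the parent
        // is treated as the start of the chain (assumed convention).
        correlation_id: parent.correlation_id.or(Some(parent.message_id)),
        // The trace id travels unchanged from message to message.
        trace_id: parent.trace_id.clone(),
        payload,
    }
}
```

With this convention, the enforcement.created, execution.requested, and execution.scheduled messages for one rule trigger all share a single correlation id, so logs from the three components can be joined on it.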

Running the Service

Prerequisites

PostgreSQL 14+ with schema initialized
RabbitMQ 3.12+ with exchanges and queues configured
Environment variables or config file set up

Startup

# Using cargo
cd crates/executor
cargo run

# Or with environment overrides
ATTUNE__DATABASE__URL=postgresql://localhost/attune \
ATTUNE__MESSAGE_QUEUE__URL=amqp://localhost \
cargo run

Graceful Shutdown

The service supports graceful shutdown via SIGTERM/SIGINT:

Stop accepting new messages
Finish processing in-flight messages
Close message queue connections
Close database connections
Exit cleanly
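
The drain-then-exit ordering can be demonstrated with the standard library alone. The sketch below is an analogy rather than the service's shutdown code: it uses a std channel and a thread in place of RabbitMQ consumers and an async runtime, and run_until_drained is a hypothetical name. Signal handling is omitted; dropping the sender plays the role of the shutdown signal.

```rust
use std::sync::mpsc;
use std::thread;

/// Feeds `messages` to a consumer thread, signals shutdown, and returns
/// how many messages the consumer handled before exiting.
fn run_until_drained(messages: Vec<String>) -> usize {
    let (tx, rx) = mpsc::channel::<String>();

    let consumer = thread::spawn(move || {
        let mut handled = 0;
        // recv() still returns messages that were queued before the sender
        // was dropped and only fails once the queue is empty, so work that
        // was already accepted is finished before the loop ends.
        while let Ok(_message) = rx.recv() {
            handled += 1;
        }
        handled
    });

    for message in messages {
        tx.send(message).expect("consumer is still running");
    }

    // Shutdown: dropping the sender means no new messages can be accepted.
    drop(tx);

    // Wait for the consumer to drain before releasing other resources
    // (message queue and database connections in the real service).
    consumer.join().expect("consumer thread panicked")
}
```

The important property is the order: intake stops first, accepted work completes, and only then are connections closed.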

Testing

Unit Tests

Each module includes unit tests for business logic:

Rule evaluation
Worker selection algorithms
Status parsing
Workflow creation

Integration Tests

Integration tests require PostgreSQL and RabbitMQ:

End-to-end enforcement → execution flow
Message queue reliability
Database consistency

Running Tests

# Unit tests only
cargo test -p attune-executor --lib

# Integration tests (requires services)
cargo test -p attune-executor --test '*'

Future Enhancements

Phase 1: Core Functionality (Current)

✅ Enforcement processing
✅ Execution scheduling
✅ Lifecycle management
✅ Message queue integration

Phase 2: Advanced Features (Next)

Policy enforcement (rate limiting, concurrency)
Advanced workflow orchestration
Inquiry handling (human-in-the-loop)
Retry and failure handling improvements

Phase 3: Production Readiness

Comprehensive monitoring and metrics
Performance optimization
High availability setup
Load testing and tuning

Phase 4: Enterprise Features

Multi-tenancy isolation
Advanced scheduling algorithms
Resource quotas and limits
Audit logging and compliance

Troubleshooting

Common Issues

Problem: Executions stuck in "Requested" status

Cause: Scheduler not running or no workers available
Solution: Verify scheduler is running, check worker status

Problem: Messages not being consumed

Cause: RabbitMQ connection issues or queue misconfiguration
Solution: Check MQ connection, verify queue bindings

Problem: Database connection errors

Cause: Connection pool exhausted or database down
Solution: Increase the pool size (database.max_connections in config.yaml), check database health

Debug Mode

Enable detailed logging:

RUST_LOG=attune_executor=debug,attune_common=debug cargo run
