Executor Service Architecture
Overview
The Executor Service is the core orchestration engine of the Attune automation platform. It is responsible for processing rule enforcements, scheduling executions to workers, managing execution lifecycle, and orchestrating complex workflows.
Service Architecture
The Executor is structured as a distributed microservice with three main processing components:
┌─────────────────────────────────────────────────────────────┐
│ Executor Service │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ ┌──────────────────────┐ │
│ │ Enforcement │ │ Execution │ │
│ │ Processor │ │ Scheduler │ │
│ └─────────────────────┘ └──────────────────────┘ │
│ │ │ │
│ │ │ │
│ v v │
│ ┌─────────────────────────────────────────────┐ │
│ │ Execution Manager │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
│ │ │
v v v
PostgreSQL RabbitMQ Workers
Core Components
1. Enforcement Processor
Purpose: Processes triggered rules and creates execution requests.
Responsibilities:
- Listens for enforcement.created messages from triggered rules
- Fetches enforcement, rule, and event data from the database
- Evaluates rule conditions and policies
- Creates execution records in the database
- Publishes execution.requested messages to the scheduler
Message Flow:
Rule Triggered → Enforcement Created → Enforcement Processor → Execution Created
Key Implementation Details:
- Uses the consume_with_handler pattern for message consumption
- All processing methods are static (associated functions), so each async handler works on cloned shared state instead of borrowing self
- Validates rule is enabled before creating executions
- Links executions to enforcements for audit trail
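The checks listed above can be sketched as a pure decision function. This is a minimal illustration, not the service's actual code: Rule, Enforcement, ExecutionRequest, EnforcementError, and plan_execution are hypothetical stand-ins for the real models in attune_common, and database access and message publishing are left out.

```rust
/// Hypothetical, simplified stand-ins for the real database models.
#[derive(Debug, Clone)]
struct Rule {
    id: i64,
    enabled: bool,
    action_ref: String,
}

#[derive(Debug, Clone)]
struct Enforcement {
    id: i64,
    rule_id: i64,
}

#[derive(Debug, Clone, PartialEq)]
struct ExecutionRequest {
    /// Link back to the enforcement, kept for the audit trail.
    enforcement_id: i64,
    action_ref: String,
    status: &'static str,
}

#[derive(Debug, PartialEq)]
enum EnforcementError {
    RuleMismatch,
    RuleDisabled,
}

/// Decides whether an enforcement should produce an execution.
/// Assumption: the rule has already been fetched from the database.
fn plan_execution(
    enforcement: &Enforcement,
    rule: &Rule,
) -> Result<ExecutionRequest, EnforcementError> {
    if enforcement.rule_id != rule.id {
        return Err(EnforcementError::RuleMismatch);
    }
    // Disabled rules never create executions.
    if !rule.enabled {
        return Err(EnforcementError::RuleDisabled);
    }
    Ok(ExecutionRequest {
        enforcement_id: enforcement.id,
        action_ref: rule.action_ref.clone(),
        status: "Requested",
    })
}
```

Keeping the decision free of I/O makes it easy to unit test; the real processor would wrap it with the repository calls and the execution.requested publish.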
2. Execution Scheduler
Purpose: Routes execution requests to available workers.
Responsibilities:
- Listens for execution.requested messages
- Determines runtime requirements for the action
- Selects appropriate workers based on:
  - Runtime compatibility
  - Worker status (active only)
  - Load balancing (future: capacity, affinity, locality)
- Updates execution status to Scheduled
- Publishes execution.scheduled messages to worker queues
Message Flow:
Execution Requested → Scheduler → Worker Selection → Execution Scheduled → Worker
Worker Selection Algorithm:
- Fetch all available workers
- Filter by runtime compatibility (if action specifies runtime)
- Filter by worker status (only active workers)
- Apply load balancing strategy (currently: first available)
- Future: Consider capacity, affinity, geographic locality
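The selection steps above can be sketched as a filter chain. This is only an illustration under assumptions: Worker, WorkerStatus, and select_worker are hypothetical names, the runtime strings are made up, and the real scheduler reads workers from the database rather than from a slice.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum WorkerStatus {
    Active,
    Inactive,
}

/// Hypothetical, simplified worker record.
#[derive(Debug, Clone)]
struct Worker {
    id: i64,
    status: WorkerStatus,
    runtimes: Vec<String>,
}

/// Returns the first active worker that supports the required runtime.
/// When the action specifies no runtime, every worker is considered compatible.
fn select_worker<'a>(
    workers: &'a [Worker],
    required_runtime: Option<&str>,
) -> Option<&'a Worker> {
    workers
        .iter()
        // Step 2: runtime compatibility (skipped when no runtime is required)
        .filter(|w| match required_runtime {
            Some(rt) => w.runtimes.iter().any(|r| r.as_str() == rt),
            None => true,
        })
        // Steps 3 and 4: keep active workers only, then take the first available
        .find(|w| w.status == WorkerStatus::Active)
}
```

A None result corresponds to the "no workers available" error case; a smarter load balancing strategy would replace the final find with a ranking step.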
Key Implementation Details:
- Supports multiple worker types (local, remote, container)
- Handles worker unavailability with error responses
- Plans for intelligent scheduling based on worker capabilities
3. Execution Manager
Purpose: Orchestrates execution workflows and handles lifecycle events.
Responsibilities:
- Listens for execution.status.* messages from workers
- Does NOT update execution state (worker owns state after scheduling)
- Handles execution completion orchestration (triggering child executions)
- Manages workflow executions (parent-child relationships)
- Coordinates workflow state transitions
Ownership Model:
- Executor owns: Requested → Scheduling → Scheduled (updates DB)
  - Includes pre-handoff cancellations/failures (before execution.scheduled is published)
- Worker owns: Running → Completed/Failed/Cancelled (updates DB)
  - Includes post-handoff cancellations/failures (after receiving execution.scheduled)
- Handoff Point: When the execution.scheduled message is published to the worker
  - Before publish: Executor owns and updates state
  - After publish: Worker owns and updates state
Message Flow:
Worker Status Update → Execution Manager → Orchestration Logic (Read-Only)
→ Trigger Child Executions
Status Lifecycle:
Requested → Scheduling → Scheduled → [HANDOFF: execution.scheduled published] → Running → Completed/Failed/Cancelled
│ │ │
└─ Executor Updates ───┘ └─ Worker Updates
│ (includes pre-handoff │ (includes post-handoff
│ Cancelled) │ Cancelled/Timeout/Abandoned)
│
└→ Child Executions (workflows)
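The ownership rule in the lifecycle above can be expressed as two guard functions keyed on whether execution.scheduled has been published. This is a sketch under assumptions: Status, executor_may_apply, and worker_may_apply are hypothetical names, the exact set of allowed transitions is inferred from the diagram, and the Timeout and Abandoned outcomes are omitted for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Requested,
    Scheduling,
    Scheduled,
    Running,
    Completed,
    Failed,
    Cancelled,
}

/// The executor may only write state before the handoff message is published.
fn executor_may_apply(from: Status, to: Status, handoff_published: bool) -> bool {
    if handoff_published {
        return false;
    }
    matches!(
        (from, to),
        (Status::Requested, Status::Scheduling)
            | (Status::Scheduling, Status::Scheduled)
            // Pre-handoff cancellations and failures
            | (
                Status::Requested | Status::Scheduling | Status::Scheduled,
                Status::Cancelled | Status::Failed
            )
    )
}

/// The worker may only write state after it has received the handoff message.
fn worker_may_apply(from: Status, to: Status, handoff_published: bool) -> bool {
    if !handoff_published {
        return false;
    }
    matches!(
        (from, to),
        (Status::Scheduled, Status::Running)
            | (Status::Running, Status::Completed | Status::Failed | Status::Cancelled)
            // Post-handoff cancellations and failures before the action starts
            | (Status::Scheduled, Status::Failed | Status::Cancelled)
    )
}
```

Because the two guards are mutually exclusive on handoff_published, at most one service can write a given execution at any time.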
Key Implementation Details:
- Receives status change notifications for orchestration purposes only
- Does not update execution state after handoff to worker
- Handles workflow orchestration (parent-child execution chaining)
- Only triggers child executions on successful parent completion
- Read-only access to execution records for orchestration logic
Message Queue Integration
Message Types
The Executor consumes and produces several message types:
Consumed:
- enforcement.created - New enforcement from triggered rules
- execution.requested - Execution scheduling requests
- execution.status.changed - Status change notifications from workers (for orchestration)
- execution.completed - Completion notifications from workers (for queue management)
Published:
- execution.requested - To scheduler (from enforcement processor)
- execution.scheduled - To workers (from scheduler) ← OWNERSHIP HANDOFF
Note: The executor does NOT publish execution.completed messages. This is the worker's responsibility as the authoritative source of execution state after scheduling.
Message Envelope Structure
All messages use the standardized MessageEnvelope<T> structure:
struct MessageEnvelope<T> {
    message_id: Uuid,
    message_type: MessageType,
    source: String,
    timestamp: DateTime<Utc>,
    correlation_id: Option<Uuid>,
    trace_id: Option<String>,
    payload: T,
    retry_count: u32,
}
Consumer Handler Pattern
All processors use the consume_with_handler pattern for robust message consumption:
consumer.consume_with_handler(move |envelope: MessageEnvelope<PayloadType>| {
    // Clone shared state so the async block owns its own handles
    let pool = pool.clone();
    let publisher = publisher.clone();
    async move {
        // Process the message and convert any error into the handler's error type
        Self::process_message(&pool, &publisher, &envelope).await
            .map_err(|e| format!("Error: {}", e).into())
    }
}).await?;
Benefits:
- Automatic message acknowledgment on success
- Automatic nack with requeue on retriable errors
- Automatic dead letter queue routing on non-retriable errors
- Built-in error handling and logging
Database Integration
Repository Pattern
All database access uses the repository layer:
use attune_common::repositories::{
    enforcement::EnforcementRepository,
    execution::ExecutionRepository,
    rule::RuleRepository,
    Create, FindById, Update, List,
};
Database Update Ownership
Executor updates execution state from creation through handoff:
- Creates execution records (Requested status)
- Updates status during scheduling (Scheduling → Scheduled)
- Publishes execution.scheduled message to worker ← HANDOFF POINT
- Handles cancellations/failures BEFORE handoff (before message is published)
  - Example: User cancels execution while queued by concurrency policy
  - Executor updates to Cancelled, worker never receives message
Worker updates execution state after receiving handoff:
- Receives execution.scheduled message (takes ownership)
- Updates status when execution starts (Running)
- Updates status when execution completes (Completed, Failed, etc.)
- Handles cancellations/failures AFTER handoff (after receiving message)
- Updates result data and artifacts
- Worker only owns executions it has received
Executor reads execution state for orchestration after handoff:
- Receives status change notifications from workers
- Reads execution records to trigger workflow children
- Does NOT update execution state after publishing execution.scheduled
Transaction Support
Future implementations will use database transactions for multi-step operations:
- Creating execution + publishing message (atomic)
- Enforcement processing + execution creation (atomic)
Configuration
The Executor service uses the standard Attune configuration system:
# config.yaml
database:
  url: postgresql://localhost/attune
  max_connections: 20
message_queue:
  url: amqp://localhost
  exchange: attune.executions
  prefetch_count: 10
Environment variable overrides:
ATTUNE__DATABASE__URL=postgresql://prod-db/attune
ATTUNE__MESSAGE_QUEUE__URL=amqp://prod-mq
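The naming convention in these overrides appears to be an ATTUNE prefix followed by the config path, with a double underscore separating nesting levels. The sketch below shows that mapping only; env_key_to_path is a hypothetical helper inferred from the two examples, and the actual configuration loader may behave differently.

```rust
/// Maps an override variable name to a lowercase config path.
/// Assumption: "__" separates nesting levels and "ATTUNE__" is the prefix.
/// Returns None for variables without the prefix or with empty segments.
fn env_key_to_path(key: &str) -> Option<Vec<String>> {
    let rest = key.strip_prefix("ATTUNE__")?;
    let parts: Vec<String> = rest.split("__").map(|s| s.to_lowercase()).collect();
    if parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(parts)
}
```

Under this reading, ATTUNE__MESSAGE_QUEUE__URL addresses message_queue.url, and the single underscore inside MESSAGE_QUEUE stays part of the key name.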
Error Handling
Error Types
The Executor handles several error categories:
- Database Errors: Connection issues, query failures
- Message Queue Errors: Connection drops, serialization failures
- Business Logic Errors: Missing entities, invalid states
- Worker Errors: No workers available, incompatible runtimes
Retry Strategy
- Retriable Errors: Requeued for retry (connection issues, timeouts)
- Non-Retriable Errors: Sent to dead letter queue (invalid data, missing entities)
- Retry Limits: Configured per queue (future implementation)
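The retry strategy above amounts to classifying each handler result into one of three outcomes. The sketch below is illustrative only: ProcessingError, Disposition, is_retriable, and disposition are hypothetical names, and max_retries is an assumed parameter, since per-queue retry limits are not implemented yet.

```rust
/// Hypothetical error categories mirroring the lists above.
#[derive(Debug, Clone, PartialEq)]
enum ProcessingError {
    Connection(String),
    Timeout,
    InvalidData(String),
    MissingEntity(String),
}

/// What the consumer should do with a message after the handler returns.
#[derive(Debug, PartialEq)]
enum Disposition {
    Ack,
    NackRequeue,
    DeadLetter,
}

/// Connection issues and timeouts may succeed on a later attempt.
fn is_retriable(err: &ProcessingError) -> bool {
    matches!(err, ProcessingError::Connection(_) | ProcessingError::Timeout)
}

/// Maps a handler result to ack, requeue, or dead-letter routing.
/// `retry_count` corresponds to the field carried in the message envelope.
fn disposition(
    result: &Result<(), ProcessingError>,
    retry_count: u32,
    max_retries: u32,
) -> Disposition {
    match result {
        Ok(()) => Disposition::Ack,
        Err(e) if is_retriable(e) && retry_count < max_retries => Disposition::NackRequeue,
        // Non-retriable errors, or retriable ones that exhausted their budget
        Err(_) => Disposition::DeadLetter,
    }
}
```

Routing exhausted retries to the dead letter queue keeps a persistently failing message from cycling through the queue forever.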
Dead Letter Queues
Failed messages are automatically routed to dead letter queues for investigation:
- executor.enforcement.created.dlq
- executor.execution.requested.dlq
- executor.execution.status.dlq
Workflow Orchestration
Parent-Child Executions
The Executor supports complex workflows through parent-child execution relationships:
Parent Execution (Completed)
├── Child Execution 1 (action_ref: "pack.action1")
├── Child Execution 2 (action_ref: "pack.action2")
└── Child Execution 3 (action_ref: "pack.action3")
Implementation:
- Parent execution stores child action references
- On parent completion, Execution Manager creates child executions
- Child executions inherit parent's configuration
- Each child is independently scheduled and executed
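The implementation points above can be sketched as a function that derives child records from a completed parent. The types are hypothetical simplifications: Execution, Status, and spawn_children are not the real models, config is reduced to key-value pairs, and ID allocation via next_id stands in for whatever the database does.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Requested,
    Completed,
    Failed,
}

/// Hypothetical, simplified execution record.
#[derive(Debug, Clone)]
struct Execution {
    id: i64,
    parent_id: Option<i64>,
    action_ref: String,
    config: Vec<(String, String)>,
    child_action_refs: Vec<String>,
    status: Status,
}

/// Creates one child per stored action reference, but only when the
/// parent completed successfully. Children start in Requested status
/// so that each one is scheduled independently.
fn spawn_children(parent: &Execution, next_id: i64) -> Vec<Execution> {
    if parent.status != Status::Completed {
        return Vec::new();
    }
    parent
        .child_action_refs
        .iter()
        .enumerate()
        .map(|(i, action_ref)| Execution {
            id: next_id + i as i64,
            parent_id: Some(parent.id),
            action_ref: action_ref.clone(),
            // Children inherit the parent's configuration
            config: parent.config.clone(),
            child_action_refs: Vec::new(),
            status: Status::Requested,
        })
        .collect()
}
```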
Future Enhancements
- Conditional Workflows: Execute children based on parent result
- Parallel vs Sequential: Control execution order
- Workflow DAGs: Complex dependency graphs
- Workflow Templates: Reusable workflow definitions
Policy Enforcement
Planned Features
- Rate Limiting: Limit executions per time window
- Concurrency Control: Maximum concurrent executions per action/pack
- Priority Queuing: High-priority executions jump the queue
- Resource Quotas: Limit resource consumption per tenant
- Execution Windows: Only execute during specified time periods
Implementation Location
Policy enforcement will be implemented in:
- Enforcement Processor (pre-execution validation)
- Scheduler (runtime constraint checking)
- New PolicyEnforcer module (future)
Monitoring & Observability
Metrics (Future)
- Executions per second (throughput)
- Average execution duration
- Queue depth and processing lag
- Worker utilization
- Error rates by type
Logging
Structured logging at multiple levels:
- INFO: Successful operations, state transitions
- WARN: Degraded states, retry attempts
- ERROR: Failures requiring attention
- DEBUG: Detailed flow for troubleshooting
Example:
INFO Processing enforcement: 123
INFO Selected worker 45 for execution 789
INFO Execution 789 scheduled to worker 45
Tracing
Message correlation and distributed tracing:
- correlation_id: Links related messages
- trace_id: End-to-end request tracing (future integration with OpenTelemetry)
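One way correlation_id could link related messages is for every follow-up message to inherit the ID of the chain it belongs to. The sketch below is an assumption, not the documented behavior: Envelope is a trimmed stand-in for MessageEnvelope<T>, u64 replaces Uuid to stay dependency-free, and follow_up with its fallback to the originating message_id is hypothetical.

```rust
/// Trimmed, hypothetical stand-in for MessageEnvelope<T>.
#[derive(Debug, Clone, PartialEq)]
struct Envelope<T> {
    message_id: u64, // stand-in for Uuid
    correlation_id: Option<u64>,
    payload: T,
}

impl<T> Envelope<T> {
    /// Builds a follow-up message in the same chain.
    /// Assumption: a message without a correlation_id starts a new chain,
    /// and its own message_id becomes the chain's correlation_id.
    fn follow_up<U>(&self, new_id: u64, payload: U) -> Envelope<U> {
        Envelope {
            message_id: new_id,
            correlation_id: Some(self.correlation_id.unwrap_or(self.message_id)),
            payload,
        }
    }
}
```

With this scheme, filtering logs by a single correlation_id would surface the whole enforcement-to-execution chain.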
Running the Service
Prerequisites
- PostgreSQL 14+ with schema initialized
- RabbitMQ 3.12+ with exchanges and queues configured
- Environment variables or config file set up
Startup
# Using cargo
cd crates/executor
cargo run
# Or with environment overrides
ATTUNE__DATABASE__URL=postgresql://localhost/attune \
ATTUNE__MESSAGE_QUEUE__URL=amqp://localhost \
cargo run
Graceful Shutdown
The service supports graceful shutdown via SIGTERM/SIGINT:
- Stop accepting new messages
- Finish processing in-flight messages
- Close message queue connections
- Close database connections
- Exit cleanly
Testing
Unit Tests
Each module includes unit tests for business logic:
- Rule evaluation
- Worker selection algorithms
- Status parsing
- Workflow creation
Integration Tests
Integration tests require PostgreSQL and RabbitMQ:
- End-to-end enforcement → execution flow
- Message queue reliability
- Database consistency
Running Tests
# Unit tests only
cargo test -p attune-executor --lib
# Integration tests (requires services)
cargo test -p attune-executor --test '*'
Future Enhancements
Phase 1: Core Functionality (Current)
- ✅ Enforcement processing
- ✅ Execution scheduling
- ✅ Lifecycle management
- ✅ Message queue integration
Phase 2: Advanced Features (Next)
- Policy enforcement (rate limiting, concurrency)
- Advanced workflow orchestration
- Inquiry handling (human-in-the-loop)
- Retry and failure handling improvements
Phase 3: Production Readiness
- Comprehensive monitoring and metrics
- Performance optimization
- High availability setup
- Load testing and tuning
Phase 4: Enterprise Features
- Multi-tenancy isolation
- Advanced scheduling algorithms
- Resource quotas and limits
- Audit logging and compliance
Troubleshooting
Common Issues
Problem: Executions stuck in "Requested" status
- Cause: Scheduler not running or no workers available
- Solution: Verify scheduler is running, check worker status
Problem: Messages not being consumed
- Cause: RabbitMQ connection issues or queue misconfiguration
- Solution: Check MQ connection, verify queue bindings
Problem: Database connection errors
- Cause: Connection pool exhausted or database down
- Solution: Increase pool size, check database health
Debug Mode
Enable detailed logging:
RUST_LOG=attune_executor=debug,attune_common=debug cargo run