13 KiB
Executor Service Architecture
Overview
The Executor Service is the core orchestration engine of the Attune automation platform. It is responsible for processing rule enforcements, scheduling executions to workers, managing execution lifecycle, and orchestrating complex workflows.
Service Architecture
The Executor is structured as a distributed microservice with three main processing components:
┌─────────────────────────────────────────────────────────────┐
│ Executor Service │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ ┌──────────────────────┐ │
│ │ Enforcement │ │ Execution │ │
│ │ Processor │ │ Scheduler │ │
│ └─────────────────────┘ └──────────────────────┘ │
│ │ │ │
│ │ │ │
│ v v │
│ ┌─────────────────────────────────────────────┐ │
│ │ Execution Manager │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
│ │ │
v v v
PostgreSQL RabbitMQ Workers
Core Components
1. Enforcement Processor
Purpose: Processes triggered rules and creates execution requests.
Responsibilities:
- Listens for
enforcement.createdmessages from triggered rules - Fetches enforcement, rule, and event data from the database
- Evaluates rule conditions and policies
- Creates execution records in the database
- Publishes
execution.requestedmessages to the scheduler
Message Flow:
Rule Triggered → Enforcement Created → Enforcement Processor → Execution Created
Key Implementation Details:
- Uses
consume_with_handlerpattern for message consumption - All processing methods are static to enable shared state across async handlers
- Validates rule is enabled before creating executions
- Links executions to enforcements for audit trail
2. Execution Scheduler
Purpose: Routes execution requests to available workers.
Responsibilities:
- Listens for
execution.requestedmessages - Determines runtime requirements for the action
- Selects appropriate workers based on:
- Runtime compatibility
- Worker status (active only)
- Load balancing (future: capacity, affinity, locality)
- Updates execution status to
Scheduled - Publishes
execution.scheduledmessages to worker queues
Message Flow:
Execution Requested → Scheduler → Worker Selection → Execution Scheduled → Worker
Worker Selection Algorithm:
- Fetch all available workers
- Filter by runtime compatibility (if action specifies runtime)
- Filter by worker status (only active workers)
- Apply load balancing strategy (currently: first available)
- Future: Consider capacity, affinity, geographic locality
Key Implementation Details:
- Supports multiple worker types (local, remote, container)
- Handles worker unavailability with error responses
- Plans for intelligent scheduling based on worker capabilities
3. Execution Manager
Purpose: Manages execution lifecycle and status transitions.
Responsibilities:
- Listens for
execution.status.*messages from workers - Updates execution records with status changes
- Handles execution completion (success, failure, cancellation)
- Orchestrates workflow executions (parent-child relationships)
- Publishes completion notifications for downstream consumers
Message Flow:
Worker Status Update → Execution Manager → Database Update → Completion Handler
Status Lifecycle:
Requested → Scheduling → Scheduled → Running → Completed/Failed/Cancelled
│
└→ Child Executions (workflows)
Key Implementation Details:
- Parses status strings to typed enums for type safety
- Handles workflow orchestration (parent-child execution chaining)
- Only triggers child executions on successful parent completion
- Publishes completion events for notification service
Message Queue Integration
Message Types
The Executor consumes and produces several message types:
Consumed:
enforcement.created- New enforcement from triggered rulesexecution.requested- Execution scheduling requestsexecution.status.*- Status updates from workers
Published:
execution.requested- To scheduler (from enforcement processor)execution.scheduled- To workers (from scheduler)execution.completed- To notifier (from execution manager)
Message Envelope Structure
All messages use the standardized MessageEnvelope<T> structure:
MessageEnvelope {
message_id: Uuid,
message_type: MessageType,
source: String,
timestamp: DateTime<Utc>,
correlation_id: Option<Uuid>,
trace_id: Option<String>,
payload: T,
retry_count: u32,
}
Consumer Handler Pattern
All processors use the consume_with_handler pattern for robust message consumption:
consumer.consume_with_handler(move |envelope: MessageEnvelope<PayloadType>| {
// Clone shared state
let pool = pool.clone();
let publisher = publisher.clone();
async move {
// Process message
Self::process_message(&pool, &publisher, &envelope).await
.map_err(|e| format!("Error: {}", e).into())
}
}).await?;
Benefits:
- Automatic message acknowledgment on success
- Automatic nack with requeue on retriable errors
- Automatic dead letter queue routing on non-retriable errors
- Built-in error handling and logging
Database Integration
Repository Pattern
All database access uses the repository layer:
use attune_common::repositories::{
enforcement::EnforcementRepository,
execution::ExecutionRepository,
rule::RuleRepository,
Create, FindById, Update, List,
};
Transaction Support
Future implementations will use database transactions for multi-step operations:
- Creating execution + publishing message (atomic)
- Status update + completion handling (atomic)
Configuration
The Executor service uses the standard Attune configuration system:
# config.yaml
database:
url: postgresql://localhost/attune
max_connections: 20
message_queue:
url: amqp://localhost
exchange: attune.executions
prefetch_count: 10
Environment variable overrides:
ATTUNE__DATABASE__URL=postgresql://prod-db/attune
ATTUNE__MESSAGE_QUEUE__URL=amqp://prod-mq
Error Handling
Error Types
The Executor handles several error categories:
- Database Errors: Connection issues, query failures
- Message Queue Errors: Connection drops, serialization failures
- Business Logic Errors: Missing entities, invalid states
- Worker Errors: No workers available, incompatible runtimes
Retry Strategy
- Retriable Errors: Requeued for retry (connection issues, timeouts)
- Non-Retriable Errors: Sent to dead letter queue (invalid data, missing entities)
- Retry Limits: Configured per queue (future implementation)
Dead Letter Queues
Failed messages are automatically routed to dead letter queues for investigation:
executor.enforcement.created.dlqexecutor.execution.requested.dlqexecutor.execution.status.dlq
Workflow Orchestration
Parent-Child Executions
The Executor supports complex workflows through parent-child execution relationships:
Parent Execution (Completed)
├── Child Execution 1 (action_ref: "pack.action1")
├── Child Execution 2 (action_ref: "pack.action2")
└── Child Execution 3 (action_ref: "pack.action3")
Implementation:
- Parent execution stores child action references
- On parent completion, Execution Manager creates child executions
- Child executions inherit parent's configuration
- Each child is independently scheduled and executed
Future Enhancements
- Conditional Workflows: Execute children based on parent result
- Parallel vs Sequential: Control execution order
- Workflow DAGs: Complex dependency graphs
- Workflow Templates: Reusable workflow definitions
Policy Enforcement
Planned Features
- Rate Limiting: Limit executions per time window
- Concurrency Control: Maximum concurrent executions per action/pack
- Priority Queuing: High-priority executions jump the queue
- Resource Quotas: Limit resource consumption per tenant
- Execution Windows: Only execute during specified time periods
Implementation Location
Policy enforcement will be implemented in:
- Enforcement Processor (pre-execution validation)
- Scheduler (runtime constraint checking)
- New
PolicyEnforcermodule (future)
Monitoring & Observability
Metrics (Future)
- Executions per second (throughput)
- Average execution duration
- Queue depth and processing lag
- Worker utilization
- Error rates by type
Logging
Structured logging at multiple levels:
INFO: Successful operations, state transitionsWARN: Degraded states, retry attemptsERROR: Failures requiring attentionDEBUG: Detailed flow for troubleshooting
Example:
INFO Processing enforcement: 123
INFO Selected worker 45 for execution 789
INFO Execution 789 scheduled to worker 45
Tracing
Message correlation and distributed tracing:
correlation_id: Links related messagestrace_id: End-to-end request tracing (future integration with OpenTelemetry)
Running the Service
Prerequisites
- PostgreSQL 14+ with schema initialized
- RabbitMQ 3.12+ with exchanges and queues configured
- Environment variables or config file set up
Startup
# Using cargo
cd crates/executor
cargo run
# Or with environment overrides
ATTUNE__DATABASE__URL=postgresql://localhost/attune \
ATTUNE__MESSAGE_QUEUE__URL=amqp://localhost \
cargo run
Graceful Shutdown
The service supports graceful shutdown via SIGTERM/SIGINT:
- Stop accepting new messages
- Finish processing in-flight messages
- Close message queue connections
- Close database connections
- Exit cleanly
Testing
Unit Tests
Each module includes unit tests for business logic:
- Rule evaluation
- Worker selection algorithms
- Status parsing
- Workflow creation
Integration Tests
Integration tests require PostgreSQL and RabbitMQ:
- End-to-end enforcement → execution flow
- Message queue reliability
- Database consistency
Running Tests
# Unit tests only
cargo test -p attune-executor --lib
# Integration tests (requires services)
cargo test -p attune-executor --test '*'
Future Enhancements
Phase 1: Core Functionality (Current)
- ✅ Enforcement processing
- ✅ Execution scheduling
- ✅ Lifecycle management
- ✅ Message queue integration
Phase 2: Advanced Features (Next)
- Policy enforcement (rate limiting, concurrency)
- Advanced workflow orchestration
- Inquiry handling (human-in-the-loop)
- Retry and failure handling improvements
Phase 3: Production Readiness
- Comprehensive monitoring and metrics
- Performance optimization
- High availability setup
- Load testing and tuning
Phase 4: Enterprise Features
- Multi-tenancy isolation
- Advanced scheduling algorithms
- Resource quotas and limits
- Audit logging and compliance
Troubleshooting
Common Issues
Problem: Executions stuck in "Requested" status
- Cause: Scheduler not running or no workers available
- Solution: Verify scheduler is running, check worker status
Problem: Messages not being consumed
- Cause: RabbitMQ connection issues or queue misconfiguration
- Solution: Check MQ connection, verify queue bindings
Problem: Database connection errors
- Cause: Connection pool exhausted or database down
- Solution: Increase pool size, check database health
Debug Mode
Enable detailed logging:
RUST_LOG=attune_executor=debug,attune_common=debug cargo run