Sensor Interface Specification
Version: 1.0
Last Updated: 2025-01-27
Status: Draft
Overview
This document specifies the standard interface that all Attune sensors must implement. Sensors are lightweight, long-running daemon processes that monitor for events and emit them into the Attune platform. Each sensor type runs as exactly one process instance at a time, while the individual monitoring instances inside that process are managed dynamically based on active rules.
Design Principles
- Single Process Per Sensor Type: Each sensor type (e.g., timer, webhook, file_watcher) runs as a single daemon process
- Lightweight & Async: Sensors should be event-driven and non-blocking
- Rule-Driven Behavior: Sensors manage multiple concurrent "instances" based on active rules
- RabbitMQ Communication: All control messages flow through RabbitMQ
- API Integration: Sensors use the Attune API to emit events and fetch configuration
- Standard Authentication: Sensors authenticate using transient API tokens
- Graceful Lifecycle: Sensors handle startup, shutdown, and dynamic reconfiguration
Sensor Lifecycle
1. Initialization
When a sensor starts, it must:
- Read Configuration from environment variables or stdin
- Authenticate with the Attune API using a transient token
- Connect to RabbitMQ and declare/bind to its control queue
- Load Active Rules from the API that use its trigger types
- Start Monitoring for each active rule
- Signal Ready (log startup completion)
2. Runtime Operation
During normal operation, a sensor:
- Listens to RabbitMQ for rule lifecycle messages (RuleCreated, RuleEnabled, RuleDisabled, RuleDeleted)
- Monitors External Sources (timers, webhooks, file systems, etc.) based on active rules
- Emits Events to the Attune API when trigger conditions are met
- Handles Errors gracefully without crashing
- Reports Health (periodic heartbeat/metrics - future)
3. Shutdown
On shutdown (SIGTERM/SIGINT), a sensor must:
- Stop Accepting New Work (stop listening to RabbitMQ)
- Cancel Active Monitors (stop timers, close connections)
- Flush Pending Events (send any buffered events to API)
- Close Connections (RabbitMQ, HTTP clients)
- Exit Cleanly with appropriate exit code
Configuration
Environment Variables
Sensors MUST accept the following environment variables:
| Variable | Required | Description | Example |
|---|---|---|---|
| ATTUNE_API_URL | Yes | Base URL of Attune API | http://localhost:8080 |
| ATTUNE_API_TOKEN | Yes | Transient API token for authentication | sensor_abc123... |
| ATTUNE_SENSOR_ID | Yes | Sensor database ID | 42 |
| ATTUNE_SENSOR_REF | Yes | Reference name of this sensor | core.timer |
| ATTUNE_MQ_URL | Yes | RabbitMQ connection URL | amqp://localhost:5672 |
| ATTUNE_MQ_EXCHANGE | No | RabbitMQ exchange name | attune (default) |
| ATTUNE_LOG_LEVEL | No | Logging verbosity | info (default) |
Note: These environment variables provide parity with the action execution context (see QUICKREF-execution-environment.md). Sensors receive:
- ATTUNE_SENSOR_ID - analogous to ATTUNE_EXEC_ID for actions
- ATTUNE_SENSOR_REF - analogous to ATTUNE_ACTION for actions
- ATTUNE_API_TOKEN and ATTUNE_API_URL - same as actions, for API access
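As a sketch of how a sensor might load the variables above, the snippet below reads required and optional values into a config struct. The struct and function names are illustrative, not part of the spec; the `get` parameter abstracts the variable source so the same code can read the real process environment (`|k| std::env::var(k).ok()`) or a test fixture.

```rust
#[derive(Debug)]
pub struct SensorConfig {
    pub api_url: String,
    pub api_token: String,
    pub sensor_id: u64,
    pub sensor_ref: String,
    pub mq_url: String,
    pub mq_exchange: String,
    pub log_level: String,
}

/// Loads configuration from a variable source, failing fast on any
/// missing required variable and applying the documented defaults
/// for optional ones.
pub fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<SensorConfig, String> {
    let required =
        |name: &str| get(name).ok_or_else(|| format!("missing required env var: {name}"));
    Ok(SensorConfig {
        api_url: required("ATTUNE_API_URL")?,
        api_token: required("ATTUNE_API_TOKEN")?,
        sensor_id: required("ATTUNE_SENSOR_ID")?
            .parse()
            .map_err(|_| "ATTUNE_SENSOR_ID must be an integer".to_string())?,
        sensor_ref: required("ATTUNE_SENSOR_REF")?,
        mq_url: required("ATTUNE_MQ_URL")?,
        // Optional variables fall back to the documented defaults.
        mq_exchange: get("ATTUNE_MQ_EXCHANGE").unwrap_or_else(|| "attune".to_string()),
        log_level: get("ATTUNE_LOG_LEVEL").unwrap_or_else(|| "info".to_string()),
    })
}
```

Failing fast at startup keeps misconfiguration visible in the service manager's logs rather than surfacing later as confusing runtime errors.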
Alternative: stdin Configuration
For containerized or orchestrated deployments, sensors MAY accept configuration as JSON on stdin:
{
"api_url": "http://localhost:8080",
"api_token": "sensor_abc123...",
"sensor_ref": "core.timer",
"mq_url": "amqp://localhost:5672",
"mq_exchange": "attune",
"log_level": "info"
}
If stdin is provided, it takes precedence over environment variables. The JSON must be a single line or complete object, followed by EOF or newline.
API Authentication: Transient Tokens
Token Requirements
- Type: JWT with service_account identity type
- Scope: Limited to sensor operations (create events, read rules)
- Lifetime: Long-lived (90 days), auto-expiring
- Rotation: Automatic refresh (the sensor refreshes its token when 80% of the TTL has elapsed)
- Zero-Downtime: New tokens are hot-reloaded without a restart
Token Format
Sensors receive a standard JWT that includes:
{
"sub": "sensor:core.timer",
"jti": "abc123def456", // JWT ID for revocation tracking
"identity_id": 123,
"identity_type": "service_account",
"scope": "sensor",
"iat": 1738800000, // Issued at
  "exp": 1746576000,  // Expiration time (REQUIRED); issued_at + 90-day TTL
"metadata": {
"trigger_types": ["core.timer"] // Enforced by API
}
}
API Endpoints Used by Sensors
Sensors interact with the following API endpoints:
| Method | Endpoint | Purpose | Auth |
|---|---|---|---|
| GET | /rules?trigger_type={ref} | Fetch active rules for this sensor's triggers | Required |
| GET | /triggers/{ref} | Fetch trigger metadata | Required |
| POST | /events | Create new event | Required |
| POST | /auth/refresh | Refresh token before expiration | Required |
| GET | /health | Verify API connectivity | Optional |
RabbitMQ Integration
Queue Naming
Each sensor binds to a dedicated queue for control messages:
- Queue Name: sensor.{sensor_ref} (e.g., sensor.core.timer)
- Durable: Yes
- Auto-Delete: No
- Exclusive: No
Exchange Binding
Sensors bind their queue to the main exchange with routing keys:
- rule.created - New rule created
- rule.enabled - Existing rule enabled
- rule.disabled - Existing rule disabled
- rule.deleted - Rule deleted
Message Format
All control messages follow this JSON schema:
{
"event_type": "RuleCreated | RuleEnabled | RuleDisabled | RuleDeleted",
"rule_id": 123,
"trigger_type": "core.timer",
"trigger_params": {
"interval_seconds": 5
},
"timestamp": "2025-01-27T12:34:56Z"
}
Message Handling
Sensors MUST:
- Validate messages against expected schema
- Filter messages to only process rules for their trigger types (based on the token's metadata.trigger_types)
- Acknowledge messages after processing (or reject on unrecoverable error)
- Handle Duplicates idempotently (same rule_id + event_type)
- Enforce Trigger Type Restrictions: Only emit events for trigger types declared in the sensor's token metadata
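The filtering and idempotency rules above can be sketched as a small gate in front of the message handler. This assumes the JSON has already been deserialized into a struct; the type and method names are illustrative only.

```rust
use std::collections::HashSet;

/// Illustrative control-message struct, mirroring the JSON schema's
/// fields that the gate needs.
#[derive(Debug, Clone)]
pub struct ControlMessage {
    pub event_type: String,
    pub rule_id: u64,
    pub trigger_type: String,
}

/// Decides whether a control message should be processed: the event
/// type must be known, the trigger type must be covered by the token,
/// and a repeated (rule_id, event_type) pair is dropped idempotently.
pub struct MessageGate {
    allowed_trigger_types: HashSet<String>,
    seen: HashSet<(u64, String)>,
}

impl MessageGate {
    pub fn new(allowed: &[&str]) -> Self {
        Self {
            allowed_trigger_types: allowed.iter().map(|s| s.to_string()).collect(),
            seen: HashSet::new(),
        }
    }

    pub fn should_process(&mut self, msg: &ControlMessage) -> bool {
        const KNOWN: [&str; 4] = ["RuleCreated", "RuleEnabled", "RuleDisabled", "RuleDeleted"];
        if !KNOWN.contains(&msg.event_type.as_str()) {
            return false; // schema validation failed
        }
        if !self.allowed_trigger_types.contains(&msg.trigger_type) {
            return false; // not this sensor's trigger type
        }
        // HashSet::insert returns false for a duplicate, making
        // redelivered messages no-ops.
        self.seen.insert((msg.rule_id, msg.event_type.clone()))
    }
}
```

A production gate would likely bound the `seen` set (e.g., clear entries on RuleDeleted) to avoid unbounded growth.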
Event Emission
Event Creation API
Sensors create events by POSTing to /events:
POST /events
Authorization: Bearer {sensor_token}
Content-Type: application/json
{
"trigger_type": "core.timer",
"payload": {
"timestamp": "2025-01-27T12:34:56Z",
"scheduled_time": "2025-01-27T12:34:56Z"
},
"trigger_instance_id": "rule_123"
}
Important: Sensors can only emit events for trigger types declared in their token's metadata.trigger_types. The API will reject event creation requests for unauthorized trigger types with a 403 Forbidden error.
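A sensor can mirror that server-side check locally and skip a request that would only earn a 403. The sketch below builds the POST /events body by hand for illustration; names are hypothetical, the payload is assumed to be pre-serialized JSON, and a real implementation would use serde_json rather than string formatting.

```rust
/// Client-side guard mirroring the API's trigger-type enforcement:
/// refuses to build an event request for a trigger type missing from
/// the token's metadata.trigger_types.
pub fn build_event_body(
    allowed_trigger_types: &[String],
    trigger_type: &str,
    payload_json: &str,
    trigger_instance_id: &str,
) -> Result<String, String> {
    if !allowed_trigger_types.iter().any(|t| t == trigger_type) {
        return Err(format!(
            "trigger type not authorized for this token: {trigger_type}"
        ));
    }
    // payload_json is assumed to already be valid JSON.
    Ok(format!(
        "{{\"trigger_type\":\"{trigger_type}\",\"payload\":{payload_json},\"trigger_instance_id\":\"{trigger_instance_id}\"}}"
    ))
}
```

Checking locally saves a round trip, but the API check remains the authority; the 403 path must still be handled.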
Event Payload Guidelines
- Timestamp: Always include event occurrence time
- Context: Include relevant context for rule evaluation
- Size: Keep payloads small (<1KB recommended, <10KB max)
- Sensitive Data: Never include passwords, tokens, or PII unless explicitly required
- Trigger Type Match: The trigger_type field must match one of the sensor's declared trigger types
Error Handling
If event creation fails:
- Retry with exponential backoff (3 attempts)
- Log Error with full context
- Continue Operating (don't crash on single event failure)
- Alert if failure rate exceeds threshold (future)
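The retry rule above can be sketched as a small helper. The 100 ms base delay is an assumed value (the spec only mandates "exponential backoff, 3 attempts"), and a real async sensor would use tokio::time::sleep rather than a blocking thread sleep.

```rust
use std::time::Duration;

/// Delay before retry `attempt` (0-based): 100 ms, 200 ms, 400 ms, ...
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(100 * 2u64.pow(attempt))
}

/// Runs `op` up to `max_attempts` times, sleeping between failures
/// and returning the last error if all attempts fail. The sensor
/// keeps operating either way; it never crashes on one failed event.
fn retry<T, E>(max_attempts: u32, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt));
                attempt += 1;
            }
        }
    }
}
```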
Sensor-Specific Behavior
Each sensor type implements trigger-specific logic. The sensor monitors external sources and translates them into Attune events.
Example: Timer Sensor
Trigger Type: core.timer
Parameters:
{
"interval_seconds": 60
}
Behavior:
- Maintains a hash map of rule_id -> tokio::task::JoinHandle
- On RuleCreated/RuleEnabled: Start an async timer loop for the rule
- On RuleDisabled/RuleDeleted: Cancel the timer task for the rule
- Timer loop: Every interval, emit an event with the current timestamp
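The start/cancel bookkeeping can be sketched dependency-free with std threads and a cancellation channel standing in for tokio tasks; the registry type and its methods are illustrative, not the reference implementation.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// rule_id -> (cancel sender, thread handle), the std-thread analogue
/// of the rule_id -> JoinHandle map described above.
pub struct TimerRegistry {
    monitors: HashMap<u64, (Sender<()>, JoinHandle<()>)>,
}

impl TimerRegistry {
    pub fn new() -> Self {
        Self { monitors: HashMap::new() }
    }

    /// RuleCreated / RuleEnabled: start a timer loop for the rule.
    pub fn start(&mut self, rule_id: u64, interval: Duration,
                 on_tick: impl Fn(u64) + Send + 'static) {
        if self.monitors.contains_key(&rule_id) {
            return; // idempotent: duplicate RuleCreated/RuleEnabled is a no-op
        }
        let (cancel_tx, cancel_rx) = mpsc::channel::<()>();
        let handle = thread::spawn(move || loop {
            match cancel_rx.recv_timeout(interval) {
                // Interval elapsed with no cancel signal: fire the tick
                // (a real sensor would emit an event to the API here).
                Err(RecvTimeoutError::Timeout) => on_tick(rule_id),
                // Cancelled, or the registry dropped the sender: stop.
                _ => break,
            }
        });
        self.monitors.insert(rule_id, (cancel_tx, handle));
    }

    /// RuleDisabled / RuleDeleted: cancel the timer for the rule.
    pub fn stop(&mut self, rule_id: u64) {
        if let Some((cancel_tx, handle)) = self.monitors.remove(&rule_id) {
            let _ = cancel_tx.send(());
            let _ = handle.join();
        }
    }

    pub fn active_rules(&self) -> usize {
        self.monitors.len()
    }
}
```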
Event Payload:
{
"timestamp": "2025-01-27T12:34:56Z",
"scheduled_time": "2025-01-27T12:34:56Z"
}
Example: Webhook Sensor
Trigger Type: core.webhook
Parameters:
{
"path": "/hooks/deployment",
"method": "POST",
"secret": "shared_secret_123"
}
Behavior:
- Runs an HTTP server listening on configured port
- On RuleCreated/RuleEnabled: Register a route handler for the webhook path
- On RuleDisabled/RuleDeleted: Unregister the route handler
- On incoming request: Validate the secret, then emit an event with the request body
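The secret check in the last step is worth doing in constant time so that comparison timing does not leak how much of a guessed secret matched. A minimal sketch (production code might prefer a vetted crate such as subtle, or an HMAC over the body instead of a raw shared secret):

```rust
/// Constant-time comparison of the configured shared secret against
/// the value supplied with the incoming request. Always scans the
/// full length; only the length check short-circuits.
pub fn secret_matches(expected: &str, provided: &str) -> bool {
    let (a, b) = (expected.as_bytes(), provided.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without branching
    }
    diff == 0
}
```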
Event Payload:
{
"timestamp": "2025-01-27T12:34:56Z",
"method": "POST",
"path": "/hooks/deployment",
"headers": {"Content-Type": "application/json"},
"body": {"status": "deployed"}
}
Example: File Watcher Sensor
Trigger Type: core.file_changed
Parameters:
{
"path": "/var/log/app.log",
"event_types": ["modified", "created"]
}
Behavior:
- Uses inotify/FSEvents/equivalent to watch file system
- On RuleCreated/RuleEnabled: Add a watch for the specified path
- On RuleDisabled/RuleDeleted: Remove the watch for the path
- On file system event: Emit an event with file details
Event Payload:
{
"timestamp": "2025-01-27T12:34:56Z",
"path": "/var/log/app.log",
"event_type": "modified",
"size": 12345
}
Implementation Guidelines
Language & Runtime
- Recommended: Rust (for consistency with Attune services)
- Alternatives: Python, Node.js, Go (if justified by use case)
- Async I/O: Required for scalability
Dependencies
Sensors should use:
- HTTP Client: For API communication (e.g., reqwest in Rust)
- RabbitMQ Client: For message queue access (e.g., lapin in Rust)
- Async Runtime: For concurrency (e.g., tokio in Rust)
- JSON Parsing: For message/event handling (e.g., serde_json in Rust)
- Logging: Structured logging (e.g., tracing in Rust)
Error Handling
- Panic/Crash: Never panic on external input (messages, API responses)
- Retry Logic: Implement exponential backoff for transient failures
- Circuit Breaker: Consider circuit breaker for API calls (future)
- Graceful Degradation: Continue operating even if some rules fail
Logging
Sensors MUST log:
- Startup: Configuration loaded, connections established
- Rule Changes: Rule added/removed/updated
- Events Emitted: Event type and rule_id (not full payload)
- Errors: All errors with context
- Shutdown: Graceful shutdown initiated and completed
Log format should be JSON for structured logging:
{
"timestamp": "2025-01-27T12:34:56Z",
"level": "info",
"sensor": "core.timer",
"message": "Timer started for rule",
"rule_id": 123,
"interval_seconds": 5
}
Testing
Sensors should include:
- Unit Tests: Test message parsing, event creation logic
- Integration Tests: Test against real RabbitMQ and API (test environment)
- Mock Tests: Test with mocked API/MQ for isolated testing
Security Considerations
Token Storage
- Never Log Tokens: Redact tokens in logs
- Memory Only: Keep tokens in memory, never write to disk
- Automatic Refresh: Refresh token when 80% of TTL elapsed (no restart required)
- Hot-Reload: Update in-memory token without interrupting operations
- Refresh Failure Handling: Log errors and retry with exponential backoff
Input Validation
- Validate All Inputs: RabbitMQ messages, API responses
- Sanitize Payloads: Prevent injection attacks in event payloads
- Rate Limiting: Prevent resource exhaustion from malicious triggers
- Trigger Type Enforcement: API validates that sensor tokens can only create events for declared trigger types
Network Security
- TLS: Use HTTPS for API calls in production
- AMQPS: Use TLS for RabbitMQ in production
- Timeouts: Set reasonable timeouts for all network calls
Deployment
Service Management
Sensors should be managed as system services:
- systemd: Linux deployments
- launchd: macOS deployments
- Docker: Container deployments
- Kubernetes: Orchestrated deployments (one pod per sensor type)
Resource Limits
Recommended limits:
- Memory: 64-256 MB per sensor (depends on rule count)
- CPU: Minimal (<5% avg, spikes allowed)
- Network: Low bandwidth (<1 Mbps typical)
- Disk: Minimal (logs only)
Monitoring
Sensors should expose metrics (future):
- Rules Active: Count of rules being monitored
- Events Emitted: Counter of events created
- Errors: Counter of errors by type
- API Latency: Histogram of API call durations
- MQ Latency: Histogram of message processing durations
Compatibility
Versioning
Sensors should:
- Declare Version: Include sensor version in logs and metrics
- API Compatibility: Support current API version
- Message Compatibility: Handle unknown fields gracefully
Backwards Compatibility
When updating sensors:
- Add Fields: New message fields are optional
- Deprecate Fields: Old fields remain supported for 2+ versions
- Breaking Changes: Require major version bump and migration guide
Appendix: Reference Implementation
See attune/crates/sensor/ for the reference timer sensor implementation in Rust.
Key components:
- src/main.rs - Initialization and configuration
- src/listener.rs - RabbitMQ message handling
- src/timer.rs - Timer-specific logic
- src/api_client.rs - API communication
Appendix: Message Queue Schema
Rule Lifecycle Messages
Exchange: attune (topic exchange)
RuleCreated:
{
"event_type": "RuleCreated",
"rule_id": 123,
"rule_ref": "timer_every_5s",
"trigger_type": "core.timer",
"trigger_params": {"interval_seconds": 5},
"enabled": true,
"timestamp": "2025-01-27T12:34:56Z"
}
RuleEnabled:
{
"event_type": "RuleEnabled",
"rule_id": 123,
"trigger_type": "core.timer",
"trigger_params": {"interval_seconds": 5},
"timestamp": "2025-01-27T12:34:56Z"
}
RuleDisabled:
{
"event_type": "RuleDisabled",
"rule_id": 123,
"trigger_type": "core.timer",
"timestamp": "2025-01-27T12:34:56Z"
}
RuleDeleted:
{
"event_type": "RuleDeleted",
"rule_id": 123,
"trigger_type": "core.timer",
"timestamp": "2025-01-27T12:34:56Z"
}
Appendix: API Token Management
Creating Sensor Tokens
Tokens are created via the Attune API (admin only):
POST /service-accounts
Authorization: Bearer {admin_token}
Content-Type: application/json
{
"name": "sensor:core.timer",
"description": "Timer sensor service account",
"scope": "sensor",
"ttl_days": 90
}
Response:
{
"identity_id": 123,
"name": "sensor:core.timer",
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expires_at": "2025-04-27T12:34:56Z"
}
Token Scopes
| Scope | Permissions |
|---|---|
| sensor | Create events, read rules/triggers |
| action | Read keys, update execution status (for action runners) |
| admin | Full access (for CLI, web UI) |
Token Lifecycle Management
Automatic Token Refresh
Sensors automatically refresh their own tokens without human intervention:
Refresh Timing:
- Tokens have 90-day TTL
- Sensors refresh when 80% of TTL elapsed (72 days)
- Calculation: refresh_at = issued_at + (TTL * 0.8)
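That calculation reduces to one pure function; integer arithmetic (as in the implementation pattern below this section) avoids floating-point rounding on large Unix timestamps.

```rust
/// refresh_at = issued_at + 80% of TTL, all in Unix seconds.
/// saturating_sub guards against a malformed token with exp < iat.
pub fn refresh_at(issued_at: u64, expires_at: u64) -> u64 {
    let ttl = expires_at.saturating_sub(issued_at);
    issued_at + ttl * 8 / 10
}
```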
Refresh Process:
- Background task monitors token expiration
- When the refresh threshold is reached, call POST /auth/refresh with the current token
- Receive a new token with a fresh 90-day TTL
- Hot-load new token (update in-memory reference)
- Old token remains valid until original expiration
- Continue operations without interruption
Implementation Pattern:
// Calculate when to refresh (80% of TTL)
let claims = decode_jwt(&token)?;
let ttl_seconds = claims.exp - claims.iat;
let mut refresh_at = claims.iat + ttl_seconds * 8 / 10;

// Spawn background refresh task
tokio::spawn(async move {
    loop {
        if current_timestamp() >= refresh_at {
            match api_client.refresh_token().await {
                Ok(new_token) => {
                    // Recompute the threshold from the new token's claims,
                    // then hot-load it without interrupting operations
                    if let Ok(claims) = decode_jwt(&new_token) {
                        refresh_at = claims.iat + (claims.exp - claims.iat) * 8 / 10;
                    }
                    update_token(new_token);
                    info!("Token refreshed successfully");
                }
                Err(e) => {
                    error!("Failed to refresh token: {}", e);
                    // Retry with exponential backoff (see Refresh Failure Handling)
                }
            }
        }
        // Check hourly; std Duration has no from_hours, so use seconds
        sleep(Duration::from_secs(60 * 60)).await;
    }
});
Refresh Failure Handling:
- Log error with full context
- Retry with exponential backoff (1min, 2min, 4min, 8min, max 1 hour)
- Continue using old token (still valid until expiration)
- Alert monitoring system after 3 consecutive failures
- If old token expires before successful refresh, shut down gracefully
Zero-Downtime:
- Old token valid during refresh
- No service interruption
- Graceful degradation on failure
- No manual intervention required
Token Expiration (Edge Case)
If automatic refresh fails and token expires:
- API returns 401 Unauthorized
- Sensor logs critical error
- Sensor shuts down gracefully (stops accepting work, completes in-flight operations)
- Operator must manually create new token and restart sensor
This should rarely occur if automatic refresh is working correctly.
Future Enhancements
- Health Checks: HTTP endpoint for liveness/readiness probes
- Metrics Export: Prometheus-compatible metrics endpoint (including token refresh metrics)
- Dynamic Discovery: Auto-discover available sensors from registry
- Sensor Scaling: Support multiple instances per sensor type with work distribution
- Backpressure: Handle event backlog when API is slow/unavailable
- Circuit Breaker: Automatic failover when API is unreachable
- Sensor Plugins: Dynamic loading of sensor implementations
- Configurable Refresh Threshold: Allow custom refresh timing (e.g., 75%, 85%)
- Token Refresh Alerts: Alert on refresh failures, not normal refresh events