attune/docs/sensors/sensor-interface.md

Sensor Interface Specification

Version: 1.0
Last Updated: 2025-01-27
Status: Draft

Overview

This document specifies the standard interface that all Attune sensors must implement. Sensors are lightweight, long-running daemon processes that monitor for events and emit them into the Attune platform. Each sensor type runs as exactly one process at a time; within that process, per-rule monitoring instances are started and stopped dynamically as rules change.

Design Principles

  1. Single Process Per Sensor Type: Each sensor type (e.g., timer, webhook, file_watcher) runs as a single daemon process
  2. Lightweight & Async: Sensors should be event-driven and non-blocking
  3. Rule-Driven Behavior: Sensors manage multiple concurrent "instances" based on active rules
  4. RabbitMQ Communication: All control messages flow through RabbitMQ
  5. API Integration: Sensors use the Attune API to emit events and fetch configuration
  6. Standard Authentication: Sensors authenticate using transient API tokens
  7. Graceful Lifecycle: Sensors handle startup, shutdown, and dynamic reconfiguration

Sensor Lifecycle

1. Initialization

When a sensor starts, it must:

  1. Read Configuration from environment variables or stdin
  2. Authenticate with the Attune API using a transient token
  3. Connect to RabbitMQ and declare/bind to its control queue
  4. Load Active Rules from the API that use its trigger types
  5. Start Monitoring for each active rule
  6. Signal Ready (log startup completion)

2. Runtime Operation

During normal operation, a sensor:

  1. Listens to RabbitMQ for rule lifecycle messages (RuleCreated, RuleEnabled, RuleDisabled, RuleDeleted)
  2. Monitors External Sources (timers, webhooks, file systems, etc.) based on active rules
  3. Emits Events to the Attune API when trigger conditions are met
  4. Handles Errors gracefully without crashing
  5. Reports Health (periodic heartbeat/metrics - future)

3. Shutdown

On shutdown (SIGTERM/SIGINT), a sensor must:

  1. Stop Accepting New Work (stop listening to RabbitMQ)
  2. Cancel Active Monitors (stop timers, close connections)
  3. Flush Pending Events (send any buffered events to API)
  4. Close Connections (RabbitMQ, HTTP clients)
  5. Exit Cleanly with appropriate exit code

Configuration

Environment Variables

Sensors MUST accept the following environment variables:

Variable             Required  Description                             Example
ATTUNE_API_URL       Yes       Base URL of Attune API                  http://localhost:8080
ATTUNE_API_TOKEN     Yes       Transient API token for authentication  sensor_abc123...
ATTUNE_SENSOR_REF    Yes       Reference name of this sensor           core.timer
ATTUNE_MQ_URL        Yes       RabbitMQ connection URL                 amqp://localhost:5672
ATTUNE_MQ_EXCHANGE   No        RabbitMQ exchange name                  attune (default)
ATTUNE_LOG_LEVEL     No        Logging verbosity                       info (default)
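
A minimal, dependency-free sketch of loading this configuration. SensorConfig and load_config are illustrative names, not part of the spec; the lookup closure exists only so the logic can be exercised without touching the process environment (in the sensor itself you would pass `|k| std::env::var(k).ok()`):

```rust
struct SensorConfig {
    api_url: String,
    api_token: String,
    sensor_ref: String,
    mq_url: String,
    mq_exchange: String,
    log_level: String,
}

// `get` abstracts the environment lookup so the function is testable.
fn load_config(get: impl Fn(&str) -> Option<String>) -> Result<SensorConfig, String> {
    let required =
        |name: &str| get(name).ok_or_else(|| format!("missing required env var {name}"));
    Ok(SensorConfig {
        api_url: required("ATTUNE_API_URL")?,
        api_token: required("ATTUNE_API_TOKEN")?,
        sensor_ref: required("ATTUNE_SENSOR_REF")?,
        mq_url: required("ATTUNE_MQ_URL")?,
        // Optional variables fall back to the documented defaults.
        mq_exchange: get("ATTUNE_MQ_EXCHANGE").unwrap_or_else(|| "attune".into()),
        log_level: get("ATTUNE_LOG_LEVEL").unwrap_or_else(|| "info".into()),
    })
}
```

Failing fast on a missing required variable at startup gives a clear error before any connections are attempted.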

Alternative: stdin Configuration

For containerized or orchestrated deployments, sensors MAY accept configuration as JSON on stdin:

{
  "api_url": "http://localhost:8080",
  "api_token": "sensor_abc123...",
  "sensor_ref": "core.timer",
  "mq_url": "amqp://localhost:5672",
  "mq_exchange": "attune",
  "log_level": "info"
}

If configuration is provided on stdin, it takes precedence over environment variables. The JSON must be a single complete object, terminated by a newline or EOF.
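
The precedence rule can be sketched as follows. Both function names are hypothetical, and json_str_field is a deliberately naive extractor for flat, string-valued JSON like the example above; a real sensor would use a JSON library such as serde_json:

```rust
// Naive extraction of a string value for `key` from a flat JSON object.
// Assumes string values without escaped quotes; illustration only.
fn json_str_field(json: &str, key: &str) -> Option<String> {
    let pat = format!("\"{key}\"");
    let after_key = &json[json.find(&pat)? + pat.len()..];
    let after_colon = &after_key[after_key.find(':')? + 1..];
    let rest = &after_colon[after_colon.find('"')? + 1..];
    Some(rest[..rest.find('"')?].to_string())
}

// stdin JSON (if present) wins; otherwise fall back to the environment.
fn config_value(stdin_json: Option<&str>, json_key: &str, env_key: &str) -> Option<String> {
    stdin_json
        .and_then(|j| json_str_field(j, json_key))
        .or_else(|| std::env::var(env_key).ok())
}
```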

API Authentication: Transient Tokens

Token Requirements

  • Type: JWT with service_account identity type
  • Scope: Limited to sensor operations (create events, read rules)
  • Lifetime: Long-lived (90 days); expires automatically
  • Rotation: Automatic refresh (the sensor refreshes its token once 80% of the TTL has elapsed)
  • Zero-Downtime: Hot-reload new tokens without restart

Token Format

Sensors receive a standard JWT that includes:

{
  "sub": "sensor:core.timer",
  "jti": "abc123def456",  // JWT ID for revocation tracking
  "identity_id": 123,
  "identity_type": "service_account",
  "scope": "sensor",
  "iat": 1738800000,  // Issued at (Unix seconds)
  "exp": 1746576000,  // Expiration (REQUIRED); 90 days after iat
  "metadata": {
    "trigger_types": ["core.timer"]  // Enforced by API
  }
}

API Endpoints Used by Sensors

Sensors interact with the following API endpoints:

Method  Endpoint                   Purpose                                        Auth
GET     /rules?trigger_type={ref}  Fetch active rules for this sensor's triggers  Required
GET     /triggers/{ref}            Fetch trigger metadata                         Required
POST    /events                    Create new event                               Required
POST    /auth/refresh              Refresh token before expiration                Required
GET     /health                    Verify API connectivity                        Optional

RabbitMQ Integration

Queue Naming

Each sensor binds to a dedicated queue for control messages:

  • Queue Name: sensor.{sensor_ref} (e.g., sensor.core.timer)
  • Durable: Yes
  • Auto-Delete: No
  • Exclusive: No

Exchange Binding

Sensors bind their queue to the main exchange with routing keys:

  • rule.created - New rule created
  • rule.enabled - Existing rule enabled
  • rule.disabled - Existing rule disabled
  • rule.deleted - Rule deleted

Message Format

All control messages follow this JSON schema:

{
  "event_type": "RuleCreated | RuleEnabled | RuleDisabled | RuleDeleted",
  "rule_id": 123,
  "trigger_type": "core.timer",
  "trigger_params": {
    "interval_seconds": 5
  },
  "timestamp": "2025-01-27T12:34:56Z"
}

Message Handling

Sensors MUST:

  1. Validate messages against expected schema
  2. Filter messages to only process rules for their trigger types (based on token's metadata.trigger_types)
  3. Acknowledge messages after processing (or reject on unrecoverable error)
  4. Handle Duplicates idempotently (same rule_id + event_type)
  5. Enforce Trigger Type Restrictions: Only emit events for trigger types declared in the sensor's token metadata
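
Requirement 4 (idempotent duplicate handling) can be sketched with a set keyed on (rule_id, event_type). DedupFilter is an illustrative name, and the unbounded HashSet is a simplification; a production sensor would expire or bound entries:

```rust
use std::collections::HashSet;

// Tracks which (rule_id, event_type) pairs have already been processed.
struct DedupFilter {
    seen: HashSet<(u64, String)>,
}

impl DedupFilter {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    // Returns true only the first time a given (rule_id, event_type) pair
    // is seen; redeliveries of the same message become no-ops.
    fn should_process(&mut self, rule_id: u64, event_type: &str) -> bool {
        self.seen.insert((rule_id, event_type.to_string()))
    }
}
```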

Event Emission

Event Creation API

Sensors create events by POSTing to /events:

POST /events
Authorization: Bearer {sensor_token}
Content-Type: application/json

{
  "trigger_type": "core.timer",
  "payload": {
    "timestamp": "2025-01-27T12:34:56Z",
    "scheduled_time": "2025-01-27T12:34:56Z"
  },
  "trigger_instance_id": "rule_123"
}

Important: Sensors can only emit events for trigger types declared in their token's metadata.trigger_types. The API will reject event creation requests for unauthorized trigger types with a 403 Forbidden error.

Event Payload Guidelines

  • Timestamp: Always include event occurrence time
  • Context: Include relevant context for rule evaluation
  • Size: Keep payloads small (<1KB recommended, <10KB max)
  • Sensitive Data: Never include passwords, tokens, or PII unless explicitly required
  • Trigger Type Match: The trigger_type field must match one of the sensor's declared trigger types

Error Handling

If event creation fails:

  1. Retry with exponential backoff (3 attempts)
  2. Log Error with full context
  3. Continue Operating (don't crash on single event failure)
  4. Alert if failure rate exceeds threshold (future)
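
The retry policy above can be sketched as a small helper; with_retries and its parameters are illustrative names, not part of the spec:

```rust
use std::thread::sleep;
use std::time::Duration;

// Runs `op` up to `max_attempts` times, sleeping between failures with
// exponential backoff: base_delay, 2x base_delay, 4x base_delay, ...
fn with_retries<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(base_delay * 2u32.pow(attempt));
                attempt += 1;
            }
        }
    }
}
```

An async sensor would use its runtime's sleep (e.g. tokio::time::sleep) instead of blocking the thread.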

Sensor-Specific Behavior

Each sensor type implements trigger-specific logic: it monitors its external sources and translates the activity it observes into Attune events.

Example: Timer Sensor

Trigger Type: core.timer

Parameters:

{
  "interval_seconds": 60
}

Behavior:

  • Maintains a hash map of rule_id -> tokio::task::JoinHandle
  • On RuleCreated/RuleEnabled: Start an async timer loop for the rule
  • On RuleDisabled/RuleDeleted: Cancel the timer task for the rule
  • Timer loop: Every interval, emit an event with current timestamp
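
The instance map can be sketched with std threads standing in for tokio tasks (the reference implementation uses tokio::task::JoinHandle); TimerSensor and the AtomicBool cancellation flag are illustrative:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Maps rule_id to a cancellation flag plus the running timer task.
struct TimerSensor {
    tasks: HashMap<u64, (Arc<AtomicBool>, JoinHandle<()>)>,
}

impl TimerSensor {
    fn new() -> Self {
        Self { tasks: HashMap::new() }
    }

    // RuleCreated / RuleEnabled: start a timer loop for the rule.
    fn start_rule(&mut self, rule_id: u64, interval: Duration, emit: impl Fn(u64) + Send + 'static) {
        let running = Arc::new(AtomicBool::new(true));
        let flag = running.clone();
        let handle = thread::spawn(move || {
            while flag.load(Ordering::Relaxed) {
                thread::sleep(interval);
                if flag.load(Ordering::Relaxed) {
                    emit(rule_id); // a real sensor would POST /events here
                }
            }
        });
        self.tasks.insert(rule_id, (running, handle));
    }

    // RuleDisabled / RuleDeleted: cancel and reap the timer task.
    fn stop_rule(&mut self, rule_id: u64) {
        if let Some((running, handle)) = self.tasks.remove(&rule_id) {
            running.store(false, Ordering::Relaxed);
            let _ = handle.join();
        }
    }
}
```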

Event Payload:

{
  "timestamp": "2025-01-27T12:34:56Z",
  "scheduled_time": "2025-01-27T12:34:56Z"
}

Example: Webhook Sensor

Trigger Type: core.webhook

Parameters:

{
  "path": "/hooks/deployment",
  "method": "POST",
  "secret": "shared_secret_123"
}

Behavior:

  • Runs an HTTP server listening on configured port
  • On RuleCreated/RuleEnabled: Register a route handler for the webhook path
  • On RuleDisabled/RuleDeleted: Unregister the route handler
  • On incoming request: Validate secret, emit event with request body

Event Payload:

{
  "timestamp": "2025-01-27T12:34:56Z",
  "method": "POST",
  "path": "/hooks/deployment",
  "headers": {"Content-Type": "application/json"},
  "body": {"status": "deployed"}
}

Example: File Watcher Sensor

Trigger Type: core.file_changed

Parameters:

{
  "path": "/var/log/app.log",
  "event_types": ["modified", "created"]
}

Behavior:

  • Uses inotify/FSEvents/equivalent to watch file system
  • On RuleCreated/RuleEnabled: Add watch for the specified path
  • On RuleDisabled/RuleDeleted: Remove watch for the path
  • On file system event: Emit event with file details

Event Payload:

{
  "timestamp": "2025-01-27T12:34:56Z",
  "path": "/var/log/app.log",
  "event_type": "modified",
  "size": 12345
}
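
Where inotify/FSEvents bindings are unavailable, the same behavior can be approximated by polling modification times. This PollWatcher sketch is an assumption, not part of the spec; it reports a change on the first poll that finds the file:

```rust
use std::fs;
use std::path::Path;
use std::time::SystemTime;

// Detects "modified"/"created" by comparing mtimes between polls.
struct PollWatcher {
    last_mtime: Option<SystemTime>,
}

impl PollWatcher {
    fn new() -> Self {
        Self { last_mtime: None }
    }

    // Returns true if the file's mtime changed since the previous poll
    // (including the first poll that finds the file at all).
    fn poll(&mut self, path: &Path) -> bool {
        let mtime = fs::metadata(path).and_then(|m| m.modified()).ok();
        let changed = mtime.is_some() && mtime != self.last_mtime;
        self.last_mtime = mtime;
        changed
    }
}
```

Polling trades latency for portability; a production sensor would prefer the OS-native notification APIs the spec names.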

Implementation Guidelines

Language & Runtime

  • Recommended: Rust (for consistency with Attune services)
  • Alternatives: Python, Node.js, Go (if justified by use case)
  • Async I/O: Required for scalability

Dependencies

Sensors should use:

  • HTTP Client: For API communication (e.g., reqwest in Rust)
  • RabbitMQ Client: For message queue (e.g., lapin in Rust)
  • Async Runtime: For concurrency (e.g., tokio in Rust)
  • JSON Parsing: For message/event handling (e.g., serde_json in Rust)
  • Logging: Structured logging (e.g., tracing in Rust)

Error Handling

  • Panic/Crash: Never panic on external input (messages, API responses)
  • Retry Logic: Implement exponential backoff for transient failures
  • Circuit Breaker: Consider circuit breaker for API calls (future)
  • Graceful Degradation: Continue operating even if some rules fail

Logging

Sensors MUST log:

  • Startup: Configuration loaded, connections established
  • Rule Changes: Rule added/removed/updated
  • Events Emitted: Event type and rule_id (not full payload)
  • Errors: All errors with context
  • Shutdown: Graceful shutdown initiated and completed

Log format should be JSON for structured logging:

{
  "timestamp": "2025-01-27T12:34:56Z",
  "level": "info",
  "sensor": "core.timer",
  "message": "Timer started for rule",
  "rule_id": 123,
  "interval_seconds": 5
}
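
A std-only sketch of emitting such a line (the recommended route is the tracing crate); log_line is a hypothetical helper and does not escape quotes inside fields. ts_unix stands in for the ISO-8601 timestamp to keep the example dependency-free:

```rust
// Renders one structured JSON log line; naive, no string escaping.
fn log_line(ts_unix: u64, level: &str, sensor: &str, message: &str, rule_id: u64) -> String {
    format!(
        "{{\"ts\":{ts_unix},\"level\":\"{level}\",\"sensor\":\"{sensor}\",\"message\":\"{message}\",\"rule_id\":{rule_id}}}"
    )
}
```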

Testing

Sensors should include:

  • Unit Tests: Test message parsing, event creation logic
  • Integration Tests: Test against real RabbitMQ and API (test environment)
  • Mock Tests: Test with mocked API/MQ for isolated testing

Security Considerations

Token Storage

  • Never Log Tokens: Redact tokens in logs
  • Memory Only: Keep tokens in memory, never write to disk
  • Automatic Refresh: Refresh token when 80% of TTL elapsed (no restart required)
  • Hot-Reload: Update in-memory token without interrupting operations
  • Refresh Failure Handling: Log errors and retry with exponential backoff

Input Validation

  • Validate All Inputs: RabbitMQ messages, API responses
  • Sanitize Payloads: Prevent injection attacks in event payloads
  • Rate Limiting: Prevent resource exhaustion from malicious triggers
  • Trigger Type Enforcement: API validates that sensor tokens can only create events for declared trigger types

Network Security

  • TLS: Use HTTPS for API calls in production
  • AMQPS: Use TLS for RabbitMQ in production
  • Timeouts: Set reasonable timeouts for all network calls

Deployment

Service Management

Sensors should be managed as system services:

  • systemd: Linux deployments
  • launchd: macOS deployments
  • Docker: Container deployments
  • Kubernetes: Orchestrated deployments (one pod per sensor type)

Resource Limits

Recommended limits:

  • Memory: 64-256 MB per sensor (depends on rule count)
  • CPU: Minimal (<5% avg, spikes allowed)
  • Network: Low bandwidth (<1 Mbps typical)
  • Disk: Minimal (logs only)

Monitoring

Sensors should expose metrics (future):

  • Rules Active: Count of rules being monitored
  • Events Emitted: Counter of events created
  • Errors: Counter of errors by type
  • API Latency: Histogram of API call durations
  • MQ Latency: Histogram of message processing durations
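
The counters above can be sketched with atomics. The metric names and the Prometheus-style text rendering are assumptions, since the metrics endpoint is still marked future:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Counters a sensor would increment during operation.
#[derive(Default)]
struct SensorMetrics {
    rules_active: AtomicU64,
    events_emitted: AtomicU64,
    errors: AtomicU64,
}

impl SensorMetrics {
    // Renders the counters in Prometheus text exposition format.
    fn render(&self, sensor: &str) -> String {
        format!(
            "attune_rules_active{{sensor=\"{sensor}\"}} {}\nattune_events_emitted_total{{sensor=\"{sensor}\"}} {}\nattune_errors_total{{sensor=\"{sensor}\"}} {}\n",
            self.rules_active.load(Ordering::Relaxed),
            self.events_emitted.load(Ordering::Relaxed),
            self.errors.load(Ordering::Relaxed),
        )
    }
}
```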

Compatibility

Versioning

Sensors should:

  • Declare Version: Include sensor version in logs and metrics
  • API Compatibility: Support current API version
  • Message Compatibility: Handle unknown fields gracefully

Backwards Compatibility

When updating sensors:

  • Add Fields: New message fields are optional
  • Deprecate Fields: Old fields remain supported for 2+ versions
  • Breaking Changes: Require major version bump and migration guide

Appendix: Reference Implementation

See attune/crates/sensor/ for the reference timer sensor implementation in Rust.

Key components:

  • src/main.rs - Initialization and configuration
  • src/listener.rs - RabbitMQ message handling
  • src/timer.rs - Timer-specific logic
  • src/api_client.rs - API communication

Appendix: Message Queue Schema

Rule Lifecycle Messages

Exchange: attune (topic exchange)

RuleCreated:

{
  "event_type": "RuleCreated",
  "rule_id": 123,
  "rule_ref": "timer_every_5s",
  "trigger_type": "core.timer",
  "trigger_params": {"interval_seconds": 5},
  "enabled": true,
  "timestamp": "2025-01-27T12:34:56Z"
}

RuleEnabled:

{
  "event_type": "RuleEnabled",
  "rule_id": 123,
  "trigger_type": "core.timer",
  "trigger_params": {"interval_seconds": 5},
  "timestamp": "2025-01-27T12:34:56Z"
}

RuleDisabled:

{
  "event_type": "RuleDisabled",
  "rule_id": 123,
  "trigger_type": "core.timer",
  "timestamp": "2025-01-27T12:34:56Z"
}

RuleDeleted:

{
  "event_type": "RuleDeleted",
  "rule_id": 123,
  "trigger_type": "core.timer",
  "timestamp": "2025-01-27T12:34:56Z"
}

Appendix: API Token Management

Creating Sensor Tokens

Tokens are created via the Attune API (admin only):

POST /service-accounts
Authorization: Bearer {admin_token}
Content-Type: application/json

{
  "name": "sensor:core.timer",
  "description": "Timer sensor service account",
  "scope": "sensor",
  "ttl_days": 90
}

Response:

{
  "identity_id": 123,
  "name": "sensor:core.timer",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_at": "2025-04-27T12:34:56Z"
}

Token Scopes

Scope   Permissions
sensor  Create events, read rules/triggers
action  Read keys, update execution status (for action runners)
admin   Full access (for CLI, web UI)

Token Lifecycle Management

Automatic Token Refresh

Sensors automatically refresh their own tokens without human intervention:

Refresh Timing:

  • Tokens have 90-day TTL
  • Sensors refresh when 80% of TTL elapsed (72 days)
  • Calculation: refresh_at = issued_at + (TTL * 0.8)

Refresh Process:

  1. Background task monitors token expiration
  2. When refresh threshold reached, call POST /auth/refresh with current token
  3. Receive new token with fresh 90-day TTL
  4. Hot-load new token (update in-memory reference)
  5. Old token remains valid until original expiration
  6. Continue operations without interruption

Implementation Pattern:

// Calculate the refresh point (80% of TTL) from the token's claims
let claims = decode_jwt(&token)?;
let ttl_seconds = claims.exp - claims.iat;
let mut refresh_at = claims.iat + (ttl_seconds * 8 / 10);

// Spawn background refresh task
tokio::spawn(async move {
    loop {
        if current_timestamp() >= refresh_at {
            match api_client.refresh_token().await {
                Ok(new_token) => {
                    // Hot-load the new token and schedule the next refresh
                    let claims = decode_jwt(&new_token).expect("refreshed token must decode");
                    refresh_at = claims.iat + ((claims.exp - claims.iat) * 8 / 10);
                    update_token(new_token);
                    info!("Token refreshed successfully");
                }
                Err(e) => {
                    error!("Failed to refresh token: {}", e);
                    // Retry with exponential backoff
                }
            }
        }
        // std Duration has no `from_hours`; re-check roughly once an hour
        sleep(Duration::from_secs(3600)).await;
    }
});

Refresh Failure Handling:

  1. Log error with full context
  2. Retry with exponential backoff (1min, 2min, 4min, 8min, max 1 hour)
  3. Continue using old token (still valid until expiration)
  4. Alert monitoring system after 3 consecutive failures
  5. If old token expires before successful refresh, shut down gracefully
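
The backoff schedule in step 2 (1 min, 2 min, 4 min, 8 min, capped at 1 hour), expressed as a pure function of the consecutive-failure count; refresh_retry_delay is an illustrative name:

```rust
use std::time::Duration;

// Delay before the next refresh attempt after `failures` consecutive
// failures: 2^failures minutes, capped at one hour.
fn refresh_retry_delay(failures: u32) -> Duration {
    let minutes = 1u64.checked_shl(failures).unwrap_or(u64::MAX);
    Duration::from_secs(minutes.saturating_mul(60).min(3600))
}
```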

Zero-Downtime:

  • Old token valid during refresh
  • No service interruption
  • Graceful degradation on failure
  • No manual intervention required

Token Expiration (Edge Case)

If automatic refresh fails and token expires:

  1. API returns 401 Unauthorized
  2. Sensor logs critical error
  3. Sensor shuts down gracefully (stops accepting work, completes in-flight operations)
  4. Operator must manually create new token and restart sensor

This should rarely occur if automatic refresh is working correctly.

Future Enhancements

  1. Health Checks: HTTP endpoint for liveness/readiness probes
  2. Metrics Export: Prometheus-compatible metrics endpoint (including token refresh metrics)
  3. Dynamic Discovery: Auto-discover available sensors from registry
  4. Sensor Scaling: Support multiple instances per sensor type with work distribution
  5. Backpressure: Handle event backlog when API is slow/unavailable
  6. Circuit Breaker: Automatic failover when API is unreachable
  7. Sensor Plugins: Dynamic loading of sensor implementations
  8. Configurable Refresh Threshold: Allow custom refresh timing (e.g., 75%, 85%)
  9. Token Refresh Alerts: Alert on refresh failures, not normal refresh events