615 lines
18 KiB
Markdown
615 lines
18 KiB
Markdown
# Sensor Interface Specification
|
|
|
|
**Version:** 1.0
|
|
**Last Updated:** 2025-01-27
|
|
**Status:** Draft
|
|
|
|
## Overview
|
|
|
|
This document specifies the standard interface that all Attune sensors must implement. Sensors are lightweight, long-running daemon processes that monitor for events and emit them into the Attune platform. Each sensor type has exactly one process instance running at a time, and individual sensor instances are managed dynamically based on active rules.
|
|
|
|
## Design Principles
|
|
|
|
1. **Single Process Per Sensor Type**: Each sensor type (e.g., timer, webhook, file_watcher) runs as a single daemon process
|
|
2. **Lightweight & Async**: Sensors should be event-driven and non-blocking
|
|
3. **Rule-Driven Behavior**: Sensors manage multiple concurrent "instances" based on active rules
|
|
4. **RabbitMQ Communication**: All control messages flow through RabbitMQ
|
|
5. **API Integration**: Sensors use the Attune API to emit events and fetch configuration
|
|
6. **Standard Authentication**: Sensors authenticate using transient API tokens
|
|
7. **Graceful Lifecycle**: Sensors handle startup, shutdown, and dynamic reconfiguration
|
|
|
|
## Sensor Lifecycle
|
|
|
|
### 1. Initialization
|
|
|
|
When a sensor starts, it must:
|
|
|
|
1. **Read Configuration** from environment variables or stdin
|
|
2. **Authenticate** with the Attune API using a transient token
|
|
3. **Connect to RabbitMQ** and declare/bind to its control queue
|
|
4. **Load Active Rules** from the API that use its trigger types
|
|
5. **Start Monitoring** for each active rule
|
|
6. **Signal Ready** (log startup completion)
|
|
|
|
### 2. Runtime Operation
|
|
|
|
During normal operation, a sensor:
|
|
|
|
1. **Listens to RabbitMQ** for rule lifecycle messages (`RuleCreated`, `RuleEnabled`, `RuleDisabled`, `RuleDeleted`)
|
|
2. **Monitors External Sources** (timers, webhooks, file systems, etc.) based on active rules
|
|
3. **Emits Events** to the Attune API when trigger conditions are met
|
|
4. **Handles Errors** gracefully without crashing
|
|
5. **Reports Health** (periodic heartbeat/metrics - future)
|
|
|
|
### 3. Shutdown
|
|
|
|
On shutdown (SIGTERM/SIGINT), a sensor must:
|
|
|
|
1. **Stop Accepting New Work** (stop listening to RabbitMQ)
|
|
2. **Cancel Active Monitors** (stop timers, close connections)
|
|
3. **Flush Pending Events** (send any buffered events to API)
|
|
4. **Close Connections** (RabbitMQ, HTTP clients)
|
|
5. **Exit Cleanly** with appropriate exit code
|
|
|
|
## Configuration
|
|
|
|
### Environment Variables
|
|
|
|
Sensors MUST accept the following environment variables:
|
|
|
|
| Variable | Required | Description | Example |
|
|
|----------|----------|-------------|---------|
|
|
| `ATTUNE_API_URL` | Yes | Base URL of Attune API | `http://localhost:8080` |
|
|
| `ATTUNE_API_TOKEN` | Yes | Transient API token for authentication | `sensor_abc123...` |
|
|
| `ATTUNE_SENSOR_ID` | Yes | Sensor database ID | `42` |
|
|
| `ATTUNE_SENSOR_REF` | Yes | Reference name of this sensor | `core.timer` |
|
|
| `ATTUNE_MQ_URL` | Yes | RabbitMQ connection URL | `amqp://localhost:5672` |
|
|
| `ATTUNE_MQ_EXCHANGE` | No | RabbitMQ exchange name | `attune` (default) |
|
|
| `ATTUNE_LOG_LEVEL` | No | Logging verbosity | `info` (default) |
|
|
|
|
**Note:** These environment variables provide parity with action execution context (see `QUICKREF-execution-environment.md`). Sensors receive:
|
|
- `ATTUNE_SENSOR_ID` - analogous to `ATTUNE_EXEC_ID` for actions
|
|
- `ATTUNE_SENSOR_REF` - analogous to `ATTUNE_ACTION` for actions
|
|
- `ATTUNE_API_TOKEN` and `ATTUNE_API_URL` - same as actions for API access
|
|
|
|
### Alternative: stdin Configuration
|
|
|
|
For containerized or orchestrated deployments, sensors MAY accept configuration as JSON on stdin:
|
|
|
|
```json
|
|
{
|
|
"api_url": "http://localhost:8080",
|
|
"api_token": "sensor_abc123...",
|
|
"sensor_ref": "core.timer",
|
|
"mq_url": "amqp://localhost:5672",
|
|
"mq_exchange": "attune",
|
|
"log_level": "info"
|
|
}
|
|
```
|
|
|
|
If stdin is provided, it takes precedence over environment variables. The JSON must be a single line or complete object, followed by EOF or newline.
|
|
|
|
## API Authentication: Transient Tokens
|
|
|
|
### Token Requirements
|
|
|
|
- **Type**: JWT with `service_account` identity type
|
|
- **Scope**: Limited to sensor operations (create events, read rules)
|
|
- **Lifetime**: Long-lived (90 days) and auto-expires
|
|
- **Rotation**: Automatic refresh (sensor refreshes token when 80% of TTL elapsed)
|
|
- **Zero-Downtime**: Hot-reload new tokens without restart
|
|
|
|
### Token Format
|
|
|
|
Sensors receive a standard JWT that includes:
|
|
|
|
```json
|
|
{
|
|
"sub": "sensor:core.timer",
|
|
"jti": "abc123def456", // JWT ID for revocation tracking
|
|
"identity_id": 123,
|
|
"identity_type": "service_account",
|
|
"scope": "sensor",
|
|
"iat": 1738800000, // Issued at
|
|
"exp": 1738886400, // Expires in 24-72 hours (REQUIRED)
|
|
"metadata": {
|
|
"trigger_types": ["core.timer"] // Enforced by API
|
|
}
|
|
}
|
|
```
|
|
|
|
### API Endpoints Used by Sensors
|
|
|
|
Sensors interact with the following API endpoints:
|
|
|
|
| Method | Endpoint | Purpose | Auth |
|
|
|--------|----------|---------|------|
|
|
| GET | `/rules?trigger_type={ref}` | Fetch active rules for this sensor's triggers | Required |
|
|
| GET | `/triggers/{ref}` | Fetch trigger metadata | Required |
|
|
| POST | `/events` | Create new event | Required |
|
|
| POST | `/auth/refresh` | Refresh token before expiration | Required |
|
|
| GET | `/health` | Verify API connectivity | Optional |
|
|
|
|
## RabbitMQ Integration
|
|
|
|
### Queue Naming
|
|
|
|
Each sensor binds to a dedicated queue for control messages:
|
|
|
|
- **Queue Name**: `sensor.{sensor_ref}` (e.g., `sensor.core.timer`)
|
|
- **Durable**: Yes
|
|
- **Auto-Delete**: No
|
|
- **Exclusive**: No
|
|
|
|
### Exchange Binding
|
|
|
|
Sensors bind their queue to the main exchange with routing keys:
|
|
|
|
- `rule.created` - New rule created
|
|
- `rule.enabled` - Existing rule enabled
|
|
- `rule.disabled` - Existing rule disabled
|
|
- `rule.deleted` - Rule deleted
|
|
|
|
### Message Format
|
|
|
|
All control messages follow this JSON schema:
|
|
|
|
```json
|
|
{
|
|
"event_type": "RuleCreated | RuleEnabled | RuleDisabled | RuleDeleted",
|
|
"rule_id": 123,
|
|
"trigger_type": "core.timer",
|
|
"trigger_params": {
|
|
"interval_seconds": 5
|
|
},
|
|
"timestamp": "2025-01-27T12:34:56Z"
|
|
}
|
|
```
|
|
|
|
### Message Handling
|
|
|
|
Sensors MUST:
|
|
|
|
1. **Validate** messages against expected schema
|
|
2. **Filter** messages to only process rules for their trigger types (based on token's `metadata.trigger_types`)
|
|
3. **Acknowledge** messages after processing (or reject on unrecoverable error)
|
|
4. **Handle Duplicates** idempotently (same rule_id + event_type)
|
|
5. **Enforce Trigger Type Restrictions**: Only emit events for trigger types declared in the sensor's token metadata
|
|
|
|
## Event Emission
|
|
|
|
### Event Creation API
|
|
|
|
Sensors create events by POSTing to `/events`:
|
|
|
|
```http
|
|
POST /events
|
|
Authorization: Bearer {sensor_token}
|
|
Content-Type: application/json
|
|
|
|
{
|
|
"trigger_ref": "core.timer",
|
|
"payload": {
|
|
"timestamp": "2025-01-27T12:34:56Z",
|
|
"scheduled_time": "2025-01-27T12:34:56Z"
|
|
},
|
|
"trigger_instance_id": "rule_123"
|
|
}
|
|
```
|
|
|
|
> **Note**: `trigger_type` is accepted as an alias for `trigger_ref` for backward compatibility, but `trigger_ref` is the canonical field name.
|
|
|
|
**Important**: Sensors can only emit events for trigger types declared in their token's `metadata.trigger_types`. The API will reject event creation requests for unauthorized trigger types with a `403 Forbidden` error.
|
|
|
|
### Event Payload Guidelines
|
|
|
|
- **Timestamp**: Always include event occurrence time
|
|
- **Context**: Include relevant context for rule evaluation
|
|
- **Size**: Keep payloads small (<1KB recommended, <10KB max)
|
|
- **Sensitive Data**: Never include passwords, tokens, or PII unless explicitly required
|
|
- **Trigger Type Match**: The `trigger_type` field must match one of the sensor's declared trigger types
|
|
|
|
### Error Handling
|
|
|
|
If event creation fails:
|
|
|
|
1. **Retry** with exponential backoff (3 attempts)
|
|
2. **Log Error** with full context
|
|
3. **Continue Operating** (don't crash on single event failure)
|
|
4. **Alert** if failure rate exceeds threshold (future)
|
|
|
|
## Sensor-Specific Behavior
|
|
|
|
Each sensor type implements trigger-specific logic. The sensor monitors external sources and translates them into Attune events.
|
|
|
|
### Example: Timer Sensor
|
|
|
|
**Trigger Type**: `core.timer`
|
|
|
|
**Parameters**:
|
|
```json
|
|
{
|
|
"interval_seconds": 60
|
|
}
|
|
```
|
|
|
|
**Behavior**:
|
|
- Maintains a hash map of `rule_id -> tokio::task::JoinHandle`
|
|
- On `RuleCreated`/`RuleEnabled`: Start an async timer loop for the rule
|
|
- On `RuleDisabled`/`RuleDeleted`: Cancel the timer task for the rule
|
|
- Timer loop: Every interval, emit an event with current timestamp
|
|
|
|
**Event Payload**:
|
|
```json
|
|
{
|
|
"timestamp": "2025-01-27T12:34:56Z",
|
|
"scheduled_time": "2025-01-27T12:34:56Z"
|
|
}
|
|
```
|
|
|
|
### Example: Webhook Sensor
|
|
|
|
**Trigger Type**: `core.webhook`
|
|
|
|
**Parameters**:
|
|
```json
|
|
{
|
|
"path": "/hooks/deployment",
|
|
"method": "POST",
|
|
"secret": "shared_secret_123"
|
|
}
|
|
```
|
|
|
|
**Behavior**:
|
|
- Runs an HTTP server listening on configured port
|
|
- On `RuleCreated`/`RuleEnabled`: Register a route handler for the webhook path
|
|
- On `RuleDisabled`/`RuleDeleted`: Unregister the route handler
|
|
- On incoming request: Validate secret, emit event with request body
|
|
|
|
**Event Payload**:
|
|
```json
|
|
{
|
|
"timestamp": "2025-01-27T12:34:56Z",
|
|
"method": "POST",
|
|
"path": "/hooks/deployment",
|
|
"headers": {"Content-Type": "application/json"},
|
|
"body": {"status": "deployed"}
|
|
}
|
|
```
|
|
|
|
### Example: File Watcher Sensor
|
|
|
|
**Trigger Type**: `core.file_changed`
|
|
|
|
**Parameters**:
|
|
```json
|
|
{
|
|
"path": "/var/log/app.log",
|
|
"event_types": ["modified", "created"]
|
|
}
|
|
```
|
|
|
|
**Behavior**:
|
|
- Uses inotify/FSEvents/equivalent to watch file system
|
|
- On `RuleCreated`/`RuleEnabled`: Add watch for the specified path
|
|
- On `RuleDisabled`/`RuleDeleted`: Remove watch for the path
|
|
- On file system event: Emit event with file details
|
|
|
|
**Event Payload**:
|
|
```json
|
|
{
|
|
"timestamp": "2025-01-27T12:34:56Z",
|
|
"path": "/var/log/app.log",
|
|
"event_type": "modified",
|
|
"size": 12345
|
|
}
|
|
```
|
|
|
|
## Implementation Guidelines
|
|
|
|
### Language & Runtime
|
|
|
|
- **Recommended**: Rust (for consistency with Attune services)
|
|
- **Alternatives**: Python, Node.js, Go (if justified by use case)
|
|
- **Async I/O**: Required for scalability
|
|
|
|
### Dependencies
|
|
|
|
Sensors should use:
|
|
|
|
- **HTTP Client**: For API communication (e.g., `reqwest` in Rust)
|
|
- **RabbitMQ Client**: For message queue (e.g., `lapin` in Rust)
|
|
- **Async Runtime**: For concurrency (e.g., `tokio` in Rust)
|
|
- **JSON Parsing**: For message/event handling (e.g., `serde_json` in Rust)
|
|
- **Logging**: Structured logging (e.g., `tracing` in Rust)
|
|
|
|
### Error Handling
|
|
|
|
- **Panic/Crash**: Never panic on external input (messages, API responses)
|
|
- **Retry Logic**: Implement exponential backoff for transient failures
|
|
- **Circuit Breaker**: Consider circuit breaker for API calls (future)
|
|
- **Graceful Degradation**: Continue operating even if some rules fail
|
|
|
|
### Logging
|
|
|
|
Sensors MUST log:
|
|
|
|
- **Startup**: Configuration loaded, connections established
|
|
- **Rule Changes**: Rule added/removed/updated
|
|
- **Events Emitted**: Event type and rule_id (not full payload)
|
|
- **Errors**: All errors with context
|
|
- **Shutdown**: Graceful shutdown initiated and completed
|
|
|
|
Log format should be JSON for structured logging:
|
|
|
|
```json
|
|
{
|
|
"timestamp": "2025-01-27T12:34:56Z",
|
|
"level": "info",
|
|
"sensor": "core.timer",
|
|
"message": "Timer started for rule",
|
|
"rule_id": 123,
|
|
"interval_seconds": 5
|
|
}
|
|
```
|
|
|
|
### Testing
|
|
|
|
Sensors should include:
|
|
|
|
- **Unit Tests**: Test message parsing, event creation logic
|
|
- **Integration Tests**: Test against real RabbitMQ and API (test environment)
|
|
- **Mock Tests**: Test with mocked API/MQ for isolated testing
|
|
|
|
## Security Considerations
|
|
|
|
### Token Storage
|
|
|
|
- **Never Log Tokens**: Redact tokens in logs
|
|
- **Memory Only**: Keep tokens in memory, never write to disk
|
|
- **Automatic Refresh**: Refresh token when 80% of TTL elapsed (no restart required)
|
|
- **Hot-Reload**: Update in-memory token without interrupting operations
|
|
- **Refresh Failure Handling**: Log errors and retry with exponential backoff
|
|
|
|
### Input Validation
|
|
|
|
- **Validate All Inputs**: RabbitMQ messages, API responses
|
|
- **Sanitize Payloads**: Prevent injection attacks in event payloads
|
|
- **Rate Limiting**: Prevent resource exhaustion from malicious triggers
|
|
- **Trigger Type Enforcement**: API validates that sensor tokens can only create events for declared trigger types
|
|
|
|
### Network Security
|
|
|
|
- **TLS**: Use HTTPS for API calls in production
|
|
- **AMQPS**: Use TLS for RabbitMQ in production
|
|
- **Timeouts**: Set reasonable timeouts for all network calls
|
|
|
|
## Deployment
|
|
|
|
### Service Management
|
|
|
|
Sensors should be managed as system services:
|
|
|
|
- **systemd**: Linux deployments
|
|
- **launchd**: macOS deployments
|
|
- **Docker**: Container deployments
|
|
- **Kubernetes**: Orchestrated deployments (one pod per sensor type)
|
|
|
|
### Resource Limits
|
|
|
|
Recommended limits:
|
|
|
|
- **Memory**: 64-256 MB per sensor (depends on rule count)
|
|
- **CPU**: Minimal (<5% avg, spikes allowed)
|
|
- **Network**: Low bandwidth (<1 Mbps typical)
|
|
- **Disk**: Minimal (logs only)
|
|
|
|
### Monitoring
|
|
|
|
Sensors should expose metrics (future):
|
|
|
|
- **Rules Active**: Count of rules being monitored
|
|
- **Events Emitted**: Counter of events created
|
|
- **Errors**: Counter of errors by type
|
|
- **API Latency**: Histogram of API call durations
|
|
- **MQ Latency**: Histogram of message processing durations
|
|
|
|
## Compatibility
|
|
|
|
### Versioning
|
|
|
|
Sensors should:
|
|
|
|
- **Declare Version**: Include sensor version in logs and metrics
|
|
- **API Compatibility**: Support current API version
|
|
- **Message Compatibility**: Handle unknown fields gracefully
|
|
|
|
### Backwards Compatibility
|
|
|
|
When updating sensors:
|
|
|
|
- **Add Fields**: New message fields are optional
|
|
- **Deprecate Fields**: Old fields remain supported for 2+ versions
|
|
- **Breaking Changes**: Require major version bump and migration guide
|
|
|
|
## Appendix: Reference Implementation
|
|
|
|
See `attune/crates/sensor/` for the reference timer sensor implementation in Rust.
|
|
|
|
Key components:
|
|
|
|
- `src/main.rs` - Initialization and configuration
|
|
- `src/listener.rs` - RabbitMQ message handling
|
|
- `src/timer.rs` - Timer-specific logic
|
|
- `src/api_client.rs` - API communication
|
|
|
|
## Appendix: Message Queue Schema
|
|
|
|
### Rule Lifecycle Messages
|
|
|
|
**Exchange**: `attune` (topic exchange)
|
|
|
|
**RuleCreated**:
|
|
```json
|
|
{
|
|
"event_type": "RuleCreated",
|
|
"rule_id": 123,
|
|
"rule_ref": "timer_every_5s",
|
|
"trigger_type": "core.timer",
|
|
"trigger_params": {"interval_seconds": 5},
|
|
"enabled": true,
|
|
"timestamp": "2025-01-27T12:34:56Z"
|
|
}
|
|
```
|
|
|
|
**RuleEnabled**:
|
|
```json
|
|
{
|
|
"event_type": "RuleEnabled",
|
|
"rule_id": 123,
|
|
"trigger_type": "core.timer",
|
|
"trigger_params": {"interval_seconds": 5},
|
|
"timestamp": "2025-01-27T12:34:56Z"
|
|
}
|
|
```
|
|
|
|
**RuleDisabled**:
|
|
```json
|
|
{
|
|
"event_type": "RuleDisabled",
|
|
"rule_id": 123,
|
|
"trigger_type": "core.timer",
|
|
"timestamp": "2025-01-27T12:34:56Z"
|
|
}
|
|
```
|
|
|
|
**RuleDeleted**:
|
|
```json
|
|
{
|
|
"event_type": "RuleDeleted",
|
|
"rule_id": 123,
|
|
"trigger_type": "core.timer",
|
|
"timestamp": "2025-01-27T12:34:56Z"
|
|
}
|
|
```
|
|
|
|
## Appendix: API Token Management
|
|
|
|
### Creating Sensor Tokens
|
|
|
|
Tokens are created via the Attune API (admin only):
|
|
|
|
```http
|
|
POST /service-accounts
|
|
Authorization: Bearer {admin_token}
|
|
Content-Type: application/json
|
|
|
|
{
|
|
"name": "sensor:core.timer",
|
|
"description": "Timer sensor service account",
|
|
"scope": "sensor",
|
|
"ttl_days": 90
|
|
}
|
|
```
|
|
|
|
Response:
|
|
```json
|
|
{
|
|
"identity_id": 123,
|
|
"name": "sensor:core.timer",
|
|
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
|
|
"expires_at": "2025-04-27T12:34:56Z"
|
|
}
|
|
```
|
|
|
|
### Token Scopes
|
|
|
|
| Scope | Permissions |
|
|
|-------|-------------|
|
|
| `sensor` | Create events, read rules/triggers |
|
|
| `action` | Read keys, update execution status (for action runners) |
|
|
| `admin` | Full access (for CLI, web UI) |
|
|
|
|
## Token Lifecycle Management
|
|
|
|
### Automatic Token Refresh
|
|
|
|
Sensors automatically refresh their own tokens without human intervention:
|
|
|
|
**Refresh Timing:**
|
|
- Tokens have 90-day TTL
|
|
- Sensors refresh when 80% of TTL elapsed (72 days)
|
|
- Calculation: `refresh_at = issued_at + (TTL * 0.8)`
|
|
|
|
**Refresh Process:**
|
|
1. Background task monitors token expiration
|
|
2. When refresh threshold reached, call `POST /auth/refresh` with current token
|
|
3. Receive new token with fresh 90-day TTL
|
|
4. Hot-load new token (update in-memory reference)
|
|
5. Old token remains valid until original expiration
|
|
6. Continue operations without interruption
|
|
|
|
**Implementation Pattern:**
|
|
```rust
|
|
// Calculate when to refresh (80% of TTL)
|
|
let token_exp = decode_jwt(&token)?.exp;
|
|
let token_iat = decode_jwt(&token)?.iat;
|
|
let ttl_seconds = token_exp - token_iat;
|
|
let refresh_at = token_iat + (ttl_seconds * 8 / 10);
|
|
|
|
// Spawn background refresh task
|
|
tokio::spawn(async move {
|
|
loop {
|
|
let now = current_timestamp();
|
|
if now >= refresh_at {
|
|
match api_client.refresh_token().await {
|
|
Ok(new_token) => {
|
|
update_token(new_token);
|
|
info!("Token refreshed successfully");
|
|
}
|
|
Err(e) => {
|
|
error!("Failed to refresh token: {}", e);
|
|
// Retry with exponential backoff
|
|
}
|
|
}
|
|
}
|
|
sleep(Duration::from_hours(1)).await;
|
|
}
|
|
});
|
|
```
|
|
|
|
**Refresh Failure Handling:**
|
|
1. Log error with full context
|
|
2. Retry with exponential backoff (1min, 2min, 4min, 8min, max 1 hour)
|
|
3. Continue using old token (still valid until expiration)
|
|
4. Alert monitoring system after 3 consecutive failures
|
|
5. If old token expires before successful refresh, shut down gracefully
|
|
|
|
**Zero-Downtime:**
|
|
- Old token valid during refresh
|
|
- No service interruption
|
|
- Graceful degradation on failure
|
|
- No manual intervention required
|
|
|
|
### Token Expiration (Edge Case)
|
|
|
|
If automatic refresh fails and token expires:
|
|
|
|
1. API returns 401 Unauthorized
|
|
2. Sensor logs critical error
|
|
3. Sensor shuts down gracefully (stops accepting work, completes in-flight operations)
|
|
4. Operator must manually create new token and restart sensor
|
|
|
|
**This should rarely occur** if automatic refresh is working correctly.
|
|
|
|
## Future Enhancements
|
|
|
|
1. **Health Checks**: HTTP endpoint for liveness/readiness probes
|
|
2. **Metrics Export**: Prometheus-compatible metrics endpoint (including token refresh metrics)
|
|
3. **Dynamic Discovery**: Auto-discover available sensors from registry
|
|
4. **Sensor Scaling**: Support multiple instances per sensor type with work distribution
|
|
5. **Backpressure**: Handle event backlog when API is slow/unavailable
|
|
6. **Circuit Breaker**: Automatic failover when API is unreachable
|
|
7. **Sensor Plugins**: Dynamic loading of sensor implementations
|
|
8. **Configurable Refresh Threshold**: Allow custom refresh timing (e.g., 75%, 85%)
|
|
9. **Token Refresh Alerts**: Alert on refresh failures, not normal refresh events |