attune-system/attune

David Culbreth 3b14c65998 re-uploading work

2026-02-04 17:46:30 -06:00

Service Accounts and Transient API Tokens

Version: 1.0
Last Updated: 2025-01-27
Status: Draft

Overview

Service accounts provide programmatic access to the Attune API for sensors, action executions, and other automated processes. Unlike user accounts, service accounts:

Have no password (token-based authentication only)
Have limited scopes (principle of least privilege)
Can be short-lived or long-lived depending on use case
Are not tied to a human user
Can be easily revoked without affecting user access

Use Cases

Sensors: Long-lived tokens for sensor daemons to emit events
Action Executions: Short-lived tokens scoped to a single execution
CLI Tools: User-scoped tokens for command-line operations
Webhooks: Tokens for external systems to trigger actions
Monitoring: Tokens for health checks and metrics collection

Token Types

1. Sensor Tokens

Purpose: Authentication for sensor daemon processes

Characteristics:

Lifetime: Long-lived (90 days, auto-expires)
Scope: sensor
Permissions: Create events, read rules/triggers for specific trigger types
Revocable: Yes (manual revocation via API)
Renewable: Yes (automatic refresh via API, no restart required)
Rotation: Automatic (sensor refreshes token when 80% of TTL elapsed)

Example Usage:

ATTUNE_API_TOKEN=sensor_abc123... ./attune-sensor --sensor-ref core.timer

2. Action Execution Tokens

Purpose: Authentication for action scripts during execution

Characteristics:

Lifetime: Short-lived (matches execution timeout, typically 5-60 minutes)
Scope: action_execution
Permissions: Read keys, update execution status, limited to specific execution_id
Revocable: Yes (auto-revoked on execution completion or timeout)
Renewable: No (bound to a single execution; expires when the execution completes or times out)
Auto-Cleanup: Token revocation records are auto-deleted after expiration

Example Usage:

# Action script receives token via environment variable
import os
import requests

api_url = os.environ['ATTUNE_API_URL']
api_token = os.environ['ATTUNE_API_TOKEN']
execution_id = os.environ['ATTUNE_EXECUTION_ID']

# Fetch encrypted key; fail loudly on a non-2xx response
response = requests.get(
    f"{api_url}/keys/myapp.api_key",
    headers={"Authorization": f"Bearer {api_token}"},
    timeout=10,
)
response.raise_for_status()
secret = response.json()['value']

3. User CLI Tokens

Purpose: Authentication for CLI tools on behalf of a user

Characteristics:

Lifetime: Medium-lived (7-30 days)
Scope: user
Permissions: Full user permissions (RBAC-based)
Revocable: Yes
Renewable: Yes (via refresh token)

Example Usage:

attune auth login  # Stores token in ~/.attune/token
attune action execute core.echo --param message="Hello"

4. Webhook Tokens

Purpose: Authentication for external systems calling Attune webhooks

Characteristics:

Lifetime: Long-lived (90-365 days, auto-expires)
Scope: webhook
Permissions: Trigger specific actions or create events
Revocable: Yes
Renewable: Yes (generate new token before expiration)
Rotation: Recommended every 90 days

Example Usage:

curl -X POST https://attune.example.com/api/webhooks/deploy \
  -H "Authorization: Bearer webhook_xyz789..." \
  -d '{"status": "deployed"}'

Token Scopes and Permissions

Scope	Permissions	Use Case
`admin`	Full access to all resources	System administrators, web UI
`user`	RBAC-based permissions	CLI tools, user sessions
`sensor`	Create events, read rules/triggers	Sensor daemons
`action_execution`	Read keys, update execution (scoped to execution_id)	Action scripts
`webhook`	Create events, trigger actions	External integrations
`readonly`	Read-only access to all resources	Monitoring, auditing

Database Schema

Identity Table

Service accounts are stored in the identity table with identity_type = 'service_account':

CREATE TABLE identity (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    identity_type identity_type NOT NULL,  -- 'user' or 'service_account'
    email VARCHAR(255),  -- NULL for service accounts
    password_hash VARCHAR(255),  -- NULL for service accounts
    metadata JSONB DEFAULT '{}',
    created TIMESTAMPTZ DEFAULT NOW(),
    updated TIMESTAMPTZ DEFAULT NOW()
);

Service account metadata includes:

{
  "scope": "sensor",
  "description": "Timer sensor service account",
  "created_by": 1,  // identity_id of creator
  "expires_at": "2025-04-27T12:34:56Z",
  "trigger_types": ["core.timer"],  // For sensor scope
  "execution_id": 123  // For action_execution scope
}

Token Storage

Tokens are not stored in the database (they are stateless JWTs). However, revocation is tracked:

CREATE TABLE token_revocation (
    id BIGSERIAL PRIMARY KEY,
    identity_id BIGINT NOT NULL REFERENCES identity(id) ON DELETE CASCADE,
    token_jti VARCHAR(255) NOT NULL,  -- JWT ID (jti claim)
    token_exp TIMESTAMPTZ NOT NULL,   -- Token expiration (from exp claim)
    revoked_at TIMESTAMPTZ DEFAULT NOW(),
    revoked_by BIGINT REFERENCES identity(id),
    reason VARCHAR(500),
    UNIQUE(token_jti)
);

-- UNIQUE(token_jti) above already creates an index on token_jti, so no separate jti index is needed
CREATE INDEX idx_token_revocation_identity ON token_revocation(identity_id);
CREATE INDEX idx_token_revocation_exp ON token_revocation(token_exp);  -- For cleanup queries

JWT Token Format

Claims

All service account tokens include these claims:

{
  "sub": "sensor:core.timer",  // Subject: "type:name"
  "jti": "abc123...",  // JWT ID (for revocation)
  "iat": 1706356496,  // Issued at (Unix timestamp)
  "exp": 1714132496,  // Expires at (Unix timestamp)
  "identity_id": 123,
  "identity_type": "service_account",
  "scope": "sensor",
  "metadata": {
    "trigger_types": ["core.timer"]
  }
}
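
As a rough illustration of how these claims could be assembled and signed, the sketch below builds an HS256 JWT with only the Python standard library. HS256 matches the header of the sample tokens in this document, but the helper names (`b64url`, `make_token`, `decode_claims`) and the choice of a random UUID for `jti` are assumptions made for illustration, not Attune's actual implementation.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid
from typing import Optional

SENSOR_TTL_SECONDS = 90 * 24 * 60 * 60  # 90 days, the documented sensor lifetime


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(secret: bytes, identity_id: int, name: str, scope: str,
               metadata: dict, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Build and sign a token carrying the claims listed above (hypothetical helper)."""
    issued_at = int(time.time()) if now is None else now
    claims = {
        "sub": name,
        "jti": uuid.uuid4().hex,  # assumption: a random UUID as the JWT ID
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,
        "identity_id": identity_id,
        "identity_type": "service_account",
        "scope": scope,
        "metadata": metadata,
    }
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def decode_claims(secret: bytes, token: str) -> dict:
    """Verify the signature and return the claims; raise ValueError on mismatch."""
    signing_input, _, signature = token.rpartition(".")
    expected = b64url(
        hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    )
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    payload = signing_input.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))
```

Note that `decode_claims` only checks the signature. Expiration and revocation still have to be checked separately, as described under Token Validation.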

Scope-Specific Claims

Sensor tokens (restricted to declared trigger types):

{
  "scope": "sensor",
  "metadata": {
    "trigger_types": ["core.timer", "core.interval"]
  }
}

The API enforces that sensors can only create events for trigger types listed in metadata.trigger_types. Attempting to create an event for an unauthorized trigger type will result in a 403 Forbidden error.

Action execution tokens:

{
  "scope": "action_execution",
  "metadata": {
    "execution_id": 456,
    "action_ref": "core.echo",
    "workflow_id": 789  // Optional, if part of workflow
  }
}

Webhook tokens:

{
  "scope": "webhook",
  "metadata": {
    "allowed_paths": ["/webhooks/deploy", "/webhooks/alert"],
    "ip_whitelist": ["203.0.113.0/24"]  // Optional
  }
}
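
The webhook restrictions above could be enforced along the following lines. This is a minimal sketch: the function name `webhook_request_allowed` is hypothetical, and exact-match path comparison is an assumption, since the document does not specify how `allowed_paths` is matched.

```python
import ipaddress


def webhook_request_allowed(metadata: dict, path: str, client_ip: str) -> bool:
    """Return True if the request path and source IP satisfy the token metadata."""
    # Assumption: paths must match an allowed entry exactly.
    if path not in metadata.get("allowed_paths", []):
        return False

    networks = metadata.get("ip_whitelist")
    if not networks:
        return True  # ip_whitelist is optional; absent means no IP restriction

    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in networks)
```

For example, with the metadata shown above, a call from `203.0.113.7` to `/webhooks/deploy` passes, while the same call from `198.51.100.1` is rejected.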

API Endpoints

Create Service Account

Admin only

POST /service-accounts
Authorization: Bearer {admin_token}
Content-Type: application/json

{
  "name": "sensor:core.timer",
  "scope": "sensor",
  "description": "Timer sensor service account",
  "ttl_days": 90,  // Sensor tokens: 90 days, auto-refresh before expiration
  "metadata": {
    "trigger_types": ["core.timer"]
  }
}

Response:

{
  "identity_id": 123,
  "name": "sensor:core.timer",
  "scope": "sensor",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_at": "2025-04-27T12:34:56Z"  // 90 days from now
}

Important: The token is only shown once. Store it securely.

List Service Accounts

Admin only

GET /service-accounts
Authorization: Bearer {admin_token}

Response:

{
  "data": [
    {
      "identity_id": 123,
      "name": "sensor:core.timer",
      "scope": "sensor",
      "created_at": "2025-01-27T12:34:56Z",
      "expires_at": "2025-04-27T12:34:56Z",
      "metadata": {
        "trigger_types": ["core.timer"]
      }
    }
  ]
}

Refresh Token (Self-Service)

Sensor/User tokens can refresh themselves

POST /auth/refresh
Authorization: Bearer {current_token}
Content-Type: application/json

{}

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_at": "2025-04-27T12:34:56Z"
}

Notes:

Current token must be valid (not expired, not revoked)
New token has same scope and metadata as current token
New token has same TTL as original token type (e.g., 90 days for sensors)
Old token remains valid until its original expiration (allows zero-downtime refresh)
Only sensor and user scopes can refresh (not action_execution or webhook)
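
The refresh rules in these notes can be sketched as a pure function over the token claims. Everything here is illustrative: `refresh_claims`, `RefreshError`, and the in-memory `revoked_jtis` set are hypothetical stand-ins, and the 30-day user TTL is an assumed value within the documented 7-30 day range.

```python
import time
import uuid
from typing import Optional

# TTL per refreshable scope. Scopes missing from this map cannot refresh.
REFRESH_TTL_SECONDS = {
    "sensor": 90 * 86400,
    "user": 30 * 86400,  # assumption: upper end of the 7-30 day range
}


class RefreshError(Exception):
    """Raised when the presented token may not be refreshed."""


def refresh_claims(claims: dict, revoked_jtis: set, now: Optional[int] = None) -> dict:
    """Return claims for a new token, or raise RefreshError."""
    now = int(time.time()) if now is None else now
    if claims["exp"] < now:
        raise RefreshError("token expired")
    if claims["jti"] in revoked_jtis:
        raise RefreshError("token revoked")
    ttl = REFRESH_TTL_SECONDS.get(claims["scope"])
    if ttl is None:
        raise RefreshError("scope cannot refresh")

    # Same scope and metadata; fresh jti, iat, and exp. The old token is not
    # revoked here, so it stays valid until its original expiration.
    new_claims = dict(claims)
    new_claims.update(jti=uuid.uuid4().hex, iat=now, exp=now + ttl)
    return new_claims
```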

Revoke Service Account Token

Admin only

DELETE /service-accounts/{identity_id}
Authorization: Bearer {admin_token}
Content-Type: application/json

{
  "reason": "Token compromised"
}

Response:

{
  "message": "Service account revoked",
  "identity_id": 123
}

Create Execution Token (Internal)

Called by executor service, not exposed in API

// In executor service
let execution_timeout_minutes = get_action_timeout(action_ref); // e.g., 30 minutes
let token = create_execution_token(
    execution_id,
    action_ref,
    execution_timeout_minutes, // ttl_minutes: token TTL matches the action timeout
)?;

This token is passed to the worker service, which injects it into the action's environment.

Token Creation Workflow

1. Sensor Token Creation

Admin → POST /service-accounts (scope=sensor) → API
API → Create identity record → Database
API → Generate JWT with sensor scope → Response
Admin → Store token in secure config → Sensor deployment
Sensor → Use token for API calls → Event emission

2. Execution Token Creation

Rule fires → Executor creates enforcement → Executor
Executor → Schedule execution → Database
Executor → Create execution token (internal) → JWT library
Executor → Send execution request to worker → RabbitMQ
Worker → Receive message with token → Action runner
Action → Use token to fetch keys → API
Execution completes → Token expires (TTL) → Automatic cleanup
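
The worker-side step of this flow, handing the token to the action through its environment, might look roughly like the sketch below. The function `run_action` and its parameters are hypothetical. Only the three environment variable names come from this document.

```python
import os
import subprocess
import sys


def run_action(command: list, api_url: str, token: str, execution_id: int,
               timeout_seconds: int) -> subprocess.CompletedProcess:
    """Run an action process with the execution token injected into its environment."""
    # Assumption: the action inherits the worker's environment. A real worker
    # may prefer a minimal environment so its own secrets are not exposed.
    env = dict(os.environ)
    env.update({
        "ATTUNE_API_URL": api_url,
        "ATTUNE_API_TOKEN": token,
        "ATTUNE_EXECUTION_ID": str(execution_id),
    })
    # The process timeout mirrors the token TTL: both follow the action timeout.
    return subprocess.run(command, env=env, capture_output=True, text=True,
                          timeout=timeout_seconds)


if __name__ == "__main__":
    # Demo action: print the execution ID it received.
    demo = [sys.executable, "-c",
            "import os; print(os.environ['ATTUNE_EXECUTION_ID'])"]
    result = run_action(demo, "http://localhost:8080", "example-token", 456, 30)
    print(result.stdout.strip())
```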

Token Validation

Middleware (API Service)

// In API service
pub async fn validate_token(
    token: &str,
    required_scope: Option<&str>
) -> Result<Claims> {
    // 1. Verify JWT signature
    let claims = decode_jwt(token)?;
    
    // 2. Check expiration (JWT library handles this, but explicit check for clarity)
    if claims.exp < now() {
        return Err(Error::TokenExpired);
    }
    
    // 3. Check revocation (only check non-expired tokens)
    if is_revoked(&claims.jti, claims.exp).await? {
        return Err(Error::TokenRevoked);
    }
    
    // 4. Check scope (admin tokens pass, since the admin scope has full access)
    if let Some(scope) = required_scope {
        if claims.scope != scope && claims.scope != "admin" {
            return Err(Error::InsufficientPermissions);
        }
    }
    
    Ok(claims)
}

Scope-Based Authorization

// Execution-scoped token can only access its own execution
if claims.scope == "action_execution" {
    let allowed_execution_id = claims.metadata
        .get("execution_id")
        .and_then(|v| v.as_i64())
        .ok_or(Error::InvalidToken)?;
    
    if execution_id != allowed_execution_id {
        return Err(Error::InsufficientPermissions);
    }
}

// Sensor-scoped token can only create events for declared trigger types
if claims.scope == "sensor" {
    let allowed_trigger_types = claims.metadata
        .get("trigger_types")
        .and_then(|v| v.as_array())
        .ok_or(Error::InvalidToken)?;
    
    let allowed_types: Vec<String> = allowed_trigger_types
        .iter()
        .filter_map(|v| v.as_str().map(String::from))
        .collect();
    
    if !allowed_types.contains(&trigger_type) {
        return Err(Error::InsufficientPermissions);
    }
}

Security Best Practices

Token Generation

Use Strong Secrets: JWT signing key must be at least 256 bits and randomly generated
Include JTI: Always include the jti claim for revocation support
Required Expiration: All tokens MUST have an exp claim, with no exceptions
- Sensor tokens: 90 days (auto-refresh before expiration)
- Action execution tokens: Match execution timeout (5-60 minutes)
- User CLI tokens: 7-30 days (auto-refresh before expiration)
- Webhook tokens: 90-365 days (manual rotation)
Minimal Scope: Grant least privilege necessary
Restrict Trigger Types: For sensor tokens, only include necessary trigger types in metadata
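
For the first two items, a signing key and a `jti` value could be produced as in this small sketch. The function names are hypothetical, and a UUID-based `jti` is one reasonable choice rather than something this document prescribes.

```python
import secrets
import uuid


def generate_signing_key(num_bytes: int = 32) -> bytes:
    """Return a random signing key of at least 256 bits from the OS CSPRNG."""
    if num_bytes < 32:
        raise ValueError("signing key must be at least 256 bits (32 bytes)")
    return secrets.token_bytes(num_bytes)


def generate_jti() -> str:
    """Return a unique token ID suitable for the jti claim."""
    return uuid.uuid4().hex
```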

Token Storage

Environment Variables: Preferred method for sensors and actions
Never Log: Redact tokens from logs (show only last 4 chars)
Never Commit: Don't commit tokens to version control
Secure Config: Store in encrypted config management (Vault, k8s secrets)
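
The "Never Log" rule could be applied with a helper like the one below. This is a sketch under assumptions: `redact_token` and `redact_log_line` are hypothetical names, and the regular expression only catches tokens that follow the word `Bearer`.

```python
import re

# Matches "Bearer <token>" as used in the Authorization header.
BEARER_PATTERN = re.compile(r"(Bearer\s+)(\S+)")


def redact_token(token: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a token."""
    if len(token) <= visible:
        return "*" * len(token)  # too short to reveal anything safely
    return "***" + token[-visible:]


def redact_log_line(line: str) -> str:
    """Redact any bearer token appearing in a log line."""
    return BEARER_PATTERN.sub(
        lambda match: match.group(1) + redact_token(match.group(2)), line
    )
```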

Token Transmission

HTTPS Only: Never send tokens over unencrypted connections
Authorization Header: Use Authorization: Bearer {token} header
No Query Params: Don't pass tokens in URL query parameters
No Cookies: For service accounts, avoid cookie-based auth

Token Revocation

Immediate Revocation: Check revocation list on every request
Audit Trail: Log who revoked, when, and why
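Cascade Delete: Revoke all tokens when a service account is deleted. Because ON DELETE CASCADE also removes that identity's token_revocation rows, validation must reject tokens whose identity_id no longer exists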
Automatic Cleanup: Delete revocation records for expired tokens (run hourly)
- Query: DELETE FROM token_revocation WHERE token_exp < NOW()
- Prevents indefinite table bloat
- Expired tokens are already invalid, no need to track revocation
Validate Permissions: Enforce trigger type restrictions for sensor tokens on event creation

Implementation Checklist

Add identity_type enum to database schema
Add token_revocation table (with token_exp column)
Create POST /service-accounts endpoint
Create GET /service-accounts endpoint
Create DELETE /service-accounts/{id} endpoint
Create POST /auth/refresh endpoint (for automatic token refresh)
Add scope validation middleware
Add token revocation check middleware (skip check for expired tokens)
Implement execution token creation in executor (TTL = action timeout)
Pass execution token to worker via RabbitMQ
Inject execution token into action environment
Add CLI commands: attune service-account create/list/revoke
Document token creation for sensor deployment
Implement automatic token refresh in sensors (refresh at 80% of TTL)
Implement cleanup job for expired token revocations (hourly cron)

Migration Path

Phase 1: Database Schema

-- Add identity_type enum if not exists
DO $$ BEGIN
    CREATE TYPE identity_type AS ENUM ('user', 'service_account');
EXCEPTION
    WHEN duplicate_object THEN null;
END $$;

-- Add identity_type column to identity table (existing rows default to 'user')
ALTER TABLE identity 
    ADD COLUMN IF NOT EXISTS identity_type identity_type NOT NULL DEFAULT 'user';

-- Create token_revocation table
CREATE TABLE IF NOT EXISTS token_revocation (
    id BIGSERIAL PRIMARY KEY,
    identity_id BIGINT NOT NULL REFERENCES identity(id) ON DELETE CASCADE,
    token_jti VARCHAR(255) NOT NULL,
    token_exp TIMESTAMPTZ NOT NULL,  -- For cleanup queries
    revoked_at TIMESTAMPTZ DEFAULT NOW(),
    revoked_by BIGINT REFERENCES identity(id),
    reason VARCHAR(500),
    UNIQUE(token_jti)
);

-- UNIQUE(token_jti) already creates an index on token_jti
CREATE INDEX IF NOT EXISTS idx_token_revocation_identity ON token_revocation(identity_id);
CREATE INDEX IF NOT EXISTS idx_token_revocation_exp ON token_revocation(token_exp);

Phase 2: API Implementation

Add service account repository
Add JWT utilities for scope-based tokens
Implement service account CRUD endpoints
Add middleware for token validation and revocation

Phase 3: Integration

Update executor to create execution tokens
Update worker to receive and use execution tokens
Update sensor to accept and use sensor tokens
Update CLI to support service account management

Examples

Python Action Using Execution Token

#!/usr/bin/env python3
import os
import requests
import sys

# Token is injected by worker
api_url = os.environ['ATTUNE_API_URL']
api_token = os.environ['ATTUNE_API_TOKEN']
execution_id = os.environ['ATTUNE_EXECUTION_ID']

# Fetch encrypted secret
response = requests.get(
    f"{api_url}/keys/myapp.database_password",
    headers={"Authorization": f"Bearer {api_token}"},
    timeout=10,
)

if response.status_code != 200:
    print(f"Failed to fetch key: {response.text}", file=sys.stderr)
    sys.exit(1)

db_password = response.json()['value']

# Use the secret (for example, to open a database connection); never print it
print("Successfully fetched database password")

Sensor Using Sensor Token

// In sensor initialization
use std::env;

let api_token = env::var("ATTUNE_API_TOKEN")?;
let api_url = env::var("ATTUNE_API_URL")?;

let client = reqwest::Client::new();

// Fetch active rules
let response = client
    .get(format!("{}/rules?trigger_type=core.timer", api_url))
    .header("Authorization", format!("Bearer {}", api_token))
    .send()
    .await?
    .error_for_status()?; // Treat non-2xx responses (e.g. 401, 403) as errors

let rules: Vec<Rule> = response.json().await?;

Token Lifecycle Management

Expiration Strategy

All tokens MUST expire to prevent indefinite revocation table bloat and reduce attack surface:

Token Type	Expiration	Rationale
Sensor	90 days	Perpetually running service, auto-refresh before expiration
Action Execution	5-60 minutes	Matches action timeout, auto-cleanup on completion
User CLI	7-30 days	Balance between convenience and security, auto-refresh
Webhook	90-365 days	External integration, manual rotation required

Revocation Table Cleanup

Cleanup job runs hourly to prevent table bloat:

-- Delete revocation records for expired tokens
DELETE FROM token_revocation 
WHERE token_exp < NOW();

Why this works:

Expired tokens are already invalid (enforced by JWT exp claim)
No need to track revocation status for invalid tokens
Keeps revocation table small and queries fast
Typical size: <1000 rows instead of millions

Sensor Token Refresh

Sensors automatically refresh their own tokens without human intervention:

Automatic Process:

Sensor starts with 90-day token
Background task monitors token expiration
When 80% of TTL elapsed (72 days), sensor requests new token via POST /auth/refresh
New token is hot-loaded without restart
Old token remains valid until original expiration
Process repeats indefinitely

Refresh Timing Example:

Token issued: Day 0, expires Day 90
Refresh trigger: Day 72 (80% of 90 days)
New token issued: Day 72, expires Day 162
Old token still valid: Day 72-90 (overlap period)
Next refresh: Day 144 (80% of new token)
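
The arithmetic behind this timeline can be checked with a short sketch. The function name and the integer-percent threshold parameter are illustrative choices, not part of the sensor implementation.

```python
def refresh_schedule(issued_day: int, ttl_days: int = 90,
                     threshold_percent: int = 80, cycles: int = 2) -> list:
    """Return issue, refresh, and expiry days for successive tokens."""
    schedule = []
    for _ in range(cycles):
        # Refresh once threshold_percent of the TTL has elapsed.
        refresh_day = issued_day + ttl_days * threshold_percent // 100
        schedule.append({
            "issued": issued_day,
            "refresh": refresh_day,
            "expires": issued_day + ttl_days,
        })
        issued_day = refresh_day  # the new token is issued at refresh time
    return schedule


for token in refresh_schedule(0):
    print(token)
# {'issued': 0, 'refresh': 72, 'expires': 90}
# {'issued': 72, 'refresh': 144, 'expires': 162}
```

The gap between each `refresh` and the previous token's `expires` (18 days with these defaults) is the overlap window during which both tokens are accepted.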

Zero-Downtime:

No service interruption during refresh
Old token valid during transition
Graceful fallback on refresh failure: the sensor keeps using its current token, which stays valid until its original expiration

Cleanup Job Implementation

Purpose

Prevent indefinite growth of the token_revocation table by removing revocation records for expired tokens.

Why Cleanup Is Safe

Expired tokens are already invalid (enforced by JWT exp claim)
Token validation checks expiration before checking revocation
No security risk in deleting expired token revocations
Significantly reduces table size and improves query performance
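
The safety argument can be demonstrated with a small model. In this sketch an in-memory SQLite table stands in for the PostgreSQL `token_revocation` table, timestamps are plain integers, and the function names are hypothetical. It shows that, because expiration is checked first, deleting revocation rows for expired tokens never changes a validation result.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a simplified stand-in for the token_revocation table."""
    conn.execute(
        "CREATE TABLE token_revocation ("
        "token_jti TEXT PRIMARY KEY, token_exp INTEGER NOT NULL)"
    )


def token_status(conn: sqlite3.Connection, jti: str, exp: int, now: int) -> str:
    """Mirror the validation order: expiration first, then revocation."""
    if exp < now:
        return "expired"
    row = conn.execute(
        "SELECT 1 FROM token_revocation WHERE token_jti = ?", (jti,)
    ).fetchone()
    return "revoked" if row else "valid"


def cleanup_expired_revocations(conn: sqlite3.Connection, now: int) -> int:
    """Delete revocation rows for tokens that have already expired."""
    cursor = conn.execute(
        "DELETE FROM token_revocation WHERE token_exp < ?", (now,)
    )
    conn.commit()
    return cursor.rowcount
```

A revoked token that expired at time 100 reports `expired` at time 200 both before and after its row is removed, while a revoked token that expires at time 500 keeps its row and still reports `revoked`.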

Implementation

Frequency: Hourly cron job or background task

SQL Query:

DELETE FROM token_revocation 
WHERE token_exp < NOW();

Expected Impact:

Typical table size: <1,000 rows instead of millions over time
Fast revocation checks (indexed queries on small dataset)
Reduced storage and backup costs

Rust Implementation Example

use sqlx::PgPool;
use tokio::time::{interval, Duration};
use tracing::{error, info}; // Assumes the tracing crate; the log crate offers the same macros

/// Background task to clean up expired token revocations
pub async fn start_revocation_cleanup_task(db: PgPool) {
    let mut ticker = interval(Duration::from_secs(3600)); // Every hour
    
    loop {
        // The first tick completes immediately, so cleanup also runs at startup
        ticker.tick().await;
        
        match cleanup_expired_revocations(&db).await {
            Ok(count) => {
                info!("Cleaned up {} expired token revocations", count);
            }
            Err(e) => {
                error!("Failed to clean up expired token revocations: {}", e);
            }
        }
    }
}

/// Delete token revocation records for expired tokens
async fn cleanup_expired_revocations(db: &PgPool) -> Result<u64, sqlx::Error> {
    let result = sqlx::query!(
        "DELETE FROM token_revocation WHERE token_exp < NOW()"
    )
    .execute(db)
    .await?;
    
    Ok(result.rows_affected())
}

Monitoring

Track cleanup job metrics:

Number of records deleted per run
Job execution time
Job failures (alert if consecutive failures)

Prometheus Metrics Example:

use lazy_static::lazy_static;
use prometheus::{register_histogram, register_int_counter, Histogram, IntCounter};

// Define metrics
lazy_static! {
    static ref REVOCATION_CLEANUP_COUNT: IntCounter = register_int_counter!(
        "attune_revocation_cleanup_total",
        "Total number of expired token revocations cleaned up"
    ).unwrap();
    
    static ref REVOCATION_CLEANUP_DURATION: Histogram = register_histogram!(
        "attune_revocation_cleanup_duration_seconds",
        "Duration of token revocation cleanup job"
    ).unwrap();
}

// In cleanup function
let timer = REVOCATION_CLEANUP_DURATION.start_timer();
let count = cleanup_expired_revocations(&db).await?;
REVOCATION_CLEANUP_COUNT.inc_by(count);
timer.observe_duration();

Alternative: Database Trigger

For automatic cleanup without application code:

-- Create function to delete old revocations
CREATE OR REPLACE FUNCTION cleanup_expired_token_revocations()
RETURNS trigger AS $$
BEGIN
    DELETE FROM token_revocation WHERE token_exp < NOW() - INTERVAL '1 hour';
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Trigger on insert (cleanup when new revocations are added)
CREATE TRIGGER trigger_cleanup_expired_revocations
    AFTER INSERT ON token_revocation
    FOR EACH STATEMENT  -- the default; cleanup runs once per INSERT statement, not per row
    EXECUTE FUNCTION cleanup_expired_token_revocations();

Note: Application-level cleanup is preferred for better observability and control.

Future Enhancements

Rate Limiting: Per-token rate limits to prevent abuse
Audit Logging: Comprehensive audit trail of token usage and refresh events
OAuth 2.0: Support OAuth 2.0 client credentials flow
mTLS: Mutual TLS authentication for high-security deployments
Token Introspection: RFC 7662-compliant token introspection endpoint
Scope Hierarchies: More granular permission scopes
IP Whitelisting: Restrict token usage to specific IP ranges
Configurable Refresh Timing: Allow custom refresh thresholds per token type
Token Lineage Tracking: Track token refresh chains for security audits
Refresh Failure Alerts: Notify operators when automatic refresh fails
