re-uploading work

2026-02-04 17:46:30 -06:00
commit 3b14c65998
1388 changed files with 381262 additions and 0 deletions

# Authentication Quick Reference
## Environment Variables
```bash
JWT_SECRET=your-secret-key-here # Required in production!
JWT_ACCESS_EXPIRATION=3600 # Optional (1 hour default)
JWT_REFRESH_EXPIRATION=604800 # Optional (7 days default)
```
## Endpoints
### Register New User
```http
POST /auth/register
Content-Type: application/json

{
  "login": "username",
  "password": "securepass123",
  "display_name": "Full Name" // optional
}
```
### Login
```http
POST /auth/login
Content-Type: application/json

{
  "login": "username",
  "password": "securepass123"
}
```
### Refresh Token
```http
POST /auth/refresh
Content-Type: application/json

{
  "refresh_token": "eyJhbGc..."
}
```
### Get Current User (Protected)
```http
GET /auth/me
Authorization: Bearer <access_token>
```
### Change Password (Protected)
```http
POST /auth/change-password
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "current_password": "oldpass123",
  "new_password": "newpass456"
}
```
## Response Format
### Success (Register/Login/Refresh)
```json
{
  "data": {
    "access_token": "eyJhbGciOiJIUzI1NiIs...",
    "refresh_token": "eyJhbGciOiJIUzI1NiIs...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```
### Success (Get Current User)
```json
{
  "data": {
    "id": 1,
    "login": "username",
    "display_name": "Full Name"
  }
}
```
### Error
```json
{
  "error": "Invalid login or password",
  "code": "UNAUTHORIZED"
}
```
## HTTP Status Codes
- `200 OK` - Success
- `400 Bad Request` - Invalid request format
- `401 Unauthorized` - Missing/invalid/expired token or bad credentials
- `403 Forbidden` - Insufficient permissions
- `409 Conflict` - Username already exists
- `422 Unprocessable Entity` - Validation error
## Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| Missing authentication token | No Authorization header | Add `Authorization: Bearer <token>` |
| Invalid authentication token | Malformed or wrong secret | Verify token format and JWT_SECRET |
| Authentication token expired | Access token expired | Use refresh token to get new one |
| Invalid login or password | Wrong credentials | Check username and password |
| Username already exists | Duplicate registration | Use different username |
| Validation failed | Password too short, etc. | Check validation requirements |
## Validation Rules
- **Login:** 3-255 characters
- **Password:** 8-128 characters
- **Display Name:** 0-255 characters (optional)
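The sketch below shows one way to check these length rules before a registration request is sent or processed. `validate_registration` and `ValidationError` are hypothetical. It also assumes the server counts characters (Unicode scalar values) rather than bytes.

```rust
/// Which field failed validation (hypothetical error type).
#[derive(Debug, PartialEq)]
enum ValidationError {
    Login,
    Password,
    DisplayName,
}

/// True if `s` has between `min` and `max` characters, inclusive.
/// Counts Unicode scalar values, not bytes (an assumption about the server).
fn len_in(s: &str, min: usize, max: usize) -> bool {
    (min..=max).contains(&s.chars().count())
}

/// Applies the documented length rules to a registration request.
fn validate_registration(
    login: &str,
    password: &str,
    display_name: Option<&str>,
) -> Result<(), ValidationError> {
    if !len_in(login, 3, 255) {
        return Err(ValidationError::Login);
    }
    if !len_in(password, 8, 128) {
        return Err(ValidationError::Password);
    }
    if let Some(name) = display_name {
        if !len_in(name, 0, 255) {
            return Err(ValidationError::DisplayName);
        }
    }
    Ok(())
}
```

The ranges are inclusive, so a 3-character login and an 8-character password both pass.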
## cURL Examples
```bash
# Register
curl -X POST http://localhost:8080/auth/register \
-H "Content-Type: application/json" \
-d '{"login":"alice","password":"secure123","display_name":"Alice"}'
# Login
curl -X POST http://localhost:8080/auth/login \
-H "Content-Type: application/json" \
-d '{"login":"alice","password":"secure123"}'
# Get Current User (replace TOKEN)
curl http://localhost:8080/auth/me \
-H "Authorization: Bearer TOKEN"
# Change Password
curl -X POST http://localhost:8080/auth/change-password \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"current_password":"secure123","new_password":"newsecure456"}'
# Refresh Token
curl -X POST http://localhost:8080/auth/refresh \
-H "Content-Type: application/json" \
-d '{"refresh_token":"REFRESH_TOKEN"}'
```
## Using in Route Handlers
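```rust
use crate::auth::middleware::RequireAuth;

// `Json`, `ApiResponse`, `ApiError`, and `Data` come from the API crate and its
// web framework; only the `RequireAuth` usage is specific to authentication.
async fn protected_handler(
    RequireAuth(user): RequireAuth,
) -> Result<Json<ApiResponse<Data>>, ApiError> {
    // Identity ID parsed from the token claims (fails if it is not numeric)
    let identity_id = user.identity_id()?;
    // Login name from the token claims
    let login = user.login();

    // Your handler logic: build `data` from identity_id and login
    let data: Data = todo!();
    Ok(Json(ApiResponse::new(data)))
}
```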
## Security Checklist
- [ ] Use HTTPS in production
- [ ] Set strong JWT_SECRET (256+ bits)
- [ ] Store tokens securely on client
- [ ] Implement rate limiting
- [ ] Never log tokens
- [ ] Rotate secrets periodically
- [ ] Clear tokens on logout
## Token Lifecycle
1. **Register/Login** → Receive access + refresh tokens
2. **API Call** → Use access token in Authorization header
3. **Token Expires** → Use refresh token to get new access token
4. **Refresh Expires** → User must login again
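Here is the lifecycle above as seen from the client. `TokenSet`, its methods, and the 60-second safety margin used in the checks are hypothetical. The sketch relies only on the documented `expires_in` field (in seconds) and the `Bearer` header format.

```rust
use std::time::{Duration, Instant};

/// Hypothetical client-side holder for the tokens returned by
/// register, login, or refresh.
struct TokenSet {
    access_token: String,
    refresh_token: String,
    access_expires_at: Instant,
}

impl TokenSet {
    /// Builds the set from a response; `expires_in` is the access token
    /// lifetime in seconds (3600 by default).
    fn from_response(access_token: String, refresh_token: String, expires_in: u64, now: Instant) -> Self {
        TokenSet {
            access_token,
            refresh_token,
            access_expires_at: now + Duration::from_secs(expires_in),
        }
    }

    /// Value for the `Authorization` header on API calls (step 2).
    fn authorization_header(&self) -> String {
        format!("Bearer {}", self.access_token)
    }

    /// True once the access token is within `skew` of expiring (step 3);
    /// the client should then send `refresh_token` to POST /auth/refresh.
    fn needs_refresh(&self, now: Instant, skew: Duration) -> bool {
        now + skew >= self.access_expires_at
    }
}
```

Refreshing slightly before the deadline avoids sending a token that expires in transit. If the refresh call itself is rejected because the refresh token has expired (step 4), the client has to log in again.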
## Troubleshooting
**Server won't start?**
- Check DATABASE_URL is set
- Verify database is running
- Run migrations: `sqlx migrate run`
**Auth fails with valid credentials?**
- Check password hash in database
- Verify JWT_SECRET matches
- Check token expiration
**Debug logging:**
```bash
RUST_LOG=attune_api=debug cargo run --bin attune-api
```
## Documentation
- Full docs: `docs/authentication.md`
- Testing guide: `docs/testing-authentication.md`
- Implementation: `crates/api/src/routes/auth.rs`

# Authentication & Authorization
## Overview
Attune uses JWT (JSON Web Token) based authentication for securing API endpoints. The authentication system supports user registration, login, token refresh, and password management.
## Architecture
### Components
1. **JWT Tokens**
   - **Access Tokens**: Short-lived tokens (default: 1 hour) used for API authentication
   - **Refresh Tokens**: Long-lived tokens (default: 7 days) used to obtain new access tokens
2. **Password Security**
   - Passwords are hashed using **Argon2id** (industry-standard, memory-hard algorithm)
   - Password hashes are stored in the `attributes` JSONB field of the `identity` table
   - Minimum password length: 8 characters
3. **Middleware**
   - `require_auth`: Middleware function that validates JWT tokens on protected routes
   - `RequireAuth`: Extractor for accessing authenticated user information in handlers
## Configuration
Authentication is configured via environment variables:
```bash
# JWT Secret Key (REQUIRED in production)
JWT_SECRET=your-secret-key-here
# Token Expiration (in seconds)
JWT_ACCESS_EXPIRATION=3600 # 1 hour (default)
JWT_REFRESH_EXPIRATION=604800 # 7 days (default)
```
**Security Warning**: Always set a strong, random `JWT_SECRET` in production. The default value is insecure and should only be used for development.
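This sketch reads these variables with the documented defaults using only the standard library. `AuthConfig` and `secs_or` are hypothetical names. Falling back to the default when a value fails to parse is an assumption; the server may instead reject invalid values.

```rust
use std::env;

/// Hypothetical settings struct mirroring the variables above.
#[derive(Debug)]
struct AuthConfig {
    /// `None` when JWT_SECRET is unset or empty.
    jwt_secret: Option<String>,
    access_expiration_secs: u64,
    refresh_expiration_secs: u64,
}

/// Parses a value in seconds, falling back to `default` when it is
/// missing or not a valid non-negative integer.
fn secs_or(value: Option<String>, default: u64) -> u64 {
    value
        .and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(default)
}

impl AuthConfig {
    fn from_env() -> Self {
        AuthConfig {
            jwt_secret: env::var("JWT_SECRET").ok().filter(|s| !s.is_empty()),
            access_expiration_secs: secs_or(env::var("JWT_ACCESS_EXPIRATION").ok(), 3600), // 1 hour
            refresh_expiration_secs: secs_or(env::var("JWT_REFRESH_EXPIRATION").ok(), 604_800), // 7 days
        }
    }
}
```

When `JWT_SECRET` is unset, `jwt_secret` is left as `None`. The caller then decides whether to refuse to start or to fall back to a development-only secret.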
## API Endpoints
### Public Endpoints (No Authentication Required)
#### Register a New User
```http
POST /auth/register
Content-Type: application/json

{
  "login": "username",
  "password": "securepassword123",
  "display_name": "John Doe" // optional
}
```
**Response:**
```json
{
  "data": {
    "access_token": "eyJhbGc...",
    "refresh_token": "eyJhbGc...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```
#### Login
```http
POST /auth/login
Content-Type: application/json

{
  "login": "username",
  "password": "securepassword123"
}
```
**Response:**
```json
{
  "data": {
    "access_token": "eyJhbGc...",
    "refresh_token": "eyJhbGc...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```
#### Refresh Access Token
```http
POST /auth/refresh
Content-Type: application/json

{
  "refresh_token": "eyJhbGc..."
}
```
**Response:**
```json
{
  "data": {
    "access_token": "eyJhbGc...",
    "refresh_token": "eyJhbGc...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```
### Protected Endpoints (Authentication Required)
All protected endpoints require an `Authorization` header with a valid access token:
```http
Authorization: Bearer <access_token>
```
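The header format above can be parsed as in this sketch. `bearer_token` and `AuthHeaderError` are hypothetical and cover only the header-level checks. Verifying the JWT's signature and expiry would follow as a separate step.

```rust
/// Header-level failures, matching two error messages listed in this
/// document (hypothetical type).
#[derive(Debug, PartialEq)]
enum AuthHeaderError {
    /// No `Authorization` header: "Missing authentication token"
    Missing,
    /// Wrong scheme or empty token: "Invalid authentication token"
    Invalid,
}

/// Extracts `<token>` from an `Authorization: Bearer <token>` header value.
fn bearer_token(header: Option<&str>) -> Result<&str, AuthHeaderError> {
    let value = header.ok_or(AuthHeaderError::Missing)?;
    let (scheme, token) = value
        .split_once(' ')
        .ok_or(AuthHeaderError::Invalid)?;
    // HTTP authentication scheme names are case-insensitive.
    if !scheme.eq_ignore_ascii_case("Bearer") {
        return Err(AuthHeaderError::Invalid);
    }
    let token = token.trim();
    if token.is_empty() {
        Err(AuthHeaderError::Invalid)
    } else {
        Ok(token)
    }
}
```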
#### Get Current User
```http
GET /auth/me
Authorization: Bearer eyJhbGc...
```
**Response:**
```json
{
  "data": {
    "id": 1,
    "login": "username",
    "display_name": "John Doe"
  }
}
```
#### Change Password
```http
POST /auth/change-password
Authorization: Bearer eyJhbGc...
Content-Type: application/json

{
  "current_password": "oldpassword123",
  "new_password": "newpassword456"
}
```
**Response:**
```json
{
  "data": {
    "success": true,
    "message": "Password changed successfully"
  }
}
```
## Error Responses
Authentication errors return appropriate HTTP status codes:
- **400 Bad Request**: Invalid request format or validation errors
- **401 Unauthorized**: Missing, invalid, or expired token; invalid credentials
- **403 Forbidden**: Insufficient permissions (future RBAC implementation)
- **409 Conflict**: Username already exists during registration
Example error response:
```json
{
  "error": "Invalid authentication token",
  "code": "UNAUTHORIZED"
}
```
## Usage in Route Handlers
### Protecting Routes
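Take the `RequireAuth` extractor as a handler argument to require authentication for that route:
```rust
use crate::auth::middleware::RequireAuth;

// `Json`, `ApiResponse`, `ApiError`, and `MyData` come from the API crate and its
// web framework; only the `RequireAuth` usage is specific to authentication.
async fn protected_handler(
    RequireAuth(user): RequireAuth,
) -> Result<Json<ApiResponse<MyData>>, ApiError> {
    // Identity ID parsed from the token claims (fails if it is not numeric)
    let identity_id = user.identity_id()?;
    // Login name from the token claims
    let login = user.login();

    // Your handler logic here: build `data` from identity_id and login
    let data: MyData = todo!();
    Ok(Json(ApiResponse::new(data)))
}
```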
### Accessing User Information
The `RequireAuth` extractor provides access to the authenticated user's claims:
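```rust
use std::num::ParseIntError;

pub struct AuthenticatedUser {
    pub claims: Claims,
}

// Signatures only; method bodies are elided.
impl AuthenticatedUser {
    /// Parses the numeric identity ID from the token claims.
    pub fn identity_id(&self) -> Result<i64, ParseIntError> { /* ... */ }

    /// Returns the login name carried in the token claims.
    pub fn login(&self) -> &str { /* ... */ }
}
```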
## Database Schema
### Identity Table
The `identity` table stores user authentication information:
```sql
CREATE TABLE attune.identity (
    id            BIGSERIAL PRIMARY KEY,
    login         TEXT NOT NULL UNIQUE,
    display_name  TEXT,
    attributes    JSONB NOT NULL DEFAULT '{}'::jsonb,
    password_hash TEXT, -- Added in migration 20240102000001
    created       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
**Note**: The `password_hash` column is optional to support:
- External authentication providers (OAuth, SAML, etc.)
- Service accounts that don't use password authentication
- API key-based authentication (future implementation)
## Security Best Practices
1. **JWT Secret**
   - Use a strong, random secret (minimum 256 bits)
   - Never commit secrets to version control
   - Rotate secrets periodically in production
2. **Token Storage (Client-Side)**
   - Store tokens securely (e.g., httpOnly cookies or secure storage)
   - Never expose tokens in URLs or localStorage (if using web clients)
   - Clear tokens on logout
3. **Password Requirements**
   - Minimum 8 characters (enforced by validation)
   - Consider implementing additional requirements (uppercase, numbers, symbols)
   - Implement rate limiting on login attempts (future enhancement)
4. **HTTPS**
   - Always use HTTPS in production to protect tokens in transit
   - Configure proper TLS/SSL certificates
5. **Token Expiration**
   - Keep access tokens short-lived (1 hour recommended)
   - Use refresh tokens for long-lived sessions
   - Implement token revocation for logout (future enhancement)
## Future Enhancements
### Planned Features
1. **Role-Based Access Control (RBAC)**
   - Permission set assignments
   - Fine-grained authorization middleware
   - Resource-level permissions
2. **Multi-Factor Authentication (MFA)**
   - TOTP support
   - SMS/Email verification codes
3. **OAuth/OIDC Integration**
   - Support for external identity providers
   - Single Sign-On (SSO)
4. **Token Revocation**
   - Blacklist/whitelist mechanisms
   - Force logout functionality
5. **Account Security**
   - Password reset via email
   - Account lockout after failed attempts
   - Security audit logs
6. **API Keys**
   - Service-to-service authentication
   - Scoped API keys for automation
## Testing
### Manual Testing with cURL
```bash
# Register a new user
curl -X POST http://localhost:8080/auth/register \
-H "Content-Type: application/json" \
-d '{
"login": "testuser",
"password": "testpass123",
"display_name": "Test User"
}'
# Login
curl -X POST http://localhost:8080/auth/login \
-H "Content-Type: application/json" \
-d '{
"login": "testuser",
"password": "testpass123"
}'
# Get current user (replace TOKEN with actual access token)
curl http://localhost:8080/auth/me \
-H "Authorization: Bearer TOKEN"
# Change password
curl -X POST http://localhost:8080/auth/change-password \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{
"current_password": "testpass123",
"new_password": "newpass456"
}'
# Refresh token
curl -X POST http://localhost:8080/auth/refresh \
-H "Content-Type: application/json" \
-d '{
"refresh_token": "REFRESH_TOKEN"
}'
```
### Unit Tests
Password hashing and JWT utilities include comprehensive unit tests:
```bash
# Run auth-related tests
cargo test --package attune-api password
cargo test --package attune-api jwt
cargo test --package attune-api middleware
```
## Troubleshooting
### Common Issues
1. **"Missing authentication token"**
   - Ensure you're including the `Authorization` header
   - Verify the header format: `Bearer <token>`
2. **"Authentication token expired"**
   - Use the refresh token endpoint to get a new access token
   - Check token expiration configuration
3. **"Invalid login or password"**
   - Verify credentials are correct
   - Check if the identity has a password set (some accounts may use external auth)
4. **"JWT_SECRET not set" warning**
   - Set the `JWT_SECRET` environment variable before starting the server
   - Use a strong, random value in production
### Debug Logging
Enable debug logging to troubleshoot authentication issues:
```bash
RUST_LOG=attune_api=debug cargo run --bin attune-api
```
## References
- [RFC 7519: JSON Web Token (JWT)](https://datatracker.ietf.org/doc/html/rfc7519)
- [Argon2 Password Hashing](https://en.wikipedia.org/wiki/Argon2)
- [OWASP Authentication Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html)

# Secrets Management in Attune Worker Service
## Overview
The Attune Worker Service includes a robust secrets management system that securely stores, retrieves, and injects secrets into action execution environments. Secrets are encrypted at rest in the database and decrypted on-demand during execution.
## Architecture
### Components
1. **SecretManager** (`crates/worker/src/secrets.rs`)
   - Core component responsible for secret operations
   - Handles fetching, decryption, and environment variable preparation
   - Integrated into `ActionExecutor` for seamless secret injection
2. **Database Storage** (`attune.key` table)
   - Stores secrets with ownership scoping (system, pack, action, sensor, identity)
   - Supports both encrypted and plaintext values
   - Tracks encryption key hash for validation
3. **Encryption System**
   - Uses **AES-256-GCM** for authenticated encryption
   - Derives encryption key from configured password using SHA-256
   - Generates random nonces for each encryption operation
## Secret Ownership Hierarchy
Secrets are organized in a hierarchical ownership model with increasing specificity:
### 1. System-Level Secrets
- **Owner Type**: `system`
- **Scope**: Available to all actions across all packs
- **Use Case**: Global configuration (API endpoints, common credentials)
### 2. Pack-Level Secrets
- **Owner Type**: `pack`
- **Scope**: Available to all actions within a specific pack
- **Use Case**: Pack-specific credentials, service endpoints
### 3. Action-Level Secrets
- **Owner Type**: `action`
- **Scope**: Available only to a specific action
- **Use Case**: Action-specific credentials, sensitive parameters
### Override Behavior
When an action is executed, secrets are fetched in the following order:
1. System secrets
2. Pack secrets (override system secrets with same name)
3. Action secrets (override pack/system secrets with same name)
This allows for flexible secret management where more specific secrets override less specific ones.
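The override order amounts to merging three maps from least to most specific. This is a sketch, not the `SecretManager` code. `merge_secrets` is a hypothetical name, and each map is assumed to hold already-decrypted values keyed by secret name.

```rust
use std::collections::HashMap;

/// Merges secrets from least to most specific scope; a name defined in a
/// later scope replaces the same name from an earlier one.
fn merge_secrets(
    system: &HashMap<String, String>,
    pack: &HashMap<String, String>,
    action: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = HashMap::new();
    for scope in [system, pack, action] {
        merged.extend(scope.iter().map(|(k, v)| (k.clone(), v.clone())));
    }
    merged
}
```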
## Encryption Format
### Encrypted Value Format
```
nonce:ciphertext
```
Both components are Base64-encoded:
- **Nonce**: 12-byte random value (96 bits) for AES-GCM
- **Ciphertext**: Encrypted payload with authentication tag
Example:
```
Xk3mP9qRsT6uVwYz:SGVsbG8gV29ybGQhIFRoaXMgaXMgYW4gZW5jcnlwdGVkIG1lc3NhZ2U=
```
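A stored value can be checked for this layout before any decryption is attempted. The hypothetical, std-only sketch below validates the Base64 shape and the decoded lengths: a 12-byte nonce, and a ciphertext at least as long as the 16-byte GCM tag. It does not decode or decrypt anything.

```rust
/// Standard Base64 alphabet, excluding the `=` padding character.
fn is_b64_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '+' || c == '/'
}

/// Number of bytes a padded standard Base64 string decodes to, or None if malformed.
fn b64_decoded_len(s: &str) -> Option<usize> {
    if s.is_empty() || s.len() % 4 != 0 {
        return None;
    }
    let body = s.trim_end_matches('=');
    let pad = s.len() - body.len();
    if pad > 2 || !body.chars().all(is_b64_char) {
        return None;
    }
    Some(s.len() / 4 * 3 - pad)
}

/// Checks the "nonce:ciphertext" layout without decrypting anything.
fn check_encrypted_format(value: &str) -> Result<(), String> {
    let (nonce, ciphertext) = value.split_once(':').ok_or("missing ':' separator")?;
    match b64_decoded_len(nonce) {
        Some(12) => {}
        other => return Err(format!("nonce must decode to 12 bytes, got {:?}", other)),
    }
    match b64_decoded_len(ciphertext) {
        // AES-GCM output always includes a 16-byte authentication tag.
        Some(n) if n >= 16 => Ok(()),
        other => Err(format!("ciphertext malformed or too short: {:?}", other)),
    }
}
```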
### Encryption Key Derivation
The encryption key is derived from the configured password using SHA-256:
```
encryption_key = SHA256(password)
```
This produces a 32-byte (256-bit) key suitable for AES-256.
### Key Hash Validation
Each encrypted secret can optionally store the hash of the encryption key used to encrypt it:
```
key_hash = SHA256(encryption_key)
```
This allows validation that the correct key is being used for decryption.
## Configuration
### Security Configuration
Add to your `config.yaml`:
```yaml
security:
  # Encryption key for secrets (REQUIRED for encrypted secrets)
  encryption_key: "your-secret-encryption-password-here"
  # Or use environment variable
  # ATTUNE__SECURITY__ENCRYPTION_KEY=your-secret-encryption-password-here
```
⚠️ **Important Security Notes:**
- The encryption key should be a strong, random password (minimum 32 characters recommended)
- Store the encryption key securely (e.g., using a secrets manager, not in version control)
- If the encryption key is lost, encrypted secrets cannot be recovered
- Changing the encryption key requires re-encrypting all secrets
### Environment Variables
Override configuration via environment variables:
```bash
export ATTUNE__SECURITY__ENCRYPTION_KEY="your-encryption-key"
```
## Usage Examples
### Storing Secrets (via API)
#### System-Level Secret
```bash
curl -X POST http://localhost:8080/api/v1/keys \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"ref": "system.api_endpoint",
"owner_type": "system",
"name": "api_endpoint",
"value": "https://api.example.com",
"encrypted": false
}'
```
#### Pack-Level Secret (Encrypted)
```bash
curl -X POST http://localhost:8080/api/v1/keys \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"ref": "mypack.api_key",
"owner_type": "pack",
"owner_pack": 1,
"name": "api_key",
"value": "sk_live_abc123def456",
"encrypted": true
}'
```
#### Action-Level Secret
```bash
curl -X POST http://localhost:8080/api/v1/keys \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"ref": "mypack.myaction.oauth_token",
"owner_type": "action",
"owner_action": 42,
"name": "oauth_token",
"value": "ya29.a0AfH6SMBx...",
"encrypted": true
}'
```
### Accessing Secrets in Actions
Secrets are automatically injected as environment variables during execution. The secret name is converted to uppercase and prefixed with `SECRET_`.
#### Python Action Example
```python
#!/usr/bin/env python3
import os
import sys

# Access secrets via environment variables
api_key = os.environ.get('SECRET_API_KEY')
db_password = os.environ.get('SECRET_DB_PASSWORD')
oauth_token = os.environ.get('SECRET_OAUTH_TOKEN')

if not api_key:
    print("Error: SECRET_API_KEY not found", file=sys.stderr)
    sys.exit(1)

# Use the secrets (print only a short prefix, never the full value)
print(f"Connecting to API with key: {api_key[:8]}...")
```
#### Shell Action Example
```bash
#!/bin/bash
# Access secrets
echo "API Key: ${SECRET_API_KEY:0:8}..."
echo "Database: ${SECRET_DB_HOST}"
# Use in commands
curl -H "Authorization: Bearer $SECRET_API_TOKEN" \
https://api.example.com/data
```
### Environment Variable Naming Rules
Secret names are transformed as follows:
- Prefix: `SECRET_`
- Convert to uppercase
- Replace hyphens with underscores
Examples:
- `api_key` → `SECRET_API_KEY`
- `db-password` → `SECRET_DB_PASSWORD`
- `oauth_token` → `SECRET_OAUTH_TOKEN`
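The naming rules fit in a couple of small functions. `secret_env_name` and `to_secret_env` are hypothetical helpers that mirror the documented transformation; they are not the worker's `prepare_secret_env`.

```rust
use std::collections::HashMap;

/// Applies the naming rules: uppercase, hyphens become underscores,
/// and the result is prefixed with `SECRET_`.
fn secret_env_name(name: &str) -> String {
    format!("SECRET_{}", name.to_uppercase().replace('-', "_"))
}

/// Maps secret names to environment variable names, keeping the values.
fn to_secret_env(secrets: &HashMap<String, String>) -> HashMap<String, String> {
    secrets
        .iter()
        .map(|(name, value)| (secret_env_name(name), value.clone()))
        .collect()
}
```

Under these rules `db-password` and `db_password` map to the same variable, so one would silently replace the other. Rejecting such collisions when a secret is created avoids the surprise.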
## Security Best Practices
### 1. Encryption Key Management
- **Generate Strong Keys**: Use at least 32 random characters
- **Secure Storage**: Store in a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.)
- **Rotation**: Plan for key rotation (requires re-encrypting all secrets)
- **Backup**: Keep encrypted backup of the encryption key
### 2. Secret Storage
- **Always Encrypt Sensitive Data**: Use `encrypted: true` for passwords, tokens, API keys
- **Plaintext for Non-Sensitive**: Use `encrypted: false` for URLs, usernames, configuration
- **Least Privilege**: Use action-level secrets for the most sensitive data
### 3. Action Development
- **Never Log Secrets**: Avoid printing secret values in action output
- **Mask in Errors**: Don't include secrets in error messages
- **Clear After Use**: In long-running processes, clear secrets from memory when done
### 4. Access Control
- **RBAC**: Limit who can create/read secrets using Attune's RBAC system
- **Audit Logging**: Enable audit logging for secret access (future feature)
- **Regular Reviews**: Periodically review and rotate secrets
## Implementation Details
### Encryption Process
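```rust
// Sketch assuming the aes-gcm, sha2 and base64 crates.
use aes_gcm::{aead::{Aead, AeadCore, KeyInit, OsRng}, Aes256Gcm};
use base64::{engine::general_purpose::STANDARD, Engine as _};
use sha2::{Digest, Sha256};

fn encrypt_value(password: &str, plaintext: &str) -> Result<String, aes_gcm::Error> {
    // 1. Derive the 256-bit encryption key from the password
    let key = Sha256::digest(password.as_bytes());
    let cipher = Aes256Gcm::new_from_slice(&key).expect("SHA-256 output is 32 bytes");
    // 2. Generate a random 96-bit (12-byte) nonce, unique per encryption
    let nonce = Aes256Gcm::generate_nonce(&mut OsRng);
    // 3. Encrypt; the ciphertext carries the 16-byte authentication tag
    let ciphertext = cipher.encrypt(&nonce, plaintext.as_bytes())?;
    // 4. Format as "nonce:ciphertext", both parts Base64-encoded
    Ok(format!("{}:{}", STANDARD.encode(nonce), STANDARD.encode(ciphertext)))
}
```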
### Decryption Process
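```rust
// Sketch assuming the aes-gcm, sha2 and base64 crates; key_hash is assumed hex-encoded.
use aes_gcm::{aead::{Aead, KeyInit}, Aes256Gcm, Nonce};
use base64::{engine::general_purpose::STANDARD, Engine as _};
use sha2::{Digest, Sha256};

fn decrypt_value(key: &[u8; 32], key_hash: Option<&str>, encrypted: &str) -> Result<String, String> {
    // 1. Parse "nonce:ciphertext" and Base64-decode both parts
    let (nonce_b64, ct_b64) = encrypted.split_once(':').ok_or("invalid format")?;
    let nonce = STANDARD.decode(nonce_b64).map_err(|e| e.to_string())?;
    let ciphertext = STANDARD.decode(ct_b64).map_err(|e| e.to_string())?;
    if nonce.len() != 12 { return Err("nonce must be 12 bytes".into()); }
    // 2. Validate the encryption key hash (if present)
    if key_hash.is_some_and(|h| h != format!("{:x}", Sha256::digest(key))) {
        return Err("Encryption key hash mismatch".into());
    }
    // 3. Decrypt; AES-GCM rejects a wrong key or tampered ciphertext
    let cipher = Aes256Gcm::new_from_slice(key).expect("key is 32 bytes");
    let plaintext = cipher.decrypt(Nonce::from_slice(&nonce), &ciphertext[..])
        .map_err(|_| "Decryption failed".to_string())?;
    String::from_utf8(plaintext).map_err(|e| e.to_string())
}
```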
### Secret Injection Flow
```
1. ActionExecutor prepares execution context
2. SecretManager fetches secrets for action
   a. Query system-level secrets
   b. Query pack-level secrets
   c. Query action-level secrets
   d. Merge with later overriding earlier
3. Decrypt encrypted secrets
4. Transform to environment variables
5. Inject into execution context
6. Action executes with secrets available
```
## Troubleshooting
### "No encryption key configured"
**Problem**: Worker service cannot decrypt secrets.
**Solution**: Set the encryption key in configuration:
```yaml
security:
  encryption_key: "your-encryption-key-here"
```
### "Encryption key hash mismatch"
**Problem**: The encryption key used to decrypt doesn't match the key used to encrypt.
**Solution**:
- Verify you're using the correct encryption key
- Check if encryption key was recently changed
- May need to re-encrypt secrets with new key
### "Decryption failed"
**Problem**: Secret cannot be decrypted.
**Causes**:
- Wrong encryption key
- Corrupted encrypted value
- Invalid format
**Solution**:
- Verify encryption key is correct
- Check secret value format (should be "nonce:ciphertext")
- Try re-encrypting the secret
### Secrets Not Available in Action
**Problem**: Environment variables like `SECRET_API_KEY` are not set.
**Checklist**:
- Verify secret exists in database with correct owner type
- Check secret name matches expected format
- Ensure action's pack has access to the secret
- Check worker logs for "Failed to fetch secrets" warnings
## API Reference
### SecretManager Methods
#### `fetch_secrets_for_action(action: &Action) -> Result<HashMap<String, String>>`
Fetches all secrets relevant to an action (system + pack + action level).
#### `encrypt_value(plaintext: &str) -> Result<String>`
Encrypts a plaintext value using the configured encryption key.
#### `prepare_secret_env(secrets: &HashMap<String, String>) -> HashMap<String, String>`
Transforms secret names to environment variable format.
## Future Enhancements
### Planned Features
- [ ] Secret versioning and rollback
- [ ] Audit logging for secret access
- [ ] Integration with external secret managers (Vault, AWS Secrets Manager)
- [ ] Automatic secret rotation
- [ ] Secret expiration and TTL
- [ ] Multi-key encryption (key per pack/action)
- [ ] Secret templates and inheritance
### Under Consideration
- [ ] Dynamic secret generation
- [ ] Just-in-time secret provisioning
- [ ] Secret usage analytics
- [ ] Integration with certificate management
## References
- [AES-GCM Encryption](https://en.wikipedia.org/wiki/Galois/Counter_Mode)
- [NIST SP 800-38D](https://csrc.nist.gov/publications/detail/sp/800-38d/final) - Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM)
- [Key Management Best Practices](https://www.owasp.org/index.php/Key_Management_Cheat_Sheet)

# Security Review: StackStorm Pitfall Analysis
**Date:** 2024-01-02
**Classification:** CONFIDENTIAL - Security Review
**Status:** CRITICAL ISSUES IDENTIFIED - PRODUCTION BLOCKED
---
## Executive Summary
A comprehensive security and architecture review of the Attune platform has identified **2 critical vulnerabilities** that must be addressed before any production deployment. This review was conducted by analyzing lessons learned from StackStorm (a similar automation platform) and comparing against our current implementation.
### Critical Findings
🔴 **CRITICAL - PRODUCTION BLOCKER**
- **Secret Exposure Vulnerability (P0)**: User secrets are visible to any system user with shell access
- **Dependency Conflicts (P1)**: System upgrades can break existing user workflows
⚠️ **HIGH PRIORITY - v1.0 BLOCKER**
- **Resource Exhaustion Risk (P1)**: Unbounded log collection can crash worker processes
- **Limited Ecosystem Support (P2)**: No automated dependency management for user packs
**GOOD NEWS**
- 2 major pitfalls successfully avoided due to Rust implementation
- Issues caught in development phase, before production deployment
- Clear remediation path with detailed implementation plan
---
## Business Impact
### Immediate Impact (Next 4-6 Weeks)
- **Production deployment BLOCKED** until critical security fix completed
- **Timeline adjustment required**: +3-5 weeks to development schedule
- **Resource allocation needed**: 1-2 senior engineers for remediation work
### Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Secret theft by malicious insider | High | Critical | Fix P0 immediately |
| Customer workflow breaks on upgrade | High | High | Implement P1 before release |
| Worker crashes under load | Medium | High | Implement P1 before release |
| Limited pack ecosystem adoption | Medium | Medium | Address in v1.0 |
### Cost of Inaction
**If P0 (Secret Exposure) is not fixed:**
- Any user with server access can steal API keys, passwords, credentials
- Potential data breach with legal/compliance implications
- Loss of customer trust and reputation damage
- Regulatory violations (SOC 2, GDPR, etc.)
**If P1 (Dependency Conflicts) is not fixed:**
- Customer workflows break unexpectedly during system maintenance
- Increased support burden and customer frustration
- Competitive disadvantage vs. alternatives (Temporal, Prefect)
---
## Technical Summary
### P0: Secret Exposure Vulnerability
**Current State:**
```rust
// Secrets passed as environment variables - INSECURE!
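cmd.env("SECRET_API_KEY", "my-secret-value"); // ← readable via /proc/<pid>/environ
```
**Attack Vector:**
Any user or process running under the worker's OS account (for example, another action launched by the same worker), or root, can execute:
```bash
ps auxwwe | grep SECRET_   # Shows secrets in the environments of readable processes
cat /proc/{pid}/environ    # Dumps all environment variables (NUL-separated)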
```
**Proposed Fix:**
Pass secrets via stdin as JSON instead of environment variables.
**Effort:** 3-5 days
**Priority:** P0 (BLOCKING ALL OTHER WORK)
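Here is a minimal sketch of the proposed fix using only the Rust standard library. `run_with_secrets_on_stdin` and the single-line flat JSON payload are illustrative assumptions, not the planned worker design. A production worker would more likely serialize with a JSON library such as `serde_json` than escape strings by hand.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};
use std::process::{Command, Output, Stdio};

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serializes secrets as a flat JSON object, e.g. {"api_key":"..."}.
fn secrets_to_json(secrets: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = secrets
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Runs `program` with the secrets written to its stdin instead of its environment.
fn run_with_secrets_on_stdin(
    program: &str,
    args: &[&str],
    secrets: &BTreeMap<String, String>,
) -> io::Result<Output> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    {
        // Write one JSON line, then drop the handle so the child sees EOF.
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(secrets_to_json(secrets).as_bytes())?;
        stdin.write_all(b"\n")?;
    }
    child.wait_with_output()
}
```

The action would then read one line of JSON from stdin at startup instead of looking up `SECRET_*` environment variables. The payload never enters the child's environment, so it does not show up in `/proc/<pid>/environ`.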
---
### P1: Dependency Hell
**Current State:**
All user packs share system Python runtime. When we upgrade Python for security patches, user code may break.
**Business Scenario:**
1. Customer creates workflow using Python 3.9 libraries
2. We upgrade server to Python 3.11 for security patch
3. Customer's workflow breaks due to library incompatibilities
4. Customer blames our platform for unreliability
**Proposed Fix:**
Each pack gets isolated virtual environment with pinned dependencies.
**Effort:** 7-10 days
**Priority:** P1 (REQUIRED FOR v1.0)
---
## Remediation Plan
### Phase 1: Security Critical (Week 1-2)
**Fix secret passing vulnerability**
- Estimated effort: 3-5 days
- Priority: P0 - BLOCKS ALL OTHER WORK
- Deliverable: Secrets passed securely via stdin
- Verification: Security tests pass
### Phase 2: Dependency Isolation (Week 3-4)
**Implement per-pack virtual environments**
- Estimated effort: 7-10 days
- Priority: P1 - REQUIRED FOR v1.0
- Deliverable: Isolated Python environments per pack
- Verification: System upgrade doesn't break packs
### Phase 3: Operational Hardening (Week 5-6)
**Add log limits and language support**
- Estimated effort: 8-11 days
- Priority: P1-P2
- Deliverable: Worker stability improvements
- Verification: Worker handles large logs gracefully
**Total Timeline:** 3.5-5 weeks
---
## Resource Requirements
### Development Resources
- **Primary:** 1 senior Rust engineer (full-time, 5 weeks)
- **Secondary:** 1 senior engineer for code review (20% time)
- **Security:** External security consultant (1 week for audit)
- **Documentation:** Technical writer (part-time, 1 week)
### Infrastructure Resources
- Staging environment for security testing
- CI/CD pipeline updates for security checks
- Penetration testing tools
### Budget Impact
- **Engineering Time:** ~$50-70K (5 weeks × 2 engineers)
- **Security Audit:** ~$10-15K
- **Tools/Infrastructure:** ~$2-5K
- **Total Estimated Cost:** $62-90K
---
## Recommendations
### Immediate Actions (This Week)
1. **STOP** all production deployment plans
2. **Communicate** timeline changes to stakeholders
3. **Assign** engineering resources to remediation work
4. **Schedule** security audit for Phase 1 completion
### Development Process Changes
1. **Add security review** to design phase (before implementation)
2. **Require security tests** in CI/CD pipeline
3. **Mandate code review** for security-critical changes
4. **Schedule quarterly** security audits
### Go/No-Go Criteria for v1.0
- ✅ P0 (Secret Security) - MUST be fixed
- ✅ P1 (Dependency Isolation) - MUST be fixed
- ✅ P1 (Log Limits) - MUST be fixed
- ⚠️ P2 (Language Support) - SHOULD be fixed
- ✅ Security audit - MUST pass
- ✅ All security tests - MUST pass
---
## Comparison with Alternatives
### How We Compare to Competitors
**vs. StackStorm:**
- ✅ We identified and can fix these issues BEFORE production
- ✅ Rust provides memory safety and type safety they lack
- ⚠️ We risk repeating their mistakes if not careful
**vs. Temporal/Prefect:**
- ✅ Our architecture is sound - just needs hardening
- ⚠️ They have mature dependency isolation already
- ⚠️ They've invested heavily in security features
**Market Impact:**
Fixing these issues puts us on par with mature alternatives and positions Attune as a secure, enterprise-ready platform.
---
## Success Metrics
### Security Metrics (Post-Remediation)
- 0 secrets visible in process table
- 0 dependency conflicts between packs
- 0 worker OOM incidents due to logs
- 100% security test pass rate
### Business Metrics
- No security incidents in first 6 months
- <5% customer workflows broken by system upgrades
- 95%+ uptime for worker processes
- Positive security audit results
---
## Timeline
```
Week 1-2: Phase 1 - Security Critical (P0)
- Fix secret passing vulnerability
- Security testing and verification
Week 3-4: Phase 2 - Dependency Isolation (P1)
- Implement per-pack virtual environments
- Integration testing
Week 5-6: Phase 3 - Operational Hardening (P1-P2)
- Log size limits
- Language support improvements
- External security audit
Week 7: Final testing and v1.0 release candidate
```
---
## Stakeholder Communication
### For Engineering Leadership
- **Message:** Critical issues found, but fixable. Timeline +5 weeks.
- **Ask:** Approve resource allocation and budget for remediation
- **Next Steps:** Kickoff meeting to assign tasks and set milestones
### For Product Management
- **Message:** v1.0 delayed 5 weeks for critical security fixes
- **Impact:** Better to delay than launch with vulnerabilities
- **Benefit:** Enterprise-ready security features for market differentiation
### For Executive Team
- **Message:** Security review prevented potential data breach
- **Cost:** $62-90K and 5 weeks delay
- **ROI:** Avoid reputational damage, legal liability, customer churn
- **Decision Needed:** Approve timeline extension and budget increase
---
## Conclusion
This security review has identified critical issues that would have caused significant problems in production. The good news is we caught them early, have a clear remediation plan, and the Rust architecture has already prevented other common pitfalls.
**Recommended Decision:** Approve the 3.5-5 week remediation timeline and allocate necessary resources to fix critical security issues before v1.0 release.
**Risk of NOT fixing:** Potential security breach, customer data loss, regulatory violations, and reputational damage far exceed the cost of remediation.
**Next Steps:**
1. Review and approve remediation plan
2. Assign engineering resources
3. Communicate timeline changes
4. Begin Phase 1 (Security Critical) work immediately
---
**Prepared By:** Engineering Team
**Reviewed By:** [Pending]
**Approved By:** [Pending]
**Distribution:** Engineering Leadership, Product Management, Security Team
**CONFIDENTIAL - Do Not Distribute Outside Approved Recipients**

# Service Accounts and Transient API Tokens
**Version:** 1.0
**Last Updated:** 2025-01-27
**Status:** Draft
## Overview
Service accounts provide programmatic access to the Attune API for sensors, action executions, and other automated processes. Unlike user accounts, service accounts:
- Have no password (token-based authentication only)
- Have limited scopes (principle of least privilege)
- Can be short-lived or long-lived depending on use case
- Are not tied to a human user
- Can be easily revoked without affecting user access
## Use Cases
1. **Sensors**: Long-lived tokens for sensor daemons to emit events
2. **Action Executions**: Short-lived tokens scoped to a single execution
3. **CLI Tools**: User-scoped tokens for command-line operations
4. **Webhooks**: Tokens for external systems to trigger actions
5. **Monitoring**: Tokens for health checks and metrics collection
## Token Types
### 1. Sensor Tokens
**Purpose**: Authentication for sensor daemon processes
**Characteristics**:
- **Lifetime**: Long-lived (90 days, auto-expires)
- **Scope**: `sensor`
- **Permissions**: Create events, read rules/triggers for specific trigger types
- **Revocable**: Yes (manual revocation via API)
- **Renewable**: Yes (automatic refresh via API, no restart required)
- **Rotation**: Automatic (the sensor refreshes its token once 80% of the TTL has elapsed)
**Example Usage**:
```bash
ATTUNE_API_TOKEN=sensor_abc123... ./attune-sensor --sensor-ref core.timer
```
### 2. Action Execution Tokens
**Purpose**: Authentication for action scripts during execution
**Characteristics**:
- **Lifetime**: Short-lived (matches execution timeout, typically 5-60 minutes)
- **Scope**: `action_execution`
- **Permissions**: Read keys, update execution status, limited to specific execution_id
- **Revocable**: Yes (auto-revoked on execution completion or timeout)
- **Renewable**: No (valid for one execution only; revoked on completion or expires at timeout)
- **Auto-Cleanup**: Token revocation records are auto-deleted after expiration
**Example Usage**:
```python
# Action script receives its token via environment variables
import os

import requests

api_url = os.environ['ATTUNE_API_URL']
api_token = os.environ['ATTUNE_API_TOKEN']
execution_id = os.environ['ATTUNE_EXECUTION_ID']  # token is only valid for this execution

# Fetch an encrypted key
response = requests.get(
    f"{api_url}/keys/myapp.api_key",
    headers={"Authorization": f"Bearer {api_token}"},
    timeout=10,
)
response.raise_for_status()  # fail fast on 401/403/404
secret = response.json()['value']
```
### 3. User CLI Tokens
**Purpose**: Authentication for CLI tools on behalf of a user
**Characteristics**:
- **Lifetime**: Medium-lived (7-30 days)
- **Scope**: `user`
- **Permissions**: Full user permissions (RBAC-based)
- **Revocable**: Yes
- **Renewable**: Yes (via refresh token)
**Example Usage**:
```bash
attune auth login # Stores token in ~/.attune/token
attune action execute core.echo --param message="Hello"
```
### 4. Webhook Tokens
**Purpose**: Authentication for external systems calling Attune webhooks
**Characteristics**:
- **Lifetime**: Long-lived (90-365 days, auto-expires)
- **Scope**: `webhook`
- **Permissions**: Trigger specific actions or create events
- **Revocable**: Yes
- **Renewable**: Yes (generate new token before expiration)
- **Rotation**: Recommended every 90 days
**Example Usage**:
```bash
curl -X POST https://attune.example.com/api/webhooks/deploy \
  -H "Authorization: Bearer webhook_xyz789..." \
  -H "Content-Type: application/json" \
  -d '{"status": "deployed"}'
```
## Token Scopes and Permissions
| Scope | Permissions | Use Case |
|-------|-------------|----------|
| `admin` | Full access to all resources | System administrators, web UI |
| `user` | RBAC-based permissions | CLI tools, user sessions |
| `sensor` | Create events, read rules/triggers | Sensor daemons |
| `action_execution` | Read keys, update execution (scoped to execution_id) | Action scripts |
| `webhook` | Create events, trigger actions | External integrations |
| `readonly` | Read-only access to all resources | Monitoring, auditing |
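To make the table concrete, here is a minimal Python sketch of a scope check. The permission strings (`event:create`, `key:read`, and so on) and the `scope_allows` helper are hypothetical names invented for illustration; the per-execution and per-trigger-type restrictions carried in token metadata are enforced separately and are not modeled here.

```python
from typing import Callable, Dict, FrozenSet, Optional

# Hypothetical permission names; the real API may model permissions differently.
SCOPE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "sensor": frozenset({"event:create", "rule:read", "trigger:read"}),
    "action_execution": frozenset({"key:read", "execution:update"}),
    "webhook": frozenset({"event:create", "action:trigger"}),
    "readonly": frozenset({"*:read"}),  # wildcard: any read permission
}


def scope_allows(
    scope: str,
    permission: str,
    rbac_check: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if a token with `scope` may exercise `permission`."""
    if scope == "admin":
        return True
    if scope == "user":
        # `user` tokens defer to the caller's RBAC roles; deny if no checker is supplied
        return rbac_check is not None and rbac_check(permission)
    allowed = SCOPE_PERMISSIONS.get(scope, frozenset())  # unknown scopes get nothing
    if permission in allowed:
        return True
    _, _, action = permission.partition(":")
    return action == "read" and "*:read" in allowed
```

Unknown scopes fall through to an empty set, so the check fails closed.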
## Database Schema
### Identity Table
Service accounts are stored in the `identity` table with `identity_type = 'service_account'`:
```sql
CREATE TABLE identity (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL UNIQUE,
identity_type identity_type NOT NULL, -- 'user' or 'service_account'
email VARCHAR(255), -- NULL for service accounts
password_hash VARCHAR(255), -- NULL for service accounts
metadata JSONB DEFAULT '{}',
created TIMESTAMPTZ DEFAULT NOW(),
updated TIMESTAMPTZ DEFAULT NOW()
);
```
Service account metadata includes:
```json
{
"scope": "sensor",
"description": "Timer sensor service account",
"created_by": 1, // identity_id of creator
"expires_at": "2025-04-27T12:34:56Z",
"trigger_types": ["core.timer"], // For sensor scope
"execution_id": 123 // For action_execution scope
}
```
### Token Storage
Tokens are **not** stored in the database (they are stateless JWTs). However, revocation is tracked:
```sql
CREATE TABLE token_revocation (
id BIGSERIAL PRIMARY KEY,
identity_id BIGINT NOT NULL REFERENCES identity(id) ON DELETE CASCADE,
token_jti VARCHAR(255) NOT NULL, -- JWT ID (jti claim)
token_exp TIMESTAMPTZ NOT NULL, -- Token expiration (from exp claim)
revoked_at TIMESTAMPTZ DEFAULT NOW(),
revoked_by BIGINT REFERENCES identity(id),
reason VARCHAR(500),
UNIQUE(token_jti)
);
-- token_jti is already indexed by the UNIQUE constraint above
CREATE INDEX idx_token_revocation_identity ON token_revocation(identity_id);
CREATE INDEX idx_token_revocation_exp ON token_revocation(token_exp); -- For cleanup queries
```
## JWT Token Format
### Claims
All service account tokens include these claims:
```json
{
"sub": "sensor:core.timer", // Subject: "type:name"
"jti": "abc123...", // JWT ID (for revocation)
"iat": 1706356496, // Issued at (Unix timestamp)
"exp": 1714132496, // Expires at (Unix timestamp)
"identity_id": 123,
"identity_type": "service_account",
"scope": "sensor",
"metadata": {
"trigger_types": ["core.timer"]
}
}
```
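For illustration, the following Python sketch shows how a token with exactly these claims could be signed with HS256 using only the standard library. The function name `mint_service_token` is hypothetical, and HS256 is an assumption based on the `eyJhbGciOiJIUzI1NiIs...` token prefix used in this document (it decodes to `{"alg":"HS256",...`). A production service would normally rely on a maintained JWT library rather than hand-rolled encoding.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_service_token(
    secret: bytes,
    name: str,
    identity_id: int,
    scope: str,
    metadata: dict,
    ttl_seconds: int,
    now: Optional[int] = None,
) -> str:
    iat = int(time.time()) if now is None else now
    claims = {
        "sub": name,                   # e.g. "sensor:core.timer"
        "jti": secrets.token_hex(16),  # unique ID so the token can be revoked
        "iat": iat,
        "exp": iat + ttl_seconds,      # every token must expire
        "identity_id": identity_id,
        "identity_type": "service_account",
        "scope": scope,
        "metadata": metadata,
    }
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"
```

Calling it with `ttl_seconds=90 * 24 * 3600` and `now=1706356496` yields the `exp` of `1714132496` shown in the claims example.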
### Scope-Specific Claims
**Sensor tokens** (restricted to declared trigger types):
```json
{
"scope": "sensor",
"metadata": {
"trigger_types": ["core.timer", "core.interval"]
}
}
```
The API enforces that sensors can only create events for trigger types listed in `metadata.trigger_types`. Attempting to create an event for an unauthorized trigger type will result in a `403 Forbidden` error.
**Action execution tokens**:
```json
{
"scope": "action_execution",
"metadata": {
"execution_id": 456,
"action_ref": "core.echo",
"workflow_id": 789 // Optional, if part of workflow
}
}
```
**Webhook tokens**:
```json
{
"scope": "webhook",
"metadata": {
"allowed_paths": ["/webhooks/deploy", "/webhooks/alert"],
"ip_whitelist": ["203.0.113.0/24"] // Optional
}
}
```
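A minimal sketch of how the API might enforce these two webhook restrictions, using Python's standard `ipaddress` module. Two behaviors here are assumptions rather than documented rules: paths are compared exactly (after any `/api` routing prefix has been stripped), and an absent `ip_whitelist` allows any source address while an empty list allows none. The function name is hypothetical.

```python
import ipaddress
from typing import Any, Dict


def webhook_request_allowed(metadata: Dict[str, Any], path: str, client_ip: str) -> bool:
    """Apply a webhook token's allowed_paths and optional ip_whitelist."""
    if path not in metadata.get("allowed_paths", []):
        return False  # no listed paths means no access
    cidrs = metadata.get("ip_whitelist")
    if cidrs is None:
        return True  # whitelist is optional; absent means any source address
    ip = ipaddress.ip_address(client_ip)
    # Membership is simply False when IPv4 and IPv6 are mixed
    return any(ip in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)
```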
## API Endpoints
### Create Service Account
**Admin only**
```http
POST /service-accounts
Authorization: Bearer {admin_token}
Content-Type: application/json
{
"name": "sensor:core.timer",
"scope": "sensor",
"description": "Timer sensor service account",
"ttl_days": 90, // Sensor tokens: 90 days, auto-refresh before expiration
"metadata": {
"trigger_types": ["core.timer"]
}
}
```
**Response**:
```json
{
"identity_id": 123,
"name": "sensor:core.timer",
"scope": "sensor",
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expires_at": "2025-04-27T12:34:56Z" // 90 days from now
}
```
**Important**: The token is only shown once. Store it securely.
### List Service Accounts
**Admin only**
```http
GET /service-accounts
Authorization: Bearer {admin_token}
```
**Response**:
```json
{
"data": [
{
"identity_id": 123,
"name": "sensor:core.timer",
"scope": "sensor",
"created_at": "2025-01-27T12:34:56Z",
"expires_at": "2025-04-27T12:34:56Z",
"metadata": {
"trigger_types": ["core.timer"]
}
}
]
}
```
### Refresh Token (Self-Service)
**Sensor/User tokens can refresh themselves**
```http
POST /auth/refresh
Authorization: Bearer {current_token}
Content-Type: application/json
{}
```
**Response**:
```json
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expires_at": "2025-04-27T12:34:56Z"
}
```
**Notes**:
- Current token must be valid (not expired, not revoked)
- New token has same scope and metadata as current token
- New token has same TTL as original token type (e.g., 90 days for sensors)
- Old token remains valid until its original expiration (allows zero-downtime refresh)
- Only `sensor` and `user` scopes can refresh (not `action_execution` or `webhook`)
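These rules can be summarized as a small Python sketch of the server-side refresh decision, operating on already-decoded claims. `refreshed_claims`, `RefreshError`, and the in-memory `revoked_jtis` set are hypothetical stand-ins for the real handler and the `token_revocation` table; signing the new claims is left out.

```python
import secrets
import time
from typing import Optional, Set

# Only these scopes may refresh; action_execution and webhook tokens may not.
REFRESHABLE_SCOPES = {"sensor", "user"}


class RefreshError(Exception):
    pass


def refreshed_claims(current: dict, revoked_jtis: Set[str], now: Optional[int] = None) -> dict:
    """Return claims for a replacement token, or raise RefreshError."""
    now = int(time.time()) if now is None else now
    # The presented token must itself be valid: not expired, not revoked.
    if current["exp"] <= now:
        raise RefreshError("token expired")
    if current["jti"] in revoked_jtis:
        raise RefreshError("token revoked")
    if current["scope"] not in REFRESHABLE_SCOPES:
        raise RefreshError(f"scope {current['scope']!r} cannot be refreshed")
    ttl = current["exp"] - current["iat"]  # same TTL as the original token
    new_claims = dict(current)  # same sub, identity, scope, and metadata
    new_claims.update(jti=secrets.token_hex(16), iat=now, exp=now + ttl)
    # The old token is NOT revoked; it stays valid until its own exp.
    return new_claims
```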
### Revoke Service Account Token
**Admin only**
```http
DELETE /service-accounts/{identity_id}
Authorization: Bearer {admin_token}
Content-Type: application/json
{
"reason": "Token compromised"
}
```
**Response**:
```json
{
"message": "Service account revoked",
"identity_id": 123
}
```
### Create Execution Token (Internal)
**Called by executor service, not exposed in API**
```rust
// In executor service
let execution_timeout_minutes = get_action_timeout(&action_ref); // e.g., 30 minutes
let token = create_execution_token(
    execution_id,
    &action_ref,
    execution_timeout_minutes, // token TTL matches the action timeout
)?;
```
This token is passed to the worker service, which injects it into the action's environment.
## Token Creation Workflow
### 1. Sensor Token Creation
```
Admin → POST /service-accounts (scope=sensor) → API
API → Create identity record → Database
API → Generate JWT with sensor scope → Response
Admin → Store token in secure config → Sensor deployment
Sensor → Use token for API calls → Event emission
```
### 2. Execution Token Creation
```
Rule fires → Executor creates enforcement → Executor
Executor → Schedule execution → Database
Executor → Create execution token (internal) → JWT library
Executor → Send execution request to worker → RabbitMQ
Worker → Receive message with token → Action runner
Action → Use token to fetch keys → API
Execution completes → Token expires (TTL) → Automatic cleanup
```
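The final hand-off in this flow (the worker injecting the token into the action's environment) can be sketched as follows. The worker is a Rust service; this Python version only illustrates the idea, and `run_action` is a hypothetical name. It deliberately starts from a minimal environment instead of copying the worker's own, so worker credentials are not inherited by action code, and it reuses the action timeout that also bounds the token's TTL.

```python
import os
import subprocess
from typing import List


def run_action(
    command: List[str],
    api_url: str,
    execution_token: str,
    execution_id: int,
    timeout_seconds: int,
) -> subprocess.CompletedProcess:
    # Start from a minimal environment rather than os.environ so that the
    # worker's own credentials are not inherited by the action process.
    env = {
        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
        "ATTUNE_API_URL": api_url,
        "ATTUNE_API_TOKEN": execution_token,
        "ATTUNE_EXECUTION_ID": str(execution_id),
    }
    # The same timeout that sets the token's TTL also bounds the process.
    return subprocess.run(
        command, env=env, capture_output=True, text=True, timeout=timeout_seconds
    )
```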
## Token Validation
### Middleware (API Service)
```rust
// In API service
pub async fn validate_token(
token: &str,
required_scope: Option<&str>
) -> Result<Claims> {
// 1. Verify JWT signature
let claims = decode_jwt(token)?;
// 2. Check expiration (JWT library handles this, but explicit check for clarity)
if claims.exp < now() {
return Err(Error::TokenExpired);
}
// 3. Check revocation (only check non-expired tokens)
if is_revoked(&claims.jti, claims.exp).await? {
return Err(Error::TokenRevoked);
}
// 4. Check scope
if let Some(scope) = required_scope {
if claims.scope != scope {
return Err(Error::InsufficientPermissions);
}
}
Ok(claims)
}
```
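To see the same four checks end to end, here is a standard-library Python sketch for HS256 tokens. It assumes HS256 signing and represents the revocation list as a set of `jti` values; `sign_hs256`, `validate_token`, and `TokenError` are illustrative names, not the API service's actual code. A missing `exp` is treated as expired, in line with the rule that every token must expire.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Set


class TokenError(Exception):
    """Raised with a reason such as 'expired', 'revoked', or 'insufficient_scope'."""


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def validate_token(token: str, secret: bytes, revoked_jtis: Set[str],
                   required_scope: Optional[str] = None,
                   now: Optional[int] = None) -> dict:
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        sig = _b64url_decode(sig_b64)
    except ValueError as exc:  # wrong part count, bad base64, or bad JSON
        raise TokenError("malformed") from exc
    # 1. Verify the signature (pin the algorithm instead of trusting the header)
    if header.get("alg") != "HS256":
        raise TokenError("bad_algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise TokenError("bad_signature")
    # 2. Expiration: invalid on or after `exp`; a missing exp counts as expired
    now = int(time.time()) if now is None else now
    if claims.get("exp", 0) <= now:
        raise TokenError("expired")
    # 3. Revocation (only reached for unexpired tokens)
    if claims.get("jti") in revoked_jtis:
        raise TokenError("revoked")
    # 4. Scope
    if required_scope is not None and claims.get("scope") != required_scope:
        raise TokenError("insufficient_scope")
    return claims
```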
### Scope-Based Authorization
```rust
// Execution-scoped token can only access its own execution
if claims.scope == "action_execution" {
    let allowed_execution_id = claims
        .metadata
        .get("execution_id")
        .and_then(|v| v.as_i64())
        .ok_or(Error::InvalidToken)?;
    if execution_id != allowed_execution_id {
        return Err(Error::InsufficientPermissions);
    }
}

// Sensor-scoped token can only create events for declared trigger types
if claims.scope == "sensor" {
    let allowed_trigger_types = claims
        .metadata
        .get("trigger_types")
        .and_then(|v| v.as_array())
        .ok_or(Error::InvalidToken)?;
    let is_allowed = allowed_trigger_types
        .iter()
        .any(|v| v.as_str() == Some(trigger_type.as_str()));
    if !is_allowed {
        return Err(Error::InsufficientPermissions);
    }
}
```
## Security Best Practices
### Token Generation
1. **Use Strong Secrets**: JWT signing key must be 256+ bits, randomly generated
2. **Include JTI**: Always include `jti` claim for revocation support
3. **REQUIRED Expiration**: All tokens MUST have `exp` claim - no exceptions
- Sensor tokens: 90 days (auto-refresh before expiration)
- Action execution tokens: Match execution timeout (5-60 minutes)
- User CLI tokens: 7-30 days (auto-refresh before expiration)
- Webhook tokens: 90-365 days (manual rotation)
4. **Minimal Scope**: Grant least privilege necessary
5. **Restrict Trigger Types**: For sensor tokens, only include necessary trigger types in metadata
### Token Storage
1. **Environment Variables**: Preferred method for sensors and actions
2. **Never Log**: Redact tokens from logs (show only last 4 chars)
3. **Never Commit**: Don't commit tokens to version control
4. **Secure Config**: Store in encrypted config management (Vault, k8s secrets)
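A minimal Python sketch of the redaction rule from item 2 (keep only the last four characters). The regular expressions are assumptions about how tokens appear in log lines (a `Bearer` prefix, or a JWT whose header begins with `eyJ`); adjust them to the formats your services actually log. The helper names are hypothetical.

```python
import re

# Assumed token shapes: "Bearer <token>" headers and bare JWTs (header starts with "eyJ")
_BEARER = re.compile(r"(Bearer\s+)(\S+)")
_JWT = re.compile(r"\beyJ[\w-]*\.[\w-]*\.[\w-]*")


def redact_token(token: str) -> str:
    """Show only the last 4 characters; hide very short tokens completely."""
    return "****" + token[-4:] if len(token) > 4 else "****"


def redact_log_line(line: str) -> str:
    line = _BEARER.sub(lambda m: m.group(1) + redact_token(m.group(2)), line)
    return _JWT.sub(lambda m: redact_token(m.group(0)), line)
```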
### Token Transmission
1. **HTTPS Only**: Never send tokens over unencrypted connections
2. **Authorization Header**: Use `Authorization: Bearer {token}` header
3. **No Query Params**: Don't pass tokens in URL query parameters
4. **No Cookies**: For service accounts, avoid cookie-based auth
### Token Revocation
1. **Immediate Revocation**: Check revocation list on every request
2. **Audit Trail**: Log who revoked, when, and why
3. **Cascade Delete**: Revoke all tokens when service account is deleted
4. **Automatic Cleanup**: Delete revocation records for expired tokens (run hourly)
- Query: `DELETE FROM token_revocation WHERE token_exp < NOW()`
- Prevents indefinite table bloat
- Expired tokens are already invalid, no need to track revocation
5. **Validate Permissions**: Enforce trigger type restrictions for sensor tokens on event creation
## Implementation Checklist
- [ ] Add `identity_type` enum to database schema
- [ ] Add `token_revocation` table (with `token_exp` column)
- [ ] Create `POST /service-accounts` endpoint
- [ ] Create `GET /service-accounts` endpoint
- [ ] Create `DELETE /service-accounts/{id}` endpoint
- [ ] Create `POST /auth/refresh` endpoint (for automatic token refresh)
- [ ] Add scope validation middleware
- [ ] Add token revocation check middleware (skip check for expired tokens)
- [ ] Implement execution token creation in executor (TTL = action timeout)
- [ ] Pass execution token to worker via RabbitMQ
- [ ] Inject execution token into action environment
- [ ] Add CLI commands: `attune service-account create/list/revoke`
- [ ] Document token creation for sensor deployment
- [ ] Implement automatic token refresh in sensors (refresh at 80% of TTL)
- [ ] Implement cleanup job for expired token revocations (hourly cron)
## Migration Path
### Phase 1: Database Schema
```sql
-- Add identity_type enum if not exists
DO $$ BEGIN
CREATE TYPE identity_type AS ENUM ('user', 'service_account');
EXCEPTION
WHEN duplicate_object THEN null;
END $$;
-- Add identity_type column to identity table
ALTER TABLE identity
  ADD COLUMN IF NOT EXISTS identity_type identity_type NOT NULL DEFAULT 'user';
-- Create token_revocation table
CREATE TABLE IF NOT EXISTS token_revocation (
id BIGSERIAL PRIMARY KEY,
identity_id BIGINT NOT NULL REFERENCES identity(id) ON DELETE CASCADE,
token_jti VARCHAR(255) NOT NULL,
token_exp TIMESTAMPTZ NOT NULL, -- For cleanup queries
revoked_at TIMESTAMPTZ DEFAULT NOW(),
revoked_by BIGINT REFERENCES identity(id),
reason VARCHAR(500),
UNIQUE(token_jti)
);
-- token_jti is already indexed by the UNIQUE constraint
CREATE INDEX IF NOT EXISTS idx_token_revocation_identity ON token_revocation(identity_id);
CREATE INDEX IF NOT EXISTS idx_token_revocation_exp ON token_revocation(token_exp);
```
### Phase 2: API Implementation
1. Add service account repository
2. Add JWT utilities for scope-based tokens
3. Implement service account CRUD endpoints
4. Add middleware for token validation and revocation
### Phase 3: Integration
1. Update executor to create execution tokens
2. Update worker to receive and use execution tokens
3. Update sensor to accept and use sensor tokens
4. Update CLI to support service account management
## Examples
### Python Action Using Execution Token
```python
#!/usr/bin/env python3
import os
import sys

import requests

# Token, API URL, and execution ID are injected by the worker
api_url = os.environ['ATTUNE_API_URL']
api_token = os.environ['ATTUNE_API_TOKEN']
execution_id = os.environ['ATTUNE_EXECUTION_ID']

# Fetch encrypted secret (the token only grants access for this execution)
response = requests.get(
    f"{api_url}/keys/myapp.database_password",
    headers={"Authorization": f"Bearer {api_token}"},
    timeout=10,
)
if response.status_code != 200:
    print(f"Failed to fetch key: {response.text}", file=sys.stderr)
    sys.exit(1)
db_password = response.json()['value']

# Use the secret to connect to the database (never print or log it)...
print(f"Execution {execution_id}: database credentials retrieved")
```
### Sensor Using Sensor Token
```rust
// In sensor initialization (inside an async fn that returns a Result)
use std::env;

let api_token = env::var("ATTUNE_API_TOKEN")?;
let api_url = env::var("ATTUNE_API_URL")?;
let client = reqwest::Client::new();
// Fetch active rules
let response = client
    .get(format!("{}/rules?trigger_type=core.timer", api_url))
    .bearer_auth(&api_token)
    .send()
    .await?
    .error_for_status()?; // surface 401/403 instead of failing to parse JSON
let rules: Vec<Rule> = response.json().await?;
```
## Token Lifecycle Management
### Expiration Strategy
**All tokens MUST expire** to prevent indefinite revocation table bloat and reduce attack surface:
| Token Type | Expiration | Rationale |
|------------|------------|-----------|
| Sensor | 90 days | Perpetually running service, auto-refresh before expiration |
| Action Execution | 5-60 minutes | Matches action timeout, auto-cleanup on completion |
| User CLI | 7-30 days | Balance between convenience and security, auto-refresh |
| Webhook | 90-365 days | External integration, manual rotation required |
### Revocation Table Cleanup
Cleanup job runs hourly to prevent table bloat:
```sql
-- Delete revocation records for expired tokens
DELETE FROM token_revocation
WHERE token_exp < NOW();
```
**Why this works:**
- Expired tokens are already invalid (enforced by JWT `exp` claim)
- No need to track revocation status for invalid tokens
- Keeps revocation table small and queries fast
- Typical size: <1000 rows instead of millions
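This reasoning can be checked with a small model. The Python sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, stores `token_exp` as Unix seconds, and passes the current time as a parameter instead of calling `NOW()`; the schema is trimmed to the two columns that matter here, and the helper names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE token_revocation ("
        " token_jti TEXT PRIMARY KEY,"
        " token_exp INTEGER NOT NULL)"  # Unix seconds, copied from the token's exp claim
    )
    return db


def revoke(db: sqlite3.Connection, jti: str, exp: int) -> None:
    db.execute("INSERT OR IGNORE INTO token_revocation VALUES (?, ?)", (jti, exp))


def is_revoked(db: sqlite3.Connection, jti: str) -> bool:
    row = db.execute(
        "SELECT 1 FROM token_revocation WHERE token_jti = ?", (jti,)
    ).fetchone()
    return row is not None


def cleanup_expired(db: sqlite3.Connection, now: int) -> int:
    # Rows for expired tokens can go: validation rejects those tokens on `exp` first.
    cur = db.execute("DELETE FROM token_revocation WHERE token_exp < ?", (now,))
    return cur.rowcount
```

After a cleanup run, a revoked-but-expired `jti` is no longer listed, which is harmless because validation already rejects that token on its `exp` claim before consulting the table.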
### Sensor Token Refresh
Sensors automatically refresh their own tokens without human intervention:
**Automatic Process:**
1. Sensor starts with 90-day token
2. Background task monitors token expiration
3. When 80% of the TTL has elapsed (day 72), the sensor requests a new token via `POST /auth/refresh`
4. New token is hot-loaded without restart
5. Old token remains valid until original expiration
6. Process repeats indefinitely
**Refresh Timing Example:**
- Token issued: Day 0, expires Day 90
- Refresh trigger: Day 72 (80% of 90 days)
- New token issued: Day 72, expires Day 162
- Old token still valid: Day 72-90 (overlap period)
- Next refresh: Day 144 (80% of the new token's 90-day TTL, counted from Day 72)
**Zero-Downtime:**
- No service interruption during refresh
- Old token valid during transition
- Graceful fallback on refresh failure
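The refresh schedule can be sketched in Python as follows. `request_refresh` stands in for a call to `POST /auth/refresh`, tokens are modeled as hypothetical `(value, issued_at, expires_at)` tuples, and the failure behavior (keep the current token and try again on the next check) is one reasonable reading of graceful fallback, not a documented guarantee.

```python
from typing import Callable, Optional, Tuple

DAY = 86_400
REFRESH_PERCENT = 80  # refresh once 80% of the TTL has elapsed

# (token value, issued_at, expires_at), timestamps in seconds
Token = Tuple[str, int, int]


def refresh_due(issued_at: int, expires_at: int) -> int:
    """Timestamp at which a refresh should be attempted (integer math, no float error)."""
    return issued_at + (expires_at - issued_at) * REFRESH_PERCENT // 100


def maybe_refresh(
    token: Token, now: int, request_refresh: Callable[[str], Optional[Token]]
) -> Token:
    value, issued_at, expires_at = token
    if now < refresh_due(issued_at, expires_at):
        return token  # too early; keep monitoring
    new_token = request_refresh(value)  # e.g. POST /auth/refresh with the current token
    # On failure keep the current token (still valid until expires_at) and retry next check
    return new_token if new_token is not None else token
```

With a 90-day TTL this reproduces the timeline above: the first refresh is due on day 72 and the next on day 144.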
## Cleanup Job Implementation
### Purpose
Prevent indefinite growth of the `token_revocation` table by removing revocation records for expired tokens.
### Why Cleanup Is Safe
- Expired tokens are already invalid (enforced by JWT `exp` claim)
- Token validation checks expiration before checking revocation
- No security risk in deleting expired token revocations
- Significantly reduces table size and improves query performance
### Implementation
**Frequency**: Hourly cron job or background task
**SQL Query**:
```sql
DELETE FROM token_revocation
WHERE token_exp < NOW();
```
**Expected Impact**:
- Typical table size: <1,000 rows instead of millions over time
- Fast revocation checks (indexed queries on small dataset)
- Reduced storage and backup costs
### Rust Implementation Example
```rust
use sqlx::PgPool;
use tokio::time::{interval, Duration};
use tracing::{error, info};

/// Background task to clean up expired token revocations
pub async fn start_revocation_cleanup_task(db: PgPool) {
    let mut ticker = interval(Duration::from_secs(3600)); // Every hour
    loop {
        // The first tick completes immediately, so a cleanup also runs at startup
        ticker.tick().await;
        match cleanup_expired_revocations(&db).await {
            Ok(count) => info!("Cleaned up {} expired token revocations", count),
            Err(e) => error!("Failed to clean up expired token revocations: {}", e),
        }
    }
}

/// Delete token revocation records for expired tokens
async fn cleanup_expired_revocations(db: &PgPool) -> Result<u64, sqlx::Error> {
    let result = sqlx::query!("DELETE FROM token_revocation WHERE token_exp < NOW()")
        .execute(db)
        .await?;
    Ok(result.rows_affected())
}
```
### Monitoring
Track cleanup job metrics:
- Number of records deleted per run
- Job execution time
- Job failures (alert if consecutive failures)
**Prometheus Metrics Example**:
```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram, register_int_counter, Histogram, IntCounter};

// Define metrics
lazy_static! {
    static ref REVOCATION_CLEANUP_COUNT: IntCounter = register_int_counter!(
        "attune_revocation_cleanup_total",
        "Total number of expired token revocations cleaned up"
    )
    .unwrap();
    static ref REVOCATION_CLEANUP_DURATION: Histogram = register_histogram!(
        "attune_revocation_cleanup_duration_seconds",
        "Duration of token revocation cleanup job"
    )
    .unwrap();
}

// In cleanup function (if `?` returns early, the timer still records on drop)
let timer = REVOCATION_CLEANUP_DURATION.start_timer();
let count = cleanup_expired_revocations(&db).await?;
REVOCATION_CLEANUP_COUNT.inc_by(count);
timer.observe_duration();
```
### Alternative: Database Trigger
For automatic cleanup without application code:
```sql
-- Create function to delete old revocations
CREATE OR REPLACE FUNCTION cleanup_expired_token_revocations()
RETURNS trigger AS $$
BEGIN
DELETE FROM token_revocation WHERE token_exp < NOW() - INTERVAL '1 hour';
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- Trigger on insert (cleanup when new revocations are added)
CREATE TRIGGER trigger_cleanup_expired_revocations
AFTER INSERT ON token_revocation
EXECUTE FUNCTION cleanup_expired_token_revocations();
```
**Note**: Application-level cleanup is preferred for better observability and control.
## Future Enhancements
1. **Rate Limiting**: Per-token rate limits to prevent abuse
2. **Audit Logging**: Comprehensive audit trail of token usage and refresh events
3. **OAuth 2.0**: Support OAuth 2.0 client credentials flow
4. **mTLS**: Mutual TLS authentication for high-security deployments
5. **Token Introspection**: RFC 7662-compliant token introspection endpoint
6. **Scope Hierarchies**: More granular permission scopes
7. **IP Whitelisting**: Restrict token usage to specific IP ranges
8. **Configurable Refresh Timing**: Allow custom refresh thresholds per token type
9. **Token Lineage Tracking**: Track token refresh chains for security audits
10. **Refresh Failure Alerts**: Notify operators when automatic refresh fails

# Token Refresh System - Quick Reference
**Last Updated:** 2025-01-27
**Component:** Web UI Authentication
## Overview
The web UI implements automatic and proactive JWT token refresh to provide seamless authentication for active users.
## Architecture
```
┌─────────────────────────────────────────────────────┐
│ User Activity → API Request │
│ ↓ │
│ Axios Interceptor (adds JWT) │
│ ↓ │
│ Server Response │
│ ├─ 200 OK → Continue │
│ ├─ 401 Unauthorized → Auto-refresh & retry │
│ └─ 403 Forbidden → Show permission error │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Background: Token Monitor (every 60s) │
│ ↓ │
│ Token expires in < 5 min? │
│ ├─ Yes → Proactive refresh │
│ └─ No → Continue monitoring │
└─────────────────────────────────────────────────────┘
```
## Key Components
### 1. API Wrapper (`web/src/lib/api-wrapper.ts`)
- **Purpose**: Configure axios with token refresh interceptors
- **Features**:
- Global axios defaults configuration
- Request interceptor (adds token)
- Response interceptor (handles 401/403)
- Proactive refresh monitor
### 2. ErrorDisplay Component (`web/src/components/common/ErrorDisplay.tsx`)
- **Purpose**: User-friendly error messages
- **Distinguishes**:
- 401: "Session expired" (handled automatically)
- 403: "Access denied - insufficient permissions"
- Other: Generic error with details
### 3. Auth Context (`web/src/contexts/AuthContext.tsx`)
- **Purpose**: Manage authentication state
- **Lifecycle**:
- `user` set → Start token refresh monitor
- `user` cleared → Stop token refresh monitor
## Token Lifecycle
### Access Token
- **Duration**: 1 hour (configured on backend)
- **Storage**: `localStorage.getItem('access_token')`
- **Refresh Trigger**: Automatic on 401 response
- **Proactive Refresh**: 5 minutes before expiration
### Refresh Token
- **Duration**: 7 days (configured on backend)
- **Storage**: `localStorage.getItem('refresh_token')`
- **Used**: To obtain new access token
- **Rotation**: Optional (backend can return new refresh token)
## Configuration
### Proactive Refresh Settings
```typescript
// File: web/src/lib/api-wrapper.ts
// Check every 60 seconds
const MONITOR_INTERVAL = 60000; // ms
// Refresh if expiring within 5 minutes
const REFRESH_THRESHOLD = 300; // seconds
```
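The proactive check comes down to reading the `exp` claim and comparing the remaining lifetime with the threshold. The real logic lives in `web/src/lib/api-wrapper.ts`; the Python below is only a language-neutral sketch of the decision each 60-second monitor tick would make, with hypothetical function names. Reading `exp` without verifying the signature is acceptable here because the browser only uses it for scheduling; the server still validates every token.

```python
import base64
import json

REFRESH_THRESHOLD = 300  # seconds: refresh when fewer than 5 minutes remain


def token_exp(token: str) -> int:
    """Read the exp claim from a JWT payload without verifying the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore base64 padding
    return int(json.loads(base64.urlsafe_b64decode(padded))["exp"])


def should_refresh(token: str, now: int, threshold: int = REFRESH_THRESHOLD) -> bool:
    """True when the token expires within `threshold` seconds of `now`."""
    return token_exp(token) - now < threshold
```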
### API Endpoint
```http
POST /auth/refresh
Content-Type: application/json

{
  "refresh_token": "..."
}
```
Response:
```json
{
  "data": {
    "access_token": "...",
    "refresh_token": "..."  // Optional, for rotation
  }
}
```
## Error Handling
### 401 Unauthorized (Token Expired/Invalid)
Handled automatically by the response interceptor:
1. Interceptor detects a 401
2. Attempts a token refresh with the refresh_token
3. On success: retries the original request
4. On failure: clears tokens and redirects to /login
### 403 Forbidden (Insufficient Permissions)
```typescript
// Manual handling in components:
<ErrorDisplay error={error} />
// Shows: "Access Denied - You do not have permission..."
```
### Network/Server Errors
```typescript
// Generic error display:
<ErrorDisplay
error={error}
showRetry={true}
onRetry={() => refetch()}
/>
```
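The 401/403 split described in this section can be modeled as a small Python sketch of the interceptor's control flow: retry once after a successful refresh, clear the session when refresh fails, and pass 403 through untouched. `send`, `refresh`, and the `store` dict are hypothetical stand-ins for the axios call, the `/auth/refresh` request, and `localStorage`.

```python
from typing import Callable, Dict, Optional


class SessionExpired(Exception):
    """Raised when refresh fails; the caller then redirects to /login."""


def send_with_refresh(
    send: Callable[[str], int],
    refresh: Callable[[str], Optional[Dict[str, str]]],
    store: Dict[str, str],
) -> int:
    status = send(store.get("access_token", ""))
    if status != 401:
        return status  # 200, 403, and other errors go straight to the UI
    refresh_token = store.get("refresh_token")
    new_tokens = refresh(refresh_token) if refresh_token else None
    if not new_tokens:
        store.clear()  # no stale authentication state survives a failed refresh
        raise SessionExpired()
    store["access_token"] = new_tokens["access_token"]
    if "refresh_token" in new_tokens:  # optional rotation
        store["refresh_token"] = new_tokens["refresh_token"]
    return send(store["access_token"])  # retry the original request once
```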
## Usage in Components
### Detecting Error Types
```typescript
// In React components using TanStack Query
const { data, error, isLoading } = useActions();
if (error) {
// ErrorDisplay component handles type detection
return <ErrorDisplay error={error} />;
}
```
### Custom Error Handling
```typescript
// Check for 403 errors
const is403 = error?.response?.status === 403 ||
error?.isAuthorizationError;
if (is403) {
// Show permission-specific UI
}
// Check for 401 errors (rare - usually handled by interceptor)
const is401 = error?.response?.status === 401;
```
## Debugging
### Console Logs
```bash
# Initialization
🔧 Initializing API wrapper
✓ Axios defaults configured with interceptors
✓ API wrapper initialized
# Token Refresh
🔄 Access token expired, attempting refresh...
✓ Token refreshed successfully
# Monitor
🔄 Starting token refresh monitor
✓ Token proactively refreshed
⏹️ Stopping token refresh monitor
# Errors
⚠️ No refresh token available, redirecting to login
Token refresh failed, clearing session and redirecting to login
Access forbidden - insufficient permissions for this resource
```
### Browser DevTools
```bash
# Check tokens
Application → Local Storage → localhost
- access_token: "eyJ..."
- refresh_token: "eyJ..."
# Watch refresh requests
Network → Filter: refresh
- POST /auth/refresh
- Status: 200 OK
- Response: { data: { access_token, refresh_token } }
# Monitor console
Console → Filter: Token|refresh|Unauthorized
```
## Common Scenarios
### Scenario 1: Active User
```
User logged in → Using app normally
Every 60s: Monitor checks token expiration
Token expires in 4 minutes
Proactive refresh triggered
User continues seamlessly (no interruption)
```
### Scenario 2: Idle User Returns
```
User logged in → Leaves tab idle for 70 minutes
Access token expired (after 60 min)
User returns, clicks action
API returns 401
Interceptor attempts refresh
If refresh token valid: Success, retry request
If refresh token expired: Redirect to login
```
### Scenario 3: Permission Denied
```
User logged in → Tries restricted action
API returns 403 Forbidden
ErrorDisplay shows: "Access Denied"
User sees clear message (not "Unauthorized")
```
### Scenario 4: Network Failure During Refresh
```
User action → 401 response → Refresh attempt
Network error / API down
Refresh fails → Tokens cleared
Redirect to login
SessionStorage saves current path
After login → Redirect back to original page
```
## Testing
### Manual Test: Token Expiration
```bash
# 1. Log in to web UI
# 2. Open DevTools → Application → Local Storage
# 3. Copy access_token value
# 4. Decode at jwt.io - note expiration time
# 5. Wait until near expiration
# 6. Perform action (view page, click button)
# 7. Watch Network tab for /auth/refresh call
# 8. Verify action completes successfully
```
### Manual Test: Permission Denied
```bash
# 1. Log in as limited user
# 2. Try to access admin-only resource
# 3. Verify: See "Access Denied" (not "Unauthorized")
# 4. Verify: Amber/yellow UI (not red)
# 5. Verify: Helpful message about permissions
```
### Manual Test: Proactive Refresh
```bash
# 1. Log in
# 2. Open Console
# 3. Look for "🔄 Starting token refresh monitor"
# 4. Wait 60 seconds
# 5. If token expires within 5 min, see:
# "✓ Token proactively refreshed"
# 6. Logout
# 7. See: "⏹️ Stopping token refresh monitor"
```
## Troubleshooting
### Issue: Redirect loop to /login
**Cause**: Both access_token and refresh_token expired
**Solution**: Expected behavior - user must log in again
### Issue: Token not refreshing automatically
**Check**:
1. Axios interceptors configured? → See console for init logs
2. Token exists in localStorage?
3. Refresh token valid?
4. Network connectivity?
5. Backend /auth/refresh endpoint working?
### Issue: Monitor not running
**Check**:
1. User authenticated? → Monitor only runs when `user` is set
2. Check console for "Starting token refresh monitor"
3. Verify AuthContext lifecycle in React DevTools
### Issue: Wrong error message (401 vs 403)
**Check**:
1. Using ErrorDisplay component?
2. Error object has `response.status` property?
3. Interceptor properly marking 403 errors?
## Security Notes
1. **Token Storage**: Currently uses localStorage
- ✅ Works across tabs
- ⚠️ Vulnerable to XSS
- 🔒 Consider httpOnly cookies for production
2. **Token Exposure**: Tokens only in Authorization header
- ✅ Never in URL parameters
- ✅ Not logged to console
3. **Automatic Cleanup**: Failed refresh clears all tokens
- ✅ No stale authentication state
4. **Single Sign-Out**: Clearing tokens stops all access
- ✅ Immediate effect
## API Requirements
The backend must provide:
1. **Login Endpoint**: Returns access_token + refresh_token
2. **Refresh Endpoint**: Accepts refresh_token, returns new access_token
3. **Token Format**: Standard JWT with `exp` claim
4. **Error Codes**:
- 401 for expired/invalid tokens
- 403 for permission denied
## Related Files
- `web/src/lib/api-wrapper.ts` - Core token refresh logic
- `web/src/lib/api-client.ts` - Axios instance configuration
- `web/src/components/common/ErrorDisplay.tsx` - Error UI
- `web/src/contexts/AuthContext.tsx` - Auth state management
- `web/src/pages/auth/LoginPage.tsx` - Login with redirect
## Related Documentation
- Full details: `work-summary/2025-01-token-refresh-improvements.md`
- Authentication: `docs/authentication/authentication.md`
- Token rotation: `docs/authentication/token-rotation.md`

# Token Rotation Guide
**Version:** 1.0
**Last Updated:** 2025-01-27
**Audience:** System Administrators, DevOps Engineers
## Overview
This guide provides procedures for rotating service account tokens in Attune to maintain security and prevent token revocation table bloat. All tokens in Attune have expiration times and require periodic rotation.
## Token Expiration Policy
**All tokens MUST expire.** This is a hard requirement to prevent:
- Indefinite growth of the `token_revocation` table
- Long-lived compromised credentials
- Security debt accumulation
### Token Lifetimes
| Token Type | Lifetime | Rotation Frequency | Auto-Cleanup |
|------------|----------|-------------------|--------------|
| Sensor | 24-72 hours | Every 24-72 hours | Yes (on expiration) |
| Action Execution | 5-60 minutes | N/A (single-use) | Yes (on completion) |
| User CLI | 7-30 days | Every 7-30 days | No (manual revocation) |
| Webhook | 90-365 days | Every 90-365 days | No (manual revocation) |
## Sensor Token Rotation
### Why Rotation is Required
Sensor tokens expire after 24-72 hours to:
- Limit the impact of compromised credentials
- Force regular security reviews
- Prevent revocation table bloat
- Align with security best practices
### Rotation Process
#### Manual Rotation (Current)
**Preparation:**
```bash
# Set admin token
export ADMIN_TOKEN="your_admin_token"
# Note the current sensor name
SENSOR_NAME="sensor:core.timer"
```
**Step 1: Create New Service Account**
```bash
# Create new token
curl -X POST http://localhost:8080/service-accounts \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d "{
\"name\": \"${SENSOR_NAME}\",
\"scope\": \"sensor\",
\"description\": \"Timer sensor (rotated $(date +%Y-%m-%d))\",
\"ttl_hours\": 72,
\"metadata\": {
\"trigger_types\": [\"core.timer\"]
}
}"
# Save the response
# {
# "identity_id": 456,
# "name": "sensor:core.timer",
# "token": "eyJhbGci...", <-- COPY THIS
# "expires_at": "2025-01-30T12:34:56Z"
# }
export NEW_TOKEN="eyJhbGci..."
```
**Step 2: Update Sensor Configuration**
**For systemd deployments:**
```bash
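# Edit the environment file
sudo nano /etc/attune/sensor-timer.env
# Replace the old token line with the new token. Keep the value on its own line
# with no trailing "# ..." comment: env-file parsers such as systemd's
# EnvironmentFile= do not reliably strip inline comments.
ATTUNE_API_TOKEN=eyJhbGci...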
```
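If you would rather not edit the file by hand (for example, when rotating from a script), the same change can be made non-interactively. This is a sketch that assumes the env file uses plain `KEY=value` lines; `update_env_token` is a hypothetical helper, not part of Attune.

```bash
# Hypothetical helper: replace (or add) the ATTUNE_API_TOKEN line in an env file
# without opening an editor. Assumes plain KEY=value lines.
update_env_token() {
  local env_file="$1" new_token="$2" tmp
  tmp=$(mktemp)
  # Keep every other setting and drop any existing token line.
  # grep exits 1 when it selects nothing (or 2 if the file is missing), so ignore that.
  grep -v '^ATTUNE_API_TOKEN=' "$env_file" > "$tmp" 2>/dev/null || true
  # Append the new token on its own line, with no trailing comment.
  printf 'ATTUNE_API_TOKEN=%s\n' "$new_token" >> "$tmp"
  # Overwrite in place so the file keeps its existing owner and permissions.
  if ! cat "$tmp" > "$env_file"; then
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}
```

Call it from a root shell (for example after `sudo -s`), because `sudo` cannot run a shell function directly: `update_env_token /etc/attune/sensor-timer.env "${NEW_TOKEN}"`. Writing through `cat` rather than `mv` preserves the file's mode, which matters for a file that holds a secret.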
**For Docker/Kubernetes:**
```bash
# Update secret
kubectl create secret generic sensor-timer-token \
--from-literal=token="${NEW_TOKEN}" \
--dry-run=client -o yaml | kubectl apply -f -
# Or, for a Docker Swarm service, update the environment variable
# (this redeploys the service's tasks with the new value)
docker service update attune-core-timer-sensor \
--env-add ATTUNE_API_TOKEN="${NEW_TOKEN}"
```
**For environment variables:**
```bash
# Update environment variable
export ATTUNE_API_TOKEN="${NEW_TOKEN}"
```
**Step 3: Restart Sensor**
```bash
# systemd
sudo systemctl restart attune-core-timer-sensor
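# Docker Swarm: the `docker service update` in Step 2 already restarted the tasks.
# Standalone Docker: `docker restart` keeps the container's original environment,
# so recreate the container with the new ATTUNE_API_TOKEN instead.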
# Kubernetes
kubectl rollout restart deployment/sensor-timer
```
**Step 4: Verify New Token is Working**
```bash
# Check sensor logs
sudo journalctl -u attune-core-timer-sensor -f --since "1 minute ago"
# Look for:
# - "API connectivity verified"
# - "Connected to RabbitMQ"
# - "Started consuming messages"
# - No authentication errors
```
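For scripted rotations, the log check can be made non-interactive by scanning recent journal output for authentication failures. This is a sketch: `has_auth_errors` is a hypothetical helper, and its patterns assume the sensor logs messages like the "401 Unauthorized" and "Token expired" errors described under Troubleshooting. Adjust them to what your sensor actually prints.

```bash
# Hypothetical helper: scan log text on stdin for authentication failures.
# Exit status 0 (matching lines are printed) if any are found, 1 if none.
# The patterns are assumptions; adapt them to your sensor's real log output.
has_auth_errors() {
  grep -Ei '401 Unauthorized|token expired|authentication (error|failed)'
}
```

For example: `sudo journalctl -u attune-core-timer-sensor --since "5 minutes ago" --no-pager | has_auth_errors && echo "Authentication errors after rotation"`.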
**Step 5: Revoke Old Token (Optional)**
The old token expires on its own at its original expiration time (at most 72 hours after it was issued). To revoke it immediately:
```bash
# Get old identity_id from previous creation response
OLD_IDENTITY_ID=123
# Revoke old token
curl -X DELETE http://localhost:8080/service-accounts/${OLD_IDENTITY_ID} \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d "{
\"reason\": \"Token rotation\"
}"
```
### Rotation Schedule
**Recommended Schedule:**
- **Production:** Every 48 hours (allows 24-hour margin before expiration)
- **Staging:** Every 72 hours
- **Development:** Every 72 hours
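**Scheduling:**
Set up recurring calendar events as reminders, or run the rotation script from cron:
```bash
# Add to root's crontab: runs at 00:00 on odd days of the month (1, 3, 5, ...).
# The hour field only accepts 0-23, so "*/48" there does NOT mean every 48 hours.
# Intervals are 48 hours (24 at some month ends), which stays within a 72-hour TTL.
# rotate-sensor-token.sh reads ADMIN_TOKEN from the environment; make it available to the job.
0 0 */2 * * /usr/local/bin/rotate-sensor-token.sh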
```
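Standard cron cannot express "every 48 hours" exactly. An alternative is to run the rotation job more often (hourly or daily) and rotate only when the current token's remaining lifetime drops below a threshold. A minimal sketch of that decision, assuming the token's `exp` claim has already been read as shown under Monitoring Token Expiration; `needs_rotation` is a hypothetical helper:

```bash
# Hypothetical helper: succeed (exit 0) when a token should be rotated now.
# Args: exp (Unix seconds), now (Unix seconds), threshold in hours.
needs_rotation() {
  local exp="$1" now="$2" threshold_hours="$3"
  (( exp - now < threshold_hours * 3600 ))
}
```

With a 72-hour TTL and a 24-hour threshold, an hourly job such as `needs_rotation "$EXP" "$(date +%s)" 24 && /usr/local/bin/rotate-sensor-token.sh` rotates roughly every 48 hours, matching the production schedule above, and a missed run is simply retried on the next check.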
### Monitoring Token Expiration
**Check Token Expiration:**
```bash
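# Decode the JWT payload and print the exp claim. JWT segments are base64url
# without '=' padding, so map -_ back to +/ and re-pad before base64 -d.
echo "${ATTUNE_API_TOKEN}" | cut -d'.' -f2 | tr '_-' '/+' \
| awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d | jq -r '.exp'
# Output: 1738238400 (Unix timestamp)
# Convert to human-readable UTC (GNU date)
date -u -d @1738238400 '+%Y-%m-%d %H:%M:%S'
# Output: 2025-01-30 12:00:00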
```
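JWT segments are base64url-encoded (`-` and `_` instead of `+` and `/`) with the trailing `=` padding removed, so piping a segment straight into `base64 -d` can fail or, with errors suppressed, yield truncated JSON. A small reusable decoder, sketched here as a hypothetical `jwt_payload` function that uses only coreutils, avoids retyping the conversion:

```bash
# Hypothetical helper: print the decoded JSON payload (second segment) of a JWT.
# Does NOT verify the signature; it only decodes.
jwt_payload() {
  local seg
  # Take the payload segment and convert the base64url alphabet back to base64.
  seg=$(printf '%s' "$1" | cut -d'.' -f2 | tr '_-' '/+')
  # Restore the '=' padding so the length is a multiple of 4.
  while (( ${#seg} % 4 )); do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}
```

For example: `jwt_payload "${ATTUNE_API_TOKEN}" | jq -r '.exp'`.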
**Set Up Alerts:**
```bash
#!/bin/bash
# check-token-expiration.sh
# Run this hourly via cron
TOKEN="${ATTUNE_API_TOKEN}"
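# JWT payloads are base64url without padding: map -_ to +/ and re-pad before decoding
EXP=$(echo "${TOKEN}" | cut -d'.' -f2 | tr '_-' '/+' \
| awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d 2>/dev/null | jq -r '.exp')
if ! [[ "$EXP" =~ ^[0-9]+$ ]]; then
echo "ERROR: could not read the exp claim from ATTUNE_API_TOKEN"
exit 1
fi
NOW=$(date +%s)
HOURS_REMAINING=$(( (EXP - NOW) / 3600 ))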
if [ "$HOURS_REMAINING" -lt 6 ]; then
echo "WARNING: Sensor token expires in ${HOURS_REMAINING} hours!"
# Send alert to monitoring system
curl -X POST https://monitoring.example.com/alerts \
-d "message=Sensor token expires in ${HOURS_REMAINING} hours"
fi
```
**Add to crontab:**
```bash
0 * * * * /usr/local/bin/check-token-expiration.sh
```
## Action Execution Token Lifecycle
Action execution tokens are automatically managed:
**Creation:** The executor service creates a token when it schedules an execution:
```rust
// Illustrative call: the token's TTL is set to the action's timeout
let token = create_execution_token(
    execution_id,
    action_ref,
    action_timeout_minutes, // ttl_minutes
)?;
```
**Usage:** Worker injects token into action environment
```bash
ATTUNE_API_TOKEN=eyJhbGci...
ATTUNE_EXECUTION_ID=123
```
**Expiration:** Token expires when execution times out or completes
**Cleanup:** Revocation record (if created) is automatically deleted after expiration
**No manual intervention required.**
## User CLI Token Rotation
### When to Rotate
- Every 7-30 days (based on TTL)
- When user credentials change
- When token is compromised
- When user leaves organization
### Rotation Process
**Step 1: Login Again**
```bash
# User logs in to get new token
attune auth login
# Enter credentials
# New token is stored in ~/.attune/token
```
**Step 2: Verify New Token**
```bash
# Test with simple command
attune pack list
# Should succeed without errors
```
**Old token is automatically revoked during login (if configured).**
## Webhook Token Rotation
### When to Rotate
- Every 90-365 days (based on TTL)
- When webhook is compromised
- When the integrated external system changes
- During security audits
### Rotation Process
**Step 1: Create New Webhook Token**
```bash
curl -X POST http://localhost:8080/service-accounts \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "webhook:deployment-notifications",
"scope": "webhook",
"description": "GitHub deployment webhook",
"ttl_days": 90,
"metadata": {
"allowed_paths": ["/webhooks/deploy"]
}
}'
# Save the new token
export NEW_WEBHOOK_TOKEN="eyJhbGci..."
```
**Step 2: Update External System**
Update the webhook configuration in the external system (GitHub, GitLab, etc.) with the new token.
**Step 3: Test Webhook**
```bash
# Send test webhook
curl -X POST https://attune.example.com/webhooks/deploy \
-H "Authorization: Bearer ${NEW_WEBHOOK_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"status": "deployed"}'
# Expect a 2xx response; a 401 means the new token was rejected
```
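To make this check scriptable, capture only the HTTP status code with curl's `-w '%{http_code}'` and test it before revoking the old token. A sketch with hypothetical helper names; the endpoint and payload are the ones from the example above:

```bash
# Hypothetical helper: POST a test payload and print only the HTTP status code.
# curl prints 000 if the connection itself fails.
webhook_status() {
  local url="$1" token="$2"
  curl -s -o /dev/null -w '%{http_code}' -X POST "$url" \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d '{"status": "deployed"}'
}

# Hypothetical helper: succeed for any 2xx status code.
is_2xx() {
  [[ "$1" =~ ^2[0-9]{2}$ ]]
}
```

For example: `status=$(webhook_status https://attune.example.com/webhooks/deploy "${NEW_WEBHOOK_TOKEN}")`, then `is_2xx "$status" || echo "Webhook test failed (HTTP ${status})"`.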
**Step 4: Revoke Old Token**
After confirming the new token works:
```bash
# OLD_IDENTITY_ID is the identity_id returned when the old webhook token was created
curl -X DELETE http://localhost:8080/service-accounts/${OLD_IDENTITY_ID} \
-H "Authorization: Bearer ${ADMIN_TOKEN}"
```
## Automation Scripts
### Sensor Token Rotation Script
```bash
#!/bin/bash
# rotate-sensor-token.sh
# Automated sensor token rotation
set -e
SENSOR_NAME="${1:-sensor:core.timer}"
ADMIN_TOKEN="${ADMIN_TOKEN}"
API_URL="${ATTUNE_API_URL:-http://localhost:8080}"
if [ -z "$ADMIN_TOKEN" ]; then
echo "Error: ADMIN_TOKEN environment variable not set"
exit 1
fi
echo "Rotating token for ${SENSOR_NAME}..."
# Create new token (trigger_types below is specific to the timer sensor; adjust it for other sensors)
RESPONSE=$(curl -s -X POST "${API_URL}/service-accounts" \
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
-H "Content-Type: application/json" \
-d "{
\"name\": \"${SENSOR_NAME}\",
\"scope\": \"sensor\",
\"description\": \"Auto-rotated $(date +%Y-%m-%d)\",
\"ttl_hours\": 72,
\"metadata\": {
\"trigger_types\": [\"core.timer\"]
}
}")
NEW_TOKEN=$(echo "$RESPONSE" | jq -r '.token')
EXPIRES_AT=$(echo "$RESPONSE" | jq -r '.expires_at')
if [ -z "$NEW_TOKEN" ] || [ "$NEW_TOKEN" = "null" ]; then
echo "Error: Failed to create new token"
echo "$RESPONSE"
exit 1
fi
echo "New token created, expires at: ${EXPIRES_AT}"
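# Replace the token line in place (assumes the file already has an ATTUNE_API_TOKEN= line).
# This keeps other settings and avoids printing the token (cron may mail script output).
sudo sed -i "s|^ATTUNE_API_TOKEN=.*|ATTUNE_API_TOKEN=${NEW_TOKEN}|" /etc/attune/sensor-timer.env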
# Restart service
echo "Restarting sensor service..."
sudo systemctl restart attune-core-timer-sensor
# Wait for service to start
sleep 5
# Check status
if sudo systemctl is-active --quiet attune-core-timer-sensor; then
echo "✓ Sensor token rotated successfully"
echo " New token expires: ${EXPIRES_AT}"
else
echo "✗ Sensor failed to start, check logs"
sudo journalctl -u attune-core-timer-sensor -n 50
exit 1
fi
```
### Token Expiration Check Script
```bash
#!/bin/bash
# check-all-tokens.sh
# Check expiration for all active service accounts
API_URL="${ATTUNE_API_URL:-http://localhost:8080}"
ADMIN_TOKEN="${ADMIN_TOKEN}"
WARN_HOURS=6
# Fetch all service accounts
ACCOUNTS=$(curl -s -X GET "${API_URL}/service-accounts" \
-H "Authorization: Bearer ${ADMIN_TOKEN}")
echo "$ACCOUNTS" | jq -r '.data[] | "\(.name)\t\(.expires_at)"' | \
while IFS=$'\t' read -r name expires_at; do
exp_timestamp=$(date -d "$expires_at" +%s)
now=$(date +%s)
hours_remaining=$(( ($exp_timestamp - $now) / 3600 ))
if [ "$hours_remaining" -lt "$WARN_HOURS" ]; then
echo "⚠️ WARNING: ${name} expires in ${hours_remaining} hours (${expires_at})"
else
echo "${name} expires in ${hours_remaining} hours (${expires_at})"
fi
done
```
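The loop above reports an account that has already expired as expiring in zero or a negative number of hours, because `$(( ... / 3600 ))` truncates toward zero. A sketch of a classifier that works in seconds and calls out expired accounts explicitly; `token_status` is a hypothetical helper:

```bash
# Hypothetical helper: classify a token's remaining lifetime.
# Args: expiry (Unix seconds), now (Unix seconds), warning threshold in hours.
token_status() {
  local exp_ts="$1" now="$2" warn_hours="$3"
  local remaining=$(( exp_ts - now ))
  if (( remaining <= 0 )); then
    echo "EXPIRED"
  elif (( remaining < warn_hours * 3600 )); then
    echo "WARNING"
  else
    echo "OK"
  fi
}
```

Inside the loop, `status=$(token_status "$exp_timestamp" "$now" "$WARN_HOURS")` can then be printed alongside the account name.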
## Troubleshooting
### "Token expired" Error
**Symptom:** Sensor logs show "401 Unauthorized" or "Token expired"
**Solution:**
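1. Verify the system clock is correct (`date`); clock skew can make a valid token look expired
2. Check token expiration (JWT segments are unpadded base64url): `echo "$TOKEN" | cut -d'.' -f2 | tr '_-' '/+' | awk '{ while (length($0) % 4) $0 = $0 "="; print }' | base64 -d | jq .exp`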
3. Create new token and restart sensor (see rotation process above)
### Sensor Won't Start After Rotation
**Symptom:** Sensor fails to start after updating token
**Troubleshooting:**
1. Verify token is correctly formatted (JWT with 3 parts: header.payload.signature)
2. Check token hasn't already expired
3. Verify token has correct scope and metadata
4. Check sensor logs for specific error message
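A quick shape check catches the most common copy-and-paste mistakes (surrounding quotes, trailing spaces, a truncated token) before you dig into logs. The sketch below is a hypothetical `looks_like_jwt` helper; it only checks for three non-empty base64url segments and does not validate the signature, expiry, scope, or metadata.

```bash
# Hypothetical helper: succeed if the value looks like a compact JWT,
# i.e. header.payload.signature made of base64url characters only.
# Quotes, spaces, or a missing segment all make it fail.
looks_like_jwt() {
  [[ "$1" =~ ^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$ ]]
}
```

For example: `token=$(sudo sed -n 's/^ATTUNE_API_TOKEN=//p' /etc/attune/sensor-timer.env)`, then `looks_like_jwt "$token" || echo "ATTUNE_API_TOKEN is malformed"`.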
### Token Revocation Table Growing Too Large
**Symptom:** `token_revocation` table has millions of rows
**Solution:**
1. Ensure cleanup job is running (hourly)
2. Manually run cleanup: `DELETE FROM token_revocation WHERE token_exp < NOW()`
3. Verify all tokens have expiration set
4. Check for tokens with very long TTLs
## Best Practices
1. **Set Calendar Reminders:** Don't rely on memory; set recurring calendar events
2. **Automate Where Possible:** Use cron jobs and scripts for rotation
3. **Monitor Expiration:** Set up alerts 6-12 hours before expiration
4. **Test Rotation:** Practice rotation in staging before production
5. **Document Tokens:** Keep inventory of active service accounts and their purposes
6. **Minimal TTL:** Use shortest acceptable TTL for each token type
7. **Rotate on Compromise:** Immediately rotate if token is compromised
8. **Clean Up:** Revoke old tokens after rotation (or let them expire)
## Security Considerations
- **Never commit tokens to version control**
- **Use encrypted storage for tokens** (e.g., Vault, AWS Secrets Manager)
- **Rotate immediately if compromised**
- **Audit token usage regularly**
- **Minimize token scope and permissions**
- **Use separate tokens for each sensor/webhook**
- **Monitor for unauthorized token usage**
## Future Enhancements
1. **Automatic Rotation:** Hot-reload tokens without sensor restart
2. **Token Renewal API:** Extend token TTL without creating new token
3. **Token Rotation Hooks:** Webhook notifications before expiration
4. **Managed Tokens:** Orchestrator handles rotation automatically
5. **Token Rotation Dashboard:** Web UI for monitoring and rotating tokens
## See Also
- [Service Accounts Documentation](./service-accounts.md)
- [Sensor Interface Specification](./sensor-interface.md)
- [Sensor Authentication Overview](./sensor-authentication-overview.md)
- [Timer Sensor README](../crates/core-timer-sensor/README.md)