re-uploading work
206
docs/authentication/auth-quick-reference.md
Normal file
@@ -0,0 +1,206 @@
# Authentication Quick Reference

## Environment Variables

```bash
JWT_SECRET=your-secret-key-here        # Required in production!
JWT_ACCESS_EXPIRATION=3600             # Optional (1 hour default)
JWT_REFRESH_EXPIRATION=604800          # Optional (7 days default)
```

## Endpoints

### Register New User
```http
POST /auth/register
Content-Type: application/json

{
  "login": "username",
  "password": "securepass123",
  "display_name": "Full Name"  // optional
}
```

### Login
```http
POST /auth/login
Content-Type: application/json

{
  "login": "username",
  "password": "securepass123"
}
```

### Refresh Token
```http
POST /auth/refresh
Content-Type: application/json

{
  "refresh_token": "eyJhbGc..."
}
```

### Get Current User (Protected)
```http
GET /auth/me
Authorization: Bearer <access_token>
```

### Change Password (Protected)
```http
POST /auth/change-password
Authorization: Bearer <access_token>
Content-Type: application/json

{
  "current_password": "oldpass123",
  "new_password": "newpass456"
}
```

## Response Format

### Success (Register/Login/Refresh)
```json
{
  "data": {
    "access_token": "eyJhbGciOiJIUzI1NiIs...",
    "refresh_token": "eyJhbGciOiJIUzI1NiIs...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```

### Success (Get Current User)
```json
{
  "data": {
    "id": 1,
    "login": "username",
    "display_name": "Full Name"
  }
}
```

### Error
```json
{
  "error": "Invalid login or password",
  "code": "UNAUTHORIZED"
}
```

## HTTP Status Codes

- `200 OK` - Success
- `400 Bad Request` - Invalid request format
- `401 Unauthorized` - Missing/invalid/expired token or bad credentials
- `403 Forbidden` - Insufficient permissions
- `409 Conflict` - Username already exists
- `422 Unprocessable Entity` - Validation error

## Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| Missing authentication token | No Authorization header | Add `Authorization: Bearer <token>` |
| Invalid authentication token | Malformed or wrong secret | Verify token format and JWT_SECRET |
| Authentication token expired | Access token expired | Use refresh token to get new one |
| Invalid login or password | Wrong credentials | Check username and password |
| Username already exists | Duplicate registration | Use different username |
| Validation failed | Password too short, etc. | Check validation requirements |

## Validation Rules

- **Login:** 3-255 characters
- **Password:** 8-128 characters
- **Display Name:** 0-255 characters (optional)
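
The rules above can be sketched as a small validator. This is only an illustration, not the code in `crates/api`: the function name `validate_registration`, the error messages, and the choice to count Unicode characters rather than bytes are all assumptions.

```rust
/// Hypothetical helper: checks a registration payload against the documented limits.
/// Assumption: lengths are counted in Unicode scalar values, not bytes.
fn validate_registration(
    login: &str,
    password: &str,
    display_name: Option<&str>,
) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();

    let login_len = login.chars().count();
    if !(3..=255).contains(&login_len) {
        errors.push("login must be 3-255 characters".to_string());
    }

    let password_len = password.chars().count();
    if !(8..=128).contains(&password_len) {
        errors.push("password must be 8-128 characters".to_string());
    }

    // The display name is optional; when present it may be empty
    // but must not exceed 255 characters.
    if let Some(name) = display_name {
        if name.chars().count() > 255 {
            errors.push("display_name must be at most 255 characters".to_string());
        }
    }

    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Collecting every failure, rather than stopping at the first, lets a client fix all fields in one round trip.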

## cURL Examples

```bash
# Register
curl -X POST http://localhost:8080/auth/register \
  -H "Content-Type: application/json" \
  -d '{"login":"alice","password":"secure123","display_name":"Alice"}'

# Login
curl -X POST http://localhost:8080/auth/login \
  -H "Content-Type: application/json" \
  -d '{"login":"alice","password":"secure123"}'

# Get Current User (replace TOKEN)
curl http://localhost:8080/auth/me \
  -H "Authorization: Bearer TOKEN"

# Change Password
curl -X POST http://localhost:8080/auth/change-password \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"current_password":"secure123","new_password":"newsecure456"}'

# Refresh Token
curl -X POST http://localhost:8080/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{"refresh_token":"REFRESH_TOKEN"}'
```

## Using in Route Handlers

```rust
use crate::auth::middleware::RequireAuth;

async fn protected_handler(
    RequireAuth(user): RequireAuth,
) -> Result<Json<ApiResponse<Data>>, ApiError> {
    let identity_id = user.identity_id()?;
    let login = user.login();

    // Your handler logic
    Ok(Json(ApiResponse::new(data)))
}
```

## Security Checklist

- [ ] Use HTTPS in production
- [ ] Set strong JWT_SECRET (256+ bits)
- [ ] Store tokens securely on client
- [ ] Implement rate limiting
- [ ] Never log tokens
- [ ] Rotate secrets periodically
- [ ] Clear tokens on logout

## Token Lifecycle

1. **Register/Login** → Receive access + refresh tokens
2. **API Call** → Use access token in Authorization header
3. **Token Expires** → Use refresh token to get new access token
4. **Refresh Expires** → User must login again
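
From a client's point of view, the four steps above reduce to one decision made before each request. The sketch below is hypothetical: `NextStep` and `next_step` are illustrative names, and how a client learns the refresh token's expiry is not specified in this document, so it is simply passed in as a timestamp.

```rust
/// What a client should do next, following the lifecycle steps above.
#[derive(Debug, PartialEq)]
enum NextStep {
    UseAccessToken,
    RefreshAccessToken,
    LoginAgain,
}

/// Hypothetical client-side helper. All arguments are Unix timestamps in seconds.
/// Assumption: the client tracks both expiry times itself, for example as
/// `received_at + expires_in` for the access token.
fn next_step(now: u64, access_expires_at: u64, refresh_expires_at: u64) -> NextStep {
    if now < access_expires_at {
        // Step 2: the access token is still valid.
        NextStep::UseAccessToken
    } else if now < refresh_expires_at {
        // Step 3: access token expired, refresh token still valid.
        NextStep::RefreshAccessToken
    } else {
        // Step 4: both tokens expired.
        NextStep::LoginAgain
    }
}
```

A real client would usually refresh slightly before the access token expires to allow for clock skew; that margin is left out here for brevity.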

## Troubleshooting

**Server won't start?**
- Check DATABASE_URL is set
- Verify database is running
- Run migrations: `sqlx migrate run`

**Auth fails with valid credentials?**
- Check password hash in database
- Verify JWT_SECRET matches
- Check token expiration

**Debug logging:**
```bash
RUST_LOG=attune_api=debug cargo run --bin attune-api
```

## Documentation

- Full docs: `docs/authentication.md`
- Testing guide: `docs/testing-authentication.md`
- Implementation: `crates/api/src/routes/auth.rs`
381
docs/authentication/authentication.md
Normal file
@@ -0,0 +1,381 @@
# Authentication & Authorization

## Overview

Attune uses JWT (JSON Web Token) based authentication for securing API endpoints. The authentication system supports user registration, login, token refresh, and password management.

## Architecture

### Components

1. **JWT Tokens**
   - **Access Tokens**: Short-lived tokens (default: 1 hour) used for API authentication
   - **Refresh Tokens**: Long-lived tokens (default: 7 days) used to obtain new access tokens

2. **Password Security**
   - Passwords are hashed using **Argon2id** (industry-standard, memory-hard algorithm)
   - Password hashes are stored in the `password_hash` column of the `identity` table
   - Minimum password length: 8 characters

3. **Middleware**
   - `require_auth`: Middleware function that validates JWT tokens on protected routes
   - `RequireAuth`: Extractor for accessing authenticated user information in handlers

## Configuration

Authentication is configured via environment variables:

```bash
# JWT Secret Key (REQUIRED in production)
JWT_SECRET=your-secret-key-here

# Token Expiration (in seconds)
JWT_ACCESS_EXPIRATION=3600      # 1 hour (default)
JWT_REFRESH_EXPIRATION=604800   # 7 days (default)
```

**Security Warning**: Always set a strong, random `JWT_SECRET` in production. The default value is insecure and should only be used for development.
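
As a rough illustration of the optional expiration settings, the sketch below reads both variables and falls back to the documented defaults. The function names are hypothetical, and silently falling back on an unparsable value is an assumption; the actual server may reject invalid values instead.

```rust
use std::env;

const DEFAULT_ACCESS_EXPIRATION_SECS: u64 = 3600; // 1 hour
const DEFAULT_REFRESH_EXPIRATION_SECS: u64 = 604_800; // 7 days

/// Parses an expiration value in seconds, falling back to `default`
/// when the value is missing or not a valid non-negative integer.
/// (Assumption: invalid values are ignored rather than treated as errors.)
fn parse_expiration(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default)
}

/// Reads `(access, refresh)` expiration settings from the environment.
fn expirations_from_env() -> (u64, u64) {
    let access = env::var("JWT_ACCESS_EXPIRATION").ok();
    let refresh = env::var("JWT_REFRESH_EXPIRATION").ok();
    (
        parse_expiration(access.as_deref(), DEFAULT_ACCESS_EXPIRATION_SECS),
        parse_expiration(refresh.as_deref(), DEFAULT_REFRESH_EXPIRATION_SECS),
    )
}
```

Keeping the parsing in a pure function makes the fallback behaviour testable without touching the process environment.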

## API Endpoints

### Public Endpoints (No Authentication Required)

#### Register a New User

```http
POST /auth/register
Content-Type: application/json

{
  "login": "username",
  "password": "securepassword123",
  "display_name": "John Doe"  // optional
}
```

**Response:**
```json
{
  "data": {
    "access_token": "eyJhbGc...",
    "refresh_token": "eyJhbGc...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```

#### Login

```http
POST /auth/login
Content-Type: application/json

{
  "login": "username",
  "password": "securepassword123"
}
```

**Response:**
```json
{
  "data": {
    "access_token": "eyJhbGc...",
    "refresh_token": "eyJhbGc...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```

#### Refresh Access Token

```http
POST /auth/refresh
Content-Type: application/json

{
  "refresh_token": "eyJhbGc..."
}
```

**Response:**
```json
{
  "data": {
    "access_token": "eyJhbGc...",
    "refresh_token": "eyJhbGc...",
    "token_type": "Bearer",
    "expires_in": 3600
  }
}
```

### Protected Endpoints (Authentication Required)

All protected endpoints require an `Authorization` header with a valid access token:

```http
Authorization: Bearer <access_token>
```
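
To make the header format concrete, here is a minimal sketch of extracting the token from the header value. The helper name `bearer_token` is hypothetical and this is not the project's `require_auth` middleware. Matching the scheme case-insensitively is an assumption, since the document does not say how strict the server is.

```rust
/// Hypothetical helper: extracts the token from an `Authorization` header value.
/// Returns `None` when the scheme is not `Bearer` or the token is empty.
/// Assumption: the scheme name is compared case-insensitively.
fn bearer_token(header_value: &str) -> Option<&str> {
    // Split "Bearer <token>" at the first space.
    let (scheme, token) = header_value.trim().split_once(' ')?;
    let token = token.trim();
    if scheme.eq_ignore_ascii_case("Bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}
```

This only checks the shape of the header. Verifying the JWT signature and expiry is a separate step that needs the configured `JWT_SECRET`.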

#### Get Current User

```http
GET /auth/me
Authorization: Bearer eyJhbGc...
```

**Response:**
```json
{
  "data": {
    "id": 1,
    "login": "username",
    "display_name": "John Doe"
  }
}
```

#### Change Password

```http
POST /auth/change-password
Authorization: Bearer eyJhbGc...
Content-Type: application/json

{
  "current_password": "oldpassword123",
  "new_password": "newpassword456"
}
```

**Response:**
```json
{
  "data": {
    "success": true,
    "message": "Password changed successfully"
  }
}
```

## Error Responses

Authentication errors return appropriate HTTP status codes:

- **400 Bad Request**: Invalid request format or validation errors
- **401 Unauthorized**: Missing, invalid, or expired token; invalid credentials
- **403 Forbidden**: Insufficient permissions (future RBAC implementation)
- **409 Conflict**: Username already exists during registration

Example error response:
```json
{
  "error": "Invalid authentication token",
  "code": "UNAUTHORIZED"
}
```

## Usage in Route Handlers

### Protecting Routes

Use the `RequireAuth` extractor in handlers for routes that require authentication:

```rust
use crate::auth::middleware::RequireAuth;

async fn protected_handler(
    RequireAuth(user): RequireAuth,
) -> Result<Json<ApiResponse<MyData>>, ApiError> {
    let identity_id = user.identity_id()?;
    let login = user.login();

    // Your handler logic here
    Ok(Json(ApiResponse::new(data)))
}
```

### Accessing User Information

The `RequireAuth` extractor provides access to the authenticated user's claims:

```rust
pub struct AuthenticatedUser {
    pub claims: Claims,
}

impl AuthenticatedUser {
    pub fn identity_id(&self) -> Result<i64, ParseIntError>
    pub fn login(&self) -> &str
}
```

## Database Schema

### Identity Table

The `identity` table stores user authentication information:

```sql
CREATE TABLE attune.identity (
    id BIGSERIAL PRIMARY KEY,
    login TEXT NOT NULL UNIQUE,
    display_name TEXT,
    attributes JSONB NOT NULL DEFAULT '{}'::jsonb,
    password_hash TEXT,  -- Added in migration 20240102000001
    created TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

**Note**: The `password_hash` column is optional to support:
- External authentication providers (OAuth, SAML, etc.)
- Service accounts that don't use password authentication
- API key-based authentication (future implementation)

## Security Best Practices

1. **JWT Secret**
   - Use a strong, random secret (minimum 256 bits)
   - Never commit secrets to version control
   - Rotate secrets periodically in production

2. **Token Storage (Client-Side)**
   - Store tokens securely (e.g., httpOnly cookies or secure storage)
   - Never expose tokens in URLs or localStorage (if using web clients)
   - Clear tokens on logout

3. **Password Requirements**
   - Minimum 8 characters (enforced by validation)
   - Consider implementing additional requirements (uppercase, numbers, symbols)
   - Implement rate limiting on login attempts (future enhancement)

4. **HTTPS**
   - Always use HTTPS in production to protect tokens in transit
   - Configure proper TLS/SSL certificates

5. **Token Expiration**
   - Keep access tokens short-lived (1 hour recommended)
   - Use refresh tokens for long-lived sessions
   - Implement token revocation for logout (future enhancement)

## Future Enhancements

### Planned Features

1. **Role-Based Access Control (RBAC)**
   - Permission set assignments
   - Fine-grained authorization middleware
   - Resource-level permissions

2. **Multi-Factor Authentication (MFA)**
   - TOTP support
   - SMS/Email verification codes

3. **OAuth/OIDC Integration**
   - Support for external identity providers
   - Single Sign-On (SSO)

4. **Token Revocation**
   - Blacklist/whitelist mechanisms
   - Force logout functionality

5. **Account Security**
   - Password reset via email
   - Account lockout after failed attempts
   - Security audit logs

6. **API Keys**
   - Service-to-service authentication
   - Scoped API keys for automation

## Testing

### Manual Testing with cURL

```bash
# Register a new user
curl -X POST http://localhost:8080/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "login": "testuser",
    "password": "testpass123",
    "display_name": "Test User"
  }'

# Login
curl -X POST http://localhost:8080/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "login": "testuser",
    "password": "testpass123"
  }'

# Get current user (replace TOKEN with actual access token)
curl http://localhost:8080/auth/me \
  -H "Authorization: Bearer TOKEN"

# Change password
curl -X POST http://localhost:8080/auth/change-password \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "current_password": "testpass123",
    "new_password": "newpass456"
  }'

# Refresh token
curl -X POST http://localhost:8080/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{
    "refresh_token": "REFRESH_TOKEN"
  }'
```

### Unit Tests

Password hashing and JWT utilities include comprehensive unit tests:

```bash
# Run auth-related tests
cargo test --package attune-api password
cargo test --package attune-api jwt
cargo test --package attune-api middleware
```

## Troubleshooting

### Common Issues

1. **"Missing authentication token"**
   - Ensure you're including the `Authorization` header
   - Verify the header format: `Bearer <token>`

2. **"Authentication token expired"**
   - Use the refresh token endpoint to get a new access token
   - Check token expiration configuration

3. **"Invalid login or password"**
   - Verify credentials are correct
   - Check if the identity has a password set (some accounts may use external auth)

4. **"JWT_SECRET not set" warning**
   - Set the `JWT_SECRET` environment variable before starting the server
   - Use a strong, random value in production

### Debug Logging

Enable debug logging to troubleshoot authentication issues:

```bash
RUST_LOG=attune_api=debug cargo run --bin attune-api
```

## References

- [RFC 7519: JSON Web Token (JWT)](https://datatracker.ietf.org/doc/html/rfc7519)
- [Argon2 Password Hashing](https://en.wikipedia.org/wiki/Argon2)
- [OWASP Authentication Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html)
367
docs/authentication/secrets-management.md
Normal file
@@ -0,0 +1,367 @@
# Secrets Management in Attune Worker Service

## Overview

The Attune Worker Service includes a robust secrets management system that securely stores, retrieves, and injects secrets into action execution environments. Secrets are encrypted at rest in the database and decrypted on-demand during execution.

## Architecture

### Components

1. **SecretManager** (`crates/worker/src/secrets.rs`)
   - Core component responsible for secret operations
   - Handles fetching, decryption, and environment variable preparation
   - Integrated into `ActionExecutor` for seamless secret injection

2. **Database Storage** (`attune.key` table)
   - Stores secrets with ownership scoping (system, pack, action, sensor, identity)
   - Supports both encrypted and plaintext values
   - Tracks encryption key hash for validation

3. **Encryption System**
   - Uses **AES-256-GCM** for authenticated encryption
   - Derives encryption key from configured password using SHA-256
   - Generates random nonces for each encryption operation

## Secret Ownership Hierarchy

Secrets are organized in a hierarchical ownership model with increasing specificity:

### 1. System-Level Secrets
- **Owner Type**: `system`
- **Scope**: Available to all actions across all packs
- **Use Case**: Global configuration (API endpoints, common credentials)

### 2. Pack-Level Secrets
- **Owner Type**: `pack`
- **Scope**: Available to all actions within a specific pack
- **Use Case**: Pack-specific credentials, service endpoints

### 3. Action-Level Secrets
- **Owner Type**: `action`
- **Scope**: Available only to a specific action
- **Use Case**: Action-specific credentials, sensitive parameters

### Override Behavior

When an action is executed, secrets are fetched in the following order:
1. System secrets
2. Pack secrets (override system secrets with same name)
3. Action secrets (override pack/system secrets with same name)

This allows for flexible secret management where more specific secrets override less specific ones.
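
The override order can be illustrated with plain hash maps. This is a sketch of the merge rule only, not the `SecretManager` implementation; `merge_secrets` is a hypothetical name, and the maps are assumed to hold already-decrypted values keyed by secret name.

```rust
use std::collections::HashMap;

/// Merges secrets so that more specific scopes win: system < pack < action.
/// (Hypothetical helper; keys are secret names, values are plaintext.)
fn merge_secrets(
    system: HashMap<String, String>,
    pack: HashMap<String, String>,
    action: HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = system;
    // `extend` overwrites existing keys, so later scopes override earlier ones.
    merged.extend(pack);
    merged.extend(action);
    merged
}
```

For example, if both the system and pack scopes define `api_key`, the merged map keeps the pack value, while a system-only `endpoint` entry passes through unchanged.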

## Encryption Format

### Encrypted Value Format
```
nonce:ciphertext
```

Both components are Base64-encoded:
- **Nonce**: 12-byte random value (96 bits) for AES-GCM
- **Ciphertext**: Encrypted payload with authentication tag

Example:
```
Xk3mP9qRsT6uVwYz:SGVsbG8gV29ybGQhIFRoaXMgaXMgYW4gZW5jcnlwdGVkIG1lc3NhZ2U=
```
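
A stored value can be checked for this shape before any decryption is attempted. The sketch below is hypothetical (`split_encrypted_value` is not a documented function) and validates the format only: it neither decodes Base64 nor decrypts. It relies on the fact that 12 bytes always encode to exactly 16 Base64 characters.

```rust
/// Hypothetical helper: splits a stored value into its Base64 nonce and
/// ciphertext parts. It checks only the shape of the value.
fn split_encrypted_value(value: &str) -> Result<(&str, &str), String> {
    // ':' is not part of the Base64 alphabet, so the first ':' is the separator.
    let (nonce_b64, ciphertext_b64) = value
        .split_once(':')
        .ok_or_else(|| "expected \"nonce:ciphertext\"".to_string())?;

    // A 12-byte nonce encodes to exactly 16 Base64 characters (12 / 3 * 4).
    if nonce_b64.len() != 16 {
        return Err(format!(
            "nonce should be 16 Base64 characters, got {}",
            nonce_b64.len()
        ));
    }
    if ciphertext_b64.is_empty() {
        return Err("ciphertext is empty".to_string());
    }
    Ok((nonce_b64, ciphertext_b64))
}
```

A check like this helps separate "invalid format" from "wrong key" when diagnosing decryption failures.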

### Encryption Key Derivation

The encryption key is derived from the configured password using SHA-256:

```
encryption_key = SHA256(password)
```

This produces a 32-byte (256-bit) key suitable for AES-256.

### Key Hash Validation

Each encrypted secret can optionally store the hash of the encryption key used to encrypt it:

```
key_hash = SHA256(encryption_key)
```

This allows validation that the correct key is being used for decryption.

## Configuration

### Security Configuration

Add to your `config.yaml`:

```yaml
security:
  # Encryption key for secrets (REQUIRED for encrypted secrets)
  encryption_key: "your-secret-encryption-password-here"

  # Or use environment variable
  # ATTUNE__SECURITY__ENCRYPTION_KEY=your-secret-encryption-password-here
```

⚠️ **Important Security Notes:**
- The encryption key should be a strong, random password (minimum 32 characters recommended)
- Store the encryption key securely (e.g., using a secrets manager, not in version control)
- If the encryption key is lost, encrypted secrets cannot be recovered
- Changing the encryption key requires re-encrypting all secrets

### Environment Variables

Override configuration via environment variables:

```bash
export ATTUNE__SECURITY__ENCRYPTION_KEY="your-encryption-key"
```

## Usage Examples

### Storing Secrets (via API)

#### System-Level Secret
```bash
curl -X POST http://localhost:8080/api/v1/keys \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "ref": "system.api_endpoint",
    "owner_type": "system",
    "name": "api_endpoint",
    "value": "https://api.example.com",
    "encrypted": false
  }'
```

#### Pack-Level Secret (Encrypted)
```bash
curl -X POST http://localhost:8080/api/v1/keys \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "ref": "mypack.api_key",
    "owner_type": "pack",
    "owner_pack": 1,
    "name": "api_key",
    "value": "sk_live_abc123def456",
    "encrypted": true
  }'
```

#### Action-Level Secret
```bash
curl -X POST http://localhost:8080/api/v1/keys \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "ref": "mypack.myaction.oauth_token",
    "owner_type": "action",
    "owner_action": 42,
    "name": "oauth_token",
    "value": "ya29.a0AfH6SMBx...",
    "encrypted": true
  }'
```

### Accessing Secrets in Actions

Secrets are automatically injected as environment variables during execution. The secret name is converted to uppercase and prefixed with `SECRET_`.

#### Python Action Example

```python
#!/usr/bin/env python3
import os

# Access secrets via environment variables
api_key = os.environ.get('SECRET_API_KEY')
db_password = os.environ.get('SECRET_DB_PASSWORD')
oauth_token = os.environ.get('SECRET_OAUTH_TOKEN')

if not api_key:
    print("Error: SECRET_API_KEY not found")
    exit(1)

# Use the secrets
print(f"Connecting to API with key: {api_key[:8]}...")
```

#### Shell Action Example

```bash
#!/bin/bash

# Access secrets
echo "API Key: ${SECRET_API_KEY:0:8}..."
echo "Database: ${SECRET_DB_HOST}"

# Use in commands
curl -H "Authorization: Bearer $SECRET_API_TOKEN" \
  https://api.example.com/data
```

### Environment Variable Naming Rules

Secret names are transformed as follows:
- Prefix: `SECRET_`
- Convert to uppercase
- Replace hyphens with underscores

Examples:
- `api_key` → `SECRET_API_KEY`
- `db-password` → `SECRET_DB_PASSWORD`
- `oauth_token` → `SECRET_OAUTH_TOKEN`
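
The three naming rules fit in one small function. This is a hypothetical re-implementation for illustration, not the body of `prepare_secret_env`. How characters other than hyphens (dots or spaces, for instance) are treated is not specified above, so they pass through unchanged apart from uppercasing.

```rust
/// Hypothetical sketch of the documented naming rule:
/// uppercase the name, replace hyphens with underscores, prefix with `SECRET_`.
fn secret_env_name(secret_name: &str) -> String {
    format!(
        "SECRET_{}",
        secret_name.to_uppercase().replace('-', "_")
    )
}
```

Note that `db-password` and `db_password` map to the same variable name under these rules, so two secrets that differ only in that way would collide.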

## Security Best Practices

### 1. Encryption Key Management
- **Generate Strong Keys**: Use at least 32 random characters
- **Secure Storage**: Store in a secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.)
- **Rotation**: Plan for key rotation (requires re-encrypting all secrets)
- **Backup**: Keep encrypted backup of the encryption key

### 2. Secret Storage
- **Always Encrypt Sensitive Data**: Use `encrypted: true` for passwords, tokens, API keys
- **Plaintext for Non-Sensitive**: Use `encrypted: false` for URLs, usernames, configuration
- **Least Privilege**: Use action-level secrets for the most sensitive data

### 3. Action Development
- **Never Log Secrets**: Avoid printing secret values in action output
- **Mask in Errors**: Don't include secrets in error messages
- **Clear After Use**: In long-running processes, clear secrets from memory when done

### 4. Access Control
- **RBAC**: Limit who can create/read secrets using Attune's RBAC system
- **Audit Logging**: Enable audit logging for secret access (future feature)
- **Regular Reviews**: Periodically review and rotate secrets

## Implementation Details

### Encryption Process

```rust
// 1. Derive encryption key from password
let key = SHA256(password);

// 2. Generate random nonce
let nonce = random_bytes(12);

// 3. Encrypt plaintext
let ciphertext = AES256GCM.encrypt(key, nonce, plaintext);

// 4. Format as "nonce:ciphertext" (base64-encoded)
let encrypted_value = format!("{}:{}",
    base64(nonce),
    base64(ciphertext)
);
```

### Decryption Process

```rust
// 1. Parse "nonce:ciphertext" format
let (nonce_b64, ciphertext_b64) = encrypted_value.split_once(':');
let nonce = base64_decode(nonce_b64);
let ciphertext = base64_decode(ciphertext_b64);

// 2. Validate encryption key hash (if present)
if key_hash != SHA256(encryption_key) {
    return Error("Key mismatch");
}

// 3. Decrypt ciphertext
let plaintext = AES256GCM.decrypt(encryption_key, nonce, ciphertext);
```

### Secret Injection Flow

```
1. ActionExecutor prepares execution context
2. SecretManager fetches secrets for action
   a. Query system-level secrets
   b. Query pack-level secrets
   c. Query action-level secrets
   d. Merge with later overriding earlier
3. Decrypt encrypted secrets
4. Transform to environment variables
5. Inject into execution context
6. Action executes with secrets available
```

## Troubleshooting

### "No encryption key configured"
**Problem**: Worker service cannot decrypt secrets.

**Solution**: Set the encryption key in configuration:
```yaml
security:
  encryption_key: "your-encryption-key-here"
```

### "Encryption key hash mismatch"
**Problem**: The encryption key used to decrypt doesn't match the key used to encrypt.

**Solution**:
- Verify you're using the correct encryption key
- Check if encryption key was recently changed
- May need to re-encrypt secrets with new key

### "Decryption failed"
**Problem**: Secret cannot be decrypted.

**Causes**:
- Wrong encryption key
- Corrupted encrypted value
- Invalid format

**Solution**:
- Verify encryption key is correct
- Check secret value format (should be "nonce:ciphertext")
- Try re-encrypting the secret

### Secrets Not Available in Action
**Problem**: Environment variables like `SECRET_API_KEY` are not set.

**Checklist**:
- Verify secret exists in database with correct owner type
- Check secret name matches expected format
- Ensure action's pack has access to the secret
- Check worker logs for "Failed to fetch secrets" warnings

## API Reference

### SecretManager Methods

#### `fetch_secrets_for_action(action: &Action) -> Result<HashMap<String, String>>`
Fetches all secrets relevant to an action (system + pack + action level).

#### `encrypt_value(plaintext: &str) -> Result<String>`
Encrypts a plaintext value using the configured encryption key.

#### `prepare_secret_env(secrets: &HashMap<String, String>) -> HashMap<String, String>`
Transforms secret names to environment variable format.

## Future Enhancements

### Planned Features
- [ ] Secret versioning and rollback
- [ ] Audit logging for secret access
- [ ] Integration with external secret managers (Vault, AWS Secrets Manager)
- [ ] Automatic secret rotation
- [ ] Secret expiration and TTL
- [ ] Multi-key encryption (key per pack/action)
- [ ] Secret templates and inheritance

### Under Consideration
- [ ] Dynamic secret generation
- [ ] Just-in-time secret provisioning
- [ ] Secret usage analytics
- [ ] Integration with certificate management

## References

- [AES-GCM Encryption](https://en.wikipedia.org/wiki/Galois/Counter_Mode)
- [NIST SP 800-38D](https://csrc.nist.gov/publications/detail/sp/800-38d/final) - Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM)
- [Key Management Best Practices](https://www.owasp.org/index.php/Key_Management_Cheat_Sheet)
273
docs/authentication/security-review-2024-01-02.md
Normal file
@@ -0,0 +1,273 @@
# Security Review: StackStorm Pitfall Analysis
**Date:** 2024-01-02
**Classification:** CONFIDENTIAL - Security Review
**Status:** CRITICAL ISSUES IDENTIFIED - PRODUCTION BLOCKED

---

## Executive Summary

A comprehensive security and architecture review of the Attune platform has identified **2 critical vulnerabilities** that must be addressed before any production deployment. This review was conducted by analyzing lessons learned from StackStorm (a similar automation platform) and comparing against our current implementation.

### Critical Findings

🔴 **CRITICAL - PRODUCTION BLOCKER**
- **Secret Exposure Vulnerability (P0)**: User secrets are visible to any system user with shell access
- **Dependency Conflicts (P1)**: System upgrades can break existing user workflows

⚠️ **HIGH PRIORITY - v1.0 BLOCKER**
- **Resource Exhaustion Risk (P1)**: Unbounded log collection can crash worker processes
- **Limited Ecosystem Support (P2)**: No automated dependency management for user packs

✅ **GOOD NEWS**
- 2 major pitfalls successfully avoided due to Rust implementation
- Issues caught in development phase, before production deployment
- Clear remediation path with detailed implementation plan

---

## Business Impact
|
||||
|
||||
### Immediate Impact (Next 4-6 Weeks)
|
||||
- **Production deployment BLOCKED** until critical security fix completed
|
||||
- **Timeline adjustment required**: +3-5 weeks to development schedule
|
||||
- **Resource allocation needed**: 1-2 senior engineers for remediation work
|
||||
|
||||
### Risk Assessment
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Secret theft by malicious insider | High | Critical | Fix P0 immediately |
| Customer workflow breaks on upgrade | High | High | Implement P1 before release |
| Worker crashes under load | Medium | High | Implement P1 before release |
| Limited pack ecosystem adoption | Medium | Medium | Address in v1.0 |
|
||||
|
||||
### Cost of Inaction
|
||||
|
||||
**If P0 (Secret Exposure) is not fixed:**
|
||||
- Any user with server access can steal API keys, passwords, credentials
|
||||
- Potential data breach with legal/compliance implications
|
||||
- Loss of customer trust and reputation damage
|
||||
- Regulatory violations (SOC 2, GDPR, etc.)
|
||||
|
||||
**If P1 (Dependency Conflicts) is not fixed:**
|
||||
- Customer workflows break unexpectedly during system maintenance
|
||||
- Increased support burden and customer frustration
|
||||
- Competitive disadvantage vs. alternatives (Temporal, Prefect)
|
||||
|
||||
---
|
||||
|
||||
## Technical Summary
|
||||
|
||||
### P0: Secret Exposure Vulnerability
|
||||
|
||||
**Current State:**
|
||||
```rust
|
||||
// Secrets passed as environment variables - INSECURE!
|
||||
cmd.env("SECRET_API_KEY", "my-secret-value"); // ← Visible to all users
|
||||
```
|
||||
|
||||
**Attack Vector:**
|
||||
Any user with SSH access can execute:
|
||||
```bash
|
||||
ps auxwwe | grep SECRET_ # Shows all secrets
|
||||
cat /proc/{pid}/environ # Shows all environment variables
|
||||
```
|
||||
|
||||
**Proposed Fix:**
|
||||
Pass secrets via stdin as JSON instead of environment variables.
|
||||
|
||||
**Effort:** 3-5 days
|
||||
**Priority:** P0 (BLOCKING ALL OTHER WORK)
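
A minimal sketch of the stdin approach, written in Python for brevity (the Attune worker itself is Rust, so this is illustrative rather than the actual fix). The helper name `run_action_with_secrets` and the inline child script are assumptions made for the example. The parent writes the secrets as a single JSON document to the child's stdin, so they never appear in the child's environment or argument list.

```python
import json
import subprocess
import sys

# Hypothetical child action: reads its secrets from stdin, never from os.environ.
CHILD_SCRIPT = r"""
import json, os, sys
secrets = json.load(sys.stdin)
# Report only the key names and whether any of them leaked into the environment.
print(json.dumps({
    "received_keys": sorted(secrets),
    "leaked_to_env": any(k in os.environ for k in secrets),
}))
"""


def run_action_with_secrets(secrets: dict) -> dict:
    """Spawn the child and hand it the secrets as JSON on stdin (illustrative helper)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps(secrets),  # written to the child's stdin, which is then closed
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(run_action_with_secrets({"SECRET_API_KEY": "my-secret-value"}))
```

Because the secret travels through a pipe rather than the process environment, it should not show up in `ps` output or `/proc/{pid}/environ` for the child.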
|
||||
|
||||
---
|
||||
|
||||
### P1: Dependency Hell
|
||||
|
||||
**Current State:**
|
||||
All user packs share system Python runtime. When we upgrade Python for security patches, user code may break.
|
||||
|
||||
**Business Scenario:**
|
||||
1. Customer creates workflow using Python 3.9 libraries
|
||||
2. We upgrade server to Python 3.11 for security patch
|
||||
3. Customer's workflow breaks due to library incompatibilities
|
||||
4. Customer blames our platform for unreliability
|
||||
|
||||
**Proposed Fix:**
|
||||
Each pack gets isolated virtual environment with pinned dependencies.
|
||||
|
||||
**Effort:** 7-10 days
|
||||
**Priority:** P1 (REQUIRED FOR v1.0)
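
One way per-pack isolation could look, sketched with Python's standard `venv` module. The directory layout (`<packs_root>/<pack>/venv`) and both function names are assumptions for illustration; the review does not specify them. The install command is only built, not run, so the sketch works offline.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_pack_venv(packs_root: Path, pack_name: str) -> Path:
    """Create an isolated virtual environment for one pack and return its path."""
    env_dir = packs_root / pack_name / "venv"
    # with_pip=False keeps this sketch fast and offline; a real setup would
    # enable pip and then install the pack's pinned requirements.
    venv.EnvBuilder(with_pip=False, clear=True).create(env_dir)
    return env_dir


def pip_install_command(env_dir: Path, requirements: Path) -> list:
    """Build (but do not run) the command that installs pinned dependencies."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return [str(env_dir / bin_dir / exe), "-m", "pip", "install", "-r", str(requirements)]


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    env = create_pack_venv(root, "example_pack")
    print(pip_install_command(env, Path("requirements.txt")))
```

Each pack gets its own interpreter directory and site-packages, so pinning versions in that pack's `requirements.txt` affects no other pack.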
|
||||
|
||||
---
|
||||
|
||||
## Remediation Plan
|
||||
|
||||
### Phase 1: Security Critical (Week 1-2)
|
||||
**Fix secret passing vulnerability**
|
||||
- Estimated effort: 3-5 days
|
||||
- Priority: P0 - BLOCKS ALL OTHER WORK
|
||||
- Deliverable: Secrets passed securely via stdin
|
||||
- Verification: Security tests pass
|
||||
|
||||
### Phase 2: Dependency Isolation (Week 3-4)
|
||||
**Implement per-pack virtual environments**
|
||||
- Estimated effort: 7-10 days
|
||||
- Priority: P1 - REQUIRED FOR v1.0
|
||||
- Deliverable: Isolated Python environments per pack
|
||||
- Verification: System upgrade doesn't break packs
|
||||
|
||||
### Phase 3: Operational Hardening (Week 5-6)
|
||||
**Add log limits and language support**
|
||||
- Estimated effort: 8-11 days
|
||||
- Priority: P1-P2
|
||||
- Deliverable: Worker stability improvements
|
||||
- Verification: Worker handles large logs gracefully
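
The log limit in this phase could take roughly the following shape. This is a sketch under assumptions: the name `read_bounded` and the idea of a simple byte cap with a truncation flag are illustrative, not the planned design. The reader keeps draining the stream after the cap is reached so that a chatty child process never blocks on a full pipe, while memory use stays bounded.

```python
import io
from typing import BinaryIO, Tuple


def read_bounded(stream: BinaryIO, max_bytes: int, chunk_size: int = 8192) -> Tuple[bytes, bool]:
    """Read a stream to EOF but keep at most max_bytes; report whether data was dropped."""
    kept = bytearray()
    truncated = False
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        room = max_bytes - len(kept)
        if room > 0:
            kept.extend(chunk[:room])
        if len(chunk) > room:
            truncated = True  # discard the excess but keep draining the stream
    return bytes(kept), truncated


if __name__ == "__main__":
    data, truncated = read_bounded(io.BytesIO(b"x" * 100), max_bytes=10)
    print(len(data), truncated)
```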
|
||||
|
||||
**Total Timeline:** 3.5-5 weeks of engineering effort (18-26 working days across the three phases)
|
||||
|
||||
---
|
||||
|
||||
## Resource Requirements
|
||||
|
||||
### Development Resources
|
||||
- **Primary:** 1 senior Rust engineer (full-time, 5 weeks)
|
||||
- **Secondary:** 1 senior engineer for code review (20% time)
|
||||
- **Security:** External security consultant (1 week for audit)
|
||||
- **Documentation:** Technical writer (part-time, 1 week)
|
||||
|
||||
### Infrastructure Resources
|
||||
- Staging environment for security testing
|
||||
- CI/CD pipeline updates for security checks
|
||||
- Penetration testing tools
|
||||
|
||||
### Budget Impact
|
||||
- **Engineering Time:** ~$50-70K (5 weeks × 2 engineers)
|
||||
- **Security Audit:** ~$10-15K
|
||||
- **Tools/Infrastructure:** ~$2-5K
|
||||
- **Total Estimated Cost:** $62-90K
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Immediate Actions (This Week)
|
||||
1. ✅ **STOP** all production deployment plans
|
||||
2. **Communicate** timeline changes to stakeholders
|
||||
3. **Assign** engineering resources to remediation work
|
||||
4. **Schedule** security audit for Phase 1 completion
|
||||
|
||||
### Development Process Changes
|
||||
1. **Add security review** to design phase (before implementation)
|
||||
2. **Require security tests** in CI/CD pipeline
|
||||
3. **Mandate code review** for security-critical changes
|
||||
4. **Schedule quarterly** security audits
|
||||
|
||||
### Go/No-Go Criteria for v1.0
|
||||
- ✅ P0 (Secret Security) - MUST be fixed
|
||||
- ✅ P1 (Dependency Isolation) - MUST be fixed
|
||||
- ✅ P1 (Log Limits) - MUST be fixed
|
||||
- ⚠️ P2 (Language Support) - SHOULD be fixed
|
||||
- ✅ Security audit - MUST pass
|
||||
- ✅ All security tests - MUST pass
|
||||
|
||||
---
|
||||
|
||||
## Comparison with Alternatives
|
||||
|
||||
### How We Compare to Competitors
|
||||
|
||||
**vs. StackStorm:**
|
||||
- ✅ We identified and can fix these issues BEFORE production
|
||||
- ✅ Rust provides memory safety and type safety they lack
|
||||
- ⚠️ We risk repeating their mistakes if not careful
|
||||
|
||||
**vs. Temporal/Prefect:**
|
||||
- ✅ Our architecture is sound - just needs hardening
|
||||
- ⚠️ They have mature dependency isolation already
|
||||
- ⚠️ They've invested heavily in security features
|
||||
|
||||
**Market Impact:**
|
||||
Fixing these issues puts us on par with mature alternatives and positions Attune as a secure, enterprise-ready platform.
|
||||
|
||||
---
|
||||
|
||||
## Success Metrics
|
||||
|
||||
### Security Metrics (Post-Remediation)
|
||||
- 0 secrets visible in process table
|
||||
- 0 dependency conflicts between packs
|
||||
- 0 worker OOM incidents due to logs
|
||||
- 100% security test pass rate
|
||||
|
||||
### Business Metrics
|
||||
- No security incidents in first 6 months
|
||||
- <5% customer workflows broken by system upgrades
|
||||
- 95%+ uptime for worker processes
|
||||
- Positive security audit results
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
```
|
||||
Week 1-2: Phase 1 - Security Critical (P0)
          - Fix secret passing vulnerability
          - Security testing and verification

Week 3-4: Phase 2 - Dependency Isolation (P1)
          - Implement per-pack virtual environments
          - Integration testing

Week 5-6: Phase 3 - Operational Hardening (P1-P2)
          - Log size limits
          - Language support improvements
          - External security audit

Week 7:   Final testing and v1.0 release candidate
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Stakeholder Communication
|
||||
|
||||
### For Engineering Leadership
|
||||
- **Message:** Critical issues found, but fixable. Timeline +5 weeks.
|
||||
- **Ask:** Approve resource allocation and budget for remediation
|
||||
- **Next Steps:** Kickoff meeting to assign tasks and set milestones
|
||||
|
||||
### For Product Management
|
||||
- **Message:** v1.0 delayed 5 weeks for critical security fixes
|
||||
- **Impact:** Better to delay than launch with vulnerabilities
|
||||
- **Benefit:** Enterprise-ready security features for market differentiation
|
||||
|
||||
### For Executive Team
|
||||
- **Message:** Security review prevented potential data breach
|
||||
- **Cost:** $62-90K and 5 weeks delay
|
||||
- **ROI:** Avoid reputational damage, legal liability, customer churn
|
||||
- **Decision Needed:** Approve timeline extension and budget increase
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
This security review has identified critical issues that would have caused significant problems in production. The good news is we caught them early, have a clear remediation plan, and the Rust architecture has already prevented other common pitfalls.
|
||||
|
||||
**Recommended Decision:** Approve the 3.5-5 week remediation timeline and allocate necessary resources to fix critical security issues before v1.0 release.
|
||||
|
||||
**Risk of NOT fixing:** Potential security breach, customer data loss, regulatory violations, and reputational damage far exceed the cost of remediation.
|
||||
|
||||
**Next Steps:**
|
||||
1. Review and approve remediation plan
|
||||
2. Assign engineering resources
|
||||
3. Communicate timeline changes
|
||||
4. Begin Phase 1 (Security Critical) work immediately
|
||||
|
||||
---
|
||||
|
||||
**Prepared By:** Engineering Team
|
||||
**Reviewed By:** [Pending]
|
||||
**Approved By:** [Pending]
|
||||
**Distribution:** Engineering Leadership, Product Management, Security Team
|
||||
|
||||
**CONFIDENTIAL - Do Not Distribute Outside Approved Recipients**
|
||||
782
docs/authentication/service-accounts.md
Normal file
@@ -0,0 +1,782 @@
|
||||
# Service Accounts and Transient API Tokens
|
||||
|
||||
**Version:** 1.0
|
||||
**Last Updated:** 2025-01-27
|
||||
**Status:** Draft
|
||||
|
||||
## Overview
|
||||
|
||||
Service accounts provide programmatic access to the Attune API for sensors, action executions, and other automated processes. Unlike user accounts, service accounts:
|
||||
|
||||
- Have no password (token-based authentication only)
|
||||
- Have limited scopes (principle of least privilege)
|
||||
- Can be short-lived or long-lived depending on use case
|
||||
- Are not tied to a human user
|
||||
- Can be easily revoked without affecting user access
|
||||
|
||||
## Use Cases
|
||||
|
||||
1. **Sensors**: Long-lived tokens for sensor daemons to emit events
|
||||
2. **Action Executions**: Short-lived tokens scoped to a single execution
|
||||
3. **CLI Tools**: User-scoped tokens for command-line operations
|
||||
4. **Webhooks**: Tokens for external systems to trigger actions
|
||||
5. **Monitoring**: Tokens for health checks and metrics collection
|
||||
|
||||
## Token Types
|
||||
|
||||
### 1. Sensor Tokens
|
||||
|
||||
**Purpose**: Authentication for sensor daemon processes
|
||||
|
||||
**Characteristics**:
|
||||
- **Lifetime**: Long-lived (90 days, auto-expires)
|
||||
- **Scope**: `sensor`
|
||||
- **Permissions**: Create events, read rules/triggers for specific trigger types
|
||||
- **Revocable**: Yes (manual revocation via API)
|
||||
- **Renewable**: Yes (automatic refresh via API, no restart required)
|
||||
- **Rotation**: Automatic (the sensor refreshes its token once 80% of the TTL has elapsed)
|
||||
|
||||
**Example Usage**:
|
||||
```bash
|
||||
ATTUNE_API_TOKEN=sensor_abc123... ./attune-sensor --sensor-ref core.timer
|
||||
```
|
||||
|
||||
### 2. Action Execution Tokens
|
||||
|
||||
**Purpose**: Authentication for action scripts during execution
|
||||
|
||||
**Characteristics**:
|
||||
- **Lifetime**: Short-lived (matches execution timeout, typically 5-60 minutes)
|
||||
- **Scope**: `action_execution`
|
||||
- **Permissions**: Read keys, update execution status, limited to specific execution_id
|
||||
- **Revocable**: Yes (auto-revoked on execution completion or timeout)
|
||||
- **Renewable**: No (bound to a single execution; expires when that execution completes or times out)
|
||||
- **Auto-Cleanup**: Token revocation records are auto-deleted after expiration
|
||||
|
||||
**Example Usage**:
|
||||
```python
|
||||
# Action script receives token via environment variable
import os
import requests

api_url = os.environ['ATTUNE_API_URL']
api_token = os.environ['ATTUNE_API_TOKEN']
execution_id = os.environ['ATTUNE_EXECUTION_ID']

# Fetch encrypted key
response = requests.get(
    f"{api_url}/keys/myapp.api_key",
    headers={"Authorization": f"Bearer {api_token}"},
    timeout=10,  # avoid hanging the action if the API is unreachable
)
response.raise_for_status()  # fail fast on 401/403/404 instead of parsing an error body
secret = response.json()['value']
|
||||
```
|
||||
|
||||
### 3. User CLI Tokens
|
||||
|
||||
**Purpose**: Authentication for CLI tools on behalf of a user
|
||||
|
||||
**Characteristics**:
|
||||
- **Lifetime**: Medium-lived (7-30 days)
|
||||
- **Scope**: `user`
|
||||
- **Permissions**: Full user permissions (RBAC-based)
|
||||
- **Revocable**: Yes
|
||||
- **Renewable**: Yes (via refresh token)
|
||||
|
||||
**Example Usage**:
|
||||
```bash
|
||||
attune auth login # Stores token in ~/.attune/token
|
||||
attune action execute core.echo --param message="Hello"
|
||||
```
|
||||
|
||||
### 4. Webhook Tokens
|
||||
|
||||
**Purpose**: Authentication for external systems calling Attune webhooks
|
||||
|
||||
**Characteristics**:
|
||||
- **Lifetime**: Long-lived (90-365 days, auto-expires)
|
||||
- **Scope**: `webhook`
|
||||
- **Permissions**: Trigger specific actions or create events
|
||||
- **Revocable**: Yes
|
||||
- **Renewable**: Yes (generate new token before expiration)
|
||||
- **Rotation**: Recommended every 90 days
|
||||
|
||||
**Example Usage**:
|
||||
```bash
|
||||
curl -X POST https://attune.example.com/api/webhooks/deploy \
|
||||
-H "Authorization: Bearer webhook_xyz789..." \
|
||||
-d '{"status": "deployed"}'
|
||||
```
|
||||
|
||||
## Token Scopes and Permissions
|
||||
|
||||
| Scope | Permissions | Use Case |
|-------|-------------|----------|
| `admin` | Full access to all resources | System administrators, web UI |
| `user` | RBAC-based permissions | CLI tools, user sessions |
| `sensor` | Create events, read rules/triggers | Sensor daemons |
| `action_execution` | Read keys, update execution (scoped to execution_id) | Action scripts |
| `webhook` | Create events, trigger actions | External integrations |
| `readonly` | Read-only access to all resources | Monitoring, auditing |
|
||||
|
||||
## Database Schema
|
||||
|
||||
### Identity Table
|
||||
|
||||
Service accounts are stored in the `identity` table with `identity_type = 'service_account'`:
|
||||
|
||||
```sql
|
||||
CREATE TABLE identity (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL UNIQUE,
    identity_type identity_type NOT NULL, -- 'user' or 'service_account'
    email VARCHAR(255), -- NULL for service accounts
    password_hash VARCHAR(255), -- NULL for service accounts
    metadata JSONB DEFAULT '{}',
    created TIMESTAMPTZ DEFAULT NOW(),
    updated TIMESTAMPTZ DEFAULT NOW()
);
|
||||
```
|
||||
|
||||
Service account metadata includes:
|
||||
```json
|
||||
{
|
||||
"scope": "sensor",
|
||||
"description": "Timer sensor service account",
|
||||
"created_by": 1, // identity_id of creator
|
||||
"expires_at": "2025-04-27T12:34:56Z",
|
||||
"trigger_types": ["core.timer"], // For sensor scope
|
||||
"execution_id": 123 // For action_execution scope
|
||||
}
|
||||
```
|
||||
|
||||
### Token Storage
|
||||
|
||||
Tokens are **not** stored in the database (they are stateless JWTs). However, revocation is tracked:
|
||||
|
||||
```sql
|
||||
CREATE TABLE token_revocation (
    id BIGSERIAL PRIMARY KEY,
    identity_id BIGINT NOT NULL REFERENCES identity(id) ON DELETE CASCADE,
    token_jti VARCHAR(255) NOT NULL, -- JWT ID (jti claim)
    token_exp TIMESTAMPTZ NOT NULL, -- Token expiration (from exp claim)
    revoked_at TIMESTAMPTZ DEFAULT NOW(),
    revoked_by BIGINT REFERENCES identity(id),
    reason VARCHAR(500),
    UNIQUE(token_jti)
);

CREATE INDEX idx_token_revocation_jti ON token_revocation(token_jti);
CREATE INDEX idx_token_revocation_identity ON token_revocation(identity_id);
CREATE INDEX idx_token_revocation_exp ON token_revocation(token_exp); -- For cleanup queries
|
||||
```
|
||||
|
||||
## JWT Token Format
|
||||
|
||||
### Claims
|
||||
|
||||
All service account tokens include these claims:
|
||||
|
||||
```json
|
||||
{
  "sub": "sensor:core.timer", // Subject: "type:name"
  "jti": "abc123...", // JWT ID (for revocation)
  "iat": 1706356496, // Issued at (Unix timestamp)
  "exp": 1714132496, // Expires at (Unix timestamp)
  "identity_id": 123,
  "identity_type": "service_account",
  "scope": "sensor",
  "metadata": {
    "trigger_types": ["core.timer"]
  }
}
|
||||
```
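
To make the claim set concrete, here is a standard-library sketch that mints and verifies an HS256 token carrying these claims. It is not Attune's implementation: `mint_sensor_token` and `decode_claims` are hypothetical names, and HS256 is assumed only because the example tokens in these docs begin with an HS256 header. A production service would normally use a maintained JWT library instead of hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
import uuid


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_sensor_token(key, identity_id, name, trigger_types, ttl_days=90, now=None):
    """Build and sign a sensor token; `now` is injectable to keep the sketch testable."""
    issued_at = int(time.time()) if now is None else now
    claims = {
        "sub": f"sensor:{name}",
        "jti": uuid.uuid4().hex,  # unique ID so the token can be revoked later
        "iat": issued_at,
        "exp": issued_at + ttl_days * 86400,
        "identity_id": identity_id,
        "identity_type": "service_account",
        "scope": "sensor",
        "metadata": {"trigger_types": trigger_types},
    }
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_claims(token: str, key: bytes) -> dict:
    """Verify the HS256 signature and return the claims (no exp or revocation checks)."""
    signing_input, _, sig = token.rpartition(".")
    expected = _b64url(hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    payload = signing_input.split(".")[1]
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    token = mint_sensor_token(key, 123, "core.timer", ["core.timer"])
    print(decode_claims(token, key)["scope"])
```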
|
||||
|
||||
### Scope-Specific Claims
|
||||
|
||||
**Sensor tokens** (restricted to declared trigger types):
|
||||
```json
|
||||
{
|
||||
"scope": "sensor",
|
||||
"metadata": {
|
||||
"trigger_types": ["core.timer", "core.interval"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The API enforces that sensors can only create events for trigger types listed in `metadata.trigger_types`. A request to create an event for any other trigger type is rejected with `403 Forbidden`.
|
||||
|
||||
**Action execution tokens**:
|
||||
```json
|
||||
{
|
||||
"scope": "action_execution",
|
||||
"metadata": {
|
||||
"execution_id": 456,
|
||||
"action_ref": "core.echo",
|
||||
"workflow_id": 789 // Optional, if part of workflow
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Webhook tokens**:
|
||||
```json
|
||||
{
|
||||
"scope": "webhook",
|
||||
"metadata": {
|
||||
"allowed_paths": ["/webhooks/deploy", "/webhooks/alert"],
|
||||
"ip_whitelist": ["203.0.113.0/24"] // Optional
|
||||
}
|
||||
}
|
||||
```
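
How the webhook metadata might be enforced, as a hedged sketch. The function name `webhook_request_allowed` is hypothetical, and two behaviours are assumptions rather than documented rules: paths are compared by exact match, and a missing `ip_whitelist` means no IP restriction.

```python
import ipaddress


def webhook_request_allowed(metadata: dict, path: str, client_ip: str) -> bool:
    """Check a request against the token's allowed_paths and optional ip_whitelist."""
    if path not in metadata.get("allowed_paths", []):
        return False
    networks = metadata.get("ip_whitelist")
    if not networks:
        # Assumption: the claim is optional, so its absence imposes no IP restriction.
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in networks)


if __name__ == "__main__":
    meta = {
        "allowed_paths": ["/webhooks/deploy", "/webhooks/alert"],
        "ip_whitelist": ["203.0.113.0/24"],
    }
    print(webhook_request_allowed(meta, "/webhooks/deploy", "203.0.113.7"))
```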
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Create Service Account
|
||||
|
||||
**Admin only**
|
||||
|
||||
```http
|
||||
POST /service-accounts
|
||||
Authorization: Bearer {admin_token}
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"name": "sensor:core.timer",
|
||||
"scope": "sensor",
|
||||
"description": "Timer sensor service account",
|
||||
"ttl_days": 90, // Sensor tokens: 90 days, auto-refresh before expiration
|
||||
"metadata": {
|
||||
"trigger_types": ["core.timer"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"identity_id": 123,
|
||||
"name": "sensor:core.timer",
|
||||
"scope": "sensor",
|
||||
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
|
||||
"expires_at": "2025-04-27T12:34:56Z" // 90 days from now
|
||||
}
|
||||
```
|
||||
|
||||
**Important**: The token is only shown once. Store it securely.
|
||||
|
||||
### List Service Accounts
|
||||
|
||||
**Admin only**
|
||||
|
||||
```http
|
||||
GET /service-accounts
|
||||
Authorization: Bearer {admin_token}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"data": [
|
||||
{
|
||||
"identity_id": 123,
|
||||
"name": "sensor:core.timer",
|
||||
"scope": "sensor",
|
||||
"created_at": "2025-01-27T12:34:56Z",
|
||||
"expires_at": "2025-04-27T12:34:56Z",
|
||||
"metadata": {
|
||||
"trigger_types": ["core.timer"]
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Refresh Token (Self-Service)
|
||||
|
||||
**Sensor/User tokens can refresh themselves**
|
||||
|
||||
```http
|
||||
POST /auth/refresh
|
||||
Authorization: Bearer {current_token}
|
||||
Content-Type: application/json
|
||||
|
||||
{}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
|
||||
"expires_at": "2025-04-27T12:34:56Z"
|
||||
}
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
- Current token must be valid (not expired, not revoked)
|
||||
- New token has same scope and metadata as current token
|
||||
- New token has same TTL as original token type (e.g., 90 days for sensors)
|
||||
- Old token remains valid until its original expiration (allows zero-downtime refresh)
|
||||
- Only `sensor` and `user` scopes can refresh (not `action_execution` or `webhook`)
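
The refresh rules above can be sketched as a pure function over decoded claims. The names `can_refresh` and `refresh_claims` are illustrative, and the 30-day TTL for `user` tokens is an assumption picked from the documented 7-30 day range.

```python
# TTL per refreshable scope; only `sensor` and `user` may refresh.
REFRESH_TTL_SECONDS = {
    "sensor": 90 * 86400,
    "user": 30 * 86400,  # assumption: upper end of the 7-30 day range
}


def can_refresh(claims: dict, now: int, revoked_jtis: set) -> bool:
    """Current token must be unexpired, unrevoked, and of a refreshable scope."""
    return (
        now < claims["exp"]
        and claims["jti"] not in revoked_jtis
        and claims["scope"] in REFRESH_TTL_SECONDS
    )


def refresh_claims(claims: dict, now: int, revoked_jtis: set, new_jti: str) -> dict:
    """Return claims for the new token; the old token is left untouched."""
    if not can_refresh(claims, now, revoked_jtis):
        raise PermissionError("token cannot be refreshed")
    refreshed = dict(claims)  # same scope and metadata as the current token
    refreshed.update(jti=new_jti, iat=now, exp=now + REFRESH_TTL_SECONDS[claims["scope"]])
    return refreshed


if __name__ == "__main__":
    old = {"jti": "a", "scope": "sensor", "iat": 0, "exp": 90 * 86400,
           "metadata": {"trigger_types": ["core.timer"]}}
    print(refresh_claims(old, now=72 * 86400, revoked_jtis=set(), new_jti="b")["exp"] // 86400)
```

Since the old claims are copied rather than modified, the old token stays valid until its own `exp`, which is what allows the zero-downtime overlap.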
|
||||
|
||||
### Revoke Service Account Token
|
||||
|
||||
**Admin only**
|
||||
|
||||
```http
|
||||
DELETE /service-accounts/{identity_id}
|
||||
Authorization: Bearer {admin_token}
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"reason": "Token compromised"
|
||||
}
|
||||
```
|
||||
|
||||
**Response**:
|
||||
```json
|
||||
{
|
||||
"message": "Service account revoked",
|
||||
"identity_id": 123
|
||||
}
|
||||
```
|
||||
|
||||
### Create Execution Token (Internal)
|
||||
|
||||
**Called by executor service, not exposed in API**
|
||||
|
||||
```rust
|
||||
// In executor service
let execution_timeout_minutes = get_action_timeout(action_ref); // e.g., 30 minutes
let token = create_execution_token(
    execution_id,
    action_ref,
    execution_timeout_minutes, // token TTL in minutes, matching the action timeout
)?;
|
||||
```
|
||||
|
||||
This token is passed to the worker service, which injects it into the action's environment.
|
||||
|
||||
## Token Creation Workflow
|
||||
|
||||
### 1. Sensor Token Creation
|
||||
|
||||
```
|
||||
Admin → POST /service-accounts (scope=sensor) → API
|
||||
API → Create identity record → Database
|
||||
API → Generate JWT with sensor scope → Response
|
||||
Admin → Store token in secure config → Sensor deployment
|
||||
Sensor → Use token for API calls → Event emission
|
||||
```
|
||||
|
||||
### 2. Execution Token Creation
|
||||
|
||||
```
|
||||
Rule fires → Executor creates enforcement → Executor
|
||||
Executor → Schedule execution → Database
|
||||
Executor → Create execution token (internal) → JWT library
|
||||
Executor → Send execution request to worker → RabbitMQ
|
||||
Worker → Receive message with token → Action runner
|
||||
Action → Use token to fetch keys → API
|
||||
Execution completes → Token expires (TTL) → Automatic cleanup
|
||||
```
|
||||
|
||||
## Token Validation
|
||||
|
||||
### Middleware (API Service)
|
||||
|
||||
```rust
|
||||
// In API service
pub async fn validate_token(
    token: &str,
    required_scope: Option<&str>
) -> Result<Claims> {
    // 1. Verify JWT signature
    let claims = decode_jwt(token)?;

    // 2. Check expiration (JWT library handles this, but explicit check for clarity)
    if claims.exp < now() {
        return Err(Error::TokenExpired);
    }

    // 3. Check revocation (only check non-expired tokens)
    if is_revoked(&claims.jti, claims.exp).await? {
        return Err(Error::TokenRevoked);
    }

    // 4. Check scope
    if let Some(scope) = required_scope {
        if claims.scope != scope {
            return Err(Error::InsufficientPermissions);
        }
    }

    Ok(claims)
}
|
||||
```
|
||||
|
||||
### Scope-Based Authorization
|
||||
|
||||
```rust
|
||||
// Execution-scoped token can only access its own execution
if claims.scope == "action_execution" {
    let allowed_execution_id = claims.metadata
        .get("execution_id")
        .and_then(|v| v.as_i64())
        .ok_or(Error::InvalidToken)?;

    if execution_id != allowed_execution_id {
        return Err(Error::InsufficientPermissions);
    }
}

// Sensor-scoped token can only create events for declared trigger types
if claims.scope == "sensor" {
    let allowed_trigger_types = claims.metadata
        .get("trigger_types")
        .and_then(|v| v.as_array())
        .ok_or(Error::InvalidToken)?;

    let allowed_types: Vec<String> = allowed_trigger_types
        .iter()
        .filter_map(|v| v.as_str().map(String::from))
        .collect();

    if !allowed_types.contains(&trigger_type) {
        return Err(Error::InsufficientPermissions);
    }
}
|
||||
```
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
### Token Generation
|
||||
|
||||
|
||||
1. **Use Strong Secrets**: JWT signing key must be 256+ bits, randomly generated
|
||||
2. **Include JTI**: Always include `jti` claim for revocation support
|
||||
3. **REQUIRED Expiration**: All tokens MUST have `exp` claim - no exceptions
   - Sensor tokens: 90 days (auto-refresh before expiration)
   - Action execution tokens: Match execution timeout (5-60 minutes)
   - User CLI tokens: 7-30 days (auto-refresh before expiration)
   - Webhook tokens: 90-365 days (manual rotation)
|
||||
4. **Minimal Scope**: Grant least privilege necessary
|
||||
5. **Restrict Trigger Types**: For sensor tokens, only include necessary trigger types in metadata
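
For the first item, a 256-bit signing key can be produced with Python's `secrets` module, which draws from the operating system's CSPRNG. This is a sketch: the helper name `generate_signing_key` is hypothetical, and how the key is then delivered to the service (for example through a secret manager) is left open.

```python
import base64
import secrets


def generate_signing_key(num_bytes: int = 32) -> str:
    """Return a random key of num_bytes (32 bytes = 256 bits), base64url-encoded."""
    if num_bytes < 32:
        raise ValueError("JWT signing key must be at least 256 bits")
    return base64.urlsafe_b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_signing_key())
```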
|
||||
|
||||
### Token Storage
|
||||
|
||||
1. **Environment Variables**: Preferred method for sensors and actions
|
||||
2. **Never Log**: Redact tokens from logs (show only last 4 chars)
|
||||
3. **Never Commit**: Don't commit tokens to version control
|
||||
4. **Secure Config**: Store in encrypted config management (Vault, k8s secrets)
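
A small redaction helper in the spirit of the "Never Log" rule, shown as a sketch. The name `redact_token` and the `***` prefix are illustrative choices; only the "last 4 chars" behaviour comes from the guideline above.

```python
def redact_token(token: str, visible: int = 4) -> str:
    """Mask a token for logging, keeping only its last `visible` characters."""
    if len(token) <= visible:
        return "*" * len(token)  # too short to reveal anything safely
    return "***" + token[-visible:]


if __name__ == "__main__":
    print(redact_token("sensor_abc123"))
```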
|
||||
|
||||
### Token Transmission
|
||||
|
||||
1. **HTTPS Only**: Never send tokens over unencrypted connections
|
||||
2. **Authorization Header**: Use `Authorization: Bearer {token}` header
|
||||
3. **No Query Params**: Don't pass tokens in URL query parameters
|
||||
4. **No Cookies**: For service accounts, avoid cookie-based auth
|
||||
|
||||
### Token Revocation
|
||||
|
||||
1. **Immediate Revocation**: Check revocation list on every request
|
||||
2. **Audit Trail**: Log who revoked, when, and why
|
||||
3. **Cascade Delete**: Revoke all tokens when service account is deleted
|
||||
4. **Automatic Cleanup**: Delete revocation records for expired tokens (run hourly)
   - Query: `DELETE FROM token_revocation WHERE token_exp < NOW()`
   - Prevents indefinite table bloat
   - Expired tokens are already invalid, no need to track revocation
|
||||
5. **Validate Permissions**: Enforce trigger type restrictions for sensor tokens on event creation
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
- [ ] Add `identity_type` enum to database schema
|
||||
- [ ] Add `token_revocation` table (with `token_exp` column)
|
||||
- [ ] Create `POST /service-accounts` endpoint
|
||||
- [ ] Create `GET /service-accounts` endpoint
|
||||
- [ ] Create `DELETE /service-accounts/{id}` endpoint
|
||||
- [ ] Create `POST /auth/refresh` endpoint (for automatic token refresh)
|
||||
- [ ] Add scope validation middleware
|
||||
- [ ] Add token revocation check middleware (skip check for expired tokens)
|
||||
- [ ] Implement execution token creation in executor (TTL = action timeout)
|
||||
- [ ] Pass execution token to worker via RabbitMQ
|
||||
- [ ] Inject execution token into action environment
|
||||
- [ ] Add CLI commands: `attune service-account create/list/revoke`
|
||||
- [ ] Document token creation for sensor deployment
|
||||
- [ ] Implement automatic token refresh in sensors (refresh at 80% of TTL)
|
||||
- [ ] Implement cleanup job for expired token revocations (hourly cron)
|
||||
|
||||
## Migration Path
|
||||
|
||||
### Phase 1: Database Schema
|
||||
|
||||
```sql
|
||||
-- Add identity_type enum if not exists
DO $$ BEGIN
    CREATE TYPE identity_type AS ENUM ('user', 'service_account');
EXCEPTION
    WHEN duplicate_object THEN null;
END $$;

-- Add identity_type column to identity table
ALTER TABLE identity
    ADD COLUMN IF NOT EXISTS identity_type identity_type DEFAULT 'user';

-- Create token_revocation table
CREATE TABLE IF NOT EXISTS token_revocation (
    id BIGSERIAL PRIMARY KEY,
    identity_id BIGINT NOT NULL REFERENCES identity(id) ON DELETE CASCADE,
    token_jti VARCHAR(255) NOT NULL,
    token_exp TIMESTAMPTZ NOT NULL, -- For cleanup queries
    revoked_at TIMESTAMPTZ DEFAULT NOW(),
    revoked_by BIGINT REFERENCES identity(id),
    reason VARCHAR(500),
    UNIQUE(token_jti)
);

CREATE INDEX IF NOT EXISTS idx_token_revocation_jti ON token_revocation(token_jti);
CREATE INDEX IF NOT EXISTS idx_token_revocation_exp ON token_revocation(token_exp);
|
||||
```
|
||||
|
||||
### Phase 2: API Implementation
|
||||
|
||||
1. Add service account repository
|
||||
2. Add JWT utilities for scope-based tokens
|
||||
3. Implement service account CRUD endpoints
|
||||
4. Add middleware for token validation and revocation
|
||||
|
||||
### Phase 3: Integration
|
||||
|
||||
1. Update executor to create execution tokens
|
||||
2. Update worker to receive and use execution tokens
|
||||
3. Update sensor to accept and use sensor tokens
|
||||
4. Update CLI to support service account management
|
||||
|
||||
## Examples
|
||||
|
||||
### Python Action Using Execution Token
|
||||
|
||||
```python
|
||||
#!/usr/bin/env python3
import os
import requests
import sys

# Token is injected by worker
api_url = os.environ['ATTUNE_API_URL']
api_token = os.environ['ATTUNE_API_TOKEN']
execution_id = os.environ['ATTUNE_EXECUTION_ID']

# Fetch encrypted secret
response = requests.get(
    f"{api_url}/keys/myapp.database_password",
    headers={"Authorization": f"Bearer {api_token}"}
)

if response.status_code != 200:
    print(f"Failed to fetch key: {response.text}", file=sys.stderr)
    sys.exit(1)

db_password = response.json()['value']

# Use the secret...
print("Successfully connected to database")
|
||||
```
|
||||
|
||||
### Sensor Using Sensor Token
|
||||
|
||||
```rust
|
||||
// In sensor initialization
let api_token = env::var("ATTUNE_API_TOKEN")?;
let api_url = env::var("ATTUNE_API_URL")?;

let client = reqwest::Client::new();

// Fetch active rules
let response = client
    .get(format!("{}/rules?trigger_type=core.timer", api_url))
    .header("Authorization", format!("Bearer {}", api_token))
    .send()
    .await?;

let rules: Vec<Rule> = response.json().await?;
|
||||
```
|
||||
|
||||
## Token Lifecycle Management
|
||||
|
||||
### Expiration Strategy
|
||||
|
||||
**All tokens MUST expire** to prevent indefinite revocation table bloat and reduce attack surface:
|
||||
|
||||
| Token Type | Expiration | Rationale |
|------------|------------|-----------|
| Sensor | 90 days | Perpetually running service, auto-refresh before expiration |
| Action Execution | 5-60 minutes | Matches action timeout, auto-cleanup on completion |
| User CLI | 7-30 days | Balance between convenience and security, auto-refresh |
| Webhook | 90-365 days | External integration, manual rotation required |
|
||||
|
||||
### Revocation Table Cleanup
|
||||
|
||||
Cleanup job runs hourly to prevent table bloat:
|
||||
|
||||
```sql
|
||||
-- Delete revocation records for expired tokens
|
||||
DELETE FROM token_revocation
|
||||
WHERE token_exp < NOW();
|
||||
```
|
||||
|
||||
**Why this works:**
|
||||
- Expired tokens are already invalid (enforced by JWT `exp` claim)
|
||||
- No need to track revocation status for invalid tokens
|
||||
- Keeps revocation table small and queries fast
|
||||
- Typical size: <1000 rows instead of millions
|
||||
|
||||
### Sensor Token Refresh
|
||||
|
||||
Sensors automatically refresh their own tokens without human intervention:
|
||||
|
||||
**Automatic Process:**
|
||||
1. Sensor starts with 90-day token
|
||||
2. Background task monitors token expiration
|
||||
3. Once 80% of the TTL has elapsed (day 72 of 90), the sensor requests a new token via `POST /auth/refresh`
|
||||
4. New token is hot-loaded without restart
|
||||
5. Old token remains valid until original expiration
|
||||
6. Process repeats indefinitely
|
||||
|
||||
**Refresh Timing Example:**
|
||||
- Token issued: Day 0, expires Day 90
|
||||
- Refresh trigger: Day 72 (80% of 90 days)
|
||||
- New token issued: Day 72, expires Day 162
|
||||
- Old token still valid: Day 72-90 (overlap period)
|
||||
- Next refresh: Day 144 (80% of new token)
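
The timing example can be reproduced with a few lines of integer arithmetic. This is a sketch; `should_refresh` and `refresh_schedule` are hypothetical names, and the 80% threshold is passed as a whole-number percentage to avoid floating-point rounding.

```python
def should_refresh(iat: int, exp: int, now: int, threshold_percent: int = 80) -> bool:
    """True once at least threshold_percent of the token lifetime has elapsed."""
    return (now - iat) * 100 >= (exp - iat) * threshold_percent


def refresh_schedule(issued_day: int = 0, ttl_days: int = 90,
                     threshold_percent: int = 80, cycles: int = 3) -> list:
    """List (issued, refresh, expires) days for successive tokens."""
    schedule = []
    for _ in range(cycles):
        refresh_day = issued_day + ttl_days * threshold_percent // 100
        schedule.append(
            {"issued": issued_day, "refresh": refresh_day, "expires": issued_day + ttl_days}
        )
        issued_day = refresh_day  # the next token is issued at the refresh point
    return schedule


if __name__ == "__main__":
    for entry in refresh_schedule():
        print(entry)
```

With the defaults, the first two entries match the example: refresh on day 72 with expiry on day 90, then refresh on day 144 with expiry on day 162.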
|
||||
|
||||
**Zero-Downtime:**
|
||||
- No service interruption during refresh
|
||||
- Old token valid during transition
|
||||
- Graceful fallback on refresh failure
|
||||
|
||||
## Cleanup Job Implementation
|
||||
|
||||
### Purpose
|
||||
|
||||
Prevent indefinite growth of the `token_revocation` table by removing revocation records for expired tokens.
|
||||
|
||||
### Why Cleanup Is Safe
|
||||
|
||||
- Expired tokens are already invalid (enforced by JWT `exp` claim)
|
||||
- Token validation checks expiration before checking revocation
|
||||
- No security risk in deleting expired token revocations
|
||||
- Significantly reduces table size and improves query performance
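
The safety argument depends on the order of the two checks. A minimal sketch in TypeScript, assuming revocation records are keyed by a token identifier (`jti` here is an assumption; this document only names the `token_exp` column) and that `checkToken` is a hypothetical helper:

```typescript
interface TokenClaims {
  jti: string; // token identifier (assumed key of the revocation table)
  exp: number; // expiry, in seconds since the Unix epoch
}

type Verdict = "valid" | "expired" | "revoked";

function checkToken(
  claims: TokenClaims,
  revokedIds: ReadonlySet<string>,
  nowSeconds: number,
): Verdict {
  // Expiry is checked first, so an expired token is rejected
  // without consulting the revocation table at all.
  if (claims.exp <= nowSeconds) return "expired";
  if (revokedIds.has(claims.jti)) return "revoked";
  return "valid";
}
```

Because expiry is tested first, deleting the revocation record of an expired token cannot change the outcome: the token is rejected as expired either way.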
|
||||
|
||||
### Implementation
|
||||
|
||||
**Frequency**: Hourly cron job or background task
|
||||
|
||||
**SQL Query**:
|
||||
```sql
|
||||
DELETE FROM token_revocation
|
||||
WHERE token_exp < NOW();
|
||||
```
|
||||
|
||||
**Expected Impact**:
|
||||
- Typical table size: <1,000 rows instead of millions over time
|
||||
- Fast revocation checks (indexed queries on small dataset)
|
||||
- Reduced storage and backup costs
|
||||
|
||||
### Rust Implementation Example
|
||||
|
||||
```rust
|
||||
use sqlx::PgPool;
use tokio::time::{interval, Duration};
use tracing::{error, info}; // assumes `tracing`; the `log` crate provides the same macros

/// Background task to clean up expired token revocations
pub async fn start_revocation_cleanup_task(db: PgPool) {
    let mut interval = interval(Duration::from_secs(3600)); // Every hour

    loop {
        // The first tick completes immediately, so cleanup also runs once at startup
        interval.tick().await;

        match cleanup_expired_revocations(&db).await {
            Ok(count) => {
                info!("Cleaned up {} expired token revocations", count);
            }
            Err(e) => {
                error!("Failed to clean up expired token revocations: {}", e);
            }
        }
    }
}

/// Delete token revocation records for expired tokens
async fn cleanup_expired_revocations(db: &PgPool) -> Result<u64, sqlx::Error> {
    let result = sqlx::query!(
        "DELETE FROM token_revocation WHERE token_exp < NOW()"
    )
    .execute(db)
    .await?;

    Ok(result.rows_affected())
}
|
||||
```
|
||||
|
||||
### Monitoring
|
||||
|
||||
Track cleanup job metrics:
|
||||
- Number of records deleted per run
|
||||
- Job execution time
|
||||
- Job failures (alert if consecutive failures)
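
The "alert if consecutive failures" rule can be tracked with a small counter that resets on success. This is a language-neutral sketch written in TypeScript; `createFailureTracker` and the threshold of 3 are hypothetical:

```typescript
// Hypothetical helper: reports when the cleanup job has failed `threshold` times in a row.
function createFailureTracker(threshold: number) {
  let consecutiveFailures = 0;
  return {
    // Record the outcome of one run; returns true when an alert should fire.
    record(succeeded: boolean): boolean {
      consecutiveFailures = succeeded ? 0 : consecutiveFailures + 1;
      return consecutiveFailures >= threshold;
    },
  };
}

const tracker = createFailureTracker(3);
const outcomes = [false, false, true, false, false, false];
const alerts = outcomes.map((ok) => tracker.record(ok));
console.log(alerts); // only the last run (third failure in a row) raises an alert
```

A single success resets the counter, so isolated failures do not page anyone.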
|
||||
|
||||
**Prometheus Metrics Example**:
|
||||
```rust
|
||||
use lazy_static::lazy_static;
use prometheus::{register_histogram, register_int_counter, Histogram, IntCounter};

// Define metrics
lazy_static! {
    static ref REVOCATION_CLEANUP_COUNT: IntCounter = register_int_counter!(
        "attune_revocation_cleanup_total",
        "Total number of expired token revocations cleaned up"
    ).unwrap();

    static ref REVOCATION_CLEANUP_DURATION: Histogram = register_histogram!(
        "attune_revocation_cleanup_duration_seconds",
        "Duration of token revocation cleanup job"
    ).unwrap();
}

// In cleanup function
let timer = REVOCATION_CLEANUP_DURATION.start_timer();
let count = cleanup_expired_revocations(&db).await?;
REVOCATION_CLEANUP_COUNT.inc_by(count);
timer.observe_duration();
|
||||
```
|
||||
|
||||
### Alternative: Database Trigger
|
||||
|
||||
For automatic cleanup without application code:
|
||||
|
||||
```sql
|
||||
-- Create function to delete old revocations
|
||||
CREATE OR REPLACE FUNCTION cleanup_expired_token_revocations()
|
||||
RETURNS trigger AS $$
|
||||
BEGIN
|
||||
DELETE FROM token_revocation WHERE token_exp < NOW() - INTERVAL '1 hour';
|
||||
RETURN NULL;
|
||||
END;
|
||||
$$ LANGUAGE plpgsql;
|
||||
|
||||
-- Trigger on insert (cleanup when new revocations are added)
|
||||
CREATE TRIGGER trigger_cleanup_expired_revocations
|
||||
AFTER INSERT ON token_revocation
|
||||
EXECUTE FUNCTION cleanup_expired_token_revocations();
|
||||
```
|
||||
|
||||
**Note**: Application-level cleanup is preferred for better observability and control.
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
1. **Rate Limiting**: Per-token rate limits to prevent abuse
|
||||
2. **Audit Logging**: Comprehensive audit trail of token usage and refresh events
|
||||
3. **OAuth 2.0**: Support OAuth 2.0 client credentials flow
|
||||
4. **mTLS**: Mutual TLS authentication for high-security deployments
|
||||
5. **Token Introspection**: RFC 7662-compliant token introspection endpoint
|
||||
6. **Scope Hierarchies**: More granular permission scopes
|
||||
7. **IP Whitelisting**: Restrict token usage to specific IP ranges
|
||||
8. **Configurable Refresh Timing**: Allow custom refresh thresholds per token type
|
||||
9. **Token Lineage Tracking**: Track token refresh chains for security audits
|
||||
10. **Refresh Failure Alerts**: Notify operators when automatic refresh fails
|
||||
356
docs/authentication/token-refresh-quickref.md
Normal file
@@ -0,0 +1,356 @@
|
||||
# Token Refresh System - Quick Reference
|
||||
|
||||
**Last Updated:** 2025-01-27
|
||||
**Component:** Web UI Authentication
|
||||
|
||||
## Overview
|
||||
|
||||
The web UI implements automatic and proactive JWT token refresh to provide seamless authentication for active users.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────┐
│  User Activity → API Request                        │
│         ↓                                           │
│  Axios Interceptor (adds JWT)                       │
│         ↓                                           │
│  Server Response                                    │
│    ├─ 200 OK → Continue                             │
│    ├─ 401 Unauthorized → Auto-refresh & retry       │
│    └─ 403 Forbidden → Show permission error         │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│  Background: Token Monitor (every 60s)              │
│         ↓                                           │
│  Token expires in < 5 min?                          │
│    ├─ Yes → Proactive refresh                       │
│    └─ No → Continue monitoring                      │
└─────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Key Components
|
||||
|
||||
### 1. API Wrapper (`web/src/lib/api-wrapper.ts`)
|
||||
- **Purpose**: Configure axios with token refresh interceptors
|
||||
- **Features**:
|
||||
- Global axios defaults configuration
|
||||
- Request interceptor (adds token)
|
||||
- Response interceptor (handles 401/403)
|
||||
- Proactive refresh monitor
|
||||
|
||||
### 2. ErrorDisplay Component (`web/src/components/common/ErrorDisplay.tsx`)
|
||||
- **Purpose**: User-friendly error messages
|
||||
- **Distinguishes**:
|
||||
- 401: "Session expired" (handled automatically)
|
||||
- 403: "Access denied - insufficient permissions"
|
||||
- Other: Generic error with details
|
||||
|
||||
### 3. Auth Context (`web/src/contexts/AuthContext.tsx`)
|
||||
- **Purpose**: Manage authentication state
|
||||
- **Lifecycle**:
|
||||
- `user` set → Start token refresh monitor
|
||||
- `user` cleared → Stop token refresh monitor
|
||||
|
||||
## Token Lifecycle
|
||||
|
||||
### Access Token
|
||||
- **Duration**: 1 hour (configured on backend)
|
||||
- **Storage**: `localStorage.getItem('access_token')`
|
||||
- **Refresh Trigger**: Automatic on 401 response
|
||||
- **Proactive Refresh**: 5 minutes before expiration
|
||||
|
||||
### Refresh Token
|
||||
- **Duration**: 7 days (configured on backend)
|
||||
- **Storage**: `localStorage.getItem('refresh_token')`
|
||||
- **Used**: To obtain new access token
|
||||
- **Rotation**: Optional (backend can return new refresh token)
|
||||
|
||||
## Configuration
|
||||
|
||||
### Proactive Refresh Settings
|
||||
```typescript
|
||||
// File: web/src/lib/api-wrapper.ts
|
||||
|
||||
// Check every 60 seconds
|
||||
const MONITOR_INTERVAL = 60000; // ms
|
||||
|
||||
// Refresh if expiring within 5 minutes
|
||||
const REFRESH_THRESHOLD = 300; // seconds
|
||||
```
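
Given these settings, each monitor tick reduces to one comparison against the token's `exp` claim. A sketch, assuming a hypothetical `shouldRefresh` helper (the actual logic in `api-wrapper.ts` may differ):

```typescript
const REFRESH_THRESHOLD = 300; // seconds, as in the settings above

// True when the token expires within the threshold (or has already expired).
// `expSeconds` is the JWT `exp` claim; `nowMs` is a millisecond timestamp such as Date.now().
function shouldRefresh(
  expSeconds: number,
  nowMs: number,
  thresholdSeconds = REFRESH_THRESHOLD,
): boolean {
  const secondsLeft = expSeconds - Math.floor(nowMs / 1000);
  return secondsLeft < thresholdSeconds;
}

// A token with 4 minutes left is refreshed; one with 10 minutes left is not.
console.log(shouldRefresh(1_000_240, 1_000_000_000)); // true
console.log(shouldRefresh(1_000_600, 1_000_000_000)); // false
```

With a 60-second check interval and a 5-minute threshold, an active session gets several refresh attempts before the token actually expires.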
|
||||
|
||||
### API Endpoint
|
||||
```typescript
|
||||
// Refresh endpoint
|
||||
POST /auth/refresh
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"refresh_token": "..."
|
||||
}
|
||||
|
||||
// Response
|
||||
{
|
||||
"data": {
|
||||
"access_token": "...",
|
||||
"refresh_token": "..." // Optional - for rotation
|
||||
}
|
||||
}
|
||||
```
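
Because rotation is optional, the client should keep its current refresh token when the response omits one. A minimal sketch of that merge step; `StoredTokens` and `applyRefreshResponse` are illustrative names, not the wrapper's actual API:

```typescript
// Shape of the refresh response shown above.
interface RefreshResponse {
  data: { access_token: string; refresh_token?: string };
}

// Hypothetical in-memory view of what the UI keeps in localStorage.
interface StoredTokens {
  accessToken: string;
  refreshToken: string;
}

function applyRefreshResponse(current: StoredTokens, response: RefreshResponse): StoredTokens {
  return {
    accessToken: response.data.access_token,
    // Keep the existing refresh token unless the backend rotated it.
    refreshToken: response.data.refresh_token ?? current.refreshToken,
  };
}
```

Returning a new object instead of mutating `current` makes it easy to write both tokens to storage in one step after a successful refresh.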
|
||||
|
||||
## Error Handling
|
||||
|
||||
### 401 Unauthorized (Token Expired/Invalid)
|
||||
```
|
||||
// Automatic handling:
|
||||
1. Interceptor detects 401
|
||||
2. Attempts token refresh with refresh_token
|
||||
3. On success: Retry original request
|
||||
4. On failure: Clear tokens, redirect to /login
|
||||
```
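
The four steps above can be sketched independently of axios. In this TypeScript sketch `createAuthClient`, `ApiCall`, and the callbacks are hypothetical names; the shared in-flight promise is an added assumption so that concurrent 401 responses trigger only one refresh:

```typescript
// A request is modelled as a function of the current access token.
type ApiCall<T> = (accessToken: string) => Promise<{ status: number; body?: T }>;

function createAuthClient(
  initialToken: string,
  refresh: () => Promise<string>, // resolves to a new access token, rejects on failure
  onSessionExpired: () => void,   // e.g. clear tokens and redirect to /login
) {
  let token = initialToken;
  let inFlight: Promise<string> | null = null;

  // Share a single refresh call between concurrent 401 responses.
  function refreshOnce(): Promise<string> {
    if (!inFlight) {
      inFlight = refresh().then(
        (newToken) => { token = newToken; inFlight = null; return newToken; },
        (err) => { inFlight = null; throw err; },
      );
    }
    return inFlight;
  }

  return async function send<T>(request: ApiCall<T>) {
    const first = await request(token);
    if (first.status !== 401) return first; // steps 1-2 only apply to 401
    try {
      await refreshOnce();
    } catch (_err) {
      onSessionExpired(); // step 4: refresh failed
      return first;
    }
    return request(token); // step 3: retry once with the refreshed token
  };
}
```

The retry happens at most once per request, so a backend that keeps answering 401 cannot cause an endless refresh loop.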
|
||||
|
||||
### 403 Forbidden (Insufficient Permissions)
|
||||
```typescript
|
||||
// Manual handling in components:
|
||||
<ErrorDisplay error={error} />
|
||||
// Shows: "Access Denied - You do not have permission..."
|
||||
```
|
||||
|
||||
### Network/Server Errors
|
||||
```typescript
|
||||
// Generic error display:
|
||||
<ErrorDisplay
|
||||
error={error}
|
||||
showRetry={true}
|
||||
onRetry={() => refetch()}
|
||||
/>
|
||||
```
|
||||
|
||||
## Usage in Components
|
||||
|
||||
### Detecting Error Types
|
||||
```typescript
|
||||
// In React components using TanStack Query
|
||||
const { data, error, isLoading } = useActions();
|
||||
|
||||
if (error) {
|
||||
// ErrorDisplay component handles type detection
|
||||
return <ErrorDisplay error={error} />;
|
||||
}
|
||||
```
|
||||
|
||||
### Custom Error Handling
|
||||
```typescript
|
||||
// Check for 403 errors
|
||||
const is403 = error?.response?.status === 403 ||
|
||||
error?.isAuthorizationError;
|
||||
|
||||
if (is403) {
|
||||
// Show permission-specific UI
|
||||
}
|
||||
|
||||
// Check for 401 errors (rare - usually handled by interceptor)
|
||||
const is401 = error?.response?.status === 401;
|
||||
```
|
||||
|
||||
## Debugging
|
||||
|
||||
### Console Logs
|
||||
```bash
|
||||
# Initialization
|
||||
🔧 Initializing API wrapper
|
||||
✓ Axios defaults configured with interceptors
|
||||
✓ API wrapper initialized
|
||||
|
||||
# Token Refresh
|
||||
🔄 Access token expired, attempting refresh...
|
||||
✓ Token refreshed successfully
|
||||
|
||||
# Monitor
|
||||
🔄 Starting token refresh monitor
|
||||
✓ Token proactively refreshed
|
||||
⏹️ Stopping token refresh monitor
|
||||
|
||||
# Errors
|
||||
⚠️ No refresh token available, redirecting to login
|
||||
Token refresh failed, clearing session and redirecting to login
|
||||
Access forbidden - insufficient permissions for this resource
|
||||
```
|
||||
|
||||
### Browser DevTools
|
||||
```bash
|
||||
# Check tokens
|
||||
Application → Local Storage → localhost
|
||||
- access_token: "eyJ..."
|
||||
- refresh_token: "eyJ..."
|
||||
|
||||
# Watch refresh requests
|
||||
Network → Filter: refresh
|
||||
- POST /auth/refresh
|
||||
- Status: 200 OK
|
||||
- Response: { data: { access_token, refresh_token } }
|
||||
|
||||
# Monitor console
|
||||
Console → Filter: Token|refresh|Unauthorized
|
||||
```
|
||||
|
||||
## Common Scenarios
|
||||
|
||||
### Scenario 1: Active User
|
||||
```
|
||||
User logged in → Using app normally
|
||||
↓
|
||||
Every 60s: Monitor checks token expiration
|
||||
↓
|
||||
Token expires in 4 minutes
|
||||
↓
|
||||
Proactive refresh triggered
|
||||
↓
|
||||
User continues seamlessly (no interruption)
|
||||
```
|
||||
|
||||
### Scenario 2: Idle User Returns
|
||||
```
|
||||
User logged in → Leaves tab idle for 70 minutes
|
||||
↓
|
||||
Access token expired (after 60 min)
|
||||
↓
|
||||
User returns, clicks action
|
||||
↓
|
||||
API returns 401
|
||||
↓
|
||||
Interceptor attempts refresh
|
||||
↓
|
||||
If refresh token valid: Success, retry request
|
||||
If refresh token expired: Redirect to login
|
||||
```
|
||||
|
||||
### Scenario 3: Permission Denied
|
||||
```
|
||||
User logged in → Tries restricted action
|
||||
↓
|
||||
API returns 403 Forbidden
|
||||
↓
|
||||
ErrorDisplay shows: "Access Denied"
|
||||
↓
|
||||
User sees clear message (not "Unauthorized")
|
||||
```
|
||||
|
||||
### Scenario 4: Network Failure During Refresh
|
||||
```
|
||||
User action → 401 response → Refresh attempt
|
||||
↓
|
||||
Network error / API down
|
||||
↓
|
||||
Refresh fails → Tokens cleared
|
||||
↓
|
||||
Redirect to login
|
||||
↓
|
||||
SessionStorage saves current path
|
||||
↓
|
||||
After login → Redirect back to original page
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
### Manual Test: Token Expiration
|
||||
```bash
|
||||
# 1. Log in to web UI
|
||||
# 2. Open DevTools → Application → Local Storage
|
||||
# 3. Copy access_token value
|
||||
# 4. Decode at jwt.io - note expiration time
|
||||
# 5. Wait until near expiration
|
||||
# 6. Perform action (view page, click button)
|
||||
# 7. Watch Network tab for /auth/refresh call
|
||||
# 8. Verify action completes successfully
|
||||
```
|
||||
|
||||
### Manual Test: Permission Denied
|
||||
```bash
|
||||
# 1. Log in as limited user
|
||||
# 2. Try to access admin-only resource
|
||||
# 3. Verify: See "Access Denied" (not "Unauthorized")
|
||||
# 4. Verify: Amber/yellow UI (not red)
|
||||
# 5. Verify: Helpful message about permissions
|
||||
```
|
||||
|
||||
### Manual Test: Proactive Refresh
|
||||
```bash
|
||||
# 1. Log in
|
||||
# 2. Open Console
|
||||
# 3. Look for "🔄 Starting token refresh monitor"
|
||||
# 4. Wait 60 seconds
|
||||
# 5. If token expires within 5 min, see:
|
||||
# "✓ Token proactively refreshed"
|
||||
# 6. Logout
|
||||
# 7. See: "⏹️ Stopping token refresh monitor"
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Issue: Redirect loop to /login
|
||||
**Cause**: Both access_token and refresh_token expired
|
||||
**Solution**: Expected behavior - user must log in again
|
||||
|
||||
### Issue: Token not refreshing automatically
|
||||
**Check**:
|
||||
1. Axios interceptors configured? → See console for init logs
|
||||
2. Token exists in localStorage?
|
||||
3. Refresh token valid?
|
||||
4. Network connectivity?
|
||||
5. Backend /auth/refresh endpoint working?
|
||||
|
||||
### Issue: Monitor not running
|
||||
**Check**:
|
||||
1. User authenticated? → Monitor only runs when `user` is set
|
||||
2. Check console for "Starting token refresh monitor"
|
||||
3. Verify AuthContext lifecycle in React DevTools
|
||||
|
||||
### Issue: Wrong error message (401 vs 403)
|
||||
**Check**:
|
||||
1. Using ErrorDisplay component?
|
||||
2. Error object has `response.status` property?
|
||||
3. Interceptor properly marking 403 errors?
|
||||
|
||||
## Security Notes
|
||||
|
||||
1. **Token Storage**: Currently uses localStorage
|
||||
- ✅ Works across tabs
|
||||
- ⚠️ Vulnerable to XSS
|
||||
- 🔒 Consider httpOnly cookies for production
|
||||
|
||||
2. **Token Exposure**: Tokens only in Authorization header
|
||||
- ✅ Never in URL parameters
|
||||
- ✅ Not logged to console
|
||||
|
||||
3. **Automatic Cleanup**: Failed refresh clears all tokens
|
||||
- ✅ No stale authentication state
|
||||
|
||||
4. **Single Sign-Out**: Clearing tokens stops all access
|
||||
- ✅ Immediate effect
|
||||
|
||||
## API Requirements
|
||||
|
||||
The backend must provide:
|
||||
|
||||
1. **Login Endpoint**: Returns access_token + refresh_token
|
||||
2. **Refresh Endpoint**: Accepts refresh_token, returns new access_token
|
||||
3. **Token Format**: Standard JWT with `exp` claim
|
||||
4. **Error Codes**:
|
||||
- 401 for expired/invalid tokens
|
||||
- 403 for permission denied
|
||||
|
||||
## Related Files
|
||||
|
||||
- `web/src/lib/api-wrapper.ts` - Core token refresh logic
|
||||
- `web/src/lib/api-client.ts` - Axios instance configuration
|
||||
- `web/src/components/common/ErrorDisplay.tsx` - Error UI
|
||||
- `web/src/contexts/AuthContext.tsx` - Auth state management
|
||||
- `web/src/pages/auth/LoginPage.tsx` - Login with redirect
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- Full details: `work-summary/2025-01-token-refresh-improvements.md`
|
||||
- Authentication: `docs/authentication/authentication.md`
|
||||
- Token rotation: `docs/authentication/token-rotation.md`
|
||||
479
docs/authentication/token-rotation.md
Normal file
@@ -0,0 +1,479 @@
|
||||
# Token Rotation Guide
|
||||
|
||||
**Version:** 1.0
|
||||
**Last Updated:** 2025-01-27
|
||||
**Audience:** System Administrators, DevOps Engineers
|
||||
|
||||
## Overview
|
||||
|
||||
This guide provides procedures for rotating service account tokens in Attune to maintain security and prevent token revocation table bloat. All tokens in Attune have expiration times and require periodic rotation.
|
||||
|
||||
## Token Expiration Policy
|
||||
|
||||
**All tokens MUST expire.** This is a hard requirement to prevent:
|
||||
- Indefinite growth of the `token_revocation` table
|
||||
- Long-lived compromised credentials
|
||||
- Security debt accumulation
|
||||
|
||||
### Token Lifetimes
|
||||
|
||||
| Token Type | Lifetime | Rotation Frequency | Auto-Cleanup |
|------------|----------|-------------------|--------------|
| Sensor | 24-72 hours | Every 24-72 hours | Yes (on expiration) |
| Action Execution | 5-60 minutes | N/A (single-use) | Yes (on completion) |
| User CLI | 7-30 days | Every 7-30 days | No (manual revocation) |
| Webhook | 90-365 days | Every 90-365 days | No (manual revocation) |
|
||||
|
||||
## Sensor Token Rotation
|
||||
|
||||
### Why Rotation is Required
|
||||
|
||||
Sensor tokens expire after 24-72 hours to:
|
||||
- Limit the impact of compromised credentials
|
||||
- Force regular security reviews
|
||||
- Prevent revocation table bloat
|
||||
- Align with security best practices
|
||||
|
||||
### Rotation Process
|
||||
|
||||
#### Manual Rotation (Current)
|
||||
|
||||
**Preparation:**
|
||||
```bash
|
||||
# Set admin token
|
||||
export ADMIN_TOKEN="your_admin_token"
|
||||
|
||||
# Note the current sensor name
|
||||
SENSOR_NAME="sensor:core.timer"
|
||||
```
|
||||
|
||||
**Step 1: Create New Service Account**
|
||||
|
||||
```bash
|
||||
# Create new token
|
||||
curl -X POST http://localhost:8080/service-accounts \
|
||||
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{
|
||||
\"name\": \"${SENSOR_NAME}\",
|
||||
\"scope\": \"sensor\",
|
||||
\"description\": \"Timer sensor (rotated $(date +%Y-%m-%d))\",
|
||||
\"ttl_hours\": 72,
|
||||
\"metadata\": {
|
||||
\"trigger_types\": [\"core.timer\"]
|
||||
}
|
||||
}"
|
||||
|
||||
# Save the response
|
||||
# {
|
||||
# "identity_id": 456,
|
||||
# "name": "sensor:core.timer",
|
||||
# "token": "eyJhbGci...", <-- COPY THIS
|
||||
# "expires_at": "2025-01-30T12:34:56Z"
|
||||
# }
|
||||
|
||||
export NEW_TOKEN="eyJhbGci..."
|
||||
```
|
||||
|
||||
**Step 2: Update Sensor Configuration**
|
||||
|
||||
**For systemd deployments:**
|
||||
```bash
|
||||
# Update environment file
|
||||
sudo nano /etc/attune/sensor-timer.env
|
||||
|
||||
# Replace old token with new token
|
||||
ATTUNE_API_TOKEN=eyJhbGci... # <-- NEW TOKEN HERE
|
||||
```
|
||||
|
||||
**For Docker/Kubernetes:**
|
||||
```bash
|
||||
# Update secret
|
||||
kubectl create secret generic sensor-timer-token \
|
||||
--from-literal=token="${NEW_TOKEN}" \
|
||||
--dry-run=client -o yaml | kubectl apply -f -
|
||||
|
||||
# Or update Docker environment variable
|
||||
docker service update attune-core-timer-sensor \
|
||||
--env-add ATTUNE_API_TOKEN="${NEW_TOKEN}"
|
||||
```
|
||||
|
||||
**For environment variables:**
|
||||
```bash
|
||||
# Update environment variable
|
||||
export ATTUNE_API_TOKEN="${NEW_TOKEN}"
|
||||
```
|
||||
|
||||
**Step 3: Restart Sensor**
|
||||
|
||||
```bash
|
||||
# systemd
|
||||
sudo systemctl restart attune-core-timer-sensor
|
||||
|
||||
# Docker
|
||||
docker restart attune-core-timer-sensor
|
||||
|
||||
# Kubernetes
|
||||
kubectl rollout restart deployment/sensor-timer
|
||||
```
|
||||
|
||||
**Step 4: Verify New Token is Working**
|
||||
|
||||
```bash
|
||||
# Check sensor logs
|
||||
sudo journalctl -u attune-core-timer-sensor -f --since "1 minute ago"
|
||||
|
||||
# Look for:
|
||||
# - "API connectivity verified"
|
||||
# - "Connected to RabbitMQ"
|
||||
# - "Started consuming messages"
|
||||
# - No authentication errors
|
||||
```
|
||||
|
||||
**Step 5: Revoke Old Token (Optional)**
|
||||
|
||||
The old token will expire automatically after 72 hours. For immediate revocation:
|
||||
|
||||
```bash
|
||||
# Get old identity_id from previous creation response
|
||||
OLD_IDENTITY_ID=123
|
||||
|
||||
# Revoke old token
|
||||
curl -X DELETE http://localhost:8080/service-accounts/${OLD_IDENTITY_ID} \
|
||||
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{
|
||||
\"reason\": \"Token rotation\"
|
||||
}"
|
||||
```
|
||||
|
||||
### Rotation Schedule
|
||||
|
||||
**Recommended Schedule:**
|
||||
- **Production:** Every 48 hours (allows 24-hour margin before expiration)
|
||||
- **Staging:** Every 72 hours
|
||||
- **Development:** Every 72 hours
|
||||
|
||||
**Calendar Reminder:**
|
||||
Set up recurring calendar events or use cron to remind operators:
|
||||
|
||||
```bash
|
||||
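# Add to crontab. Cron cannot express an exact 48-hour interval (the hour field only
# accepts 0-23), so this runs at midnight on every second day of the month.
0 0 */2 * * /usr/local/bin/rotate-sensor-token.sh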
|
||||
```
|
||||
|
||||
### Monitoring Token Expiration
|
||||
|
||||
**Check Token Expiration:**
|
||||
|
||||
```bash
|
||||
# Decode JWT to check expiration
|
||||
echo "${ATTUNE_API_TOKEN}" | cut -d'.' -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq -r '.exp'
|
||||
|
||||
# Output: 1738886400 (Unix timestamp)
|
||||
|
||||
# Convert to human-readable
|
||||
date -u -d @1738886400 '+%Y-%m-%d %H:%M:%S'
# Output: 2025-02-07 00:00:00
|
||||
```
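
The same check can be done in TypeScript, for example in a monitoring script. JWT segments are base64url-encoded without padding, so this sketch decodes them by hand instead of relying on a standard base64 decoder. `base64UrlDecode`, `jwtExpiry`, and `hoursRemaining` are hypothetical helpers, and the signature is not verified:

```typescript
const BASE64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decode unpadded base64url into a string with one character per byte.
// Sufficient for ASCII JSON payloads such as numeric claims.
function base64UrlDecode(input: string): string {
  let bits = 0;     // bit accumulator
  let bitCount = 0; // number of valid bits in the accumulator
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === "=") break; // tolerate padded input
    const value = BASE64URL.indexOf(ch);
    if (value < 0) throw new Error("invalid base64url character: " + ch);
    bits = (bits << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff);
      bits &= (1 << bitCount) - 1; // keep only the unconsumed bits
    }
  }
  return out;
}

// Read the `exp` claim (seconds since the Unix epoch) without verifying the signature.
function jwtExpiry(token: string): number {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected header.payload.signature");
  const payload = JSON.parse(base64UrlDecode(parts[1]));
  if (typeof payload.exp !== "number") throw new Error("token has no numeric exp claim");
  return payload.exp;
}

function hoursRemaining(token: string, nowSeconds: number): number {
  return Math.floor((jwtExpiry(token) - nowSeconds) / 3600);
}

// The middle segment below encodes {"exp":1738886400}; header and signature are placeholders.
const exampleToken = "header.eyJleHAiOjE3Mzg4ODY0MDB9.signature";
console.log(hoursRemaining(exampleToken, 1738879200)); // 2
```

Only use the decoded value for monitoring; authorization decisions must still rely on server-side signature verification.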
|
||||
|
||||
**Set Up Alerts:**
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# check-token-expiration.sh
|
||||
# Run this hourly via cron
|
||||
|
||||
TOKEN="${ATTUNE_API_TOKEN}"
|
||||
EXP=$(echo "${TOKEN}" | cut -d'.' -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq -r '.exp')
|
||||
NOW=$(date +%s)
|
||||
HOURS_REMAINING=$(( ($EXP - $NOW) / 3600 ))
|
||||
|
||||
if [ "$HOURS_REMAINING" -lt 6 ]; then
|
||||
echo "WARNING: Sensor token expires in ${HOURS_REMAINING} hours!"
|
||||
# Send alert to monitoring system
|
||||
curl -X POST https://monitoring.example.com/alerts \
|
||||
-d "message=Sensor token expires in ${HOURS_REMAINING} hours"
|
||||
fi
|
||||
```
|
||||
|
||||
**Add to crontab:**
|
||||
```bash
|
||||
0 * * * * /usr/local/bin/check-token-expiration.sh
|
||||
```
|
||||
|
||||
## Action Execution Token Lifecycle
|
||||
|
||||
Action execution tokens are automatically managed:
|
||||
|
||||
**Creation:** Executor service creates token when scheduling execution
|
||||
```rust
|
||||
let token = create_execution_token(
    execution_id,
    action_ref,
    action_timeout_minutes, // TTL in minutes, matched to the action timeout
)?;
|
||||
```
|
||||
|
||||
**Usage:** Worker injects token into action environment
|
||||
```bash
|
||||
ATTUNE_API_TOKEN=eyJhbGci...
|
||||
ATTUNE_EXECUTION_ID=123
|
||||
```
|
||||
|
||||
**Expiration:** Token expires when execution times out or completes
|
||||
|
||||
**Cleanup:** Revocation record (if created) is automatically deleted after expiration
|
||||
|
||||
**No manual intervention required.**
|
||||
|
||||
## User CLI Token Rotation
|
||||
|
||||
### When to Rotate
|
||||
|
||||
- Every 7-30 days (based on TTL)
|
||||
- When user credentials change
|
||||
- When token is compromised
|
||||
- When user leaves organization
|
||||
|
||||
### Rotation Process
|
||||
|
||||
**Step 1: Login Again**
|
||||
|
||||
```bash
|
||||
# User logs in to get new token
|
||||
attune auth login
|
||||
|
||||
# Enter credentials
|
||||
# New token is stored in ~/.attune/token
|
||||
```
|
||||
|
||||
**Step 2: Verify New Token**
|
||||
|
||||
```bash
|
||||
# Test with simple command
|
||||
attune pack list
|
||||
|
||||
# Should succeed without errors
|
||||
```
|
||||
|
||||
**Old token is automatically revoked during login (if configured).**
|
||||
|
||||
## Webhook Token Rotation
|
||||
|
||||
### When to Rotate
|
||||
|
||||
- Every 90-365 days (based on TTL)
|
||||
- When webhook is compromised
|
||||
- When integrating system changes
|
||||
- During security audits
|
||||
|
||||
### Rotation Process
|
||||
|
||||
**Step 1: Create New Webhook Token**
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:8080/service-accounts \
|
||||
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "webhook:deployment-notifications",
|
||||
"scope": "webhook",
|
||||
"description": "GitHub deployment webhook",
|
||||
"ttl_days": 90,
|
||||
"metadata": {
|
||||
"allowed_paths": ["/webhooks/deploy"]
|
||||
}
|
||||
}'
|
||||
|
||||
# Save the new token
|
||||
export NEW_WEBHOOK_TOKEN="eyJhbGci..."
|
||||
```
|
||||
|
||||
**Step 2: Update External System**
|
||||
|
||||
Update the webhook configuration in the external system (GitHub, GitLab, etc.) with the new token.
|
||||
|
||||
**Step 3: Test Webhook**
|
||||
|
||||
```bash
|
||||
# Send test webhook
|
||||
curl -X POST https://attune.example.com/webhooks/deploy \
|
||||
-H "Authorization: Bearer ${NEW_WEBHOOK_TOKEN}" \
|
||||
-d '{"status": "deployed"}'
|
||||
|
||||
# Should succeed
|
||||
```
|
||||
|
||||
**Step 4: Revoke Old Token**
|
||||
|
||||
After confirming the new token works:
|
||||
|
||||
```bash
|
||||
curl -X DELETE http://localhost:8080/service-accounts/${OLD_IDENTITY_ID} \
|
||||
-H "Authorization: Bearer ${ADMIN_TOKEN}"
|
||||
```
|
||||
|
||||
## Automation Scripts
|
||||
|
||||
### Sensor Token Rotation Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# rotate-sensor-token.sh
|
||||
# Automated sensor token rotation
|
||||
|
||||
set -e
|
||||
|
||||
SENSOR_NAME="${1:-sensor:core.timer}"
|
||||
ADMIN_TOKEN="${ADMIN_TOKEN}"
|
||||
API_URL="${ATTUNE_API_URL:-http://localhost:8080}"
|
||||
|
||||
if [ -z "$ADMIN_TOKEN" ]; then
|
||||
echo "Error: ADMIN_TOKEN environment variable not set"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "Rotating token for ${SENSOR_NAME}..."
|
||||
|
||||
# Create new token
|
||||
RESPONSE=$(curl -s -X POST "${API_URL}/service-accounts" \
|
||||
-H "Authorization: Bearer ${ADMIN_TOKEN}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{
|
||||
\"name\": \"${SENSOR_NAME}\",
|
||||
\"scope\": \"sensor\",
|
||||
\"description\": \"Auto-rotated $(date +%Y-%m-%d)\",
|
||||
\"ttl_hours\": 72,
|
||||
\"metadata\": {
|
||||
\"trigger_types\": [\"core.timer\"]
|
||||
}
|
||||
}")
|
||||
|
||||
NEW_TOKEN=$(echo "$RESPONSE" | jq -r '.token')
|
||||
EXPIRES_AT=$(echo "$RESPONSE" | jq -r '.expires_at')
|
||||
|
||||
if [ -z "$NEW_TOKEN" ] || [ "$NEW_TOKEN" = "null" ]; then
|
||||
echo "Error: Failed to create new token"
|
||||
echo "$RESPONSE"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "New token created, expires at: ${EXPIRES_AT}"
|
||||
|
||||
# Update configuration file
|
||||
echo "ATTUNE_API_TOKEN=${NEW_TOKEN}" | sudo tee /etc/attune/sensor-timer.env > /dev/null  # do not echo the token to stdout or logs
|
||||
|
||||
# Restart service
|
||||
echo "Restarting sensor service..."
|
||||
sudo systemctl restart attune-core-timer-sensor
|
||||
|
||||
# Wait for service to start
|
||||
sleep 5
|
||||
|
||||
# Check status
|
||||
if sudo systemctl is-active --quiet attune-core-timer-sensor; then
|
||||
echo "✓ Sensor token rotated successfully"
|
||||
echo " New token expires: ${EXPIRES_AT}"
|
||||
else
|
||||
echo "✗ Sensor failed to start, check logs"
|
||||
sudo journalctl -u attune-core-timer-sensor -n 50
|
||||
exit 1
|
||||
fi
|
||||
```
|
||||
|
||||
### Token Expiration Check Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# check-all-tokens.sh
|
||||
# Check expiration for all active service accounts
|
||||
|
||||
API_URL="${ATTUNE_API_URL:-http://localhost:8080}"
|
||||
ADMIN_TOKEN="${ADMIN_TOKEN}"
|
||||
WARN_HOURS=6
|
||||
|
||||
# Fetch all service accounts
|
||||
ACCOUNTS=$(curl -s -X GET "${API_URL}/service-accounts" \
|
||||
-H "Authorization: Bearer ${ADMIN_TOKEN}")
|
||||
|
||||
echo "$ACCOUNTS" | jq -r '.data[] | "\(.name)\t\(.expires_at)"' | \
|
||||
while IFS=$'\t' read -r name expires_at; do
|
||||
exp_timestamp=$(date -d "$expires_at" +%s)
|
||||
now=$(date +%s)
|
||||
hours_remaining=$(( ($exp_timestamp - $now) / 3600 ))
|
||||
|
||||
if [ "$hours_remaining" -lt "$WARN_HOURS" ]; then
|
||||
echo "⚠️ WARNING: ${name} expires in ${hours_remaining} hours (${expires_at})"
|
||||
else
|
||||
echo "✓ ${name} expires in ${hours_remaining} hours (${expires_at})"
|
||||
fi
|
||||
done
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Token expired" Error
|
||||
|
||||
**Symptom:** Sensor logs show "401 Unauthorized" or "Token expired"
|
||||
|
||||
**Solution:**
|
||||
1. Verify current time is correct: `date`
|
||||
2. Check token expiration: `echo $TOKEN | cut -d'.' -f2 | base64 -d | jq .exp`
|
||||
3. Create new token and restart sensor (see rotation process above)
|
||||
|
||||
### Sensor Won't Start After Rotation
|
||||
|
||||
**Symptom:** Sensor fails to start after updating token
|
||||
|
||||
**Troubleshooting:**
|
||||
1. Verify token is correctly formatted (JWT with 3 parts: header.payload.signature)
|
||||
2. Check token hasn't already expired
|
||||
3. Verify token has correct scope and metadata
|
||||
4. Check sensor logs for specific error message
|
||||
|
||||
### Token Revocation Table Growing Too Large
|
||||
|
||||
**Symptom:** `token_revocation` table has millions of rows
|
||||
|
||||
**Solution:**
|
||||
1. Ensure cleanup job is running (hourly)
|
||||
2. Manually run cleanup: `DELETE FROM token_revocation WHERE token_exp < NOW()`
|
||||
3. Verify all tokens have expiration set
|
||||
4. Check for tokens with very long TTLs
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Set Calendar Reminders:** Don't rely on memory; set recurring calendar events
|
||||
2. **Automate Where Possible:** Use cron jobs and scripts for rotation
|
||||
3. **Monitor Expiration:** Set up alerts 6-12 hours before expiration
|
||||
4. **Test Rotation:** Practice rotation in staging before production
|
||||
5. **Document Tokens:** Keep inventory of active service accounts and their purposes
|
||||
6. **Minimal TTL:** Use shortest acceptable TTL for each token type
|
||||
7. **Rotate on Compromise:** Immediately rotate if token is compromised
|
||||
8. **Clean Up:** Revoke old tokens after rotation (or let them expire)
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- **Never commit tokens to version control**
|
||||
- **Use encrypted storage for tokens** (e.g., Vault, AWS Secrets Manager)
|
||||
- **Rotate immediately if compromised**
|
||||
- **Audit token usage regularly**
|
||||
- **Minimize token scope and permissions**
|
||||
- **Use separate tokens for each sensor/webhook**
|
||||
- **Monitor for unauthorized token usage**
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
1. **Automatic Rotation:** Hot-reload tokens without sensor restart
|
||||
2. **Token Renewal API:** Extend token TTL without creating new token
|
||||
3. **Token Rotation Hooks:** Webhook notifications before expiration
|
||||
4. **Managed Tokens:** Orchestrator handles rotation automatically
|
||||
5. **Token Rotation Dashboard:** Web UI for monitoring and rotating tokens
|
||||
|
||||
## See Also
|
||||
|
||||
- [Service Accounts Documentation](./service-accounts.md)
|
||||
- [Sensor Interface Specification](./sensor-interface.md)
|
||||
- [Sensor Authentication Overview](./sensor-authentication-overview.md)
|
||||
- [Timer Sensor README](../crates/core-timer-sensor/README.md)
|
||||