re-uploading work

2026-02-04 17:46:30 -06:00
commit 3b14c65998
1388 changed files with 381262 additions and 0 deletions


@@ -0,0 +1,247 @@
# Enforcement Message Routing Fix
**Date:** 2026-01-16
**Status:** ✅ Completed
## Problem
Executions were not being created despite:
- Timer triggers generating events successfully
- Rules matching events and creating enforcements
- All services running without errors
When querying the executions API:
```bash
curl -X 'GET' 'http://localhost:8080/api/v1/executions?page=1&per_page=50'
```
Response showed no executions:
```json
{"data":[],"pagination":{"page":1,"page_size":50,"total_items":0,"total_pages":0}}
```
Database investigation revealed:
- ✅ Events: Being created every 10 seconds (128+ events)
- ✅ Enforcements: Being created by rule matcher (multiple enforcements)
- ❌ Executions: Zero executions in database
## Root Cause
**Message routing mismatch** between sensor and executor services:
1. **Sensor Service** (rule matcher):
   - Published `EnforcementCreated` messages to `attune.events` exchange
   - Routing key: `enforcement.created`
2. **Executor Service** (enforcement processor):
   - Consumed from `attune.executions.queue`
   - Queue bound to `attune.executions` exchange
   - Expected messages on `attune.executions` exchange
3. **Result**: Messages published to wrong exchange → never reached executor → no executions created
### Message Flow (Before Fix)
```
Sensor Rule Matcher
↓ (publishes EnforcementCreated)
attune.events exchange
↓ (routed to)
attune.events.queue
↓ (NOT consumed by executor)
[Messages accumulate, executor never sees them]

Executor Enforcement Processor
↓ (consumes from)
attune.executions.queue ← (bound to attune.executions exchange)
↓ (waiting for messages that never arrive)
[No messages received, no executions created]
```
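The mismatch in the diagram above can be reproduced with a small model of the topology. This is only a sketch: `Binding`, `BINDINGS`, `queues_for`, and `reaches` are hypothetical names, not part of the Attune codebase, and routing keys are ignored on the assumption that every message published to an exchange reaches the queue bound to it.

```rust
/// One exchange-to-queue binding. Routing keys are deliberately ignored
/// (assumption: every message on the exchange reaches the bound queue).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Binding {
    pub exchange: &'static str,
    pub queue: &'static str,
}

/// The two bindings described in the flow diagram.
pub const BINDINGS: [Binding; 2] = [
    Binding { exchange: "attune.events", queue: "attune.events.queue" },
    Binding { exchange: "attune.executions", queue: "attune.executions.queue" },
];

/// Queues that receive a message published to `exchange`.
pub fn queues_for(exchange: &str, bindings: &[Binding]) -> Vec<&'static str> {
    bindings
        .iter()
        .filter(|b| b.exchange == exchange)
        .map(|b| b.queue)
        .collect()
}

/// True when a message published to `exchange` can reach `consumer_queue`.
pub fn reaches(exchange: &str, consumer_queue: &str, bindings: &[Binding]) -> bool {
    queues_for(exchange, bindings)
        .iter()
        .any(|q| *q == consumer_queue)
}
```

Under these assumptions, `reaches("attune.events", "attune.executions.queue", &BINDINGS)` is false, which is the pre-fix situation: the producer's exchange and the consumer's queue were never connected.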
## Solution
Changed `EnforcementCreated` message to use the correct exchange:
**File**: `crates/common/src/mq/messages.rs`
**Before**:
```rust
pub fn exchange(&self) -> String {
    match self {
        Self::EventCreated | Self::EnforcementCreated => "attune.events".to_string(),
        Self::ExecutionRequested | Self::ExecutionStatusChanged | Self::ExecutionCompleted => {
            "attune.executions".to_string()
        }
        // ...
    }
}
```
**After**:
```rust
pub fn exchange(&self) -> String {
    match self {
        Self::EventCreated => "attune.events".to_string(),
        Self::EnforcementCreated => "attune.executions".to_string(),
        Self::ExecutionRequested | Self::ExecutionStatusChanged | Self::ExecutionCompleted => {
            "attune.executions".to_string()
        }
        // ...
    }
}
```
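A regression check for this mapping might look like the following sketch. The `MessageType` enum here is a hypothetical stand-in that models only the variants visible in the excerpt above (the real enum in `messages.rs` has more, elided by `// ...`), and `misrouted` is an illustrative helper, not an existing function.

```rust
/// Hypothetical stand-in for the message enum; only the variants shown
/// in the excerpt are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageType {
    EventCreated,
    EnforcementCreated,
    ExecutionRequested,
    ExecutionStatusChanged,
    ExecutionCompleted,
}

impl MessageType {
    /// Exchange each message type is published to (post-fix mapping).
    pub fn exchange(&self) -> String {
        match self {
            Self::EventCreated => "attune.events".to_string(),
            Self::EnforcementCreated
            | Self::ExecutionRequested
            | Self::ExecutionStatusChanged
            | Self::ExecutionCompleted => "attune.executions".to_string(),
        }
    }
}

/// Returns the message types that are NOT published to `expected`.
pub fn misrouted(types: &[MessageType], expected: &str) -> Vec<MessageType> {
    types
        .iter()
        .copied()
        .filter(|t| t.exchange().as_str() != expected)
        .collect()
}
```

Calling `misrouted` with the execution-lifecycle variants and `"attune.executions"` should return an empty list; with the pre-fix mapping it would have returned `EnforcementCreated`.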
### Message Flow (After Fix)
```
Sensor Rule Matcher
↓ (publishes EnforcementCreated)
attune.executions exchange
↓ (routed to)
attune.executions.queue
↓ (consumed by)
Executor Enforcement Processor
↓ (processes enforcement)
Execution Created ✓
```
## Implementation Details
### Files Modified
**`crates/common/src/mq/messages.rs`:**
- Moved `EnforcementCreated` from `attune.events` to `attune.executions` exchange
- Maintains routing key: `enforcement.created`
- All execution-related messages now use same exchange
### Architecture Rationale
**Exchange Purpose Clarification:**
- `attune.events`: For event generation and monitoring
  - `EventCreated` messages
- `attune.executions`: For execution lifecycle management
  - `EnforcementCreated` (triggers execution creation)
  - `ExecutionRequested` (worker assignment)
  - `ExecutionStatusChanged` (status updates)
  - `ExecutionCompleted` (completion notifications)
  - `InquiryCreated`/`InquiryResponded` (human-in-the-loop)
- `attune.notifications`: For notification delivery
  - `NotificationCreated` messages
## Testing
After the fix, the complete flow should work:
1. **Timer triggers** generate events (already working)
2. **Rule matcher** creates enforcements (already working)
3. **Enforcement messages** published to correct exchange (FIXED)
4. **Executor** receives and processes enforcements (now works)
5. **Executions** are created in database
6. **Worker** receives execution requests
7. **Actions** are executed
### Verification Steps
After restarting services with the fix:
```bash
# Wait for a few timer events (10-20 seconds)
sleep 20
# Check enforcements (should have new ones)
psql -U postgres -d attune -c "SELECT COUNT(*) FROM attune.enforcement;"
# Check executions (should now have entries!)
psql -U postgres -d attune -c "SELECT COUNT(*) FROM attune.execution;"
# Query via API
curl -X 'GET' 'http://localhost:8080/api/v1/executions?page=1&per_page=50'
```
Expected result:
- Executions table has records
- API returns execution data
- Worker logs show action execution
## Impact
- **Critical Fix**: Enables the entire execution pipeline
- **No Breaking Changes**: Only affects internal message routing
- **Backward Compatible**: Existing events and enforcements unaffected
- **Performance**: No added overhead; messages now reach the correct consumer
## Related Components
### Services Affected
- **Sensor Service**: Needs restart to publish to correct exchange
- **Executor Service**: No changes needed, already consuming from correct queue
- ⚠️ **API Service**: May need restart to show updated execution data
### Message Types Not Affected
- `EventCreated` - Still uses `attune.events` (correct)
- `ExecutionRequested`, `ExecutionStatusChanged`, `ExecutionCompleted` - Already using `attune.executions` (correct)
- `NotificationCreated` - Still uses `attune.notifications` (correct)
## Deployment Steps
1. **Rebuild affected services**:
```bash
cargo build -p attune-sensor
cargo build -p attune-executor # Rebuild so it links the updated common lib
```
2. **Restart services** (in order):
```bash
# Stop old processes
pkill attune-sensor
pkill attune-executor
# Start with new binary
cargo run -p attune-sensor &
cargo run -p attune-executor &
```
3. **Verify** executions are being created:
```bash
# Wait for at least one timer event (timers fire every 10 seconds)
sleep 15
# Check database
psql -U postgres -d attune -c \
"SELECT id, status, action_ref, created FROM attune.execution ORDER BY created DESC LIMIT 5;"
```
## Lessons Learned
### Message Routing Design Principles
1. **Group messages by lifecycle domain**, not by source service
2. **Enforcement is part of execution lifecycle**, not event monitoring
3. **Use exchange names that reflect message purpose**, not service names
4. **Document message routing** to prevent similar issues
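One way to apply the first principle is to derive each message's exchange from its lifecycle domain rather than assigning it per message. The sketch below illustrates that idea; `Domain` and `Message` are hypothetical names, the variant list is taken from the Architecture Rationale section, and the real `messages.rs` may be structured differently.

```rust
/// Lifecycle domains named in this document; each owns exactly one exchange.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Domain {
    Events,
    Executions,
    Notifications,
}

impl Domain {
    pub fn exchange(self) -> &'static str {
        match self {
            Domain::Events => "attune.events",
            Domain::Executions => "attune.executions",
            Domain::Notifications => "attune.notifications",
        }
    }
}

/// Hypothetical message enum covering the types listed in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Message {
    EventCreated,
    EnforcementCreated,
    ExecutionRequested,
    ExecutionStatusChanged,
    ExecutionCompleted,
    InquiryCreated,
    InquiryResponded,
    NotificationCreated,
}

impl Message {
    /// The routing decision is made here: which lifecycle owns the message.
    pub fn domain(self) -> Domain {
        match self {
            Message::EventCreated => Domain::Events,
            Message::EnforcementCreated
            | Message::ExecutionRequested
            | Message::ExecutionStatusChanged
            | Message::ExecutionCompleted
            | Message::InquiryCreated
            | Message::InquiryResponded => Domain::Executions,
            Message::NotificationCreated => Domain::Notifications,
        }
    }

    /// The exchange follows from the domain, never from the producing service.
    pub fn exchange(self) -> &'static str {
        self.domain().exchange()
    }
}
```

With this shape, adding a message type forces an explicit choice of domain (the `match` is exhaustive), so a message cannot quietly end up on the exchange of whichever service happens to produce it.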
### Debugging Message Queue Issues
1. **Check both producer and consumer** when messages aren't flowing
2. **Verify exchange bindings** match expected routing
3. **Monitor queue depths** to detect accumulation
4. **Use message tracing** for production debugging
### Architecture Documentation Needed
- [ ] Document message routing topology
- [ ] Create message flow diagrams
- [ ] Add routing decision matrix
- [ ] Document exchange purposes
## Next Steps
- [ ] Verify complete flow with worker execution
- [ ] Add integration test for enforcement → execution flow
- [ ] Document message routing in architecture docs
- [ ] Consider adding dead letter queue monitoring
- [ ] Add metrics for message routing success/failure
## Notes
- This was a subtle bug that only manifested in the integration between services
- Individual services were working correctly in isolation
- Proper message routing is critical for distributed system reliability
- Exchange naming should reflect message purpose, not producer service