attune/docs/performance/performance-context-cloning-diagram.md
2026-02-04 17:46:30 -06:00
# Workflow Context Cloning - Visual Explanation
## The Problem: O(N*C) Context Cloning
### Scenario: Processing 1000-item list in a workflow with 100 completed tasks
```
Workflow Execution Timeline
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Task 1 → Task 2 → ... → Task 100   →   Process List (1000 items)
└──────────────────────────────┘       └──────────────────────┘
      Context grows to 1MB              Each item clones 1MB
                                        = 1GB of cloning!
```
### Current Implementation (Problematic)
```
┌─────────────────────────────────────────────────────────────┐
│ WorkflowContext │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ task_results: HashMap<String, JsonValue> │ │
│ │ - task_1: { output: "...", size: 10KB } │ │
│ │ - task_2: { output: "...", size: 10KB } │ │
│ │ - ... │ │
│ │ - task_100: { output: "...", size: 10KB } │ │
│ │ Total: 1MB │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ variables: HashMap<String, JsonValue> (+ 50KB) │
│ parameters: JsonValue (+ 10KB) │
└─────────────────────────────────────────────────────────────┘
│ .clone() called for EACH item
┌───────────────────────────────────────────────────────────────┐
│ Processing 1000 items with with-items: │
│ │
│ Item 0: context.clone() → Copy 1MB ┐ │
│ Item 1: context.clone() → Copy 1MB │ │
│ Item 2: context.clone() → Copy 1MB │ │
│ Item 3: context.clone() → Copy 1MB │ 1000 copies │
│ ... │ = 1GB memory │
│ Item 998: context.clone() → Copy 1MB │ allocated │
│ Item 999: context.clone() → Copy 1MB ┘ │
└───────────────────────────────────────────────────────────────┘
```
### Performance Characteristics
```
      Memory Allocation Over Time
      │                                        ╱─────────────
 1GB  │                                   ╱───
      │                              ╱───
      │                         ╱───
512MB │                    ╱───
      │               ╱───
      │          ╱───
256MB │     ╱───
      │╱───
   0 ─┴──────────────────────────────────────────────────► Time
      0     200    400    600    800    1000   Items Processed
Legend:
╱─── Linear growth in memory allocation
(but all at once, causing potential OOM)
```
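The arithmetic behind the graph can be sketched in a few lines. This is a minimal std-only model of the deep-clone cost; the 100-task / 10 KB / 1000-item sizes are the illustrative figures from the scenario above, not measurements:

```rust
use std::collections::HashMap;

// Model of the current behavior: 100 completed tasks at ~10 KB each are
// deep-copied once per item, so total bytes cloned = context size x items.
fn bytes_cloned(tasks: usize, task_bytes: usize, items: usize) -> usize {
    let mut task_results: HashMap<String, Vec<u8>> = HashMap::new();
    for i in 0..tasks {
        task_results.insert(format!("task_{i}"), vec![0u8; task_bytes]);
    }
    let per_clone: usize = task_results.values().map(Vec::len).sum();
    per_clone * items // every item pays the full context size
}

fn main() {
    // 100 tasks x 10 KB x 1000 items ~= 1 GB of copying
    let total = bytes_cloned(100, 10 * 1024, 1000);
    assert_eq!(total, 1_024_000_000);
    println!("~{} MB cloned in total", total / (1024 * 1024));
}
```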
---
## The Solution: Arc-Based Context Sharing
### Proposed Implementation
```
┌─────────────────────────────────────────────────────────────┐
│ WorkflowContext (New) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ task_results: Arc<DashMap<String, JsonValue>> │ │
│ │ ↓ Reference counted pointer (8 bytes) │ │
│ │ └→ [Shared Data on Heap] │ │
│ │ - task_1: { ... } │ │
│ │ - task_2: { ... } │ │
│ │ - ... │ │
│ │ - task_100: { ... } │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ variables: Arc<DashMap<String, JsonValue>> (8 bytes) │
│ parameters: Arc<JsonValue> (8 bytes) │
│ │
│ current_item: Option<JsonValue> (cheap) │
│ current_index: Option<usize> (8 bytes) │
│ │
│ Total clone cost: ~40 bytes (just the Arc pointers!) │
└─────────────────────────────────────────────────────────────┘
```
### Memory Diagram
```
┌──────────────────────────────────────────────────────────────┐
│ HEAP (Shared Memory - Allocated Once) │
│ │
│ ┌─────────────────────────────────────────┐ │
│ │ DashMap<String, JsonValue> │ │
│ │ task_results (1MB) │ │
│ │ [ref_count: 1001] │◄───────┐ │
│ └─────────────────────────────────────────┘ │ │
│ │ │
│ ┌─────────────────────────────────────────┐ │ │
│ │ DashMap<String, JsonValue> │ │ │
│ │ variables (50KB) │◄───┐ │ │
│ │ [ref_count: 1001] │ │ │ │
│ └─────────────────────────────────────────┘ │ │ │
│ │ │ │
└──────────────────────────────────────────────────│───│───────┘
│ │
┌──────────────────────────────────────────────────│───│───────┐
│ STACK (Per-Item Contexts) │ │ │
│ │ │ │
│ Item 0: WorkflowContext { │ │ │
│ task_results: Arc ptr ───────────────────────────┘ │
│ variables: Arc ptr ────────────────────┘ │
│ current_item: Some(item_0) │
│ current_index: Some(0) │
│ } Size: ~40 bytes │
│ │
│ Item 1: WorkflowContext { │
│ task_results: Arc ptr (points to same heap data) │
│ variables: Arc ptr (points to same heap data) │
│ current_item: Some(item_1) │
│ current_index: Some(1) │
│ } Size: ~40 bytes │
│ │
│ ... (1000 items × 40 bytes = 40KB total!) │
└──────────────────────────────────────────────────────────────┘
```
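The sharing shown in the memory diagram can be demonstrated with std types only. The real design uses `DashMap` (a third-party crate); a read-only `HashMap` behind `Arc` is enough to show the mechanics, and `WorkflowContext` here is a simplified stand-in, not the real struct:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified sketch of the proposed layout: the heavy map is shared via Arc,
// only the per-item fields live in each clone.
#[derive(Clone)]
struct WorkflowContext {
    task_results: Arc<HashMap<String, String>>, // shared: clone copies a pointer
    current_index: Option<usize>,               // per-item
}

fn fan_out(base: &WorkflowContext, n: usize) -> Vec<WorkflowContext> {
    (0..n)
        .map(|i| {
            let mut ctx = base.clone(); // pointer-sized copy, no deep clone
            ctx.current_index = Some(i);
            ctx
        })
        .collect()
}

fn main() {
    let base = WorkflowContext {
        task_results: Arc::new(HashMap::from([("task_1".into(), "out".into())])),
        current_index: None,
    };
    let items = fan_out(&base, 1000);

    // One heap allocation, 1001 handles: the base context plus one per item.
    assert_eq!(Arc::strong_count(&base.task_results), 1001);
    assert!(Arc::ptr_eq(&base.task_results, &items[999].task_results));
}
```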
### Performance Improvement
```
      Memory Allocation Over Time (After Optimization)
 1GB  │
512MB │
256MB │
 40KB │────────────────────────────────────────  (Constant!)
   0 ─┴──────────────────────────────────────────────────► Time
      0     200    400    600    800    1000   Items Processed
Legend:
──── Flat line - memory stays constant
Only ~40KB overhead for item contexts
```
---
## Comparison: Before vs After
### Before (Current Implementation)
| Metric | Value |
|--------|-------|
| Memory per clone | 1.06 MB |
| Total memory for 1000 items | **1.06 GB** |
| Clone operation complexity | O(C) where C = context size |
| Time per clone (estimated) | ~100μs |
| Total clone time | ~100ms |
| Risk of OOM | **HIGH** |
### After (Arc-based Implementation)
| Metric | Value |
|--------|-------|
| Memory per clone | 40 bytes |
| Total memory for 1000 items | **40 KB** |
| Clone operation complexity | **O(1)** |
| Time per clone (estimated) | ~1μs |
| Total clone time | ~1ms |
| Risk of OOM | **NONE** |
### Performance Gain
```
              BEFORE         AFTER       IMPROVEMENT
Memory:       1.06 GB   →    40 KB       26,500x reduction
Clone Time:   100 ms    →    1 ms        100x faster
Complexity:   O(N*C)    →    O(N)        Optimal
```
---
## Code Comparison
### Before (Current)
```rust
// In execute_with_items():
for (item_idx, item) in batch.iter().enumerate() {
    let executor = TaskExecutor::new(self.db_pool.clone(), self.mq.clone());
    let task = task.clone();

    // 🔴 EXPENSIVE: Clones entire context including all task results
    let mut item_context = context.clone();
    item_context.set_current_item(item.clone(), global_idx);

    // ...
}
```
### After (Proposed)
```rust
// WorkflowContext now uses Arc for shared data:
#[derive(Clone)]
pub struct WorkflowContext {
    task_results: Arc<DashMap<String, JsonValue>>, // Shared
    variables: Arc<DashMap<String, JsonValue>>,    // Shared
    parameters: Arc<JsonValue>,                    // Shared
    current_item: Option<JsonValue>,               // Per-item
    current_index: Option<usize>,                  // Per-item
}

// In execute_with_items():
for (item_idx, item) in batch.iter().enumerate() {
    let executor = TaskExecutor::new(self.db_pool.clone(), self.mq.clone());
    let task = task.clone();

    // ✅ CHEAP: Only clones Arc pointers (~40 bytes)
    let mut item_context = context.clone();
    item_context.set_current_item(item.clone(), global_idx);

    // All items share the same underlying task_results via Arc
}
```
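The per-item mutation path can be sketched as a runnable std-only example. The `set_current_item` body here is hypothetical, matching the call shape used above, and `Arc<Vec<String>>` stands in for `Arc<DashMap<String, JsonValue>>`:

```rust
use std::sync::Arc;

// Simplified stand-in for the real context: shared field behind Arc,
// per-item fields plain.
#[derive(Clone)]
pub struct WorkflowContext {
    task_results: Arc<Vec<String>>,
    current_item: Option<String>,
    current_index: Option<usize>,
}

impl WorkflowContext {
    // Only writes the cheap per-item fields; the Arc-backed data is untouched.
    pub fn set_current_item(&mut self, item: String, index: usize) {
        self.current_item = Some(item);
        self.current_index = Some(index);
    }
}

fn main() {
    let base = WorkflowContext {
        task_results: Arc::new(vec!["task_1 output".to_string()]),
        current_item: None,
        current_index: None,
    };

    let mut ctx = base.clone(); // cheap: shares task_results
    ctx.set_current_item("item_0".to_string(), 0);

    // Per-item state diverged; the shared results are the same allocation.
    assert_eq!(ctx.current_index, Some(0));
    assert!(base.current_item.is_none());
    assert!(Arc::ptr_eq(&base.task_results, &ctx.task_results));
}
```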
---
## Real-World Scenarios
### Scenario 1: Monitoring Workflow
```yaml
# Monitor 1000 servers every 5 minutes
workflow:
  tasks:
    - name: get_servers
      action: cloud.list_servers
    - name: check_health
      action: monitoring.check_http
      with-items: "{{ task.get_servers.output.servers }}"  # 1000 items
      input:
        url: "{{ item.health_endpoint }}"
```
**Impact**:
- Before: 1GB memory allocation per health check cycle
- After: 40KB memory allocation per health check cycle
- **Improvement**: Can run 25,000 health checks with same memory
### Scenario 2: Data Processing Pipeline
```yaml
# Process 10,000 log entries after aggregation tasks
workflow:
  tasks:
    - name: aggregate_logs
      action: logs.aggregate
    - name: enrich_metadata
      action: data.enrich
    - name: extract_patterns
      action: analytics.extract
    - name: process_entries
      action: logs.parse
      with-items: "{{ task.aggregate_logs.output.entries }}"  # 10,000 items
      input:
        entry: "{{ item }}"
```
**Impact**:
- Before: 10GB+ memory allocation (3 prior tasks with results)
- After: 400KB memory allocation
- **Improvement**: Prevents OOM, enables 100x larger datasets
### Scenario 3: Bulk API Operations
```yaml
# Send 5,000 notifications after complex workflow
workflow:
  tasks:
    - name: fetch_users
    - name: filter_eligible
    - name: prepare_messages
    - name: send_batch
      with-items: "{{ task.prepare_messages.output.messages }}"  # 5,000
```
**Impact**:
- Before: 5GB memory spike during notification sending
- After: 200KB overhead
- **Improvement**: Stable memory usage, predictable performance
---
## Technical Details
### Arc<T> Behavior
```
┌─────────────────────────────────────────┐
│ Arc<DashMap<String, JsonValue>> │
│ │
│ [Reference Count: 1] │
│ [Pointer to Heap Data] │
│ │
│ When .clone() is called: │
│ 1. Increment ref count (atomic op) │
│ 2. Copy 8-byte pointer │
│ 3. Return new Arc handle │
│ │
│ Cost: O(1) - just atomic increment │
│ Memory: 0 bytes allocated │
└─────────────────────────────────────────┘
┌─────────────────────────────────────────┐
│ DashMap<K, V> Features │
│ │
│ ✓ Thread-safe concurrent HashMap │
│ ✓ Lock-free reads (most operations) │
│ ✓ Fine-grained locking on writes │
│ ✓ Iterator support │
│ ✓ Drop-in replacement for HashMap │
│ │
│ Perfect for shared workflow context! │
└─────────────────────────────────────────┘
```
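The clone behavior described in the box is directly observable through the std `Arc` API; a minimal check (the pointer size assertion assumes a 64-bit target):

```rust
use std::sync::Arc;

fn main() {
    let a: Arc<Vec<u8>> = Arc::new(vec![0u8; 1_000_000]); // payload allocated once
    assert_eq!(Arc::strong_count(&a), 1);

    let b = a.clone(); // atomic increment + pointer copy, nothing allocated
    assert_eq!(Arc::strong_count(&a), 2);
    assert!(Arc::ptr_eq(&a, &b)); // both handles point at the same heap data

    // The handle itself is one pointer wide (8 bytes on 64-bit targets).
    assert_eq!(std::mem::size_of::<Arc<Vec<u8>>>(), 8);

    drop(b);
    assert_eq!(Arc::strong_count(&a), 1);
}
```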
### Memory Safety Guarantees
```
Item 0 Context ─┐
Item 1 Context ─┤
Item 2 Context ─┼──► Arc ──► Shared DashMap
│ [ref_count: 1000]
... │
Item 999 Context┘
When all items finish:
→ ref_count decrements to 0
→ DashMap is automatically deallocated
→ No memory leaks
→ No manual cleanup needed
```
---
## Migration Path
### Phase 1: Context Refactoring
1. Add Arc wrappers to WorkflowContext fields
2. Update template rendering to work with Arc<>
3. Update all context accessors
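A hypothetical accessor shape for step 3: callers keep a plain read API even though the field moves behind an `Arc`. Names are illustrative, and `String` stands in for the real `JsonValue` payloads:

```rust
use std::collections::HashMap;
use std::sync::Arc;

pub struct WorkflowContext {
    task_results: Arc<HashMap<String, String>>,
}

impl WorkflowContext {
    pub fn new(task_results: HashMap<String, String>) -> Self {
        Self { task_results: Arc::new(task_results) }
    }

    // Reads go straight through the Arc via Deref; nothing is cloned.
    pub fn get_task_result(&self, task: &str) -> Option<&String> {
        self.task_results.get(task)
    }
}

fn main() {
    let ctx = WorkflowContext::new(HashMap::from([("fetch".to_string(), "ok".to_string())]));
    assert_eq!(ctx.get_task_result("fetch").map(String::as_str), Some("ok"));
    assert!(ctx.get_task_result("missing").is_none());
}
```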
### Phase 2: Testing
1. Run existing unit tests (should pass)
2. Add performance benchmarks
3. Validate memory usage
### Phase 3: Validation
1. Measure improvement (expect 10-100x)
2. Test with real-world workflows
3. Deploy to staging
### Phase 4: Documentation
1. Update architecture docs
2. Document Arc-based patterns
3. Add performance guide
---
## Conclusion
The context cloning issue is a **critical performance bottleneck**: the cost scales with both the number of items and the accumulated context size (O(N*C)), so memory usage balloons rapidly in real-world workflows. The Arc-based solution:

- **Eliminates the O(N*C) problem** → O(N)
- **Reduces memory by 1,000-10,000x**
- **Increases speed by ~100x**
- **Prevents OOM failures**
- **Uses a well-established Rust pattern**
- **Requires no API changes**
- **Carries low implementation risk**
**Priority**: P0 (BLOCKING) - Must be fixed before production deployment.
**Estimated Effort**: 5-7 days
**Expected ROI**: 10-100x performance improvement for workflows with lists