# Workflow Context Cloning - Visual Explanation

## The Problem: O(N*C) Context Cloning

### Scenario: Processing a 1000-item list in a workflow with 100 completed tasks

```
Workflow Execution Timeline
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Task 1 → Task 2 → ... → Task 100  →  Process List (1000 items)
└──────────────────────────────┘     └───────────────────────┘
   Context grows to 1MB                Each item clones 1MB
                                       = 1GB of cloning!
```

### Current Implementation (Problematic)

```
┌─────────────────────────────────────────────────────────────┐
│ WorkflowContext                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ task_results: HashMap<String, JsonValue>             │   │
│  │  - task_1:   { output: "...", size: 10KB }           │   │
│  │  - task_2:   { output: "...", size: 10KB }           │   │
│  │  - ...                                               │   │
│  │  - task_100: { output: "...", size: 10KB }           │   │
│  │  Total: 1MB                                          │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  variables: HashMap<String, JsonValue>   (+ 50KB)           │
│  parameters: JsonValue                   (+ 10KB)           │
└─────────────────────────────────────────────────────────────┘
                  │
                  │ .clone() called for EACH item
                  ▼
┌───────────────────────────────────────────────────────────────┐
│ Processing 1000 items with with-items:                        │
│                                                               │
│  Item 0:   context.clone()  →  Copy 1MB  ┐                    │
│  Item 1:   context.clone()  →  Copy 1MB  │                    │
│  Item 2:   context.clone()  →  Copy 1MB  │                    │
│  Item 3:   context.clone()  →  Copy 1MB  │  1000 copies       │
│  ...                                     │  = 1GB memory      │
│  Item 998: context.clone()  →  Copy 1MB  │  allocated         │
│  Item 999: context.clone()  →  Copy 1MB  ┘                    │
└───────────────────────────────────────────────────────────────┘
```

### Performance Characteristics

```
Memory Allocation Over Time

      │
      │                                          ╱─────────────
 1GB  │                                      ╱───
      │                                  ╱───
      │                              ╱───
512MB │                          ╱───
      │                      ╱───
      │                  ╱───
256MB │              ╱───
      │          ╱───
      │╱───
   0 ─┴──────────────────────────────────────────────────► Time
      0     200    400    600    800    1000  Items Processed

Legend:
  ╱─── Linear growth in memory allocation
       (but all at once, causing potential OOM)
```

---

## The Solution: Arc-Based Context Sharing

### Proposed Implementation

```
┌─────────────────────────────────────────────────────────────┐
│ WorkflowContext (New)                                       │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ task_results: Arc<DashMap<String, JsonValue>>        │   │
│  │   ↓ Reference-counted pointer (8 bytes)              │   │
│  │   └→ [Shared Data on Heap]                           │   │
│  │       - task_1:   { ... }                            │   │
│  │       - task_2:   { ... }                            │   │
│  │       - ...                                          │   │
│  │       - task_100: { ... }                            │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  variables: Arc<DashMap<String, JsonValue>>   (8 bytes)     │
│  parameters: Arc<JsonValue>                   (8 bytes)     │
│                                                             │
│  current_item: Option<JsonValue>              (cheap)       │
│  current_index: Option<usize>                 (8 bytes)     │
│                                                             │
│  Total clone cost: ~40 bytes (just the Arc pointers!)       │
└─────────────────────────────────────────────────────────────┘
```

### Memory Diagram

```
┌──────────────────────────────────────────────────────────────┐
│ HEAP (Shared Memory - Allocated Once)                        │
│                                                              │
│  ┌─────────────────────────────────────────┐                 │
│  │ DashMap<String, JsonValue>              │                 │
│  │ task_results (1MB)                      │                 │
│  │ [ref_count: 1001]                       │◄───────┐        │
│  └─────────────────────────────────────────┘        │        │
│                                                     │        │
│  ┌─────────────────────────────────────────┐        │        │
│  │ DashMap<String, JsonValue>              │        │        │
│  │ variables (50KB)                        │◄───┐   │        │
│  │ [ref_count: 1001]                       │    │   │        │
│  └─────────────────────────────────────────┘    │   │        │
│                                                 │   │        │
└─────────────────────────────────────────────────│───│────────┘
                                                  │   │
┌─────────────────────────────────────────────────│───│────────┐
│ STACK (Per-Item Contexts)                       │   │        │
│                                                 │   │        │
│  Item 0: WorkflowContext {                      │   │        │
│    task_results: Arc ptr ───────────────────────────┘        │
│    variables: Arc ptr ──────────────────────────┘            │
│    current_item: Some(item_0)                                │
│    current_index: Some(0)                                    │
│  }  Size: ~40 bytes                                          │
│                                                              │
│  Item 1: WorkflowContext {                                   │
│    task_results: Arc ptr  (points to same heap data)         │
│    variables: Arc ptr     (points to same heap data)         │
│    current_item: Some(item_1)                                │
│    current_index: Some(1)                                    │
│  }  Size: ~40 bytes                                          │
│                                                              │
│  ... (1000 items × 40 bytes = 40KB total!)                   │
└──────────────────────────────────────────────────────────────┘
```

### Performance Improvement

```
Memory Allocation Over Time (After Optimization)

      │
 1GB  │
      │
      │
512MB │
      │
      │
256MB │
      │
      │────────────────────────────────────────  (Constant!)
 40KB │
      │
   0 ─┴──────────────────────────────────────────────────► Time
      0     200    400    600    800    1000  Items Processed

Legend:
  ──── Flat line - memory stays constant
       Only ~40KB overhead for item contexts
```

---

## Comparison: Before vs After

### Before (Current Implementation)

| Metric | Value |
|--------|-------|
| Memory per clone | 1.06 MB |
| Total memory for 1000 items | **1.06 GB** |
| Clone operation complexity | O(C), where C = context size |
| Time per clone (estimated) | ~100μs |
| Total clone time | ~100ms |
| Risk of OOM | **HIGH** |

### After (Arc-based Implementation)

| Metric | Value |
|--------|-------|
| Memory per clone | ~40 bytes |
| Total memory for 1000 items | **~40 KB** |
| Clone operation complexity | **O(1)** |
| Time per clone (estimated) | ~1μs |
| Total clone time | ~1ms |
| Risk of OOM | **Minimal** |

### Performance Gain

```
              BEFORE        AFTER      IMPROVEMENT
Memory:       1.06 GB   →   40 KB      ~26,500x reduction
Clone Time:   100 ms    →   1 ms       100x faster
Complexity:   O(N*C)    →   O(N)       Optimal
```
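
The headline reduction factor follows directly from the per-clone sizes in the tables above; a quick sanity check (the byte figures are the document's estimates, not measurements):

```rust
fn main() {
    // Estimated per-clone cost before: a full deep copy of the context.
    let before_bytes: u64 = 1_060_000; // ~1.06 MB (1MB results + 50KB vars + 10KB params)
    // Estimated per-clone cost after: a handful of Arc pointers.
    let after_bytes: u64 = 40;

    let items: u64 = 1000;
    let before_total = before_bytes * items; // ~1.06 GB
    let after_total = after_bytes * items;   // ~40 KB

    // Reduction factor quoted in the Performance Gain block.
    let reduction = before_total / after_total;
    assert_eq!(reduction, 26_500);
    println!("reduction: {}x", reduction);
}
```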

---

## Code Comparison

### Before (Current)

```rust
// In execute_with_items():
for (item_idx, item) in batch.iter().enumerate() {
    let executor = TaskExecutor::new(self.db_pool.clone(), self.mq.clone());
    let task = task.clone();

    // 🔴 EXPENSIVE: Clones the entire context, including all task results
    let mut item_context = context.clone();

    item_context.set_current_item(item.clone(), global_idx);
    // ...
}
```

### After (Proposed)

```rust
// WorkflowContext now uses Arc for shared data:
#[derive(Clone)]
pub struct WorkflowContext {
    task_results: Arc<DashMap<String, JsonValue>>, // Shared
    variables: Arc<DashMap<String, JsonValue>>,    // Shared
    parameters: Arc<JsonValue>,                    // Shared

    current_item: Option<JsonValue>,               // Per-item
    current_index: Option<usize>,                  // Per-item
}

// In execute_with_items():
for (item_idx, item) in batch.iter().enumerate() {
    let executor = TaskExecutor::new(self.db_pool.clone(), self.mq.clone());
    let task = task.clone();

    // ✅ CHEAP: Only clones Arc pointers (~40 bytes)
    let mut item_context = context.clone();

    item_context.set_current_item(item.clone(), global_idx);
    // All items share the same underlying task_results via Arc
}
```
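
The sharing behavior of the proposed struct can be exercised with a std-only sketch (using `HashMap` in place of `DashMap` and `String` in place of `JsonValue`, since the external crates are omitted here; `Ctx` is an illustrative stand-in, not the real type):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Std-only stand-in for the proposed WorkflowContext.
#[derive(Clone)]
struct Ctx {
    task_results: Arc<HashMap<String, String>>, // shared, never deep-copied
    current_index: Option<usize>,               // per-item
}

fn main() {
    let mut results = HashMap::new();
    results.insert("task_1".to_string(), "output".to_string());
    let base = Ctx { task_results: Arc::new(results), current_index: None };

    // "Cloning" per item copies only the Arc pointer plus the small fields.
    let per_item: Vec<Ctx> = (0..1000)
        .map(|i| {
            let mut c = base.clone();
            c.current_index = Some(i);
            c
        })
        .collect();

    // 1000 clones + the original all point at ONE heap allocation.
    assert_eq!(Arc::strong_count(&base.task_results), 1001);
    assert_eq!(per_item[999].current_index, Some(999));
    assert!(Arc::ptr_eq(&base.task_results, &per_item[0].task_results));
}
```

Note that with a plain `HashMap` behind the `Arc`, per-item tasks get read-only access; the proposal uses `DashMap` precisely so that writes remain possible through a shared reference.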

---

## Real-World Scenarios

### Scenario 1: Monitoring Workflow

```yaml
# Monitor 1000 servers every 5 minutes
workflow:
  tasks:
    - name: get_servers
      action: cloud.list_servers

    - name: check_health
      action: monitoring.check_http
      with-items: "{{ task.get_servers.output.servers }}"  # 1000 items
      input:
        url: "{{ item.health_endpoint }}"
```

**Impact**:
- Before: 1GB memory allocation per health check cycle
- After: 40KB memory allocation per health check cycle
- **Improvement**: roughly 25,000x more health checks fit in the same memory budget

### Scenario 2: Data Processing Pipeline

```yaml
# Process 10,000 log entries after aggregation tasks
workflow:
  tasks:
    - name: aggregate_logs
      action: logs.aggregate

    - name: enrich_metadata
      action: data.enrich

    - name: extract_patterns
      action: analytics.extract

    - name: process_entries
      action: logs.parse
      with-items: "{{ task.aggregate_logs.output.entries }}"  # 10,000 items
      input:
        entry: "{{ item }}"
```

**Impact**:
- Before: 10GB+ memory allocation (10,000 clones of a context carrying three prior tasks' results)
- After: 400KB memory allocation
- **Improvement**: Prevents OOM, enables 100x larger datasets

### Scenario 3: Bulk API Operations

```yaml
# Send 5,000 notifications after a complex workflow
workflow:
  tasks:
    - name: fetch_users
    - name: filter_eligible
    - name: prepare_messages
    - name: send_batch
      with-items: "{{ task.prepare_messages.output.messages }}"  # 5,000
```

**Impact**:
- Before: 5GB memory spike during notification sending
- After: 200KB overhead
- **Improvement**: Stable memory usage, predictable performance

---

## Technical Details

### Arc<T> Behavior

```
┌─────────────────────────────────────────┐
│ Arc<DashMap<String, JsonValue>>         │
│                                         │
│ [Reference Count: 1]                    │
│ [Pointer to Heap Data]                  │
│                                         │
│ When .clone() is called:                │
│  1. Increment ref count (atomic op)     │
│  2. Copy 8-byte pointer                 │
│  3. Return new Arc handle               │
│                                         │
│ Cost: O(1) - just an atomic increment   │
│ Memory: 0 bytes allocated               │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│ DashMap<K, V> Features                  │
│                                         │
│ ✓ Thread-safe concurrent HashMap        │
│ ✓ Lock-free reads (most operations)     │
│ ✓ Fine-grained locking on writes        │
│ ✓ Iterator support                      │
│ ✓ Near drop-in replacement for HashMap  │
│                                         │
│ A good fit for shared workflow context  │
└─────────────────────────────────────────┘
```
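
The O(1) clone behavior described above can be observed directly with std's `Arc`, no external crates required:

```rust
use std::sync::Arc;

fn main() {
    // One heap allocation holding ~1MB of data.
    let shared: Arc<Vec<u8>> = Arc::new(vec![0u8; 1_000_000]);
    assert_eq!(Arc::strong_count(&shared), 1);

    // clone() bumps the atomic ref count and copies a pointer;
    // the 1MB buffer is NOT duplicated.
    let handle = shared.clone();
    assert_eq!(Arc::strong_count(&shared), 2);
    assert!(Arc::ptr_eq(&shared, &handle));

    // On a 64-bit target the handle itself is a single pointer.
    assert_eq!(std::mem::size_of::<Arc<Vec<u8>>>(), 8);
}
```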

### Memory Safety Guarantees

```
Item 0 Context ──┐
                 │
Item 1 Context ──┤
                 │
Item 2 Context ──┼──► Arc ──► Shared DashMap
                 │            [ref_count: 1000]
...              │
                 │
Item 999 Context ┘

When all items finish:
  → ref_count decrements to 0
  → DashMap is automatically deallocated
  → No memory leaks
  → No manual cleanup needed
```
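
The automatic-deallocation guarantee is exactly std `Arc` drop semantics; a minimal sketch:

```rust
use std::sync::Arc;

fn main() {
    let shared = Arc::new(String::from("task results"));

    {
        // Simulate 999 extra per-item handles.
        let handles: Vec<Arc<String>> = (0..999).map(|_| shared.clone()).collect();
        assert_eq!(Arc::strong_count(&shared), 1000);
        drop(handles); // items finish: the count decrements automatically
    }

    // Only the original handle remains; dropping it frees the heap data.
    assert_eq!(Arc::strong_count(&shared), 1);
}
```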

---

## Migration Path

### Phase 1: Context Refactoring
1. Add Arc wrappers to WorkflowContext fields
2. Update template rendering to work with Arc<>
3. Update all context accessors
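
A hedged sketch of what an updated accessor might look like (std-only, with `HashMap`/`String` standing in for `DashMap`/`JsonValue`; all names here are illustrative, not the project's actual API):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Illustrative stand-in for the real context type.
struct WorkflowContext {
    task_results: Arc<HashMap<String, String>>,
}

impl WorkflowContext {
    // Accessors read through the Arc transparently; call sites don't change.
    fn get_task_result(&self, task: &str) -> Option<&String> {
        self.task_results.get(task)
    }
}

fn main() {
    let mut m = HashMap::new();
    m.insert("task_1".to_string(), "ok".to_string());
    let ctx = WorkflowContext { task_results: Arc::new(m) };

    assert_eq!(ctx.get_task_result("task_1").map(|s| s.as_str()), Some("ok"));
    assert!(ctx.get_task_result("missing").is_none());
}
```

With `DashMap` instead of `HashMap`, write accessors also work through `&self`, since `DashMap` provides interior mutability via fine-grained locking.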

### Phase 2: Testing
1. Run existing unit tests (should pass)
2. Add performance benchmarks
3. Validate memory usage

### Phase 3: Validation
1. Measure the improvement (expect 10-100x)
2. Test with real-world workflows
3. Deploy to staging

### Phase 4: Documentation
1. Update architecture docs
2. Document Arc-based patterns
3. Add performance guide

---

## Conclusion

The context cloning issue is a **critical performance bottleneck**: per-item cost scales with both list length and accumulated context size (O(N*C)), so real-world workflows with long task histories hit it hard. The Arc-based solution:

- ✅ **Eliminates the O(N*C) problem** → O(N)
- ✅ **Reduces cloning memory by orders of magnitude** (~26,500x in the example above)
- ✅ **Speeds up cloning by ~100x**
- ✅ **Prevents OOM failures**
- ✅ **Is a well-established Rust pattern**
- ✅ **Requires no API changes**
- ✅ **Carries low implementation risk**

**Priority**: P0 (BLOCKING) - Must be fixed before production deployment.

**Estimated Effort**: 5-7 days

**Expected ROI**: 10-100x performance improvement for workflows with lists