# Workflow Context Cloning - Visual Explanation

## The Problem: O(N*C) Context Cloning

### Scenario: Processing a 1000-item list in a workflow with 100 completed tasks

```
Workflow Execution Timeline
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Task 1 → Task 2 → ... → Task 100  →  Process List (1000 items)
└──────────────────────────────┘     └───────────────────────┘
   Context grows to 1MB                Each item clones 1MB
                                       = 1GB of cloning!
```

### Current Implementation (Problematic)

```
┌─────────────────────────────────────────────────────────────┐
│ WorkflowContext                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ task_results: HashMap<String, JsonValue>             │   │
│  │  - task_1:   { output: "...", size: 10KB }           │   │
│  │  - task_2:   { output: "...", size: 10KB }           │   │
│  │  - ...                                               │   │
│  │  - task_100: { output: "...", size: 10KB }           │   │
│  │  Total: 1MB                                          │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  variables: HashMap<String, JsonValue>   (+ 50KB)           │
│  parameters: JsonValue                   (+ 10KB)           │
└─────────────────────────────────────────────────────────────┘
                  │
                  │ .clone() called for EACH item
                  ▼
┌───────────────────────────────────────────────────────────────┐
│ Processing 1000 items with with-items:                        │
│                                                               │
│  Item 0:   context.clone()  →  Copy 1MB  ┐                    │
│  Item 1:   context.clone()  →  Copy 1MB  │                    │
│  Item 2:   context.clone()  →  Copy 1MB  │                    │
│  Item 3:   context.clone()  →  Copy 1MB  │  1000 copies       │
│  ...                                     │  = 1GB memory      │
│  Item 998: context.clone()  →  Copy 1MB  │  allocated         │
│  Item 999: context.clone()  →  Copy 1MB  ┘                    │
└───────────────────────────────────────────────────────────────┘
```

### Performance Characteristics

```
Memory Allocation Over Time

      │
      │                                          ╱─────────────
 1GB  │                                      ╱───
      │                                  ╱───
      │                              ╱───
512MB │                          ╱───
      │                      ╱───
      │                  ╱───
256MB │              ╱───
      │          ╱───
      │╱───
   0 ─┴──────────────────────────────────────────────────► Time
      0     200    400    600    800    1000  Items Processed

Legend:
  ╱─── Linear growth in memory allocation
       (but all at once, causing potential OOM)
```

---

## The Solution: Arc-Based Context Sharing

### Proposed Implementation

```
┌─────────────────────────────────────────────────────────────┐
│ WorkflowContext (New)                                       │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ task_results: Arc<DashMap<String, JsonValue>>        │   │
│  │   ↓ Reference-counted pointer (8 bytes)              │   │
│  │   └→ [Shared Data on Heap]                           │   │
│  │       - task_1:   { ... }                            │   │
│  │       - task_2:   { ... }                            │   │
│  │       - ...                                          │   │
│  │       - task_100: { ... }                            │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
│  variables: Arc<DashMap<String, JsonValue>>   (8 bytes)     │
│  parameters: Arc<JsonValue>                   (8 bytes)     │
│                                                             │
│  current_item: Option<JsonValue>              (cheap)       │
│  current_index: Option<usize>                 (8 bytes)     │
│                                                             │
│  Total clone cost: ~40 bytes (just the Arc pointers!)       │
└─────────────────────────────────────────────────────────────┘
```

### Memory Diagram

```
┌──────────────────────────────────────────────────────────────┐
│ HEAP (Shared Memory - Allocated Once)                        │
│                                                              │
│  ┌─────────────────────────────────────────┐                 │
│  │ DashMap<String, JsonValue>              │                 │
│  │ task_results (1MB)                      │                 │
│  │ [ref_count: 1001]                       │◄───────┐        │
│  └─────────────────────────────────────────┘        │        │
│                                                     │        │
│  ┌─────────────────────────────────────────┐        │        │
│  │ DashMap<String, JsonValue>              │        │        │
│  │ variables (50KB)                        │◄───┐   │        │
│  │ [ref_count: 1001]                       │    │   │        │
│  └─────────────────────────────────────────┘    │   │        │
│                                                 │   │        │
└─────────────────────────────────────────────────│───│────────┘
                                                  │   │
┌─────────────────────────────────────────────────│───│────────┐
│ STACK (Per-Item Contexts)                       │   │        │
│                                                 │   │        │
│  Item 0: WorkflowContext {                      │   │        │
│    task_results: Arc ptr ───────────────────────────┘        │
│    variables: Arc ptr ──────────────────────────┘            │
│    current_item: Some(item_0)                                │
│    current_index: Some(0)                                    │
│  }  Size: ~40 bytes                                          │
│                                                              │
│  Item 1: WorkflowContext {                                   │
│    task_results: Arc ptr  (points to same heap data)         │
│    variables: Arc ptr     (points to same heap data)         │
│    current_item: Some(item_1)                                │
│    current_index: Some(1)                                    │
│  }  Size: ~40 bytes                                          │
│                                                              │
│  ... (1000 items × 40 bytes = 40KB total!)                   │
└──────────────────────────────────────────────────────────────┘
```

### Performance Improvement

```
Memory Allocation Over Time (After Optimization)

      │
 1GB  │
      │
      │
512MB │
      │
      │
256MB │
      │
      │────────────────────────────────────────  (Constant!)
 40KB │
      │
   0 ─┴──────────────────────────────────────────────────► Time
      0     200    400    600    800    1000  Items Processed

Legend:
  ──── Flat line - memory stays constant
       Only ~40KB overhead for item contexts
```

---

## Comparison: Before vs After

### Before (Current Implementation)

| Metric | Value |
|--------|-------|
| Memory per clone | 1.06 MB |
| Total memory for 1000 items | **1.06 GB** |
| Clone operation complexity | O(C), where C = context size |
| Time per clone (estimated) | ~100μs |
| Total clone time | ~100ms |
| Risk of OOM | **HIGH** |

### After (Arc-based Implementation)

| Metric | Value |
|--------|-------|
| Memory per clone | ~40 bytes |
| Total memory for 1000 items | **~40 KB** |
| Clone operation complexity | **O(1)** |
| Time per clone (estimated) | ~1μs |
| Total clone time | ~1ms |
| Risk of OOM | **Minimal** |

### Performance Gain

```
              BEFORE        AFTER      IMPROVEMENT
Memory:       1.06 GB   →   40 KB      ~26,500x reduction
Clone Time:   100 ms    →   1 ms       100x faster
Complexity:   O(N*C)    →   O(N)       Optimal
```
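
The headline reduction factor follows directly from the per-clone sizes in the tables above; a quick sanity check (the byte figures are the document's estimates, not measurements):

```rust
fn main() {
    // Estimated per-clone cost before: a full deep copy of the context.
    let before_bytes: u64 = 1_060_000; // ~1.06 MB (1MB results + 50KB vars + 10KB params)
    // Estimated per-clone cost after: a handful of Arc pointers.
    let after_bytes: u64 = 40;

    let items: u64 = 1000;
    let before_total = before_bytes * items; // ~1.06 GB
    let after_total = after_bytes * items;   // ~40 KB

    // Reduction factor quoted in the Performance Gain block.
    let reduction = before_total / after_total;
    assert_eq!(reduction, 26_500);
    println!("reduction: {}x", reduction);
}
```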

---

## Code Comparison

### Before (Current)

```rust
// In execute_with_items():
for (item_idx, item) in batch.iter().enumerate() {
    let executor = TaskExecutor::new(self.db_pool.clone(), self.mq.clone());
    let task = task.clone();

    // 🔴 EXPENSIVE: Clones the entire context, including all task results
    let mut item_context = context.clone();

    item_context.set_current_item(item.clone(), global_idx);
    // ...
}
```

### After (Proposed)

```rust
// WorkflowContext now uses Arc for shared data:
#[derive(Clone)]
pub struct WorkflowContext {
    task_results: Arc<DashMap<String, JsonValue>>, // Shared
    variables: Arc<DashMap<String, JsonValue>>,    // Shared
    parameters: Arc<JsonValue>,                    // Shared

    current_item: Option<JsonValue>,               // Per-item
    current_index: Option<usize>,                  // Per-item
}

// In execute_with_items():
for (item_idx, item) in batch.iter().enumerate() {
    let executor = TaskExecutor::new(self.db_pool.clone(), self.mq.clone());
    let task = task.clone();

    // ✅ CHEAP: Only clones Arc pointers (~40 bytes)
    let mut item_context = context.clone();

    item_context.set_current_item(item.clone(), global_idx);
    // All items share the same underlying task_results via Arc
}
```
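
The sharing behavior of the proposed struct can be exercised with a std-only sketch (using `HashMap` in place of `DashMap` and `String` in place of `JsonValue`, since the external crates are omitted here; `Ctx` is an illustrative stand-in, not the real type):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Std-only stand-in for the proposed WorkflowContext.
#[derive(Clone)]
struct Ctx {
    task_results: Arc<HashMap<String, String>>, // shared, never deep-copied
    current_index: Option<usize>,               // per-item
}

fn main() {
    let mut results = HashMap::new();
    results.insert("task_1".to_string(), "output".to_string());
    let base = Ctx { task_results: Arc::new(results), current_index: None };

    // "Cloning" per item copies only the Arc pointer plus the small fields.
    let per_item: Vec<Ctx> = (0..1000)
        .map(|i| {
            let mut c = base.clone();
            c.current_index = Some(i);
            c
        })
        .collect();

    // 1000 clones + the original all point at ONE heap allocation.
    assert_eq!(Arc::strong_count(&base.task_results), 1001);
    assert_eq!(per_item[999].current_index, Some(999));
    assert!(Arc::ptr_eq(&base.task_results, &per_item[0].task_results));
}
```

Note that with a plain `HashMap` behind the `Arc`, per-item tasks get read-only access; the proposal uses `DashMap` precisely so that writes remain possible through a shared reference.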

---

## Real-World Scenarios

### Scenario 1: Monitoring Workflow

```yaml
# Monitor 1000 servers every 5 minutes
workflow:
  tasks:
    - name: get_servers
      action: cloud.list_servers

    - name: check_health
      action: monitoring.check_http
      with-items: "{{ task.get_servers.output.servers }}"  # 1000 items
      input:
        url: "{{ item.health_endpoint }}"
```

**Impact**:
- Before: 1GB memory allocation per health check cycle
- After: 40KB memory allocation per health check cycle
- **Improvement**: roughly 25,000x more health checks fit in the same memory budget

### Scenario 2: Data Processing Pipeline

```yaml
# Process 10,000 log entries after aggregation tasks
workflow:
  tasks:
    - name: aggregate_logs
      action: logs.aggregate

    - name: enrich_metadata
      action: data.enrich

    - name: extract_patterns
      action: analytics.extract

    - name: process_entries
      action: logs.parse
      with-items: "{{ task.aggregate_logs.output.entries }}"  # 10,000 items
      input:
        entry: "{{ item }}"
```

**Impact**:
- Before: 10GB+ memory allocation (10,000 clones of a context carrying three prior tasks' results)
- After: 400KB memory allocation
- **Improvement**: Prevents OOM, enables 100x larger datasets

### Scenario 3: Bulk API Operations

```yaml
# Send 5,000 notifications after a complex workflow
workflow:
  tasks:
    - name: fetch_users
    - name: filter_eligible
    - name: prepare_messages
    - name: send_batch
      with-items: "{{ task.prepare_messages.output.messages }}"  # 5,000
```

**Impact**:
- Before: 5GB memory spike during notification sending
- After: 200KB overhead
- **Improvement**: Stable memory usage, predictable performance

---

## Technical Details

### Arc<T> Behavior

```
┌─────────────────────────────────────────┐
│ Arc<DashMap<String, JsonValue>>         │
│                                         │
│ [Reference Count: 1]                    │
│ [Pointer to Heap Data]                  │
│                                         │
│ When .clone() is called:                │
│  1. Increment ref count (atomic op)     │
│  2. Copy 8-byte pointer                 │
│  3. Return new Arc handle               │
│                                         │
│ Cost: O(1) - just an atomic increment   │
│ Memory: 0 bytes allocated               │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│ DashMap<K, V> Features                  │
│                                         │
│ ✓ Thread-safe concurrent HashMap        │
│ ✓ Lock-free reads (most operations)     │
│ ✓ Fine-grained locking on writes        │
│ ✓ Iterator support                      │
│ ✓ Near drop-in replacement for HashMap  │
│                                         │
│ A good fit for shared workflow context  │
└─────────────────────────────────────────┘
```
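
The O(1) clone behavior described above can be observed directly with std's `Arc`, no external crates required:

```rust
use std::sync::Arc;

fn main() {
    // One heap allocation holding ~1MB of data.
    let shared: Arc<Vec<u8>> = Arc::new(vec![0u8; 1_000_000]);
    assert_eq!(Arc::strong_count(&shared), 1);

    // clone() bumps the atomic ref count and copies a pointer;
    // the 1MB buffer is NOT duplicated.
    let handle = shared.clone();
    assert_eq!(Arc::strong_count(&shared), 2);
    assert!(Arc::ptr_eq(&shared, &handle));

    // On a 64-bit target the handle itself is a single pointer.
    assert_eq!(std::mem::size_of::<Arc<Vec<u8>>>(), 8);
}
```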

### Memory Safety Guarantees

```
Item 0 Context ──┐
                 │
Item 1 Context ──┤
                 │
Item 2 Context ──┼──► Arc ──► Shared DashMap
                 │            [ref_count: 1000]
...              │
                 │
Item 999 Context ┘

When all items finish:
  → ref_count decrements to 0
  → DashMap is automatically deallocated
  → No memory leaks
  → No manual cleanup needed
```
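
The automatic-deallocation guarantee is exactly std `Arc` drop semantics; a minimal sketch:

```rust
use std::sync::Arc;

fn main() {
    let shared = Arc::new(String::from("task results"));

    {
        // Simulate 999 extra per-item handles.
        let handles: Vec<Arc<String>> = (0..999).map(|_| shared.clone()).collect();
        assert_eq!(Arc::strong_count(&shared), 1000);
        drop(handles); // items finish: the count decrements automatically
    }

    // Only the original handle remains; dropping it frees the heap data.
    assert_eq!(Arc::strong_count(&shared), 1);
}
```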

---

## Migration Path

### Phase 1: Context Refactoring
1. Add Arc wrappers to WorkflowContext fields
2. Update template rendering to work with Arc<>
3. Update all context accessors
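
A hedged sketch of what an updated accessor might look like (std-only, with `HashMap`/`String` standing in for `DashMap`/`JsonValue`; all names here are illustrative, not the project's actual API):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Illustrative stand-in for the real context type.
struct WorkflowContext {
    task_results: Arc<HashMap<String, String>>,
}

impl WorkflowContext {
    // Accessors read through the Arc transparently; call sites don't change.
    fn get_task_result(&self, task: &str) -> Option<&String> {
        self.task_results.get(task)
    }
}

fn main() {
    let mut m = HashMap::new();
    m.insert("task_1".to_string(), "ok".to_string());
    let ctx = WorkflowContext { task_results: Arc::new(m) };

    assert_eq!(ctx.get_task_result("task_1").map(|s| s.as_str()), Some("ok"));
    assert!(ctx.get_task_result("missing").is_none());
}
```

With `DashMap` instead of `HashMap`, write accessors also work through `&self`, since `DashMap` provides interior mutability via fine-grained locking.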

### Phase 2: Testing
1. Run existing unit tests (should pass)
2. Add performance benchmarks
3. Validate memory usage

### Phase 3: Validation
1. Measure the improvement (expect 10-100x)
2. Test with real-world workflows
3. Deploy to staging

### Phase 4: Documentation
1. Update architecture docs
2. Document Arc-based patterns
3. Add performance guide

---

## Conclusion

The context cloning issue is a **critical performance bottleneck**: per-item cost scales with both list length and accumulated context size (O(N*C)), so real-world workflows with long task histories hit it hard. The Arc-based solution:

- ✅ **Eliminates the O(N*C) problem** → O(N)
- ✅ **Reduces cloning memory by orders of magnitude** (~26,500x in the example above)
- ✅ **Speeds up cloning by ~100x**
- ✅ **Prevents OOM failures**
- ✅ **Is a well-established Rust pattern**
- ✅ **Requires no API changes**
- ✅ **Carries low implementation risk**

**Priority**: P0 (BLOCKING) - Must be fixed before production deployment.

**Estimated Effort**: 5-7 days

**Expected ROI**: 10-100x performance improvement for workflows with lists