# Log Size Limits

## Overview

The log size limits feature prevents out-of-memory (OOM) failures when actions produce large amounts of output. Instead of buffering all stdout/stderr in memory, the worker service streams logs with configurable size limits and appends a truncation notice when a limit is exceeded.

## Configuration

Log size limits are configured in the worker configuration:

```yaml
worker:
  max_stdout_bytes: 10485760  # 10MB (default)
  max_stderr_bytes: 10485760  # 10MB (default)
  stream_logs: true           # Enable log streaming (default)
```

Or via environment variables:

```bash
ATTUNE__WORKER__MAX_STDOUT_BYTES=10485760
ATTUNE__WORKER__MAX_STDERR_BYTES=10485760
ATTUNE__WORKER__STREAM_LOGS=true
```

## How It Works

### 1. Streaming Architecture

Instead of using `wait_with_output()`, which buffers all output in memory, the worker:

1. Spawns the process with piped stdout/stderr
2. Creates a `BoundedLogWriter` instance for each stream
3. Reads output line-by-line concurrently
4. Writes to bounded writers that enforce the size limits
5. Waits for process completion while streaming continues

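The steps above can be sketched with Python's `asyncio` (an illustrative model only: the actual worker is not Python, and `BoundedBuffer`, `pump`, and the 1024-byte limit are invented for this demo):

```python
import asyncio
import sys

class BoundedBuffer:
    """Minimal stand-in for a bounded log writer: keeps roughly `limit` bytes."""
    def __init__(self, limit):
        self.limit = limit
        self.data = bytearray()
        self.truncated_bytes = 0

    def write(self, chunk):
        if len(self.data) < self.limit:
            self.data.extend(chunk)  # still under the limit: keep the line
        else:
            self.truncated_bytes += len(chunk)  # over the limit: count, discard

async def pump(stream, writer):
    # Read line-by-line so truncation lands on clean line boundaries.
    while True:
        line = await stream.readline()
        if not line:
            break
        writer.write(line)

async def run(cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    out, err = BoundedBuffer(1024), BoundedBuffer(1024)
    # Stream both pipes concurrently, then wait for the process to exit.
    await asyncio.gather(pump(proc.stdout, out), pump(proc.stderr, err))
    code = await proc.wait()
    return code, out, err

code, out, err = asyncio.run(
    run([sys.executable, "-c", "for i in range(100): print('x' * 100)"])
)
```

However much the child prints, `out.data` stays near the limit while the overflow is only counted, which is the property that keeps worker memory bounded.
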
### 2. Truncation Behavior

When output exceeds the configured limit:

1. The writer stops accepting new data once it reaches the effective limit (the configured limit minus a 128-byte reserve)
2. A truncation notice is appended to the log
3. Additional output is counted but discarded
4. The execution result includes truncation metadata

**Truncation Notices:**

- **stdout**: `[OUTPUT TRUNCATED: stdout exceeded size limit]`
- **stderr**: `[OUTPUT TRUNCATED: stderr exceeded size limit]`
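The reserve accounting can be modeled in a few lines of Python (a sketch only: the real `BoundedLogWriter` is not written this way, and `BoundedWriter` plus the 1024-byte limit here are invented for illustration):

```python
RESERVE = 128  # bytes held back for the truncation notice

class BoundedWriter:
    def __init__(self, limit, notice):
        self.effective_limit = limit - RESERVE  # configured limit minus reserve
        self.notice = notice
        self.buf = bytearray()
        self.truncated = False
        self.bytes_truncated = 0

    def write(self, line: bytes):
        if self.truncated:
            # Already truncated: count the bytes but discard them.
            self.bytes_truncated += len(line)
        elif len(self.buf) + len(line) > self.effective_limit:
            # First line over the limit: append the notice instead.
            self.truncated = True
            self.buf.extend(self.notice)
            self.bytes_truncated += len(line)
        else:
            self.buf.extend(line)

w = BoundedWriter(1024, b"\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n")
for _ in range(100):
    w.write(b"x" * 50 + b"\n")  # 51-byte lines
```

Because the notice fits inside the 128-byte reserve, the final buffer never exceeds the configured limit.
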
### 3. Execution Result Metadata

The `ExecutionResult` struct includes truncation information:

```rust
pub struct ExecutionResult {
    pub stdout: String,
    pub stderr: String,
    // ... other fields ...

    // Truncation metadata
    pub stdout_truncated: bool,
    pub stderr_truncated: bool,
    pub stdout_bytes_truncated: usize,
    pub stderr_bytes_truncated: usize,
}
```

**Example:**

```json
{
  "stdout": "Line 1\nLine 2\n...\nLine 100\n\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n",
  "stderr": "",
  "stdout_truncated": true,
  "stderr_truncated": false,
  "stdout_bytes_truncated": 950000,
  "exit_code": 0
}
```

## Implementation Details

### BoundedLogWriter

The core component is `BoundedLogWriter`, which implements `AsyncWrite`:

- **Reserve Space**: reserves 128 bytes for the truncation notice
- **Line-by-Line Reading**: output is read line-by-line to keep truncation on clean line boundaries
- **No Backpressure**: always reports successful writes so the child process is never blocked
- **Concurrent Streaming**: stdout and stderr are streamed concurrently using `tokio::join!`

### Runtime Integration

All runtimes (Python, Shell, Local) use the streaming approach:

1. **Python Runtime**: the `execute_with_streaming()` method handles both `-c` and file execution
2. **Shell Runtime**: the `execute_with_streaming()` method handles both `-c` and file execution
3. **Local Runtime**: delegates to Python/Shell, inheriting streaming behavior

### Memory Safety

Without log size limits:

- Action outputting 1GB → Worker uses 1GB+ memory
- 10 concurrent large actions → 10GB+ memory usage → OOM

With log size limits (10MB default):

- Action outputting 1GB → Worker uses ~10MB per action
- 10 concurrent large actions → ~100MB memory usage
- Safe and predictable memory usage

## Examples

### Action with Large Output

**Action:**

```python
# outputs ~100MB
for i in range(1000000):
    print(f"Line {i}: " + "x" * 100)
```

**Result (with 10MB limit):**

```json
{
  "exit_code": 0,
  "stdout": "[first 10MB of output]\n\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n",
  "stdout_truncated": true,
  "stdout_bytes_truncated": 90000000,
  "duration_ms": 1234
}
```

### Action with Large stderr

**Action:**

```python
import sys

# outputs ~50MB to stderr (each padded line is ~100 bytes)
for i in range(500000):
    sys.stderr.write(f"Warning {i}: " + "x" * 90 + "\n")
```

**Result (with 10MB limit):**

```json
{
  "exit_code": 0,
  "stdout": "",
  "stderr": "[first 10MB of warnings]\n\n[OUTPUT TRUNCATED: stderr exceeded size limit]\n",
  "stderr_truncated": true,
  "stderr_bytes_truncated": 40000000,
  "duration_ms": 2345
}
```

### No Truncation (Under Limit)

**Action:**

```python
print("Hello, World!")
```

**Result:**

```json
{
  "exit_code": 0,
  "stdout": "Hello, World!\n",
  "stderr": "",
  "stdout_truncated": false,
  "stderr_truncated": false,
  "stdout_bytes_truncated": 0,
  "stderr_bytes_truncated": 0,
  "duration_ms": 45
}
```

## API Access

### Execution Result

When retrieving execution results via the API, truncation metadata is included:

```bash
curl http://localhost:8080/api/v1/executions/123
```

**Response:**

```json
{
  "data": {
    "id": 123,
    "status": "succeeded",
    "result": {
      "stdout": "...[OUTPUT TRUNCATED]...",
      "stderr": "",
      "exit_code": 0
    },
    "stdout_truncated": true,
    "stderr_truncated": false,
    "stdout_bytes_truncated": 1500000
  }
}
```

## Best Practices

### 1. Configure Appropriate Limits

Choose limits based on your use case:

- **Small actions** (< 1MB output): use the default 10MB limit
- **Data processing** (moderate output): consider 50-100MB
- **Log analysis** (large output): consider 100-500MB
- **Never** set the limit to unlimited: it risks OOM

### 2. Design Actions for Limited Logs

Instead of printing all data:

```python
# BAD: Prints entire dataset
for item in large_dataset:
    print(item)
```

Use structured output:

```python
# GOOD: Print a summary, store the data elsewhere
print(f"Processed {len(large_dataset)} items")
print(f"Results saved to: {output_file}")
```

### 3. Monitor Truncation

Track truncation events:

- Alert if many executions are truncated
- Frequent truncation may indicate that actions need refactoring
- Or that limits need adjustment
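A monitoring check along these lines can flag when truncation becomes common (a sketch; `truncation_rate` and the sample results are hypothetical, but the `*_truncated` fields match the execution metadata described earlier):

```python
def truncation_rate(results):
    """Fraction of executions whose stdout or stderr was truncated."""
    if not results:
        return 0.0
    truncated = sum(
        1 for r in results
        if r.get("stdout_truncated") or r.get("stderr_truncated")
    )
    return truncated / len(results)

# Sample execution results, e.g. fetched from the executions API
results = [
    {"stdout_truncated": True, "stderr_truncated": False},
    {"stdout_truncated": False, "stderr_truncated": False},
    {"stdout_truncated": False, "stderr_truncated": True},
    {"stdout_truncated": False, "stderr_truncated": False},
]
rate = truncation_rate(results)
```

An alert could fire when the rate crosses a threshold of your choosing (say 10%).
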
### 4. Use Artifacts for Large Data

For large outputs, use artifacts:

```python
import json

# Write large data to an artifact
with open('/tmp/results.json', 'w') as f:
    json.dump(large_results, f)

# Print only a summary
print(f"Results written: {len(large_results)} items")
```

## Performance Impact

### Before (Buffered Output)

- **Memory**: O(output_size) per execution
- **Risk**: OOM on large output
- **Speed**: fast (no streaming overhead)

### After (Streaming with Limits)

- **Memory**: O(limit_size) per execution, bounded
- **Risk**: no OOM; predictable memory usage
- **Speed**: minimal overhead (~1-2% for line-by-line reading)
- **Safety**: production-ready

## Testing

Test log truncation in your actions:

```python
def test_truncation():
    # Output ~20MB (exceeds the 10MB limit)
    for i in range(200000):
        print("x" * 100)

    # This line won't appear in the output if truncated
    print("END")

    # But the execution still completes successfully
    return {"status": "success"}
```

Check for truncation in the result:

```python
if result.stdout_truncated:
    print(f"Output was truncated by {result.stdout_bytes_truncated} bytes")
```

## Troubleshooting

### Issue: Important output is truncated

**Solution**: refactor the action to:

1. Print only essential information
2. Store detailed data in artifacts
3. Use structured logging

### Issue: Need to see all output for debugging

**Solution**: temporarily increase the limits:

```yaml
worker:
  max_stdout_bytes: 104857600  # 100MB for debugging
```

### Issue: Memory usage still high

**Check**:

1. Are limits configured correctly?
2. Are multiple workers running with high concurrency?
3. Are artifacts consuming memory?

## Limitations

1. **Line Boundaries**: truncation happens at line boundaries, so the last line before the cutoff is included in full
2. **Binary Output**: only text output is supported; binary output may be corrupted
3. **Reserve Space**: the 128 bytes reserved for the truncation notice reduce the effective limit
4. **No Rotation**: logs don't rotate; truncation is permanent

## Future Enhancements

Potential improvements:

1. **Log Rotation**: rotate logs to files instead of truncating
2. **Compressed Storage**: store truncated logs compressed
3. **Streaming API**: stream logs in real time via WebSocket
4. **Per-Action Limits**: configure limits per action
5. **Smart Truncation**: preserve the first N bytes and the last M bytes

## Related Features

- **Artifacts**: store large output as artifacts instead of logs
- **Timeouts**: prevent runaway processes (separate from log limits)
- **Resource Limits**: CPU/memory limits for actions (future)

## See Also

- [Worker Configuration](worker-configuration.md)
- [Runtime Architecture](runtime-architecture.md)
- [Performance Tuning](performance-tuning.md)