Log Size Limits
Overview
The log size limits feature prevents Out-of-Memory (OOM) issues when actions produce large amounts of output. Instead of buffering all stdout/stderr in memory, the worker service streams logs with configurable size limits and adds truncation notices when limits are exceeded.
Configuration
Log size limits are configured in the worker configuration:
worker:
  max_stdout_bytes: 10485760  # 10MB (default)
  max_stderr_bytes: 10485760  # 10MB (default)
  stream_logs: true           # Enable log streaming (default)
Or via environment variables:
ATTUNE__WORKER__MAX_STDOUT_BYTES=10485760
ATTUNE__WORKER__MAX_STDERR_BYTES=10485760
ATTUNE__WORKER__STREAM_LOGS=true
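As a sketch of the precedence these settings imply (environment variable over config-file value over default), here is a hypothetical resolver in Python; resolve_limit and its signature are illustrative, not the worker's actual loader.

```python
import os

DEFAULT_LIMIT = 10485760  # 10MB default, per the configuration above

def resolve_limit(env_var, config_value=None):
    """Environment variable overrides config value, which overrides the default."""
    raw = os.environ.get(env_var)
    if raw is not None:
        return int(raw)
    if config_value is not None:
        return config_value
    return DEFAULT_LIMIT

print(resolve_limit("ATTUNE__WORKER__MAX_STDOUT_BYTES"))
```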
How It Works
1. Streaming Architecture
Instead of using wait_with_output(), which buffers all output in memory, the worker:
- Spawns the process with piped stdout/stderr
- Creates BoundedLogWriter instances for each stream
- Reads output line-by-line concurrently
- Writes to bounded writers that enforce size limits
- Waits for process completion while streaming continues
2. Truncation Behavior
When output exceeds the configured limit:
- The writer stops accepting new data after reaching the effective limit (the configured limit minus a 128-byte reserve)
- A truncation notice is appended to the log
- Additional output is counted but discarded
- The execution result includes truncation metadata
Truncation Notices:
- stdout: [OUTPUT TRUNCATED: stdout exceeded size limit]
- stderr: [OUTPUT TRUNCATED: stderr exceeded size limit]
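The behavior above can be sketched in a few lines. This is illustrative Python, not the worker's actual (Rust) implementation; BoundedBuffer is a hypothetical name.

```python
NOTICE_RESERVE = 128  # bytes reserved for the truncation notice

class BoundedBuffer:
    """Illustrative stand-in for the worker's bounded writer."""

    def __init__(self, limit, stream_name):
        self.effective_limit = limit - NOTICE_RESERVE
        self.notice = f"\n[OUTPUT TRUNCATED: {stream_name} exceeded size limit]\n"
        self.data = bytearray()
        self.truncated = False
        self.bytes_truncated = 0

    def write_line(self, line):
        if self.truncated:
            # Additional output is counted but discarded.
            self.bytes_truncated += len(line)
            return
        if len(self.data) + len(line) > self.effective_limit:
            # Stop accepting data and append the notice exactly once;
            # the 128-byte reserve guarantees the notice always fits.
            self.truncated = True
            self.bytes_truncated += len(line)
            self.data += self.notice.encode()
        else:
            self.data += line

buf = BoundedBuffer(limit=1024, stream_name="stdout")
for i in range(100):
    buf.write_line(f"Line {i}: ".encode() + b"x" * 100 + b"\n")
print(buf.truncated, len(buf.data), buf.bytes_truncated)
```

Note that writes past the limit are counted but never stored, which is what makes the memory bound hold regardless of how much the process prints.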
3. Execution Result Metadata
The ExecutionResult struct includes truncation information:
pub struct ExecutionResult {
    pub stdout: String,
    pub stderr: String,
    // ... other fields ...

    // Truncation metadata
    pub stdout_truncated: bool,
    pub stderr_truncated: bool,
    pub stdout_bytes_truncated: usize,
    pub stderr_bytes_truncated: usize,
}
Example:
{
  "stdout": "Line 1\nLine 2\n...\nLine 100\n\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n",
  "stderr": "",
  "stdout_truncated": true,
  "stderr_truncated": false,
  "stdout_bytes_truncated": 950000,
  "exit_code": 0
}
Implementation Details
BoundedLogWriter
The core component is BoundedLogWriter, which implements AsyncWrite:
- Reserve Space: Reserves 128 bytes for the truncation notice
- Line-by-Line Reading: Reads output line-by-line to ensure clean truncation boundaries
- No Backpressure: Always reports successful writes to avoid blocking the process
- Concurrent Streaming: stdout and stderr are streamed concurrently using tokio::join!
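The same pattern can be sketched with Python's asyncio standing in for the Rust/tokio implementation: spawn the process with piped streams and drain both pipes concurrently so neither fills up and blocks the child. All names here are illustrative.

```python
import asyncio
import sys

async def drain(stream, sink):
    # Read line-by-line; a real bounded writer would enforce the size limit here.
    while True:
        line = await stream.readline()
        if not line:
            break
        sink.append(line)

async def run(cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = [], []
    # Analogue of tokio::join!: drain both pipes concurrently
    # while the child is still running.
    await asyncio.gather(drain(proc.stdout, out), drain(proc.stderr, err))
    code = await proc.wait()
    return code, b"".join(out), b"".join(err)

script = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
code, out, err = asyncio.run(run([sys.executable, "-c", script]))
print(code, out, err)
```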
Runtime Integration
All runtimes (Python, Shell, Local) use the streaming approach:
- Python Runtime: execute_with_streaming() method handles both -c and file execution
- Shell Runtime: execute_with_streaming() method handles both -c and file execution
- Local Runtime: Delegates to Python/Shell, inheriting streaming behavior
Memory Safety
Without log size limits:
- Action outputting 1GB → Worker uses 1GB+ memory
- 10 concurrent large actions → 10GB+ memory usage → OOM
With log size limits (10MB default):
- Action outputting 1GB → Worker uses ~10MB per action
- 10 concurrent large actions → ~100MB memory usage
- Safe and predictable memory usage
Examples
Action with Large Output
Action:
# outputs ~100MB
for i in range(1000000):
    print(f"Line {i}: " + "x" * 100)
Result (with 10MB limit):
{
  "exit_code": 0,
  "stdout": "[first 10MB of output]\n\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n",
  "stdout_truncated": true,
  "stdout_bytes_truncated": 90000000,
  "duration_ms": 1234
}
Action with Large stderr
Action:
import sys
# outputs ~50MB to stderr
for i in range(500000):
    sys.stderr.write(f"Warning {i}: " + "x" * 90 + "\n")
Result (with 10MB limit):
{
  "exit_code": 0,
  "stdout": "",
  "stderr": "[first 10MB of warnings]\n\n[OUTPUT TRUNCATED: stderr exceeded size limit]\n",
  "stderr_truncated": true,
  "stderr_bytes_truncated": 40000000,
  "duration_ms": 2345
}
No Truncation (Under Limit)
Action:
print("Hello, World!")
Result:
{
  "exit_code": 0,
  "stdout": "Hello, World!\n",
  "stderr": "",
  "stdout_truncated": false,
  "stderr_truncated": false,
  "stdout_bytes_truncated": 0,
  "stderr_bytes_truncated": 0,
  "duration_ms": 45
}
API Access
Execution Result
When retrieving execution results via the API, truncation metadata is included:
curl http://localhost:8080/api/v1/executions/123
Response:
{
  "data": {
    "id": 123,
    "status": "succeeded",
    "result": {
      "stdout": "...[OUTPUT TRUNCATED]...",
      "stderr": "",
      "exit_code": 0
    },
    "stdout_truncated": true,
    "stderr_truncated": false,
    "stdout_bytes_truncated": 1500000
  }
}
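Clients can read this metadata straight out of the response body. A minimal sketch, assuming the field names shown above:

```python
import json

def truncation_summary(payload):
    """Extract truncation metadata from an execution-result payload."""
    data = json.loads(payload)["data"]
    return {
        "stdout_truncated": data.get("stdout_truncated", False),
        "stderr_truncated": data.get("stderr_truncated", False),
        "stdout_bytes_truncated": data.get("stdout_bytes_truncated", 0),
    }

sample = '''{"data": {"id": 123, "status": "succeeded",
  "stdout_truncated": true, "stderr_truncated": false,
  "stdout_bytes_truncated": 1500000}}'''
print(truncation_summary(sample))
```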
Best Practices
1. Configure Appropriate Limits
Choose limits based on your use case:
- Small actions (< 1MB output): Use default 10MB limit
- Data processing (moderate output): Consider 50-100MB
- Log analysis (large output): Consider 100-500MB
- Never: Set to unlimited (risks OOM)
2. Design Actions for Limited Logs
Instead of printing all data:
# BAD: Prints entire dataset
for item in large_dataset:
    print(item)
Use structured output:
# GOOD: Print summary, store data elsewhere
print(f"Processed {len(large_dataset)} items")
print(f"Results saved to: {output_file}")
3. Monitor Truncation
Track truncation events:
- Alert if many executions are truncated
- May indicate actions need refactoring
- Or limits need adjustment
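A minimal sketch of such a check over recent execution results, with an illustrative alert threshold:

```python
def truncation_rate(results):
    """Fraction of executions whose stdout or stderr was truncated."""
    if not results:
        return 0.0
    truncated = sum(
        1 for r in results
        if r.get("stdout_truncated") or r.get("stderr_truncated")
    )
    return truncated / len(results)

recent = [
    {"stdout_truncated": True},
    {"stdout_truncated": False},
    {"stderr_truncated": True},
    {"stdout_truncated": False},
]
rate = truncation_rate(recent)
if rate > 0.25:  # alert threshold: tune for your workload
    print(f"ALERT: {rate:.0%} of recent executions were truncated")
```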
4. Use Artifacts for Large Data
For large outputs, use artifacts:
import json
# Write large data to artifact
with open('/tmp/results.json', 'w') as f:
    json.dump(large_results, f)
# Print only summary
print(f"Results written: {len(large_results)} items")
Performance Impact
Before (Buffered Output)
- Memory: O(output_size) per execution
- Risk: OOM on large output
- Speed: Fast (no streaming overhead)
After (Streaming with Limits)
- Memory: O(limit_size) per execution, bounded
- Risk: No OOM, predictable memory usage
- Speed: Minimal overhead (~1-2% for line-by-line reading)
- Safety: Production-ready
Testing
Test log truncation in your actions:
def test_truncation():
    # Output ~20MB (exceeds the 10MB limit)
    for i in range(200000):
        print("x" * 100)
    # This line won't appear in the output if truncated,
    # but the execution still completes successfully
    print("END")
    return {"status": "success"}
Check truncation in result:
if result.stdout_truncated:
    print(f"Output was truncated by {result.stdout_bytes_truncated} bytes")
Troubleshooting
Issue: Important output is truncated
Solution: Refactor action to:
- Print only essential information
- Store detailed data in artifacts
- Use structured logging
Issue: Need to see all output for debugging
Solution: Temporarily increase limits:
worker:
  max_stdout_bytes: 104857600  # 100MB for debugging
Issue: Memory usage still high
Check:
- Are limits configured correctly?
- Are multiple workers running with high concurrency?
- Are artifacts consuming memory?
Limitations
- Line Boundaries: Truncation happens at line boundaries, so the last line before truncation is included completely
- Binary Output: Only text output is supported; binary output may be corrupted
- Reserve Space: 128 bytes reserved for truncation notice reduces effective limit
- No Rotation: Logs don't rotate; truncation is permanent
Future Enhancements
Potential improvements:
- Log Rotation: Rotate logs to files instead of truncation
- Compressed Storage: Store truncated logs compressed
- Streaming API: Stream logs in real-time via WebSocket
- Per-Action Limits: Configure limits per action
- Smart Truncation: Preserve first N bytes and last M bytes
Related Features
- Artifacts: Store large output as artifacts instead of logs
- Timeouts: Prevent runaway processes (separate from log limits)
- Resource Limits: CPU/memory limits for actions (future)