# Log Size Limits

## Overview

The log size limits feature prevents out-of-memory (OOM) failures when actions produce large amounts of output. Instead of buffering all stdout/stderr in memory, the worker service streams logs with configurable size limits and appends a truncation notice when a limit is exceeded.

## Configuration

Log size limits are set in the worker configuration:

```yaml
worker:
  max_stdout_bytes: 10485760  # 10MB (default)
  max_stderr_bytes: 10485760  # 10MB (default)
  stream_logs: true           # Enable log streaming (default)
```

Or via environment variables:

```bash
ATTUNE__WORKER__MAX_STDOUT_BYTES=10485760
ATTUNE__WORKER__MAX_STDERR_BYTES=10485760
ATTUNE__WORKER__STREAM_LOGS=true
```
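The numeric values above are raw byte counts (10485760 = 10 × 1024 × 1024). A tiny hypothetical helper, not part of the worker, for computing them:

```python
# Hypothetical helper (not part of the worker): compute byte values for the config.
def mb(n: int) -> int:
    """Convert megabytes to bytes (1 MB here = 1024 * 1024 bytes)."""
    return n * 1024 * 1024

print(mb(10))  # prints: 10485760, the 10MB default above
```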

## How It Works

### 1. Streaming Architecture

Instead of using `wait_with_output()`, which buffers all output in memory, the worker:

1. Spawns the process with piped stdout/stderr
2. Creates a `BoundedLogWriter` instance for each stream
3. Reads output line-by-line, concurrently for both streams
4. Writes to the bounded writers, which enforce the size limits
5. Waits for process completion while streaming continues
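The five steps above can be sketched in Python with `asyncio` (illustrative only — the worker itself is written in Rust, and all names here are invented for the sketch):

```python
import asyncio
import sys

async def read_stream(stream, sink):
    # Read line-by-line so truncation can happen on clean line boundaries.
    while True:
        line = await stream.readline()
        if not line:
            break
        sink.append(line)

async def run(cmd):
    # 1. Spawn with piped stdout/stderr.
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = [], []
    # 2-4. Drain both pipes concurrently (a bounded writer would sit behind `sink`).
    await asyncio.gather(
        read_stream(proc.stdout, out),
        read_stream(proc.stderr, err),
    )
    # 5. Wait for completion after the pipes are drained.
    code = await proc.wait()
    return code, b"".join(out), b"".join(err)

code, out, err = asyncio.run(run([sys.executable, "-c", "print('hi')"]))
print(code, out.decode().strip())
```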

### 2. Truncation Behavior

When output exceeds the configured limit:

1. The writer stops accepting new data once it reaches the effective limit (the configured limit minus a 128-byte reserve)
2. A truncation notice is appended to the log
3. Additional output is counted but discarded
4. The execution result includes truncation metadata

**Truncation Notices:**
- **stdout**: `[OUTPUT TRUNCATED: stdout exceeded size limit]`
- **stderr**: `[OUTPUT TRUNCATED: stderr exceeded size limit]`
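These rules can be modeled in a few lines of Python (an illustrative stand-in for `BoundedLogWriter`, not the actual implementation; the 128-byte reserve and notice text are taken from this document):

```python
TRUNCATION_NOTICE = "\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n"
RESERVE = 128  # bytes held back for the truncation notice

class BoundedBuffer:
    """Illustrative sketch of the truncation rules described above."""

    def __init__(self, limit: int):
        self.effective_limit = limit - RESERVE
        self.data = ""
        self.truncated = False
        self.bytes_truncated = 0

    def write_line(self, line: str) -> None:
        if self.truncated or len(self.data) + len(line) > self.effective_limit:
            if not self.truncated:
                # Append the notice once, then stop accepting data.
                self.data += TRUNCATION_NOTICE
                self.truncated = True
            # Additional output is counted but discarded.
            self.bytes_truncated += len(line)
        else:
            self.data += line

buf = BoundedBuffer(limit=256)
for i in range(100):
    buf.write_line(f"line {i}\n")
print(buf.truncated, buf.bytes_truncated)
```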

### 3. Execution Result Metadata

The `ExecutionResult` struct includes truncation information:

```rust
pub struct ExecutionResult {
    pub stdout: String,
    pub stderr: String,
    // ... other fields ...

    // Truncation metadata
    pub stdout_truncated: bool,
    pub stderr_truncated: bool,
    pub stdout_bytes_truncated: usize,
    pub stderr_bytes_truncated: usize,
}
```

**Example:**
```json
{
  "stdout": "Line 1\nLine 2\n...\nLine 100\n\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n",
  "stderr": "",
  "stdout_truncated": true,
  "stderr_truncated": false,
  "stdout_bytes_truncated": 950000,
  "exit_code": 0
}
```
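Consumers can branch on this metadata directly. A minimal sketch of reading the JSON shape above (the payload is the example from this document, abbreviated):

```python
import json

# Result payload in the shape shown above (stdout abbreviated).
raw = """{
  "stdout": "...",
  "stderr": "",
  "stdout_truncated": true,
  "stderr_truncated": false,
  "stdout_bytes_truncated": 950000,
  "exit_code": 0
}"""

result = json.loads(raw)
if result["stdout_truncated"]:
    print(f"stdout lost {result['stdout_bytes_truncated']} bytes")
```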

## Implementation Details

### BoundedLogWriter

The core component is `BoundedLogWriter`, which implements `AsyncWrite`:

- **Reserve Space**: Reserves 128 bytes for the truncation notice
- **Line-by-Line Reading**: Reads output line-by-line to ensure clean truncation boundaries
- **No Backpressure**: Always reports successful writes to avoid blocking the child process
- **Concurrent Streaming**: stdout and stderr are streamed concurrently using `tokio::join!`

### Runtime Integration

All runtimes (Python, Shell, Local) use the streaming approach:

1. **Python Runtime**: an `execute_with_streaming()` method handles both `-c` and file execution
2. **Shell Runtime**: an `execute_with_streaming()` method handles both `-c` and file execution
3. **Local Runtime**: delegates to the Python/Shell runtimes, inheriting their streaming behavior

### Memory Safety

Without log size limits:
- An action emitting 1GB of output → the worker buffers 1GB+ of memory
- 10 concurrent large actions → 10GB+ of memory → OOM

With log size limits (10MB default):
- An action emitting 1GB of output → the worker uses ~10MB per action
- 10 concurrent large actions → ~100MB of memory
- Memory usage is bounded and predictable
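The bound above is simple arithmetic; as a sanity check (counting stdout buffers only — stderr has its own, separately bounded limit):

```python
# Back-of-envelope memory bound, using the numbers from the comparison above.
limit_mb = 10            # per-stream limit (10MB default)
concurrent_actions = 10  # actions running at once

# Worst case: every action fills its stdout buffer to the limit.
bound = limit_mb * concurrent_actions
print(bound, "MB upper bound for stdout buffers")
```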

## Examples

### Action with Large Output

**Action:**
```python
# outputs roughly 100MB
for i in range(1000000):
    print(f"Line {i}: " + "x" * 100)
```

**Result (with 10MB limit):**
```json
{
  "exit_code": 0,
  "stdout": "[first 10MB of output]\n\n[OUTPUT TRUNCATED: stdout exceeded size limit]\n",
  "stdout_truncated": true,
  "stdout_bytes_truncated": 90000000,
  "duration_ms": 1234
}
```

### Action with Large stderr

**Action:**
```python
import sys

# outputs roughly 50MB to stderr
for i in range(500000):
    sys.stderr.write(f"Warning {i}\n")
```

**Result (with 10MB limit):**
```json
{
  "exit_code": 0,
  "stdout": "",
  "stderr": "[first 10MB of warnings]\n\n[OUTPUT TRUNCATED: stderr exceeded size limit]\n",
  "stderr_truncated": true,
  "stderr_bytes_truncated": 40000000,
  "duration_ms": 2345
}
```

### No Truncation (Under Limit)

**Action:**
```python
print("Hello, World!")
```

**Result:**
```json
{
  "exit_code": 0,
  "stdout": "Hello, World!\n",
  "stderr": "",
  "stdout_truncated": false,
  "stderr_truncated": false,
  "stdout_bytes_truncated": 0,
  "stderr_bytes_truncated": 0,
  "duration_ms": 45
}
```

## API Access

### Execution Result

When retrieving execution results via the API, the truncation metadata is included:

```bash
curl http://localhost:8080/api/v1/executions/123
```

**Response:**
```json
{
  "data": {
    "id": 123,
    "status": "succeeded",
    "result": {
      "stdout": "...[OUTPUT TRUNCATED]...",
      "stderr": "",
      "exit_code": 0
    },
    "stdout_truncated": true,
    "stderr_truncated": false,
    "stdout_bytes_truncated": 1500000
  }
}
```

## Best Practices

### 1. Configure Appropriate Limits

Choose limits based on your use case:

- **Small actions** (< 1MB output): use the default 10MB limit
- **Data processing** (moderate output): consider 50-100MB
- **Log analysis** (large output): consider 100-500MB
- **Never** set the limits to unlimited; doing so reintroduces the OOM risk
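For example, a data-processing workload might raise only the stdout limit (the values here are suggestions, not product defaults):

```yaml
worker:
  max_stdout_bytes: 52428800   # 50MB for data-processing actions
  max_stderr_bytes: 10485760   # keep the 10MB default for stderr
```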

### 2. Design Actions for Limited Logs

Instead of printing all data:

```python
# BAD: prints the entire dataset
for item in large_dataset:
    print(item)
```

Use structured output:

```python
# GOOD: print a summary, store the data elsewhere
print(f"Processed {len(large_dataset)} items")
print(f"Results saved to: {output_file}")
```

### 3. Monitor Truncation

Track truncation events:
- Alert if many executions are truncated
- Frequent truncation may indicate that actions need refactoring, or that limits need adjustment
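A monitoring check along these lines might look like the following (hypothetical: `executions` stands in for results fetched from the API shown earlier, and the 25% alert threshold is arbitrary):

```python
# Hypothetical monitoring check: alert when the truncation rate is high.
executions = [
    {"stdout_truncated": True},
    {"stdout_truncated": False},
    {"stdout_truncated": True},
    {"stdout_truncated": False},
]

truncated = sum(1 for e in executions if e["stdout_truncated"])
rate = truncated / len(executions)
print(f"truncation rate: {rate:.0%}")
if rate > 0.25:
    print("ALERT: consider refactoring actions or raising limits")
```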

### 4. Use Artifacts for Large Data

For large outputs, use artifacts:

```python
import json

# Write large data to an artifact
with open('/tmp/results.json', 'w') as f:
    json.dump(large_results, f)

# Print only a summary
print(f"Results written: {len(large_results)} items")
```

## Performance Impact

### Before (Buffered Output)

- **Memory**: O(output_size) per execution
- **Risk**: OOM on large output
- **Speed**: fast (no streaming overhead)

### After (Streaming with Limits)

- **Memory**: O(limit_size) per execution, bounded
- **Risk**: no OOM; predictable memory usage
- **Speed**: minimal overhead (~1-2% for line-by-line reading)
- **Safety**: production-ready

## Testing

Test log truncation in your actions:

```python
def test_truncation():
    # Output ~20MB (exceeds the 10MB limit)
    for _ in range(200000):
        print("x" * 100)

    # This line won't appear in the output if truncated
    print("END")

    # But the execution still completes successfully
    return {"status": "success"}
```

Check for truncation in the result:
```python
if result.stdout_truncated:
    print(f"Output was truncated by {result.stdout_bytes_truncated} bytes")
```

## Troubleshooting

### Issue: Important output is truncated

**Solution**: Refactor the action to:
1. Print only essential information
2. Store detailed data in artifacts
3. Use structured logging

### Issue: Need to see all output for debugging

**Solution**: Temporarily increase the limits:
```yaml
worker:
  max_stdout_bytes: 104857600  # 100MB for debugging
```

### Issue: Memory usage still high

**Check**:
1. Are the limits configured correctly?
2. Are multiple workers running with high concurrency?
3. Are artifacts consuming memory?

## Limitations

1. **Line Boundaries**: Truncation happens at line boundaries, so the last line accepted before truncation is always included in full
2. **Binary Output**: Only text output is supported; binary output may be corrupted
3. **Reserve Space**: The 128 bytes reserved for the truncation notice reduce the effective limit
4. **No Rotation**: Logs don't rotate; truncation is permanent

## Future Enhancements

Potential improvements:

1. **Log Rotation**: Rotate logs to files instead of truncating
2. **Compressed Storage**: Store truncated logs compressed
3. **Streaming API**: Stream logs in real time via WebSocket
4. **Per-Action Limits**: Configure limits per action
5. **Smart Truncation**: Preserve the first N bytes and the last M bytes

## Related Features

- **Artifacts**: Store large output as artifacts instead of logs
- **Timeouts**: Prevent runaway processes (separate from log limits)
- **Resource Limits**: CPU/memory limits for actions (future)

## See Also

- [Worker Configuration](worker-configuration.md)
- [Runtime Architecture](runtime-architecture.md)
- [Performance Tuning](performance-tuning.md)