Files
attune/work-summary/sessions/2026-01-30-fix-execution-failure-capture.md
2026-02-04 17:46:30 -06:00

8.0 KiB

Fix: Execution Failure Detection and Error Capture

Date: 2026-01-30
Issue: Executions occasionally fail with "Execution failed during preparation" error even though stdout.log shows the action ran successfully
Status: Fixed


Problem Description

Users reported occasional execution failures with the following characteristics:

  • Error message: "Execution failed during preparation"
  • Result JSON shows "succeeded": false
  • The stdout.log file exists and contains output from the action
  • The action appears to have run, but the system failed to capture the success

Example Error

{
  "error": "Execution failed during preparation",
  "stdout_log": "/tmp/attune/artifacts/execution_10172/stdout.log",
  "succeeded": false
}

Root Cause Analysis

The issue was identified in the worker's execution flow, specifically in how runtime errors are handled:

1. Process Wait Failures

In shell.rs (execute_with_streaming method), if child.wait() fails after the process has already started and written output:

Ok(Err(e)) => {
    return Err(RuntimeError::ProcessError(format!(
        "Process wait failed: {}",
        e
    )));
}

This returns an Err even though:

  • The child process ran successfully
  • Output was captured to stdout/stderr
  • The process may have completed normally

2. Stdin Write Failures

Writing secrets to stdin could fail after the process spawned:

let secrets_json = serde_json::to_string(secrets)?;
stdin.write_all(secrets_json.as_bytes()).await?;

The ? operator would propagate the error up, discarding captured output.

3. Error Propagation in Executor

In executor.rs, when execute_action() returns an Err:

let result = match self.execute_action(context).await {
    Ok(result) => result,
    Err(e) => {
        error!("Action execution failed: {}", e);
        self.handle_execution_failure(execution_id, None).await?;  // None = no result
        return Err(e);
    }
};

Passing None to handle_execution_failure triggers the "Execution failed during preparation" message, even though logs exist.

4. Poor Error Messages

When exit code was non-zero, the entire stderr was used as the error message, which could be very long and unhelpful.


Solution Implemented

Changes to shell.rs

1. Graceful Stdin Write Handling

let stdin_write_error = if let Some(mut stdin) = child.stdin.take() {
    match serde_json::to_string(secrets) {
        Ok(secrets_json) => {
            if let Err(e) = stdin.write_all(secrets_json.as_bytes()).await {
                Some(format!("Failed to write secrets to stdin: {}", e))
            } else if let Err(e) = stdin.write_all(b"\n").await {
                Some(format!("Failed to write newline to stdin: {}", e))
            } else {
                drop(stdin);
                None
            }
        }
        Err(e) => Some(format!("Failed to serialize secrets: {}", e)),
    }
} else {
    None
};
  • Capture stdin write errors instead of propagating them
  • Continue execution to capture output
  • Include error in ExecutionResult

2. Process Wait Error Recovery

let (exit_code, process_error) = match wait_result {
    Ok(Ok(status)) => (status.code().unwrap_or(-1), None),
    Ok(Err(e)) => {
        // Process wait failed, but we have the output - return it with an error
        warn!("Process wait failed but captured output: {}", e);
        (-1, Some(format!("Process wait failed: {}", e)))
    }
    Err(_) => {
        // Timeout occurred - return captured output
        return Ok(ExecutionResult {
            exit_code: -1,
            stdout: stdout_result.content.clone(),
            stderr: stderr_result.content.clone(),
            // ... include truncation info
        });
    }
};
  • Always return Ok(ExecutionResult) when we have captured output
  • Include process wait errors in the result's error field
  • Preserve stdout/stderr even on timeout

3. Improved Error Messages

let error = if let Some(proc_err) = process_error {
    Some(proc_err)
} else if let Some(stdin_err) = stdin_write_error {
    Some(stdin_err)
} else if exit_code != 0 {
    Some(if stderr_result.content.is_empty() {
        format!("Command exited with code {}", exit_code)
    } else {
        // Use last line of stderr as error, or full stderr if short
        if stderr_result.content.lines().count() > 5 {
            stderr_result.content.lines().last().unwrap_or("").to_string()
        } else {
            stderr_result.content.clone()
        }
    })
} else {
    None
};
  • Prioritize specific error sources
  • Use last line of stderr for concise error messages
  • Full stderr only if short (≤5 lines)

Changes to executor.rs

1. Better Documentation

// Note: execute_action should rarely return Err - most failures should be
// captured in ExecutionResult with non-zero exit codes
let result = match self.execute_action(context).await {
    Ok(result) => result,
    Err(e) => {
        error!("Action execution failed catastrophically: {}", e);
        // This should only happen for unrecoverable errors like runtime not found

Clarified that returning Err should be rare.

2. Enhanced Failure Handling

When result is None (early failure), now attempts to read logs from disk:

// Check if stdout log exists from artifact storage
let stdout_path = exec_dir.join("stdout.log");
if stdout_path.exists() {
    result_data["stdout_log"] = serde_json::json!(stdout_path.to_string_lossy());
    // Try to read a preview if file exists
    if let Ok(contents) = tokio::fs::read_to_string(&stdout_path).await {
        let preview = if contents.len() > 1000 {
            format!("{}...", &contents[..1000])
        } else {
            contents
        };
        result_data["stdout"] = serde_json::json!(preview);
    }
}

This provides better diagnostics even for catastrophic failures.

3. Truncation Metadata

Added truncation information to failure results:

if exec_result.stdout_truncated {
    result_data["stdout_truncated"] = serde_json::json!(true);
    result_data["stdout_bytes_truncated"] = 
        serde_json::json!(exec_result.stdout_bytes_truncated);
}

Impact

Before

  • Intermittent "preparation" failures even when actions ran successfully
  • Lost output from partially-completed executions
  • Verbose error messages (entire stderr dump)
  • Difficult debugging due to missing context

After

  • Always capture output when process runs, regardless of wait() status
  • Specific error messages identifying the actual failure point
  • Concise error summaries (last line of stderr)
  • Better diagnostics with truncation metadata
  • Graceful degradation for stdin write failures

Testing Recommendations

  1. Process Termination Scenarios

    • Actions that crash or are killed
    • Zombie processes
    • Process that exit before we can wait()
  2. Resource Exhaustion

    • Very large stdout/stderr (test truncation)
    • Many concurrent executions
    • Slow process cleanup
  3. Stdin Write Failures

    • Processes that close stdin immediately
    • Broken pipe scenarios
    • Large secret payloads
  4. Edge Cases

    • Timeout with partial output
    • Exit code 0 but stderr present
    • No output but successful exit

Files Modified

  • attune/crates/worker/src/runtime/shell.rs - Improved error handling and output capture
  • attune/crates/worker/src/executor.rs - Enhanced failure diagnostics

Notes

  • This fix makes the system more resilient to transient process management issues
  • The "Execution failed during preparation" error should now be extremely rare
  • When it does occur, the result will include any available logs
  • Error messages are now more actionable and concise
  • All changes are backward compatible - existing executions unaffected

  • attune/docs/worker-service.md - Worker architecture
  • attune/docs/running-tests.md - Testing guidelines