File-Based Artifact Storage Plan
Overview
Replace PostgreSQL BYTEA storage for file-type artifacts with a shared filesystem volume. Execution processes write artifact files directly to disk via paths assigned by the API; the API serves those files from disk on download. The database stores only metadata (path, size, content type) — no binary content for file-based artifacts.
Motivation:
- Eliminates PostgreSQL bloat from large binary artifacts
- Enables executions to write files incrementally (streaming logs, large outputs) without buffering in memory for an API upload
- Artifacts can be retained independently of execution records (executions are hypertables with 90-day retention)
- Decouples artifact lifecycle from execution lifecycle — artifacts created by one execution can be accessed by others or by external systems
Artifact Type Classification
| Type | Storage | Notes |
|---|---|---|
| `FileBinary` | Disk (shared volume) | Binary files produced by executions |
| `FileDatatable` | Disk (shared volume) | Tabular data files (CSV, etc.) |
| `FileText` | Disk (shared volume) | Text files, logs |
| `Log` | Disk (shared volume) | Execution stdout/stderr logs |
| `Progress` | DB (`artifact.data` JSONB) | Small structured progress entries — unchanged |
| `Url` | DB (`artifact.data` JSONB) | URL references — unchanged |
Directory Structure
/opt/attune/artifacts/ # artifacts_dir (configurable)
└── {artifact_ref_slug}/ # derived from artifact ref (globally unique)
├── v1.txt # version 1
├── v2.txt # version 2
└── v3.txt # version 3
Key decisions:
- No execution ID in the path. Artifacts may outlive execution records (hypertable retention) and may be shared across executions or created externally.
- Keyed by artifact ref. The `ref` column has a unique index, making it a stable, globally unique identifier. Dots in refs become directory separators (e.g., `mypack.build_log` → `mypack/build_log/`).
- Version files named `v{N}.{ext}`, where `N` is the version number from `next_artifact_version()` and `ext` is derived from `content_type`.
End-to-End Flow
Happy Path
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────────┐
│ Worker │────▶│Execution │────▶│ API │────▶│ Shared Volume │
│ Service │ │ Process │ │ Service │ │ /opt/attune/ │
│ │ │(Py/Node/ │ │ │ │ artifacts/ │
│ │ │ Shell) │ │ │ │ │
└──────────┘ └──────────┘ └──────────┘ └────────────────┘
│ │ │ │
│ 1. Start exec │ │ │
│ Set ATTUNE_ │ │ │
│ ARTIFACTS_DIR │ │ │
│───────────────▶│ │ │
│ │ │ │
│ │ 2. POST /api/v1/artifacts │
│ │ {ref, type, execution} │
│ │───────────────▶│ │
│ │ │ 3. Create artifact │
│ │ │ row in DB │
│ │ │ │
│ │◀───────────────│ │
│ │ {id, ref, ...}│ │
│ │ │ │
│ │ 4. POST /api/v1/artifacts/{id}/versions
│ │ {content_type} │
│ │───────────────▶│ │
│ │ │ 5. Create version │
│ │ │ row (file_path, │
│ │ │ no BYTEA content) │
│ │ │ + mkdir on disk │
│ │◀───────────────│ │
│ │ {id, version, │ │
│ │ file_path} │ │
│ │ │ │
│ │ 6. Write file to │
│ │ $ATTUNE_ARTIFACTS_DIR/file_path │
│ │─────────────────────────────────────▶│
│ │ │ │
│ 7. Exec exits │ │ │
│◀───────────────│ │ │
│ │ │
│ 8. Finalize: stat files, │ │
│ update size_bytes in DB │ │
│ (direct DB access) │ │
│─────────────────────────────────┘ │
│ │
▼ │
┌──────────┐ │
│ Client │ 9. GET /api/v1/artifacts/{id}/download │
│ (UI) │──────────────────▶ API reads from disk ◀──────┘
└──────────┘
Step-by-Step
1. Worker receives execution from MQ, prepares the `ExecutionContext`, and sets the `ATTUNE_ARTIFACTS_DIR` environment variable.
2. Execution process calls `POST /api/v1/artifacts` to create the artifact record (ref, type, execution ID, content_type).
3. API creates the `artifact` row in the DB and returns the artifact ID.
4. Execution process calls `POST /api/v1/artifacts/{id}/versions` to create a new version. For file-type artifacts, the request body contains content_type and optional metadata — no file content.
5. API creates the `artifact_version` row with a computed `file_path` (e.g., `mypack/build_log/v1.txt`), leaving the `content` BYTEA NULL. It creates the parent directory on disk and returns the version ID and `file_path`.
6. Execution process writes file content to `$ATTUNE_ARTIFACTS_DIR/{file_path}`. It can write incrementally (append, stream, etc.).
7. Execution process exits.
8. Worker finalizes: scans artifact versions linked to this execution, `stat()`s each file on disk, and updates `artifact_version.size_bytes` and `artifact.size_bytes` in the DB via direct repository access.
9. Client requests download: API reads from `{artifacts_dir}/{file_path}` on disk and streams the response.
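The execution-side write in step 6 is intentionally trivial: join the assigned relative path onto the directory the worker exported and append. A minimal sketch in Rust for illustration (executions actually run as Python/Node/Shell processes, and the function name here is hypothetical):

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

// Execution-side write for step 6. The worker exports ATTUNE_ARTIFACTS_DIR
// in step 1; a real execution would read it via std::env::var and pass it
// in as `artifacts_dir`. `file_path` is the relative path returned by the
// create-version call in step 5, e.g. "mypack/build_log/v1.txt".
fn append_artifact_chunk(
    artifacts_dir: &Path,
    file_path: &str,
    chunk: &[u8],
) -> std::io::Result<()> {
    let full = artifacts_dir.join(file_path);
    // The API created the parent directory in step 5; creating it again
    // here is a harmless safety net.
    if let Some(parent) = full.parent() {
        fs::create_dir_all(parent)?;
    }
    // Open in append mode so the execution can stream output incrementally.
    let mut f = OpenOptions::new().create(true).append(true).open(full)?;
    f.write_all(chunk)
}
```

Because the file is opened in append mode, repeated calls accumulate content, which is what allows streaming logs without buffering the whole artifact in memory.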
Implementation Phases
Phase 1: Configuration & Volume Infrastructure
`crates/common/src/config.rs`

- Add `artifacts_dir: String` to the `Config` struct with default `/opt/attune/artifacts`
- Add a `default_artifacts_dir()` function
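A minimal sketch of the config addition (the real struct presumably derives serde's `Deserialize` with a `default` attribute pointing at the function; that is omitted here to keep the snippet dependency-free):

```rust
// Sketch of the proposed Config change; only the new field is shown.
#[derive(Debug, Clone)]
pub struct Config {
    pub artifacts_dir: String,
    // ...existing fields elided
}

pub fn default_artifacts_dir() -> String {
    "/opt/attune/artifacts".to_string()
}

impl Default for Config {
    fn default() -> Self {
        Config {
            artifacts_dir: default_artifacts_dir(),
        }
    }
}
```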
`config.development.yaml`

- Add `artifacts_dir: ./artifacts`
`config.docker.yaml`

- Add `artifacts_dir: /opt/attune/artifacts`
`docker-compose.yaml`

- Add an `artifacts_data` named volume
- Mount `artifacts_data:/opt/attune/artifacts` in: api (rw), all workers (rw), executor (ro)
- Add `ATTUNE__ARTIFACTS_DIR: /opt/attune/artifacts` to service environments where needed
Phase 2: Database Schema Changes
New migration: migrations/20250101000011_artifact_file_storage.sql
```sql
-- Add file_path to artifact_version for disk-based storage
ALTER TABLE artifact_version ADD COLUMN IF NOT EXISTS file_path TEXT;

-- Index for finding versions by file_path (orphan cleanup)
CREATE INDEX IF NOT EXISTS idx_artifact_version_file_path
    ON artifact_version(file_path) WHERE file_path IS NOT NULL;

COMMENT ON COLUMN artifact_version.file_path IS
    'Relative path from artifacts_dir root for disk-stored content. '
    'When set, content BYTEA is NULL — file lives on shared volume.';
```
`crates/common/src/models.rs` — `artifact_version` module:

- Add `file_path: Option<String>` to the `ArtifactVersion` struct
- Update `SELECT_COLUMNS` and `SELECT_COLUMNS_WITH_CONTENT` to include `file_path`

`crates/common/src/repositories/artifact.rs` — `ArtifactVersionRepository`:

- Add `file_path: Option<String>` to `CreateArtifactVersionInput`
- Wire `file_path` through the `create` query
- Add an `update_size_bytes(executor, version_id, size_bytes)` method for worker finalization
- Add a `find_file_versions_by_execution(executor, execution_id)` method that joins `artifact_version` → `artifact` on `artifact.execution` to find all file-based versions for an execution
Phase 3: API Changes
Create Version Endpoint (modified)
`POST /api/v1/artifacts/{id}/versions` — currently `create_version_json`

Add a new endpoint or modify existing behavior:

`POST /api/v1/artifacts/{id}/versions/file` (new endpoint)

- Request body: `CreateFileVersionRequest { content_type: Option<String>, meta: Option<Value>, created_by: Option<String> }`
- No file content in the request — this is the key difference from `upload_version`
- API computes `file_path` from artifact ref + version number + content_type extension
- Creates the `artifact_version` row with `file_path` set and `content` NULL
- Creates the parent directory on disk: `{artifacts_dir}/{file_path_parent}/`
- Returns `ArtifactVersionResponse` with `file_path` included
File path computation logic:
```rust
fn compute_file_path(artifact_ref: &str, version: i32, content_type: &str) -> String {
    // "mypack.build_log" → "mypack/build_log"
    let ref_path = artifact_ref.replace('.', "/");
    let ext = extension_from_content_type(content_type);
    format!("{}/v{}.{}", ref_path, version, ext)
}
```
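The `extension_from_content_type` helper used by `compute_file_path` is not specified in this plan. A minimal sketch; the MIME-to-extension table here is an assumption and would grow with the content types the system actually produces:

```rust
// Hypothetical helper mapping MIME types to file extensions.
fn extension_from_content_type(content_type: &str) -> &'static str {
    // Drop parameters such as "; charset=utf-8" before matching.
    let base = content_type.split(';').next().unwrap_or("").trim();
    match base {
        "text/plain" => "txt",
        "text/csv" => "csv",
        "application/json" => "json",
        // Safe default: unknown and binary types get a generic extension.
        _ => "bin",
    }
}
```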
Download Endpoints (modified)
`GET /api/v1/artifacts/{id}/download` and `GET /api/v1/artifacts/{id}/versions/{v}/download`:

- If `artifact_version.file_path` is set:
  - Resolve the absolute path: `{artifacts_dir}/{file_path}`
  - Verify the file exists; return 404 if not
  - `stat()` the file for the Content-Length header
  - Stream the file content as the response body
- If `file_path` is NULL:
  - Fall back to the existing BYTEA/JSON content from the DB (backward compatible)
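One detail worth pinning down when resolving `{artifacts_dir}/{file_path}`: since `file_path` comes from the database, a defensive check that the joined path cannot escape the artifacts root costs little. A sketch under that assumption (the function name is illustrative):

```rust
use std::path::{Component, Path, PathBuf};

// Join a DB-stored relative file_path onto artifacts_dir, rejecting any
// value that could escape the root (absolute paths, "..", or "." parts).
fn resolve_artifact_path(artifacts_dir: &Path, file_path: &str) -> Option<PathBuf> {
    let rel = Path::new(file_path);
    // Only plain path segments are allowed; anything else is rejected.
    if rel.components().all(|c| matches!(c, Component::Normal(_))) {
        Some(artifacts_dir.join(rel))
    } else {
        None
    }
}
```

Returning `None` maps naturally onto the 404 branch above, so a malformed row is indistinguishable from a missing file to the client.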
Upload Endpoint (unchanged for now)
`POST /api/v1/artifacts/{id}/versions/upload` (multipart) — continues to store in DB BYTEA. This remains available for non-execution uploads (external systems, small files, etc.).
Response DTO Changes
`crates/api/src/dto/artifact.rs`:

- Add `file_path: Option<String>` to `ArtifactVersionResponse`
- Add `file_path: Option<String>` to `ArtifactVersionSummary`
- Add a `CreateFileVersionRequest` DTO
Phase 4: Worker Changes
Environment Variable Injection
`crates/worker/src/executor.rs` — `prepare_execution_context()`:

- Add `ATTUNE_ARTIFACTS_DIR` to the standard env vars block: `env.insert("ATTUNE_ARTIFACTS_DIR".to_string(), self.artifacts_dir.clone());`
- The `ActionExecutor` struct needs to hold the `artifacts_dir` value (sourced from config)
Post-Execution Finalization
`crates/worker/src/executor.rs` — after execution completes (success or failure):

`async fn finalize_artifacts(&self, execution_id: i64) -> Result<()>`

- Query `artifact_version` rows joined through `artifact.execution = execution_id` where `file_path IS NOT NULL`
- For each version with a `file_path`:
  - Resolve the absolute path: `{artifacts_dir}/{file_path}`
  - `tokio::fs::metadata(path).await` to get the file size
  - If the file exists: update `artifact_version.size_bytes` via the repository
  - If the file doesn't exist: set `size_bytes = 0` (the execution didn't produce the file)
- For each parent artifact: update `artifact.size_bytes` to the latest version's `size_bytes`
This runs after every execution regardless of success/failure status, since even failed executions may have written partial artifacts.
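The size-resolution core of the finalization step is small enough to sketch. This version uses blocking `std::fs` and plain tuples so it stands alone; the real method would use `tokio::fs::metadata` and write through the repository:

```rust
use std::fs;
use std::path::Path;

// Given an execution's file-based versions as (version_id, file_path)
// pairs, return (version_id, size_bytes) pairs to persist. Missing files
// get 0, matching the plan: the execution never produced that file.
fn resolve_sizes(artifacts_dir: &Path, versions: &[(i64, String)]) -> Vec<(i64, i64)> {
    versions
        .iter()
        .map(|(id, file_path)| {
            let size = fs::metadata(artifacts_dir.join(file_path))
                .map(|m| m.len() as i64)
                .unwrap_or(0);
            (*id, size)
        })
        .collect()
}
```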
Simplify Old ArtifactManager
`crates/worker/src/artifacts.rs`:

- The existing `ArtifactManager` is a standalone prototype disconnected from the DB-backed system. It can be simplified to handle only `artifacts_dir` path resolution and directory creation, or removed entirely since the API now manages paths.
- Keep the struct as a thin wrapper if it's useful for the finalization logic, but remove the `store_logs`, `store_result`, and `store_file` methods that duplicate what the API does.
Phase 5: Retention & Cleanup
DB Trigger (existing, minor update)
The `enforce_artifact_retention` trigger fires `AFTER INSERT ON artifact_version` and deletes old version rows when the count exceeds the limit. This continues to work for row deletion. However, it cannot delete files on disk (triggers can't do filesystem I/O).
Orphan File Cleanup (new)
Add an async cleanup mechanism — either a periodic task in the worker/executor or a dedicated CLI command:
`attune artifact cleanup` (CLI) or a periodic task:

- Scan all files under `{artifacts_dir}/`
- For each file, check whether a matching `artifact_version.file_path` row exists
- If no row exists (orphaned file), delete the file
- Also delete empty directories
This handles:
- Files left behind after the retention trigger deletes version rows
- Files from crashed executions that created directories but whose version rows were cleaned up
- Manual DB cleanup scenarios
Frequency: Daily or on-demand via CLI. Orphaned files are not harmful (just wasted disk space), so aggressive cleanup isn't critical.
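The scan half of the cleanup can be sketched with a stdlib directory walk; the set of known `file_path` values would come from a DB query, and the function name is illustrative:

```rust
use std::collections::HashSet;
use std::fs;
use std::path::{Path, PathBuf};

// Recursively collect files under `root` whose path relative to `root`
// has no matching artifact_version.file_path row (passed in as `known`).
// The caller decides whether to delete the returned paths.
fn find_orphans(root: &Path, known: &HashSet<String>) -> std::io::Result<Vec<PathBuf>> {
    let mut orphans = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else {
                // Compare the volume-relative path against known file_paths.
                let rel = path.strip_prefix(root).unwrap().to_string_lossy().to_string();
                if !known.contains(&rel) {
                    orphans.push(path);
                }
            }
        }
    }
    Ok(orphans)
}
```

Returning candidates instead of deleting in place keeps the walk testable and lets a dry-run mode reuse the same code.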
Artifact Deletion Endpoint
The existing `DELETE /api/v1/artifacts/{id}` cascades to `artifact_version` rows via FK. Enhance it to also delete files on disk:

- Before deleting the DB row, query all versions with `file_path IS NOT NULL`
- Delete each file from disk
- Then delete the DB row (cascades to version rows)
- Clean up empty parent directories

Similarly for `DELETE /api/v1/artifacts/{id}/versions/{v}`.
Schema Summary
artifact table (unchanged)
Existing columns remain. `size_bytes` continues to reflect the latest version's size (updated by worker finalization for file-based artifacts, and by the DB trigger for DB-stored artifacts).
artifact_version table (modified)
| Column | Type | Notes |
|---|---|---|
| `id` | BIGSERIAL | PK |
| `artifact` | BIGINT | FK → `artifact(id)` ON DELETE CASCADE |
| `version` | INTEGER | Auto-assigned by `next_artifact_version()` |
| `content_type` | TEXT | MIME type |
| `size_bytes` | BIGINT | Set by worker finalization for file-based; set at insert for DB-stored |
| `content` | BYTEA | NULL for file-based artifacts; populated for DB-stored uploads |
| `content_json` | JSONB | For JSON content versions (unchanged) |
| `file_path` | TEXT | NEW — relative path from `artifacts_dir`. When set, `content` is NULL |
| `meta` | JSONB | Free-form metadata |
| `created_by` | TEXT | Who created this version |
| `created` | TIMESTAMPTZ | Immutable |
Invariant: Exactly one of `content`, `content_json`, or `file_path` should be non-NULL for a given version row.
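This invariant could also be enforced in the migration as a CHECK constraint; at the application layer it amounts to the following check (the helper name is illustrative):

```rust
// Exactly one of content, content_json, or file_path must be present
// for a version row to be well-formed.
fn storage_invariant_holds(
    content: Option<&[u8]>,
    content_json: Option<&str>,
    file_path: Option<&str>,
) -> bool {
    let set = [content.is_some(), content_json.is_some(), file_path.is_some()];
    set.iter().filter(|&&b| b).count() == 1
}
```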
Files Changed
| File | Changes |
|---|---|
| `crates/common/src/config.rs` | Add `artifacts_dir` field with default |
| `crates/common/src/models.rs` | Add `file_path` to `ArtifactVersion` |
| `crates/common/src/repositories/artifact.rs` | Wire `file_path` through `create`; add `update_size_bytes`, `find_file_versions_by_execution` |
| `crates/api/src/dto/artifact.rs` | Add `file_path` to version response DTOs; add `CreateFileVersionRequest` |
| `crates/api/src/routes/artifacts.rs` | New `create_version_file` endpoint; modify download endpoints for disk reads |
| `crates/api/src/state.rs` | No change needed — config already accessible via `AppState.config` |
| `crates/worker/src/executor.rs` | Inject `ATTUNE_ARTIFACTS_DIR` env var; add `finalize_artifacts()` post-execution |
| `crates/worker/src/service.rs` | Pass `artifacts_dir` config to `ActionExecutor` |
| `crates/worker/src/artifacts.rs` | Simplify or remove old `ArtifactManager` |
| `migrations/20250101000011_artifact_file_storage.sql` | Add `file_path` column to `artifact_version` |
| `config.development.yaml` | Add `artifacts_dir: ./artifacts` |
| `config.docker.yaml` | Add `artifacts_dir: /opt/attune/artifacts` |
| `docker-compose.yaml` | Add `artifacts_data` volume; mount in api + worker services |
Environment Variables
| Variable | Set By | Available To | Value |
|---|---|---|---|
| `ATTUNE_ARTIFACTS_DIR` | Worker | Execution process | Absolute path to artifacts volume (e.g., `/opt/attune/artifacts`) |
| `ATTUNE__ARTIFACTS_DIR` | Docker Compose | API / Worker services | Config override for `artifacts_dir` |
Backward Compatibility
- Existing DB-stored artifacts continue to work. Download endpoints check `file_path` first and fall back to BYTEA/JSON content.
- Existing multipart upload endpoint unchanged. External systems can still upload small files via `POST /artifacts/{id}/versions/upload` — those go to the DB as before.
- Progress and URL artifacts unchanged. They don't use `artifact_version` content at all.
- No data migration needed. Existing artifacts have `file_path = NULL` and continue to serve from the DB.
Future Considerations
- External object storage (S3/MinIO): The `file_path` abstraction makes it straightforward to swap the local filesystem for S3 later — the path becomes an object key, and the download endpoint proxies or redirects.
- Streaming writes: With disk-based storage, a future enhancement could allow the API to stream large file uploads directly to disk instead of buffering in memory.
- Artifact garbage collection: The orphan cleanup could be integrated into the executor's periodic maintenance loop alongside execution timeout monitoring.
- Cross-execution artifact access: Since artifacts are keyed by ref (not execution ID), a future enhancement could let actions declare artifact dependencies, and the worker could resolve and mount those paths.