[WIP] Workflows

work-summary/2026-02-27-execution-hypertable.md (new file, 59 lines)

# Execution Table → TimescaleDB Hypertable Conversion

**Date**: 2026-02-27
**Scope**: Database migration, Rust code fixes, AGENTS.md updates

## Summary

Converted the `execution` table from a regular PostgreSQL table to a TimescaleDB hypertable partitioned on `created` (1-day chunks), consistent with the existing `event` and `enforcement` hypertable conversions. This enables automatic time-based partitioning, compression, and retention for execution data.

## Key Design Decisions

- **`updated` column preserved**: Unlike `event` (immutable) and `enforcement` (single update), executions are updated ~4 times during their lifecycle. The `updated` column and its BEFORE UPDATE trigger are kept because the timeout monitor and UI depend on them.
- **`execution_history` preserved**: The `execution_history` hypertable tracks field-level diffs, which remain valuable for a mutable table. Its continuous aggregates (`execution_status_hourly`, `execution_throughput_hourly`) are unchanged.
- **7-day compression window is safe**: Executions complete within at most ~1 day, so all updates finish well before compression kicks in.
- **New `execution_volume_hourly` continuous aggregate**: Queries the execution hypertable directly (just as `event_volume_hourly` queries `event`), providing belt-and-suspenders volume monitoring alongside the history-based aggregates.

## Changes

### New Migration: `migrations/20250101000010_execution_hypertable.sql`

- Drops all FK constraints referencing `execution` (inquiry, workflow_execution, self-references, action, executor, workflow_def)
- Changes the PK from `(id)` to `(id, created)` (TimescaleDB requires the partitioning column in the PK)
- Converts to a hypertable with `create_hypertable('execution', 'created', chunk_time_interval => '1 day')`
- Adds a compression policy (segmented by `action_ref`, after 7 days)
- Adds a 90-day retention policy
- Adds the `execution_volume_hourly` continuous aggregate with a 30-minute refresh policy

### Rust Code Fixes

- **`crates/executor/src/timeout_monitor.rs`**: Replaced `SELECT * FROM execution` with an explicit column list. `SELECT *` on hypertables is fragile — the execution table has columns (`is_workflow`, `workflow_def`) not present in the Rust `Execution` model.
- **`crates/api/tests/sse_execution_stream_tests.rs`**: Fixed references to the non-existent `start_time` and `end_time` columns (replaced with `updated = NOW()`).
- **`crates/common/src/repositories/analytics.rs`**: Added an `ExecutionVolumeBucket` struct and `execution_volume_hourly` / `execution_volume_hourly_by_action` repository methods for the new continuous aggregate.
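
The shape of the data the new aggregate returns can be sketched with plain `std` Rust. This is a toy model — the field names and the bucketing function are assumptions for illustration; the real `ExecutionVolumeBucket` maps rows from the `execution_volume_hourly` view:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the repository's row type: one row per
/// hour bucket, mirroring what the continuous aggregate returns.
#[derive(Debug, PartialEq)]
struct ExecutionVolumeBucket {
    bucket: u64, // hour-aligned UNIX timestamp (seconds)
    count: u64,  // executions created in that hour
}

/// Group raw `created` timestamps into hourly buckets, the way a
/// `time_bucket('1 hour', created)` aggregation would.
fn bucket_hourly(created_secs: &[u64]) -> Vec<ExecutionVolumeBucket> {
    let mut counts: BTreeMap<u64, u64> = BTreeMap::new();
    for &t in created_secs {
        *counts.entry(t - t % 3600).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(bucket, count)| ExecutionVolumeBucket { bucket, count })
        .collect()
}

fn main() {
    // Three executions in hour 0, one in hour 1.
    let buckets = bucket_hourly(&[10, 600, 3599, 3600]);
    assert_eq!(buckets[0], ExecutionVolumeBucket { bucket: 0, count: 3 });
    assert_eq!(buckets[1], ExecutionVolumeBucket { bucket: 3600, count: 1 });
}
```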

### AGENTS.md Updates

- Added **Execution Table (TimescaleDB Hypertable)** documentation
- Updated the FK ON DELETE Policy to reflect `execution` as a hypertable
- Updated Nullable FK Fields to list all dropped FK constraints
- Updated the table count (still 20) and migration count (9 → 10)
- Updated the continuous aggregate count (5 → 6)
- Updated the development status to include the execution hypertable
- Added pitfall #19: never use `SELECT *` on hypertable-backed models
- Added pitfall #20: execution/event/enforcement cannot be FK targets

## FK Constraints Dropped

| Source Column | Target | Disposition |
|---|---|---|
| `inquiry.execution` | `execution(id)` | Column kept as plain BIGINT |
| `workflow_execution.execution` | `execution(id)` | Column kept as plain BIGINT |
| `execution.parent` | `execution(id)` | Self-ref, column kept |
| `execution.original_execution` | `execution(id)` | Self-ref, column kept |
| `execution.workflow_def` | `workflow_definition(id)` | Column kept |
| `execution.action` | `action(id)` | Column kept |
| `execution.executor` | `identity(id)` | Column kept |
| `execution.enforcement` | `enforcement(id)` | Already dropped in migration 000009 |

## Verification

- `cargo check --all-targets --workspace`: zero warnings
- `cargo test --workspace --lib`: all 90 unit tests pass
- Integration test failures are pre-existing (missing `attune_test` database), unrelated to these changes

work-summary/2026-02-27-with-items-concurrency-limiting.md (new file, 91 lines)

# `with_items` Concurrency Limiting Implementation

**Date**: 2026-02-27
**Scope**: `crates/executor/src/scheduler.rs`

## Problem

Workflow tasks with `with_items` and a `concurrency` limit dispatched all items simultaneously, ignoring the concurrency setting entirely. For example, a task with `concurrency: 3` and 20 items would dispatch all 20 at once instead of running at most 3 in parallel.

## Root Cause

The `dispatch_with_items_task` method iterated over all items in a single loop, creating a child execution and publishing it to the MQ for every item unconditionally. The `task_node.concurrency` value was logged but never used to gate dispatching.

## Solution

### Approach: DB-Based Sliding Window

All child execution records are created in the database up front (with fully rendered inputs), but only the first `concurrency` items are published to the message queue. The remaining children stay at `Requested` status in the DB. As each item completes, `advance_workflow` queries for `Requested`-status siblings and publishes enough to refill the concurrency window.

This avoids the need for any auxiliary state in workflow variables — the database itself is the single source of truth for which items are pending vs. in-flight.

### Initial Attempt: Workflow Variables (Abandoned)

The first implementation stored pending items as JSON metadata in `workflow_execution.variables` under `__pending_items__{task_name}`. This approach suffered from race conditions: when multiple items completed simultaneously, concurrent `advance_workflow` calls would read stale pending lists, pop the same item, and lose others. The result was that only the initial batch ever executed.

### Key Changes

#### 1. `dispatch_with_items_task` — Two-Phase Dispatch

- **Phase 1**: Creates ALL child execution records in the database. Each row has its input already rendered through the `WorkflowContext`, so no re-rendering is needed later.
- **Phase 2**: Publishes only the first `min(total, concurrency)` items to the MQ via `publish_execution_requested`. The rest stay at `Requested` status.

#### 2. `publish_execution_requested` — New Helper

Publishes an `ExecutionRequested` MQ message for an existing execution row. Used both during initial dispatch (Phase 2) and when filling concurrency slots on completion.

#### 3. `publish_pending_with_items_children` — Fill Concurrency Slots

Replaces the old `dispatch_next_pending_with_items`. Queries the database for siblings at `Requested` status (ordered by `task_index`), limited to the number of free slots, and publishes them. No workflow variables are involved — the DB query `status = 'requested'` is the authoritative source of undispatched items.

#### 4. `advance_workflow` — Concurrency-Aware Completion

The with_items completion branch now:

1. Counts **in-flight** siblings (`scheduling`, `scheduled`, `running` — NOT `requested`)
2. Reads the `concurrency` limit from the task graph
3. Calculates `free_slots = concurrency - in_flight`
4. Calls `publish_pending_with_items_children(free_slots)` to fill the window
5. Checks **all** non-terminal siblings (including `requested`) to decide whether to advance
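
The refill arithmetic in steps 1–4 is small enough to simulate std-only. A minimal sketch, with the DB query and MQ publish replaced by an in-memory status list (the status names and `free_slots` formula follow the summary; everything else is scaffolding):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Status { Requested, Running, Succeeded }

/// On one item's completion: count in-flight siblings, compute free
/// slots, and promote that many Requested siblings (lowest index
/// first) — the DB-query-based window refill described above.
fn refill(statuses: &mut Vec<Status>, concurrency: usize) {
    let in_flight = statuses.iter().filter(|s| **s == Status::Running).count();
    let free_slots = concurrency.saturating_sub(in_flight);
    let mut promoted = 0;
    for s in statuses.iter_mut() {
        if promoted == free_slots { break; }
        if *s == Status::Requested {
            *s = Status::Running; // stands in for publish_execution_requested
            promoted += 1;
        }
    }
}

fn main() {
    use Status::*;
    // 5 items, concurrency 3: items 0-2 published, 3-4 pending.
    let mut s = vec![Running, Running, Running, Requested, Requested];
    s[0] = Succeeded;  // item 0 completes
    refill(&mut s, 3); // free_slots = 3 - 2 = 1 → item 3 published
    assert_eq!(s, vec![Succeeded, Running, Running, Running, Requested]);
}
```

Each completion promotes at most `free_slots` pending items, so the in-flight count never exceeds the configured concurrency.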

## Concurrency Flow Example

For a task with 5 items and `concurrency: 3`:

```
Initial: Create items 0-4 in DB; publish items 0, 1, 2 to MQ
         Items 3, 4 stay at Requested status in DB

Item 0 ✓: in_flight=2 (items 1,2), free_slots=1 → publish item 3
          siblings_remaining=3 (items 1,2,3,4 minus terminal) → return early

Item 1 ✓: in_flight=2 (items 2,3), free_slots=1 → publish item 4
          siblings_remaining=3 → return early

Item 2 ✓: in_flight=2 (items 3,4), free_slots=1 → no Requested items left
          siblings_remaining=2 → return early

Item 3 ✓: in_flight=1 (item 4), free_slots=2 → no Requested items left
          siblings_remaining=1 → return early

Item 4 ✓: in_flight=0, free_slots=3 → no Requested items left
          siblings_remaining=0 → advance workflow to successor tasks
```

## Race Condition Handling

When multiple items complete simultaneously, concurrent `advance_workflow` calls may both query `status = 'requested'` and find the same pending items. The worst case is a brief over-dispatch (the same execution published to MQ twice). The scheduler handles this gracefully — the second message finds the execution already at `Scheduled`/`Running` status. This is a benign, self-correcting race that never loses items.
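
The self-correcting property rests on the consumer treating a duplicate message as a no-op. A single-process sketch of that invariant, assuming (as the summary implies) that scheduling only proceeds when the execution is still at `Requested` — in the real system this would be a conditional status UPDATE, not an in-memory check:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Status { Requested, Scheduled }

/// Hypothetical consumer-side guard: scheduling is a no-op unless the
/// execution is still Requested, so a double-published MQ message is
/// absorbed rather than double-run.
fn try_schedule(status: &mut Status) -> bool {
    if *status == Status::Requested {
        *status = Status::Scheduled;
        true
    } else {
        false // duplicate message: already picked up
    }
}

fn main() {
    let mut s = Status::Requested;
    assert!(try_schedule(&mut s));  // first message wins
    assert!(!try_schedule(&mut s)); // duplicate is a no-op
}
```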

## Files Changed

- **`crates/executor/src/scheduler.rs`**:
  - Rewrote `dispatch_with_items_task` with the two-phase create-then-publish approach
  - Added the `publish_execution_requested` helper for publishing existing execution rows
  - Added `publish_pending_with_items_children` for DB-query-based slot filling
  - Rewrote the `advance_workflow` with_items branch with in-flight counting and slot calculation
  - Updated unit tests for the new approach

## Testing

- All 104 executor tests pass (102 + 2 ignored)
- 2 new unit tests for dispatch count and free-slot calculations
- Clean workspace build with no new warnings

work-summary/2026-02-27-workflow-execution-orchestration.md (new file, 67 lines)

# Workflow Execution Orchestration & UI Ref-Lock Fix

**Date**: 2026-02-27

## Problem

Two issues were addressed:

### 1. Workflow ref editable during edit mode (UI)

When editing an existing workflow action, the pack selector and workflow name fields were editable, allowing users to change the action's ref — which should be immutable after creation.

### 2. Workflow execution runtime error

Executing a workflow action produced:

```
Action execution failed: Internal error: Runtime not found: No runtime found for action: examples.single_echo (available: node.js, python, shell)
```

**Root cause**: Workflow companion actions are created with `runtime: None` (they aren't scripts — they're orchestration definitions). When the executor's scheduler received an execution request for a workflow action, it dispatched it to a worker like any regular action. The worker then tried to find a runtime to execute it, failed (no runtime matches a `.workflow.yaml` entrypoint), and returned the error.

The `WorkflowCoordinator` in `crates/executor/src/workflow/coordinator.rs` existed as prototype code but was never integrated into the execution pipeline.

## Solution

### UI Fix (`web/src/pages/actions/WorkflowBuilderPage.tsx`)

- Added `disabled={isEditing}` to the `SearchableSelect` pack selector (it already supported a `disabled` prop)
- Added `disabled={isEditing}` and conditional disabled styling to the workflow name `<input>`
- Both fields are now locked when editing an existing workflow, preventing ref changes

### Workflow Orchestration (`crates/executor/src/scheduler.rs`)

Added workflow detection and orchestration directly in the `ExecutionScheduler`:

1. **Detection**: `process_execution_requested` checks `action.workflow_def.is_some()` before dispatching to a worker
2. **`process_workflow_execution`**: Loads the workflow definition, parses it into a `WorkflowDefinition`, builds a `TaskGraph`, creates a `workflow_execution` record, and marks the parent execution as Running
3. **`dispatch_workflow_task`**: For each entry-point task in the graph, creates a child execution with the task's actual action ref (e.g., `core.echo` instead of `examples.single_echo`) and publishes an `ExecutionRequested` message. The child execution includes `workflow_task` metadata linking it back to the `workflow_execution` record.
4. **`advance_workflow`** (public): Called by the completion listener when a workflow child task completes. Evaluates transitions from the completed task, schedules successor tasks, checks join barriers, and completes the workflow when all tasks are done.
5. **`complete_workflow`**: Updates both the `workflow_execution` and parent `execution` records to their terminal state.

Key design decisions:

- Child task executions re-enter the normal scheduling pipeline via MQ, so nested workflows (a workflow task that is itself a workflow) are handled recursively
- Transition evaluation supports `succeeded()`, `failed()`, `timed_out()`, `always`, and custom conditions (custom conditions default to fire-on-success for now)
- Join barriers are respected — tasks with `join` counts wait for enough predecessors to complete
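
The join-barrier rule can be sketched as a per-task counter. This is a toy model, not the scheduler's actual API — the type and method names are invented:

```rust
use std::collections::HashMap;

/// Track fired incoming transitions per task; a task with `join: n`
/// dispatches only once n predecessors have completed into it.
struct JoinBarriers {
    join: HashMap<&'static str, usize>,  // required count per task
    fired: HashMap<&'static str, usize>, // transitions seen so far
}

impl JoinBarriers {
    /// Record one incoming transition; return true when the barrier opens.
    fn fire(&mut self, task: &'static str) -> bool {
        let seen = self.fired.entry(task).or_insert(0);
        *seen += 1;
        *seen >= *self.join.get(task).unwrap_or(&1)
    }
}

fn main() {
    let mut b = JoinBarriers {
        join: HashMap::from([("merge", 2)]),
        fired: HashMap::new(),
    };
    assert!(!b.fire("merge")); // first predecessor: still waiting
    assert!(b.fire("merge"));  // second predecessor: dispatch "merge"
}
```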

### Completion Listener (`crates/executor/src/completion_listener.rs`)

- Added workflow advancement: when a completed execution has `workflow_task` metadata, calls `ExecutionScheduler::advance_workflow` to schedule successor tasks or complete the workflow
- Added an `AtomicUsize` round-robin counter for dispatching successor tasks to workers
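
A round-robin counter of this kind is a one-liner over `AtomicUsize::fetch_add`. A sketch (the worker names and function signature are invented for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Round-robin worker selection with a relaxed atomic counter, as the
/// completion listener does when dispatching successor tasks.
fn next_worker<'a>(counter: &AtomicUsize, workers: &'a [&'a str]) -> &'a str {
    // fetch_add returns the previous value, so concurrent callers each
    // get a distinct index even without locks.
    let i = counter.fetch_add(1, Ordering::Relaxed);
    workers[i % workers.len()]
}

fn main() {
    let counter = AtomicUsize::new(0);
    let workers = ["worker-a", "worker-b"];
    assert_eq!(next_worker(&counter, &workers), "worker-a");
    assert_eq!(next_worker(&counter, &workers), "worker-b");
    assert_eq!(next_worker(&counter, &workers), "worker-a");
}
```

`Ordering::Relaxed` suffices because only the counter value matters; no other memory is synchronized through it.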

### Binary Entry Point (`crates/executor/src/main.rs`)

- Added `mod workflow;` so the binary crate can resolve the `crate::workflow::graph::*` paths used in the scheduler

## Files Changed

| File | Change |
|------|--------|
| `web/src/pages/actions/WorkflowBuilderPage.tsx` | Disable pack selector and name input when editing |
| `crates/executor/src/scheduler.rs` | Workflow detection, orchestration, task dispatch, advancement |
| `crates/executor/src/completion_listener.rs` | Workflow advancement on child task completion |
| `crates/executor/src/main.rs` | Added `mod workflow;` |

## Architecture Note

This implementation bypasses the prototype `WorkflowCoordinator` (`crates/executor/src/workflow/coordinator.rs`), which had several issues: hardcoded `attune.` schema prefixes, `SELECT *` on the execution table, duplicate parent execution creation, and no integration with the MQ-based scheduling pipeline. The new implementation works directly within the scheduler and completion listener, using the existing repository layer and message queue infrastructure.

## Testing

- Existing executor unit tests pass
- Workspace compiles with zero errors
- No new warnings introduced (pre-existing warnings from unused prototype workflow code remain)

work-summary/2026-02-27-workflow-param-resolution-fix.md (new file, 50 lines)

# Workflow Parameter Resolution Fix

**Date**: 2026-02-27
**Scope**: `crates/executor/src/scheduler.rs`

## Problem

Workflow executions triggered via the API failed to resolve `{{ parameters.X }}` template expressions in task inputs. Instead of substituting the actual parameter value, the literal string `"{{ parameters.n }}"` was passed to the child action, causing runtime errors like:

```
ValueError: invalid literal for int() with base 10: '{{ parameters.n }}'
```

## Root Cause

The execution scheduler's `process_workflow_execution` and `advance_workflow` methods extracted workflow parameters from the execution's `config` field using:

```rust
execution.config.as_ref()
    .and_then(|c| c.get("parameters").cloned())
    .unwrap_or(json!({}))
```

This only handled the **wrapped** format `{"parameters": {"n": 5}}`, which is how child task executions store their config. However, when a workflow is triggered manually via the API, the config is stored in **flat** format `{"n": 5}` — the API places `request.parameters` directly into the execution's `config` column without wrapping it.

Because `config.get("parameters")` returned `None` for the flat format, `workflow_params` was set to `{}` (empty). The `WorkflowContext` was then built with no parameters, so `{{ parameters.n }}` failed to resolve. The error was silently swallowed by the fallback in `dispatch_workflow_task`, which used the raw (unresolved) input when template rendering failed.

## Fix

Added an `extract_workflow_params` helper function that handles both config formats, matching the existing logic in the worker's `ActionExecutor::prepare_execution_context`:

1. If the config contains a `"parameters"` key → use that value (wrapped format)
2. Otherwise, if the config is a JSON object → use the entire object as parameters (flat format)
3. Otherwise → return an empty object
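
The precedence of the three rules can be sketched std-only. `Json` here is a hand-rolled stand-in for `serde_json::Value` (the real helper works on `JsonValue`), and the helper mirrors the logic described above:

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for serde_json::Value, just enough to show the
/// wrapped-vs-flat precedence rules.
#[derive(Clone, PartialEq, Debug)]
enum Json {
    Num(i64),
    Obj(BTreeMap<String, Json>),
}

/// Convenience constructor for object literals.
fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Obj(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

/// 1. wrapped: {"parameters": {...}} → the inner value
/// 2. flat: any other object → the object itself
/// 3. otherwise → empty object
fn extract_workflow_params(config: Option<&Json>) -> Json {
    match config {
        Some(Json::Obj(map)) => map
            .get("parameters")
            .cloned()
            .unwrap_or_else(|| Json::Obj(map.clone())),
        _ => Json::Obj(BTreeMap::new()),
    }
}

fn main() {
    let flat = obj(&[("n", Json::Num(5))]);
    let wrapped = obj(&[("parameters", flat.clone())]);
    // Both config shapes now yield the same parameters object.
    assert_eq!(extract_workflow_params(Some(&wrapped)), flat);
    assert_eq!(extract_workflow_params(Some(&flat)), flat);
    assert_eq!(extract_workflow_params(None), obj(&[]));
}
```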

Replaced both extraction sites in the scheduler (`process_workflow_execution` and `advance_workflow`) with calls to this helper.

## Files Changed

- **`crates/executor/src/scheduler.rs`**:
  - Added the `extract_workflow_params()` helper function
  - Updated `process_workflow_execution()` to use the helper
  - Updated `advance_workflow()` to use the helper
  - Added 6 unit tests covering the wrapped, flat, None, non-object, empty, and precedence cases

## Testing

- All 104 existing executor tests pass
- 6 new unit tests added and passing
- No new warnings introduced

work-summary/2026-02-27-workflow-template-resolution.md (new file, 73 lines)

# Workflow Template Resolution Implementation

**Date**: 2026-02-27

## Problem

Workflow task parameters containing `{{ }}` template expressions were being passed to workers verbatim, without resolution. For example, a workflow task with `seconds: "{{item}}"` would send the literal string `"{{item}}"` to `core.sleep`, which rejected it with `"ERROR: seconds must be a positive integer"`.

Three interconnected features were missing from the executor's workflow orchestration:

1. **Template resolution** — `{{ item }}`, `{{ parameters.x }}`, `{{ result().data.items }}`, etc. in task inputs were never rendered through the `WorkflowContext` before dispatching child executions.
2. **`with_items` expansion** — Tasks declaring `with_items: "{{ number_list }}"` were not expanded into multiple parallel child executions (one per item).
3. **`publish` variable processing** — Transition `publish` directives like `number_list: "{{ result().data.items }}"` were ignored, so variables never propagated between tasks.

A secondary issue was **type coercion**: `render_json` stringified all template results, so `"{{ item }}"` resolving to the integer `5` became the string `"5"`, causing type validation failures in downstream actions.

## Root Cause

The `ExecutionScheduler::dispatch_workflow_task()` method passed `task_node.input` directly into the child execution's config without any template rendering. Neither `process_workflow_execution` (entry-point dispatch) nor `advance_workflow` (successor dispatch) constructed or used a `WorkflowContext`. The `publish` directives on transitions were completely ignored in `advance_workflow`.

## Changes

### `crates/executor/src/workflow/context.rs`

- **Function-call expressions**: Added support for `result()`, `result().path.to.field`, `succeeded()`, `failed()`, and `timed_out()` in the expression evaluator via `try_evaluate_function_call()`.
- **`TaskOutcome` enum**: New enum (`Succeeded`, `Failed`, `TimedOut`) tracking the last completed task's status for function expressions.
- **`set_last_task_outcome()`**: Records the result and outcome of the most recently completed task.
- **Type-preserving `render_json`**: When a JSON string value is a pure template expression (the entire string is `{{ expr }}`), `render_json` now returns the raw `JsonValue` from the expression instead of stringifying it. Added the `try_evaluate_pure_expression()` helper. This means `"{{ item }}"` resolving to `5` stays the integer `5`, not the string `"5"`.
- **`rebuild()` constructor**: Reconstructs a `WorkflowContext` from persisted workflow state (stored variables, parameters, and completed task results). Used by the scheduler when advancing a workflow.
- **`export_variables()`**: Exports workflow variables as a JSON object for persisting back to the `workflow_execution.variables` column.
- **Updated `publish_from_result()`**: Uses the type-preserving `render_json` for publish expressions so arrays/numbers/booleans retain their types.
- **18 unit tests**: All passing, including new tests for type preservation, the `result()` function, `succeeded()`/`failed()`, publish with a result function, rebuild, and the exact `with_items` integer scenario from the failing workflow.
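
The pure-expression rule can be sketched in isolation. This toy renderer shows only the type-preservation branch — the real `render_json` walks whole `JsonValue` trees and interpolates mixed strings, which is elided here:

```rust
use std::collections::HashMap;

#[derive(Clone, PartialEq, Debug)]
enum Val { Int(i64), Str(String) }

/// If the whole string is a single `{{ expr }}`, return the variable's
/// raw value; otherwise fall back to string handling.
fn render(s: &str, vars: &HashMap<&str, Val>) -> Val {
    let t = s.trim();
    if t.starts_with("{{") && t.ends_with("}}") {
        let name = t[2..t.len() - 2].trim();
        if let Some(v) = vars.get(name) {
            return v.clone(); // pure expression: keep the raw type
        }
    }
    // Mixed text would be interpolated to a string in the real code;
    // this sketch just passes it through unchanged.
    Val::Str(s.to_string())
}

fn main() {
    let vars = HashMap::from([("item", Val::Int(5))]);
    // "{{ item }}" stays the integer 5, not the string "5".
    assert_eq!(render("{{ item }}", &vars), Val::Int(5));
}
```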

### `crates/executor/src/scheduler.rs`

- **Template resolution in `dispatch_workflow_task()`**: Now accepts a `WorkflowContext` parameter and renders `task_node.input` through `wf_ctx.render_json()` before wrapping it in the execution config.
- **Initial context in `process_workflow_execution()`**: Builds a `WorkflowContext` from the parent execution's parameters and workflow-level vars, and passes it to entry-point task dispatch.
- **Context reconstruction in `advance_workflow()`**: Rebuilds the `WorkflowContext` from the `workflow_execution.variables` column plus the results of all completed child executions. Sets `last_task_outcome` from the just-completed execution.
- **`publish` processing**: Iterates transition `publish` directives when a transition fires, evaluates expressions through the context, and persists updated variables back to the `workflow_execution` record.
- **`with_items` expansion**: The new `dispatch_with_items_task()` method resolves the `with_items` expression to a JSON array, then creates one child execution per item with `item`/`index` set on the context. Each child gets `task_index` set in its `WorkflowTaskMetadata`.
- **`with_items` completion tracking**: In `advance_workflow()`, tasks with `task_index` (indicating `with_items`) are only marked completed/failed when ALL sibling items for that task name are done.

### `packs/examples/actions/list_example.sh` & `list_example.yaml`

- Rewrote the shell script from `bash`+`jq` (unavailable in worker containers) to pure POSIX shell with DOTENV parameter parsing, matching the core pack pattern.
- Changed `parameter_format` from `json` to `dotenv`.

### `packs.external/python_example/actions/list_numbers.py` & `list_numbers.yaml`

- New action `python_example.list_numbers` that returns `{"items": list(range(start, n+start))}`.
- Parameters: `n` (default 10), `start` (default 0). JSON output format, Python ≥3.9.

## Workflow Flow (After Fix)

For `examples.hello_workflow`:

```
1. generate_numbers task dispatched with rendered input {count: 5, n: 5}
2. python_example.list_numbers returns {items: [0, 1, 2, 3, 4]}
3. Transition publish: number_list = result().data.items → [0,1,2,3,4]
   Variables persisted to workflow_execution record
4. sleep_2 dispatched with with_items: "{{ number_list }}"
   → 5 child executions created, each with item/index context
   → seconds: "{{item}}" renders to 0, 1, 2, 3, 4 (integers, not strings)
5. All sleep items complete → task marked done → echo_3 dispatched
6. Workflow completes
```

## Testing

- All 96 executor unit tests pass (0 failures)
- All 18 workflow context tests pass (including 8 new tests)
- Full workspace compiles with no new warnings (30 pre-existing)

work-summary/2026-02-event-hypertable-migration.md (new file, 141 lines)
|
||||
# Event & Enforcement Tables → TimescaleDB Hypertable Migration
|
||||
|
||||
**Date:** 2026-02
|
||||
**Scope:** Database migrations, Rust models/repositories/API, Web UI
|
||||
|
||||
## Summary
|
||||
|
||||
Converted the `event` and `enforcement` tables from regular PostgreSQL tables to TimescaleDB hypertables, and removed the now-unnecessary `event_history` and `enforcement_history` tables.
|
||||
|
||||
- **Events** are immutable after insert (never updated), so a separate change-tracking history table added no value.
|
||||
- **Enforcements** are updated exactly once (~1 second after creation, to set status from `created` to `processed` or `disabled`), well before the 7-day compression window. A history table tracking one deterministic status change per row was unnecessary overhead.
|
||||
|
||||
Both tables now benefit from automatic time-based partitioning, compression, and retention directly.
|
||||
|
||||
## Motivation
|
||||
|
||||
The `event_history` and `enforcement_history` hypertables were created alongside `execution_history` and `worker_history` to track field-level changes. However:
|
||||
|
||||
- **Events** are never modified after creation — no code path in the API, executor, worker, or sensor ever updates an event row. The history trigger was recording INSERT operations only, duplicating data already in the `event` table.
|
||||
- **Enforcements** undergo a single, predictable status transition (created → processed/disabled) within ~1 second. The history table recorded one INSERT and one UPDATE per enforcement — the INSERT was redundant, and the UPDATE only changed `status`. The new `resolved_at` column captures this lifecycle directly on the enforcement row itself.
|
||||
|
||||
## Changes
|
||||
|
||||
### Database Migrations
|
||||
|
||||
**`000004_trigger_sensor_event_rule.sql`**:
|
||||
- Removed `updated` column from the `event` table
|
||||
- Removed `update_event_updated` trigger
|
||||
- Replaced `updated` column with `resolved_at TIMESTAMPTZ` (nullable) on the `enforcement` table
|
||||
- Removed `update_enforcement_updated` trigger
|
||||
- Updated column comments for enforcement (status lifecycle, resolved_at semantics)
|
||||
|
||||
**`000008_notify_triggers.sql`**:
|
||||
- Updated enforcement NOTIFY trigger payloads: `updated` → `resolved_at`
|
||||
|
||||
**`000009_timescaledb_history.sql`**:
|
||||
- Removed `event_history` table, all its indexes, trigger function, trigger, compression and retention policies
|
||||
- Removed `enforcement_history` table, all its indexes, trigger function, trigger, compression and retention policies
|
||||
- Added hypertable conversion for `event` table:
|
||||
- Dropped FK constraint from `enforcement.event` → `event(id)`
|
||||
- Changed PK from `(id)` to `(id, created)`
|
||||
- Converted to hypertable with 1-day chunk interval
|
||||
- Compression segmented by `trigger_ref`, retention 90 days
|
||||
- Added hypertable conversion for `enforcement` table:
|
||||
- Dropped FK constraint from `execution.enforcement` → `enforcement(id)`
|
||||
- Changed PK from `(id)` to `(id, created)`
|
||||
- Converted to hypertable with 1-day chunk interval
|
||||
- Compression segmented by `rule_ref`, retention 90 days
|
||||
- Updated `event_volume_hourly` continuous aggregate to query `event` table directly
|
||||
- Updated `enforcement_volume_hourly` continuous aggregate to query `enforcement` table directly
|
||||
|
||||
### Rust Code — Events
|
||||
|
||||
**`crates/common/src/models.rs`**:
|
||||
- Removed `updated` field from `Event` struct
|
||||
- Removed `Event` variant from `HistoryEntityType` enum
|
||||
|
||||
**`crates/common/src/repositories/event.rs`**:
|
||||
- Removed `UpdateEventInput` struct and `Update` trait implementation for `EventRepository`
|
||||
- Updated all SELECT queries to remove `updated` column
|
||||
|
||||
**`crates/api/src/dto/event.rs`**:
|
||||
- Removed `updated` field from `EventResponse`
|
||||
|
||||
**`crates/common/tests/event_repository_tests.rs`**:
|
||||
- Removed all update tests
|
||||
- Renamed timestamp test to `test_event_created_timestamp_auto_set`
|
||||
- Updated `test_delete_event_enforcement_retains_event_id` (FK dropped, so enforcement.event is now a dangling reference after event deletion)
|
||||
|
||||
### Rust Code — Enforcements
|
||||
|
||||
**`crates/common/src/models.rs`**:
|
||||
- Replaced `updated: DateTime<Utc>` with `resolved_at: Option<DateTime<Utc>>` on `Enforcement` struct
|
||||
- Removed `Enforcement` variant from `HistoryEntityType` enum
|
||||
- Updated `FromStr`, `Display`, and `table_name()` implementations (only `Execution` and `Worker` remain)
|
||||
|
||||
**`crates/common/src/repositories/event.rs`**:
|
||||
- Added `resolved_at: Option<DateTime<Utc>>` to `UpdateEnforcementInput`
|
||||
- Updated all SELECT queries to use `resolved_at` instead of `updated`
|
||||
- Update query no longer appends `, updated = NOW()` — `resolved_at` is set explicitly by the caller
|
||||
|
||||
**`crates/api/src/dto/event.rs`**:
|
||||
- Replaced `updated` with `resolved_at: Option<DateTime<Utc>>` on `EnforcementResponse`
|
||||
|
||||
**`crates/executor/src/enforcement_processor.rs`**:
|
||||
- Both status update paths (Processed and Disabled) now set `resolved_at: Some(chrono::Utc::now())`
|
||||
- Updated test mock enforcement struct
|
||||
|
||||
**`crates/common/tests/enforcement_repository_tests.rs`**:
|
||||
- Updated all tests to use `resolved_at` instead of `updated`
|
||||
- Renamed `test_create_enforcement_with_invalid_event_fails` → `test_create_enforcement_with_nonexistent_event_succeeds` (FK dropped)
|
||||
- Renamed `test_enforcement_timestamps_auto_managed` → `test_enforcement_resolved_at_lifecycle`
|
||||
- All `UpdateEnforcementInput` usages now include `resolved_at` field
|
||||
|
||||
### Rust Code — History Infrastructure
|
||||
|
||||
**`crates/api/src/routes/history.rs`**:
|
||||
- Removed `get_event_history` and `get_enforcement_history` endpoints
|
||||
- Removed `/events/{id}/history` and `/enforcements/{id}/history` routes
|
||||
- Updated doc comments to list only `execution` and `worker`
|
||||
|
||||
**`crates/api/src/dto/history.rs`**:
|
||||
- Updated entity type comment
|
||||
|
||||
**`crates/common/src/repositories/entity_history.rs`**:
|
||||
- Updated tests to remove `Event` and `Enforcement` variant assertions
|
||||
- Both now correctly fail to parse as `HistoryEntityType`

### Web UI

**`web/src/pages/events/EventDetailPage.tsx`**:

- Removed `EntityHistoryPanel` component

**`web/src/pages/enforcements/EnforcementDetailPage.tsx`**:

- Removed `EntityHistoryPanel` component
- Added `resolved_at` display in Overview card ("Resolved At" field, shows "Pending" when null)
- Added `resolved_at` display in Metadata sidebar

**`web/src/hooks/useHistory.ts`**:

- Removed `"event"` and `"enforcement"` from `HistoryEntityType` union and `pluralMap`
- Removed `useEventHistory` and `useEnforcementHistory` convenience hooks

**`web/src/hooks/useEnforcementStream.ts`**:

- Removed history query invalidation (no more enforcement_history table)

### Documentation

- Updated `AGENTS.md`: table counts (22→20), history entity list, FK policy, enforcement lifecycle (resolved_at), pitfall #17
- Updated `docs/plans/timescaledb-entity-history.md`: removed event_history and enforcement_history from all tables, added notes about both hypertables

## Key Design Decisions

1. **Composite PK `(id, created)` on both tables**: Required by TimescaleDB — the partitioning column must be part of the PK. The `id` column retains its `BIGSERIAL` for unique identification; `created` is added for partitioning.
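
A minimal sketch of that PK swap, shown for `event` (the constraint name and chunk interval are assumptions, not copied from the migration):

```sql
ALTER TABLE event DROP CONSTRAINT event_pkey;
ALTER TABLE event ADD PRIMARY KEY (id, created);

-- create_hypertable() now accepts the table, since the partitioning
-- column is covered by its unique constraint.
SELECT create_hypertable('event', 'created',
    chunk_time_interval => INTERVAL '1 day');
```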

2. **Dropped FKs targeting hypertables**: TimescaleDB hypertables cannot be the target of foreign key constraints. Affected: `enforcement.event → event(id)` and `execution.enforcement → enforcement(id)`. Both columns remain as plain BIGINT for application-level joins. Since the original FKs were `ON DELETE SET NULL` (soft references), this is a minor change — the columns may now become dangling references if the referenced row is deleted.
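
With the FK gone, the relationship lives in application queries; a LEFT JOIN tolerates dangling references (selected columns are illustrative):

```sql
-- enforcement.event is a plain BIGINT; rows whose event was deleted
-- simply get NULLs on the event side instead of failing.
SELECT enf.id, enf.status, ev.created AS event_created
FROM enforcement enf
LEFT JOIN event ev ON ev.id = enf.event;
```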

3. **`resolved_at` instead of `updated`**: The `updated` column was a generic auto-managed timestamp. The new `resolved_at` column is semantically meaningful — it records specifically when the enforcement was resolved (status transitioned away from `created`). It is `NULL` while the enforcement is pending, making it easy to query for unresolved enforcements. The executor sets it explicitly alongside the status change.
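
The NULL-while-pending convention makes both sides of the lifecycle easy to express (the status literal is illustrative):

```sql
-- Unresolved enforcements: no resolved_at yet.
SELECT id, created FROM enforcement WHERE resolved_at IS NULL;

-- Resolution: the executor sets the timestamp explicitly with the status.
UPDATE enforcement
SET status = 'processed', resolved_at = NOW()
WHERE id = $1;
```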

4. **Compression segmentation**: Event table segments by `trigger_ref`, enforcement table segments by `rule_ref` — matching the most common query patterns for each table.
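
In TimescaleDB terms, the segmentation above translates to per-table compression settings like these (a sketch; any policy window would be configured separately):

```sql
ALTER TABLE event SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'trigger_ref'
);
ALTER TABLE enforcement SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'rule_ref'
);
```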

5. **90-day retention for both**: Aligned with execution history retention since events and enforcements are primary operational records in the event-driven pipeline.
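
As standard TimescaleDB policies, this is one call per table:

```sql
SELECT add_retention_policy('event', INTERVAL '90 days');
SELECT add_retention_policy('enforcement', INTERVAL '90 days');
```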

`work-summary/2026-02-remove-action-is-workflow.md`

# Remove `is_workflow` from Action Table & Add Workflow Edit Button

**Date**: 2026-02

## Summary

Removed the redundant `is_workflow` boolean column from the `action` table throughout the entire stack. An action being a workflow is fully determined by having a non-null `workflow_def` FK — the boolean was unnecessary. Also added a workflow edit button and visual indicator to the Actions page UI.

## Changes

### Backend — Drop `is_workflow` from Action

**`crates/common/src/models.rs`**

- Removed `is_workflow: bool` field from the `Action` struct

**`crates/common/src/repositories/action.rs`**

- Removed `is_workflow` from all SELECT column lists (9 queries)
- Updated `find_workflows()` to use `WHERE workflow_def IS NOT NULL` instead of `WHERE is_workflow = true`
- Updated `link_workflow_def()` to only `SET workflow_def = $2` (no longer sets `is_workflow = true`)
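
The reworked `find_workflows()` predicate, sketched as plain SQL (the selected columns are illustrative):

```sql
SELECT id, ref, workflow_def
FROM action
WHERE workflow_def IS NOT NULL;  -- replaces WHERE is_workflow = true
```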

**`crates/api/src/dto/action.rs`**

- Removed `is_workflow` field from `ActionResponse` and `ActionSummary` DTOs
- Added `workflow_def: Option<i64>` field to both DTOs (non-null means this action is a workflow)
- Updated `From<Action>` impls accordingly

**`crates/api/src/validation/params.rs`**

- Removed `is_workflow` from test fixture `make_action()`

**Comments updated in:**

- `crates/api/src/routes/workflows.rs` — companion action helper functions
- `crates/common/src/workflow/registrar.rs` — companion action creation
- `crates/executor/src/workflow/registrar.rs` — companion action creation

### Database Migration

**`migrations/20250101000006_workflow_system.sql`** (modified in-place, no production deployments)

- Removed `ADD COLUMN is_workflow BOOLEAN DEFAULT false NOT NULL` from ALTER TABLE
- Removed `idx_action_is_workflow` partial index
- Updated `workflow_action_link` view to use `LEFT JOIN action a ON a.workflow_def = wd.id` (dropped `AND a.is_workflow = true` filter)
- Updated column comment on `workflow_def`
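
A sketch of the updated view, under the assumption that it pairs workflow definitions with their companion actions (the column list is illustrative):

```sql
CREATE OR REPLACE VIEW workflow_action_link AS
SELECT wd.id AS workflow_def_id,
       wd.ref AS workflow_ref,
       a.id  AS action_id,
       a.ref AS action_ref
FROM workflow_def wd
LEFT JOIN action a ON a.workflow_def = wd.id;  -- no is_workflow filter
```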

> Note: `execution.is_workflow` is a separate DB-level column used by PostgreSQL notification triggers and was NOT removed. It exists only in SQL (not in the Rust `Execution` model).

### Frontend — Workflow Edit Button & Indicator

**TypeScript types updated** (4 files):

- `web/src/api/models/ActionResponse.ts` — added `workflow_def?: number | null`
- `web/src/api/models/ActionSummary.ts` — added `workflow_def?: number | null`
- `web/src/api/models/PaginatedResponse_ActionSummary.ts` — added `workflow_def?: number | null`
- `web/src/api/models/ApiResponse_ActionResponse.ts` — added `workflow_def?: number | null`

**`web/src/pages/actions/ActionsPage.tsx`**

- **Action list sidebar**: Workflow actions now show a purple `GitBranch` icon next to their label
- **Action detail view**: Workflow actions show a purple "Edit Workflow" button (with `Pencil` icon) that navigates to `/actions/workflows/:ref/edit`

### Prior Fix — Workflow Save Upsert (same session)

**`web/src/pages/actions/WorkflowBuilderPage.tsx`**

- Fixed workflow save from the "new" page when the workflow already exists
- On 409 CONFLICT from POST, automatically falls back to PUT (update) with the same data
- Constructs the workflow ref as `{packRef}.{name}` for the fallback PUT call

## Design Rationale

The `is_workflow` boolean on the action table was fully redundant:

- A workflow action always has `workflow_def IS NOT NULL`
- A workflow action's entrypoint always ends in `.workflow.yaml`
- The executor detects workflows by looking up `workflow_definition` by ref, not by checking `is_workflow`
- No runtime code path depended on the boolean that couldn't use `workflow_def IS NOT NULL` instead