re-uploading work
docs/workflows/inquiry-handling.md
# Inquiry Handling - Human-in-the-Loop Workflows

## Overview

Inquiry handling enables **human-in-the-loop workflows** in Attune, allowing action executions to pause and wait for human input, approval, or decisions before continuing. This is essential for workflows that require manual intervention, approval gates, or interactive decision-making.

## Architecture

### Components

1. **Action** - Returns a result containing an inquiry request
2. **Worker** - Executes action and returns result with `__inquiry` marker
3. **Executor (Completion Listener)** - Detects inquiry request and creates inquiry
4. **Inquiry Record** - Database record tracking the inquiry state
5. **API** - Endpoints for users to view and respond to inquiries
6. **Executor (Inquiry Handler)** - Listens for responses and resumes executions
7. **Notifier** - Sends real-time notifications about inquiry events

### Message Flow

```
Action Execution → Worker completes → ExecutionCompleted message →
Completion Listener detects __inquiry → Creates Inquiry record →
Publishes InquiryCreated message → Notifier alerts users →
User responds via API → API publishes InquiryResponded message →
Inquiry Handler receives message → Updates execution with response →
Execution continues/completes
```

## Inquiry Request Format

### Action Result with Inquiry

Actions can request human input by including an `__inquiry` key in their result:

```json
{
  "__inquiry": {
    "prompt": "Approve deployment to production?",
    "response_schema": {
      "type": "object",
      "properties": {
        "approved": {"type": "boolean"},
        "comments": {"type": "string"}
      },
      "required": ["approved"]
    },
    "assigned_to": 123,
    "timeout_seconds": 3600
  },
  "deployment_plan": {
    "target": "production",
    "version": "v2.5.0"
  }
}
```

### Inquiry Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | Yes | Question/prompt text displayed to user |
| `response_schema` | JSON Schema | No | Schema defining expected response format |
| `assigned_to` | integer | No | Identity ID of user assigned to respond |
| `timeout_seconds` | integer | No | Seconds from creation until inquiry times out |
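
An action author may want to check an inquiry request against the field table before returning it. The following is a minimal sketch of such a check. `check_inquiry_request` is a hypothetical helper, not part of Attune, and treating an empty `prompt` as invalid is an assumption of this sketch.

```python
from typing import Any

# Optional field names and types come from the field table; the helper
# itself is hypothetical and not part of Attune.
_OPTIONAL_FIELDS = {
    "response_schema": dict,
    "assigned_to": int,
    "timeout_seconds": int,
}


def check_inquiry_request(result: dict[str, Any]) -> list[str]:
    """Return a list of problems found in result["__inquiry"] (empty if none)."""
    inquiry = result.get("__inquiry")
    if not isinstance(inquiry, dict):
        return ["result has no `__inquiry` object"]

    problems = []
    prompt = inquiry.get("prompt")
    # Assumption: an empty or whitespace-only prompt is treated as missing.
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("`prompt` is required and must be a non-empty string")

    for name, expected in _OPTIONAL_FIELDS.items():
        value = inquiry.get(name)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"`{name}` must be of type {expected.__name__}")

    return problems
```

Calling `check_inquiry_request(result)` just before `return result` in an action surfaces malformed requests early, instead of leaving them to be discovered in the executor logs.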

## Creating Inquiries

### From Python Actions

```python
def run(deployment_plan):
    # Validate deployment plan
    validate_plan(deployment_plan)

    # Request human approval
    return {
        "__inquiry": {
            "prompt": f"Approve deployment of {deployment_plan['version']} to production?",
            "response_schema": {
                "type": "object",
                "properties": {
                    "approved": {"type": "boolean"},
                    "reason": {"type": "string"}
                },
                "required": ["approved"]
            },
            "timeout_seconds": 7200  # 2 hours
        },
        "plan": deployment_plan
    }
```

### From JavaScript Actions

```javascript
async function run(config) {
  // Prepare deployment
  const plan = await prepareDeploy(config);

  // Request approval with assigned user
  return {
    __inquiry: {
      prompt: `Deploy ${plan.serviceName} to ${plan.environment}?`,
      response_schema: {
        type: "object",
        properties: {
          approved: { type: "boolean" },
          comments: { type: "string" }
        }
      },
      assigned_to: config.approver_id,
      timeout_seconds: 3600
    },
    deployment: plan
  };
}
```

## Inquiry Lifecycle

### Status Flow

```
pending → responded (user provides response)
pending → timeout (timeout_at expires)
pending → cancelled (manual cancellation)
```
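
The status flow can be expressed as a small transition table. This is a sketch only: the names `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical, and treating every non-pending status as terminal is an assumption drawn from the three transitions listed.

```python
# Transitions taken from the status flow; every non-pending status is
# treated as terminal, which is an assumption of this sketch.
ALLOWED_TRANSITIONS = {
    "pending": {"responded", "timeout", "cancelled"},
    "responded": set(),
    "timeout": set(),
    "cancelled": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if an inquiry may move from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

A table like this keeps the rule "only pending inquiries change state" in one place, which is the same condition the response and timeout paths rely on.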

### Database Schema

```sql
CREATE TABLE attune.inquiry (
    id BIGSERIAL PRIMARY KEY,
    execution BIGINT NOT NULL REFERENCES attune.execution(id),
    prompt TEXT NOT NULL,
    response_schema JSONB,
    assigned_to BIGINT REFERENCES attune.identity(id),
    status attune.inquiry_status_enum NOT NULL DEFAULT 'pending',
    response JSONB,
    timeout_at TIMESTAMPTZ,
    responded_at TIMESTAMPTZ,
    created TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

## API Endpoints

### List Inquiries

**GET** `/api/v1/inquiries`

Query parameters:
- `status` - Filter by status (pending, responded, timeout, cancelled)
- `execution` - Filter by execution ID
- `assigned_to` - Filter by assigned user ID
- `page`, `per_page` - Pagination

Response:
```json
{
  "data": [
    {
      "id": 1,
      "execution": 123,
      "prompt": "Approve deployment?",
      "assigned_to": 5,
      "status": "pending",
      "has_response": false,
      "timeout_at": "2024-01-15T12:00:00Z",
      "created": "2024-01-15T10:00:00Z"
    }
  ],
  "total": 1,
  "page": 1,
  "per_page": 50,
  "pages": 1
}
```
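
As an illustration of how the query parameters combine, the sketch below builds a list URL with the standard library. `build_list_url` and the host name are hypothetical. The defaults of `page=1` and `per_page=50` are taken from the example response and may not match the server's actual defaults.

```python
from urllib.parse import urlencode


def build_list_url(base_url, status=None, execution=None, assigned_to=None,
                   page=1, per_page=50):
    """Build a GET /api/v1/inquiries URL, omitting filters that are not set."""
    params = {
        "status": status,
        "execution": execution,
        "assigned_to": assigned_to,
        "page": page,
        "per_page": per_page,
    }
    # Drop unset filters so they are not sent as the literal string "None".
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url.rstrip('/')}/api/v1/inquiries?{query}"
```

For example, `build_list_url("https://attune.example.com", status="pending", assigned_to=5)` produces a URL that lists the pending inquiries assigned to user 5.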

### Get Inquiry Details

**GET** `/api/v1/inquiries/:id`

Response:
```json
{
  "data": {
    "id": 1,
    "execution": 123,
    "prompt": "Approve deployment to production?",
    "response_schema": {
      "type": "object",
      "properties": {
        "approved": {"type": "boolean"}
      }
    },
    "assigned_to": 5,
    "status": "pending",
    "response": null,
    "timeout_at": "2024-01-15T12:00:00Z",
    "responded_at": null,
    "created": "2024-01-15T10:00:00Z",
    "updated": "2024-01-15T10:00:00Z"
  }
}
```

### Respond to Inquiry

**POST** `/api/v1/inquiries/:id/respond`

Request body:
```json
{
  "response": {
    "approved": true,
    "comments": "LGTM - all tests passed"
  }
}
```

Response:
```json
{
  "data": {
    "id": 1,
    "execution": 123,
    "status": "responded",
    "response": {
      "approved": true,
      "comments": "LGTM - all tests passed"
    },
    "responded_at": "2024-01-15T10:30:00Z"
  },
  "message": "Response submitted successfully"
}
```
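
A client could prepare the respond call along these lines. This is a sketch under assumptions: `build_respond_request` and the host name are hypothetical, and because the authentication scheme is not described in this document, credentials are left to a caller-supplied `extra_headers` argument.

```python
import json
import urllib.request


def build_respond_request(base_url, inquiry_id, response, extra_headers=None):
    """Prepare (but do not send) a POST /api/v1/inquiries/:id/respond request."""
    headers = {"Content-Type": "application/json"}
    # Authentication is deployment-specific; pass any required headers here.
    headers.update(extra_headers or {})
    # The API expects the answer wrapped in a top-level "response" key.
    body = json.dumps({"response": response}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/inquiries/{inquiry_id}/respond",
        data=body,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the response. Keeping construction separate from sending makes the request easy to inspect in tests.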

### Cancel Inquiry

**POST** `/api/v1/inquiries/:id/cancel`

Cancels a pending inquiry (admin/system use).

## Message Queue Events

### InquiryCreated

Published when an inquiry is created.

Routing key: `inquiry.created`

Payload:
```json
{
  "inquiry_id": 1,
  "execution_id": 123,
  "prompt": "Approve deployment?",
  "response_schema": {...},
  "assigned_to": 5,
  "timeout_at": "2024-01-15T12:00:00Z"
}
```

### InquiryResponded

Published when a user responds to an inquiry.

Routing key: `inquiry.responded`

Payload:
```json
{
  "inquiry_id": 1,
  "execution_id": 123,
  "response": {
    "approved": true
  },
  "responded_by": 5,
  "responded_at": "2024-01-15T10:30:00Z"
}
```
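
A consumer written in Python might decode the `InquiryResponded` payload into a typed object, as in the sketch below. The class and function names are hypothetical. Field names follow the payload shown, and all five fields are assumed to be present.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class InquiryResponded:
    """Typed view of the inquiry.responded payload (hypothetical class)."""
    inquiry_id: int
    execution_id: int
    response: dict[str, Any]
    responded_by: int
    responded_at: datetime


def parse_inquiry_responded(raw: str) -> InquiryResponded:
    """Decode a JSON message body into an InquiryResponded instance."""
    data = json.loads(raw)
    return InquiryResponded(
        inquiry_id=data["inquiry_id"],
        execution_id=data["execution_id"],
        response=data["response"],
        responded_by=data["responded_by"],
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
        # so normalise it to an explicit UTC offset first.
        responded_at=datetime.fromisoformat(
            data["responded_at"].replace("Z", "+00:00")
        ),
    )
```

Parsing `responded_at` into a timezone-aware `datetime` avoids comparing raw strings when correlating the event with the inquiry's `timeout_at`.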

## Executor Service Integration

### Completion Listener

The completion listener detects inquiry requests in execution results:

```rust
// Check if execution result contains an inquiry request
if let Some(result) = &exec.result {
    if InquiryHandler::has_inquiry_request(result) {
        // Create inquiry and publish InquiryCreated message
        InquiryHandler::create_inquiry_from_result(
            pool,
            publisher,
            execution_id,
            result,
        ).await?;
    }
}
```

### Inquiry Handler

The inquiry handler processes inquiry responses:

```rust
// Listen for InquiryResponded messages
consumer.consume_with_handler(|envelope: MessageEnvelope<InquiryRespondedPayload>| {
    async move {
        // Update execution with inquiry response
        Self::resume_execution_with_response(
            pool,
            publisher,
            execution,
            inquiry,
            response,
        ).await?;
    }
}).await?;
```

### Timeout Checker

A background task periodically checks for expired inquiries:

```rust
// Run every 60 seconds
InquiryHandler::timeout_check_loop(pool, 60).await;
```

This updates pending inquiries to `timeout` status when `timeout_at` is exceeded.

## Access Control

### Assignment Enforcement

If an inquiry has `assigned_to` set, only that user can respond:

```rust
if let Some(assigned_to) = inquiry.assigned_to {
    if assigned_to != user_id {
        return Err(ApiError::Forbidden("Not authorized to respond"));
    }
}
```

### RBAC Integration (Future)

Future versions will integrate with RBAC for:
- Permission to respond to inquiries
- Permission to cancel inquiries
- Visibility filtering based on roles

## Timeout Handling

### Automatic Timeout

Inquiries with `timeout_at` set are automatically marked as timed out:

```sql
UPDATE attune.inquiry
SET status = 'timeout', updated = NOW()
WHERE status = 'pending'
  AND timeout_at IS NOT NULL
  AND timeout_at < NOW();
```
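
The relationship between `timeout_seconds`, `timeout_at`, and the expiry condition can be sketched as follows. Both function names are hypothetical. The deadline is assumed to be the creation time plus `timeout_seconds`, in line with the field description, and `is_expired` mirrors the `WHERE` clause of the `UPDATE` statement.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_timeout_at(created: datetime,
                       timeout_seconds: Optional[int]) -> Optional[datetime]:
    """Return the deadline for an inquiry, or None if it never times out."""
    if timeout_seconds is None:
        return None
    return created + timedelta(seconds=timeout_seconds)


def is_expired(status: str, timeout_at: Optional[datetime],
               now: datetime) -> bool:
    """Mirror the WHERE clause: pending, deadline set, deadline in the past."""
    return status == "pending" and timeout_at is not None and timeout_at < now
```

With `created` at 10:00 UTC and `timeout_seconds` of 7200, the deadline is 12:00 UTC, matching the `timeout_at` values in the API examples. Use timezone-aware datetimes throughout so comparisons stay consistent with `TIMESTAMPTZ`.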

### Timeout Behavior

When an inquiry times out:
1. Status changes to `timeout`
2. Execution remains in current state
3. Optional: Publish timeout event
4. Optional: Resume execution with timeout indicator

## Real-Time Notifications

### WebSocket Integration

The Notifier service sends real-time notifications for inquiry events:

```javascript
// Subscribe to inquiry notifications
ws.send(JSON.stringify({
  type: "subscribe",
  filters: {
    entity_type: "inquiry",
    user_id: 5
  }
}));

// Example of a notification message received after subscribing
const notification = {
  "id": 123,
  "entity_type": "inquiry",
  "entity": "1",
  "activity": "created",
  "content": {
    "prompt": "Approve deployment?",
    "assigned_to": 5
  }
};
```

### Notification Triggers

- **inquiry.created** - New inquiry created
- **inquiry.responded** - Inquiry received response
- **inquiry.timeout** - Inquiry timed out
- **inquiry.cancelled** - Inquiry was cancelled

## Use Cases

### Deployment Approval

```python
def deploy_to_production(config):
    # Prepare deployment
    plan = prepare_deployment(config)

    # Request approval
    return {
        "__inquiry": {
            "prompt": f"Approve deployment of {config['service']} v{config['version']}?",
            "response_schema": {
                "type": "object",
                "properties": {
                    "approved": {"type": "boolean"},
                    "rollback_plan": {"type": "string"}
                }
            },
            "assigned_to": get_on_call_engineer(),
            "timeout_seconds": 1800  # 30 minutes
        },
        "deployment_plan": plan
    }
```

### Data Validation

```python
def validate_data_import(data):
    # Check for anomalies
    anomalies = detect_anomalies(data)

    if anomalies:
        return {
            "__inquiry": {
                "prompt": f"Found {len(anomalies)} anomalies. Continue import?",
                "response_schema": {
                    "type": "object",
                    "properties": {
                        "continue": {"type": "boolean"},
                        "exclude_records": {"type": "array", "items": {"type": "integer"}}
                    }
                },
                "timeout_seconds": 3600
            },
            "anomalies": anomalies
        }

    # No anomalies, proceed normally
    return import_data(data)
```

### Configuration Review

```python
def update_firewall_rules(rules):
    # Analyze impact
    impact = analyze_impact(rules)

    if impact["severity"] == "high":
        return {
            "__inquiry": {
                "prompt": "High-impact firewall changes detected. Approve?",
                "response_schema": {
                    "type": "object",
                    "properties": {
                        "approved": {"type": "boolean"},
                        "review_notes": {"type": "string"}
                    }
                },
                "assigned_to": get_security_team_lead(),
                "timeout_seconds": 7200
            },
            "impact_analysis": impact,
            "proposed_rules": rules
        }

    # Low impact, apply immediately
    return apply_rules(rules)
```

## Best Practices

### 1. Clear Prompts

Write clear, actionable prompts:

✅ Good: "Approve deployment of api-service v2.1.0 to production?"
❌ Bad: "Continue?"

### 2. Reasonable Timeouts

Set appropriate timeout values:

- **Critical decisions**: 30-60 minutes
- **Routine approvals**: 2-4 hours
- **Non-urgent reviews**: 24-48 hours
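
One way to keep these guidelines consistent across actions is a small lookup of presets expressed in seconds, since `timeout_seconds` takes seconds. The sketch below is illustrative: the category names and the choice of the upper bound of each range are assumptions, not Attune conventions.

```python
# Upper bounds of the suggested ranges, expressed in seconds.
# The category names are hypothetical labels chosen for this sketch.
TIMEOUT_PRESETS = {
    "critical": 60 * 60,          # 30-60 minutes
    "routine": 4 * 60 * 60,       # 2-4 hours
    "non_urgent": 48 * 60 * 60,   # 24-48 hours
}


def timeout_seconds_for(category: str) -> int:
    """Return the preset timeout for a category, or raise ValueError."""
    try:
        return TIMEOUT_PRESETS[category]
    except KeyError:
        raise ValueError(f"unknown inquiry category: {category!r}") from None
```

An action can then write `"timeout_seconds": timeout_seconds_for("routine")` instead of repeating a literal such as `14400`.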

### 3. Response Schemas

Define clear response schemas to validate user input:

```json
{
  "type": "object",
  "properties": {
    "approved": {
      "type": "boolean",
      "description": "Whether to approve the action"
    },
    "comments": {
      "type": "string",
      "description": "Optional comments explaining the decision"
    }
  },
  "required": ["approved"]
}
```

### 4. Assignment

Assign inquiries to specific users for accountability:

```python
{
    "__inquiry": {
        "prompt": "...",
        "assigned_to": get_on_call_user_id()
    }
}
```

### 5. Context Information

Include relevant context in the action result:

```python
return {
    "__inquiry": {
        "prompt": "Approve deployment?"
    },
    "deployment_details": {
        "service": "api",
        "version": "v2.1.0",
        "changes": ["Added new endpoint", "Fixed bug #123"],
        "tests_passed": True,
        "ci_build_url": "https://ci.example.com/builds/456"
    }
}
```

## Troubleshooting

### Inquiry Not Created

**Problem**: Action completes but no inquiry is created.

**Check**:
1. Action result contains `__inquiry` key
2. Completion listener is running
3. Check executor logs for errors
4. Verify inquiry table exists

### Execution Not Resuming

**Problem**: User responds but execution doesn't continue.

**Check**:
1. InquiryResponded message was published (check API logs)
2. Inquiry handler is running and consuming messages
3. Check executor logs for errors processing response
4. Verify execution exists and is in correct state

### Timeout Not Working

**Problem**: Inquiries not timing out automatically.

**Check**:
1. Timeout checker loop is running
2. `timeout_at` is set correctly in inquiry record
3. Check system time/timezone configuration
4. Review executor logs for timeout check errors

### Response Rejected

**Problem**: API rejects inquiry response.

**Check**:
1. Inquiry is still in `pending` status
2. Inquiry hasn't timed out
3. User is authorized (if `assigned_to` is set)
4. Response matches `response_schema` (when validation is enabled)
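
The first three checks can be run against an inquiry record fetched from the API, as in the sketch below. `rejection_reason` is a hypothetical helper. It assumes the record is a dict with `status`, `assigned_to`, and `timeout_at` keys, with `timeout_at` already parsed into a timezone-aware `datetime`. Schema validation (check 4) is left out.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def rejection_reason(inquiry: dict[str, Any], user_id: int,
                     now: datetime) -> Optional[str]:
    """Return why a response would likely be rejected, or None if it looks fine."""
    # Check 1: only pending inquiries accept responses.
    if inquiry["status"] != "pending":
        return f"inquiry is already {inquiry['status']}"
    # Check 2: the deadline may have passed before the timeout checker ran.
    timeout_at = inquiry.get("timeout_at")
    if timeout_at is not None and timeout_at < now:
        return "inquiry has passed its timeout_at"
    # Check 3: assignment enforcement.
    assigned_to = inquiry.get("assigned_to")
    if assigned_to is not None and assigned_to != user_id:
        return "inquiry is assigned to a different user"
    return None
```

Running the checks in this order reports the most fundamental problem first: a non-pending inquiry cannot be answered by anyone, regardless of assignment.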

## Performance Considerations

### Database Indexes

Ensure these indexes exist for efficient inquiry queries:

```sql
CREATE INDEX idx_inquiry_status ON attune.inquiry(status);
CREATE INDEX idx_inquiry_assigned_status ON attune.inquiry(assigned_to, status);
CREATE INDEX idx_inquiry_timeout_at ON attune.inquiry(timeout_at) WHERE timeout_at IS NOT NULL;
```

### Message Queue

- Use separate consumer for inquiry responses
- Set appropriate prefetch count (10-20)
- Enable message acknowledgment

### Timeout Checking

- Run timeout checker every 60-120 seconds
- Use batched updates for efficiency
- Monitor for long-running timeout queries

## Security

### Input Validation

Always validate inquiry responses against the inquiry's `response_schema`. Schema validation is still marked as a TODO in the code (it is also listed under Planned Features); the intended check is:

```rust
// TODO: Validate response against response_schema
if let Some(schema) = &inquiry.response_schema {
    validate_json_schema(&request.response, schema)?;
}
```
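
Until server-side validation is available, a client or action can run a basic check of its own. The sketch below hand-rolls a small subset of JSON Schema (`type`, `required`, and per-property `type`) and is not a complete validator. All names are hypothetical, and a full implementation such as the `jsonschema` package would be the usual choice in practice.

```python
from typing import Any

# JSON Schema type name -> Python type. Only the types used in this
# document's examples are covered.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "boolean": bool,
    "integer": int,
}


def _type_ok(value: Any, type_name: Any) -> bool:
    expected = _TYPES.get(type_name) if isinstance(type_name, str) else None
    if expected is None:
        return True  # unknown or missing type keyword: not checked here
    if expected is int and isinstance(value, bool):
        return False  # bool is a subclass of int in Python
    return isinstance(value, expected)


def validate_response(response: Any, schema: dict[str, Any]) -> list[str]:
    """Check `type`, `required`, and per-property `type` only."""
    if not _type_ok(response, schema.get("type")):
        return [f"response must be of type {schema['type']}"]
    if not isinstance(response, dict):
        return []
    errors = [
        f"missing required field `{name}`"
        for name in schema.get("required", [])
        if name not in response
    ]
    for name, sub in schema.get("properties", {}).items():
        if name in response and not _type_ok(response[name], sub.get("type")):
            errors.append(f"field `{name}` must be of type {sub['type']}")
    return errors
```

Nested objects, array `items`, and other keywords are deliberately ignored here, so an empty error list means only that the top-level shape is plausible.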

### Authorization

Verify user permissions:

```rust
// Check assignment
if let Some(assigned_to) = inquiry.assigned_to {
    if assigned_to != user.id {
        return Err(ApiError::Forbidden("Not authorized"));
    }
}

// Future: Check RBAC permissions
if !user.has_permission("inquiry:respond") {
    return Err(ApiError::Forbidden("Missing permission"));
}
```

### Audit Trail

All inquiry responses are logged:

- Who responded
- When they responded
- What they responded with
- Original inquiry context

## Future Enhancements

### Planned Features

1. **Multi-step Approvals** - Chain multiple inquiries for approval workflows
2. **Conditional Resumption** - Resume execution differently based on response
3. **Inquiry Templates** - Reusable inquiry definitions
4. **Bulk Operations** - Approve/reject multiple inquiries at once
5. **Escalation** - Auto-reassign if no response within timeframe
6. **Reminder Notifications** - Alert users of pending inquiries
7. **Response Validation** - Validate responses against JSON schema
8. **Inquiry History** - View history of all inquiries for an execution chain

### Integration Opportunities

- **Slack/Teams** - Respond to inquiries via chat
- **Email** - Send inquiry notifications and accept email responses
- **Mobile Apps** - Native mobile inquiry interface
- **External Systems** - Webhook integration for external approval systems

## Related Documentation

- [Workflow Orchestration](workflow-orchestration.md)
- [Message Queue Architecture](message-queue.md)
- [Notifier Service](notifier-service.md)
- [API Documentation](api-overview.md)
- [Executor Service](executor-service.md)