re-uploading work

This commit is contained in:
2026-02-04 17:46:30 -06:00
commit 3b14c65998
1388 changed files with 381262 additions and 0 deletions

View File

@@ -0,0 +1,313 @@
# Docker Migrations and Startup Configuration Fixes
**Date**: 2026-01-31
**Status**: ✅ Complete
**Issue**: Services failing to start due to missing database migrations and configuration errors
## Problems Solved
### 1. Database Migrations Not Running
**Error**: `enum WorkerType does not have variant constructor docker`
**Root Cause**: Database schema (enums, tables, triggers) wasn't being created when Docker containers started, causing enum type errors when services tried to query the database.
**Solution**: Created automated migration system that runs before services start.
### 2. Port Conflicts
**Error**: `address already in use` for ports 5432 (PostgreSQL) and 5672 (RabbitMQ)
**Root Cause**: System-level PostgreSQL and RabbitMQ services were already running and using the same ports.
**Solution**: Created helper script to stop system services and documented port conflict resolution.
### 3. Configuration Errors
**Error**: Multiple configuration validation failures
**Issues Fixed**:
- `worker_type: docker` → Changed to `worker_type: container` (invalid enum value)
- `ENCRYPTION_KEY` too short → Extended to 60+ characters
- Wrong environment variable names → Fixed to use `ATTUNE__` prefix
## Implementation Details
### Migration System
**Created Files**:
1. **`docker/run-migrations.sh`** (162 lines)
- Waits for PostgreSQL to be ready
- Tracks applied migrations in `_migrations` table
- Runs migrations in sorted order with transaction safety
- Provides detailed progress output with color coding
- Handles errors gracefully with rollback
2. **`docker/init-roles.sql`** (19 lines)
- Creates required PostgreSQL roles (`svc_attune`, `attune_api`)
- Grants necessary permissions
- Runs before migrations to satisfy GRANT statements
**Updated Files**:
- **`docker-compose.yaml`**:
- Added `migrations` service using `postgres:16-alpine` image
- Configured to run before all Attune services
- Services depend on `migrations` with `condition: service_completed_successfully`
- Mounts migration scripts and SQL files
### Port Conflict Resolution
**Created Files**:
1. **`scripts/stop-system-services.sh`** (184 lines)
- Stops PostgreSQL, RabbitMQ, Redis system services
- Verifies ports are free (5432, 5672, 6379, 8080, 8081, 3000)
- Cleans up orphaned Docker containers
- Interactive prompts for disabling services on boot
2. **`docker/PORT_CONFLICTS.md`** (303 lines)
- Comprehensive troubleshooting guide
- Port conflict table
- Multiple resolution methods
- Alternative approaches (changing ports, using system services)
### Configuration Fixes
**Files Modified**:
1. **`docker-compose.yaml`**:
- Fixed: `ENCRYPTION_KEY``ATTUNE__SECURITY__ENCRYPTION_KEY`
- Fixed: `JWT_SECRET``ATTUNE__SECURITY__JWT_SECRET`
- Added: `ATTUNE__WORKER__WORKER_TYPE: container`
- Updated default encryption key length to 60+ characters
2. **`config.docker.yaml`**:
- Changed `worker_type: docker``worker_type: container`
3. **`env.docker.example`**:
- Updated `ENCRYPTION_KEY` example to 60+ characters
- Added proper documentation for environment variable format
### Docker Build Race Conditions (Bonus)
**Also Fixed**:
- Added `sharing=locked` to BuildKit cache mounts in `docker/Dockerfile`
- Created `make docker-cache-warm` target for optimal build performance
- Documented race condition solutions in `docker/DOCKER_BUILD_RACE_CONDITIONS.md`
## Migration System Architecture
```
┌─────────────────────────────────────────────────┐
│ docker compose up -d │
└────────────────┬────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Infrastructure Services Start │
│ - PostgreSQL (postgres:16-alpine) │
│ - RabbitMQ (rabbitmq:3.13-management-alpine) │
│ - Redis (redis:7-alpine) │
└────────────────┬────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Wait for Services to be Healthy │
│ (healthchecks pass) │
└────────────────┬────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Migrations Service Starts │
│ 1. Run docker/init-roles.sql │
│ - Create svc_attune role │
│ - Create attune_api role │
│ - Grant permissions │
│ 2. Create _migrations tracking table │
│ 3. Run migrations in order: │
│ - Check if already applied │
│ - Run in transaction │
│ - Mark as applied │
│ 4. Exit with success │
└────────────────┬────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Attune Services Start (depend on migrations) │
│ - attune-api (port 8080) │
│ - attune-executor │
│ - attune-worker │
│ - attune-sensor │
│ - attune-notifier (port 8081) │
│ - attune-web (port 3000) │
└─────────────────────────────────────────────────┘
```
## Migration Tracking
The migration system creates a `_migrations` table to track applied migrations:
```sql
CREATE TABLE IF NOT EXISTS _migrations (
id SERIAL PRIMARY KEY,
filename VARCHAR(255) UNIQUE NOT NULL,
applied_at TIMESTAMP DEFAULT NOW()
);
```
This prevents re-running migrations and allows for idempotent deployments.
## Results
### Before
- ❌ Services failed to start with enum errors
- ❌ Port conflicts prevented container startup
- ❌ Configuration validation errors
- ❌ Manual database setup required
- ❌ No migration tracking
### After
- ✅ All 16 migrations apply automatically on first startup
- ✅ Migrations tracked and skipped on subsequent runs
- ✅ API service healthy and responding on port 8080
- ✅ Web UI accessible on port 3000
- ✅ Infrastructure services running correctly
- ✅ Executor, sensor, notifier services operational
- ✅ Configuration properly validated
- ⚠️ Worker service needs Python runtime (separate issue)
## Testing Results
```bash
$ docker compose ps
NAME STATUS PORTS
attune-api Up (healthy) 0.0.0.0:8080->8080/tcp
attune-executor Up (health: starting) 8080/tcp
attune-notifier Up (health: starting) 0.0.0.0:8081->8081/tcp
attune-postgres Up (healthy) 0.0.0.0:5432->5432/tcp
attune-rabbitmq Up (healthy) 0.0.0.0:5672->5672/tcp
attune-redis Up (healthy) 0.0.0.0:6379->6379/tcp
attune-sensor Up (health: starting) 8080/tcp
attune-web Up (healthy) 0.0.0.0:3000->80/tcp
attune-worker Restarting (Python issue)
$ curl http://localhost:8080/health
{"status":"ok"}
```
## Usage
### First-Time Setup
```bash
# Stop system services (if needed)
./scripts/stop-system-services.sh
# Start everything
docker compose up -d
# Check status
docker compose ps
# View migration logs
docker compose logs migrations
# Check API health
curl http://localhost:8080/health
```
### Subsequent Starts
```bash
# Migrations only run if new ones are detected
docker compose up -d
# Database schema persists in postgres_data volume
# Already-applied migrations are skipped automatically
```
### Troubleshooting
```bash
# Reset database completely
docker compose down -v # WARNING: Deletes all data
docker compose up -d
# Check migration status
docker compose exec postgres psql -U attune -d attune -c "SELECT * FROM _migrations;"
# View service logs
docker compose logs api
docker compose logs migrations
```
## Known Issues
### Worker Service - Python Runtime Missing
**Status**: Not Critical (services work without worker)
**Error**: `Python validation failed: No such file or directory (os error 2)`
**Cause**: Worker container doesn't have Python installed but tries to validate Python runtime
**Solution Options**:
1. Install Python in worker container (Dockerfile update)
2. Make Python runtime validation optional
3. Use shell-only actions until fixed
This doesn't block core functionality - API, executor, sensor, and notifier all work correctly.
## Files Created/Modified
### Created (9 files)
- `docker/run-migrations.sh` - Migration runner script
- `docker/init-roles.sql` - PostgreSQL role initialization
- `docker/PORT_CONFLICTS.md` - Port conflict resolution guide
- `scripts/stop-system-services.sh` - System service management
- `docker/DOCKER_BUILD_RACE_CONDITIONS.md` - Build optimization guide
- `docker/BUILD_QUICKSTART.md` - Quick start guide
- `docker/.dockerbuild-quickref.txt` - Quick reference card
- `work-summary/docker-build-race-fix.md` - Build race fix summary
- `work-summary/docker-migrations-startup-fix.md` - This file
### Modified (6 files)
- `docker-compose.yaml` - Added migrations service, fixed env vars
- `docker/Dockerfile` - Added cache sharing locks
- `config.docker.yaml` - Fixed worker_type enum value
- `env.docker.example` - Updated encryption key length
- `Makefile` - Added docker helpers
- `README.md` - Updated Docker deployment instructions
## Environment Variable Reference
### Required Format
```bash
# Use double underscore __ as separator
ATTUNE__SECTION__KEY=value
# Examples:
ATTUNE__SECURITY__JWT_SECRET=your-secret-here
ATTUNE__SECURITY__ENCRYPTION_KEY=your-32plus-char-key-here
ATTUNE__DATABASE__URL=postgresql://user:pass@host:port/db
ATTUNE__WORKER__WORKER_TYPE=container
```
### Common Mistakes
`ENCRYPTION_KEY=value` (missing prefix)
`ATTUNE__SECURITY__ENCRYPTION_KEY=value`
`ATTUNE_SECURITY_ENCRYPTION_KEY=value` (single underscore)
`ATTUNE__SECURITY__ENCRYPTION_KEY=value` (double underscore)
❌ Short encryption key (< 32 chars)
✅ Key with 32+ characters
## Summary
Successfully implemented automated database migration system for Docker deployments, eliminating manual setup steps and ensuring consistent database state across environments. The migration system is:
- **Idempotent**: Safe to run multiple times
- **Transactional**: Each migration runs in a transaction with rollback on error
- **Tracked**: Applied migrations recorded to prevent re-running
- **Ordered**: Migrations run in sorted filename order
- **Visible**: Clear console output with success/failure indicators
This provides a production-ready database initialization flow that matches industry best practices for containerized applications.