attune/docs/docker-layer-optimization.md
# Docker Layer Optimization Guide
## Problem Statement
When building Rust workspace projects in Docker, copying the entire `crates/` directory creates a single Docker layer that gets invalidated whenever **any file** in **any crate** changes. This means:
- **Before optimization**: Changing one line in `api/src/main.rs` invalidates layers for ALL services (api, executor, worker, sensor, notifier)
- **Impact**: Every service rebuild takes ~5-6 minutes instead of ~30 seconds
- **Root cause**: Docker's layer caching treats `COPY crates/ ./crates/` as an atomic operation
## Architecture: Packs as Volumes
**Important**: The optimized Dockerfiles do NOT copy the `packs/` directory into service images. Packs are content/configuration that should be decoupled from service binaries.
### Packs Volume Strategy
```yaml
# docker-compose.yaml
volumes:
  packs_data: # Shared volume for all services

services:
  init-packs: # Run-once service that populates packs_data
    volumes:
      - ./packs:/source/packs:ro     # Source packs from host
      - packs_data:/opt/attune/packs # Copy to shared volume
  api:
    volumes:
      - packs_data:/opt/attune/packs:ro # Mount packs as read-only
  worker:
    volumes:
      - packs_data:/opt/attune/packs:ro # All services share same packs
```
**Benefits**:
- ✅ Update packs without rebuilding service images
- ✅ Reduce image size (packs not baked in)
- ✅ Faster builds (no pack copying during image build)
- ✅ Consistent packs across all services
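The `init-packs` service itself can be a small one-shot container that copies packs into the shared volume before the other services start. A minimal sketch (the `alpine` image, `cp` command, and `depends_on` wiring here are illustrative assumptions, not the project's actual definition):

```yaml
services:
  init-packs:
    image: alpine:3
    command: sh -c "cp -a /source/packs/. /opt/attune/packs/"
    volumes:
      - ./packs:/source/packs:ro
      - packs_data:/opt/attune/packs
  api:
    depends_on:
      init-packs:
        condition: service_completed_successfully
    volumes:
      - packs_data:/opt/attune/packs:ro
```

The `service_completed_successfully` condition makes dependent services wait until the volume has been populated.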
## The Solution: Selective Crate Copying
The optimized Dockerfiles use a multi-stage approach that separates dependency caching from source code compilation:
### Stage 1: Planner (Dependency Caching)
```dockerfile
# Copy only Cargo.toml files (not source code)
COPY Cargo.toml Cargo.lock ./
COPY crates/common/Cargo.toml ./crates/common/Cargo.toml
COPY crates/api/Cargo.toml ./crates/api/Cargo.toml
# ... all other crate manifests
# Create dummy source files
RUN mkdir -p crates/common/src && echo "fn main() {}" > crates/common/src/lib.rs
# ... create dummies for all crates
# Build with dummy source to cache dependencies
RUN cargo build --release --bin attune-${SERVICE}
```
**Result**: This layer is only invalidated when dependencies change (Cargo.toml/Cargo.lock modifications).
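The dummy-source step above can be expressed as a single loop over all manifests, so a newly added crate cannot be forgotten. A standalone sketch (the crate names `common` and `api` are illustrative; in the real Dockerfile this logic runs as a `RUN` instruction after the manifests are copied):

```shell
# Build a throwaway example workspace, then stub every crate's sources.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p crates/common crates/api
touch Cargo.toml Cargo.lock crates/common/Cargo.toml crates/api/Cargo.toml

# One loop covers every crate in the workspace:
for manifest in crates/*/Cargo.toml; do
  dir=$(dirname "$manifest")
  mkdir -p "$dir/src"
  echo "" > "$dir/src/lib.rs"               # satisfies library targets
  echo "fn main() {}" > "$dir/src/main.rs"  # satisfies binary targets
done
ls crates/api/src
```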
### Stage 2: Builder (Selective Source Compilation)
```dockerfile
# Copy common crate (shared dependency)
COPY crates/common/ ./crates/common/
# Copy ONLY the service being built
COPY crates/${SERVICE}/ ./crates/${SERVICE}/
# Build the actual service
RUN cargo build --release --bin attune-${SERVICE}
```
**Result**: This layer is only invalidated when the specific service's code changes (or common crate changes).
### Stage 3: Runtime (No Packs Copying)
```dockerfile
# Create directories for volume mount points
RUN mkdir -p /opt/attune/packs /opt/attune/logs
# Note: Packs are NOT copied here
# They will be mounted as a volume at runtime from packs_data volume
```
**Result**: Service images contain only binaries and configs, not packs. Packs are mounted at runtime.
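Putting the three stages together, a condensed sketch of the pattern looks like this. Crate names, base images, and paths are illustrative assumptions; the real `docker/Dockerfile.optimized` lists every workspace manifest and uses the cache mounts described later:

```dockerfile
# syntax=docker/dockerfile:1
FROM rust:1 AS planner
ARG SERVICE=api
WORKDIR /build
COPY Cargo.toml Cargo.lock ./
COPY crates/common/Cargo.toml ./crates/common/Cargo.toml
COPY crates/${SERVICE}/Cargo.toml ./crates/${SERVICE}/Cargo.toml
# Dummy sources so this build caches only dependencies
RUN mkdir -p crates/common/src crates/${SERVICE}/src \
    && echo "" > crates/common/src/lib.rs \
    && echo "fn main() {}" > crates/${SERVICE}/src/main.rs \
    && cargo build --release --bin attune-${SERVICE}

FROM planner AS builder
ARG SERVICE=api
COPY crates/common/ ./crates/common/
COPY crates/${SERVICE}/ ./crates/${SERVICE}/
# Touch sources so cargo sees them as newer than the dummy build
RUN touch crates/*/src/*.rs && cargo build --release --bin attune-${SERVICE}

FROM debian:bookworm-slim AS runtime
ARG SERVICE=api
COPY --from=builder /build/target/release/attune-${SERVICE} /usr/local/bin/
RUN mkdir -p /opt/attune/packs /opt/attune/logs  # volume mount points only
```

Note that `ARG SERVICE` is redeclared in each stage: in a Dockerfile, build args do not carry across `FROM` boundaries.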
## Performance Comparison
### Before Optimization (Old Dockerfile)
```
Scenario: Change api/src/routes/actions.rs
- Layer invalidated: COPY crates/ ./crates/
- Rebuilds: All dependencies + all crates
- Time: ~5-6 minutes
- Size: Full dependency rebuild
```
### After Optimization (New Dockerfile)
```
Scenario: Change api/src/routes/actions.rs
- Layer invalidated: COPY crates/api/ ./crates/api/
- Rebuilds: Only attune-api binary
- Time: ~30-60 seconds
- Size: Minimal incremental compilation
```
### Dependency Change Comparison
```
Scenario: Add new dependency to Cargo.toml
- Before: ~5-6 minutes (full rebuild)
- After: ~3-4 minutes (dependency cached separately)
```
## Implementation
### Using Optimized Dockerfiles
The optimized Dockerfiles are available as:
- `docker/Dockerfile.optimized` - For main services (api, executor, sensor, notifier)
- `docker/Dockerfile.worker.optimized` - For worker services
#### Option 1: Switch to Optimized Dockerfiles (Recommended)
Update `docker-compose.yaml`:
```yaml
services:
  api:
    build:
      context: .
      dockerfile: docker/Dockerfile.optimized # Changed from docker/Dockerfile
      args:
        SERVICE: api
```
#### Option 2: Replace Existing Dockerfiles
```bash
# Backup current Dockerfiles
cp docker/Dockerfile docker/Dockerfile.backup
cp docker/Dockerfile.worker docker/Dockerfile.worker.backup
# Replace with optimized versions
mv docker/Dockerfile.optimized docker/Dockerfile
mv docker/Dockerfile.worker.optimized docker/Dockerfile.worker
```
### Testing the Optimization
1. **Clean build (first time)**:
```bash
docker compose build --no-cache api
# Time: ~5-6 minutes (expected, building from scratch)
```
2. **Incremental build (change API code)**:
```bash
# Edit attune/crates/api/src/routes/actions.rs
echo "// test comment" >> crates/api/src/routes/actions.rs
docker compose build api
# Time: ~30-60 seconds (optimized, only rebuilds API)
```
3. **Verify other services not affected**:
```bash
# The worker service should still use cached layers
docker compose build worker-shell
# Time: ~5 seconds (uses cache, no rebuild needed)
```
## How It Works: Docker Layer Caching
Docker builds images in layers, and each instruction (`COPY`, `RUN`, etc.) creates a new layer. Layers are cached and reused if:
1. The instruction hasn't changed
2. The context (files being copied) hasn't changed
3. All previous layers are still valid
### Old Approach (Unoptimized)
```
Layer 1: COPY Cargo.toml Cargo.lock
Layer 2: COPY crates/ ./crates/ ← Invalidated on ANY crate change
Layer 3: RUN cargo build ← Always rebuilds everything
```
### New Approach (Optimized)
```
Stage 1 (Planner):
Layer 1: COPY Cargo.toml Cargo.lock ← Only invalidated on dependency changes
Layer 2: COPY */Cargo.toml ← Only invalidated on dependency changes
Layer 3: RUN cargo build (dummy) ← Caches compiled dependencies
Stage 2 (Builder):
Layer 4: COPY crates/common/ ← Invalidated on common changes
Layer 5: COPY crates/${SERVICE}/ ← Invalidated on service-specific changes
Layer 6: RUN cargo build ← Only recompiles changed crates
```
## BuildKit Cache Mounts
The optimized Dockerfiles also use BuildKit cache mounts for additional speedup:
```dockerfile
RUN --mount=type=cache,target=/usr/local/cargo/registry,sharing=shared \
    --mount=type=cache,target=/usr/local/cargo/git,sharing=shared \
    --mount=type=cache,target=/build/target,id=target-builder-${SERVICE} \
    cargo build --release
```
**Benefits**:
- **Cargo registry**: Downloaded crates persist between builds
- **Cargo git**: Git dependencies persist between builds
- **Target directory**: Compilation artifacts persist between builds
- **Optimized sharing**: Registry/git use `sharing=shared` for concurrent access
- **Service-specific caches**: Target directory uses unique cache IDs to prevent conflicts
**Cache Strategy**:
- **`sharing=shared`**: Registry and git caches (cargo handles concurrent access safely)
- **Service-specific IDs**: Target caches use `id=target-builder-${SERVICE}` to prevent conflicts
- **Result**: Safe parallel builds without serialization overhead (4x faster)
- **See**: `docs/QUICKREF-buildkit-cache-strategy.md` for detailed explanation
**Requirements**:
- BuildKit must be enabled; it is the default builder in recent Docker Engine releases
- On older setups, enable it with `export DOCKER_BUILDKIT=1` (Docker Compose v2 uses BuildKit automatically)
## Advanced: Parallel Builds
With the optimized Dockerfiles, you can safely build multiple services in parallel:
```bash
# Build all services (Compose v2 runs builds in parallel by default)
docker compose build
# Or build specific services
docker compose build api executor worker-shell
```
**Optimized for Parallel Builds**:
- ✅ Registry/git caches use `sharing=shared` (concurrent-safe)
- ✅ Target caches use service-specific IDs (no conflicts)
- ✅ **4x faster** than old `sharing=locked` strategy
- ✅ No race conditions or "File exists" errors
**Why it's safe**: Each service compiles different binaries (api vs executor vs worker), so their target caches don't conflict. Cargo's registry and git caches are inherently concurrent-safe.
See `docs/QUICKREF-buildkit-cache-strategy.md` for detailed explanation of the cache strategy.
## Tradeoffs and Considerations
### Advantages
- ✅ **Faster incremental builds**: 30 seconds vs 5 minutes
- ✅ **Better cache utilization**: Only rebuild what changed
- ✅ **Smaller layer diffs**: More efficient CI/CD pipelines
- ✅ **Reduced build costs**: Less CPU time in CI environments
### Disadvantages
- ❌ **More complex Dockerfiles**: Additional planner stage
- ❌ **Slightly longer first build**: Dummy compilation overhead (~30 seconds)
- ❌ **Manual manifest copying**: Need to list all crates explicitly
### When to Use
- ✅ **Active development**: Frequent code changes benefit from fast rebuilds
- ✅ **CI/CD pipelines**: Reduce build times and costs
- ✅ **Monorepo workspaces**: Multiple services sharing common code
### When NOT to Use
- ❌ **Single-crate projects**: No benefit for non-workspace projects
- ❌ **Infrequent builds**: Complexity not worth it for rare builds
- ❌ **Dockerfile simplicity required**: Stick with basic approach
## Pack Binaries
Pack binaries (like `attune-core-timer-sensor`) need to be built separately and placed in `./packs/` before starting docker-compose.
### Building Pack Binaries
Use the provided script:
```bash
./scripts/build-pack-binaries.sh
```
Or manually:
```bash
# Build pack binaries in Docker with GLIBC compatibility
docker build -f docker/Dockerfile.pack-binaries -t attune-pack-builder .
# Extract binaries
docker create --name pack-tmp attune-pack-builder
docker cp pack-tmp:/pack-binaries/attune-core-timer-sensor ./packs/core/sensors/
docker rm pack-tmp
# Make executable
chmod +x ./packs/core/sensors/attune-core-timer-sensor
```
The `init-packs` service will copy these binaries (along with other pack files) into the `packs_data` volume when docker-compose starts.
### Why Separate Pack Binaries?
- **GLIBC Compatibility**: Built in Debian Bookworm for GLIBC 2.36 compatibility
- **Decoupled Updates**: Update pack binaries without rebuilding service images
- **Smaller Service Images**: Service images don't include pack compilation stages
- **Cleaner Architecture**: Packs are content, services are runtime
## Maintenance
### Adding New Crates
When adding a new crate to the workspace:
1. **Update `Cargo.toml`** workspace members:
```toml
[workspace]
members = [
    "crates/common",
    "crates/new-service", # Add this
]
```
2. **Update optimized Dockerfiles** (both planner and builder stages):
```dockerfile
# In planner stage
COPY crates/new-service/Cargo.toml ./crates/new-service/Cargo.toml
RUN mkdir -p crates/new-service/src && echo "fn main() {}" > crates/new-service/src/main.rs
# In builder stage (copy the full crate source, not just the manifest)
COPY crates/new-service/ ./crates/new-service/
```
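To reduce the chance of forgetting a manifest when the workspace grows, the planner-stage `COPY` lines can be generated from the directory layout. A hedged sketch (the crate names and the `planner-copy.snippet` output file are illustrative):

```shell
# Emit one planner-stage COPY line per crate manifest found on disk.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p crates/common crates/new-service
touch crates/common/Cargo.toml crates/new-service/Cargo.toml

for manifest in crates/*/Cargo.toml; do
  dir=$(dirname "$manifest")
  printf 'COPY %s/Cargo.toml ./%s/Cargo.toml\n' "$dir" "$dir"
done > planner-copy.snippet
cat planner-copy.snippet
```

The generated lines can then be pasted into (or templated into) the planner stage of the Dockerfile.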
3. **Test the build**:
```bash
docker compose build new-service
```
### Updating Packs
Packs are mounted as volumes, so updating them doesn't require rebuilding service images:
1. **Update pack files** in `./packs/`:
```bash
# Edit pack files
vim packs/core/actions/my_action.yaml
```
2. **Rebuild pack binaries** (if needed):
```bash
./scripts/build-pack-binaries.sh
```
3. **Restart services** to pick up changes:
```bash
docker compose restart
```
No image rebuild required!
## Troubleshooting
### Build fails with "crate not found"
**Cause**: Missing crate manifest in COPY instructions
**Fix**: Add the crate's Cargo.toml to both planner and builder stages
### Changes not reflected in build
**Cause**: Docker using stale cached layers
**Fix**: Force rebuild with `docker compose build --no-cache <service>`
### "File exists" errors during parallel builds
**Cause**: Cache mount conflicts
**Fix**: Already handled by the optimized Dockerfiles: registry/git caches use `sharing=shared` and target caches use service-specific `id`s, so parallel builds don't collide
### Slow builds after dependency changes
**Cause**: Expected behavior - dependencies must be recompiled
**Fix**: This is normal; optimization helps with code changes, not dependency changes
## Alternative Approaches
### cargo-chef (Not Used)
The `cargo-chef` tool provides similar optimization but requires additional tooling:
- Pros: Automatic dependency detection, no manual manifest copying
- Cons: Extra dependency, learning curve, additional maintenance
We opted for the manual approach because:
- Simpler to understand and maintain
- No external dependencies
- Full control over the build process
- Easier to debug issues
### Volume Mounts for Development
For local development, consider mounting the source as a volume:
```yaml
volumes:
  - ./crates/api:/build/crates/api
```
- Pros: Instant code updates without rebuilds
- Cons: Not suitable for production images
## References
- [Docker Build Cache Documentation](https://docs.docker.com/build/cache/)
- [BuildKit Cache Mounts](https://docs.docker.com/build/guide/mounts/)
- [Rust Docker Best Practices](https://docs.docker.com/language/rust/build-images/)
- [cargo-chef Alternative](https://github.com/LukeMathWalker/cargo-chef)
## Summary
The optimized Docker build strategy significantly reduces build times by:
1. **Separating dependency resolution from source compilation**
2. **Only copying the specific crate being built** (plus common dependencies)
3. **Using BuildKit cache mounts** to persist compilation artifacts
4. **Mounting packs as volumes** instead of copying them into images
**Key Architecture Principles**:
- **Service images**: Contain only compiled binaries and configuration
- **Packs**: Mounted as volumes, updated independently of services
- **Pack binaries**: Built separately with GLIBC compatibility
- **Volume strategy**: `init-packs` service populates shared `packs_data` volume
**Result**:
- Incremental builds drop from 5-6 minutes to 30-60 seconds
- Pack updates don't require image rebuilds
- Service images are smaller and more focused
- Docker-based development workflows are practical for Rust workspaces