Docker Deployment Guide

Run the full Colony pipeline with docker-compose up. The topology consists of singleton services (sprint-master, monitor, webhook-receiver) plus per-repo worker containers that process all task types (analyze, develop, review, merge, plan).

Prerequisites

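Docker Engine 20+ and Docker Compose v2 (this guide uses the hyphenated docker-compose command; with the Compose v2 CLI plugin, docker compose is the equivalent form)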
A GitHub token (PAT or GitHub App)
An Anthropic API key
A colony.config.yaml tailored for your repo(s)

Quick Start

# 1. Copy and fill in environment variables
cp .env.example .env
# Edit .env — set GITHUB_TOKEN and ANTHROPIC_API_KEY

# 2. Prepare your colony.config.yaml
# Workers clone repos at startup — workspace.repo_dir is overridden at runtime.
# Only github.app.private_key_path needs to use the Docker volume path:
#   github.app.private_key_path → /colony/keys/github-app.pem

# 3. Build the image
docker-compose build

# 4. Start all agents
docker-compose up -d

# 5. View logs
docker-compose logs -f

Environment Variables

Variable	Required	Description
`GITHUB_TOKEN`	Yes*	GitHub personal access token. Not needed if using App auth.
`ANTHROPIC_API_KEY`	Yes	Anthropic API key for Claude Code CLI.
`DATABASE_URL`	No*	Postgres connection string. Do not set in `.env` for Docker Compose — `docker-compose.yml` sets this automatically using the bundled Postgres. Only set manually when using an external database.
`COLONY_CONFIG`	No	Base64-encoded `colony.config.yaml`. If set, the entrypoint writes it to `/colony/colony.config.yaml`. Useful for cloud deployments where volume mounts are inconvenient.
`COLONY_AGENT`	No	Which agent to run. Set per service in `docker-compose.yml`. Values: `sprint-master`, `monitor`, `webhook-receiver`, `cli`, `<name>-worker-<N>` (e.g., `colony-worker-1`).
`COLONY_REPO`	No	Scopes a worker to a single repository (e.g., `your-org/your-repo`). Set per worker service in multi-repo deployments.
`WEBHOOK_SECRET`	No	GitHub webhook secret for signature validation. Required when running the `webhook-receiver` service.
`NODE_OPTIONS`	No	Node.js runtime flags (e.g., `--max-old-space-size=4096`).
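
COLONY_CONFIG carries the whole config file as a single Base64 string. The following is a minimal Python sketch of producing that value. It assumes the entrypoint expects standard (not URL-safe) Base64 without line wrapping; the guide only says "Base64-encoded", so check the entrypoint script if decoding fails. The helper names encode_config and decode_config are hypothetical, not part of Colony.

```python
import base64


def encode_config(yaml_text: str) -> str:
    """Return the config text as one line of standard Base64."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


def decode_config(encoded: str) -> str:
    """Reverse encode_config, to confirm the value survives a round trip."""
    return base64.b64decode(encoded).decode("utf-8")


# Sample config text; in practice, read your real colony.config.yaml instead.
sample = "repos:\n  - owner: my-org\n    repo: my-app\n"
encoded = encode_config(sample)
print(f"COLONY_CONFIG={encoded}")
```

Decoding the value again before deploying is a cheap way to catch copy-and-paste truncation in a cloud console.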

Volume Layout

/colony/colony.config.yaml       — config file (mount read-only)
/colony/keys/github-app.pem      — GitHub App A PEM (mount read-only, if using App auth)
/colony/keys/github-ops-app.pem  — GitHub App B PEM (mount read-only, if using dual App setup)
/colony/workspaces/               — worktree base_dir (read-write, named volume — sprint-master and webhook-receiver only; monitor does not need this mount)
/colony/repos/{owner}/{repo}/    — worker clone path (created automatically at startup)

Target Repository

Workers clone their target repos automatically at startup to /colony/repos/{owner}/{repo}. No host-mounted repo directory is needed for worker containers. The workspace.repo_dir and workspace.base_dir values in your config are overridden at runtime by the clone-setup process.
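
When inspecting a worker container, it helps to know where a given repo ends up. The sketch below derives the clone path from an owner/repo slug (the same form COLONY_REPO uses), following the /colony/repos/{owner}/{repo} template stated above. The function name clone_path and its validation rules are illustrative assumptions, not Colony code.

```python
from pathlib import PurePosixPath

# Root of the worker clone tree, as given in the volume layout.
REPOS_ROOT = PurePosixPath("/colony/repos")


def clone_path(colony_repo: str) -> PurePosixPath:
    """Map an 'owner/repo' slug to the path a worker clones into."""
    owner, sep, repo = colony_repo.partition("/")
    # Reject anything that is not exactly two non-empty segments.
    if not sep or not owner or not repo or "/" in repo:
        raise ValueError(f"expected 'owner/repo', got {colony_repo!r}")
    return REPOS_ROOT / owner / repo


print(clone_path("my-org/my-app"))  # /colony/repos/my-org/my-app
```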

# colony.config.yaml — workspace.repo_dir is ignored by workers in container mode
repos:
  - owner: my-org
    repo: my-app
    workers:
      pool_size: 2        # Safe — each worker container gets its own isolated clone
      health_port_start: 9200
    workspace:
      repo_dir: ~/git/my-app           # Used by native mode only; workers override this
      base_dir: ~/.colony/workspaces   # Used by native mode only; workers override this

Workers clone via HTTPS using the system-wide git credential helper configured in the Dockerfile. After cloning, workers run mise install (if available) and the repo’s workspace.setup_command (defaults to npm install).

Container restart behavior: Restarting a worker container produces a fresh clone with no stale worktree artifacts. This means pool_size > 1 works safely — each container gets its own .git/ state with no shared-git concurrency issues.

Disk usage: Each worker container needs enough ephemeral storage for a full repo clone plus worktrees created during task execution.

GitHub App Auth

If using GitHub App authentication, mount the PEM file and update the config path:

github:
  app:
    app_id: 123456
    private_key_path: /colony/keys/github-app.pem
    installation_id: 78901234

# docker-compose.yml — uncomment the keys volume
volumes:
  - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro

Host → container path mapping: Place the PEM file at ~/.colony/keys/github-app.pem on the host (matching the native install convention). The volume mount maps it to /colony/keys/github-app.pem inside the container, which is the path to set in private_key_path for Docker deployments.

Dual App Setup (Autonomous Merging)

If using a second GitHub App (colony-ops) for autonomous merging, mount a second PEM file and add the ops_app block to your config:

github:
  app:
    app_id: 111111
    private_key_path: /colony/keys/github-app.pem
    installation_id: 11111111
  ops_app:
    app_id: 222222
    private_key_path: /colony/keys/github-ops-app.pem
    installation_id: 22222222

review:
  auto_merge_on_approval: true

# docker-compose.yml
services:
  sprint-master:
    volumes:
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
      - ~/.colony/keys/github-ops-app.pem:/colony/keys/github-ops-app.pem:ro
  worker:
    volumes:
      # No repo mount needed — workers clone repos at startup
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
      - ~/.colony/keys/github-ops-app.pem:/colony/keys/github-ops-app.pem:ro
  webhook-receiver:
    volumes:
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
  monitor:
    volumes:
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro

Place App B’s PEM at ~/.colony/keys/github-ops-app.pem on the host:

mv ~/Downloads/<ops-app-name>.*.private-key.pem ~/.colony/keys/github-ops-app.pem
chmod 600 ~/.colony/keys/github-ops-app.pem

See docs/github-app-setup.md for the full dual App walkthrough.

Health Checks

Each service exposes an HTTP health endpoint. Default ports:

Service	Port	Notes
sprint-master	9100	Singleton
monitor	9106	Singleton (also serves dashboard)
webhook-receiver	9800	Singleton
worker	9200+	Per-repo; sequential from `repos[].workers.health_port_start`

The docker-compose.yml health checks use these defaults. If you override health ports in your config YAML, update the health check URLs and port mappings in docker-compose.yml to match.
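
To see which ports a given config implies, the port table can be expressed as a small Python sketch. It assumes each worker in a pool takes the next port after health_port_start, counting up by one, and that a healthy service answers /health with HTTP 200. The worker-1, worker-2 labels and all function names are illustrative only.

```python
from http.client import HTTPException
from urllib.request import urlopen

# Default singleton ports from the table above.
SINGLETON_PORTS = {"sprint-master": 9100, "monitor": 9106, "webhook-receiver": 9800}


def worker_health_ports(health_port_start: int, pool_size: int) -> list[int]:
    """Ports for one repo's worker pool, assuming one port per worker."""
    return [health_port_start + i for i in range(pool_size)]


def health_urls(host: str, worker_ports: list[int]) -> dict[str, str]:
    """Map a display label for each service to its /health URL."""
    ports = dict(SINGLETON_PORTS)
    for i, port in enumerate(worker_ports, start=1):
        ports[f"worker-{i}"] = port  # label is for display only
    return {name: f"http://{host}:{port}/health" for name, port in ports.items()}


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, HTTPException):
        # Connection refused, timeout, or an HTTP error status.
        return False


for name, url in health_urls("localhost", worker_health_ports(9200, 2)).items():
    print(f"{name}: {url}")
```

Calling is_healthy(url) on each entry gives a quick pass/fail view of the deployment; if you changed any health port in your config, adjust SINGLETON_PORTS and the start port to match.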

# Check health manually
curl http://localhost:9100/health

Webhook Setup

The webhook-receiver service dispatches GitHub events to agents as soon as they arrive, removing the polling delay of up to 30 s.

1. Generate a webhook secret

openssl rand -hex 32

Add the result to your .env file:

WEBHOOK_SECRET=<generated-secret>

2. Configure the webhook in your colony config

webhook:
  enabled: true
  secret_env: WEBHOOK_SECRET
  port: 9800

3. Register the webhook on GitHub

In your GitHub repository (or organization) settings, add a webhook:

Payload URL: http://<your-host>:9800/webhook
Content type: application/json
Secret: the same value you set in WEBHOOK_SECRET
Events: select "Let me select individual events" and enable:
- Issues
- Pull requests
- Pull request reviews
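
The secret matters because GitHub signs every delivery with it: the X-Hub-Signature-256 header carries sha256= followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. The Python sketch below illustrates that scheme so you can reason about signature failures. It is not Colony's own validation code, and the function names and sample payload are made up for the example.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Compute the header value GitHub sends for this body and secret."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Check a received header against the expected one in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


body = b'{"action": "opened"}'   # raw request bytes, exactly as received
secret = "example-secret"        # stands in for the value of WEBHOOK_SECRET
header = sign_payload(secret, body)

print(is_valid_signature(secret, body, header))          # True
print(is_valid_signature("wrong-secret", body, header))  # False
```

A mismatch between the secret entered on GitHub and the one in .env shows up as the second case: the payload is intact, but the signature never matches.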

4. Start the service

The webhook-receiver service starts automatically with docker-compose up -d. Verify it is healthy:

curl http://localhost:9800/health

Docker networking note

In Docker, services communicate using Docker service names, not localhost. If you set explicit dispatch URLs in your config, use service names (e.g., http://sprint-master:9100/wake, http://worker:9200/health). When using the defaults derived from health_port values, services dispatch to the host network port, which works when all services share the default bridge network.

Without webhooks

All agents continue to poll GitHub on their configured interval. Webhooks are additive — polling remains the fallback for environments where inbound HTTP is not feasible (local development, firewalled deployments).

Using the CLI

Run ad-hoc CLI commands against the running deployment:

# Run colony status in a one-off container
docker-compose run --rm -e COLONY_AGENT=cli sprint-master colony status

# Or use docker exec on a running container
docker exec colony-sprint-master colony status

# Migrate old config to worker pool format
docker exec colony-sprint-master colony config migrate --config /colony/colony.config.yaml

Building

Full build with tests

docker build .

The default build runs npm test as part of the image build. To skip the tests for a faster build, stop at the build stage:

docker build --target build .

Rebuilding after code changes

docker-compose build --no-cache
docker-compose up -d

Viewing Logs

# All services
docker-compose logs -f

# Single service
docker-compose logs -f worker

# Last 100 lines
docker-compose logs --tail=100 sprint-master

Stopping

# Stop all agents
docker-compose down

# Stop and remove volumes (deletes worktrees)
docker-compose down -v

Troubleshooting

Agent won’t start (“No config found”): Mount your colony.config.yaml at /colony/colony.config.yaml or set the COLONY_CONFIG env var.

Developer agent fails to create worktrees: Workers clone repos automatically at startup. If worktree creation fails, check network connectivity and GitHub App auth (the clone may have failed). For native deployments, ensure the target repo is cloned and accessible at the path in workspace.repo_dir.

Health check failing: Verify the health port in your config matches the port in the docker-compose.yml health check. Default ports: sprint-master 9100, worker 9200+, monitor 9106, webhook-receiver 9800.

Claude Code CLI not found: The production image installs @anthropic-ai/claude-code globally. If builds fail at that step, check npm registry access from your build environment.

Permission denied on /colony/workspaces: The named volume is owned by root by default, and the Node.js process runs as root in the container. If you mount a host directory instead, ensure it’s writable by UID 0 (or run the container with a matching user).