Docker Deployment Guide
Run the full Colony pipeline with docker-compose up. The topology consists of singleton services (sprint-master, monitor, webhook-receiver) plus per-repo worker containers that process all task types (analyze, develop, review, merge, plan).
Prerequisites
- Docker Engine 20+ and Docker Compose v2
- A GitHub token (PAT or GitHub App)
- An Anthropic API key
- A colony.config.yaml tailored for your repo(s)
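
The prerequisites above can be checked before the first build. The following is a minimal sketch, not part of Colony: `have_cmd` and `check_prereqs` are hypothetical helper names, and the default config path (`colony.config.yaml` in the current directory) is an assumption.

```shell
#!/bin/sh
# Preflight sketch: report missing prerequisites before running docker-compose.
# have_cmd and check_prereqs are hypothetical helpers, not Colony commands.

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Checks for Docker, a Compose CLI, and the config file.
# $1: path to the config file (assumed default: ./colony.config.yaml)
check_prereqs() {
  config_path="${1:-colony.config.yaml}"
  missing=0
  if ! have_cmd docker; then
    echo "missing: docker" >&2
    missing=1
  elif ! docker compose version >/dev/null 2>&1 && ! have_cmd docker-compose; then
    echo "missing: Docker Compose" >&2
    missing=1
  fi
  if [ ! -f "$config_path" ]; then
    echo "missing: $config_path" >&2
    missing=1
  fi
  return "$missing"
}
```

Calling `check_prereqs` with no arguments prints one line per missing item to stderr and returns non-zero if anything is missing. It does not verify the Docker Engine version or the API credentials.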
Quick Start

    # 1. Copy and fill in environment variables
    cp .env.example .env
    # Edit .env: set GITHUB_TOKEN and ANTHROPIC_API_KEY

    # 2. Prepare your colony.config.yaml
    # Workers clone repos at startup; workspace.repo_dir is overridden at runtime.
    # Only github.app.private_key_path needs to use the Docker volume path:
    #   github.app.private_key_path → /colony/keys/github-app.pem

    # 3. Build the image
    docker-compose build

    # 4. Start all agents
    docker-compose up -d

    # 5. View logs
    docker-compose logs -f

Environment Variables

| Variable | Required | Description |
|---|---|---|
| GITHUB_TOKEN | Yes* | GitHub personal access token. Not needed if using App auth. |
| ANTHROPIC_API_KEY | Yes | Anthropic API key for Claude Code CLI. |
| DATABASE_URL | No* | Postgres connection string. Do not set in .env for Docker Compose: docker-compose.yml sets this automatically using the bundled Postgres. Only set manually when using an external database. |
| COLONY_CONFIG | No | Base64-encoded colony.config.yaml. If set, the entrypoint writes it to /colony/colony.config.yaml. Useful for cloud deployments where volume mounts are inconvenient. |
| COLONY_AGENT | No | Which agent to run. Set per service in docker-compose.yml. Values: sprint-master, monitor, webhook-receiver, cli, <name>-worker-<N> (e.g., colony-worker-1). |
| COLONY_REPO | No | Scopes a worker to a single repository (e.g., your-org/your-repo). Set per worker service in multi-repo deployments. |
| WEBHOOK_SECRET | No | GitHub webhook secret for signature validation. Required when running the webhook-receiver service. |
| NODE_OPTIONS | No | Node.js runtime flags (e.g., --max-old-space-size=4096). |
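
Since COLONY_CONFIG expects a Base64-encoded file, the value has to be produced from your colony.config.yaml first. A minimal sketch follows; `encode_config` is a hypothetical helper name, and emitting the value on a single line (no wrapping) is an assumption chosen so that it fits in an environment variable.

```shell
#!/bin/sh
# encode_config is a hypothetical helper, not a Colony command.
# Prints the Base64 encoding of the given file on a single line.
encode_config() {
  # base64 may wrap its output; strip newlines so the value is one line
  base64 < "$1" | tr -d '\n'
}

# Usage (assumes colony.config.yaml is in the current directory):
#   COLONY_CONFIG="$(encode_config colony.config.yaml)"
#   export COLONY_CONFIG
#
# Round-trip check before deploying:
#   encode_config colony.config.yaml | base64 -d | diff - colony.config.yaml
```

How the variable reaches each container (a cloud provider's secret store, an env file, or an explicit environment entry per service) depends on your deployment and is not shown here.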
Volume Layout
- /colony/colony.config.yaml - config file (mount read-only)
- /colony/keys/github-app.pem - GitHub App A PEM (mount read-only, if using App auth)
- /colony/keys/github-ops-app.pem - GitHub App B PEM (mount read-only, if using dual App setup)
- /colony/workspaces/ - worktree base_dir (read-write, named volume; sprint-master and webhook-receiver only, monitor does not need this mount)
- /colony/repos/{owner}/{repo}/ - worker clone path (created automatically at startup)

Target Repository
Workers clone their target repos automatically at startup to /colony/repos/{owner}/{repo}. No host-mounted repo directory is needed for worker containers. The workspace.repo_dir and workspace.base_dir values in your config are overridden at runtime by the clone-setup process.

    # colony.config.yaml: workspace.repo_dir is ignored by workers in container mode
    repos:
      - owner: my-org
        repo: my-app
        workers:
          pool_size: 2  # Safe: each worker container gets its own isolated clone
          health_port_start: 9200
        workspace:
          repo_dir: ~/git/my-app  # Used by native mode only; workers override this
          base_dir: ~/.colony/workspaces  # Used by native mode only; workers override this

Workers clone via HTTPS using the system-wide git credential helper configured in the Dockerfile. After cloning, workers run mise install (if available) and the repo’s workspace.setup_command (defaults to npm install).
Container restart behavior: Restarting a worker container produces a fresh clone with no stale worktree artifacts. This means pool_size > 1 works safely — each container gets its own .git/ state with no shared-git concurrency issues.
Disk usage: Each worker container needs enough ephemeral storage for a full repo clone plus worktrees created during task execution.
GitHub App Auth
If using GitHub App authentication, mount the PEM file and update the config path:

    github:
      app:
        app_id: 123456
        private_key_path: /colony/keys/github-app.pem
        installation_id: 78901234

In docker-compose.yml, uncomment the keys volume:

    volumes:
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro

Host → container path mapping: Place the PEM file at ~/.colony/keys/github-app.pem on the host (matching the native install convention). The volume mount maps it to /colony/keys/github-app.pem inside the container, which is the path to set in private_key_path for Docker deployments.
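
Because the bind mount only works if the host file exists, it can help to check the key before starting the containers. This is a minimal sketch under stated assumptions: `check_pem` is a hypothetical helper name, and the check only confirms that the file is readable and starts with a PEM private-key header, not that GitHub accepts the key.

```shell
#!/bin/sh
# check_pem is a hypothetical helper, not a Colony command.
# Succeeds if the file is readable and begins with a PEM private-key header.
check_pem() {
  pem_path="$1"
  if [ ! -r "$pem_path" ]; then
    echo "not readable: $pem_path" >&2
    return 1
  fi
  # PEM private keys start with a line such as
  # "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----"
  if ! head -n 1 "$pem_path" | grep -q -- '-----BEGIN .*PRIVATE KEY-----'; then
    echo "not a PEM private key: $pem_path" >&2
    return 1
  fi
}

# Usage:
#   check_pem ~/.colony/keys/github-app.pem && docker-compose up -d
```

A missing host file is worth ruling out early, since Docker may create an empty directory at a bind-mount source path that does not exist.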
Dual App Setup (Autonomous Merging)
If using a second GitHub App (colony-ops) for autonomous merging, mount a second PEM file and add the ops_app block to your config:

    github:
      app:
        app_id: 111111
        private_key_path: /colony/keys/github-app.pem
        installation_id: 11111111
      ops_app:
        app_id: 222222
        private_key_path: /colony/keys/github-ops-app.pem
        installation_id: 22222222

    review:
      auto_merge_on_approval: true

Mount the PEM files in docker-compose.yml:

    services:
      sprint-master:
        volumes:
          - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
          - ~/.colony/keys/github-ops-app.pem:/colony/keys/github-ops-app.pem:ro
      worker:
        volumes:
          # No repo mount needed; workers clone repos at startup
          - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
          - ~/.colony/keys/github-ops-app.pem:/colony/keys/github-ops-app.pem:ro
      webhook-receiver:
        volumes:
          - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
      monitor:
        volumes:
          - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro

Place App B’s PEM at ~/.colony/keys/github-ops-app.pem on the host:

    mv ~/Downloads/<ops-app-name>.*.private-key.pem ~/.colony/keys/github-ops-app.pem
    chmod 600 ~/.colony/keys/github-ops-app.pem

See docs/github-app-setup.md for the full dual App walkthrough.
Health Checks
Each service exposes an HTTP health endpoint. Default ports:

| Service | Port | Notes |
|---|---|---|
| sprint-master | 9100 | Singleton |
| monitor | 9106 | Singleton (also serves dashboard) |
| webhook-receiver | 9800 | Singleton |
| worker | 9200+ | Per-repo; sequential from repos[].workers.health_port_start |
The docker-compose.yml health checks use these defaults. If you override health ports in your config YAML, update the health check URLs and port mappings in docker-compose.yml to match.

    # Check health manually
    curl http://localhost:9100/health

Webhook Setup

The webhook-receiver service dispatches GitHub events to agents as soon as they arrive, eliminating polling delays of up to 30 seconds.
1. Generate a webhook secret

    openssl rand -hex 32

Add the result to your .env file:

    WEBHOOK_SECRET=<generated-secret>

2. Configure the webhook in your colony config

    webhook:
      enabled: true
      secret_env: WEBHOOK_SECRET
      port: 9800

3. Register the webhook on GitHub
In your GitHub repository (or organization) settings, add a webhook:

- Payload URL: http://<your-host>:9800/webhook
- Content type: application/json
- Secret: the same value you set in WEBHOOK_SECRET
- Events: select "Let me select individual events" and enable:
  - Issues
  - Pull requests
  - Pull request reviews
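
The secret is what lets the receiver validate deliveries: GitHub signs each request body with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header as `sha256=<hex digest>`. The sketch below reproduces that computation so a delivery can be simulated by hand. `sign_payload` is a hypothetical helper name, and how the Colony receiver responds to a hand-made ping event is an assumption that is not confirmed here.

```shell
#!/bin/sh
# sign_payload is a hypothetical helper, not a Colony command.
# Prints the X-Hub-Signature-256 header value for a request body:
# "sha256=" followed by the hex HMAC-SHA256 of the body, keyed by the secret.
#   $1: webhook secret   $2: raw request body
sign_payload() {
  secret="$1"
  payload="$2"
  # openssl prints "<label>= <hex>"; keep only the hex digest
  digest=$(printf '%s' "$payload" \
    | openssl dgst -sha256 -hmac "$secret" \
    | sed 's/^.*= *//')
  printf 'sha256=%s\n' "$digest"
}

# Usage: send a signed test request to the receiver (response format unverified).
#   body='{"zen":"test"}'
#   curl -X POST "http://<your-host>:9800/webhook" \
#     -H "Content-Type: application/json" \
#     -H "X-GitHub-Event: ping" \
#     -H "X-Hub-Signature-256: $(sign_payload "$WEBHOOK_SECRET" "$body")" \
#     -d "$body"
```

The signature covers the exact bytes of the body, so the string passed to `sign_payload` must match what curl sends, including whitespace.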
4. Start the service
The webhook-receiver service starts automatically with docker-compose up -d. Verify it is healthy:

    curl http://localhost:9800/health

Docker networking note

In Docker, services communicate using Docker service names, not localhost. If you set explicit dispatch URLs in your config, use service names (e.g., http://sprint-master:9100/wake, http://worker:9200/health). When using the defaults derived from health_port values, services dispatch to the host network port, which works when all services share the default bridge network.
Without webhooks
All agents continue to poll GitHub on their configured interval. Webhooks are additive: polling remains the fallback for environments where inbound HTTP is not feasible (local development, firewalled deployments).
Using the CLI
Run ad-hoc CLI commands against the running deployment:

    # colony status
    docker-compose run --rm -e COLONY_AGENT=cli sprint-master colony status

    # Or use docker exec on a running container
    docker exec colony-sprint-master colony status

    # Migrate old config to worker pool format
    docker exec colony-sprint-master colony config migrate --config /colony/colony.config.yaml

Building

Full build with tests

    docker build .

The default build runs npm test during the build stage. To skip tests for faster builds, target the build stage directly:

    docker build --target build .

Rebuilding after code changes

    docker-compose build --no-cache
    docker-compose up -d

Viewing Logs

    # All services
    docker-compose logs -f

    # Single service
    docker-compose logs -f worker

    # Last 100 lines
    docker-compose logs --tail=100 sprint-master

Stopping

    # Stop all agents
    docker-compose down

    # Stop and remove volumes (deletes worktrees)
    docker-compose down -v

Troubleshooting

Agent won’t start: “No config found”
Mount your colony.config.yaml at /colony/colony.config.yaml or set the COLONY_CONFIG env var.
Developer agent fails to create worktrees
Workers clone repos automatically at startup. If worktree creation fails, check network connectivity and GitHub App auth (the clone may have failed). For native deployments, ensure the target repo is cloned and accessible at the path in workspace.repo_dir.
Health check failing
Verify the health port in your config matches the port in the docker-compose.yml health check. Default ports: sprint-master 9100, worker 9200+, monitor 9106, webhook-receiver 9800.
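
To see which endpoint is failing, the default ports can be probed in one pass. This is a minimal sketch: `health_url` and `check_all` are hypothetical helper names, the ports are the documented defaults, and it assumes the health ports are published on localhost and that only the first worker (port 9200) is of interest.

```shell
#!/bin/sh
# health_url and check_all are hypothetical helpers, not Colony commands.
# Ports are the documented defaults; adjust them if your config overrides them.

# Prints the default health URL for a service name.
health_url() {
  case "$1" in
    sprint-master)    port=9100 ;;
    monitor)          port=9106 ;;
    webhook-receiver) port=9800 ;;
    worker)           port=9200 ;;  # first worker only (health_port_start)
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
  echo "http://localhost:${port}/health"
}

# Probes every default endpoint and prints one status line per service.
check_all() {
  for svc in sprint-master monitor webhook-receiver worker; do
    url=$(health_url "$svc")
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "ok    $svc ($url)"
    else
      echo "FAIL  $svc ($url)"
    fi
  done
}
```

Run `check_all` on the Docker host. A FAIL line means the request did not succeed within five seconds; it does not distinguish a stopped container from a mismatched or unpublished port.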
Claude Code CLI not found
The production image installs @anthropic-ai/claude-code globally. If builds fail at that step, check npm registry access from your build environment.
Permission denied on /colony/workspaces
The named volume is owned by root by default. The Node.js process runs as root in the container. If you mount a host directory instead, ensure it’s writable by UID 0 (or run the container with a matching user).