
Colony Deployment Guide

Set up Colony as a dedicated agent swarm on a Mac server (or similar hardware). This guide covers everything from a bare machine to a fully operational, monitored deployment.

Audience: Technical founders through DevOps engineers. Each section has “skip if” markers — jump past what you’ve already done.

Three deployment modes:

  • Docker (recommended) — containerized agents via docker compose
  • Apple Container (macOS) — containerized agents via Apple’s lightweight VM runtime
  • Native — agents run directly as Node.js processes

Coming soon: Pre-built images on GHCR will enable docker compose pull && docker compose up without cloning Colony source.

First time? Start with the Getting Started Guide for a quick evaluation setup. This guide is for production deployment on dedicated hardware.


Run scripts/setup-mac.sh to automate this phase, or follow the manual steps below.

Terminal window
./scripts/setup-mac.sh

The script is idempotent — safe to re-run at any time. It checks each step before acting and prints a color-coded summary at the end.
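
The idempotency comes from a simple check-then-act pattern. A minimal sketch of that pattern (the function name ensure_step is illustrative, not taken from the actual script):

```shell
#!/bin/bash
# Check-then-act: each step probes for an existing install before running.
# ensure_step is a hypothetical helper; the real script's internals may differ.
ensure_step() {
  local name="$1" check_cmd="$2" install_cmd="$3"
  if eval "$check_cmd" >/dev/null 2>&1; then
    echo "SKIP  $name (already present)"
  else
    echo "RUN   $name"
    eval "$install_cmd"
  fi
}

# Re-running is harmless: satisfied checks short-circuit the install.
ensure_step "git" "command -v git" "echo 'would run: xcode-select --install'"
```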

Skip if: xcode-select -p prints a path.

Terminal window
xcode-select --install

A dialog will appear — click “Install” and wait for it to finish. This provides git, make, and other build essentials.

Skip if: brew --version works.

Terminal window
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

On Apple Silicon, Homebrew installs to /opt/homebrew. Add it to your shell profile if it’s not already on PATH:

Terminal window
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

Skip if: gh --version works.

Terminal window
brew install gh

Authenticate with GitHub:

Terminal window
gh auth login

Follow the prompts — select SSH as the preferred protocol. This generates an SSH key and registers it with your GitHub account in one step. You will need this to clone Colony and your target repos.

Choose one of the following container runtimes. Both run the same OCI image built from Colony’s Dockerfile.

Skip if: docker info succeeds.

Terminal window
brew install --cask docker

After installation:

  1. Open Docker Desktop from /Applications
  2. Accept the license agreement
  3. Go to Settings > Resources and allocate:
    • Memory: 8 GB+ RAM
    • CPUs: 4+ CPUs
    • Disk: 50 GB+ disk image size
  4. Wait for Docker to finish starting

Verify:

Terminal window
docker info

Skip if: container --version succeeds.

Requires: macOS 15+, Apple Silicon.

Terminal window
brew install container

No GUI, no daemon — just a CLI. After installation, start the runtime (this also downloads the Linux kernel on first run):

Terminal window
container system start

Disable Rosetta for builds (uses native ARM instead of x86 emulation):

Terminal window
container system property set build.rosetta false

Verify:

Terminal window
container --version
container system status

Note: Apple Container runs one lightweight VM per container, providing stronger isolation than Docker’s shared-kernel model. It uses the same Dockerfiles and OCI images. The build.rosetta property defaults to true, which requires Rosetta to be installed. Setting it to false builds native ARM images, which is preferred on Apple Silicon.

Skip if: node --version shows v20 or higher.

Terminal window
brew install node@20

If node is not on PATH after installation:

Terminal window
brew link --overwrite node@20

Skip if: claude --version works.

Terminal window
npm install -g @anthropic-ai/claude-code

The Developer and Analyzer agents shell out to the claude CLI for LLM-powered work.

Skip if: ls ~/.colony/keys ~/.colony/workspaces ~/.colony/logs succeeds.

Terminal window
mkdir -p ~/.colony/keys ~/.colony/workspaces ~/.colony/logs

Directory              Purpose
~/.colony/keys/        GitHub App PEM file(s) (github-app.pem, github-ops-app.pem)
~/.colony/workspaces/  Git worktrees created by the Developer agent
~/.colony/logs/        Agent logs, health check output, disk monitor output

Skip if: git config --global user.name and git config --global user.email both return values.

Terminal window
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

Skip if: gh auth login (step 1.3) already configured SSH, or ssh -T git@github.com succeeds.

If you need to set up SSH manually without gh:

Terminal window
ssh-keygen -t ed25519 -C "you@example.com"

Add the public key to GitHub: https://github.com/settings/ssh/new

Terminal window
cat ~/.ssh/id_ed25519.pub

See docs/github-app-setup.md for the full walkthrough. Key deployment-specific notes:

  1. Place the PEM file at ~/.colony/keys/github-app.pem:

    Terminal window
    mv ~/Downloads/<app-name>.*.private-key.pem ~/.colony/keys/github-app.pem
    chmod 600 ~/.colony/keys/github-app.pem
  2. Record these values from the GitHub App setup — you will need them for colony.config.yaml:

    • app_id — shown on the app’s settings page after creation
    • installation_id — from the URL after installing the app on your repos
    • bot_username — format is <app-slug>[bot] (e.g., colony-bot[bot])
  3. Required repository permissions for App A (colony-coder):

    Permission       Access        Purpose
    Contents         Read & write  Push branches, read repo files
    Issues           Read & write  Manage issue labels, post comments
    Pull requests    Read & write  Open PRs, post analyzer/developer comments
    Metadata         Read-only     Required (automatically granted)

Optional: App B (colony-ops) for Autonomous Merging


To enable fully autonomous merging without a human approving each PR, create a second GitHub App (colony-ops). See the Dual App Setup section of the GitHub App setup guide.

If using dual Apps, also place App B’s PEM file at ~/.colony/keys/github-ops-app.pem:

Terminal window
mv ~/Downloads/<ops-app-name>.*.private-key.pem ~/.colony/keys/github-ops-app.pem
chmod 600 ~/.colony/keys/github-ops-app.pem

For Docker/container deployments, update the volume mount in step 4a to include the second PEM file (see docs/docker.md).

Get your API key from https://console.anthropic.com.

Cost guidance by agent:

Agent          Model (recommended)                    Relative Cost  Notes
Analyzer       Sonnet                                 Low            Single-pass structured output
Developer      Sonnet (small) / Opus (medium/large)   High           Bulk of spend
Reviewer       Sonnet                                 Low            Deterministic checks + short LLM review
Merger         n/a                                    Zero           No LLM calls
Sprint Master  n/a                                    Zero           No LLM calls

Recommendations:

  • Set a $100-200/month spending limit in the Anthropic console
  • Start all agents on Sonnet to validate the pipeline end-to-end
  • Scale the Developer agent to Opus after validation (for medium and large issues)
  • Monitor costs closely for the first week
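
To sanity-check a spending limit before you set it, the arithmetic is simple. A sketch with illustrative numbers (per-issue cost varies widely by issue size and model; these are not measured Colony figures):

```shell
#!/bin/bash
# Illustrative budget math: assume ~$1.50/issue on Sonnet, 2 issues/day.
per_issue_cents=150
issues_per_day=2
limit_cents=10000   # $100/month
monthly_cents=$(( per_issue_cents * issues_per_day * 30 ))
echo "projected: \$$(( monthly_cents / 100 ))/month"
if [ "$monthly_cents" -le "$limit_cents" ]; then
  echo "within limit"
else
  echo "over limit: lower volume or the per-issue model tier"
fi
```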

Terminal window
mkdir -p ~/git && cd ~/git
git clone git@github.com:RunColony/colony.git
cd colony
npm install
npm run build

Docker/container deployments: Skip this step. Worker containers clone their target repos automatically at startup (via clone-setup.ts). No host-mounted repo directories are needed for workers.

For native deployments, clone each repository Colony will manage:

Terminal window
git clone git@github.com:your-org/your-repo.git ~/git/your-repo

Create the config file in the Colony root directory. This is a full starter config with conservative defaults:

github:
  owner: your-org
  repo: your-repo
  app:
    app_id: YOUR_APP_ID
    private_key_path: ~/.colony/keys/github-app.pem
    installation_id: YOUR_INSTALLATION_ID
    bot_username: 'your-app-slug[bot]'
repos:
  - owner: your-org
    repo: your-repo
    app:
      app_id: YOUR_APP_ID
      private_key_path: ~/.colony/keys/github-app.pem
      installation_id: YOUR_INSTALLATION_ID
    intake_mode: tagged
    workers:
      pool_size: 1
      health_port_start: 9200
    workspace:
      repo_dir: ~/git/your-repo # Native mode only — workers override this at runtime
      base_dir: ~/.colony/workspaces/{owner}/{repo}
    review:
      checks:
        build: 'npm run build'
        test: 'npm test'
      timeout_per_check: 300
agents:
  sprint_master:
    enabled: true
    poll_interval: 60
    health_port: 9100
  workers:
    enabled: true
    poll_interval: 10
    heartbeat_interval: 30
    health_port: 9200
    executors:
      analyzer:
        effort: medium
      planner:
        max_turns: 200
labels:
  prefix: 'colony'
logging:
  level: info
  format: pretty
workspace:
  repo_dir: ~/git/your-repo # Native mode only — workers override this at runtime
  base_dir: ~/.colony/workspaces/{owner}/{repo}
  cleanup_after_merge: true
claude:
  timeout: 1800
  max_retries: 1
  models:
    developer: claude-sonnet-4-6
    reviewer: claude-sonnet-4-6
    analyzer: claude-sonnet-4-6
  scaling:
    small:
      developer_max_turns: 80
      model: claude-sonnet-4-6
      effort: medium
    medium:
      developer_max_turns: 150
      model: claude-sonnet-4-6
      effort: high
    large:
      developer_max_turns: 250
      model: claude-sonnet-4-6
      effort: high
review:
  auto_merge_on_approval: false
  # checks are configured per-repo in the repos[] array above
database:
  url_env: DATABASE_URL # required for workers and sprint-master

Migrating from old config? Run npx colony config migrate --config old-config.yaml to automatically convert per-agent config keys to the new worker pool format.

Key settings to understand:

  • intake_mode: tagged — only issues manually labeled colony:enqueue get picked up. Change to all once you trust the pipeline.

  • auto_merge_on_approval: false — you review and merge PRs manually. Set to true after validation.

  • review.checks must match actual scripts in your target repo’s package.json. If a configured check does not exist in the target repo, the Developer will add it to pass self-validation, conflicting with Reviewer feedback. Only configure checks that already exist.
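
That last constraint is easy to verify up front. A rough sketch that greps the target repo's package.json for each configured check (check_scripts is an illustrative helper, not a Colony command; a JSON-aware tool such as jq would be more robust):

```shell
#!/bin/bash
# Grep package.json for each script named in review.checks.
# A real JSON parse (jq, node -e) is safer; grep is enough for a spot check.
check_scripts() {
  local pkg="$1"; shift
  local script
  for script in "$@"; do
    if grep -q "\"$script\":" "$pkg" 2>/dev/null; then
      echo "ok: $script"
    else
      echo "MISSING: $script (drop it from review.checks)"
    fi
  done
}

# Run from the target repo before enabling a check:
check_scripts package.json build test
```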

For Docker or Apple Container deployments, change this path:

  • github.app.private_key_path → /colony/keys/github-app.pem

Note: Workers clone repos automatically at startup to /colony/repos/{owner}/{repo} and override workspace.repo_dir and workspace.base_dir at runtime. You do not need to change these values in your config for worker containers. For native deployments, set workspace.repo_dir to the local clone path and workspace.base_dir to your preferred worktree directory.

Terminal window
cp .env.example .env

Edit .env and set:

ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=postgresql://colony:colony@localhost:5432/colony

If using a PAT instead of GitHub App auth, also set GITHUB_TOKEN. DATABASE_URL is required for workers and sprint-master (the docker-compose.yml sets this automatically when using the bundled Postgres service).

Terminal window
npx colony status --config colony.config.yaml

This checks config loading, GitHub connectivity, and agent readiness without starting the pipeline.

Colony needs its label set on each target repo. Run:

Terminal window
npx colony init --config colony.config.yaml

This creates the colony:* labels on your GitHub repo. Without this step, the first pipeline run will fail because the required labels don’t exist.

Colony’s pipeline store uses a versioned migration framework. Migrations run automatically at agent startup — both sprint-master and worker call PipelineStore.initialize() which applies any pending migrations before the agent begins processing. No manual migration step is required for either Docker or native deployments.

Migrations 014 and 015 (added is_blocked and is_paused columns to pipeline_issues) must be applied before agents that use Postgres-authoritative state (#1209) begin processing. Because migrations run at startup, simply restarting agents after pulling the new image is sufficient.

To verify the current migration version:

Terminal window
psql "$DATABASE_URL" -c "SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 1;"

The version should be 015 or higher before agents resume processing.
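
If you want to script that gate, the comparison is numeric with one subtlety: the leading zero makes bash treat 015 as octal unless you force base 10. A sketch (the version value would come from the psql query above):

```shell
#!/bin/bash
# Compare a zero-padded migration version against the required minimum.
# "10#" forces base-10 so "015" is not parsed as octal.
version="015"   # substitute the value returned by psql
if [ "$((10#$version))" -ge 15 ]; then
  echo "schema ready: agents can start"
else
  echo "schema at $version: start agents so pending migrations apply"
fi
```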


First, ensure your colony.config.yaml uses Docker volume paths (see 3c above).

Create a docker-compose.override.yml to mount the GitHub App PEM into each service. Workers clone their repos automatically at startup — no repo volume mount is needed:

services:
  sprint-master:
    volumes:
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
  worker:
    volumes:
      # No repo mount needed — workers clone repos at startup to /colony/repos/{owner}/{repo}
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
      # Uncomment if using dual App setup (autonomous merging):
      # - ~/.colony/keys/github-ops-app.pem:/colony/keys/github-ops-app.pem:ro
  webhook-receiver:
    volumes:
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro
  monitor:
    volumes:
      # No repo mount needed — monitor reads repo list from Postgres (DATABASE_URL + network only)
      - ~/.colony/keys/github-app.pem:/colony/keys/github-app.pem:ro

See docker-compose.override.example.yml for more examples (multi-repo mounts, resource limits, Postgres customization).

Build and start:

Terminal window
docker compose build
docker compose up -d
docker compose ps
docker compose logs -f

Verify health endpoints:

Terminal window
curl http://localhost:9100/health # sprint-master
curl http://localhost:9200/health # worker
curl http://localhost:9106/health # monitor
curl http://localhost:9800/health # webhook-receiver
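
The four checks above can be swept in one loop; curl reports HTTP 000 when nothing is listening, matching the unhealthy output shown in Troubleshooting. A sketch (health_line is an illustrative helper):

```shell
#!/bin/bash
# One-shot health sweep over the default Colony ports.
health_line() {  # args: port name http_code
  if [ "$3" = "200" ]; then
    echo "ok $2 ($1)"
  else
    echo "FAIL $2 ($1) HTTP $3"
  fi
}

for entry in "9100 sprint-master" "9200 worker" "9106 monitor" "9800 webhook-receiver"; do
  set -- $entry
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "http://localhost:$1/health" || true)
  health_line "$1" "$2" "$code"
done
```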

To stop:

Terminal window
docker compose down

First, ensure your colony.config.yaml uses container volume paths (see 3c above). If using webhooks, add dispatch_host:

webhook:
  enabled: true
  secret_env: WEBHOOK_SECRET
  port: 9800
  dispatch_host: host.docker.internal

Build and start:

Terminal window
./scripts/colony-container.sh build
./scripts/colony-container.sh up -d
./scripts/colony-container.sh ps

Verify health endpoints:

Terminal window
./scripts/colony-container.sh health

To stop:

Terminal window
./scripts/colony-container.sh down

Export your API key and start all agents:

Terminal window
export ANTHROPIC_API_KEY=sk-ant-...
npx colony start --config colony.config.yaml

Check status:

Terminal window
npx colony status

View logs:

Terminal window
tail -f ~/.colony/logs/*.log

To stop:

Terminal window
npx colony stop

Walk through a complete issue lifecycle to validate the deployment:

  1. Create a test issue on your target repo. Use a small, well-defined task (e.g., “Add a greet(name) function to utils.ts that returns Hello, {name}!”).

  2. Label the issue colony:enqueue (or just create it if intake_mode: all).

  3. Watch the pipeline progress. The issue will move through states:

    • colony:new — Sprint Master picks it up
    • colony:analyzing — Analyzer triages and writes a spec
    • colony:ready-for-dev — ready for implementation
    • colony:in-development — Developer creates a branch, writes code, opens a PR
    • colony:in-review — Reviewer runs checks and reviews the PR
    • colony:merge-pending or colony:human-review-ready — depending on auto_merge_on_approval
  4. Check the PR. Review the code, the agent comments, and the CI results.

  5. Merge the PR (or let the Merger agent handle it if auto_merge_on_approval: true).

Monitor progress via:

Terminal window
# Docker
docker compose logs -f
# Native
tail -f ~/.colony/logs/*.log

Or check issue labels on GitHub — they update in real time as agents process the issue.


  1. Add Docker Desktop to Login Items (System Settings > General > Login Items).

  2. Create a launchd plist to start Colony containers after Docker is ready:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.colony.docker-start</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>-c</string>
    <string>while ! docker info &gt;/dev/null 2&gt;&amp;1; do sleep 5; done; cd $HOME/git/colony &amp;&amp; /usr/local/bin/docker compose up -d</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/colony-autostart.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/colony-autostart.log</string>
</dict>
</plist>

Install the plist:

Terminal window
cp com.colony.docker-start.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.colony.docker-start.plist

To unload:

Terminal window
launchctl unload ~/Library/LaunchAgents/com.colony.docker-start.plist

Create a launchd plist that starts Colony containers after the system boots:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.colony.container-start</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>-c</string>
    <string>cd $HOME/git/colony &amp;&amp; ./scripts/colony-container.sh up -d</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/colony-autostart.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/colony-autostart.log</string>
</dict>
</plist>

Install the plist:

Terminal window
cp com.colony.container-start.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.colony.container-start.plist

Create a plist that starts Colony agents directly and restarts them if they crash:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.colony.agents</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/npx</string>
    <string>colony</string>
    <string>start</string>
    <string>--config</string>
    <string>/Users/you/git/colony/colony.config.yaml</string>
  </array>
  <key>WorkingDirectory</key>
  <string>/Users/you/git/colony</string>
  <key>EnvironmentVariables</key>
  <dict>
    <key>ANTHROPIC_API_KEY</key>
    <string>sk-ant-your-key</string>
    <key>PATH</key>
    <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
  </dict>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/Users/you/.colony/logs/launchd.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/you/.colony/logs/launchd.log</string>
</dict>
</plist>

Apple Silicon note: The plist uses /usr/local/bin/npx. On Apple Silicon Macs with Homebrew, npx is at /opt/homebrew/bin/npx. Run which npx to find your correct path and update the plist accordingly.

Replace /Users/you with your actual home directory. Install:

Terminal window
cp com.colony.agents.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.colony.agents.plist

Edit (or create) ~/.docker/daemon.json:

{
  "log-driver": "json-file",
  "log-opts": { "max-size": "50m", "max-file": "5" }
}

Restart Docker Desktop after saving. On Linux, the path is /etc/docker/daemon.json; restart the daemon with sudo systemctl restart docker.

This limits each container’s log to 50 MB with 5 rotated files.

Add an entry for Colony logs in /etc/newsyslog.d/colony.conf:

# logfilename [owner:group] mode count size when flags
/Users/you/.colony/logs/*.log 644 5 10240 * J

This rotates logs at 10 MB with 5 compressed backups.

Colony worktrees accumulate disk space over time. The scripts/monitor-disk.sh script checks disk usage, counts worktrees per repo, and identifies stale worktrees.

Docker workers: Each worker container needs enough ephemeral storage for a full repo clone plus worktrees. Container restarts produce clean clones, so stale worktree accumulation is not a concern for containerized workers — only for native deployments.

Set up a daily cron job:

0 8 * * * /path/to/colony/scripts/monitor-disk.sh --alert slack >> ~/.colony/logs/disk.log 2>&1

Options:

  • --threshold N — warn when disk usage exceeds N% (default: 80)
  • --stale-days N — worktrees older than N days are flagged as stale (default: 7)
  • --alert slack|macos — send an alert when warnings are detected
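
The core of the threshold check is a single df parse. A sketch of that logic (the real script does more, including worktree counts and staleness flagging; 80% mirrors the default threshold):

```shell
#!/bin/bash
# Parse df's "Use%" column for the workspace volume and compare to a limit.
threshold=80
usage=$(df -P "$HOME" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
if [ "$usage" -ge "$threshold" ]; then
  echo "WARN: disk at ${usage}% (threshold ${threshold}%)"
else
  echo "ok: disk at ${usage}% (threshold ${threshold}%)"
fi
```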

The scripts/health-check.sh script pings all agent health endpoints (singleton + worker) and reports status. It auto-detects Docker vs native mode.

Set up a cron job to run every 5 minutes:

*/5 * * * * /path/to/colony/scripts/health-check.sh --alert slack >> ~/.colony/logs/health.log 2>&1

Options:

  • --mode docker|native|auto — detection mode (default: auto)
  • --alert slack|macos|email — alert on failure
  • --host HOST — host to check (default: localhost)

Three alerting options, from simplest to most capable:

macOS Notifications (zero setup): Both health-check.sh and monitor-disk.sh support --alert macos, which uses osascript to display native notifications. Good for a machine you’re logged into.

Email (msmtp or Postfix): Both scripts support --alert email. Set COLONY_ALERT_EMAIL in your environment. Requires a local MTA — msmtp is the simplest:

Terminal window
brew install msmtp

Slack Webhook (recommended for teams):

  1. Create a Slack Incoming Webhook at https://api.slack.com/messaging/webhooks
  2. Set COLONY_SLACK_WEBHOOK_URL in your environment (or add to .env):
    Terminal window
    export COLONY_SLACK_WEBHOOK_URL="https://hooks.slack.com/services/T.../B.../..."
  3. Use --alert slack with the monitoring scripts
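
If you want to reuse the same webhook from your own scripts, an alert is a one-line JSON POST. A minimal sketch, guarded so it is a no-op when the variable is unset (slack_payload is an illustrative helper; it does no JSON escaping, so keep messages simple):

```shell
#!/bin/bash
# Build the minimal Slack message payload and POST it to the incoming webhook.
slack_payload() {
  printf '{"text":"%s"}' "$1"
}

if [ -n "${COLONY_SLACK_WEBHOOK_URL:-}" ]; then
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(slack_payload "Colony alert: worker unhealthy")" \
    "$COLONY_SLACK_WEBHOOK_URL"
else
  echo "COLONY_SLACK_WEBHOOK_URL not set; skipping"
fi
```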

Polling-only is the default and works well for most deployments. For sub-second response to GitHub events:

  1. Install Cloudflare Tunnel:

    Terminal window
    brew install cloudflared
  2. Create a tunnel to expose the webhook receiver:

    Terminal window
    cloudflared tunnel --url http://localhost:9800

    Note the generated URL (e.g., https://xxxx.trycloudflare.com).

  3. Register the webhook on GitHub. In your repository settings > Webhooks > Add webhook:

    • Payload URL: https://xxxx.trycloudflare.com/webhook
    • Content type: application/json
    • Secret: generate with openssl rand -hex 32
    • Events: select Issues, Pull requests, Pull request reviews
  4. Add webhook config to colony.config.yaml:

    webhook:
      enabled: true
      secret_env: WEBHOOK_SECRET
      port: 9800

For Docker or Apple Container deployments, add dispatch_host so the webhook receiver can reach agents via the host network:

webhook:
  enabled: true
  secret_env: WEBHOOK_SECRET
  port: 9800
  dispatch_host: host.docker.internal
  5. Add WEBHOOK_SECRET to .env:

    WEBHOOK_SECRET=<the-secret-from-step-3>
  6. Optionally run cloudflared as a launchd service for persistence:

    Terminal window
    cloudflared service install

    Or create a launchd plist similar to the ones in section 5a.
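
For reference, GitHub signs each delivery with HMAC-SHA256 over the raw request body using that secret, sent as the X-Hub-Signature-256 header. A sketch of recomputing the value with openssl (sign_payload is an illustrative name, not a Colony utility):

```shell
#!/bin/bash
# Recompute the X-Hub-Signature-256 value for a raw webhook body.
sign_payload() {
  local body="$1" secret="$2"
  printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{ print "sha256=" $NF }'
}

# The receiver compares this against the header before trusting the payload.
sign_payload '{"action":"opened"}' 'example-secret'
```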


Error: No config found

Colony looks for config in this order: --config flag, ./colony.config.yaml, ~/.colony/config.yaml. Ensure one of these exists. For Docker, the config must be mounted at /colony/colony.config.yaml.

Error: Failed to create worktree
  • Docker workers clone repos automatically at startup — if worktree creation fails, check network connectivity, GitHub App auth, and available disk space in the container
  • Native mode: ensure the target repo is cloned and accessible at the path in workspace.repo_dir
  • Check that the worktrees directory has sufficient disk space
  • Run git worktree prune in the target repo to clean up stale worktree references (native mode only)
✗ worker (port 9200) — unhealthy (HTTP 000)
  • Verify the service is running: docker compose ps (Docker) or npx colony status (native)
  • For Apple Container: ./scripts/colony-container.sh ps or container ls
  • Check that health port in config matches the port the service is actually listening on (sprint-master: 9100, worker: 9200+, monitor: 9106, webhook-receiver: 9800)
  • Review service logs for startup errors
Error: claude command not found
  • Verify installation: which claude
  • If using Docker, the image installs @anthropic-ai/claude-code globally during build. Rebuild if the CLI was added after your last build: docker compose build --no-cache
  • For native, install globally: npm install -g @anthropic-ai/claude-code
Error: EACCES: permission denied
  • PEM file: chmod 600 ~/.colony/keys/github-app.pem
  • Workspaces directory: ensure the user running Colony owns ~/.colony/workspaces/
  • Docker: the Node.js process runs as root in the container. If mounting a host directory for workspaces instead of using the named volume, ensure it is writable by UID 0

Issues stuck in in-development after restart


Workers use a Postgres task queue, not label polling. If a worker crashes mid-task, the task remains in claimed status. On startup, workers reclaim stale tasks automatically (threshold: 10 minutes). The monitor agent also periodically reclaims stale tasks. If an issue remains stuck, check:

  • work_tasks table for orphaned claimed tasks
  • Worker logs for repeated failures on the same issue
  • The monitor dashboard for self-healing activity
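
The reclaim rule itself is simple age arithmetic. A sketch of the 10-minute threshold in epoch seconds (is_stale is an illustrative name, not Colony code):

```shell
#!/bin/bash
# A claimed task whose last heartbeat is older than 10 minutes is reclaimable.
is_stale() {
  local claimed_epoch="$1" now_epoch="$2" threshold=$((10 * 60))
  [ $(( now_epoch - claimed_epoch )) -gt "$threshold" ]
}

now=$(date +%s)
if is_stale "$(( now - 900 ))" "$now"; then
  echo "claimed 15 minutes ago: eligible for reclaim"
fi
```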

When countReviewCycles exceeds max_review_cycles, the Developer blocks immediately before doing any work. Relabeling to changes-requested just re-triggers the block. To advance a stuck issue manually, relabel directly to in-review and let the Reviewer assess the PR as-is.