Operator

You tend the colony. You don’t write the code Colony writes, you don’t review every PR, you don’t define the work. You make sure the colony has the conventions, capacity, and tools it needs to do its job — and you step in when something goes wrong.

This is you if you're a platform engineer, a tech lead, the person who owns the deployment, or the person on call when a “Colony is broken” report comes in.

What this role does

Configure — set the colony’s conventions, repo policies, and worker behavior
Allocate capacity — size the worker pool for current and projected workload
Intervene — fix work that’s stuck, stalled, or oscillating
Recover — handle the failure cases that escape the colony’s automatic recovery

Configure

Per-repo configuration lives in .colony/conventions.md (and optional adjacent files). This is the most important thing you’ll write — it shapes every analyzer, developer, and reviewer interaction.

Key sections to fill in:

Tech stack — what languages, frameworks, and build tools the repo uses
Coding conventions — file layout, naming, import styles, testing approach
Forbidden patterns — what not to do (often more useful than what to do)
Review priorities — what the human Reviewer cares about most
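Before committing a new conventions file, you may want a quick check that it covers the four key sections above. The script below is a minimal sketch, not a Colony tool. It assumes each section is a markdown heading whose text starts with the section name; that heading format is our assumption, not a documented schema.

```python
import re
from pathlib import Path

# The four key sections named in this guide. Matching on heading text is an
# assumption for illustration; Colony does not publish a required heading format.
KEY_SECTIONS = ("Tech stack", "Coding conventions", "Forbidden patterns", "Review priorities")

# A markdown ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def missing_sections(text: str) -> list[str]:
    """Return the key sections that have no matching markdown heading in `text`."""
    headings = [m.group(1).lower() for line in text.splitlines() if (m := HEADING.match(line))]
    return [s for s in KEY_SECTIONS if not any(h.startswith(s.lower()) for h in headings)]


if __name__ == "__main__":
    path = Path(".colony/conventions.md")
    if path.exists():
        gaps = missing_sections(path.read_text(encoding="utf-8"))
        print("missing:", ", ".join(gaps) if gaps else "none")
```

Run it from the repo root. A missing section isn't an error as far as Colony is concerned; it only tells you where the analyzer, developer, and reviewer will have to guess.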

Tenant-level configuration (worker pool sizing, label policy, automerge behavior) lives in the dashboard.

Allocate capacity

Worker pool size controls how many issues can run in parallel. Sizing tradeoffs:

Too small: dispatch backs up; issues sit at ready-for-dev waiting for a free worker
Too large: worker churn, wasted compute, a harder-to-read dashboard
Just right: typical queue depth ≤ pool size; transient bursts drain in minutes, not hours

Watch the dashboard’s queue-depth and worker-utilization graphs for a week before adjusting. Resizing in response to a single bad day usually overcorrects.
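The "just right" rule and the one-week window can be combined into a rough check. This sketch assumes you have exported hourly queue-depth and worker-utilization samples from the dashboard as plain lists (the export format is an assumption), and the low-utilization threshold is a hypothetical value to tune. It uses the median as the "typical" value, so one bad day in a week of samples doesn't move the verdict.

```python
from statistics import median

SAMPLES_PER_WEEK = 7 * 24  # assumes one sample per hour, exported from the dashboard
LOW_UTILIZATION = 0.3      # hypothetical threshold; tune for your tenant


def sizing_verdict(queue_depths: list[int], utilization: list[float], pool_size: int) -> str:
    """Return "wait", "grow", "shrink", or "ok" from at least a week of hourly samples.

    queue_depths: issues waiting for a worker at each sample.
    utilization: fraction of the pool busy at each sample (0.0 to 1.0).
    """
    if min(len(queue_depths), len(utilization)) < SAMPLES_PER_WEEK:
        return "wait"  # under a week of data: resizing now risks overcorrecting
    typical_depth = median(queue_depths[-SAMPLES_PER_WEEK:])
    typical_util = median(utilization[-SAMPLES_PER_WEEK:])
    if typical_depth > pool_size:
        return "grow"    # work routinely waits behind a full pool
    if typical_util < LOW_UTILIZATION:
        return "shrink"  # most of the pool sits idle most of the time
    return "ok"
```

Treat the verdict as a prompt to look at the graphs, not as an automatic resize.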

Intervene

When work stalls, the dashboard’s “needs operator” view shows what’s stuck and why. Common interventions:

Stalled worker — the mayor’s recovery protocol exhausted retries. Click “retry” or run /colony:retry on the issue. If it stalls again, the underlying problem is in .colony/conventions.md or the issue itself.
Cycling reviewer — the same PR has been rejected 3+ times. Read the rejections; usually the convention or the issue is ambiguous. Fix the source, not the symptom.
Dependency lock — Colony paused an issue because another PR is in flight. If the lock is spurious, override it on the dashboard. If the lock is real, wait or reorder.
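The triage above amounts to a short decision order. The sketch below writes it down against a hypothetical record for one "needs operator" entry; the `StuckItem` fields are assumptions, not the dashboard's actual data model. Only the 3+ rejection threshold comes from this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StuckItem:
    """Hypothetical shape of one entry in the "needs operator" view."""
    issue: str
    retries_exhausted: bool = False   # the mayor's recovery protocol gave up
    rejections: int = 0               # reviewer rejections on the current PR
    blocked_by: Optional[str] = None  # in-flight PR holding a dependency lock


def next_step(item: StuckItem) -> str:
    """Map a stuck item to the first intervention this guide recommends."""
    if item.rejections >= 3:
        return "cycling reviewer: read the rejections, fix the convention or the issue"
    if item.retries_exhausted:
        return "stalled worker: /colony:retry, then check conventions.md or the issue"
    if item.blocked_by is not None:
        return f"dependency lock on {item.blocked_by}: override if spurious, else wait or reorder"
    return "no known pattern: read the worker logs"
```

Checking the rejection count first is a design choice: retrying a PR that keeps getting rejected just adds another rejection, so the ambiguous convention or issue gets fixed before anything is re-run.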

Recover

Some failures escape automatic recovery. The handful you’ll see:

Orphan PR — a PR exists without its issue, or an issue exists without its PR. Close the orphaned PR, or run /colony:cancel on the orphaned issue; Colony reconciles on its next polling cycle.
Runaway cost — an issue burns more budget than expected. Run /colony:pause to stop dispatch; investigate the issue and the worker logs; resume with /colony:resume or cancel.
Conflict-resolution loop — the merger keeps producing conflicts. Manually rebase the PR or close it and let the developer worker re-attempt against the current main.
Outage — workers can’t reach the executor (Claude API, etc.). Pause dispatch; investigate the executor; resume.
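To spot runaway cost before it becomes a surprise, you can compare each issue's spend against what you expected it to cost. This is a minimal sketch: the spend and budget figures are assumed to come from your own billing export, and the 2x factor is a hypothetical threshold, not a Colony setting.

```python
RUNAWAY_FACTOR = 2.0  # hypothetical: flag anything spending over 2x its expected budget


def runaway_issues(spend: dict[str, float], expected: dict[str, float]) -> list[str]:
    """Return issues whose spend exceeds RUNAWAY_FACTOR times their expected budget.

    Issues with no expected budget on record are skipped rather than guessed at.
    """
    flagged = []
    for issue, cost in spend.items():
        budget = expected.get(issue)
        if budget is not None and cost > RUNAWAY_FACTOR * budget:
            flagged.append(issue)
    return sorted(flagged)
```

The sketch only lists candidates; it doesn't call Colony. Each flagged issue is one to /colony:pause individually while you read its worker logs.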

Slash commands available on any issue or PR:

Command	What it does
`/colony:retry`	Re-run the worker for the current state
`/colony:cancel`	Abandon the issue; close any open PR
`/colony:pause`	Stop dispatch for this issue (resumable)
`/colony:resume`	Resume after pause
`/colony:decompose`	Force epic decomposition on the issue
`/colony:reimplement`	Discard the current PR and re-develop
`/colony:review`	Re-run the reviewer (also works on external PRs)

Where you engage in the Workflow

Phase 1: Intake — monitor the analyzer queue
Phase 2: Planning & Dispatch — capacity allocation
Phase 3: Development — intervene on stalls
Phase 4: Review — fix the configuration behind repeating review loops
Phase 5: Merge & Close — resolve non-trivial merge conflicts

Your tools

.colony/conventions.md — per-repo configuration; lives in the repo
Slash commands — see table above
Dashboard worker view — Cloud-only; live worker state, queue depth, utilization
Recovery runbooks — see Reference when published

Anti-patterns

Resizing the worker pool to “fix” a stuck issue. Capacity isn’t the problem; that issue is. Capacity changes mask the real failure. Run /colony:retry and read the worker logs first.
Editing .colony/conventions.md for every one-off rejection. Conventions are for repeating problems. A single rejection might just be a bad issue. Wait for the second occurrence.
Running /colony:cancel to “clean up” a backlog. Cancellation breaks the audit trail and may cancel work the Author still needs. Triage first.
Pausing dispatch globally because one repo is misbehaving. Pause the issue or the repo, not the colony. Other tenants and repos shouldn’t suffer.
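If you keep a log of why PRs were rejected, you can apply the "wait for the second occurrence" rule mechanically. This is a minimal sketch; it assumes the reasons have already been reduced to short labels such as "missing tests", since Colony's rejection format isn't described here.

```python
from collections import Counter


def convention_candidates(rejection_reasons: list[str]) -> list[str]:
    """Return rejection reasons seen at least twice, most frequent first.

    Assumes reasons are already normalized to short labels; matching raw
    reviewer text this way would almost never find a repeat.
    """
    counts = Counter(reason.strip().lower() for reason in rejection_reasons)
    return [reason for reason, n in counts.most_common() if n >= 2]
```

Anything this returns is a candidate edit to .colony/conventions.md; anything seen once is more likely a problem with that one issue.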

Going deeper

Reviewer role — when reviews loop, this is the role that surfaces the symptom you’ll fix
Team Patterns — multi-repo conventions and freeze windows
Reference: configuration schema (when published) — every field in .colony/conventions.md