Agent CLI Setup Guide¶

How to set up and run autonomous experiments with each supported AI tool.

All tools use the same protocol (AGENT.md), the same locked harness (src/harness.py), and the same log format (research/log.jsonl). The tool doesn't matter -- the protocol does.

Prerequisites (all tools)¶

git clone https://github.com/cybertronai/SutroYaro.git
cd SutroYaro

# Verify environment
PYTHONPATH=src python3 checks/env_check.py

# Establish baselines on your machine
PYTHONPATH=src python3 checks/baseline_check.py

Both checks must pass before running experiments. If they fail, fix the issue before proceeding.

Claude Code¶

Claude Code is a terminal-native AI agent from Anthropic. It has file read/write, bash execution, and tool permissions built in.

Install¶

# macOS / Linux
npm install -g @anthropic-ai/claude-code

# Verify
claude --version

You need an Anthropic API key or a Claude Pro/Team subscription.

Run experiments¶

# Single cycle, 10 experiments
bin/run-agent --tool claude --max 10

# Overnight: 10 cycles, 5 experiments each
bin/run-agent --tool claude --loop 10 --max 5

# With researcher ID
bin/run-agent --tool claude --max 10 --researcher yad-agent

How it launches¶

The launcher calls claude -p "$prompt" --allowedTools "Bash,Read,Write,Edit,Glob,Grep" --max-turns 200. The prompt tells the agent to read AGENT.md and follow its protocol.

Customization¶

Claude Code reads CLAUDE.md at the project root automatically. This file already contains the project context, rules, and file references. You can also use:

Skills/plugins: custom slash commands (e.g., /commit, analysis tools)
MCP servers: extend with external tools (calendar, GitHub, etc.)
Hooks: scripts that run on tool calls (stop hooks, pre-commit checks)
AGENTS.md: sub-agent dispatch instructions

Gemini CLI¶

Gemini CLI is Google's open-source terminal AI agent. It has a 1M token context window, MCP server support, and a free tier.

Install¶

npm install -g @google/gemini-cli

# Verify
gemini --version

Free tier with a personal Google account: 60 requests/min, 1,000 requests/day. Or use a Google AI Studio / Vertex AI key for more.

Run experiments¶

# Single cycle
bin/run-agent --tool gemini --max 10

# Overnight
bin/run-agent --tool gemini --loop 10 --max 5

How it launches¶

The launcher calls gemini -p "$prompt" --yolo. The -p flag runs in headless mode (no interactive UI). --yolo skips confirmation prompts so the agent can run bash commands without pausing.

Customization¶

Gemini CLI reads GEMINI.md at the project root if it exists. You could create one that mirrors CLAUDE.md, or just rely on AGENT.md (which the prompt tells the agent to read).

MCP servers: same protocol as Claude Code, connect external tools
Extensions: custom commands via .gemini/ config directory
TOML config: per-project settings in .gemini/settings.toml

Gotcha¶

Gemini CLI's custom slash commands don't work in headless mode yet (known issue). The agent can still read files and run bash, which is all our protocol needs.

Antigravity¶

Google Antigravity is an agent-first IDE (like VS Code but built around AI agents). It has a Manager View for dispatching parallel agents.

Install¶

Download from antigravity.google or via the command line tool:

# The CLI command is 'agy'
agy .    # opens the current directory in Antigravity

Free for individuals in public preview.

Run experiments (manual workflow)¶

Antigravity is an IDE, not a headless CLI. It cannot be driven by bin/run-agent. Instead:

Open the project: agy .
In the agent chat, type: "Read AGENT.md. Follow its protocol exactly. Your researcher ID is [your-name]."
The agent will read the files, pick hypotheses from TODO.md, run experiments, and log results.

Parallel agents via Manager View¶

Antigravity's unique feature is Manager View, where you can dispatch multiple agents in parallel:

Open Manager View
Create Agent 1: "Read AGENT.md. Test the first unchecked hypothesis in TODO.md."
Create Agent 2: "Read AGENT.md. Test the second unchecked hypothesis in TODO.md."
Both agents work simultaneously on different hypotheses

This is similar to Claude Code's sub-agent dispatch but with a visual interface.

Customization¶

Multi-model: supports Claude Sonnet/Opus, Gemini, and GPT models
Artifacts: agents produce visible task lists, plans, and screenshots
No MCP yet: MCP server support is not available as of early 2026

Codex CLI¶

OpenAI's terminal coding agent, powered by codex-1 (optimized o3). Built in Rust, runs locally.

Install¶

npm install -g @openai/codex
# or
brew install --cask codex

# Verify
codex --version

Included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. Or use API credits.

Run experiments¶

# Single cycle
bin/run-agent --tool codex --max 10

# Overnight
bin/run-agent --tool codex --loop 10 --max 5

How it launches¶

The launcher calls codex -q "$prompt". The -q flag runs in quiet/non-interactive mode.

Customization¶

AGENTS.md: Codex reads this file for project context (similar to CLAUDE.md for Claude Code)
MCP servers: supported for external tool integration
Multi-agent: experimental parallel agent support built in
Sandbox: cross-platform security (macOS Seatbelt, Linux Landlock)

OpenCode¶

Open-source, multi-provider terminal agent. Works with Claude, GPT, Gemini, and local models (Ollama). 95K+ GitHub stars. No vendor lock-in.

Install¶

curl -fsSL https://opencode.ai/install | bash
# or
npm install -g opencode-ai@latest
# or
brew install opencode

# Verify
opencode --version

Bring your own API key (any provider). Or use local models for free.

Run experiments¶

# Single cycle
bin/run-agent --tool opencode --max 10

# Overnight
bin/run-agent --tool opencode --loop 10 --max 5

How it launches¶

OpenCode accepts prompts as command-line arguments for non-interactive mode. The launcher passes the prompt directly.

Customization¶

Multi-provider: switch models mid-session (Claude for reasoning, GPT for code generation, local for privacy)
LSP integration: type information and symbol definitions from your language server
Session persistence: SQLite-backed session storage
No vendor lock-in: the main reason to choose OpenCode over single-vendor CLIs

Custom CLI¶

Any AI tool that can accept a prompt and execute commands can be used.

# Set AI_CMD to your tool's command
AI_CMD="my-ai-tool --prompt" bin/run-agent --tool custom --max 10

The launcher pipes the prompt to $AI_CMD via stdin. Your tool needs to:

Accept a text prompt
Be able to read/write files in the working directory
Be able to run python3 commands
Write output to stdout (for logging)

After any run¶

# Text report
bin/analyze-log

# Generate progress chart
bin/analyze-log --plot

# Merge your results for PR submission
bin/merge-findings research/log.jsonl --scoreboard

Comparing tools¶

Feature	Claude Code	Gemini CLI	Codex CLI	OpenCode	Antigravity
Headless mode	Yes (`-p`)	Yes (`-p --yolo`)	Yes (`-q`)	Yes (arg)	No (IDE)
`bin/run-agent`	Yes	Yes	Yes	Yes	No
Looped overnight	Yes	Yes	Yes	Yes	No
MCP servers	Yes	Yes	Yes	Yes	No
Custom skills/plugins	Yes	Yes (extensions)	Yes	Yes	No
Multi-provider	No (Anthropic)	No (Google)	No (OpenAI)	Yes (75+)	Yes
Context window	200K (Opus: 1M)	1M	Varies	Varies	Varies
Cost	Subscription or API	Free tier	ChatGPT Plus or API	BYO API key	Free preview
Open source	No	Yes (Apache 2.0)	Yes	Yes	No