AgentClash AgentClash

Use this command to install AgentClash:

winget install --id=AgentClash.AgentClash -e

AgentClash is an AI agent evaluation platform designed for real-task races. It enables users to compare AI agents under identical conditions, scoring them live on completion, speed, token efficiency, and tool strategy. Detailed step-by-step replays show where and why performance differs.

Key Features:

Define challenges using broken code or build tasks.
Use models from providers including OpenAI, Anthropic, Gemini, OpenRouter, and Mistral.
Run races with composite scoring metrics.
Access full replays for transparent performance analysis.
Implement failure-to-eval detection for comprehensive evaluation.

Audience & Benefit: Ideal for AI researchers, developers, and teams focused on evaluating or benchmarking agents. AgentClash offers objective comparisons, identifies inefficiencies, tracks improvements over time, and supports CI/CD gates for automated testing.
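A CI/CD gate of this kind usually comes down to comparing a candidate's score against a stored baseline and failing the pipeline on regression. The Python sketch below illustrates that logic under stated assumptions and is not AgentClash's own gate. The `gate` function, the 0 to 1 score scale, and the `max_drop` tolerance are all hypothetical.

```python
def gate(baseline: float, candidate: float, max_drop: float = 0.02) -> bool:
    """Return True when the candidate has not regressed beyond max_drop.

    Hypothetical sketch: scores are assumed to lie in [0, 1], and max_drop
    is an assumed absolute tolerance, not a documented AgentClash setting.
    """
    for score in (baseline, candidate):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [0, 1]")
    return candidate >= baseline - max_drop


if __name__ == "__main__":
    # 0.81 -> 0.78 is a drop of 0.03, larger than the assumed 0.02 tolerance.
    print("pass" if gate(0.81, 0.78) else "fail: score regressed")
```

In a pipeline, the calling step would turn a `False` result into a non-zero exit code so that the job fails.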

The platform includes a CLI for managing workspaces, infrastructure, challenge packs, deployments, runs, and authentication directly from the terminal. It can be installed with winget.

README

AgentClash

AI agent evaluation platform for real-task races. Compare agents with the same tools, same constraints, live scorecards, replay, and CI regression gates.

agentclash.dev

What is this?

AgentClash puts AI agents on the same real task, at the same time. Each agent is scored live on completion, speed, token efficiency, and tool strategy, and step-by-step replays show exactly why one agent won and another didn't.

Head-to-head races
Composite scoring
Full replays
Failure-to-eval flywheel
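Composite scoring means folding completion, speed, token efficiency, and tool strategy into a single number per agent. The sketch below shows one way to do that with a weighted sum. The `RunResult` fields, the weights, and the choice to score speed and tokens relative to the fastest and most frugal agent in the race are all assumptions, not AgentClash's actual formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunResult:
    """Raw measurements for one agent in one race (hypothetical fields)."""

    agent: str
    completion: float     # fraction of the task completed, 0..1
    seconds: float        # wall-clock time to finish, must be > 0
    tokens: int           # total tokens consumed, must be > 0
    tool_strategy: float  # assumed 0..1 rating of how well tools were used


# Assumed weights; they sum to 1.0 so composite scores stay in [0, 1].
WEIGHTS = {"completion": 0.5, "speed": 0.2, "tokens": 0.15, "tools": 0.15}


def composite_scores(results: list[RunResult]) -> dict[str, float]:
    """Combine the four signals into one score per agent.

    Speed and token efficiency are relative to the field: the fastest and
    the most frugal agent each get 1.0, everyone else gets best / own.
    """
    if any(r.seconds <= 0 or r.tokens <= 0 for r in results):
        raise ValueError("seconds and tokens must be positive")
    fastest = min(r.seconds for r in results)
    fewest = min(r.tokens for r in results)
    return {
        r.agent: WEIGHTS["completion"] * r.completion
        + WEIGHTS["speed"] * (fastest / r.seconds)
        + WEIGHTS["tokens"] * (fewest / r.tokens)
        + WEIGHTS["tools"] * r.tool_strategy
        for r in results
    }


if __name__ == "__main__":
    race = [
        RunResult("agent-a", completion=1.0, seconds=60.0, tokens=20_000, tool_strategy=0.8),
        RunResult("agent-b", completion=0.5, seconds=30.0, tokens=10_000, tool_strategy=0.6),
    ]
    for agent, score in composite_scores(race).items():
        print(f"{agent}: {score:.3f}")
```

With these assumed weights, agent-a scores higher than agent-b even though it is slower and uses twice the tokens, because completion carries the most weight. Changing `WEIGHTS` is how such a scheme would trade off thoroughness against speed and cost.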

How it works

Define a challenge (broken code, a build task, etc.)
Drop in your models (OpenAI, Anthropic, Gemini, OpenRouter, Mistral)
Run the race — same tools, same constraints
See scored results with full step-by-step replays
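Taken together, the four steps above describe a simple harness. It starts every agent at the same moment on the same task, holds each one to the same limits, and collects results for scoring. Below is a minimal asyncio sketch of that flow, assuming agents are async callables that take a task prompt and return a transcript. The `Agent` type, `race`, and the demo agents are hypothetical. Only a shared wall-clock limit is modeled; shared tools and token budgets are not.

```python
from __future__ import annotations

import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical agent interface: given the task prompt, return a transcript.
Agent = Callable[[str], Awaitable[str]]


async def run_one(name: str, agent: Agent, task: str, timeout: float) -> dict:
    """Run a single agent under the shared time limit and record the outcome."""
    start = time.perf_counter()
    try:
        output = await asyncio.wait_for(agent(task), timeout=timeout)
        status = "finished"
    except asyncio.TimeoutError:
        output, status = "", "timed_out"
    return {
        "agent": name,
        "status": status,
        "output": output,
        "seconds": time.perf_counter() - start,
    }


async def race(agents: dict[str, Agent], task: str, timeout: float) -> list[dict]:
    """Start every agent at once on the same task with the same time limit."""
    return list(
        await asyncio.gather(*(run_one(n, a, task, timeout) for n, a in agents.items()))
    )


async def quick_agent(task: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for real model and tool calls
    return f"patched: {task}"


async def slow_agent(task: str) -> str:
    await asyncio.sleep(1.0)  # deliberately exceeds the demo time limit
    return f"patched late: {task}"


if __name__ == "__main__":
    results = asyncio.run(
        race({"quick": quick_agent, "slow": slow_agent}, "fix the failing test", timeout=0.2)
    )
    for r in results:
        print(r["agent"], r["status"], f"{r['seconds']:.2f}s")
```

`asyncio.gather` returns results in the order the agents were passed, so the output lines up with the input mapping. The resulting dictionaries are what a scoring step would consume.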

Architecture

AgentClash is a monorepo with four main components:

Component	Tech	Location
API Server	Go / chi	`backend/cmd/api-server`
Worker	Go / Temporal SDK	`backend/cmd/worker`
CLI	Go / Cobra	`cli/`
Web	Next.js 16 / React 19	`web/`

Infrastructure dependencies:

Service	Purpose
PostgreSQL 17	Source of truth for all state
Temporal	Durable workflow orchestration for run execution
Redis (optional)	WebSocket fanout, rate limiting
E2B (optional)	Sandboxed code execution for native agent runs
S3-compatible storage (optional)	Artifact storage (filesystem fallback for dev)

CLI

The `agentclash` CLI lets you manage everything from your terminal: runs, builds, deployments, comparisons, and infrastructure.

Environment variables:

Variable	Default	Description
`DATABASE_URL`	`postgres://agentclash:agentclash@localhost:5432/agentclash?sslmode=disable`	PostgreSQL connection string
`API_SERVER_BIND_ADDRESS`	`:8080`	API server listen address
`TEMPORAL_HOST_PORT`	`localhost:7233`	Temporal server address
`TEMPORAL_NAMESPACE`	`default`	Temporal namespace
`HOSTED_RUN_CALLBACK_BASE_URL`	`http://localhost:8080`	Base URL for hosted agent callbacks
`HOSTED_RUN_CALLBACK_SECRET`	dev default	Secret for callback auth
`WORKER_IDENTITY`	hostname-based	Worker instance identifier
`SANDBOX_PROVIDER`	`unconfigured`	`unconfigured` or `e2b`
`E2B_API_KEY`	—	Required if `SANDBOX_PROVIDER=e2b`
`E2B_TEMPLATE_ID`	—	Required if `SANDBOX_PROVIDER=e2b`
`ARTIFACT_STORAGE_BACKEND`	`filesystem`	`filesystem` or `s3`
`ARTIFACT_SIGNING_SECRET`	auto-generated in dev	Required in production (min 32 bytes)
`APP_ENV`	`development`	`development` or `production`
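The defaults and conditional rules in this table translate into a small startup check. The Python sketch below is hypothetical; the real services are written in Go, and their loader is not shown here. `load_config` overlays any set variables on the documented defaults and rejects values outside the documented choices. It also requires the E2B variables when `SANDBOX_PROVIDER=e2b` and enforces the 32-byte minimum for `ARTIFACT_SIGNING_SECRET` in production. Generating a random secret in development is an assumed stand-in for the table's "auto-generated in dev".

```python
from __future__ import annotations

import os
import secrets
from typing import Mapping

# Documented defaults from the table above. HOSTED_RUN_CALLBACK_SECRET and
# WORKER_IDENTITY have dev-only or computed defaults, so they are omitted.
DEFAULTS = {
    "DATABASE_URL": "postgres://agentclash:agentclash@localhost:5432/agentclash?sslmode=disable",
    "API_SERVER_BIND_ADDRESS": ":8080",
    "TEMPORAL_HOST_PORT": "localhost:7233",
    "TEMPORAL_NAMESPACE": "default",
    "HOSTED_RUN_CALLBACK_BASE_URL": "http://localhost:8080",
    "SANDBOX_PROVIDER": "unconfigured",
    "ARTIFACT_STORAGE_BACKEND": "filesystem",
    "APP_ENV": "development",
}
OPTIONAL = ("HOSTED_RUN_CALLBACK_SECRET", "WORKER_IDENTITY", "E2B_API_KEY",
            "E2B_TEMPLATE_ID", "ARTIFACT_SIGNING_SECRET")
CHOICES = {
    "SANDBOX_PROVIDER": ("unconfigured", "e2b"),
    "ARTIFACT_STORAGE_BACKEND": ("filesystem", "s3"),
    "APP_ENV": ("development", "production"),
}


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Overlay non-empty variables on the defaults, then validate them."""
    cfg = dict(DEFAULTS)
    for key in (*DEFAULTS, *OPTIONAL):
        if env.get(key):
            cfg[key] = env[key]

    errors = [f"{key} must be one of {allowed}"
              for key, allowed in CHOICES.items() if cfg[key] not in allowed]
    if cfg["SANDBOX_PROVIDER"] == "e2b":
        errors += [f"{key} is required when SANDBOX_PROVIDER=e2b"
                   for key in ("E2B_API_KEY", "E2B_TEMPLATE_ID") if key not in cfg]
    if cfg["APP_ENV"] == "production":
        if len(cfg.get("ARTIFACT_SIGNING_SECRET", "").encode()) < 32:
            errors.append("ARTIFACT_SIGNING_SECRET must be at least 32 bytes")
    else:
        # Assumed dev behaviour: generate a throwaway secret when none is set.
        cfg.setdefault("ARTIFACT_SIGNING_SECRET", secrets.token_hex(32))

    if errors:
        raise ValueError("; ".join(errors))
    return cfg
```

For example, `load_config({"SANDBOX_PROVIDER": "e2b"})` raises `ValueError` because neither E2B variable is set. `load_config({})` returns the development defaults.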

Model provider API keys:

Variable	Provider
`OPENAI_API_KEY`	OpenAI
`ANTHROPIC_API_KEY`	Anthropic
`GEMINI_API_KEY`	Google Gemini
`XAI_API_KEY`	xAI
`OPENROUTER_API_KEY`	OpenRouter
`MISTRAL_API_KEY`	Mistral
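Each provider in this table has its own key variable, so you can see which providers a deployment can reach by checking which variables are set. This is a hypothetical Python helper, assuming a provider counts as configured whenever its variable is non-blank. AgentClash itself may validate keys differently.

```python
from __future__ import annotations

import os
from typing import Mapping

# Key variable -> provider, copied from the table above.
PROVIDER_KEYS = {
    "OPENAI_API_KEY": "OpenAI",
    "ANTHROPIC_API_KEY": "Anthropic",
    "GEMINI_API_KEY": "Google Gemini",
    "XAI_API_KEY": "xAI",
    "OPENROUTER_API_KEY": "OpenRouter",
    "MISTRAL_API_KEY": "Mistral",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """List providers whose key variable is set to a non-blank value."""
    return [provider for var, provider in PROVIDER_KEYS.items()
            if env.get(var, "").strip()]


if __name__ == "__main__":
    print(configured_providers() or "no provider keys set")
```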

Deploying to Railway (services overview):

Railway service	What it runs	Build arg
api-server	REST API + WebSocket	`TARGET=api-server`
worker	Temporal worker	`TARGET=worker`
PostgreSQL	Database (Railway plugin)	—

External services (outside Railway):

Service	Notes
Temporal Cloud	Use cloud.temporal.io for production orchestration. Self-hosting Temporal on Railway is not recommended for production.
Vercel	Deploy the `web/` frontend on Vercel.
E2B	Sign up at e2b.dev if you need sandboxed execution.
S3	Any S3-compatible provider (AWS S3, Cloudflare R2, etc.) for artifact storage.

Production deployment:

Concern	Production setting
`APP_ENV`	`production`
Temporal	Temporal Cloud production namespace
Artifacts	S3 production bucket
Sandbox	`e2b`
Domain	`api.agentclash.dev`
Signing secret	Unique per environment
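Several rows of this checklist correspond to the environment variables documented earlier, so they can be checked mechanically before a deploy. This is a hypothetical Python sketch. Mapping the Artifacts row to `ARTIFACT_STORAGE_BACKEND=s3` and the Sandbox row to `SANDBOX_PROVIDER=e2b` is an assumption, and the Temporal namespace, bucket, and domain rows are not checked. "Unique per environment" is modeled as "not equal to the signing secret of any other environment passed in".

```python
from __future__ import annotations

from typing import Mapping, Sequence

# Rows of the production table that map onto documented variables
# (the Sandbox and Artifacts mappings are assumptions).
PRODUCTION_EXPECTED = {
    "APP_ENV": "production",
    "SANDBOX_PROVIDER": "e2b",
    "ARTIFACT_STORAGE_BACKEND": "s3",
}


def check_production(prod: Mapping[str, str],
                     other_envs: Sequence[Mapping[str, str]] = ()) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = [
        f"{key} should be {want!r}, got {prod.get(key)!r}"
        for key, want in PRODUCTION_EXPECTED.items()
        if prod.get(key) != want
    ]
    secret = prod.get("ARTIFACT_SIGNING_SECRET", "")
    if len(secret.encode()) < 32:
        problems.append("ARTIFACT_SIGNING_SECRET must be at least 32 bytes")
    elif any(env.get("ARTIFACT_SIGNING_SECRET") == secret for env in other_envs):
        problems.append("ARTIFACT_SIGNING_SECRET is shared with another environment")
    return problems


if __name__ == "__main__":
    staging = {"APP_ENV": "production", "ARTIFACT_SIGNING_SECRET": "s" * 32}
    prod = {
        "APP_ENV": "production",
        "SANDBOX_PROVIDER": "unconfigured",
        "ARTIFACT_STORAGE_BACKEND": "s3",
        "ARTIFACT_SIGNING_SECRET": "s" * 32,  # copied from staging by mistake
    }
    for problem in check_production(prod, [staging]):
        print("-", problem)
```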

README contents:

AgentClash
What is this?
How it works
Architecture
CLI
Install
Use a local CLI build against the hosted backend
Quick start
CI/CD
Test the CLI before release
Release the CLI to npm
Local development
Prerequisites
1. Start everything (one command)
2. Start services individually
Database
Temporal
API Server
Worker
3. Web frontend (optional)
Environment variables
Smoke tests
Deploying to Railway
Services overview
Step-by-step setup
1. Create the Railway project
2. Create the production environment
3. Add PostgreSQL
4. Deploy the API server
5. Deploy the Worker
6. Deploy the Web frontend
Production Deployment
Running migrations
Project structure
License