# CodeWhale
> Terminal coding agent for DeepSeek V4. It runs from the codewhale command, streams reasoning blocks, edits local workspaces with approval gates, and includes an auto mode that chooses both model and thinking level per turn.
codewhale installs as a matched pair of self-contained Rust release binaries:
the codewhale dispatcher command and the sibling codewhale-tui runtime it
launches for interactive sessions. npm and Docker install both for you; Cargo
and manual installs must put both binaries in the same directory
(normally a directory on your PATH). The npm package is only an
installer/wrapper for those release binaries; the agent does not run on Node.
```bash
# 1. npm: easiest if you already use Node. The package downloads the
#    matching prebuilt Rust binaries from GitHub Releases.
npm install -g codewhale

# 2. Cargo: no Node needed. Requires Rust 1.88+ (the crates use the
#    2024 edition; older toolchains fail with "feature `edition2024` is
#    required"). Run `rustup update` first, or use a non-Cargo path below.
cargo install codewhale-cli --locked   # `codewhale` (entry point)
cargo install codewhale-tui --locked   # `codewhale-tui` (TUI binary)

# 3. Homebrew: legacy compatibility only.
#    The tap/formula still uses the old deepseek-tui name. Prefer npm, Cargo,
#    Docker, or direct downloads for new installs until the formula is renamed.
brew tap Hmbown/deepseek-tui
brew install deepseek-tui

# 4. Direct download: platform archive from GitHub Releases.
#    https://github.com/Hmbown/CodeWhale/releases
#    Archives include both codewhale and codewhale-tui plus an install script.
#    Individual binaries are also attached for scripts; keep the pair together.

# 5. Docker: prebuilt release image.
docker volume create codewhale-home
docker run --rm -it \
  -e DEEPSEEK_API_KEY="$DEEPSEEK_API_KEY" \
  -v codewhale-home:/home/codewhale/.codewhale \
  -v "$PWD:/workspace" \
  -w /workspace \
  ghcr.io/hmbown/codewhale:latest
```
> In mainland China, speed up the npm path with
> --registry=https://registry.npmmirror.com, or use the
> Cargo mirror below.
>
> Download safety: official release binaries live under
> https://github.com/Hmbown/CodeWhale/releases. For manual downloads,
> verify the SHA-256 manifest and avoid look-alike repositories or search-result
> mirrors. See .
A model answers a question. An agent finishes a task. The difference is
the harness — a system of rules, evidence, and feedback that keeps the
model oriented instead of drifting.
CodeWhale is that harness, built around DeepSeek V4 and guided by three ideas:
| Principle | How it works |
| --- | --- |
| Start with trust | Every turn begins with "A": possibility before certainty, craft before convenience. |
| Clear jurisdiction | A written Constitution with nine tiers of authority. User intent outranks stale instructions. Verification outranks confidence. |
| Recursive improvement | V4 helped write the harness. As the harness improves, V4 becomes more effective and helps improve the harness further. Each turn starts stronger. |
It's open source, terminal-native, and packaged as a matched codewhale /
codewhale-tui Rust binary pair.
## How the Harness Works
Agentic models deal with conflicting information at scale: user intent,
project rules, system defaults, tool output, and stale memory all compete
for authority in a single turn. LLM-as-a-judge needs jurisdiction — which
source wins when they disagree?
CodeWhale answers this with a Constitution (prompts/base.md). It's a
formal hierarchy of law — Article VII ranks nine tiers from the
Constitution's own articles down to prior-session handoffs. The user's
current message outranks stale project instructions. Live tool output
outranks assumptions. Verification outranks confidence. The model inherits
a clear chain of authority every turn and never has to guess which
directive to follow.
Six Articles define the model's identity, duties, and agency (Article VII
is the hierarchy itself). They include a verification mandate (Article V:
every action leaves evidence; never declare success on faith), a coordination
legacy (Article VI: leave the workspace cleaner and the handoff truthful for
the next intelligence), and a primacy-of-truth clause (Article II:
non-negotiable; not even a user request may override the duty of truth).
DeepSeek V4's prefix caching makes this practical. The Constitution is long
and detailed, but once cached its input tokens cost roughly 50× (Flash) to
120× (Pro) less than a cold read, per the cache-hit and cache-miss rates under
Models & Pricing. The model references it recursively, peeking, scanning, and
querying through RLM sessions, and revisits information on demand rather
than relying on a single memorized pass. It performs more like an
open-book test than a closed one.
Because the authority structure is explicit, failure isn't hidden. Non-zero
exit codes, type errors from rust-analyzer arriving between turns, sandbox
denials — these are fed back as correction vectors. The model uses its own
drift to self-correct.
Three modes control the action space. Plan is read-only. Agent gates
destructive operations behind approval. YOLO auto-approves in trusted
workspaces. macOS Seatbelt is the active sandbox; Linux Landlock is
detected but not yet enforced; Windows sandboxing is not yet advertised.
Fin — a cheap Flash call with thinking off — handles model auto-routing per
turn. --model auto is the default.
Every turn records a side-git snapshot outside your repo's .git.
/restore and revert_turn roll back the workspace.
Sub-agents run concurrently (up to 20). agent_open returns immediately;
results arrive inline as completion sentinels with a summary. Full
transcripts stay behind bounded handles through agent_eval. See
docs/SUBAGENTS.md.
The rest of the surface: LSP diagnostics after every edit (rust-analyzer,
pyright, typescript-language-server, gopls, clangd, jdtls,
vue-language-server), RLM sessions for batched analysis, MCP protocol,
HTTP/SSE runtime API, persistent task queue, ACP adapter for Zed,
SWE-bench export, and live cost tracking with cache hit/miss breakdowns.
## The Harness
codewhale (dispatcher CLI) → codewhale-tui (companion binary) → ratatui interface ↔ async engine ↔ OpenAI-compatible streaming client. Tool calls route through a typed registry (shell, file ops, git, web, sub-agents, MCP, RLM) and results stream back into the transcript. The engine manages session state, turn tracking, the durable task queue, and an LSP subsystem that feeds post-edit diagnostics into the model's context before the next reasoning step.
CodeWhale can dispatch multiple sub-agents that run in parallel — like a concurrent task queue:
- **Non-blocking launch.** `agent_open` returns immediately. The child gets its own fresh context and tool registry and runs independently. The parent keeps working.
- **Background execution.** Sub-agents execute concurrently (default cap: 10, configurable to 20). The engine manages the pool, so no polling loop is needed.
- **Completion notification.** When a sub-agent finishes, the runtime injects a `` sentinel into the parent's transcript. The human-readable summary, including the child's findings, changed files, and any risks, sits on the line immediately before the sentinel. The parent model reads that summary and integrates findings without an extra tool call.
- **Bounded result retrieval.** The full child transcript lives behind a `transcript_handle` accessible through `agent_eval`. When the summary isn't enough, the parent calls `handle_read` for slices, line ranges, or JSONPath projections, keeping the parent context lean without losing access to the details.
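
The pattern in the list above (capped pool, non-blocking launch, completion messages flowing back to the parent) can be sketched with nothing but `std` threads and channels. This is an illustration under stated assumptions, not CodeWhale's engine, which the section describes as async: `run_subagents`, `Completion`, the sleep, and the summary text are all hypothetical stand-ins, and only the 20-agent ceiling comes from this README.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// What a finished child reports back (hypothetical; stands in for the completion sentinel).
#[derive(Debug)]
pub struct Completion {
    pub id: usize,
    pub summary: String,
}

/// Runs each task on at most `cap` workers (clamped to 1..=20) and returns
/// completions in finish order, plus the peak number of tasks running at once.
pub fn run_subagents(tasks: Vec<String>, cap: usize) -> (Vec<Completion>, usize) {
    let cap = cap.clamp(1, 20);
    let (job_tx, job_rx) = mpsc::channel::<(usize, String)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<Completion>();
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let mut workers = Vec::new();
    for _ in 0..cap {
        let job_rx = Arc::clone(&job_rx);
        let done_tx = done_tx.clone();
        let running = Arc::clone(&running);
        let peak = Arc::clone(&peak);
        workers.push(thread::spawn(move || loop {
            // The queue lock is released at the end of this statement.
            let job = job_rx.lock().unwrap().recv();
            let (id, prompt) = match job {
                Ok(job) => job,
                Err(_) => break, // queue closed and drained: worker exits
            };
            let now = running.fetch_add(1, Ordering::SeqCst) + 1;
            peak.fetch_max(now, Ordering::SeqCst);
            thread::sleep(Duration::from_millis(5)); // stand-in for the child's work
            let summary = format!("agent {} finished: {}", id, prompt);
            running.fetch_sub(1, Ordering::SeqCst);
            done_tx.send(Completion { id, summary }).unwrap();
        }));
    }
    drop(done_tx); // only workers hold senders now, so `done_rx` ends when they exit

    for (id, task) in tasks.into_iter().enumerate() {
        job_tx.send((id, task)).unwrap(); // returns immediately: non-blocking launch
    }
    drop(job_tx); // close the queue so idle workers exit

    let completions: Vec<Completion> = done_rx.iter().collect();
    for worker in workers {
        worker.join().unwrap();
    }
    (completions, peak.load(Ordering::SeqCst))
}
```

Workers share one queue behind a mutex, so no more than `cap` children ever run at once; the parent only blocks when it chooses to collect results.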
```bash
npm install -g codewhale
codewhale --version
codewhale --model auto
```
Prebuilt binary pairs and platform archives are published for Linux x64, Linux ARM64 (v0.8.8+), macOS x64, macOS ARM64, and Windows x64. For other targets (musl, riscv64, FreeBSD, etc.), see Install from source or docs/INSTALL.md.
On first launch you'll be prompted for your DeepSeek API key. The key is saved to ~/.codewhale/config.toml (legacy ~/.deepseek/config.toml also supported) so it works from any directory without OS credential prompts.
You can also set it ahead of time:
```bash
codewhale auth set --provider deepseek   # saves to ~/.codewhale/config.toml
codewhale auth status                    # shows the active credential source
export DEEPSEEK_API_KEY="YOUR_KEY"       # env var alternative; use ~/.zshenv for non-interactive shells
codewhale
codewhale doctor                         # verify setup
```
If codewhale doctor says the rejected key came from DEEPSEEK_API_KEY, remove
the stale export from your shell startup file, open a fresh shell, or run
codewhale auth set --provider deepseek. Use codewhale auth status to see the
config, keyring, and env-var source state without printing the key. Saved config
keys take precedence over the keyring and environment and are easier to rotate.
> To rotate or remove a saved key: codewhale auth clear --provider deepseek.
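
As a rough illustration of the precedence described above (saved config, then keyring, then `DEEPSEEK_API_KEY`), here is a minimal Rust sketch. `resolve_api_key`, `describe`, and `KeySource` are hypothetical names, and treating a blank value as absent is an assumption rather than documented behavior.

```rust
/// Where the active key came from, listed from highest to lowest precedence.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeySource {
    Config,
    Keyring,
    Env,
}

/// Returns the key from the first source holding a non-blank value.
/// Assumption: whitespace-only values are treated as unset.
pub fn resolve_api_key(
    config: Option<&str>,
    keyring: Option<&str>,
    env: Option<&str>,
) -> Option<(KeySource, String)> {
    vec![
        (KeySource::Config, config),
        (KeySource::Keyring, keyring),
        (KeySource::Env, env),
    ]
    .into_iter()
    .find_map(|(source, value)| {
        let key = value?.trim();
        (!key.is_empty()).then(|| (source, key.to_string()))
    })
}

/// Reports the source without echoing the key, in the spirit of `codewhale auth status`.
pub fn describe(resolved: &Option<(KeySource, String)>) -> String {
    match resolved {
        Some((source, key)) => format!("{:?} ({} chars)", source, key.len()),
        None => "no key configured".to_string(),
    }
}
```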
### Tencent Cloud / CNB Remote-First Path
For an always-on workspace you can control from a phone, use the Tencent-native
path: CNB mirror/source, Tencent Lighthouse HK, a Feishu/Lark long-connection
bridge, and optional EdgeOne for a deliberate public HTTPS edge. The runtime API
stays bound to localhost; EdgeOne is not used to expose /v1/*.
Use codewhale --model auto or /model auto when you want codewhale to decide how much model and reasoning power a turn needs.
Auto mode controls two settings together:
Model: deepseek-v4-flash or deepseek-v4-pro
Thinking: off, high, or max
Before the real turn is sent, the app makes a small deepseek-v4-flash routing call with thinking off. That router looks at the latest request and recent context, then selects a concrete model and thinking level for the real request. Short/simple turns can stay on Flash with thinking off; coding, debugging, release work, architecture, security review, or ambiguous multi-step tasks can move up to Pro and/or higher thinking.
auto is local to codewhale. The upstream API never receives model: "auto"; it receives the concrete model and thinking setting chosen for that turn. The TUI shows the selected route, and cost tracking is charged against the model that actually ran. If the router call fails or returns an invalid answer, the app falls back to a local heuristic. Sub-agents inherit auto mode unless you assign them an explicit model.
Use a fixed model or fixed thinking level when you want repeatable benchmarking, a strict cost ceiling, or a specific provider/model mapping.
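
A hypothetical sketch of the route-then-fallback flow: parse the Flash router's reply into a concrete model and thinking level, and use a local heuristic when the reply is missing or invalid. The reply format (`<model-id> <thinking>`), the keyword list, and the 400-character threshold are assumptions for illustration; the real router prompt and fallback heuristic are not documented here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Model {
    Flash,
    Pro,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Thinking {
    Off,
    High,
    Max,
}

impl Model {
    /// Concrete model ID sent upstream; the API never receives "auto".
    pub fn id(self) -> &'static str {
        match self {
            Model::Flash => "deepseek-v4-flash",
            Model::Pro => "deepseek-v4-pro",
        }
    }
}

/// Parses a router reply in the assumed form "<model-id> <thinking>".
pub fn parse_route(reply: &str) -> Option<(Model, Thinking)> {
    let mut parts = reply.split_whitespace();
    let model = match parts.next()? {
        "deepseek-v4-flash" => Model::Flash,
        "deepseek-v4-pro" => Model::Pro,
        _ => return None,
    };
    let thinking = match parts.next()? {
        "off" => Thinking::Off,
        "high" => Thinking::High,
        "max" => Thinking::Max,
        _ => return None,
    };
    if parts.next().is_some() {
        return None; // trailing tokens make the reply invalid
    }
    Some((model, thinking))
}

/// Hypothetical local fallback based only on keywords and request length.
pub fn heuristic_route(request: &str) -> (Model, Thinking) {
    const HEAVY: [&str; 5] = ["debug", "architecture", "security", "release", "refactor"];
    let lower = request.to_lowercase();
    let heavy = HEAVY.iter().any(|k| lower.contains(*k));
    let long = request.chars().count() > 400;
    match (heavy, long) {
        (true, true) => (Model::Pro, Thinking::Max),
        (true, false) => (Model::Pro, Thinking::High),
        (false, true) => (Model::Flash, Thinking::High),
        (false, false) => (Model::Flash, Thinking::Off),
    }
}

/// Uses the router's answer when it parses; otherwise falls back to the heuristic.
pub fn choose_route(router_reply: Option<&str>, request: &str) -> (Model, Thinking) {
    router_reply
        .and_then(parse_route)
        .unwrap_or_else(|| heuristic_route(request))
}
```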
### Linux ARM64 (Raspberry Pi, Asahi, Graviton, HarmonyOS PC)
npm i -g codewhale works on glibc-based ARM64 Linux from v0.8.8 onward. You can also download prebuilt binaries from the Releases page and place them side by side on your PATH.
### China / Mirror-friendly Installation
If GitHub or npm downloads are slow from mainland China, use a Cargo registry mirror:
Prebuilt binaries can also be downloaded from GitHub Releases. Use DEEPSEEK_TUI_RELEASE_BASE_URL for mirrored release assets.
### Windows (Scoop)
Scoop is a Windows package manager. The codewhale package is listed
in Scoop's main bucket, but that manifest updates independently and can lag the
GitHub/npm/Cargo release. Run scoop update first, then verify the installed
version with codewhale --version:
Both binaries are required. Cross-compilation and platform-specific notes: docs/INSTALL.md.
## Other API Providers
For the full shipped provider registry, including model IDs, auth variables,
base URLs, and capability boundaries, see docs/PROVIDERS.md.
Think of provider and model as separate choices: provider is the route,
account, and endpoint; model is the model ID on that route. DeepSeek-family
models can be reached through several routes, so /config exposes both
provider and provider_url.
| Route | Typical DeepSeek model ID |
| --- | --- |
| `deepseek` | `deepseek-v4-pro` |
| `nvidia-nim` | `deepseek-ai/deepseek-v4-pro` |
| `openrouter` | `deepseek/deepseek-v4-pro` |
| `fireworks` | `accounts/fireworks/models/deepseek-v4-pro` |
| `siliconflow` | `deepseek-ai/DeepSeek-V4-Pro` |
| `openai` | Your gateway's model ID |
| `huggingface` | `deepseek-ai/DeepSeek-V4-Pro` |
```bash
# NVIDIA NIM
codewhale auth set --provider nvidia-nim --api-key "YOUR_NVIDIA_API_KEY"
codewhale --provider nvidia-nim

# AtlasCloud
codewhale auth set --provider atlascloud --api-key "YOUR_ATLASCLOUD_API_KEY"
codewhale --provider atlascloud
codewhale --provider atlascloud --model vendor/model-id

# Wanjie Ark
codewhale auth set --provider wanjie-ark --api-key "YOUR_WANJIE_API_KEY"
codewhale --provider wanjie-ark --model deepseek-reasoner

# OpenRouter
codewhale auth set --provider openrouter --api-key "YOUR_OPENROUTER_API_KEY"
codewhale --provider openrouter --model deepseek/deepseek-v4-pro
codewhale --provider openrouter --model arcee-ai/trinity-large-thinking
codewhale --provider openrouter --model minimax/minimax-m3
```
Arcee AI offers direct API access to its Trinity models, including the reasoning-capable trinity-large-thinking. Setup instructions and a model comparison follow.
## Configuration
### API Key
The primary authentication method is the `ARCEE_API_KEY` environment variable or the `[providers.arcee]` configuration section in `~/.codewhale/config.toml`:
```toml
[providers.arcee]
# api_key = "your-arcee-api-key"
# base_url = "https://api.arcee.ai/api/v1"
# model = "trinity-large-thinking"  # or "trinity-large-preview", "trinity-mini"
```
### Environment Variables

- `ARCEE_API_KEY`: your Arcee API key (required)
- `ARCEE_BASE_URL`: custom base URL (optional, defaults to `https://api.arcee.ai/api/v1`)
- `ARCEE_MODEL`: default model (optional, defaults to `trinity-large-thinking`)
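
A sketch of how the three variables above and their documented defaults could resolve. `arcee_settings` is a hypothetical helper, and treating a blank variable as unset is an assumption. It takes a lookup closure instead of reading the process environment directly, so it can be exercised without mutating global state.

```rust
/// Resolved Arcee settings (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct ArceeSettings {
    pub api_key: String,
    pub base_url: String,
    pub model: String,
}

const DEFAULT_BASE_URL: &str = "https://api.arcee.ai/api/v1";
const DEFAULT_MODEL: &str = "trinity-large-thinking";

/// `lookup` returns a variable's value, e.g. `|k| std::env::var(k).ok()`.
pub fn arcee_settings<F>(lookup: F) -> Result<ArceeSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: a blank value counts as unset.
    let non_empty = |name: &str| lookup(name).filter(|v| !v.trim().is_empty());
    let api_key = non_empty("ARCEE_API_KEY")
        .ok_or_else(|| "ARCEE_API_KEY is required".to_string())?;
    let base_url = non_empty("ARCEE_BASE_URL").unwrap_or_else(|| DEFAULT_BASE_URL.to_string());
    let model = non_empty("ARCEE_MODEL").unwrap_or_else(|| DEFAULT_MODEL.to_string());
    Ok(ArceeSettings { api_key, base_url, model })
}
```

In a real binary the lookup would be `|k| std::env::var(k).ok()`.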
### Model Support

CodeWhale supports three Arcee models:

| Model | Reasoning | Context window | Max output | Best for |
| --- | --- | --- | --- | --- |
| trinity-large-thinking | ✅ Yes | 262,144 tokens | 262,144 tokens | Complex reasoning, coding, math |
| trinity-large-preview | ❌ No | 262,144 tokens | 4,096 tokens | High-accuracy non-reasoning tasks |
| trinity-mini | ❌ No | 128,000 tokens | 4,096 tokens | Faster, cost-effective tasks |
Note: trinity-large-thinking is the only one of the three with reasoning (thinking mode), and its 262,144-token context and output limits make it the best fit for complex programming tasks. trinity-large-preview and trinity-mini do not reason and cap output at 4,096 tokens.
```bash
codewhale auth set --provider arcee --api-key "YOUR_ARCEE_API_KEY"
codewhale --provider arcee --model trinity-large-thinking
codewhale --provider arcee --model trinity-large-preview

# Xiaomi MiMo
codewhale auth set --provider xiaomi-mimo --api-key "YOUR_XIAOMI_KEY"
```
Inside the TUI, `/provider` opens the provider picker and `/model` opens the
local model/thinking picker. `/provider openrouter` and `/model ` switch
directly, while `/models` explicitly fetches and lists live API models when the
active provider supports model listing.
---
## Release Notes
Release-specific changes live in [CHANGELOG.md](CHANGELOG.md). This README
stays focused on current install paths, core workflows, provider setup, runtime
interfaces, and extension points.
---
## Usage
```bash
codewhale # interactive TUI
codewhale "explain this function" # one-shot prompt
codewhale exec --auto --output-format stream-json "fix this bug" # NDJSON backend stream
codewhale exec --resume "follow up" # continue a non-interactive session
codewhale --model deepseek-v4-flash "summarize" # model override
codewhale --model auto "fix this bug" # auto-select model + thinking
codewhale --yolo # auto-approve tools
codewhale auth set --provider deepseek # save API key
codewhale doctor # check setup & connectivity
codewhale doctor --json # machine-readable diagnostics
codewhale setup --status # read-only setup status
codewhale setup --tools --plugins # scaffold tool/plugin dirs
codewhale models # list live API models
codewhale sessions # list saved sessions with timestamps
codewhale resume --last # resume the most recent session in this workspace
codewhale resume # resume a specific session by UUID
codewhale fork # fork a saved session into a sibling path
codewhale serve --http # HTTP/SSE API server
codewhale serve --mobile # LAN mobile control page; token-gated by default
codewhale serve --acp # ACP stdio adapter for Zed/custom agents
codewhale run pr # fetch PR and pre-seed review prompt
codewhale mcp list # list configured MCP servers
codewhale mcp validate # validate MCP config/connectivity
codewhale mcp-server # run dispatcher MCP stdio server
codewhale update # check for and apply binary updates
```
Inside the interactive TUI composer, prefix a line with ! to run a shell
command through the normal approval, sandbox, and output surfaces, for example
! cargo test -p codewhale-tui sidebar.
### Branching Conversations
Saved sessions are intentionally branchable. codewhale fork copies
an existing saved session into a new sibling session, records the parent session
id in metadata, and opens that fork so you can explore an alternate direction
without polluting the original path. The session picker and codewhale sessions
mark forked sessions with their parent id.
codewhale sessions lists saved sessions across workspaces and includes the
last-updated timestamp. codewhale resume --last and codewhale --continue
choose the latest session for the current workspace; pass an explicit session id
when resuming work from another directory.
Inside the TUI, Esc-Esc backtrack can rewind the active transcript to a prior
user prompt and put that prompt back in the composer for editing. /restore
and revert_turn are separate workspace rollback tools: they restore files
from side-git snapshots but do not rewrite conversation history.
Docker images are published to GHCR for release builds:
The first ACP slice supports new sessions and prompt responses through your
existing DeepSeek config/API key. Tool-backed editing and checkpoint replay are
not exposed through ACP yet.
Community-maintained adapter: acp-codewhale-adapter
bridges codewhale exec --auto to cc-connect for users who need tool-backed
ACP workflows outside the built-in Zed slice.
### Keyboard Shortcuts

| Key | Action |
| --- | --- |
| Tab | Complete `/` or `@` entries; while running, queue draft as follow-up; otherwise cycle mode |
| Shift+Tab | Cycle reasoning effort: off → high → max |
| F1 | Searchable help overlay |
| Esc | Back / dismiss |
| Ctrl+K | Command palette |
| Ctrl+R | Resume an earlier session |
| Alt+R | Search prompt history and recover cleared drafts |
| Ctrl+S | Stash current draft (`/stash list`, `/stash pop` to recover) |
| Mode | Behavior |
| --- | --- |
| Plan | Read-only investigation: the model explores and proposes a plan before making changes; multi-step investigations use checklist_write |
| Agent 🤖 | Default interactive mode: multi-step tool use with approval gates; substantial work is tracked with checklist_write |
| YOLO ⚡ | Auto-approve all tools in a trusted workspace; multi-step work still keeps a visible checklist |
## Configuration
User config: ~/.codewhale/config.toml (legacy ~/.deepseek/config.toml fallback). Project overlay: /.codewhale/config.toml (legacy /.deepseek/config.toml) (denied: api_key, base_url, provider, mcp_config_path). config.example.toml has every option.
The TUI footer can be trimmed with /statusline, or by setting
[tui].status_items in config. Current footer customization selects from the
built-in chips such as mode, model, status, git_branch, tokens, and
cache; chip order is controlled by the order of keys in status_items in
config.toml. The interactive picker writes the canonical order. Multi-line
layouts, custom colors, and external command widgets are not part of the
current statusline surface.
Custom DeepSeek-compatible endpoints usually do not need a new provider. Keep
provider = "deepseek" and set [providers.deepseek].base_url / model, or
use provider = "openai" for generic OpenAI-compatible gateways. Keep
provider, api_key, and base_url in user config or environment variables;
project overlays cannot set them.
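
The overlay restriction can be pictured as a filtered merge. This is a minimal sketch over a flat `key -> value` map (real TOML is nested), `apply_overlay` is a hypothetical name, and matching on the last dotted segment, so that `providers.deepseek.base_url` is rejected too, is an assumption rather than documented behavior.

```rust
use std::collections::BTreeMap;

/// Keys a project overlay may not set, per the Configuration section.
const DENIED_OVERLAY_KEYS: [&str; 4] = ["api_key", "base_url", "provider", "mcp_config_path"];

/// Merges `overlay` onto `user`, returning the merged map and the rejected keys.
pub fn apply_overlay(
    user: &BTreeMap<String, String>,
    overlay: &BTreeMap<String, String>,
) -> (BTreeMap<String, String>, Vec<String>) {
    let mut merged = user.clone();
    let mut rejected = Vec::new();
    for (key, value) in overlay {
        // Assumption: compare the last dotted segment, e.g. "providers.deepseek.base_url" -> "base_url".
        let leaf = key.rsplit('.').next().unwrap_or(key.as_str());
        if DENIED_OVERLAY_KEYS.contains(&leaf) {
            rejected.push(key.clone());
        } else {
            merged.insert(key.clone(), value.clone());
        }
    }
    (merged, rejected)
}
```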
Key environment variables:

| Variable | Purpose |
| --- | --- |
| `DEEPSEEK_API_KEY` | API key |
| `DEEPSEEK_BASE_URL` | API base URL |
| `DEEPSEEK_HTTP_HEADERS` | Optional custom model request headers, e.g. `X-Model-Provider-Id=your-model-provider` |
| `DEEPSEEK_MODEL` | Default model |
| `DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS` | Stream idle timeout in seconds; default 300, clamped to 1..=3600 |
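
A sketch of the timeout rule in the table above (default 300, clamped to 1..=3600), plus a parser for `Name=value` headers. Both function names are hypothetical. Falling back to the default on an unparseable value, and splitting multiple headers on commas, are assumptions; the README specifies neither.

```rust
use std::time::Duration;

/// DEEPSEEK_STREAM_IDLE_TIMEOUT_SECS: default 300 s, clamped to 1..=3600 s.
/// Assumption: an unset or unparseable value falls back to the default.
pub fn stream_idle_timeout(raw: Option<&str>) -> Duration {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(300);
    Duration::from_secs(secs.clamp(1, 3600))
}

/// DEEPSEEK_HTTP_HEADERS: `Name=value` pairs.
/// Assumption: multiple pairs are comma-separated; malformed pairs are skipped.
pub fn parse_headers(raw: &str) -> Vec<(String, String)> {
    raw.split(',')
        .filter_map(|pair| {
            let (name, value) = pair.split_once('=')?;
            let name = name.trim();
            if name.is_empty() {
                return None;
            }
            Some((name.to_string(), value.trim().to_string()))
        })
        .collect()
}
```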
Set locale in settings.toml, use /config locale zh-Hans, or rely on LC_ALL/LANG to choose UI chrome and the fallback language sent to V4 models. The latest user message still wins for natural-language reasoning and replies, so Chinese user turns stay Chinese even on an English system locale. See docs/CONFIGURATION.md and docs/MCP.md.
## Models & Pricing

| Model | Context | Input, cache hit (per 1M tokens) | Input, cache miss (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- | --- | --- |
| deepseek-v4-pro | 1M | $0.003625 | $0.435 | $0.87 |
| deepseek-v4-flash | 1M | $0.0028 | $0.14 | $0.28 |
DeepSeek Platform defaults to https://api.deepseek.com/beta so beta-gated API features can be tested without extra setup. Set base_url = "https://api.deepseek.com" to opt out.
Legacy aliases deepseek-chat / deepseek-reasoner map to deepseek-v4-flash and retire after July 24, 2026. NVIDIA NIM variants use your NVIDIA account terms.
> [!Note]
> DeepSeek's pricing page now lists the V4 Pro rates above as the permanent prices: the previous 75% promotional discount is now built into the base rates, which are one quarter of their pre-promotion levels, as the promotion window closes at 15:59 UTC on 31 May 2026. The TUI cost estimator already uses these values, so no behavioural change is required. For any future price changes, consult the official DeepSeek pricing page.
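
To see what the cache rates mean for a single turn, this sketch prices one request using the per-1M rates copied from the pricing table; the token counts in the usage example are made up. Dividing the cache-miss rate by the cache-hit rate shows that a cached Pro input token costs 1/120 of an uncached one, and a cached Flash token 1/50. `Rates`, `Usage`, and both functions are hypothetical names, not the TUI's cost estimator.

```rust
/// USD per 1M tokens, copied from the Models & Pricing table.
#[derive(Debug, Clone, Copy)]
pub struct Rates {
    pub input_hit: f64,
    pub input_miss: f64,
    pub output: f64,
}

pub const V4_PRO: Rates = Rates { input_hit: 0.003625, input_miss: 0.435, output: 0.87 };
pub const V4_FLASH: Rates = Rates { input_hit: 0.0028, input_miss: 0.14, output: 0.28 };

/// Token counts for one turn, split by prefix-cache outcome.
#[derive(Debug, Clone, Copy)]
pub struct Usage {
    pub hit: u64,
    pub miss: u64,
    pub output: u64,
}

/// Cost of one turn in USD.
pub fn turn_cost_usd(r: Rates, u: Usage) -> f64 {
    const PER_MILLION: f64 = 1_000_000.0;
    (u.hit as f64 * r.input_hit + u.miss as f64 * r.input_miss + u.output as f64 * r.output)
        / PER_MILLION
}

/// How many times cheaper a cached input token is than an uncached one.
pub fn cache_discount(r: Rates) -> f64 {
    r.input_miss / r.input_hit
}
```

For example, a Pro turn with 200,000 cached input tokens, 10,000 uncached input tokens, and 2,000 output tokens comes to $0.000725 + $0.00435 + $0.00174 = $0.006815.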
## Publishing Your Own Skill
codewhale discovers skills from workspace directories (.agents/skills → skills → .opencode/skills → .claude/skills → .cursor/skills) and global directories (~/.agents/skills → ~/.claude/skills → ~/.codewhale/skills → ~/.deepseek/skills). Each skill is a directory with a SKILL.md file:
```text
~/.agents/skills/my-skill/
└── SKILL.md
```

Frontmatter required:

```markdown
---
name: my-skill
description: Use this when DeepSeek should follow my custom workflow.
---
# My Skill
Instructions for the agent go here.
```
Commands: /skills (list), /skill (activate), /skill new (scaffold), /skill install github:/ (community), /skill update / uninstall / trust. Community installs from GitHub require no backend service. Installed skills appear in the model-visible session context; the agent can auto-select relevant skills via the load_skill tool when your task matches their descriptions.
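
A minimal sketch of the frontmatter check that "Frontmatter required" implies: read the block between the opening and closing `---` and require `name` and `description`. It handles only flat `key: value` lines, not full YAML, and `parse_skill_frontmatter` is a hypothetical name rather than CodeWhale's skill loader.

```rust
/// The two fields a SKILL.md frontmatter must declare.
#[derive(Debug, PartialEq)]
pub struct SkillMeta {
    pub name: String,
    pub description: String,
}

/// Parses the leading `---` block of a SKILL.md (flat `key: value` lines only).
pub fn parse_skill_frontmatter(text: &str) -> Result<SkillMeta, String> {
    let mut lines = text.lines();
    if lines.next().map(str::trim) != Some("---") {
        return Err("missing opening ---".into());
    }
    let (mut name, mut description) = (None, None);
    for line in lines {
        let line = line.trim();
        if line == "---" {
            return match (name, description) {
                (Some(name), Some(description)) => Ok(SkillMeta { name, description }),
                (None, _) => Err("frontmatter is missing `name`".into()),
                (_, None) => Err("frontmatter is missing `description`".into()),
            };
        }
        // Split on the first colon so values may themselves contain colons.
        if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "name" => name = Some(value.trim().to_string()),
                "description" => description = Some(value.trim().to_string()),
                _ => {} // other keys are ignored in this sketch
            }
        }
    }
    Err("missing closing ---".into())
}
```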
First launch also installs bundled system skills for common workflows:
skill-creator, delegate, v4-best-practices, plugin-creator,
skill-installer, mcp-builder, documents, presentations,
spreadsheets, pdf, and feishu. These live under
~/.codewhale/skills (or legacy ~/.deepseek/skills) and are versioned so new bundles are added on upgrade
without recreating skills the user deliberately deleted.
dfwqdyl-ui — model ID case-sensitivity compatibility report (#729)
Oliver-ZPLiu — stale working... state bug report, Windows clipboard fallback, MCP Streamable HTTP session fixes, and Homebrew tap automation (#738, #850, #1643, #1631)
reidliu41 — resume hint, workspace trust persistence, Ollama provider support, thinking-block stream finalization, CI cache hardening, streaming wrap, and DeepSeek model completions (#863, #870, #921, #1078, #1603, #1628, #1601)