Testing

OpenClaw has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners.

This doc is a “how we test” guide:

What each suite covers (and what it deliberately does not cover)
Which commands to run for common workflows (local, pre-push, debugging)
How live tests discover credentials and select models/providers
How to add regressions for real-world model/provider issues

Quick start

Most days:

Full gate (expected before push): pnpm build && pnpm check && pnpm test

When you touch tests or want extra confidence:

Coverage gate: pnpm test:coverage
E2E suite: pnpm test:e2e

When debugging real providers/models (requires real creds):

Live suite (models + gateway tool/image probes): pnpm test:live

Tip: when you only need one failing case, narrow the live tests with the allowlist environment variables described below.

Test suites (what runs where)

Think of the suites as “increasing realism” (and increasing flakiness/cost):

Unit / integration (default)

Command: pnpm test
Config: vitest.config.ts
Files: src/**/*.test.ts
Scope:
- Pure unit tests
- In-process integration tests (gateway auth, routing, tooling, parsing, config)
- Deterministic regressions for known bugs
Expectations:
- Runs in CI
- No real keys required
- Should be fast and stable

E2E (gateway smoke)

Command: pnpm test:e2e
Config: vitest.e2e.config.ts
Files: src/**/*.e2e.test.ts
Scope:
- Multi-instance gateway end-to-end behavior
- WebSocket/HTTP surfaces, node pairing, and heavier networking
Expectations:
- Runs in CI (when enabled in the pipeline)
- No real keys required
- More moving parts than unit tests (can be slower)
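
Read literally, the unit glob src/**/*.test.ts also matches files named *.e2e.test.ts, and the same holds for the *.live.test.ts files used by the live suite. The sketch below shows one way to tell the three naming conventions apart by checking the most specific suffix first. suiteForFile is a hypothetical helper written for illustration; it is not taken from the OpenClaw Vitest configs, which may separate the suites differently (for example with exclude patterns).

```typescript
// Hypothetical helper, not part of OpenClaw. It only illustrates the
// filename conventions listed for the three suites in this guide.
type Suite = "unit" | "e2e" | "live" | "none";

function suiteForFile(path: string): Suite {
  // All three globs in this guide are rooted at src/.
  if (!path.startsWith("src/")) return "none";
  // Check the specific suffixes first, because "*.test.ts" would also
  // match "*.e2e.test.ts" and "*.live.test.ts".
  if (path.endsWith(".live.test.ts")) return "live";
  if (path.endsWith(".e2e.test.ts")) return "e2e";
  if (path.endsWith(".test.ts")) return "unit";
  return "none";
}
```

Under this reading, a file such as src/gateway/gateway.wizard.e2e.test.ts belongs to the e2e suite only, and is not picked up a second time by pnpm test.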

Live (real providers + real models)

Command: pnpm test:live
Config: vitest.live.config.ts
Files: src/**/*.live.test.ts
Default: enabled by pnpm test:live (sets OPENCLAW_LIVE_TEST=1)
Scope:
- “Does this provider/model actually work today with real creds?”
- Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
Expectations:
- Not CI-stable by design (real networks, real provider policies, quotas, outages)
- Costs money / uses rate limits
- Prefer running narrowed subsets instead of “everything”
- Live runs will source ~/.profile to pick up missing API keys
- Anthropic key rotation: set OPENCLAW_LIVE_ANTHROPIC_KEYS="sk-...,sk-..." (or OPENCLAW_LIVE_ANTHROPIC_KEY=sk-...) or multiple ANTHROPIC_API_KEY* vars; tests will retry on rate limits
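
As a rough illustration of the key rotation described above, the sketch below gathers candidate keys from the three kinds of variables and tries them in turn when a call is rate limited. The precedence order, the de-duplication, and the names collectAnthropicKeys, withKeyRotation, and isRateLimit are assumptions made for this example; the real live tests may order and retry differently.

```typescript
type Env = Record<string, string | undefined>;

// Assumed precedence (not confirmed by this guide): the comma list first,
// then the single-key variable, then any ANTHROPIC_API_KEY* variables.
function collectAnthropicKeys(env: Env): string[] {
  const keys: string[] = [];
  const push = (value: string | undefined): void => {
    for (const part of (value ?? "").split(",")) {
      const key = part.trim();
      if (key !== "" && keys.indexOf(key) === -1) keys.push(key);
    }
  };
  push(env["OPENCLAW_LIVE_ANTHROPIC_KEYS"]);
  push(env["OPENCLAW_LIVE_ANTHROPIC_KEY"]);
  for (const name of Object.keys(env).sort()) {
    if (name.startsWith("ANTHROPIC_API_KEY")) push(env[name]);
  }
  return keys;
}

// Try each key in order; move to the next key only when the caller-supplied
// predicate says the failure was a rate limit. Other errors are rethrown.
async function withKeyRotation<T>(
  keys: string[],
  call: (key: string) => Promise<T>,
  isRateLimit: (err: unknown) => boolean,
): Promise<T> {
  let lastError: unknown = new Error("no Anthropic keys configured");
  for (const key of keys) {
    try {
      return await call(key);
    } catch (err) {
      if (!isRateLimit(err)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}
```

Passing the environment in as a plain object keeps the helper easy to exercise in a unit test without touching the real process environment.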

Which suite should I run?

Use this decision table:

Editing logic/tests: run pnpm test (and pnpm test:coverage if you changed a lot)
Touching gateway networking / WS protocol / pairing: add pnpm test:e2e
Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed pnpm test:live

Live: model smoke (profile keys)

Live tests are split into two layers so we can isolate failures:

“Direct model” tells us whether the provider/model can answer at all with the given key.
“Gateway smoke” tells us whether the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).

Layer 1: Direct model completion (no gateway)

Test: src/agents/models.profiles.live.test.ts
Goal:
- Enumerate discovered models
- Use getApiKeyForModel to select the models you have creds for
- Run a small completion per model (and targeted regressions where needed)
How to enable:
- pnpm test:live (or OPENCLAW_LIVE_TEST=1 if invoking Vitest directly)
- Set OPENCLAW_LIVE_MODELS=modern (or all, an alias for modern) to actually run this suite; otherwise it skips, to keep pnpm test:live focused on gateway smoke
How to select models:
- OPENCLAW_LIVE_MODELS=modern to run the modern allowlist (Opus/Sonnet/Haiku 4.5, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.1, Grok 4)
- OPENCLAW_LIVE_MODELS=all is an alias for the modern allowlist
- or OPENCLAW_LIVE_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,..." (comma allowlist)
How to select providers:
- OPENCLAW_LIVE_PROVIDERS="google,google-antigravity,google-gemini-cli" (comma allowlist)
Where keys come from:
- By default: profile store and env fallbacks
- Set OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 to enforce the profile store only
Why this exists:
- Separates “provider API is broken / key is invalid” from “gateway agent pipeline is broken”
- Contains small, isolated regressions (example: OpenAI Responses/Codex Responses reasoning replay + tool-call flows)
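
The selection rules for OPENCLAW_LIVE_MODELS and OPENCLAW_LIVE_PROVIDERS can be sketched as two small pure functions. This is a minimal illustration assuming the behavior described above (unset means skip, modern and all mean the modern allowlist, anything else is a comma allowlist); parseLiveModels and providerAllowed are hypothetical names, and treating the text before the first slash as the provider is an assumption based on the provider/model ids used in this guide.

```typescript
type Selection =
  | { kind: "skip" }
  | { kind: "modern" }
  | { kind: "explicit"; models: string[] };

// Interpret the raw value of OPENCLAW_LIVE_MODELS.
function parseLiveModels(raw: string | undefined): Selection {
  const value = (raw ?? "").trim();
  if (value === "") return { kind: "skip" }; // unset: direct-model suite skips
  if (value === "modern" || value === "all") return { kind: "modern" };
  const models = value
    .split(",")
    .map((m) => m.trim())
    .filter((m) => m !== "");
  return models.length > 0 ? { kind: "explicit", models } : { kind: "skip" };
}

// Check a "provider/model" id against the raw value of OPENCLAW_LIVE_PROVIDERS.
// An empty or unset allowlist is assumed to allow every provider.
function providerAllowed(modelRef: string, rawProviders: string | undefined): boolean {
  const allow = (rawProviders ?? "")
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p !== "");
  if (allow.length === 0) return true;
  const provider = modelRef.split("/")[0] ?? "";
  return allow.indexOf(provider) !== -1;
}
```

For example, parseLiveModels("openai/gpt-5.2") yields an explicit one-model selection, while providerAllowed("openai/gpt-5.2", "google,google-antigravity") is false.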

Layer 2: Gateway + dev agent smoke (what “@openclaw” actually does)

Test: src/gateway/gateway-models.profiles.live.test.ts
Goal:
- Spin up an in-process gateway
- Create/patch an agent:dev:* session (model override per run)
- Iterate models-with-keys and assert:
  - A “meaningful” response (no tools)
  - A real tool invocation works (read probe)
  - Optional extra tool probes (exec+read probe)
  - OpenAI regression paths (tool-call-only → follow-up) keep working
Probe details (so you can explain failures quickly):
- read probe: the test writes a nonce file in the workspace and asks the agent to read it and echo the nonce back.
- exec+read probe: the test asks the agent to exec-write a nonce into a temp file, then read it back.
- image probe: the test attaches a generated PNG (cat + randomized code) and expects the model to return cat <CODE>.
- Implementation reference: src/gateway/gateway-models.profiles.live.test.ts and src/gateway/live-image-probe.ts.
How to enable:
- pnpm test:live (or OPENCLAW_LIVE_TEST=1 if invoking Vitest directly)
How to select models:
- Default: modern allowlist (Opus/Sonnet/Haiku 4.5, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.1, Grok 4)
- OPENCLAW_LIVE_GATEWAY_MODELS=all is an alias for the modern allowlist
- Or set OPENCLAW_LIVE_GATEWAY_MODELS="provider/model" (or a comma list) to narrow
How to select providers (avoid “OpenRouter everything”):
- OPENCLAW_LIVE_GATEWAY_PROVIDERS="google,google-antigravity,google-gemini-cli,openai,anthropic,zai,minimax" (comma allowlist)
Tool + image probes are always on in this live test:
- read probe + exec+read probe (tool stress)
- image probe runs when the model advertises image input support
- Flow (high level):
  - Test generates a tiny PNG with “CAT” + random code (src/gateway/live-image-probe.ts)
  - Sends it via agent attachments: [{ mimeType: "image/png", content: "<base64>" }]
  - Gateway parses attachments into images[] (src/gateway/server-methods/agent.ts + src/gateway/chat-attachments.ts)
  - Embedded agent forwards a multimodal user message to the model
  - Assertion: reply contains cat + code (OCR tolerance: minor mistakes allowed)
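
To make the “OCR tolerance” in the image probe assertion concrete, here is a minimal sketch of a tolerant match: the reply must contain the word cat, and some token in the reply must be within a small edit distance of the expected code. The threshold of one edit, the tokenization, and the names editDistance and replyMatchesImageProbe are assumptions for illustration; the actual check lives in the live test and src/gateway/live-image-probe.ts and may be stricter or looser.

```typescript
// Classic Levenshtein distance using two rolling rows.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      const del = (prev[j] ?? 0) + 1;
      const ins = (curr[j - 1] ?? 0) + 1;
      const sub = (prev[j - 1] ?? 0) + cost;
      curr.push(Math.min(del, ins, sub));
    }
    prev = curr;
  }
  return prev[b.length] ?? 0;
}

// Hypothetical assertion helper: true when the reply mentions "cat" and
// contains a token within maxErrors edits of the expected code.
function replyMatchesImageProbe(reply: string, code: string, maxErrors = 1): boolean {
  const tokens = reply
    .toUpperCase()
    .split(/[^A-Z0-9]+/)
    .filter((t) => t !== "");
  if (tokens.indexOf("CAT") === -1) return false;
  const want = code.toUpperCase();
  return tokens.some((t) => editDistance(t, want) <= maxErrors);
}
```

A small tolerance like this lets a model that misreads a single character (for instance O for 0) still pass, while a reply with the wrong code or without the word cat fails.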

Tip: to see what you can test on your machine (and the exact provider/model ids), run:

openclaw models list
openclaw models list --json

Live: Anthropic setup-token smoke

Test: src/agents/anthropic.setup-token.live.test.ts
Goal: verify that a Claude Code CLI setup-token (or a pasted setup-token profile) can complete an Anthropic prompt.
Enable:
- pnpm test:live (or OPENCLAW_LIVE_TEST=1 if invoking Vitest directly)
- OPENCLAW_LIVE_SETUP_TOKEN=1
Token sources (pick one):
- Profile: OPENCLAW_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test
- Raw token: OPENCLAW_LIVE_SETUP_TOKEN_VALUE=sk-ant-oat01-...
Model override (optional):
- OPENCLAW_LIVE_SETUP_TOKEN_MODEL=anthropic/claude-opus-4-5

Setup example:

openclaw models auth paste-token --provider anthropic --profile-id anthropic:setup-token-test
OPENCLAW_LIVE_SETUP_TOKEN=1 OPENCLAW_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test pnpm test:live src/agents/anthropic.setup-token.live.test.ts

Live: CLI backend smoke (Claude Code CLI or other local CLIs)

Test: src/gateway/gateway-cli-backend.live.test.ts
Goal: validate the Gateway + agent pipeline using a local CLI backend, without touching your default config.
Enable:
- pnpm test:live (or OPENCLAW_LIVE_TEST=1 if invoking Vitest directly)
- OPENCLAW_LIVE_CLI_BACKEND=1
Defaults:
- Model: claude-cli/claude-sonnet-4-5
- Command: claude
- Args: ["-p","--output-format","json","--dangerously-skip-permissions"]
Overrides (optional):
- OPENCLAW_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-opus-4-5"
- OPENCLAW_LIVE_CLI_BACKEND_MODEL="codex-cli/gpt-5.2-codex"
- OPENCLAW_LIVE_CLI_BACKEND_COMMAND="/full/path/to/claude"
- OPENCLAW_LIVE_CLI_BACKEND_ARGS='["-p","--output-format","json","--permission-mode","bypassPermissions"]'
- OPENCLAW_LIVE_CLI_BACKEND_CLEAR_ENV='["ANTHROPIC_API_KEY","ANTHROPIC_API_KEY_OLD"]'
- OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1 to send a real image attachment (paths are injected into the prompt).
- OPENCLAW_LIVE_CLI_BACKEND_IMAGE_ARG="--image" to pass image file paths as CLI args instead of prompt injection.
- OPENCLAW_LIVE_CLI_BACKEND_IMAGE_MODE="repeat" (or "list") to control how image args are passed when IMAGE_ARG is set.
- OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1 to send a second turn and validate the resume flow.
- OPENCLAW_LIVE_CLI_BACKEND_DISABLE_MCP_CONFIG=0 to keep the Claude Code CLI MCP config enabled (by default the MCP config is disabled with a temporary empty file).
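
Several of these overrides are JSON arrays passed through environment variables. The sketch below shows one way the documented defaults and overrides could be combined; resolveCliBackend and parseStringArray are hypothetical names, and rejecting anything that is not a JSON array of strings is an assumption about how malformed values would be handled.

```typescript
type Env = Record<string, string | undefined>;

interface CliBackendConfig {
  model: string;
  command: string;
  args: string[];
}

// Parse a JSON array of strings, falling back when the variable is unset.
function parseStringArray(raw: string | undefined, fallback: string[]): string[] {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every((x) => typeof x === "string")) {
    throw new Error("expected a JSON array of strings");
  }
  return parsed as string[];
}

// Returns null when the CLI backend smoke is not enabled.
// The default values are the ones listed in this guide.
function resolveCliBackend(env: Env): CliBackendConfig | null {
  if (env["OPENCLAW_LIVE_CLI_BACKEND"] !== "1") return null;
  return {
    model: env["OPENCLAW_LIVE_CLI_BACKEND_MODEL"] ?? "claude-cli/claude-sonnet-4-5",
    command: env["OPENCLAW_LIVE_CLI_BACKEND_COMMAND"] ?? "claude",
    args: parseStringArray(env["OPENCLAW_LIVE_CLI_BACKEND_ARGS"], [
      "-p",
      "--output-format",
      "json",
      "--dangerously-skip-permissions",
    ]),
  };
}
```

Single quotes around the JSON value in the shell, as in the override examples above, keep the inner double quotes intact.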

Example:

OPENCLAW_LIVE_CLI_BACKEND=1 \
  OPENCLAW_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-sonnet-4-5" \
  pnpm test:live src/gateway/gateway-cli-backend.live.test.ts

Recommended live recipes

Narrow, explicit allowlists are the fastest and least flaky:

Single model, direct (no gateway):
- OPENCLAW_LIVE_MODELS="openai/gpt-5.2" pnpm test:live src/agents/models.profiles.live.test.ts
Single model, gateway smoke:
- OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.2" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
Tool calling across several providers:
- OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-flash-preview,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
Google focus (Gemini API key + Antigravity):
- Gemini (API key): OPENCLAW_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
- Antigravity (OAuth): OPENCLAW_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts

Notes:

google/... uses the Gemini API (API key).
google-antigravity/... uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
google-gemini-cli/... uses the local Gemini CLI on your machine (separate auth + tooling quirks).
Gemini API vs Gemini CLI:
- API: OpenClaw calls Google’s hosted Gemini API over HTTP (API key / profile auth); this is what most users mean by “Gemini”.
- CLI: OpenClaw shells out to a local gemini binary; it has its own auth and can behave differently (streaming/tool support/version skew).

Live: model matrix (what we cover)

There is no fixed “CI model list” (live is opt-in), but these are the recommended models to cover regularly on a dev machine with keys.

Modern smoke set (tool calling + image)

This is the “common models” run that we expect to keep working:

OpenAI (non-Codex): openai/gpt-5.2 (optional: openai/gpt-5.1)
OpenAI Codex: openai-codex/gpt-5.2 (optional: openai-codex/gpt-5.2-codex)
Anthropic: anthropic/claude-opus-4-5 (or anthropic/claude-sonnet-4-5)
Google (Gemini API): google/gemini-3-pro-preview and google/gemini-3-flash-preview (avoid older Gemini 2.x models)
Google (Antigravity): google-antigravity/claude-opus-4-5-thinking and google-antigravity/gemini-3-flash
Z.AI (GLM): zai/glm-4.7
MiniMax: minimax/minimax-m2.1

Run gateway smoke with tools + image: OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.2,openai-codex/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-flash,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts

Baseline: tool calling (Read + optional Exec)

Pick at least one per provider family:

OpenAI: openai/gpt-5.2 (or openai/gpt-5-mini)
Anthropic: anthropic/claude-opus-4-5 (or anthropic/claude-sonnet-4-5)
Google: google/gemini-3-flash-preview (or google/gemini-3-pro-preview)
Z.AI (GLM): zai/glm-4.7
MiniMax: minimax/minimax-m2.1

Optional additional coverage (nice to have):

xAI: xai/grok-4 (or latest available)
Mistral: mistral/… (pick one “tools” capable model you have enabled)
Cerebras: cerebras/… (if you have access)
LM Studio: lmstudio/… (local; tool calling depends on API mode)

Vision: image send (attachment → multimodal message)

Include at least one image-capable model in OPENCLAW_LIVE_GATEWAY_MODELS (Claude/Gemini/OpenAI vision-capable variants, etc.) to exercise the image probe.

Aggregators / alternate gateways

If you have keys enabled, we also support testing via:

OpenRouter: openrouter/... (hundreds of models; use openclaw models scan to find tool+image capable candidates)
OpenCode Zen: opencode/... (auth via OPENCODE_API_KEY / OPENCODE_ZEN_API_KEY)

More providers you can include in the live matrix (if you have creds/config):

Built-in: openai, openai-codex, anthropic, google, google-vertex, google-antigravity, google-gemini-cli, zai, openrouter, opencode, xai, groq, cerebras, mistral, github-copilot
Via models.providers (custom endpoints): minimax (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)

Tip: don’t try to hardcode “all models” in docs. The authoritative list is whatever discoverModels(...) returns on your machine plus whatever keys are available.

Credentials (never commit)

Live tests discover credentials the same way the CLI does. Practical implications:

If the CLI works, live tests should find the same keys.
If a live test says “no creds”, debug it the same way you would debug openclaw models list / model selection.
Profile store: ~/.openclaw/credentials/ (preferred; this is what “profile keys” means in the tests)
Config: ~/.openclaw/openclaw.json (or OPENCLAW_CONFIG_PATH)

If you want to rely on env keys (e.g. exported in your ~/.profile), run local tests after source ~/.profile, or use the Docker runners below (they can mount ~/.profile into the container).
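
The lookup order implied above (profile store preferred, env as fallback, and OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 disabling the fallback) can be sketched as follows. This is a simplified model, not the real getApiKeyForModel: resolveKey is a hypothetical name, and the two maps keyed by provider id stand in for the profile store and the environment.

```typescript
type KeyMap = Record<string, string | undefined>;

interface ResolvedKey {
  key: string;
  source: "profile" | "env";
}

// Simplified stand-in for credential discovery: profile keys win, env keys
// are a fallback unless requireProfileKeys is set.
function resolveKey(
  provider: string,
  profileKeys: KeyMap,
  envKeys: KeyMap,
  requireProfileKeys: boolean,
): ResolvedKey | null {
  const fromProfile = profileKeys[provider];
  if (fromProfile) return { key: fromProfile, source: "profile" };
  if (requireProfileKeys) return null; // env fallback disabled
  const fromEnv = envKeys[provider];
  return fromEnv ? { key: fromEnv, source: "env" } : null;
}
```

Returning the source alongside the key makes a “no creds” report easier to interpret, since it shows whether the profile store or the environment supplied the value.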

Deepgram live (audio transcription)

Test: src/media-understanding/providers/deepgram/audio.live.test.ts
Enable: DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live src/media-understanding/providers/deepgram/audio.live.test.ts

Docker runners (optional “works in Linux” checks)

These run pnpm test:live inside the repo Docker image, mounting your local config dir and workspace (and sourcing ~/.profile if mounted):

Direct models: pnpm test:docker:live-models (script: scripts/test-live-models-docker.sh)
Gateway + dev agent: pnpm test:docker:live-gateway (script: scripts/test-live-gateway-models-docker.sh)
Onboarding wizard (TTY, full scaffolding): pnpm test:docker:onboard (script: scripts/e2e/onboard-docker.sh)
Gateway networking (two containers, WS auth + health): pnpm test:docker:gateway-network (script: scripts/e2e/gateway-network-docker.sh)
Plugins (custom extension load + registry smoke): pnpm test:docker:plugins (script: scripts/e2e/plugins-docker.sh)

Useful environment variables:

OPENCLAW_CONFIG_DIR=... (default: ~/.openclaw) mounted to /home/node/.openclaw
OPENCLAW_WORKSPACE_DIR=... (default: ~/.openclaw/workspace) mounted to /home/node/.openclaw/workspace
OPENCLAW_PROFILE_FILE=... (default: ~/.profile) mounted to /home/node/.profile and sourced before running tests
OPENCLAW_LIVE_GATEWAY_MODELS=... / OPENCLAW_LIVE_MODELS=... to narrow the run
OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 to ensure creds come from the profile store (not env)
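
The three mount variables map host paths to fixed container paths. As a minimal sketch of that mapping, the function below builds docker run -v arguments from the documented defaults. dockerMounts is a hypothetical helper; the actual runner scripts under scripts/ are shell scripts and may assemble their mounts differently (for instance, skipping the profile mount when the file does not exist).

```typescript
type Env = Record<string, string | undefined>;

// Build "-v host:container" pairs using the defaults listed above.
// `home` is the host home directory that "~" would expand to.
function dockerMounts(env: Env, home: string): string[] {
  const configDir = env["OPENCLAW_CONFIG_DIR"] ?? `${home}/.openclaw`;
  const workspaceDir = env["OPENCLAW_WORKSPACE_DIR"] ?? `${home}/.openclaw/workspace`;
  const profileFile = env["OPENCLAW_PROFILE_FILE"] ?? `${home}/.profile`;
  return [
    "-v", `${configDir}:/home/node/.openclaw`,
    "-v", `${workspaceDir}:/home/node/.openclaw/workspace`,
    "-v", `${profileFile}:/home/node/.profile`,
  ];
}
```

Because the container paths are fixed, overriding a host path (say, pointing OPENCLAW_CONFIG_DIR at a scratch directory) changes what the tests see without changing where they look.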

Docs sanity

Run docs checks after doc edits: pnpm docs:list.

Offline regression (CI-safe)

These are “real pipeline” regressions without real providers:

Gateway tool calling (mock OpenAI, real gateway + agent loop): src/gateway/gateway.tool-calling.mock-openai.test.ts
Gateway wizard (WS wizard.start/wizard.next, writes config + auth enforced): src/gateway/gateway.wizard.e2e.test.ts

Agent reliability evals (skills)

We already have a few CI-safe tests that behave like “agent reliability evals”:

Mock tool-calling through the real gateway + agent loop (src/gateway/gateway.tool-calling.mock-openai.test.ts).
End-to-end wizard flows that validate session wiring and config effects (src/gateway/gateway.wizard.e2e.test.ts).

What is still missing for skills (see Skills):

Decisioning: when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?
Compliance: does the agent read SKILL.md before use and follow the required steps/args?
Workflow contracts: multi-turn scenarios that assert tool order, session history carryover, and sandbox boundaries.

Future evals should stay deterministic first:

A scenario runner using mock providers to assert tool calls + order, skill file reads, and session wiring.
A small suite of skill-focused scenarios (use vs avoid, gating, prompt injection).
Optional live evals (opt-in, env-gated) only after the CI-safe suite is in place.
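
The “assert tool calls + order” part of such a scenario runner can stay fully deterministic. The sketch below checks that an expected sequence of tool names appears, in order, within the calls recorded from a mock-provider run. firstMissingInOrder is a hypothetical helper, and allowing unrelated calls in between (a subsequence match rather than an exact match) is a design assumption that a real runner might tighten.

```typescript
// Returns null when `expected` occurs as an ordered subsequence of
// `observed`; otherwise returns the first expected tool name that could
// not be found at or after the current position.
function firstMissingInOrder(observed: string[], expected: string[]): string | null {
  let pos = 0;
  for (const want of expected) {
    const found = observed.indexOf(want, pos);
    if (found === -1) return want;
    pos = found + 1; // later expectations must come after this call
  }
  return null;
}
```

Returning the first missing name, rather than a bare boolean, gives a failing scenario a readable message such as “expected exec after read”.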

Adding regressions (guidance)

When you fix a provider/model issue discovered in live:

Add a CI-safe regression if possible (mock/stub the provider, or capture the exact request-shape transformation)
If it is inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
Prefer targeting the smallest layer that catches the bug:
- provider request conversion/replay bug → direct models test
- gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test