Deep Analysis

针对 Claude Code 的生产级工具包，包含 28 条命令、5 个智能体和 6 个领域技能，通过 6 个月的日常使用经验验证

Core Features

Technical Implementation

Highlights

生产验证：6 个月+日常使用经验
与 Anthropic 官方最佳实践完全对齐
无需 MCP 即可运行，有 MCP 时功能增强
渐进式信息披露模式，节省 70%+ token
Token 限制强制工具（mdtoken）防止文件膨胀
5 个专用智能体（架构师、测试工程师、代码审查员等）
自动 git 安全防护

Use Cases

系统化任务执行：从需求分析到代码交付的完整工作流
会话管理：跨上下文窗口边界的无缝工作继续
代码质量保证：深度代码分析、审查、测试创建
项目持久化记忆：跨会话的项目知识积累
MCP 集成工作流
领域特定开发：量化金融、专业写作等

Limitations

依赖 Claude Code 2.0+
MCP 服务器为可选项，某些高级功能受限
需要 jq、Node.js 等外部依赖
不能克服 Claude 的基础行为限制
上下文窗口大小仍是核心制约因素
复杂工作流的学习曲线相对陡峭

Tech Stack

PythonJavaScript/Node.jsMarkdownJSONGitBashjqMCPClaude Code 2.0+

README

View on GitHub

Claude Code Toolkit

Name: claude-code-toolkit
Rating: 3.7 (7 reviews)
Author: applied-artificial-intelligence

Production-tested commands, skills, and patterns for Claude Code

Task Execution Workflow: Explore → Plan → Next → Ship

Overview

A collection of plugins, skills, and patterns developed through 6+ months of daily Claude Code use. Copy what works, adapt to your needs.

What's included: 28 commands, 5 agents, and 6 domain skills across 6 core plugins.

Design Goals

Stateless architecture - Commands execute independently, state persisted in files
File-based persistence - JSON and Markdown for all state management
MCP integration - Optional Model Context Protocol tools with graceful degradation
Progressive disclosure - Load context incrementally to optimize token usage
Self-containment - All logic inline, no external script dependencies

Key Capabilities

Workflow management: explore → plan → next → ship pattern
Memory persistence: Cross-session context with automatic reflection
Quality automation: Git safety, pre/post hooks, compliance auditing
MCP integration: Optional tools (Serena, Sequential Thinking) with graceful degradation
Domain adaptation: Examples showing how to extend Claude Code beyond software (quant, writing)

Reality Check: The Limits of Customization

Before diving into the toolkit, understand what you're working with.

The Instruction Training Boundary

Claude's behavior comes from two layers:

Layer	What It Contains	Can You Change It?
Instruction training	Core personality, instincts, behavioral drivers	No—immutable
Customization	System prompts, CLAUDE.md, agent definitions	Yes—but limited

No matter how clever your prompting, you're working with Claude's pre-trained personality. You can nudge it, structure it, guide it—but the underlying instincts remain.

Behaviors You'll Encounter

Sycophancy

The tendency to agree, validate, and praise. We developed elaborate anti-sycophancy protocols—they failed. Give contradictory prompts and you'll still get "You're absolutely right!"

Completion Bias

The urge to deliver results regardless of completeness. Claude will proceed without full specifications, filling gaps with assumptions rather than asking. It's trained to deliver, not to pause and question.

Action Over Reflection

Bias toward doing, not critically examining. Tunnel vision on declared goals. Big picture thinking and honest pushback require deliberate prompting.

Context Limitations

The precious context window determines what Claude can "see." Important details get forgotten, overlooked, or deprioritized. Specify too much and things get lost; too little and Claude fills gaps incorrectly.

What This Means for You

Thorough inspection is non-negotiable. Claude can make mistakes in every respect. Never assume output matches expectations—everything requires review, especially things that look correct.

Proper testing is essential. Validate behavior, not just structure. Test edge cases Claude may not have considered. Don't trust "it should work"—verify it does.

The toolkit helps, but doesn't eliminate these behaviors. We provide structure, workflows, and guardrails. Claude's base personality operates within that structure—sometimes it will frustrate you because it just wants to deliver.

Why We're Honest About This

We removed dubious metrics from this toolkit ("80% token reduction," "zero hallucinations") because they weren't validated—and overclaiming doesn't help anyone.

Claude Code is genuinely powerful:

Amazing at navigating complex codebases
Excellent at using tools to solve problems
Incredibly convenient in the terminal environment

But it has real limitations you'll encounter repeatedly. This toolkit provides patterns for working with those constraints rather than pretending they don't exist.

Anthropic Best Practices Alignment

This toolkit implements patterns and recommendations from Anthropic's official Claude Code documentation. It represents "what Anthropic says you should be doing, here implemented."

Anthropic Best Practices Alignment

Multi-Context Window Workflows

Anthropic Recommendation (Claude 4 Best Practices):

"For complex, multi-step projects, we recommend starting with a first context window devoted to designing a structured representation of the project—tests, code stubs, documentation—that can be passed into subsequent context windows... Claude is adept at working across context window resets as long as clear instructions and todo lists are provided."

Toolkit Implementation:

/workflow:explore → /workflow:plan → /workflow:next → /workflow:ship workflow
state.json tracks task status across sessions (equivalent to Anthropic's tests.json)
/transition:handoff + /transition:continue for explicit context window transitions
TodoWrite integration for progress tracking across resets

State Tracking with JSON + Unstructured Notes + Git

Anthropic Recommendation:

"JSON provides structured data about what still needs to be done... Unstructured progress notes can be easier for Claude to incrementally update... Git checkpoints let Claude 'save progress' so it has a safe state to roll back to."

Toolkit Implementation:

JSON: state.json (task tracking), metadata.json (work unit metadata)
Unstructured: exploration.md (analysis notes), implementation-plan.md (task breakdown)
Git checkpoints: Automatic commits after each /workflow:next task completion

Progressive Disclosure Pattern

Anthropic Recommendation (Agent Skills):

"Metadata Layer: Skill name and description are pre-loaded... Core Documentation: The full SKILL.md content loads when Claude determines relevance... Reference Materials: Additional linked files load only as needed."

Toolkit Implementation:

Skills (skills/) use Anthropic's exact 3-layer disclosure pattern
Plugins load command metadata first, full instructions only when invoked
Memory (/memory:*) stores context for retrieval, not constant presence
Result: 70%+ token savings vs. loading everything upfront

Model-Invoked vs User-Invoked Actions

Anthropic Recommendation:

"Skills are model-invoked—Claude autonomously decides when to use them based on your request and the Skill's description. This is different from slash commands, which are user-invoked."

Toolkit Implementation:

Skills (skills/): Model-invoked, Claude loads when domain-relevant (RAG, Docker, SQL, etc.)
Commands (commands/): User-invoked via /command syntax
Agents (agents/): Claude-selected via Task tool based on task complexity
Clear separation between automatic (skills) and explicit (commands) invocation

Memory Tool Patterns

Anthropic Recommendation (Memory Tool Beta):

"ALWAYS VIEW YOUR MEMORY DIRECTORY BEFORE DOING ANYTHING ELSE... Record status/progress/thoughts in your memory. ASSUME INTERRUPTION: Your context window might be reset at any moment."

Toolkit Implementation:

.claude/memory/ directory for persistent project knowledge
/memory:index creates persistent project understanding
/memory:memory-review displays current memory state
/transition:handoff saves critical context before session boundaries
Memory philosophy: Write down key decisions, discard outdated information

Quality Gates and Hooks

Anthropic Recommendation:

"Use pre-commit hooks to enforce code quality standards."

Toolkit Implementation:

git-safe-commit wrapper blocks --no-verify (no bypassing quality checks)
hooks/ directory for event handlers (pre/post tool use)
Example ruff-check-hook.sh demonstrates actionable feedback patterns
/system:audit validates framework compliance

Subagent Architecture

Anthropic Recommendation (Subagents):

"Use subagents for complex tasks that benefit from specialized focus... Each agent has its own context window, custom prompt, and can have restricted tool access."

Toolkit Implementation:

5 specialized agents: architect, test-engineer, code-reviewer, auditor, reasoning-specialist
Agents invoked via Claude Code's native Task tool
Each agent has focused responsibility and appropriate tool permissions
Pattern: Match agent specialization to task complexity

Graceful Degradation

Anthropic Recommendation (implicit in MCP documentation):

"All features should work without MCP tools, enhanced when available."

Toolkit Implementation:

Every command works without MCP servers
MCP enhances but never required for core functionality
Commands auto-detect MCP availability and adapt
Clear documentation of MCP benefits vs. baseline behavior

Why This Alignment Matters

This toolkit evolved through 6+ months of daily Claude Code use, facing the same pain points that Anthropic has been addressing in their platform improvements. Context limits, session boundaries, state persistence, quality degradation—these challenges are inherent to working with LLMs in development workflows.

It's no surprise that our solutions align with Anthropic's recommendations: **we're solving the same

applied-artificial-intelligence/claude-code-toolkit

Deep Analysis

Core Features

Technical Implementation

Claude Code Toolkit