gaearon/woodshed

Create, run, rate, and iterate on your Claude Skills

License: MIT · Language: TypeScript · 550
claude-skills

Deep Analysis

A development and testing framework for Claude Skills that supports skill creation, running, evaluation, and iterative improvement

Recommended (use with caution)

Core Features

Quickly create a new skill development workspace via npx woodshed create

Supports multiple variants of the same skill for comparative testing

Runs the complete test matrix 10 times by default; reruns are idempotent

Define evaluation criteria in eval.md and let an eval agent score each run automatically

Manages a complete results folder with logs, workspaces, and eval data

Technical Implementation

Architecture: A CLI tool that uses a workspace layout to organize skills, test fixtures, and assets, and runs a main agent alongside an eval agent.
Execution Flow:

Key Components:
  • TypeScript
  • Claude YOLO Mode
  • Vitest
  • Husky
Highlights
  • Developed by React core developer Dan Abramov
  • Idempotent design auto-skips completed tests on rerun
  • Dual-agent evaluation mode ensures skill quality
  • A/B testing style multi-variant comparison support
  • Innovative workflow collaborating with another Claude instance to analyze results
Use Cases
  • Rapidly iterate on new Claude Skills
  • Compare and test different skill implementation approaches
  • Establish skill quality evaluation benchmarks
  • Personal skill experimentation and optimization
Limitations
  • Alpha version, personal use only
  • YOLO mode may cause data loss
  • May consume substantial tokens
  • Code is entirely AI-generated; the author has not reviewed it
  • Author disclaims all liability
Tech Stack
TypeScript, Node.js, npx, Vitest, Husky, Prettier

woodshed

Create, run, rate, and iterate on your Claude Skills.


⚠️ WARNING ⚠️

This is alpha software written for personal use.

  • It runs Claude in yolo mode which can and will wipe your data.
  • It can also burn through a ton of tokens if your Skills aren't lean.
  • It is 100% vibecoded, and this time I have not read the code.

If you end up burning all your tokens only to brick your computer, I'm not responsible.

Usage

Create a new workspace:

npx woodshed create my-idea

That gives you a place to work on your Skills:

cd my-idea

Time to shed:

npx woodshed

By default, the entire matrix runs 10 times.

Shed is idempotent, so by default re-running it will "skip over" past results as if they happened instantly.

Pass npx woodshed --reset to force re-runs. This will delete each re-run's data in results/ before attempting it. You can also delete the results/ folder yourself if you want.
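
To make the skip and reset behavior concrete, here is a minimal sketch of how an idempotent run plan could be computed. It is not woodshed's actual code. The names `RunKey`, `expandMatrix`, `keyId`, and `planRuns` are hypothetical, and it assumes the matrix is every Skill variant crossed with every fixture.

```typescript
// Hypothetical shapes; woodshed's real internals may look nothing like this.
interface RunKey {
  skill: string;
  variant: string;
  fixture: string;
  run: number; // 1-based run index
}

// Expand every (variant, fixture) pair into `runs` numbered runs.
// Assumption: the "matrix" is variants x fixtures.
function expandMatrix(
  variants: { skill: string; variant: string }[],
  fixtures: string[],
  runs: number = 10, // matches the documented default
): RunKey[] {
  const keys: RunKey[] = [];
  for (const v of variants) {
    for (const fixture of fixtures) {
      for (let run = 1; run <= runs; run++) {
        keys.push({ skill: v.skill, variant: v.variant, fixture, run });
      }
    }
  }
  return keys;
}

// A stable string id so completed runs can be looked up in a Set.
function keyId(k: RunKey): string {
  return [k.skill, k.variant, k.fixture, String(k.run)].join("/");
}

// Skip runs that already have results, unless `reset` forces a re-run.
function planRuns(
  matrix: RunKey[],
  completed: Set<string>,
  reset: boolean = false,
): { toRun: RunKey[]; skipped: RunKey[] } {
  const toRun: RunKey[] = [];
  const skipped: RunKey[] = [];
  for (const key of matrix) {
    if (!reset && completed.has(keyId(key))) {
      skipped.push(key);
    } else {
      toRun.push(key);
    }
  }
  return { toRun, skipped };
}
```

With `reset` left off, anything already recorded in `completed` lands in `skipped`. With `reset` on, everything lands in `toRun`, which mirrors forcing re-runs with --reset.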

Take a close look at the results/ folder after your first successful run. It contains the log from the main agent with your fixture's prompt.md, the log from the evaluating agent with your fixture's eval.md, the workspace folder in which they both run, and probably some other junk.

Folder Conventions

my-idea/
  skills/
    # Skills you want to create or refine
    my-skill/
      # Each Skill can have one or more variants
      baseline/SKILL.md
      experiment/SKILL.md
      silly/SKILL.md
  fixtures/
    # Fixtures that test your Skill(s)
    my-fixture/
      prompt.md
      eval.md
  assets/
    # Data shared by fixtures
    words.txt
  results/
    # Outputs and past runs appear here
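
As a rough illustration of the layout above, this sketch groups variant names by Skill from a list of workspace-relative paths. `discoverVariants` is a hypothetical helper, not part of woodshed. It assumes forward-slash paths of the form skills/<skill>/<variant>/SKILL.md.

```typescript
// Group "skills/<skill>/<variant>/SKILL.md" paths by skill name.
// Any path that does not match this exact shape is ignored.
function discoverVariants(paths: string[]): Map<string, string[]> {
  const bySkill = new Map<string, string[]>();
  for (const p of paths) {
    const parts = p.split("/");
    if (parts.length !== 4) continue;
    if (parts[0] !== "skills" || parts[3] !== "SKILL.md") continue;
    const skill = parts[1] ?? "";
    const variant = parts[2] ?? "";
    const variants = bySkill.get(skill) ?? [];
    variants.push(variant);
    bySkill.set(skill, variants);
  }
  return bySkill;
}

// Example using the names from the tree above.
const found = discoverVariants([
  "skills/my-skill/baseline/SKILL.md",
  "skills/my-skill/experiment/SKILL.md",
  "skills/my-skill/silly/SKILL.md",
  "fixtures/my-fixture/prompt.md",
  "assets/words.txt",
]);
```

Here `found` holds one entry, my-skill, with the three variants in the order they were listed. Fixture and asset paths are skipped.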

Workflow

I recommend running another Claude instance and talking with it about the results/ folder.

Then you can use insights from that convo to refine your eval.md and SKILL.md.

Tip: If you're iterating on a skill, ask Claude to write down each experiment in a doc so you can see what works and what doesn't.
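
When comparing variants, a per-variant summary can help. The sketch below assumes each evaluated run can be reduced to a single numeric score. That is an assumption: the eval agent's real output format is not described here. `ScoredRun` and `summarize` are hypothetical names.

```typescript
// Hypothetical record: one evaluated run reduced to a number.
interface ScoredRun {
  variant: string;
  score: number;
}

interface VariantSummary {
  count: number;
  mean: number;
}

// Compute the run count and mean score for each variant.
function summarize(runs: ScoredRun[]): Map<string, VariantSummary> {
  const totals = new Map<string, { count: number; sum: number }>();
  for (const r of runs) {
    const t = totals.get(r.variant) ?? { count: 0, sum: 0 };
    t.count += 1;
    t.sum += r.score;
    totals.set(r.variant, t);
  }
  const out = new Map<string, VariantSummary>();
  totals.forEach((t, variant) => {
    out.set(variant, { count: t.count, mean: t.sum / t.count });
  });
  return out;
}
```

A mean over 10 runs is a blunt instrument, so treat it as a prompt for reading the logs rather than a verdict.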

Options

--runs <n>      Number of runs per variant (default: 10)
--reset         Delete old results and start fresh
--reeval        Re-run evaluation on existing workspaces
--cache-only    Show cached results only
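
For reference, the four flags above map naturally onto a small options object. This is a minimal sketch of such a mapping, not woodshed's actual parser. `ShedOptions` and `parseArgs` are hypothetical names, and the error handling is an assumption.

```typescript
// Hypothetical options shape mirroring the documented flags.
interface ShedOptions {
  runs: number; // --runs <n>, default 10
  reset: boolean; // --reset
  reeval: boolean; // --reeval
  cacheOnly: boolean; // --cache-only
}

// Parse flags from an argv-style array, e.g. process.argv.slice(2).
function parseArgs(argv: string[]): ShedOptions {
  const opts: ShedOptions = { runs: 10, reset: false, reeval: false, cacheOnly: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--runs": {
        // Consume the next token as the run count.
        const n = Number(argv[++i]);
        if (!Number.isInteger(n) || n < 1) {
          throw new Error("--runs expects a positive integer");
        }
        opts.runs = n;
        break;
      }
      case "--reset":
        opts.reset = true;
        break;
      case "--reeval":
        opts.reeval = true;
        break;
      case "--cache-only":
        opts.cacheOnly = true;
        break;
      default:
        throw new Error("Unknown option: " + String(arg));
    }
  }
  return opts;
}
```

Calling `parseArgs(["--runs", "3", "--reset"])` would yield 3 runs with reset enabled and the other flags left at their defaults.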

License

MIT

Highly Recommended
agents

wshobson/agents

wshobson

Intelligent automation and multi-agent orchestration for Claude Code

The most comprehensive Claude Code plugin ecosystem, covering full-stack development scenarios with a three-tier model strategy balancing performance and cost.

25.6k · 2.8k · 3 days ago
Highly Recommended
awesome-claude-skills

ComposioHQ/awesome-claude-skills

ComposioHQ

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

The most comprehensive Claude Skills resource list; connect-apps is a killer feature.

19.9k · 2.0k · 3 days ago
Recommended
oh-my-opencode

code-yeongyu/oh-my-opencode

code-yeongyu

The Best Agent Harness. Meet Sisyphus: The Batteries-Included Agent that codes like you.

Powerful multi-agent coding tool, but note OAuth limitations.

17.5k · 1.2k · 3 days ago
Recommended
claude-mem

thedotmack/claude-mem

thedotmack

A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.

A practical solution for Claude's memory issues.

14.0k · 914 · 3 days ago
Highly Recommended
planning-with-files

OthmanAdi/planning-with-files

OthmanAdi

Claude Code skill implementing Manus-style persistent markdown planning — the workflow pattern behind the $2B acquisition.

Context engineering best practices; an open-source implementation of Manus mode.

9.3k · 811 · 3 days ago
Highly Recommended
Skill_Seekers

yusufkaraaslan/Skill_Seekers

yusufkaraaslan

Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

An automation powerhouse for skill creation, dramatically improving efficiency.

6.8k · 683 · 3 days ago