gaearon/woodshed
Create, run, rate, and iterate on your Claude Skills
Deep Analysis
Claude Skills development testing framework supporting skill creation, running, evaluation, and iterative improvement
Core Features
Create a new skill development workspace with npx woodshed create
Supports multiple variants of the same skill for comparative testing
Runs the complete test matrix 10 times by default; re-runs are idempotent
Define evaluation criteria via eval.md, auto-score using eval agent
Complete results folder management with logs, workspaces, and eval data
Technical Implementation
- Developed by React core developer Dan Abramov
- Idempotent design auto-skips completed tests on rerun
- Dual-agent evaluation mode ensures skill quality
- A/B testing style multi-variant comparison support
- Recommended workflow: analyze the results with a second Claude instance
- Rapidly iterate new Claude Skills development
- Compare different skill implementation approaches
- Establish skill quality evaluation benchmarks
- Personal skill experimentation and optimization
- Alpha version, personal use only
- YOLO mode may cause data loss
- May consume substantial tokens
- Code entirely AI-generated, author hasn't reviewed code
- The author disclaims all liability
woodshed
Create, run, rate, and iterate on your Claude Skills.
⚠️ WARNING ⚠️
This is alpha software written for personal use.
- It runs Claude in yolo mode which can and will wipe your data.
- It can also burn through a ton of tokens if your Skills aren't lean.
- It is 100% vibecoded, and this time I have not read the code.
If you end up burning all your tokens only to brick your computer, I'm not responsible.
Usage
Create a new workspace:
npx woodshed create my-idea
That gives you a place to work on your Skills:
cd my-idea
Time to shed:
npx woodshed
By default, the entire matrix runs 10 times.
Shed is idempotent, so by default re-running it will "skip over" past results as if they happened instantly.
Pass npx woodshed --reset to force re-runs. This will delete each re-run's data in results/ before attempting it. You can also delete the results/ folder yourself if you want.
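The skip-or-rerun behavior described above can be sketched as follows. This is a minimal illustration of the general idea, not woodshed's actual code: the `run_once` helper, the `done.marker` file, and the per-run `result_dir` layout are all assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_once(result_dir: Path, task: Callable[[Path], None], reset: bool = False) -> bool:
    """Run task unless result_dir already holds a completed result.

    Returns True if the task ran, False if it was skipped.
    """
    marker = result_dir / "done.marker"  # hypothetical completion marker
    if reset and result_dir.exists():
        shutil.rmtree(result_dir)  # drop this run's old data before retrying
    if marker.exists():
        return False  # a past result exists: skip instantly
    result_dir.mkdir(parents=True, exist_ok=True)
    task(result_dir)
    marker.write_text("ok")  # written last, so a crashed run is not treated as done
    return True
```

Writing the marker only after the task finishes means an interrupted run is retried on the next invocation instead of being mistaken for a cached result.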
Take a close look at the results/ folder after your first successful run. It contains the log from the main agent with your fixture's prompt.md, the log from the evaluating agent with your fixture's eval.md, the workspace folder in which they both run, and probably some other junk.
Folder Conventions
my-idea/
skills/
# Skills you want to create or refine
my-skill/
# Each Skill can have one or more variants
baseline/SKILL.md
experiment/SKILL.md
silly/SKILL.md
fixtures/
# Fixtures that test your Skill(s)
my-fixture/
prompt.md
eval.md
assets/
# Data shared by fixtures
words.txt
results/
# Outputs and past runs appear here
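To make "the entire matrix" concrete, here is a rough sketch that enumerates the combinations implied by the folder conventions above. It is not taken from woodshed: the function names are hypothetical, and it assumes a fixture is any folder under fixtures/ that contains both prompt.md and eval.md, and that every variant is paired with every fixture.

```python
from itertools import product
from pathlib import Path


def list_variants(root: Path) -> list[tuple[str, str]]:
    """Return (skill, variant) pairs for every skills/<skill>/<variant>/SKILL.md."""
    return sorted(
        (p.parent.parent.name, p.parent.name)
        for p in (root / "skills").glob("*/*/SKILL.md")
    )


def list_fixtures(root: Path) -> list[str]:
    """Return names of fixture folders that have both prompt.md and eval.md."""
    fixtures_dir = root / "fixtures"
    if not fixtures_dir.is_dir():
        return []
    return sorted(
        d.name
        for d in fixtures_dir.iterdir()
        if (d / "prompt.md").is_file() and (d / "eval.md").is_file()
    )


def build_matrix(root: Path, runs: int = 10) -> list[tuple[str, str, str, int]]:
    """Cross every variant with every fixture and every run index (assumed pairing)."""
    return [
        (skill, variant, fixture, run)
        for (skill, variant), fixture, run in product(
            list_variants(root), list_fixtures(root), range(1, runs + 1)
        )
    ]
```

Under these assumptions, the example tree (three variants, one fixture) with the default of 10 runs yields 30 entries, which is a useful way to estimate token cost before running.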
Workflow
I recommend running another Claude instance and talking with it about the results/ folder.
Then you can use insights from that convo to refine your eval.md and SKILL.md.
Tip: If you're iterating on a skill, ask Claude to write down each experiment in a doc so you can see what works and what doesn't.
Options
--runs <n> Number of runs per variant (default: 10)
--reset Delete old results and start fresh
--reeval Re-run evaluation on existing workspaces
--cache-only Show cached results only
License
MIT

