wsimmonds/claude-nextjs-skills
A proof-of-concept set of Claude Skills designed to improve pass rates on the Next.js evals at https://nextjs.org/evals.
Deep Analysis
A skill set designed to improve Claude's performance on Vercel's Next.js evaluation tests, raising the pass rate from 32% to 78% (Haiku 4.5) and 76% (Sonnet 4.5).
- Significantly improves eval pass rates: Haiku from 32% to 78%, Sonnet from 32% to 76%
- Raises a key question: does a better eval score mean better real-world code quality?
- Using Claude to develop Next.js App Router applications
- Getting AI assistance with Next.js server/client component patterns
- Projects integrating Vercel AI SDK
- Proof of concept only; actual results may vary by project
- Optimized for one specific evaluation, so it may be overfitted to it
Claude Next.js Skills
Claude Next.js Skills is a proof-of-concept (POC) bundle of automations built to test whether quickly written skills can improve the Next.js eval scores published by Vercel at https://nextjs.org/evals.
Baseline (before skills)
| Model | Success Rate |
|---|---|
| Claude Haiku 4.5 | 32% |
| Claude Sonnet 4.5 | 32% |
Skilled Runs (after skills)
| Model | Success Rate | Expected Leaderboard Slot |
|---|---|---|
| Claude Haiku 4.5 "Skilled" | 78% (39/50) | 1 |
| Claude Sonnet 4.5 "Skilled" | 76% (38/50) | 2 |
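As a quick check of the arithmetic behind these tables, the sketch below recomputes the skilled pass rates from the reported counts and the gain over the 32% baseline in percentage points. The function names are illustrative only and are not part of the repository.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


def gain_in_points(baseline_pct: float, skilled_pct: float) -> float:
    """Return the absolute improvement in percentage points."""
    return skilled_pct - baseline_pct


# Counts and baselines copied from the tables above.
runs = {
    "Claude Haiku 4.5": {"baseline_pct": 32.0, "passed": 39, "total": 50},
    "Claude Sonnet 4.5": {"baseline_pct": 32.0, "passed": 38, "total": 50},
}

for model, run in runs.items():
    skilled = pass_rate(run["passed"], run["total"])
    gain = gain_in_points(run["baseline_pct"], skilled)
    print(f"{model}: {skilled:.0f}% skilled, +{gain:.0f} points over baseline")
```

Reporting the gain in percentage points (46 and 44 here) rather than as a relative increase keeps the comparison unambiguous when both models start from the same baseline.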
Result Snapshot
Claude Haiku 4.5:
Results (Claude Code):
✅ = pass, ❌ = fail. The per-eval test marks are not legible in this snapshot; the test total appears in the Overall line.
| Eval | Build | Lint | Time |
|---|---|---|---|
| 000-app-router-migration-simple | ✅ | ✅ | 204.8s |
| 001-server-component | ✅ | ✅ | 94.4s |
| 002-client-component | ✅ | ✅ | 94.7s |
| 003-cookies | ✅ | ✅ | 136.7s |
| 004-search-params | ✅ | ✅ | 219.1s |
| 005-react-use-api | ✅ | ✅ | 138.0s |
| 006-server-metadata | ✅ | ✅ | 91.4s |
| 007-client-metadata | ✅ | ✅ | 86.0s |
| 008-generate-static-params | ❌ | ✅ | 83.0s |
| 009-og-images | ✅ | ✅ | 108.0s |
| 010-route-handlers | ✅ | ✅ | 69.1s |
| 011-client-server-form | ✅ | ✅ | 167.9s |
| 012-parallel-routes | ✅ | ✅ | 157.7s |
| 013-pathname-server | ❌ | ❌ | 70.8s |
| 014-server-routing | ✅ | ✅ | 94.0s |
| 015-server-actions-exports | ✅ | ✅ | 69.6s |
| 016-client-cookies | ❌ | ✅ | 65.9s |
| 017-use-search-params | ✅ | ✅ | 59.7s |
| 018-use-router | ✅ | ✅ | 72.6s |
| 019-use-action-state | ✅ | ✅ | 120.3s |
| 020-no-use-effect | ✅ | ✅ | 92.0s |
| 021-avoid-fetch-in-effect | ✅ | ✅ | 238.6s |
| 022-prefer-server-actions | ✅ | ✅ | 312.2s |
| 023-avoid-getserversideprops | ✅ | ✅ | 166.2s |
| 024-avoid-redundant-usestate | ✅ | ✅ | 111.5s |
| 025-prefer-next-link | ✅ | ✅ | 120.9s |
| 026-no-serial-await | ❌ | ✅ | 153.3s |
| 027-prefer-next-image | ✅ | ✅ | 98.7s |
| 028-prefer-next-font | ✅ | ✅ | 90.5s |
| 029-use-cache-directive | ✅ | ✅ | 212.0s |
| 030-app-router-migration-hard | ✅ | ✅ | 453.7s |
| 031-ai-sdk-migration-simple | ✅ | ✅ | 178.7s |
| 032-ai-sdk-model-specification-string | ✅ | ✅ | 194.9s |
| 033-ai-sdk-v4-model-specification-function | ✅ | ✅ | 166.7s |
| 034-ai-sdk-render-visual-info | ❌ | ❌ | 659.4s |
| 035-ai-sdk-call-tools | ✅ | ✅ | 338.5s |
| 036-ai-sdk-call-tools-multiple-steps | ✅ | ✅ | 307.9s |
| 037-ai-sdk-embed-text | ✅ | ✅ | 155.0s |
| 038-ai-sdk-mcp | ✅ | ✅ | 260.5s |
| 039-parallel-routes | ✅ | ✅ | 217.6s |
| 040-intercepting-routes | ✅ | ✅ | 208.4s |
| 041-route-groups | ✅ | ✅ | 139.1s |
| 042-loading-ui | ✅ | ✅ | 123.9s |
| 043-error-boundaries | ✅ | ✅ | 138.4s |
| 044-metadata-api | ✅ | ✅ | 115.6s |
| 045-server-actions-form | ✅ | ✅ | 88.8s |
| 046-streaming | ✅ | ✅ | 129.9s |
| 047-middleware | ✅ | ✅ | 78.2s |
| 048-draft-mode | ✅ | ✅ | 207.6s |
| 049-revalidation | ✅ | ✅ | 124.2s |

Overall (B/L/T): 45/48/40 (90%, 96%, 80%)
Claude Sonnet 4.5:
Results (Claude Code):
✅ = pass, ❌ = fail. The per-eval test marks are not legible in this snapshot; the test total appears in the Overall line.
| Eval | Build | Lint | Time |
|---|---|---|---|
| 000-app-router-migration-simple | ✅ | ✅ | 164.1s |
| 001-server-component | ✅ | ✅ | 88.8s |
| 002-client-component | ✅ | ✅ | 109.4s |
| 003-cookies | ✅ | ✅ | 138.6s |
| 004-search-params | ✅ | ✅ | 223.4s |
| 005-react-use-api | ✅ | ✅ | 132.7s |
| 006-server-metadata | ✅ | ✅ | 90.8s |
| 007-client-metadata | ✅ | ✅ | 81.1s |
| 008-generate-static-params | ❌ | ✅ | 78.0s |
| 009-og-images | ✅ | ✅ | 105.3s |
| 010-route-handlers | ✅ | ✅ | 75.1s |
| 011-client-server-form | ✅ | ✅ | 179.3s |
| 012-parallel-routes | ✅ | ✅ | 153.3s |
| 013-pathname-server | ❌ | ❌ | 78.2s |
| 014-server-routing | ❌ | ✅ | 103.1s |
| 015-server-actions-exports | ✅ | ✅ | 76.0s |
| 016-client-cookies | ❌ | ✅ | 71.5s |
| 017-use-search-params | ✅ | ✅ | 64.1s |
| 018-use-router | ✅ | ✅ | 77.2s |
| 019-use-action-state | ✅ | ✅ | 123.8s |
| 020-no-use-effect | ✅ | ✅ | 94.5s |
| 021-avoid-fetch-in-effect | ✅ | ✅ | 269.8s |
| 022-prefer-server-actions | ✅ | ✅ | 242.4s |
| 023-avoid-getserversideprops | ✅ | ✅ | 207.7s |
| 024-avoid-redundant-usestate | ✅ | ✅ | 126.4s |
| 025-prefer-next-link | ✅ | ✅ | 96.4s |
| 026-no-serial-await | ✅ | ✅ | 220.4s |
| 027-prefer-next-image | ✅ | ✅ | 103.4s |
| 028-prefer-next-font | ✅ | ✅ | 81.4s |
| 029-use-cache-directive | ✅ | ✅ | 208.8s |
| 030-app-router-migration-hard | ✅ | ✅ | 397.9s |
| 031-ai-sdk-migration-simple | ✅ | ✅ | 233.2s |
| 032-ai-sdk-model-specification-string | ✅ | ✅ | 150.4s |
| 033-ai-sdk-v4-model-specification-function | ✅ | ✅ | 118.6s |
| 034-ai-sdk-render-visual-info | ✅ | ✅ | 567.8s |
| 035-ai-sdk-call-tools | ✅ | ✅ | 209.6s |
| 036-ai-sdk-call-tools-multiple-steps | ✅ | ✅ | 247.7s |
| 037-ai-sdk-embed-text | ✅ | ✅ | 181.0s |
| 038-ai-sdk-mcp | ❌ | ❌ | |
| 039-parallel-routes | ✅ | ✅ | 162.7s |
| 040-intercepting-routes | ✅ | ✅ | 201.2s |
| 041-route-groups | ✅ | ✅ | 147.7s |
| 042-loading-ui | ✅ | ✅ | 141.3s |
| 043-error-boundaries | ✅ | ✅ | 228.7s |
| 044-metadata-api | ✅ | ✅ | 110.0s |
| 045-server-actions-form | ✅ | ✅ | 101.0s |
| 046-streaming | ✅ | ✅ | 126.4s |
| 047-middleware | ✅ | ✅ | 91.2s |
| 048-draft-mode | ✅ | ✅ | 210.0s |
| 049-revalidation | ❌ | ✅ | 195.9s |

Overall (B/L/T): 44/48/39 (88%, 96%, 78%)
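The Overall figures for the two runs (45/48/40 and 44/48/39 out of 50 evals) can be turned into percentages with a few lines of Python. This is a minimal sketch that assumes B/L/T stands for build, lint and test; `blt_percentages` is a hypothetical helper, not something shipped with the evals or this repository.

```python
def blt_percentages(counts: str, total: int) -> dict[str, int]:
    """Convert a 'B/L/T' count string into whole-number pass percentages."""
    if total <= 0:
        raise ValueError("total must be positive")
    parts = counts.split("/")
    if len(parts) != 3:
        raise ValueError("expected three '/'-separated counts")
    # Assumption: the three positions are build, lint and test, in that order.
    names = ("build", "lint", "test")
    return {name: round(100 * int(n) / total) for name, n in zip(names, parts)}


# Overall counts copied from the two result snapshots.
overall = {
    "Claude Haiku 4.5": "45/48/40",
    "Claude Sonnet 4.5": "44/48/39",
}

for model, counts in overall.items():
    print(model, blt_percentages(counts, total=50))
```

Each headline success count (39 and 38) is one lower than the corresponding test count (40 and 39), which would be consistent with an eval counting as passed only when all three stages succeed, although the source does not state how the headline figure is derived.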
Outcomes
Does this translate into improved real-world code, or is Claude now just optimised to pass the evals? This would be a great area to explore.
Please try it and let me know.

