wsimmonds/claude-nextjs-skills

A proof-of-concept set of Claude Skills designed to improve pass rates on the Next.js evals at https://nextjs.org/evals

License: MIT · Language: N/A

Deep Analysis

Skill set designed to improve Claude's performance on Vercel's Next.js evals, raising the pass rate of Claude Haiku 4.5 from 32% to 78% and of Claude Sonnet 4.5 from 32% to 76%

Highlights
  • Significantly improves eval pass rates: Haiku from 32% to 78%, Sonnet from 32% to 76%
  • Raises a key question: does a higher eval score mean better real-world code quality?
Use Cases
  • Using Claude to develop Next.js App Router applications
  • Getting AI assistance with Next.js server/client component patterns
  • Projects integrating Vercel AI SDK
Limitations
  • Proof of concept only; actual results may vary by project
  • Optimized for one specific eval suite, so there is a risk of overfitting
Tech Stack
Next.js App Router, Vercel AI SDK, Claude Skills

Claude Next.js Skills

Claude Next.js Skills is a proof-of-concept (POC) bundle of automations built to test whether quickly written skills can improve Claude's scores on the Next.js evals published by Vercel at https://nextjs.org/evals.

Baseline (before skills)

Model              Success Rate
Claude Haiku 4.5   32%
Claude Sonnet 4.5  32%

Skilled Runs (after skills)

Model                        Success Rate  Expected Leaderboard Slot
Claude Haiku 4.5 "Skilled"   78% (39/50)   1
Claude Sonnet 4.5 "Skilled"  76% (38/50)   2
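
The percentages in the two tables follow directly from the pass counts. As a minimal sketch (the constant and function names here are illustrative and not part of the repository), the skilled pass rates and their gain over the 32% baseline can be recomputed like this. If the baseline was measured on the same 50 evals, which is an assumption, 32% would correspond to 16 passing evals.

```python
# Figures quoted in the tables above. The names below are hypothetical
# helpers for illustration only; they do not come from the repository.
TOTAL_EVALS = 50
BASELINE_PCT = 32.0  # both models, before skills
SKILLED_PASSES = {"Claude Haiku 4.5": 39, "Claude Sonnet 4.5": 38}


def pass_rate(passed: int, total: int = TOTAL_EVALS) -> float:
    """Return the pass rate as a percentage of all evals."""
    return 100.0 * passed / total


def gain_in_points(passed: int, baseline_pct: float = BASELINE_PCT) -> float:
    """Return the improvement over the baseline in percentage points."""
    return pass_rate(passed) - baseline_pct


for model, passed in SKILLED_PASSES.items():
    print(f"{model}: {pass_rate(passed):.0f}% "
          f"(+{gain_in_points(passed):.0f} points over baseline)")
```

The gain is reported in percentage points rather than as a relative change, since the baseline and skilled figures are both fractions of the same eval count.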

Result Snapshot

Claude Haiku 4.5:

📊 Results:
════════════════════════════════════════════════════════════════════════════════

Eval                                         Claude Code
000-app-router-migration-simple              ✅✅✅ (204.8s)
001-server-component                         ✅✅✅ (94.4s)
002-client-component                         ✅✅✅ (94.7s)
003-cookies                                  ✅✅✅ (136.7s)
004-search-params                            ✅✅✅ (219.1s)
005-react-use-api                            ✅✅✅ (138.0s)
006-server-metadata                          ✅✅✅ (91.4s)
007-client-metadata                          ✅✅❌ (86.0s)
008-generate-static-params                   ❌✅❌ (83.0s)
009-og-images                                ✅✅✅ (108.0s)
010-route-handlers                           ✅✅✅ (69.1s)
011-client-server-form                       ✅✅✅ (167.9s)
012-parallel-routes                          ✅✅✅ (157.7s)
013-pathname-server                          ❌❌❌ (70.8s)
014-server-routing                           ✅✅❌ (94.0s)
015-server-actions-exports                   ✅✅❌ (69.6s)
016-client-cookies                           ❌✅❌ (65.9s)
017-use-search-params                        ✅✅❌ (59.7s)
018-use-router                               ✅✅✅ (72.6s)
019-use-action-state                         ✅✅✅ (120.3s)
020-no-use-effect                            ✅✅❌ (92.0s)
021-avoid-fetch-in-effect                    ✅✅✅ (238.6s)
022-prefer-server-actions                    ✅✅✅ (312.2s)
023-avoid-getserversideprops                 ✅✅✅ (166.2s)
024-avoid-redundant-usestate                 ✅✅✅ (111.5s)
025-prefer-next-link                         ✅✅✅ (120.9s)
026-no-serial-await                          ❌✅✅ (153.3s)
027-prefer-next-image                        ✅✅✅ (98.7s)
028-prefer-next-font                         ✅✅✅ (90.5s)
029-use-cache-directive                      ✅✅✅ (212.0s)
030-app-router-migration-hard                ✅✅✅ (453.7s)
031-ai-sdk-migration-simple                  ✅✅✅ (178.7s)
032-ai-sdk-model-specification-string        ✅✅✅ (194.9s)
033-ai-sdk-v4-model-specification-function   ✅✅✅ (166.7s)
034-ai-sdk-render-visual-info                ❌❌❌ (659.4s)
035-ai-sdk-call-tools                        ✅✅✅ (338.5s)
036-ai-sdk-call-tools-multiple-steps         ✅✅✅ (307.9s)
037-ai-sdk-embed-text                        ✅✅✅ (155.0s)
038-ai-sdk-mcp                               ✅✅✅ (260.5s)
039-parallel-routes                          ✅✅✅ (217.6s)
040-intercepting-routes                      ✅✅✅ (208.4s)
041-route-groups                             ✅✅✅ (139.1s)
042-loading-ui                               ✅✅✅ (123.9s)
043-error-boundaries                         ✅✅✅ (138.4s)
044-metadata-api                             ✅✅✅ (115.6s)
045-server-actions-form                      ✅✅❌ (88.8s)
046-streaming                                ✅✅✅ (129.9s)
047-middleware                               ✅✅✅ (78.2s)
048-draft-mode                               ✅✅✅ (207.6s)
049-revalidation                             ✅✅✅ (124.2s)
-------------------------------------------- --------------------------
Overall (B/L/T)                              45/48/40 (90%, 96%, 80%)

Claude Sonnet 4.5:

📊 Results:
════════════════════════════════════════════════════════════════════════════════

Eval                                         Claude Code
000-app-router-migration-simple              ✅✅✅ (164.1s)
001-server-component                         ✅✅✅ (88.8s)
002-client-component                         ✅✅✅ (109.4s)
003-cookies                                  ✅✅✅ (138.6s)
004-search-params                            ✅✅✅ (223.4s)
005-react-use-api                            ✅✅✅ (132.7s)
006-server-metadata                          ✅✅✅ (90.8s)
007-client-metadata                          ✅✅❌ (81.1s)
008-generate-static-params                   ❌✅❌ (78.0s)
009-og-images                                ✅✅✅ (105.3s)
010-route-handlers                           ✅✅✅ (75.1s)
011-client-server-form                       ✅✅✅ (179.3s)
012-parallel-routes                          ✅✅✅ (153.3s)
013-pathname-server                          ❌❌❌ (78.2s)
014-server-routing                           ❌✅❌ (103.1s)
015-server-actions-exports                   ✅✅❌ (76.0s)
016-client-cookies                           ❌✅❌ (71.5s)
017-use-search-params                        ✅✅❌ (64.1s)
018-use-router                               ✅✅✅ (77.2s)
019-use-action-state                         ✅✅✅ (123.8s)
020-no-use-effect                            ✅✅❌ (94.5s)
021-avoid-fetch-in-effect                    ✅✅✅ (269.8s)
022-prefer-server-actions                    ✅✅✅ (242.4s)
023-avoid-getserversideprops                 ✅✅✅ (207.7s)
024-avoid-redundant-usestate                 ✅✅✅ (126.4s)
025-prefer-next-link                         ✅✅✅ (96.4s)
026-no-serial-await                          ✅✅✅ (220.4s)
027-prefer-next-image                        ✅✅✅ (103.4s)
028-prefer-next-font                         ✅✅✅ (81.4s)
029-use-cache-directive                      ✅✅✅ (208.8s)
030-app-router-migration-hard                ✅✅✅ (397.9s)
031-ai-sdk-migration-simple                  ✅✅✅ (233.2s)
032-ai-sdk-model-specification-string        ✅✅✅ (150.4s)
033-ai-sdk-v4-model-specification-function   ✅✅✅ (118.6s)
034-ai-sdk-render-visual-info                ✅✅❌ (567.8s)
035-ai-sdk-call-tools                        ✅✅✅ (209.6s)
036-ai-sdk-call-tools-multiple-steps         ✅✅✅ (247.7s)
037-ai-sdk-embed-text                        ✅✅✅ (181.0s)
038-ai-sdk-mcp                               ❌❌❌
039-parallel-routes                          ✅✅✅ (162.7s)
040-intercepting-routes                      ✅✅✅ (201.2s)
041-route-groups                             ✅✅✅ (147.7s)
042-loading-ui                               ✅✅✅ (141.3s)
043-error-boundaries                         ✅✅✅ (228.7s)
044-metadata-api                             ✅✅✅ (110.0s)
045-server-actions-form                      ✅✅❌ (101.0s)
046-streaming                                ✅✅✅ (126.4s)
047-middleware                               ✅✅✅ (91.2s)
048-draft-mode                               ✅✅✅ (210.0s)
049-revalidation                             ❌✅✅ (195.9s)
-------------------------------------------- --------------------------
Overall (B/L/T)                              44/48/39 (88%, 96%, 78%)
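
The "Overall (B/L/T)" lines and the headline success rates can be cross-checked from the per-eval rows. The sketch below assumes that B/L/T stands for build, lint and test (the source does not spell this out) and that an eval counts toward the success rate only when all three marks pass. That reading is consistent with the 39/50 and 38/50 figures reported earlier. The parser, the `EvalRow` type and the five sample rows (taken from the Haiku run) are illustrative and not part of the eval tooling.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional

PASS_MARK = "✅"  # the only other mark in a row is "❌"


class EvalRow(NamedTuple):
    name: str
    build: bool  # assumed meaning of "B"
    lint: bool   # assumed meaning of "L"
    test: bool   # assumed meaning of "T"


# Eval id such as "007-client-metadata", then exactly three pass/fail marks.
# The trailing "(86.0s)" timing is optional and ignored.
ROW_RE = re.compile(r"^(\d{3}-[\w-]+)\s+([✅❌]{3})")


def parse_row(line: str) -> Optional[EvalRow]:
    """Parse one result row; return None for headers and separators."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    build, lint, test = (mark == PASS_MARK for mark in match.group(2))
    return EvalRow(match.group(1), build, lint, test)


def summarize(rows: Iterable[EvalRow]) -> Dict[str, int]:
    """Count passes per column and evals where all three columns pass."""
    rows = list(rows)
    return {
        "total": len(rows),
        "build": sum(r.build for r in rows),
        "lint": sum(r.lint for r in rows),
        "test": sum(r.test for r in rows),
        "all_three": sum(r.build and r.lint and r.test for r in rows),
    }


# A handful of rows copied from the Haiku run, for demonstration only.
SAMPLE = [
    "Eval Claude Code",
    "006-server-metadata ✅✅✅ (91.4s)",
    "007-client-metadata ✅✅❌ (86.0s)",
    "008-generate-static-params ❌✅❌ (83.0s)",
    "013-pathname-server ❌❌❌ (70.8s)",
    "026-no-serial-await ❌✅✅ (153.3s)",
]

parsed = [row for row in map(parse_row, SAMPLE) if row is not None]
print(summarize(parsed))
```

Running the same tally over all 50 rows of a run should reproduce that run's Overall line, with `all_three` giving the numerator of the success rate.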

Outcomes

Does this translate into better real-world code, or is Claude now just optimised to pass the evals? This would be a great area to explore.

Please try it and let me know.