yusufkaraaslan/Skill_Seekers
Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
Deep Analysis
Automatically converts documentation websites, GitHub repositories, and PDFs into Claude AI skills, completing in minutes what would take hours.
Core Features
Supports documentation websites, GitHub repositories, and PDF files
Auto-detects LLM-ready llms.txt documentation files, reported as 10x faster to process
Deep AST-based code parsing that detects conflicts between documentation and code
Extracts best examples and key concepts
Technical Implementation
- Complete skill creation in 20-40 minutes instead of hours
- Automatically detect conflicts between documentation and code implementation
- llms.txt format support for 10x acceleration
- Validated with 700+ test cases
- Create Claude skills for any framework/API
- Convert game engine docs to skills (Godot, Unity)
- Merge internal docs + code repos into skills
- Discover inconsistencies between documentation and code
- Requires Python 3.10+
- Complex websites may need scraping configuration adjustments
Skill Seeker
Automatically convert documentation websites, GitHub repositories, and PDFs into Claude AI skills in minutes.
📋 View Development Roadmap & Tasks - 134 tasks across 10 categories, pick any to contribute!
What is Skill Seeker?
Skill Seeker is an automated tool that transforms documentation websites, GitHub repositories, and PDF files into production-ready Claude AI skills. Instead of manually reading and summarizing documentation, Skill Seeker:
- Scrapes multiple sources (docs, GitHub repos, PDFs) automatically
- Analyzes code repositories with deep AST parsing
- Detects conflicts between documentation and code implementation
- Organizes content into categorized reference files
- Enhances with AI to extract best examples and key concepts
- Packages everything into an uploadable `.zip` file for Claude
Result: Get comprehensive Claude skills for any framework, API, or tool in 20-40 minutes instead of hours of manual work.
Why Use This?
- 🎯 For Developers: Create skills from documentation + GitHub repos with conflict detection
- 🎮 For Game Devs: Generate skills for game engines (Godot docs + GitHub, Unity, etc.)
- 🔧 For Teams: Combine internal docs + code repositories into a single source of truth
- 📚 For Learners: Build comprehensive skills from docs, code examples, and PDFs
- 🔍 For Open Source: Analyze repos to find documentation gaps and outdated examples
Key Features
🌐 Documentation Scraping
- ✅ llms.txt Support - Automatically detects and uses LLM-ready documentation files (10x faster)
- ✅ Universal Scraper - Works with ANY documentation website
- ✅ Smart Categorization - Automatically organizes content by topic
- ✅ Code Language Detection - Recognizes Python, JavaScript, C++, GDScript, etc.
- ✅ 8 Ready-to-Use Presets - Godot, React, Vue, Django, FastAPI, and more
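The llms.txt check can be pictured with a minimal sketch. It assumes the site follows the llms.txt convention of serving the file from the site root. The function name `llms_txt_candidates` and the list of file variants are illustrative assumptions, not Skill Seeker's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit

# File names probed at the site root. This variant list is an assumption
# based on the llms.txt convention, not taken from Skill Seeker's source.
LLMS_TXT_VARIANTS = ("llms-full.txt", "llms.txt")


def llms_txt_candidates(docs_url: str) -> list:
    """Return root-level URLs where an LLM-ready file may live."""
    parts = urlsplit(docs_url)
    return [
        # Keep scheme and host, replace the path, drop query and fragment.
        urlunsplit((parts.scheme, parts.netloc, f"/{name}", "", ""))
        for name in LLMS_TXT_VARIANTS
    ]


print(llms_txt_candidates("https://docs.example.com/en/stable/intro.html"))
```

A scraper could request each candidate in order and fall back to page-by-page crawling when none of them exists.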
📄 PDF Support (v1.2.0)
- ✅ Basic PDF Extraction - Extract text, code, and images from PDF files
- ✅ OCR for Scanned PDFs - Extract text from scanned documents
- ✅ Password-Protected PDFs - Handle encrypted PDFs
- ✅ Table Extraction - Extract complex tables from PDFs
- ✅ Parallel Processing - 3x faster for large PDFs
- ✅ Intelligent Caching - 50% faster on re-runs
🐙 GitHub Repository Scraping (v2.0.0)
- ✅ Deep Code Analysis - AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
- ✅ API Extraction - Functions, classes, methods with parameters and types
- ✅ Repository Metadata - README, file tree, language breakdown, stars/forks
- ✅ GitHub Issues & PRs - Fetch open/closed issues with labels and milestones
- ✅ CHANGELOG & Releases - Automatically extract version history
- ✅ Conflict Detection - Compare documented APIs vs actual code implementation
- ✅ MCP Integration - Natural language: "Scrape GitHub repo facebook/react"
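To make "AST parsing" and "API extraction" concrete, here is a small sketch for Python sources using only the standard-library `ast` module. The helper name `extract_functions` is hypothetical, and the real tool may collect far more (types, decorators, docstrings, other languages).

```python
import ast


def extract_functions(source: str) -> dict:
    """Map each function or method name in `source` to its parameter names.

    Only positional-or-keyword parameters are collected; *args, **kwargs,
    and keyword-only parameters are ignored to keep the sketch short.
    """
    tree = ast.parse(source)
    found = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found[node.name] = [arg.arg for arg in node.args.args]
    return found


sample = '''
def connect(host, port=5432):
    pass

class Client:
    def query(self, sql, timeout=None):
        pass
'''
print(extract_functions(sample))
```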
🔄 Unified Multi-Source Scraping (NEW - v2.0.0)
- ✅ Combine Multiple Sources - Mix documentation + GitHub + PDF in one skill
- ✅ Conflict Detection - Automatically finds discrepancies between docs and code
- ✅ Intelligent Merging - Rule-based or AI-powered conflict resolution
- ✅ Transparent Reporting - Side-by-side comparison with ⚠️ warnings
- ✅ Documentation Gap Analysis - Identifies outdated docs and undocumented features
- ✅ Single Source of Truth - One skill showing both intent (docs) and reality (code)
- ✅ Backward Compatible - Legacy single-source configs still work
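One way to think about conflict detection is as a comparison of two maps: what the documentation says an API looks like, and what the code actually defines. The sketch below assumes both sides have already been reduced to `{function_name: [parameter names]}`. The function `find_conflicts` and its message strings are illustrative, not the project's actual rule set.

```python
def find_conflicts(documented: dict, implemented: dict) -> list:
    """Compare documented and implemented parameter lists per function."""
    conflicts = []
    for name in sorted(documented.keys() | implemented.keys()):
        doc_params = documented.get(name)
        code_params = implemented.get(name)
        if doc_params is None:
            conflicts.append((name, "implemented but undocumented"))
        elif code_params is None:
            conflicts.append((name, "documented but missing from code"))
        elif doc_params != code_params:
            conflicts.append(
                (name, f"signature mismatch: docs {doc_params} vs code {code_params}")
            )
    return conflicts


documented = {"connect": ["host", "port"], "close": []}
implemented = {"connect": ["host", "port", "timeout"], "ping": []}

for name, problem in find_conflicts(documented, implemented):
    print(f"⚠️ {name}: {problem}")
```

The first two cases correspond to the documentation gap analysis described above (outdated docs and undocumented features); the third is a direct docs-versus-code discrepancy.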
🤖 Multi-LLM Platform Support (NEW - v2.5.0)
- ✅ 4 LLM Platforms - Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
- ✅ Universal Scraping - Same documentation works for all platforms
- ✅ Platform-Specific Packaging - Optimized formats for each LLM
- ✅ One-Command Export - `--target` flag selects platform
- ✅ Optional Dependencies - Install only what you need
- ✅ 100% Backward Compatible - Existing Claude workflows unchanged
| Platform | Format | Upload | Enhancement | API Key |
|---|---|---|---|---|
| Claude AI | ZIP + YAML | ✅ Auto | ✅ Yes | ANTHROPIC_API_KEY |
| Google Gemini | tar.gz | ✅ Auto | ✅ Yes | GOOGLE_API_KEY |
| OpenAI ChatGPT | ZIP + Vector Store | ✅ Auto | ✅ Yes | OPENAI_API_KEY |
| Generic Markdown | ZIP | ❌ Manual | ❌ No | None |
```bash
# Claude (default - no changes needed!)
skill-seekers package output/react/
skill-seekers upload react.zip

# Google Gemini (extras are quoted so shells such as zsh do not expand the brackets)
pip install "skill-seekers[gemini]"
skill-seekers package output/react/ --target gemini
skill-seekers upload react-gemini.tar.gz --target gemini

# OpenAI ChatGPT
pip install "skill-seekers[openai]"
skill-seekers package output/react/ --target openai
skill-seekers upload react-openai.zip --target openai

# Generic Markdown (universal export)
skill-seekers package output/react/ --target markdown
# Use the markdown files directly in any LLM
```
Installation:

```bash
# Install with Gemini support
pip install "skill-seekers[gemini]"

# Install with OpenAI support
pip install "skill-seekers[openai]"

# Install with all LLM platforms
pip install "skill-seekers[all-llms]"
```
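Before uploading, it can help to check that the API key for the chosen platform is set. The sketch below copies the formats and environment variable names from the platform table above. The helper `missing_api_key` is hypothetical, and the key `"claude"` is an assumed name for the default target, since the source only shows `gemini`, `openai`, and `markdown` passed to `--target`.

```python
import os
from typing import Optional

# Values copied from the platform table; "claude" as a target name is an assumption.
PLATFORMS = {
    "claude": {"format": "ZIP + YAML", "api_key": "ANTHROPIC_API_KEY"},
    "gemini": {"format": "tar.gz", "api_key": "GOOGLE_API_KEY"},
    "openai": {"format": "ZIP + Vector Store", "api_key": "OPENAI_API_KEY"},
    "markdown": {"format": "ZIP", "api_key": None},
}


def missing_api_key(target: str, env=None) -> Optional[str]:
    """Return the name of the unset API key variable for `target`, or None."""
    env = os.environ if env is None else env
    key = PLATFORMS[target]["api_key"]
    if key is not None and not env.get(key):
        return key
    return None


for target in PLATFORMS:
    needed = missing_api_key(target)
    status = f"set {needed} first" if needed else "ready"
    print(f"{target}: {status}")
```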
🌊 Three-Stream GitHub Architecture (NEW - v2.6.0)
- ✅ Triple-Stream Analysis - Split GitHub repos into Code, Docs, and Insights streams
- ✅ Unified Codebase Analyzer - Works with GitHub URLs AND local paths
- ✅ C3.x as Analysis Depth - Choose 'basic' (1-2 min) or 'c3x' (20-60 min) analysis
- ✅ Enhanced Router Generation - GitHub metadata, README quick start, common issues
- ✅ Issue Integration - Top problems and solutions from GitHub issues
- ✅ Smart Routing Keywords - GitHub labels weighted 2x for better topic detection
- ✅ 81 Tests Passing - Comprehensive E2E validation (0.44 seconds)
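The "labels weighted 2x" idea can be illustrated with a simple scoring sketch: every keyword found in the documentation adds 1 to its score, and every GitHub label adds 2. The function `score_keywords` and the sample data are made up for illustration; only the 2x weighting comes from the feature list.

```python
from collections import Counter

LABEL_WEIGHT = 2  # GitHub labels count double, per the feature list
TEXT_WEIGHT = 1   # keywords taken from documentation text


def score_keywords(doc_words, issue_labels) -> Counter:
    """Score routing keywords, giving GitHub labels twice the weight."""
    scores = Counter()
    for word in doc_words:
        scores[word.lower()] += TEXT_WEIGHT
    for label in issue_labels:
        scores[label.lower()] += LABEL_WEIGHT
    return scores


scores = score_keywords(
    doc_words=["hooks", "routing", "hooks"],
    issue_labels=["routing", "ssr"],
)
print(scores.most_common())
```

Here `routing` ends up ahead of `hooks` even though it appears less often in the docs, because the label contributes twice as much.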
Three Streams Explained:
- Stream 1: Code - Deep C3.x analysis (patterns, examples, guides, configs, architecture)
- Stream 2: Docs - Repository documentation (README, CONTRIBUTING, docs/*.md)
- Stream 3: Insights - Community knowledge (issues, labels, stars, forks)
```python
from skill_seekers.cli.unified_codebase_analyzer import UnifiedCodebaseAnalyzer

# Analyze GitHub repo with all three streams
analyzer = UnifiedCodebaseAnalyzer()
result = analyzer.analyze(
    source="https://github.com/facebook/react",
    depth="c3x",  # or "basic" for fast analysis
    fetch_github_metadata=True
)

# Access code stream (C3.x analysis)
print(f"Design patterns: {len(result.code_analysis['c3_1_patterns'])}")
print(f"Test examples: {result.code_analysis['c3_2_examples_count']}")

# Access docs stream (repository docs)
print(f"README: {result.github_docs['readme'][:100]}")

# Access insights stream (GitHub metadata)
print(f"Stars: {result.github_insights['metadata']['stars']}")
print(f"Common issues: {len(result.github_insights['common_problems'])}")
```
See complete documentation: Three-Stream Implementation Summary
🔐 Private Config Repositories (NEW - v2.2.0)
- ✅ Git-Based Config Sources - Fetch configs from private/team git repositories
- ✅ Multi-Source Management - Register unlimited GitHub, GitLab, Bitbucket repos
- ✅ Team Collaboration - Share custom configs across 3-5 person teams
- ✅ Enterprise Support - Scale to 500+ developers with priority-based resolution
- ✅ Secure Authentication - Environment variable tokens (GITHUB_TOKEN, GITLAB_TOKEN)
- ✅ Intelligent Caching - Clone once, pull updates automatically
- ✅ Offline Mode - Work with cached configs when offline
- ✅ Backward Compatible - Existing API-based configs still work
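Priority-based resolution across several config sources might look like the sketch below. Everything here is a hypothetical model: the `ConfigSource` class, the `resolve_config` function, and the rule that a lower number means higher priority are assumptions made for illustration, not the tool's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigSource:
    name: str
    priority: int  # lower number wins; this ordering is an assumption
    configs: dict = field(default_factory=dict)


def resolve_config(config_name: str, sources: list) -> tuple:
    """Return (source name, config) from the highest-priority source that has it."""
    for source in sorted(sources, key=lambda s: s.priority):
        if config_name in source.configs:
            return source.name, source.configs[config_name]
    raise KeyError(f"no registered source provides {config_name!r}")


sources = [
    ConfigSource("public", priority=100, configs={"react": {"max_pages": 500}, "vue": {}}),
    ConfigSource("team", priority=1, configs={"react": {"max_pages": 50}}),
]

print(resolve_config("react", sources))
print(resolve_config("vue", sources))
```

With this ordering, a team repository can override a shared config by name while still falling back to lower-priority sources for everything else.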
🤖 Codebase Analysis & AI Enhancement (C3.x - NEW!)
C3.4: Configuration Pattern Extraction with AI Enhancement
- ✅ 9 Config Formats - JSON, YAML, TOML, ENV, INI, Python, JavaScript, Dockerfile, Docker Compose
- ✅ 7 Pattern Types - Database, API, logging, cache, email, auth, server configurations
- ✅ AI Enhancement (NEW!) - Optional dual-mode AI analysis (API + LOCAL, like C3.3)
- Explains what each config does
- Suggests best practices and improvements
- Security analysis - Finds hardcoded secrets, exposed credentials
- Migration suggestions - Consolidation opportunities
- Context-aware documentation
- ✅ Auto-Documentation - Generates JSON + Markdown documentation of all configs
- ✅ Type Inference - Automatically detects setting types and environment variables
- ✅ MCP Integration - `extract_config_patterns` tool with enhancement support
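As a rough illustration of type inference and hardcoded-secret detection, the sketch below scans ENV-style `KEY=value` text. The helper names, the regular expression for secret-like keys, and the output shape are all assumptions for this example; the actual C3.4 analysis covers nine formats and is likely more thorough.

```python
import re

# Key names that suggest a credential; this pattern is an illustrative assumption.
SECRET_NAME = re.compile(r"(password|secret|token|api_?key)", re.IGNORECASE)


def infer_type(value: str) -> str:
    """Guess a setting's type from its literal text."""
    if value.lower() in ("true", "false"):
        return "bool"
    if re.fullmatch(r"-?\d+", value):
        return "int"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "float"
    return "str"


def analyze_env(text: str) -> list:
    """Return one finding per KEY=value line, skipping blanks and comments."""
    findings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("\"'")
        findings.append({
            "key": key,
            "type": infer_type(value),
            # A non-empty literal under a secret-like name is flagged.
            "hardcoded_secret": bool(SECRET_NAME.search(key)) and bool(value),
        })
    return findings


env_text = "DEBUG=true\nPORT=8080\n# database\nDB_PASSWORD=hunter2\nAPI_KEY=\n"
for finding in analyze_env(env_text):
    print(finding)
```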
C3.3: AI-Enhanced How-To Guides
- ✅ Comprehensive AI Enhancement - Transforms basic guides (⭐⭐) into professional tutorials (⭐⭐
... (content truncated)


