yusufkaraaslan/Skill_Seekers

Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

License: MIT · Language: Python · 6.8k stars · 683 forks
Topics: ai-tools, ast-parser, automation, claude-ai, claude-skills, code-analysis, conflict-detection, documentation, documentation-generator, GitHub, github-scraper, mcp, mcp-server, multi-source, ocr, pdf, Python, web-scraping

Deep Analysis

Automatically converts documentation websites, GitHub repositories, and PDFs into Claude AI skills, completing in minutes what would take hours.

Highly Recommended

Core Features

Supports documentation websites, GitHub repositories, and PDF files

Auto-detects LLM-friendly documentation (llms.txt) for 10x faster scraping

Deep code parsing that detects conflicts between documentation and code

Extracts best examples and key concepts

Technical Implementation

Architecture: Crawler + AST Analysis + AI Enhancement + Packaging

Key Components:
  • Web Scraper
  • AST Parser
  • MCP
Highlights
  • Complete skill creation in 20-40 minutes instead of hours
  • Automatically detect conflicts between documentation and code implementation
  • llms.txt support for 10x faster documentation scraping
  • Validated with 700+ test cases
Use Cases
  • Create Claude skills for any framework/API
  • Convert game engine docs to skills (Godot, Unity)
  • Merge internal docs + code repos into skills
  • Discover inconsistencies between documentation and code
Limitations
  • Requires Python 3.10+
  • Complex websites may need scraping configuration adjustments
Tech Stack
Python 3.10+ · MCP · AST Parser


Skill Seeker


Automatically convert documentation websites, GitHub repositories, and PDFs into Claude AI skills in minutes.

📋 View Development Roadmap & Tasks - 134 tasks across 10 categories; pick any to contribute!

What is Skill Seeker?

Skill Seeker is an automated tool that transforms documentation websites, GitHub repositories, and PDF files into production-ready Claude AI skills. Instead of manually reading and summarizing documentation, Skill Seeker:

  1. Scrapes multiple sources (docs, GitHub repos, PDFs) automatically
  2. Analyzes code repositories with deep AST parsing
  3. Detects conflicts between documentation and code implementation
  4. Organizes content into categorized reference files
  5. Enhances with AI to extract best examples and key concepts
  6. Packages everything into an uploadable .zip file for Claude
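Step 6 can be pictured with Python's standard zipfile module. This is a minimal sketch, not Skill Seeker's packager: it assumes a skill directory holding a SKILL.md file plus reference files (for example references/api.md), and the helper name package_skill is hypothetical.

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir: str, zip_path: str) -> list[str]:
    """Zip every file under skill_dir and return the stored archive names.

    Hypothetical helper. Assumes a layout of SKILL.md plus reference files;
    zip_path should live outside skill_dir so the archive never includes itself.
    """
    root = Path(skill_dir)
    if not (root / "SKILL.md").is_file():
        raise FileNotFoundError(f"{root} has no SKILL.md")
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the skill root, with forward slashes.
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Storing paths relative to the skill root keeps SKILL.md at the top level of the archive rather than nested under an output/ prefix.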

Result: Get comprehensive Claude skills for any framework, API, or tool in 20-40 minutes instead of hours of manual work.

Why Use This?

  • 🎯 For Developers: Create skills from documentation + GitHub repos with conflict detection
  • 🎮 For Game Devs: Generate skills for game engines (Godot docs + GitHub, Unity, etc.)
  • 🔧 For Teams: Combine internal docs + code repositories into a single source of truth
  • 📚 For Learners: Build comprehensive skills from docs, code examples, and PDFs
  • 🔍 For Open Source: Analyze repos to find documentation gaps and outdated examples

Key Features

🌐 Documentation Scraping

  • llms.txt Support - Automatically detects and uses LLM-ready documentation files (10x faster)
  • Universal Scraper - Works with ANY documentation website
  • Smart Categorization - Automatically organizes content by topic
  • Code Language Detection - Recognizes Python, JavaScript, C++, GDScript, etc.
  • 8 Ready-to-Use Presets - Godot, React, Vue, Django, FastAPI, and more
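The llms.txt shortcut amounts to probing a site for a pre-built, LLM-ready text file before falling back to crawling HTML. The sketch below assumes the file sits at the site root (where the llms.txt proposal places it) and tries llms-full.txt first and llms-small.txt last; that probe order and the names find_llms_txt and http_fetch are illustrative assumptions, not Skill Seeker's code.

```python
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlsplit

# Assumed probe order: the full dump first, then the standard index, then the small variant.
LLMS_TXT_CANDIDATES = ("llms-full.txt", "llms.txt", "llms-small.txt")


def http_fetch(url: str) -> Optional[str]:
    """Return the response body as text, or None on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return None


def find_llms_txt(docs_url: str,
                  fetch: Callable[[str], Optional[str]] = http_fetch) -> Optional[str]:
    """Return the first non-empty llms.txt-style URL at the site root, else None."""
    parts = urlsplit(docs_url)
    root = f"{parts.scheme}://{parts.netloc}/"
    for name in LLMS_TXT_CANDIDATES:
        url = root + name
        body = fetch(url)
        if body and body.strip():
            return url
    return None
```

Passing a stub fetch function (such as a dict's get method) lets the probe run without network access; a None result means the scraper would fall back to normal HTML crawling.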

📄 PDF Support (v1.2.0)

  • Basic PDF Extraction - Extract text, code, and images from PDF files
  • OCR for Scanned PDFs - Extract text from scanned documents
  • Password-Protected PDFs - Handle encrypted PDFs
  • Table Extraction - Extract complex tables from PDFs
  • Parallel Processing - 3x faster for large PDFs
  • Intelligent Caching - 50% faster on re-runs
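Re-run caching of this kind is commonly keyed on a hash of the file's contents, so an unchanged PDF skips extraction entirely. A minimal sketch, assuming a JSON-serializable extraction result; cached_extract is a hypothetical name, and the extract callable stands in for whatever PDF library does the actual text, table, or OCR work.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_extract(pdf_path: str, cache_dir: str,
                   extract: Callable[[bytes], dict]) -> tuple[dict, bool]:
    """Run extract() on the PDF bytes, reusing a JSON result cached by content hash.

    Returns (result, cache_hit).
    """
    data = Path(pdf_path).read_bytes()
    key = hashlib.sha256(data).hexdigest()  # same bytes -> same cache entry
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8")), True
    result = extract(data)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result, False
```

Hashing content rather than the file path means a renamed but identical PDF still hits the cache, while any edit to the file forces a fresh extraction.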

🐙 GitHub Repository Scraping (v2.0.0)

  • Deep Code Analysis - AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
  • API Extraction - Functions, classes, methods with parameters and types
  • Repository Metadata - README, file tree, language breakdown, stars/forks
  • GitHub Issues & PRs - Fetch open/closed issues with labels and milestones
  • CHANGELOG & Releases - Automatically extract version history
  • Conflict Detection - Compare documented APIs vs actual code implementation
  • MCP Integration - Natural language: "Scrape GitHub repo facebook/react"

🔄 Unified Multi-Source Scraping (NEW - v2.0.0)

  • Combine Multiple Sources - Mix documentation + GitHub + PDF in one skill
  • Conflict Detection - Automatically finds discrepancies between docs and code
  • Intelligent Merging - Rule-based or AI-powered conflict resolution
  • Transparent Reporting - Side-by-side comparison with ⚠️ warnings
  • Documentation Gap Analysis - Identifies outdated docs and undocumented features
  • Single Source of Truth - One skill showing both intent (docs) and reality (code)
  • Backward Compatible - Legacy single-source configs still work
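At its simplest, docs-versus-code conflict detection compares the API surface each source claims. The sketch assumes both sides have already been reduced to a mapping of API name to parameter names (for instance from scraped docs and from an AST pass); the three conflict kinds mirror the outdated-docs and undocumented-features idea above, and every name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    name: str
    kind: str    # "missing_in_code", "undocumented", or "signature_mismatch"
    detail: str


def detect_conflicts(documented: dict[str, list[str]],
                     implemented: dict[str, list[str]]) -> list[Conflict]:
    """Compare API name -> parameter-name lists from docs against those found in code."""
    conflicts = []
    for name in sorted(documented.keys() | implemented.keys()):
        doc_params = documented.get(name)
        code_params = implemented.get(name)
        if code_params is None:
            conflicts.append(Conflict(name, "missing_in_code", "documented but not found in code"))
        elif doc_params is None:
            conflicts.append(Conflict(name, "undocumented", "present in code but not in docs"))
        elif doc_params != code_params:
            conflicts.append(Conflict(name, "signature_mismatch",
                                      f"docs {doc_params} vs code {code_params}"))
    return conflicts
```

A rule-based merge could then keep the code signature for mismatches while attaching a warning, which is one way to show both intent (docs) and reality (code).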

🤖 Multi-LLM Platform Support (NEW - v2.5.0)

  • 4 LLM Platforms - Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
  • Universal Scraping - Same documentation works for all platforms
  • Platform-Specific Packaging - Optimized formats for each LLM
  • One-Command Export - The --target flag selects the platform
  • Optional Dependencies - Install only what you need
  • 100% Backward Compatible - Existing Claude workflows unchanged

| Platform | Format | Upload | Enhancement | API Key |
|---|---|---|---|---|
| Claude AI | ZIP + YAML | ✅ Auto | ✅ Yes | ANTHROPIC_API_KEY |
| Google Gemini | tar.gz | ✅ Auto | ✅ Yes | GOOGLE_API_KEY |
| OpenAI ChatGPT | ZIP + Vector Store | ✅ Auto | ✅ Yes | OPENAI_API_KEY |
| Generic Markdown | ZIP | ❌ Manual | ❌ No | None |

```bash
# Claude (default - no changes needed!)
skill-seekers package output/react/
skill-seekers upload react.zip

# Google Gemini
pip install "skill-seekers[gemini]"
skill-seekers package output/react/ --target gemini
skill-seekers upload react-gemini.tar.gz --target gemini

# OpenAI ChatGPT
pip install "skill-seekers[openai]"
skill-seekers package output/react/ --target openai
skill-seekers upload react-openai.zip --target openai

# Generic Markdown (universal export)
skill-seekers package output/react/ --target markdown
# Use the markdown files directly in any LLM
```

Installation:

```bash
# Install with Gemini support
pip install "skill-seekers[gemini]"

# Install with OpenAI support
pip install "skill-seekers[openai]"

# Install with all LLM platforms
pip install "skill-seekers[all-llms]"
```
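In the platform table, the archive format is the most visible difference between targets. The sketch below uses shutil.make_archive, assuming tar.gz for Gemini and ZIP for the other targets, with file names following the react.zip / react-gemini.tar.gz / react-openai.zip pattern in the commands above (the react-markdown.zip name is an extrapolation). It does not reproduce the YAML metadata or OpenAI vector-store steps, and package_for_target is a hypothetical helper, not the skill-seekers CLI.

```python
import shutil
from pathlib import Path

# Assumed mapping from the platform table: Gemini uses tar.gz ("gztar"), the rest ZIP.
ARCHIVE_FORMATS = {"claude": "zip", "gemini": "gztar", "openai": "zip", "markdown": "zip"}


def package_for_target(skill_dir: str, out_dir: str, target: str = "claude") -> str:
    """Archive skill_dir in the format assumed for target; return the archive path."""
    if target not in ARCHIVE_FORMATS:
        raise ValueError(f"unknown target: {target!r}")
    name = Path(skill_dir).name
    # Claude keeps the bare name (react.zip); other targets get a suffix (react-gemini...).
    suffix = "" if target == "claude" else f"-{target}"
    base = str(Path(out_dir) / f"{name}{suffix}")
    # make_archive appends .zip or .tar.gz itself; out_dir should be outside skill_dir.
    return shutil.make_archive(base, ARCHIVE_FORMATS[target], root_dir=skill_dir)
```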

🌊 Three-Stream GitHub Architecture (NEW - v2.6.0)

  • Triple-Stream Analysis - Split GitHub repos into Code, Docs, and Insights streams
  • Unified Codebase Analyzer - Works with GitHub URLs AND local paths
  • C3.x as Analysis Depth - Choose 'basic' (1-2 min) or 'c3x' (20-60 min) analysis
  • Enhanced Router Generation - GitHub metadata, README quick start, common issues
  • Issue Integration - Top problems and solutions from GitHub issues
  • Smart Routing Keywords - GitHub labels weighted 2x for better topic detection
  • 81 Tests Passing - Comprehensive E2E validation (0.44 seconds)
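One way to read "GitHub labels weighted 2x": a keyword's score is its frequency in the documentation text plus double its frequency among issue labels. The sketch implements that reading with collections.Counter; the exact scoring Skill Seeker uses is not described here, so the weights and function names are assumptions.

```python
from collections import Counter

TEXT_WEIGHT = 1   # each occurrence in documentation text
LABEL_WEIGHT = 2  # each occurrence as a GitHub issue label (the "2x")


def score_keywords(doc_terms: list[str], issue_labels: list[str]) -> Counter:
    """Combine text and label occurrences into one weighted score per keyword."""
    scores = Counter()
    for term in doc_terms:
        scores[term.lower()] += TEXT_WEIGHT
    for label in issue_labels:
        scores[label.lower()] += LABEL_WEIGHT
    return scores


def top_keywords(doc_terms: list[str], issue_labels: list[str], n: int = 3) -> list[str]:
    """Highest-scoring keywords; ties keep first-seen order (Counter.most_common)."""
    return [kw for kw, _ in score_keywords(doc_terms, issue_labels).most_common(n)]
```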

Three Streams Explained:

  • Stream 1: Code - Deep C3.x analysis (patterns, examples, guides, configs, architecture)
  • Stream 2: Docs - Repository documentation (README, CONTRIBUTING, docs/*.md)
  • Stream 3: Insights - Community knowledge (issues, labels, stars, forks)
```python
from skill_seekers.cli.unified_codebase_analyzer import UnifiedCodebaseAnalyzer

# Analyze GitHub repo with all three streams
analyzer = UnifiedCodebaseAnalyzer()
result = analyzer.analyze(
    source="https://github.com/facebook/react",
    depth="c3x",  # or "basic" for fast analysis
    fetch_github_metadata=True
)

# Access code stream (C3.x analysis)
print(f"Design patterns: {len(result.code_analysis['c3_1_patterns'])}")
print(f"Test examples: {result.code_analysis['c3_2_examples_count']}")

# Access docs stream (repository docs)
print(f"README: {result.github_docs['readme'][:100]}")

# Access insights stream (GitHub metadata)
print(f"Stars: {result.github_insights['metadata']['stars']}")
print(f"Common issues: {len(result.github_insights['common_problems'])}")
```
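Splitting a checkout into the Code and Docs streams described above can be approximated with path rules. This sketch treats README and CONTRIBUTING files (at any depth) and Markdown files anywhere under docs/ as documentation and everything else as code; the Insights stream comes from GitHub metadata rather than files, so it is not modeled. The rules and the names classify_path and split_streams are assumptions.

```python
from pathlib import PurePosixPath


def classify_path(path: str) -> str:
    """Assign a repository-relative file path to the 'docs' or 'code' stream."""
    p = PurePosixPath(path)
    name = p.name.upper()
    if name.startswith(("README", "CONTRIBUTING")):
        return "docs"
    # Markdown anywhere under the top-level docs/ directory.
    if p.parts and p.parts[0] == "docs" and p.suffix == ".md":
        return "docs"
    # Everything else (sources, configs, assets) goes to the code stream.
    return "code"


def split_streams(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by stream, preserving their input order."""
    streams = {"code": [], "docs": []}
    for path in paths:
        streams[classify_path(path)].append(path)
    return streams
```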

See complete documentation: Three-Stream Implementation Summary

🔐 Private Config Repositories (NEW - v2.2.0)

  • Git-Based Config Sources - Fetch configs from private/team git repositories
  • Multi-Source Management - Register unlimited GitHub, GitLab, Bitbucket repos
  • Team Collaboration - Share custom configs across 3-5 person teams
  • Enterprise Support - Scale to 500+ developers with priority-based resolution
  • Secure Authentication - Environment variable tokens (GITHUB_TOKEN, GITLAB_TOKEN)
  • Intelligent Caching - Clone once, pull updates automatically
  • Offline Mode - Work with cached configs when offline
  • Backward Compatible - Existing API-based configs still work
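Priority-based resolution can be sketched as: consult the registered config sources in priority order and return the first one that defines the requested config. The ConfigSource type, the lower-number-wins convention, and resolve_config are hypothetical, chosen only to illustrate the idea; fetching and caching the git repositories themselves is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigSource:
    name: str
    priority: int  # assumption: lower number is consulted first
    configs: dict = field(default_factory=dict)  # config name -> config dict


def resolve_config(config_name: str, sources: list[ConfigSource]) -> tuple[str, dict]:
    """Return (source name, config) from the highest-priority source that has it."""
    for source in sorted(sources, key=lambda s: s.priority):
        if config_name in source.configs:
            return source.name, source.configs[config_name]
    raise KeyError(f"config {config_name!r} not found in any source")
```

With this ordering, a team repository registered at priority 1 can override a company-wide config of the same name at priority 2 without editing the shared repository.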

🤖 Codebase Analysis & AI Enhancement (C3.x - NEW!)

C3.4: Configuration Pattern Extraction with AI Enhancement

  • 9 Config Formats - JSON, YAML, TOML, ENV, INI, Python, JavaScript, Dockerfile, Docker Compose
  • 7 Pattern Types - Database, API, logging, cache, email, auth, server configurations
  • AI Enhancement (NEW!) - Optional dual-mode AI analysis (API + LOCAL, like C3.3)
    • Explains what each config does
    • Suggests best practices and improvements
    • Security analysis - Finds hardcoded secrets, exposed credentials
    • Migration suggestions - Consolidation opportunities
    • Context-aware documentation
  • Auto-Documentation - Generates JSON + Markdown documentation of all configs
  • Type Inference - Automatically detects setting types and environment variables
  • MCP Integration - extract_config_patterns tool with enhancement support
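For the ENV format, the type inference and hardcoded-secret checks listed above can be roughed out without any AI step. The sketch assumes simple KEY=VALUE lines and flags keys whose names contain words such as SECRET, PASSWORD, TOKEN, or API_KEY when they carry a non-empty value; C3.4's actual heuristics and its AI enhancement are not reproduced, and analyze_env is a hypothetical name.

```python
import re

# Assumed heuristic: secret-looking key names.
SECRET_HINTS = re.compile(r"(SECRET|PASSWORD|TOKEN|API_KEY|PRIVATE_KEY)", re.IGNORECASE)


def infer_type(value: str) -> str:
    """Guess a setting's type from its raw string value."""
    if value.lower() in {"true", "false"}:
        return "bool"
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except ValueError:
            pass
    return "str"


def analyze_env(text: str) -> list[dict]:
    """Parse KEY=VALUE lines, infer each value's type, and flag likely hardcoded secrets."""
    settings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("\"'")
        settings.append({
            "key": key,
            "type": infer_type(value),
            "possible_secret": bool(SECRET_HINTS.search(key)) and value != "",
        })
    return settings
```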

C3.3: AI-Enhanced How-To Guides

  • Comprehensive AI Enhancement - Transforms basic guides (⭐⭐) into professional tutorials (⭐⭐
    ... (content truncated)
Highly Recommended
agents

wshobson/agents

wshobson

Intelligent automation and multi-agent orchestration for Claude Code

The most comprehensive Claude Code plugin ecosystem, covering full-stack development scenarios with a three-tier model strategy balancing performance and cost.

25.6k stars · 2.8k forks · updated 3 days ago
Highly Recommended
awesome-claude-skills

ComposioHQ/awesome-claude-skills

ComposioHQ

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

The most comprehensive Claude Skills resource list; connect-apps is a killer feature.

19.9k stars · 2.0k forks · updated 3 days ago
Recommended
oh-my-opencode

code-yeongyu/oh-my-opencode

code-yeongyu

The Best Agent Harness. Meet Sisyphus: The Batteries-Included Agent that codes like you.

Powerful multi-agent coding tool, but note OAuth limitations.

17.5k stars · 1.2k forks · updated 3 days ago
Recommended
claude-mem

thedotmack/claude-mem

thedotmack

A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.

A practical solution for Claude's memory issues.

14.0k stars · 914 forks · updated 3 days ago
Highly Recommended
planning-with-files

OthmanAdi/planning-with-files

OthmanAdi

Claude Code skill implementing Manus-style persistent markdown planning — the workflow pattern behind the $2B acquisition.

Context engineering best practices; an open-source implementation of Manus mode.

9.3k stars · 811 forks · updated 3 days ago
Highly Recommended
claude-scientific-skills

K-Dense-AI/claude-scientific-skills

K-Dense-AI

A set of ready to use scientific skills for Claude

Essential for researchers; used by top institutions like Stanford and MIT.

6.1k stars · 739 forks · updated 3 days ago