This guide provides step-by-step instructions for setting up an automated workflow to scan your GitHub starred repositories, generate AI-powered descriptions, and organize them with relevant keywords and use cases.
Overview
The Starred Repository Scanner helps you:
- Fetch all repositories you’ve starred on GitHub
- Generate concise, AI-enhanced descriptions for each repository
- Extract keywords and potential use cases
- Organize repositories into categories
- Output structured data for easy filtering and discovery
Prerequisites
Before starting, ensure you have:
- GitHub Account: With starred repositories to analyze
- API Access: Either GitHub MCP server or GitHub Personal Access Token
- AI Tool Access: GitHub Copilot, Claude, or similar AI assistant
- Python 3.8+: For running the scanner script (optional)
- Git: For cloning and version control
Method 1: Using GitHub MCP Server (Recommended)
Step 1: List Your Starred Repositories
Use the github-mcp-server-list_starred_repositories tool to fetch your starred repositories:
```javascript
// Fetch starred repositories with pagination
const stars = await list_starred_repositories({
  perPage: 100,
  page: 1,
  sort: "updated",    // Sort by most recently updated
  direction: "desc"
});
```
Parameters:
- `username` (optional): Target username (defaults to the authenticated user)
- `perPage` (optional): Results per page (max 100)
- `page` (optional): Page number for pagination
- `sort` (optional): Sort by "created" or "updated"
- `direction` (optional): "asc" or "desc"
Step 2: Extract Repository Metadata
For each starred repository, collect the following (a mapping sketch follows the list):
- Repository name and owner
- Description
- Primary language
- Topics/tags
- Star count
- Last update timestamp
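Whether the list comes from the MCP server or the REST API, each repository object carries the same core fields. A minimal Python sketch of the mapping, assuming GitHub's REST field names (full_name, stargazers_count, and so on):

```python
def extract_metadata(repo):
    """Map one GitHub repository object to the fields we keep per star."""
    return {
        "repository": repo["full_name"],        # "owner/repo-name"
        "github_url": repo["html_url"],
        "description": repo.get("description") or "",
        "language": repo.get("language") or "Unknown",
        "topics": repo.get("topics", []),
        "stars": repo["stargazers_count"],
        "updated_at": repo["updated_at"],
    }
```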
Step 3: Fetch Detailed Information
For repositories that need deeper analysis:
```javascript
// Get the repository README
const readme = await get_file_contents({
  owner: repo.owner,
  repo: repo.name,
  path: "README.md"
});

// Get repository details, including topics
const repoDetails = await get_repository({
  owner: repo.owner,
  repo: repo.name
});
```
Step 4: Generate AI Analysis
Use the Repository Analyzer prompt to analyze each repository (a sketch of this step follows the list):
- Prepare the repository information:

  ```text
  Repository: owner/repo-name
  Description: [from GitHub]
  Language: [primary language]
  Topics: [comma-separated topics]
  Stars: [star count]
  README Summary: [first few sections]
  ```

- Submit it to the AI with the Repository Analyzer prompt
- Receive structured JSON output with:
  - Enhanced description
  - Keywords/tags
  - Use cases
  - Classification
  - Integration opportunities
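In script form this step might look like the sketch below. The prompt template is an illustrative stand-in for the real Repository Analyzer prompt, and ask_ai is a hypothetical callable wrapping whichever assistant you use:

```python
import json

# Illustrative stand-in for the Repository Analyzer prompt
PROMPT_TEMPLATE = """\
Repository: {repository}
Description: {description}
Language: {language}
Topics: {topics}
Stars: {stars}
README Summary: {readme_summary}

Respond with JSON containing: ai_description, keywords, use_cases, classification.
"""

def analyze_repository(meta, readme_summary, ask_ai):
    """Build the analysis prompt, submit it, and parse the JSON reply."""
    prompt = PROMPT_TEMPLATE.format(
        repository=meta["repository"],
        description=meta["description"],
        language=meta["language"],
        topics=", ".join(meta["topics"]),
        stars=meta["stars"],
        readme_summary=readme_summary,
    )
    raw = ask_ai(prompt)      # ask_ai: hypothetical AI-assistant wrapper
    return json.loads(raw)    # fails loudly if the model strays from JSON
```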
Step 5: Aggregate and Store Results
Compile all analyzed repositories into a structured format:
```json
{
  "scan_date": "2024-01-15T10:30:00Z",
  "total_repositories": 150,
  "repositories": [
    {
      "repository": "owner/repo-name",
      "github_url": "https://github.com/owner/repo-name",
      "stars": 1234,
      "language": "Python",
      "topics": ["machine-learning", "nlp"],
      "ai_description": "...",
      "keywords": ["..."],
      "use_cases": [...],
      "classification": {...}
    }
  ]
}
```
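A small helper for this step might look like the following sketch; it assumes each entry in analyzed already combines the GitHub metadata with the AI fields shown above:

```python
import json
from datetime import datetime, timezone

def save_results(analyzed, path="data/starred-repos.json"):
    """Write the aggregated scan results in the structure shown above."""
    payload = {
        "scan_date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "total_repositories": len(analyzed),
        "repositories": analyzed,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)
```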
Method 2: Using GitHub REST API
Step 1: Generate Personal Access Token
- Go to GitHub Settings → Developer settings → Personal access tokens
- Generate a new token with the `public_repo` scope
- Save the token securely
Step 2: Fetch Starred Repositories
```bash
# Using curl (quote the URL so the shell doesn't interpret the "&")
curl -H "Authorization: token YOUR_TOKEN" \
  "https://api.github.com/user/starred?per_page=100&page=1"
```

```python
# Using Python
import os

import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

headers = {"Authorization": f"token {GITHUB_TOKEN}"}
response = requests.get(
    "https://api.github.com/user/starred",
    headers=headers,
    params={"per_page": 100, "page": 1},
)
stars = response.json()
```
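The snippet above fetches only the first page. GitHub caps per_page at 100, so a full scan loops until an empty page comes back; a minimal sketch:

```python
import requests

def fetch_all_starred(token):
    """Collect every starred repository by walking the paginated endpoint."""
    headers = {"Authorization": f"token {token}"}
    repos, page = [], 1
    while True:
        response = requests.get(
            "https://api.github.com/user/starred",
            headers=headers,
            params={"per_page": 100, "page": page},
        )
        response.raise_for_status()
        batch = response.json()
        if not batch:  # an empty page means we've run out of results
            break
        repos.extend(batch)
        page += 1
    return repos
```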
Step 3: Process Each Repository
For each repository in the response:
- Extract metadata
- Fetch README content if needed (see the sketch after this list)
- Apply AI analysis
- Store results
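For the README step, GitHub exposes a dedicated /repos/{owner}/{repo}/readme endpoint; requesting the raw media type returns plain text and skips base64 decoding. A minimal sketch:

```python
import requests

def fetch_readme(owner, repo, token):
    """Return the repository README as plain text, or "" if there is none."""
    response = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/readme",
        headers={
            "Authorization": f"token {token}",
            # The raw media type returns the file body instead of base64 JSON
            "Accept": "application/vnd.github.raw",
        },
    )
    if response.status_code == 404:  # some repositories have no README
        return ""
    response.raise_for_status()
    return response.text
```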
Output Formats
JSON Format (Recommended)
Best for programmatic access and filtering:
```json
{
  "repositories": [...]
}
```
Markdown Table Format
Best for human-readable summaries:
| Repository | Description | Keywords | Use Cases | Category |
|------------|-------------|----------|-----------|----------|
| owner/repo | AI-generated description | tag1, tag2 | Use case summary | Category |
CSV Format
Best for spreadsheet analysis:
```csv
repository,description,keywords,primary_category,stars,language
owner/repo,"Description","tag1,tag2",Category,1234,Python
```
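Joining keyword lists with commas by hand breaks quoting, so it is safer to let Python's csv module handle it. A sketch, assuming the JSON structure from Step 5 (with primary_category sitting inside classification, which may differ in your data):

```python
import csv
import json

def export_csv(json_path="data/starred-repos.json",
               csv_path="data/starred-repos.csv"):
    """Flatten the scan results into the CSV layout shown above."""
    with open(json_path, encoding="utf-8") as f:
        repos = json.load(f)["repositories"]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["repository", "description", "keywords",
                         "primary_category", "stars", "language"])
        for r in repos:
            writer.writerow([
                r["repository"],
                r["ai_description"],
                ",".join(r["keywords"]),  # csv adds the surrounding quotes
                r.get("classification", {}).get("primary_category", ""),
                r["stars"],
                r["language"],
            ])
```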
Automation Options
Option 1: GitHub Actions Workflow
Create .github/workflows/scan-starred-repos.yml:
```yaml
name: Scan Starred Repositories

on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly on Sunday
  workflow_dispatch:     # Manual trigger

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install requests

      - name: Scan starred repositories
        env:
          # Note: the built-in secrets.GITHUB_TOKEN cannot read /user/starred;
          # for real runs, store a personal access token as a repository secret.
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: python scripts/scan_starred_repos.py

      - name: Commit results
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add data/starred-repos.json
          git commit -m "Update starred repositories scan" || echo "No changes to commit"
          git push
```
Option 2: Local Script
Run the scanner manually on your local machine:
```bash
# Install dependencies
pip install requests

# Run the scanner
python scripts/scan_starred_repos.py --output data/starred-repos.json

# Review results
jq '.repositories[0]' data/starred-repos.json
```
Option 3: Interactive Analysis
Use an AI assistant interactively:
- Manually fetch starred repositories list
- Select repositories to analyze
- Use Repository Analyzer prompt for each
- Compile results manually or with simple script
Best Practices
Efficiency
- Batch Processing: Analyze multiple repositories in one session
- Caching: Save intermediate results to avoid re-fetching
- Rate Limiting: Respect GitHub API rate limits (5,000 requests/hour when authenticated); see the sketch after this list
- Incremental Updates: Only analyze new stars since last scan
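For the rate-limiting point above, GitHub reports quota state in X-RateLimit-* response headers, so a request wrapper can sleep through a reset instead of failing. A minimal sketch:

```python
import time

import requests

def get_with_rate_limit(url, **kwargs):
    """GET that waits out a rate-limit reset once before giving up."""
    response = requests.get(url, **kwargs)
    if (response.status_code == 403
            and response.headers.get("X-RateLimit-Remaining") == "0"):
        reset_at = int(response.headers["X-RateLimit-Reset"])  # epoch seconds
        time.sleep(max(reset_at - time.time(), 0) + 1)
        response = requests.get(url, **kwargs)
    return response
```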
Quality
- Verify AI Output: Review AI-generated descriptions for accuracy
- Manual Review: Flag repositories that need human verification
- Consistent Format: Use structured JSON for easy parsing
- Metadata Preservation: Keep original GitHub data alongside AI analysis
Organization
- Categorization: Group repositories by primary category
- Tagging: Use consistent keyword taxonomy
- Priority: Mark high-priority repositories to explore
- Notes: Add personal notes and learning goals
Use Cases
Personal Knowledge Management
- Build a searchable database of tools and libraries
- Track technologies you want to learn
- Document how you’ve used specific repositories
Project Planning
- Identify tools for new projects
- Find alternatives to existing solutions
- Discover integration opportunities
Learning and Development
- Organize learning resources by topic
- Track progress through technologies
- Build curriculum from starred repositories
Team Collaboration
- Share curated tool recommendations
- Document team’s technology stack
- Onboard new team members with organized resources
Troubleshooting
API Rate Limits
Problem: Hitting GitHub API rate limits
Solution: Use authentication, reduce frequency, or implement exponential backoff
Large Repository Sets
Problem: Too many repositories to process
Solution: Filter by language, recency, or star count; process in batches
AI Context Limits
Problem: README too large for AI context
Solution: Extract only key sections (first few paragraphs, features, usage)
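One crude but workable approach: keep only the first few markdown sections, then hard-cap the length. A sketch:

```python
def summarize_readme(readme, max_sections=3, max_chars=4000):
    """Truncate a README to its first few headed sections for AI analysis."""
    lines, seen_headings = [], 0
    for line in readme.splitlines():
        if line.startswith("#"):
            seen_headings += 1
            if seen_headings > max_sections:
                break
        lines.append(line)
    return "\n".join(lines)[:max_chars]
```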
Inconsistent Results
Problem: AI generates different formats
Solution: Use structured prompt with explicit JSON schema requirement
Future Enhancements
Planned improvements for this workflow:
- Advanced Clustering: Group similar repositories using ML
- Trend Analysis: Identify trending topics in your stars
- Duplicate Detection: Find similar/alternative tools
- Quality Scoring: Rank repositories by various metrics
- Export Formats: Additional output formats (HTML, PDF, etc.)
- Web Interface: Browser-based visualization and filtering
Ready to start scanning?
- Choose your method (MCP Server or REST API)
- Prepare your environment
- Run the scanner
- Review and organize results
- Set up automation for regular updates