Automatically generate LLMS.txt files for any website to help Large Language Models better understand and reference your content
LLMS.txt is a proposed standard that is intended to aid in context engineering for website content. The LLMS.txt Generator crawls websites and creates standardized LLMS.txt files following the specification at llmstxt.org.
Automatically detects website structure and content categories for any type of site
Intelligently discovers content sections from URL patterns rather than using hardcoded templates
Generates .txt, .md, and .json versions with comprehensive metadata
Automatically integrates with the master dashboard for visual analysis and reporting
Clone the repository
Install dependencies
Set up API keys (optional)
For AI-enhanced descriptions:
Run your first generation
Customize the tool’s behavior through config.yaml:
Website Settings
Configure target website and basic information:
Generation Settings
Control crawling behavior and section detection:
Content Analysis
Configure AI-powered content analysis:
Output Options
Control output formats and organization:
Website Discovery
Crawls the target website respecting robots.txt and following links systematically
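As a minimal sketch of this step, the standard-library crawler below follows same-host links breadth-first and skips anything robots.txt disallows. It is not the tool's implementation, which lists requests and beautifulsoup4 as dependencies. `crawl`, `LinkCollector`, and the caller-supplied `fetch` function are hypothetical names. The robots.txt rules are passed in as lines so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, robots_lines, user_agent="*", max_pages=100, max_depth=3):
    """Breadth-first crawl of a single host that honors robots.txt rules.

    `fetch(url)` is assumed to return the page HTML as a string, or None on failure.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)  # parse rules from lines instead of downloading them
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue  # keep the page but do not follow its links
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Same-host links only; fragments are stripped so /a#x and /a#y count once.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

In real use, `fetch` would wrap an HTTP client and `robots_lines` would come from the site's /robots.txt.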
Dynamic Section Detection
Analyzes URL patterns to automatically discover content categories and sections
Content Extraction
Extracts titles, descriptions, and metadata from each discovered page
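The extraction step can be approximated with the standard library's html.parser. The sketch below pulls the page title and meta description and falls back to the first `<h1>` when a page has no `<title>`. The class and function names are hypothetical. The tool itself lists beautifulsoup4 for HTML parsing, so its actual logic likely differs.

```python
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Captures <title>, <meta name="description">, and the first <h1>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1 = ""
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "title" and not self.title:
            self._current = "title"
        elif tag == "h1" and not self.h1:
            self._current = "h1"

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <em> within <h1>) is still captured.
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data


def extract_metadata(html):
    """Return a title (falling back to the first <h1>) and a meta description."""
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip() or parser.h1.strip(),
        "description": parser.description,
    }
```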
Intelligent Categorization
Groups pages by detected sections with configurable filtering and limits
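A sketch of the grouping step, assuming each crawled page has already been tagged with a section name. `min_pages_per_section` mirrors the config option named elsewhere in this README. `max_links_per_section` is a hypothetical cap added to illustrate "limits", and the dict-based page shape is also an assumption.

```python
from collections import defaultdict


def group_by_section(pages, min_pages_per_section=2, max_links_per_section=10):
    """Group page dicts by their 'section' key and drop sparse sections.

    Pages without a section are ignored. Each surviving section is sorted by URL
    and truncated to `max_links_per_section` entries (a hypothetical limit).
    """
    groups = defaultdict(list)
    for page in pages:
        if page.get("section"):
            groups[page["section"]].append(page)
    return {
        section: sorted(items, key=lambda p: p["url"])[:max_links_per_section]
        for section, items in sorted(groups.items())
        if len(items) >= min_pages_per_section
    }
```

Raising `min_pages_per_section` yields fewer, denser sections. Lowering it surfaces smaller categories.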
LLMS.txt Generation
Creates structured output following the official LLMS.txt specification
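A rough sketch of rendering, assuming pages have already been grouped by section. It emits the layout described at llmstxt.org: an H1 with the site name, a blockquote summary, then H2 sections holding Markdown link lists with optional notes. The function name and input shape are assumptions. Turning slugs like getting-started into a "Getting Started" heading is an illustrative choice, not documented tool behavior.

```python
def render_llms_txt(name, description, sections):
    """Render a Markdown llms.txt body.

    `sections` maps a section slug to a list of dicts with 'title', 'url',
    and an optional 'description' key (a shape assumed for this sketch).
    """
    lines = [f"# {name}", "", f"> {description}", ""]
    for section, pages in sections.items():
        # Convert a URL slug into a readable H2 heading.
        lines.append(f"## {section.replace('-', ' ').title()}")
        lines.append("")
        for page in pages:
            entry = f"- [{page['title']}]({page['url']})"
            if page.get("description"):
                entry += f": {page['description']}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)
```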
Multi-Format Export
Generates .txt, .md, and .json versions with comprehensive metadata
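A hedged sketch of the export step. The file names (`llms.txt`, `llms.md`, `llms.json`) and the JSON fields are assumptions for illustration, not the tool's documented output schema. In this sketch the .txt and .md files share the same Markdown body, since llms.txt content is itself Markdown.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def export_all(llms_text, sections, out_dir, base_name="llms"):
    """Write .txt, .md, and .json variants of one generation run.

    Returns the sorted file names found in `out_dir` afterwards.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{base_name}.txt").write_text(llms_text, encoding="utf-8")
    (out / f"{base_name}.md").write_text(llms_text, encoding="utf-8")
    metadata = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "section_count": len(sections),
        "page_count": sum(len(pages) for pages in sections.values()),
        "sections": sections,
    }
    (out / f"{base_name}.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sorted(p.name for p in out.iterdir())
```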
Dashboard Integration
Creates dashboard-compatible data for visual analysis and reporting
Unlike template-based approaches, the tool discovers sections directly from your website’s URL structure
Analyzes URL paths to identify natural content groupings and hierarchies
Ignores common URL segments like IDs, pagination, and utility pages
Works with any website type: documentation, e-commerce, blogs, or corporate sites
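The URL-pattern approach described above can be sketched as follows. Each URL is mapped to its first meaningful path segment, skipping ID-like and pagination segments and discarding utility pages. The ignore list and regular expressions are hypothetical stand-ins, not the tool's actual rules.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical rules for illustration only.
UTILITY_SEGMENTS = {"login", "logout", "signup", "search", "cart", "checkout", "tag", "tags"}
PAGINATION = re.compile(r"^(page|p)$|^page-?\d+$", re.IGNORECASE)
ID_LIKE = re.compile(r"^\d+$|^[0-9a-f]{8,}$|^[0-9a-f-]{36}$", re.IGNORECASE)


def section_for(url):
    """Return the first path segment that looks like a content category, or None."""
    for segment in urlparse(url).path.strip("/").split("/"):
        seg = segment.lower()
        if not seg or ID_LIKE.match(seg) or PAGINATION.match(seg):
            continue  # skip empty, numeric/hex IDs, and pagination segments
        if seg in UTILITY_SEGMENTS:
            return None  # utility pages are left out entirely
        return seg
    return None


def detect_sections(urls):
    """Count how many URLs fall under each candidate section."""
    return Counter(s for s in map(section_for, urls) if s)
```

For a shop, URLs such as /shop/shoes/123 and /help/faq map to the sections shop and help without any site-specific template.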
Results are organized by date for easy historical tracking:
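The exact folder layout is not reproduced in this README. The sketch below assumes a hypothetical results/<YYYY-MM-DD>/<domain>/ structure to show how date-based organization keeps successive runs side by side.

```python
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def dated_output_dir(base_dir, site_url, run_date=None):
    """Build a per-run folder such as results/2024-05-01/example.com (hypothetical layout)."""
    run_date = run_date or date.today()
    # Fall back to the raw string if the URL has no scheme/host part.
    domain = urlparse(site_url).netloc or site_url
    return Path(base_dir) / run_date.isoformat() / domain
```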
Two dashboard options: a local tool-specific dashboard and the master multi-tool dashboard
Key metrics, including success rates, pages crawled, and sections discovered
Interactive visualization of discovered content categories and sections
Breakdown of pages by section with detailed statistics
Current settings and parameters used for generation
Website URL to generate LLMS.txt for (optional if using config file)
Configuration file path (default: config.yaml)
Website/project name (overrides auto-detection)
Website/project description (overrides auto-detection)
Maximum number of pages to crawl
Maximum crawl depth
Disable AI-generated descriptions
Output directory for results
Launch dashboard after generation
Launch dashboard without running generation
Enable verbose logging
Show program version number
Result: Sections such as shop, sale, help, and company were detected automatically from URL patterns like /shop/, /sale/, /help/, and /company/
Success Rate: 47% with 7 sections discovered from 100 pages
Processing time scales with website size and complexity
Small Sites (<50 pages)
Medium Sites (50-200 pages): set min_pages_per_section to 3-5
Large Sites (200+ pages): set max_pages to 100-200 to limit scope
Low Success Rate
Symptoms: Less than 20% success rate or very few sections detected
Solutions:
Lower min_pages_per_section in the config
Increase max_depth for deeper crawling
No Pages Found
Symptoms: “No pages crawled” or empty results
Solutions:
Adjust user_agent in the crawling configuration
AI Descriptions Not Working
Symptoms: Generic descriptions despite enabling AI
Solutions:
Check that the API key is set: echo $OPENAI_API_KEY
Use the --no-ai flag to disable AI and test basic functionality
Dashboard Not Loading
Symptoms: Dashboard shows no data or fails to start
Solutions:
Check that dashboard-data.json exists in the results directory
Launch the dashboard on its own: python llmstxtgenerator.py --dashboard-only
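The echo $OPENAI_API_KEY check from the AI troubleshooting steps can also be done from Python without printing the key itself. This is a small sketch. Reading ANTHROPIC_API_KEY is an assumption based on the optional anthropic dependency, and `ai_key_status` is a hypothetical helper, not part of the tool.

```python
import os


def ai_key_status(env=None):
    """Report which AI provider keys are set (non-blank), without exposing their values."""
    env = os.environ if env is None else env
    # ANTHROPIC_API_KEY is assumed here; OPENAI_API_KEY comes from the troubleshooting step.
    return {name: bool(env.get(name, "").strip()) for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")}
```

Calling `ai_key_status()` with no argument inspects the current process environment.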
Control which URLs to include or exclude:
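One way include/exclude filtering could work is sketched below. A path is kept if it matches at least one include pattern (when any are given) and no exclude pattern. Glob-style matching and the `include`/`exclude` parameter names are assumptions for this sketch. Check config.yaml for the tool's real keys and pattern syntax.

```python
from fnmatch import fnmatchcase


def is_allowed(url_path, include=None, exclude=None):
    """Apply hypothetical glob-style include/exclude rules to a URL path.

    fnmatchcase is used so matching is case-sensitive on every OS; note that
    '*' also matches '/' in fnmatch patterns.
    """
    if include and not any(fnmatchcase(url_path, pat) for pat in include):
        return False
    return not any(fnmatchcase(url_path, pat) for pat in (exclude or []))
```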
Fine-tune automatic section detection:
For faster generation without AI descriptions:
Key Python packages (automatically installed):
requests - Web crawling and HTTP requests
beautifulsoup4 - HTML parsing and content extraction
pyyaml - Configuration file handling
openai - AI-powered descriptions (optional)
anthropic - Alternative AI provider (optional)
plotly - Dashboard visualization
pandas - Data processing and analysis
LLMS.txt Generator is part of the larger Airbais AI Tools Suite with a centralized dashboard
Centralized view of all AI tool results at ../dashboard/
New tools are automatically detected and integrated
JSON output compatible with other suite tools
Shared Airbais design system across all tools
We welcome contributions in these key areas:
Improved algorithms for automatic content categorization
Better handling of complex website structures and content types
Additional export formats and integration capabilities
Optimization for large-scale websites and faster processing
Enhanced AI Integration
Better content summarization and automatic description generation
Real-time Updates
Monitor websites for changes and automatically update LLMS.txt files
API Integration
Direct integration with popular CMS platforms and website builders
Multi-language Support
Generate LLMS.txt files for international websites with language detection
Advanced Analytics
Content quality scoring and SEO optimization recommendations
Part of the Airbais AI Tools Suite - Comprehensive tools for AI-powered business intelligence and content optimization.