Overview
Intent Crawler is a Python tool that crawls websites and analyzes the intents your existing site conveys to LLMs and AI agents. Part of the Airbais AI Tools Suite, Intent Crawler extracts content, discovers intents dynamically using multiple ML techniques, and presents results and recommendations on an interactive dashboard. This lets you review what your site is saying to AI and identify what needs adjusting.
Key Features
Intelligent Web Crawling
Respectful crawling with robots.txt compliance, automatic sitemap discovery, and configurable rate limiting
Advanced Intent Discovery
User-focused analysis (default), plus LDA topic modeling, sentence embeddings, and clustering
Modern Dashboard
A web dashboard with all the results from your evaluation. Supports light/dark mode and a responsive layout
Structured Exports
Outputs in llmstxt format and JSON for in-depth LLM evaluation
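The respectful crawling described under Intelligent Web Crawling can be sketched with Python's standard library. This is a minimal illustration, not Intent Crawler's actual code: the robots.txt content, the `example-bot` user agent, and the `RateLimiter` class are hypothetical, and the real tool's crawl behavior is set through its configuration file.

```python
import time
import urllib.robotparser
from typing import List, Optional

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text into a parser that answers can_fetch() queries."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: urllib.robotparser.RobotFileParser,
                 urls: List[str],
                 user_agent: str = "example-bot") -> List[str]:
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


class RateLimiter:
    """Enforce a minimum delay between consecutive requests (assumed design)."""

    def __init__(self, delay_seconds: float) -> None:
        self.delay_seconds = delay_seconds
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return seconds slept."""
        slept = 0.0
        if self._last_request is not None:
            remaining = self.delay_seconds - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_request = time.monotonic()
        return slept


parser = build_parser(ROBOTS_TXT)
candidates = ["https://example.com/about", "https://example.com/private/data"]
print(allowed_urls(parser, candidates))  # ['https://example.com/about']
```

A crawler loop would call `wait()` before each request, using either a configured delay or the value reported by `parser.crawl_delay(user_agent)` when the site declares one.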
Getting Started
Installation
1. Clone the repository
2. Install dependencies
3. Run your first analysis
Quick Examples
- Basic Usage
- With Dashboard
- View Past Results
- List All Results
Configuration
Customize the tool’s behavior through config.yaml:
Crawler Settings
Control how the tool crawls websites:
Intent Extraction
Configure the ML-powered intent discovery:
Output Organization
Configure how results are stored:
How It Works
Intent Discovery Process
1. Content Extraction
Crawls website pages and extracts clean, structured content
2. Text Preprocessing
Removes noise, normalizes text, and prepares it for analysis
3. Feature Extraction
- TF-IDF: Identifies important keywords
- Embeddings: Captures semantic meaning
- N-grams: Detects meaningful phrases
4. Intent Clustering
- LDA: Discovers latent topics
- DBSCAN: Groups semantically similar content
- Keywords: Matches known patterns
5. Intent Merging
Combines similar intents based on a configurable similarity threshold
6. Naming & Scoring
Automatically generates descriptive intent names and confidence scores
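To make the merging step more concrete, the sketch below combines intents whose keyword sets overlap by at least a given threshold. It is an illustration under stated assumptions: Jaccard similarity over keyword sets, the `merge_intents` function, and the sample intents are all hypothetical choices, and the tool itself may measure similarity differently (for example, on embeddings).

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_intents(intents: Dict[str, Set[str]],
                  threshold: float = 0.5) -> List[dict]:
    """Greedily fold each intent into the first group that is similar enough."""
    merged: List[dict] = []
    for name, keywords in intents.items():
        for group in merged:
            if jaccard(group["keywords"], keywords) >= threshold:
                group["names"].append(name)
                group["keywords"] |= keywords
                break
        else:
            # No existing group was similar enough, so start a new one.
            merged.append({"names": [name], "keywords": set(keywords)})
    return merged


# Hypothetical intents with their keyword sets.
sample = {
    "pricing_info": {"price", "plan", "cost"},
    "plan_costs": {"plan", "cost", "billing"},
    "contact_support": {"contact", "support", "email"},
}

for group in merge_intents(sample, threshold=0.5):
    print(group["names"], sorted(group["keywords"]))
```

With this sample, the two pricing-related intents share two of four distinct keywords (similarity 0.5) and are merged, while the support intent stays separate. Raising the threshold makes merging stricter.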
ML Techniques Explained
LDA Topic Modeling
Discovers latent topics across all content with configurable topic counts
Embedding Clustering
Uses sentence transformers and DBSCAN for semantic understanding
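The sketch below shows how DBSCAN groups nearby vectors and marks outliers as noise. It is a simplified, dependency-free illustration: the three-dimensional vectors stand in for real sentence embeddings, and the `dbscan` function, its `eps` and `min_samples` values, and the use of cosine distance are assumptions made for the example rather than the tool's actual settings.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points))
            if cosine_distance(points[i], points[j]) <= eps]


def dbscan(points: Sequence[Sequence[float]],
           eps: float = 0.1,
           min_samples: int = 2) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep expanding
    return [label if label is not None else NOISE for label in labels]


# Toy vectors standing in for sentence embeddings of five page snippets.
vectors = [
    [0.9, 0.1, 0.0],  # "pricing plans"
    [0.8, 0.2, 0.0],  # "subscription cost"
    [0.0, 0.1, 0.9],  # "contact support"
    [0.1, 0.0, 0.9],  # "email help desk"
    [0.0, 1.0, 0.0],  # unrelated snippet
]
print(dbscan(vectors, eps=0.1, min_samples=2))  # [0, 0, 1, 1, -1]
```

Because DBSCAN does not need the number of clusters in advance, it suits intent discovery where the count of intents is unknown; the trade-off is sensitivity to the distance threshold and minimum cluster size.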
Keyword Fallback
Configurable keywords ensure baseline intent detection
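A keyword fallback of this kind can be approximated with simple token matching. In the sketch below, the `FALLBACK_KEYWORDS` map and the `match_fallback_intents` function are hypothetical; the real keywords come from the tool's configuration, and its matching rules may differ.

```python
import re
from typing import Dict, List

# Hypothetical keyword map; the actual keywords are defined in configuration.
FALLBACK_KEYWORDS: Dict[str, List[str]] = {
    "pricing": ["price", "pricing", "cost", "plan"],
    "support": ["help", "support", "contact"],
}


def match_fallback_intents(text: str,
                           keyword_map: Dict[str, List[str]] = FALLBACK_KEYWORDS
                           ) -> Dict[str, int]:
    """Count exact keyword hits per intent; intents with no hits are omitted."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    hits: Dict[str, int] = {}
    for intent, keywords in keyword_map.items():
        count = sum(tokens.count(keyword) for keyword in keywords)
        if count:
            hits[intent] = count
    return hits


print(match_fallback_intents("Compare our pricing plans and contact support for help."))
# {'pricing': 1, 'support': 3}
```

Note that exact token matching treats "plans" and "plan" as different words; a fuller implementation would likely stem or lemmatize tokens first.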
Output Structure
Results are organized by date for easy historical tracking:
Example Output
Dashboard Features
Two dashboard options are available: a local, tool-specific dashboard and a master, multi-tool dashboard
Modern Design System
Airbais Design
Professional orange/gray color scheme with Inter font family
Light/Dark Mode
Toggle themes with persistent user preferences
Responsive Layout
Adapts to desktop and mobile screen sizes
Fast Performance
Optimized loading and smooth interactions
Dashboard sections: Overview Stats, Interactive Charts, Dashboard Options, Export Features
Overview Stats
- Total Pages: Number of pages analyzed
- Discovered Intents: Count of unique user intents
- Site Sections: Structural breakdown of the website
- Confidence Indicators: Visual quality scores
Command Line Reference
The website URL to analyze
Path to custom configuration file
Override default output directory
Set logging level: DEBUG, INFO, WARNING, ERROR
Launch dashboard after analysis completes
View existing results without running analysis
View results from specific date (YYYY-MM-DD)
List all available result dates
Performance Guidelines
Processing time increases with site size and enabled ML features
Small Sites (<100 pages)
All features work well with default settings
Medium Sites (100-500 pages)
Consider reducing LDA topics for faster processing
Large Sites (500-1000 pages)
You may need to disable embeddings or increase the rate limit delay
Troubleshooting
No intents discovered
- Check minimum cluster size in config
- Ensure content has sufficient text
- Try enabling fallback keywords
Slow processing
- Reduce number of LDA topics
- Disable embedding extraction
- Increase rate limit delay
Dashboard not loading
- Check if results exist in date folder
- Verify dashboard-data.json is present
- Check console for port conflicts
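The dashboard checks above can be scripted. The following sketch assumes that results live in one folder per date (YYYY-MM-DD) under a results root and that dashboard-data.json sits directly inside each date folder; the root path, the helper names, and the port number you test are hypothetical and should be adjusted to your setup.

```python
import re
import socket
from pathlib import Path
from typing import List

DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def list_result_dates(results_root: Path) -> List[str]:
    """Return the names of date-formatted folders under the results root."""
    if not results_root.is_dir():
        return []
    return sorted(p.name for p in results_root.iterdir()
                  if p.is_dir() and DATE_DIR.match(p.name))


def has_dashboard_data(results_root: Path, date: str) -> bool:
    """Check that dashboard-data.json exists in the given date folder."""
    return (results_root / date / "dashboard-data.json").is_file()


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind the port; failure suggests another process is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def diagnose(results_root: Path) -> List[str]:
    """Summarize which date folders are missing their dashboard data file."""
    dates = list_result_dates(results_root)
    if not dates:
        return [f"No date folders found under {results_root}"]
    return [f"{date}: dashboard-data.json "
            f"{'found' if has_dashboard_data(results_root, date) else 'MISSING'}"
            for date in dates]
```

Calling `diagnose(Path("results"))` (with your actual output directory) prints one status line per date, and `port_is_free(...)` can be run against whichever port the dashboard reports in the console.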
Requirements
System Requirements
- Python 3.8+
- See requirements.txt for the full dependency list
- Optional: GPU for faster embeddings processing
AI Tools Suite Integration
Intent Crawler is part of the larger Airbais AI Tools Suite, which provides a centralized dashboard
Master Dashboard
Centralized view of all AI tool results at ../dashboard/
Auto-Discovery
New tools are automatically detected and integrated
Standard Format
JSON output compatible with other suite tools
Consistent Design
Shared Airbais design system across all tools
Master Dashboard Benefits
- Multi-Tool View: See results from all AI tools in one interface
- Tool Selection: Dropdown to choose between different analysis tools
- Date Selection: Browse historical results across all tools
- Future-Ready: Architecture designed for easy tool addition
Future Roadmap
1. Multi-language Support
Expand beyond English content analysis
2. Real-time Tracking
Monitor intent changes over time with the master dashboard
3. A/B Testing
Compare intents across different site versions
4. Suite Expansion
Add sentiment analysis, performance monitoring, and SEO tools
5. API Access
Programmatic access to all suite tools through unified API
Contributing
We welcome contributions in these key areas:
Algorithms
Additional clustering algorithms and ML techniques
Visualization
Enhanced dashboard features and data visualization
Performance
Optimization for large-scale websites
Integrations
CMS plugins and third-party tool connections