Changelog
All notable changes to LLM Answer Watcher will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Planned
- Additional browser runners (Claude, Gemini web UIs)
- Enhanced cost tracking for browser runners
- DeepEval integration for quality metrics
- Trends command for historical analysis
[0.2.0] - 2025-11-08
Added - Major Features
- Browser Runners (BETA): Steel API integration for web-based LLM interfaces
  - ChatGPT web UI runner with session management
  - Perplexity web UI runner with citation extraction
  - Screenshot capture and HTML snapshot support
  - Session reuse for cost optimization
  - Plugin system for extensible browser automation
  - See the Browser Runners Guide for details
- Async/Await Parallelization: 3-4x performance improvement
  - Parallel query execution across models
  - Async progress callbacks
  - RuntimeWarning fixes for async operations
- Google Search Grounding: Enhanced Gemini model support
  - Google Search grounding for Gemini models
  - Accurate web search cost calculation
  - Grounded responses with citations
- Post-Intent Operations: Dynamic workflow support
  - Configurable operations to run after each intent
  - Operation models with validation
  - Config filename tracking in reports
  - Model capability detection
- Advanced Analysis Features:
  - Sentiment Analysis: Analyze the tone (positive/neutral/negative) and context of each brand mention
  - Intent Classification: Classify user queries by intent type, buyer journey stage, and urgency signals
    - Intent types: transactional, informational, navigational, commercial_investigation
    - Buyer stages: awareness, consideration, decision
    - Urgency signals: high, medium, low
    - Confidence scoring and reasoning explanations
  - Brand visibility score in reports
  - HTML report filtering and web search badges
- Documentation Expansion:
  - Comprehensive MkDocs documentation with Material theme (60+ pages)
  - Browser runners guide with Steel integration
  - Google Search grounding documentation
  - 44 example configurations across 8 directories
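The parallelization entry can be illustrated with `asyncio.gather`. Each (model, query) pair becomes a coroutine, and a semaphore caps how many run at once. This is a minimal sketch, not the project's runner. The names `fake_model_call` and `run_all` and the concurrency limit are hypothetical, and the sleep stands in for a real provider request.

```python
import asyncio
import time


async def fake_model_call(model: str, query: str, latency: float = 0.2) -> str:
    # Hypothetical stand-in for a provider API call; the sleep simulates network latency.
    await asyncio.sleep(latency)
    return f"{model} answered: {query}"


async def run_all(models: list[str], queries: list[str], max_concurrency: int = 8) -> list[str]:
    # Cap in-flight requests; a semaphore is one common way to respect provider rate limits.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(model: str, query: str) -> str:
        async with semaphore:
            return await fake_model_call(model, query)

    # One coroutine per (model, query) pair; gather runs them concurrently
    # and returns results in the same order the coroutines were passed in.
    return await asyncio.gather(*(bounded(m, q) for m in models for q in queries))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_all(["model-a", "model-b"], ["q1", "q2"]))
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```

In this sketch, wall-clock time follows the slowest call in each concurrent batch rather than the sum of all calls. The real speedup depends on provider latency and rate limits.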
Added - Database & Storage
- New database tables and columns for sentiment and intent data:
  - `mentions` table: `sentiment` and `mention_context` columns
  - `intent_classifications` table with query hash caching
  - 5 new indexes for filtering by sentiment, context, intent type, buyer stage, and urgency
- SQLite schema version 5 (migration support included)
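The sketch below shows one way a query-hash cache for intent classification could work, using the standard-library `sqlite3` module. The names `mentions`, `sentiment`, `mention_context`, and `intent_classifications` come from the entry above. Everything else is an assumption for illustration and not the project's actual schema version 5: the other columns, the index names, the whitespace/case normalization, SHA-256 as the hash, and the `classify_intent_cached` helper.

```python
import hashlib
import sqlite3
from typing import Callable

# Hypothetical schema; only the table names and the sentiment/mention_context columns come from the changelog.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mentions (
    id INTEGER PRIMARY KEY,
    brand TEXT NOT NULL,
    sentiment TEXT,            -- e.g. positive / neutral / negative
    mention_context TEXT
);
CREATE TABLE IF NOT EXISTS intent_classifications (
    query_hash TEXT PRIMARY KEY,
    intent_type TEXT NOT NULL,
    buyer_stage TEXT NOT NULL,
    urgency TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_mentions_sentiment ON mentions(sentiment);
CREATE INDEX IF NOT EXISTS idx_intent_type ON intent_classifications(intent_type);
"""


def query_hash(query: str) -> str:
    # Normalize case and whitespace so trivially different phrasings share one cache entry.
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_intent_cached(
    conn: sqlite3.Connection,
    query: str,
    classify: Callable[[str], tuple[str, str, str]],
) -> tuple[str, str, str]:
    key = query_hash(query)
    row = conn.execute(
        "SELECT intent_type, buyer_stage, urgency FROM intent_classifications WHERE query_hash = ?",
        (key,),
    ).fetchone()
    if row is not None:
        return tuple(row)  # cache hit: no LLM call, so no extra cost
    result = classify(query)  # cache miss: pay for exactly one classification call
    conn.execute(
        "INSERT INTO intent_classifications (query_hash, intent_type, buyer_stage, urgency) "
        "VALUES (?, ?, ?, ?)",
        (key, *result),
    )
    conn.commit()
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)

    def demo_classifier(q: str) -> tuple[str, str, str]:
        # Hypothetical classifier standing in for the LLM call.
        return ("commercial_investigation", "consideration", "medium")

    print(classify_intent_cached(conn, "Best CRM tools?", demo_classifier))
```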
Added - Configuration
- Configuration options:
  - `enable_sentiment_analysis` and `enable_intent_classification` (both default `true`)
- Runner plugin configuration system
- Browser runner specific settings (Steel API, screenshots, sessions)
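Both new flags default to `true`. A configuration that omits them therefore gets both analyses, along with their extra cost. The sketch below reproduces that default behavior with a standard-library dataclass. The project itself validates configuration with Pydantic. The `AnalysisSettings` class and its `from_mapping` helper are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AnalysisSettings:
    # Hypothetical container; both flags default to True, as stated in the v0.2.0 entry.
    enable_sentiment_analysis: bool = True
    enable_intent_classification: bool = True

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "AnalysisSettings":
        known = {f.name for f in fields(cls)}
        unknown = set(data) - known
        if unknown:
            # Reject typos such as "enable_sentiment" instead of silently ignoring them.
            raise ValueError(f"unknown settings: {sorted(unknown)}")
        for key, value in data.items():
            if not isinstance(value, bool):
                raise TypeError(f"{key} must be a boolean, got {type(value).__name__}")
        return cls(**data)


if __name__ == "__main__":
    print(AnalysisSettings.from_mapping({}))  # both analyses enabled
    print(AnalysisSettings.from_mapping({"enable_intent_classification": False}))
```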
Changed
- Breaking: Configuration format updated to support runner plugins
- Improved test coverage to 100% for core modules
- Enhanced error messages for better debugging
- Function calling extraction schema expanded with sentiment/context fields
- Corrected the Responses API request format (includes the required `type` field)
- Improved validation, error handling, and config validation
Fixed
- Database schema mismatches and exception handling in CLI
- Rank display in HTML reports (now shows actual rank positions instead of match positions)
- GPT-4.1 model support in OpenAI client
- Issues raised in code review (validation, error handling, config)
- RuntimeWarnings for async operations
- Indentation error in the runner loop that prevented all models from being processed
Cost Impact
- Intent classification: ~$0.00012 per query (charged once per unique query, then cached)
- Sentiment extraction: ~33% higher cost per extraction call (sentiment fields are added to the existing function-calling request)
- Browser runners: $0.10-0.30/hour via Steel (not yet tracked in cost estimates)
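The following back-of-the-envelope sketch shows how these figures combine. The per-query intent cost ($0.00012), the ~33% sentiment overhead, and the $0.10-0.30/hour Steel range come from the entries above. The function name, its parameters, and the sample workload are hypothetical. Real costs depend on model pricing.

```python
def estimate_v020_overhead(
    unique_queries: int,
    base_extraction_cost: float,
    browser_hours: float = 0.0,
    intent_cost_per_query: float = 0.00012,
    sentiment_overhead: float = 0.33,
    steel_rate_per_hour: tuple[float, float] = (0.10, 0.30),
) -> dict[str, float]:
    # Intent classification is cached per unique query, so repeated runs of a query add nothing.
    intent = unique_queries * intent_cost_per_query
    # Sentiment fields ride along in the existing extraction call, raising its cost by ~33%.
    sentiment = base_extraction_cost * sentiment_overhead
    low, high = steel_rate_per_hour
    return {
        "intent_classification": intent,
        "sentiment_extraction": sentiment,
        "browser_low": browser_hours * low,
        "browser_high": browser_hours * high,
    }


if __name__ == "__main__":
    print(estimate_v020_overhead(unique_queries=500, base_extraction_cost=0.50, browser_hours=2))
```

Take 500 unique queries, $0.50 of baseline extraction spend, and 2 browser hours. This gives about $0.06 for intent classification, $0.165 for sentiment, and $0.20-0.60 for Steel. The browser figure is for budgeting only; in v0.2.0 the tool itself reports $0.00 for browser runners.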
Known Limitations (v0.2.0)
- Browser runner cost tracking returns $0.00 (placeholder - actual Steel costs not calculated)
- Browser runners are BETA quality (added Nov 8, 2025)
- CSS selectors for browser runners may break if web UIs change
- No authentication handling documented for ChatGPT login
- Response completion detection is heuristic-based
[0.1.0] - 2025-11-05
Added
- Initial release of LLM Answer Watcher
- Multi-provider support: OpenAI, Anthropic, Mistral, X.AI Grok, Google Gemini, Perplexity
- Brand mention detection with word-boundary matching
- Rank extraction (pattern-based and LLM-assisted)
- SQLite database for historical tracking
- HTML report generation with Jinja2
- Dual-mode CLI (human-friendly Rich output, structured JSON for automation)
- Budget controls and cost estimation
- Dynamic pricing from llm-prices.com with 24-hour caching
- Web search cost calculation for OpenAI models
- Retry logic with exponential backoff
- Evaluation framework for extraction accuracy
- Configuration validation with Pydantic
- Exit codes for automation (0-4)
- Example configurations
- Comprehensive test suite (750+ tests)
- GitHub Actions CI/CD pipeline
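Word-boundary matching is what stops a brand such as "Hub" from matching inside "HubSpot". Below is a minimal regex sketch that assumes case-insensitive matching. The `find_mentions` helper is hypothetical. It uses lookarounds rather than a plain `\b`, so names that start or end with punctuation (for example "C++") still match.

```python
import re


def find_mentions(text: str, brands: list[str]) -> dict[str, list[int]]:
    """Return the start offsets of each whole-word brand mention in text."""
    results: dict[str, list[int]] = {}
    for brand in brands:
        # re.escape neutralizes regex metacharacters in names like "C++".
        # (?<!\w) and (?!\w) behave like \b but also work when the name
        # begins or ends with a non-word character.
        pattern = re.compile(rf"(?<!\w){re.escape(brand)}(?!\w)", re.IGNORECASE)
        results[brand] = [m.start() for m in pattern.finditer(text)]
    return results


if __name__ == "__main__":
    text = "HubSpot beats Hub? We also like hub and C++ tooling."
    print(find_mentions(text, ["Hub", "C++"]))  # {'Hub': [14, 32], 'C++': [40]}
```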
Core Modules
- `config/`: YAML loading and Pydantic validation
- `llm_runner/`: Multi-provider LLM client abstraction
- `extractor/`: Brand mention detection and rank extraction
- `storage/`: SQLite schema and JSON writers
- `report/`: HTML report generation
- `utils/`: Time utilities, logging, cost estimation, Rich console
- `evals/`: Evaluation framework
Supported Models
- OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
- Anthropic: claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus
- Mistral: mistral-large-latest, mistral-small-latest
- X.AI: grok-beta, grok-2-1212, grok-2-latest, grok-3
- Google: gemini-2.0-flash-exp, gemini-1.5-pro, gemini-1.5-flash
- Perplexity: sonar, sonar-pro, sonar-reasoning
Documentation
- README with quick start and examples
- CLAUDE.md with development guidelines
- CONTRIBUTING.md with contribution guidelines
- SPECS.md with complete engineering specification
- TODO.md with milestone tracking
Security
- Environment variable-based API key management
- SQL injection prevention (parameterized queries)
- XSS prevention (Jinja2 autoescaping)
- No API key logging
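The SQL injection item refers to parameterized queries. Values are passed separately from the SQL text, so the driver never interprets them as SQL. Below is a standard-library `sqlite3` illustration. The `mentions` table layout here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (brand TEXT NOT NULL, rank INTEGER)")

# A hostile value that would break out of a string literal if formatted into the SQL text.
brand = "Acme'); DROP TABLE mentions; --"

# Unsafe (do not do this): f"INSERT INTO mentions VALUES ('{brand}', 1)"
# Safe: '?' placeholders make sqlite3 bind the value as data, never as SQL.
conn.execute("INSERT INTO mentions (brand, rank) VALUES (?, ?)", (brand, 1))

rows = conn.execute("SELECT brand, rank FROM mentions WHERE brand = ?", (brand,)).fetchall()
print(rows)  # the table still exists and stores the value as a literal string
```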
Release Notes
Version 0.1.0 - Production Ready
This is the first production-ready release of LLM Answer Watcher. The tool is feature-complete for core brand monitoring use cases:
Highlights:
- ✅ 8,200 lines of production Python code
- ✅ 17,400 lines of test code (750+ tests)
- ✅ 100% coverage on critical paths
- ✅ 6 LLM providers supported
- ✅ Complete evaluation framework
- ✅ Full documentation
What's Working:
- All core features tested and validated
- Multi-provider queries with retry logic
- Accurate brand mention detection (90%+ precision)
- Historical tracking in SQLite
- Professional HTML reports
- Budget protection
- CI/CD integration
Known Limitations (v0.1.0; some resolved in v0.2.0):
- No async support (intentional, to keep things simple); async support added in v0.2.0
- Web search only for OpenAI models; Google Search grounding added in v0.2.0
- Perplexity request fees not yet in cost estimates
- Trends command not yet implemented (data collection works)
Upgrade Notes:
- This is the initial release; no upgrades needed
- SQLite schema version 1
- Configuration format stable
Future Roadmap
Planned Features
v0.2.0 - ✅ RELEASED 2025-11-08:
- ✅ Async support for parallel queries (3-4x faster)
- ✅ Enhanced web search support (Google Search grounding)
- ✅ Browser runners (BETA)
- ⏳ trends command for historical analysis (moved to v0.3.0)
- ⏳ Dashboard UI for visualizing trends (moved to v0.3.0)
- ⏳ DeepEval integration for quality metrics (moved to v0.3.0)
v0.3.0 (Q1 2025):
- trends command for historical analysis
- Dashboard UI for visualizing trends
- DeepEval integration for quality metrics
- Production-ready browser runners (cost tracking, authentication)
- Additional browser runners (Claude, Gemini web UIs)
- Cloud deployment option
- HTTP API (expose internal contract)
- Real-time alerts and webhooks
- Advanced analytics and insights
- Multi-user support
v1.0.0 (Q3 2025):
- Enterprise features
- Advanced provider integrations
- Custom model support
- White-label options
- SaaS offering
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Links
- Repository: github.com/nibzard/llm-answer-watcher
- Issues: github.com/nibzard/llm-answer-watcher/issues
- Documentation: This site
- License: MIT