Changelog

All notable changes to LLM Answer Watcher will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Planned

  • Additional browser runners (Claude, Gemini web UIs)
  • Enhanced cost tracking for browser runners
  • DeepEval integration for quality metrics
  • Trends command for historical analysis

[0.2.0] - 2025-11-08

Added - Major Features

  • 🌐 Browser Runners (BETA): Steel API integration for web-based LLM interfaces
    • ChatGPT web UI runner with session management
    • Perplexity web UI runner with citation extraction
    • Screenshot capture and HTML snapshot support
    • Session reuse for cost optimization
    • Plugin system for extensible browser automation
    • See Browser Runners Guide for details

  • ⚡ Async/Await Parallelization: 3-4x performance improvement
    • Parallel query execution across models
    • Async progress callbacks
    • RuntimeWarning fixes for async operations

  • 🔍 Google Search Grounding: Enhanced Gemini model support
    • Google Search grounding for Gemini models
    • Accurate web search cost calculation
    • Grounded responses with citations

  • 🎯 Post-Intent Operations: Dynamic workflow support
    • Configurable operations to run after each intent
    • Operation models with validation
    • Config filename tracking in reports
    • Model capability detection

  • 📊 Advanced Analysis Features:
    • Sentiment Analysis: Analyze tone (positive/neutral/negative) and context of each brand mention
    • Intent Classification: Classify user queries by intent type, buyer journey stage, and urgency signals
      • Intent types: transactional, informational, navigational, commercial_investigation
      • Buyer stages: awareness, consideration, decision
      • Urgency signals: high, medium, low
      • Confidence scoring and reasoning explanations
    • Brand visibility score in reports
    • HTML report filtering and web search badges

  • 📚 Documentation Expansion:
    • Comprehensive MkDocs documentation with Material theme (60+ pages)
    • Browser runners guide with Steel integration
    • Google Search grounding documentation
    • 44 example configurations across 8 directories
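
The parallel query execution and async progress callbacks listed under Async/Await Parallelization can be pictured with a minimal asyncio sketch. This is an illustration only, not the project's actual runner: `query_model`, `run_all`, and the callback signature are hypothetical names, and the provider call is simulated with a short sleep.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical stand-in for a provider call; a real runner would make an HTTP request.
async def query_model(model: str, prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{model}: answer to {prompt!r}"

# Assumed callback shape: (model name, completed count, total count).
ProgressCallback = Callable[[str, int, int], Awaitable[None]]

async def run_all(
    models: List[str],
    prompt: str,
    on_progress: Optional[ProgressCallback] = None,
) -> Dict[str, str]:
    """Query every model concurrently and report progress as each one finishes."""
    results: Dict[str, str] = {}
    done = 0

    async def run_one(model: str) -> None:
        nonlocal done
        results[model] = await query_model(model, prompt)
        done += 1
        if on_progress is not None:
            await on_progress(model, done, len(models))

    # gather() schedules all coroutines at once, so total wall time is close to
    # the slowest single call rather than the sum of all calls.
    await asyncio.gather(*(run_one(m) for m in models))
    return results
```

Calling `asyncio.run(run_all(["model-a", "model-b"], "best CRM tools"))` returns one answer per model. The speedup from this pattern depends on how many models are queried and on provider latency.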

Added - Database & Storage

  • New database tables and columns for sentiment and intent data
  • mentions table: sentiment and mention_context columns
  • intent_classifications table with query hash caching
  • 5 new indexes for filtering by sentiment, context, intent type, buyer stage, and urgency
  • SQLite schema version 5 (migration support included)
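
As a rough sketch of how query hash caching against an intent_classifications table might work: the column names other than sentiment and mention_context, the index names, the SHA-256 hash, and the strip-and-lowercase normalization are all assumptions made for illustration, not the project's actual schema.

```python
import hashlib
import sqlite3

# Illustrative schema only; the real tables contain more columns and indexes.
SCHEMA = """
CREATE TABLE mentions (
    id INTEGER PRIMARY KEY,
    brand TEXT NOT NULL,
    sentiment TEXT,
    mention_context TEXT
);
CREATE TABLE intent_classifications (
    query_hash TEXT PRIMARY KEY,
    intent_type TEXT NOT NULL,
    buyer_stage TEXT NOT NULL,
    urgency TEXT NOT NULL
);
CREATE INDEX idx_mentions_sentiment ON mentions (sentiment);
CREATE INDEX idx_intent_type ON intent_classifications (intent_type);
"""

def query_hash(query: str) -> str:
    # Assumption: normalize before hashing so trivially different queries share a cache entry.
    return hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()

def get_or_classify(conn, query, classify):
    """Return the cached classification, calling classify() only on a cache miss."""
    key = query_hash(query)
    row = conn.execute(
        "SELECT intent_type, buyer_stage, urgency "
        "FROM intent_classifications WHERE query_hash = ?",
        (key,),
    ).fetchone()
    if row is not None:
        return row
    result = classify(query)  # expected to return (intent_type, buyer_stage, urgency)
    conn.execute(
        "INSERT INTO intent_classifications VALUES (?, ?, ?, ?)",
        (key, *result),
    )
    return result
```

With a cache like this, the classification cost is paid once per unique query, and repeated runs read the stored row instead.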

Added - Configuration

  • Configuration options: enable_sentiment_analysis and enable_intent_classification (both default to true)
  • Runner plugin configuration system
  • Browser runner specific settings (Steel API, screenshots, sessions)

Changed

  • Breaking: Configuration format updated to support runner plugins
  • Improved test coverage to 100% for core modules
  • Enhanced error messages for better debugging
  • Function calling extraction schema expanded with sentiment/context fields
  • Corrected Responses API format to include the required type field
  • Improved validation, error handling, and config validation

Fixed

  • Database schema mismatches and exception handling in CLI
  • Rank display in HTML reports (now shows actual rank positions instead of match positions)
  • GPT-4.1 model support in OpenAI client
  • Code review findings (validation, error handling, config)
  • RuntimeWarnings for async operations
  • Indentation error in the runner loop, so that all models are now processed

Cost Impact

  • Intent classification: ~$0.00012 per query (one-time per unique query, cached)
  • Sentiment extraction: ~33% increase per extraction call (integrated into function calling)
  • Browser runners: $0.10-0.30/hour via Steel (not yet tracked in cost estimates)
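
To make the figures above concrete, here is a small back-of-the-envelope calculator. The function name, its parameters, and the base extraction cost are hypothetical; only the three rates (~$0.00012 per unique query, ~33% per extraction call, $0.10-0.30 per browser hour) come from the list above.

```python
from typing import Dict

INTENT_COST_PER_UNIQUE_QUERY = 0.00012  # USD, charged once per unique query (cached after)
SENTIMENT_OVERHEAD = 0.33               # ~33% added to each extraction call
STEEL_HOURLY_RANGE = (0.10, 0.30)       # USD per browser hour, not tracked by the tool

def estimate_overheads(
    unique_queries: int,
    extraction_calls: int,
    base_extraction_cost: float,  # hypothetical per-call cost before sentiment fields
    browser_hours: float = 0.0,
) -> Dict[str, float]:
    """Estimate the extra cost introduced by the v0.2.0 features."""
    low, high = STEEL_HOURLY_RANGE
    return {
        "intent": unique_queries * INTENT_COST_PER_UNIQUE_QUERY,
        "sentiment_extra": extraction_calls * base_extraction_cost * SENTIMENT_OVERHEAD,
        "browser_low": browser_hours * low,
        "browser_high": browser_hours * high,
    }
```

For example, `estimate_overheads(100, 300, 0.001, browser_hours=2.0)` models 100 unique queries, 300 extraction calls at an assumed $0.001 each, and two hours of browser sessions. The browser figures are a range because Steel usage is not yet included in the tool's own cost estimates.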

Known Limitations (v0.2.0)

  • Browser runner cost tracking returns $0.00 (placeholder - actual Steel costs not calculated)
  • Browser runners are BETA quality (added Nov 8, 2025)
  • CSS selectors for browser runners may break if web UIs change
  • No authentication handling documented for ChatGPT login
  • Response completion detection is heuristic-based

[0.1.0] - 2025-11-05

Added

  • Initial release of LLM Answer Watcher
  • Multi-provider support: OpenAI, Anthropic, Mistral, X.AI Grok, Google Gemini, Perplexity
  • Brand mention detection with word-boundary matching
  • Rank extraction (pattern-based and LLM-assisted)
  • SQLite database for historical tracking
  • HTML report generation with Jinja2
  • Dual-mode CLI (human-friendly Rich output, structured JSON for automation)
  • Budget controls and cost estimation
  • Dynamic pricing from llm-prices.com with 24-hour caching
  • Web search cost calculation for OpenAI models
  • Retry logic with exponential backoff
  • Evaluation framework for extraction accuracy
  • Configuration validation with Pydantic
  • Exit codes for automation (0-4)
  • Example configurations
  • Comprehensive test suite (750+ tests)
  • GitHub Actions CI/CD pipeline
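
Word-boundary matching, as named in the brand mention detection item above, can be sketched with the standard `re` module. This is an assumption-laden illustration rather than the project's extractor: `find_mentions` is a hypothetical helper, case-insensitive matching is assumed, and lookarounds are used in place of `\b` so that brand names ending in punctuation still match.

```python
import re
from typing import Dict, List

def find_mentions(text: str, brands: List[str]) -> Dict[str, int]:
    """Count whole-word occurrences of each brand, ignoring case."""
    counts: Dict[str, int] = {}
    for brand in brands:
        # (?<!\w) and (?!\w) require that the brand is not glued to other word
        # characters, so "Hub" does not match inside "HubSpot".
        pattern = re.compile(
            rf"(?<!\w){re.escape(brand)}(?!\w)",
            re.IGNORECASE,
        )
        counts[brand] = len(pattern.findall(text))
    return counts
```

Plain substring search would count "Hub" three times in "HubSpot and Hub are tools; HubSpotter is not.", while the boundary-aware pattern counts only the standalone occurrence.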

Core Modules

  • config/: YAML loading and Pydantic validation
  • llm_runner/: Multi-provider LLM client abstraction
  • extractor/: Brand mention detection and rank extraction
  • storage/: SQLite schema and JSON writers
  • report/: HTML report generation
  • utils/: Time utilities, logging, cost estimation, Rich console
  • evals/: Evaluation framework

Supported Models

  • OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
  • Anthropic: claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus
  • Mistral: mistral-large-latest, mistral-small-latest
  • X.AI: grok-beta, grok-2-1212, grok-2-latest, grok-3
  • Google: gemini-2.0-flash-exp, gemini-1.5-pro, gemini-1.5-flash
  • Perplexity: sonar, sonar-pro, sonar-reasoning

Documentation

  • README with quick start and examples
  • CLAUDE.md with development guidelines
  • CONTRIBUTING.md with contribution guidelines
  • SPECS.md with complete engineering specification
  • TODO.md with milestone tracking

Security

  • Environment variable-based API key management
  • SQL injection prevention (parameterized queries)
  • XSS prevention (Jinja2 autoescaping)
  • No API key logging
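
The SQL injection item above refers to parameterized queries. A minimal sketch with the standard `sqlite3` module follows; the table layout and the `mentions_for` helper are hypothetical and exist only to show the placeholder style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (brand TEXT, rank_position INTEGER)")
conn.execute("INSERT INTO mentions VALUES (?, ?)", ("Acme", 1))

def mentions_for(conn: sqlite3.Connection, brand: str) -> list:
    # The ? placeholder sends the value separately from the SQL text, so
    # quotes inside `brand` are treated as data and never parsed as SQL.
    return conn.execute(
        "SELECT brand, rank_position FROM mentions WHERE brand = ?",
        (brand,),
    ).fetchall()
```

A hostile input such as `x' OR '1'='1` simply matches no rows here, whereas string concatenation would have turned it into a condition that matches every row.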

Release Notes

Version 0.1.0 - Production Ready

This is the first production-ready release of LLM Answer Watcher. The tool is feature-complete for core brand monitoring use cases:

Highlights:

  • ✅ 8,200 lines of production Python code
  • ✅ 17,400 lines of test code (750+ tests)
  • ✅ 100% coverage on critical paths
  • ✅ 6 LLM providers supported
  • ✅ Complete evaluation framework
  • ✅ Full documentation

What's Working:

  • All core features tested and validated
  • Multi-provider queries with retry logic
  • Accurate brand mention detection (90%+ precision)
  • Historical tracking in SQLite
  • Professional HTML reports
  • Budget protection
  • CI/CD integration

Known Limitations (v0.1.0, partly resolved in v0.2.0):

  • No async support (intentional, to keep it simple); added in v0.2.0
  • Web search only for OpenAI models; Google Search grounding added in v0.2.0
  • Perplexity request fees not yet in cost estimates
  • Trends command not yet implemented (data collection works)

Upgrade Notes:

  • This is the initial release, so no upgrades are needed
  • SQLite schema version 1
  • Configuration format stable

Future Roadmap

Planned Features

v0.2.0 - ✅ RELEASED 2025-11-08:

  • ✅ Async support for parallel queries (3-4x faster)
  • ✅ Enhanced web search support (Google Search grounding)
  • ✅ Browser runners (BETA)
  • ⏳ trends command for historical analysis (moved to v0.3.0)
  • ⏳ Dashboard UI for visualizing trends (moved to v0.3.0)
  • ⏳ DeepEval integration for quality metrics (moved to v0.3.0)

v0.3.0 (Q1 2025):

  • trends command for historical analysis
  • Dashboard UI for visualizing trends
  • DeepEval integration for quality metrics
  • Production-ready browser runners (cost tracking, authentication)
  • Additional browser runners (Claude, Gemini web UIs)
  • Cloud deployment option
  • HTTP API (expose internal contract)
  • Real-time alerts and webhooks
  • Advanced analytics and insights
  • Multi-user support

v1.0.0 (Q3 2025):

  • Enterprise features
  • Advanced provider integrations
  • Custom model support
  • White-label options
  • SaaS offering

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.