# LLM Answer Watcher

> Monitor how Large Language Models talk about your brand versus competitors in buyer-intent queries

Production-ready CLI tool that monitors how large language models mention brands versus competitors in buyer-intent queries.

Key Features:

- Multi-provider support: OpenAI, Anthropic, Mistral, Grok, Google Gemini, Perplexity
- Local-first SQLite storage with historical tracking
- Dual-mode CLI: Beautiful Rich output for humans, structured JSON for AI agents
- BYOK model: Bring Your Own API Keys
- Word-boundary brand detection to prevent false positives
- Cost management and budget controls
- HTML reports with Jinja2
- Evaluation framework with metrics

# Quick Start

# Installation

This guide covers all installation methods for LLM Answer Watcher.

## System Requirements

### Python Version

LLM Answer Watcher requires **Python 3.12 or 3.13**. It uses modern Python features including:

- Native union type syntax (`|` instead of `Union`)
- Improved type hints
- Performance optimizations

Check your Python version:

```bash
python3 --version
# Should output: Python 3.12.x or Python 3.13.x
```

### Installing Python 3.12+

**macOS (Homebrew):**

```bash
# Using Homebrew
brew install python@3.12

# Verify installation
python3.12 --version
```

**Ubuntu (deadsnakes PPA):**

```bash
# Add deadsnakes PPA
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update

# Install Python 3.12
sudo apt install python3.12 python3.12-venv python3.12-dev

# Verify installation
python3.12 --version
```

**Windows:**

Download Python 3.12 from [python.org](https://www.python.org/downloads/).

During installation:

- ✅ Check "Add Python to PATH"
- ✅ Check "Install pip"

Verify installation:

```text
python --version
```

## Installation Methods

### Method 1: uv (Recommended)

[uv](https://github.com/astral-sh/uv) is a fast, modern Python package installer written in Rust. It's significantly faster than pip and handles virtual environments automatically.
#### Install uv ```bash curl -LsSf https://astral.sh/uv/install.sh | sh ``` ```powershell powershell -c "irm https://astral.sh/uv/install.ps1 | iex" ``` ```bash pip install uv ``` #### Install LLM Answer Watcher ```bash # Clone the repository git clone https://github.com/nibzard/llm-answer-watcher.git cd llm-answer-watcher # Install all dependencies (creates .venv automatically) uv sync # For development with extra dependencies uv sync --dev ``` #### Activate Virtual Environment uv creates a `.venv` directory automatically. You can optionally activate it: ```bash # macOS/Linux source .venv/bin/activate # Windows .venv\Scripts\activate ``` uv handles activation automatically When you run `uv run llm-answer-watcher`, uv automatically uses the virtual environment. Explicit activation is optional. ### Method 2: pip Traditional pip installation with manual virtual environment management. ```bash # Clone the repository git clone https://github.com/nibzard/llm-answer-watcher.git cd llm-answer-watcher # Create virtual environment python3.12 -m venv .venv # Activate virtual environment source .venv/bin/activate # macOS/Linux # or .venv\Scripts\activate # Windows # Install package in editable mode pip install -e . # For development with extra dependencies pip install -e ".[dev]" ``` ### Method 3: PyPI (Coming Soon) Once published to PyPI, you'll be able to install directly: ```bash # Future installation method pip install llm-answer-watcher ``` ## Verify Installation Check that the installation was successful: ```bash llm-answer-watcher --version ``` You should see output like: ```text llm-answer-watcher version 0.1.0 ``` Test the CLI help: ```bash llm-answer-watcher --help ``` ## API Keys Setup LLM Answer Watcher requires API keys from LLM providers. You need at least one provider configured. 
### Supported Providers

| Provider            | Environment Variable | Get API Key                                                              |
| ------------------- | -------------------- | ------------------------------------------------------------------------ |
| **OpenAI**          | `OPENAI_API_KEY`     | [platform.openai.com](https://platform.openai.com/api-keys)              |
| **Anthropic**       | `ANTHROPIC_API_KEY`  | [console.anthropic.com](https://console.anthropic.com/)                  |
| **Mistral**         | `MISTRAL_API_KEY`    | [console.mistral.ai](https://console.mistral.ai/)                        |
| **X.AI (Grok)**     | `XAI_API_KEY`        | [x.ai/api](https://x.ai/api)                                             |
| **Google (Gemini)** | `GOOGLE_API_KEY`     | [aistudio.google.com](https://aistudio.google.com/app/apikey)            |
| **Perplexity**      | `PERPLEXITY_API_KEY` | [www.perplexity.ai/settings/api](https://www.perplexity.ai/settings/api) |

### Setting API Keys

#### Temporary (Current Session)

```bash
export OPENAI_API_KEY=sk-your-key-here
export ANTHROPIC_API_KEY=sk-ant-your-key-here
```

#### Persistent (`.env` file)

Create a `.env` file in your project directory:

```bash
# .env file
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
MISTRAL_API_KEY=mistral-your-key
XAI_API_KEY=xai-your-grok-key
GOOGLE_API_KEY=AIza-your-google-key
PERPLEXITY_API_KEY=pplx-your-perplexity-key
```

Load the file before running. Because the file contains plain `NAME=value` lines without `export`, enable auto-export while sourcing it so the CLI process can see the variables:

```bash
set -a          # export every variable defined from here on
source .env
set +a          # restore the default behavior
```

**Security: Never commit API keys.** Add `.env` to `.gitignore`:

```bash
echo ".env" >> .gitignore
```

#### Using direnv (Recommended for Development)

[direnv](https://direnv.net/) automatically loads `.env` when you enter the directory:

```bash
# Install direnv
brew install direnv # macOS
# or
sudo apt install direnv # Ubuntu/Debian

# Create .envrc file (dotenv is direnv's built-in loader for .env files)
echo 'dotenv' > .envrc

# Allow direnv to load it
direnv allow
```

Now your keys load automatically when you `cd` into the directory.
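The environment variable names in the Supported Providers table can be checked from Python before a run. The following is a minimal sketch and not part of LLM Answer Watcher itself: the helper name `configured_providers` is hypothetical, and it only tests whether each variable is set and non-empty, not whether the key is valid.

```python
import os

# Environment variable names taken from the Supported Providers table.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "grok": "XAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "perplexity": "PERPLEXITY_API_KEY",
}


def configured_providers(env=None):
    """Return providers whose API key variable is set and non-empty.

    Hypothetical helper for illustration; it does not verify the keys.
    """
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Keys found for:", ", ".join(found))
    else:
        print("No provider API keys are set; export at least one.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to test without touching the real environment.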
## Development Dependencies If you're contributing or want to run tests: ```bash # With uv uv sync --dev # With pip pip install -e ".[dev]" ``` This installs additional tools: - **pytest** - Test runner - **pytest-httpx** - HTTP mocking for tests - **pytest-cov** - Coverage reporting - **pytest-mock** - Advanced mocking - **freezegun** - Time mocking - **ruff** - Fast Python linter and formatter - **mkdocs** - Documentation builder - **mkdocs-material** - Material theme for MkDocs ## Docker Installation (Optional) For containerized deployment: ```dockerfile # Dockerfile FROM python:3.12-slim WORKDIR /app # Install uv RUN pip install uv # Copy project files COPY . . # Install dependencies RUN uv sync # Set entrypoint ENTRYPOINT ["uv", "run", "llm-answer-watcher"] ``` Build and run: ```bash docker build -t llm-answer-watcher . docker run -e OPENAI_API_KEY=$OPENAI_API_KEY \ -v $(pwd)/output:/app/output \ llm-answer-watcher run --config config.yaml ``` ## Troubleshooting ### Python Version Issues If you have multiple Python versions: ```bash # Use specific Python version python3.12 -m venv .venv source .venv/bin/activate python --version # Verify it's 3.12.x ``` ### Permission Errors If you get permission errors during installation: ```bash # Don't use sudo with pip in virtual environments # Instead, ensure your virtual environment is activated source .venv/bin/activate pip install -e . ``` ### SSL Certificate Errors On macOS, you might need to install certificates: ```bash /Applications/Python\ 3.12/Install\ Certificates.command ``` ### Module Not Found Errors If you get `ModuleNotFoundError` after installation: ```bash # Ensure you're in the virtual environment which python # Should point to .venv/bin/python # Re-install the package pip install -e . ``` ### uv Installation Issues If `uv sync` fails: ```bash # Try updating uv pip install --upgrade uv # Or fall back to pip pip install -e . ``` ## Next Steps Now that LLM Answer Watcher is installed: 1. 
[Run your first monitoring job](../first-run/) 1. [Learn about configuration](../basic-configuration/) 1. [Explore supported providers](../../providers/overview/) ## Uninstallation To remove LLM Answer Watcher: ```bash # Remove the package pip uninstall llm-answer-watcher # Remove the virtual environment rm -rf .venv # Remove output data (optional) rm -rf output/ ``` # Quick Start Get LLM Answer Watcher up and running in 5 minutes. ## Prerequisites Before you begin, ensure you have: - **Python 3.12 or 3.13** installed - **API keys** for at least one LLM provider (OpenAI recommended for getting started) - **Basic terminal knowledge** ## Installation ### Option 1: Using uv (Recommended) [uv](https://github.com/astral-sh/uv) is a fast Python package installer and resolver. ```bash # Clone the repository git clone https://github.com/nibzard/llm-answer-watcher.git cd llm-answer-watcher # Install dependencies uv sync # Activate virtual environment (optional, uv handles this automatically) source .venv/bin/activate ``` ### Option 2: Using pip ```bash # Clone the repository git clone https://github.com/nibzard/llm-answer-watcher.git cd llm-answer-watcher # Create virtual environment python3.12 -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate # Install in development mode pip install -e . ``` ## Set Up API Keys LLM Answer Watcher uses environment variables for API keys. 
Set up at least one:

```bash
# OpenAI (recommended for getting started)
export OPENAI_API_KEY=sk-your-openai-key-here

# Optional: Add more providers
export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
export MISTRAL_API_KEY=mistral-your-key-here
export XAI_API_KEY=xai-your-grok-key-here
export GOOGLE_API_KEY=AIza-your-google-api-key-here
export PERPLEXITY_API_KEY=pplx-your-perplexity-key-here
```

**Persistent API Keys:** Create a `.env` file to persist your keys:

```bash
echo "OPENAI_API_KEY=sk-your-key" > .env
set -a; source .env; set +a  # export every variable defined in .env
```

Add `.env` to your `.gitignore` to avoid accidentally committing secrets!

## Your First Run

LLM Answer Watcher includes example configurations you can use immediately.

### 1. Choose an Example Config

The repository includes organized example configs in the `examples/` directory:

- **Quick Start**: `examples/01-quickstart/minimal.config.yaml` - Simplest possible config (1 provider, 1 intent)
- **Explained**: `examples/01-quickstart/explained.config.yaml` - Same config with detailed comments
- **Multi-Provider**: `examples/02-providers/multi-provider-comparison.config.yaml` - Compare all 6 providers

Start with the minimal example.

### 2. Run the Tool

```bash
llm-answer-watcher run --config examples/01-quickstart/minimal.config.yaml
```

### 3. View the Output

You'll see a progress display like the sample below. The sample references `examples/watcher.config.yaml` and two queries, so the path and counts will differ for the minimal example:

```text
🔍 Running LLM Answer Watcher...
├── Configuration loaded from examples/watcher.config.yaml
├── Query 1/2: "What are the best email warmup tools?"
├── Query 2/2: "Compare the top email warmup tools"
├── Models: OpenAI gpt-4o-mini
├── Brands: 2 monitored, 4 competitors
└── Output: ./output/2025-11-05T14-30-00Z/

✅ Queries completed: 2/2
💰 Total cost: $0.0042
📊 Report: ./output/2025-11-05T14-30-00Z/report.html
```

### 4. Explore Results

Open the HTML report in your browser:

```bash
open ./output/2025-11-05T14-30-00Z/report.html
# Or on Linux:
xdg-open ./output/2025-11-05T14-30-00Z/report.html
```

The report shows:

- **Summary**: Total costs, queries completed, brands found
- **Brand Mentions**: Which brands appeared in each response
- **Rank Distribution**: Visual charts of ranking positions
- **Raw Responses**: Full LLM outputs for inspection

## Understanding the Output

Each run creates a timestamped directory with:

```text
output/2025-11-05T14-30-00Z/
├── run_meta.json            # Run summary and stats
├── report.html              # Interactive HTML report
├── intent_*_raw_*.json      # Raw LLM responses
├── intent_*_parsed_*.json   # Extracted brand mentions
└── intent_*_error_*.json    # Error details (if any)
```

All data is also stored in a SQLite database at `./output/watcher.db` for historical analysis.

## What's Next?

Now that you've run your first monitoring job, here are suggested next steps:

### Create Your Own Configuration

Create `my-watcher.config.yaml`:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
  competitors:
    - "CompetitorA"
    - "CompetitorB"
    - "IndustryTool"

intents:
  - id: "best-tools-in-category"
    prompt: "What are the best [your category] tools?"
  - id: "comparison-query"
    prompt: "Compare the top [your category] tools"
```

Then run:

```bash
llm-answer-watcher run --config my-watcher.config.yaml
```

### Explore More Features

- **Configuration Deep Dive**: Learn about all configuration options. [Configuration Guide →](../../user-guide/configuration/overview/)
- **Multiple Providers**: Add Anthropic, Mistral, Grok, and more. [Provider Guide →](../../providers/overview/)
- **Query Your Data**: Use SQL to analyze historical trends. [Data Analytics →](../../data-analytics/sqlite-database/)
- **Automate Monitoring**: Set up scheduled runs with cron or GitHub Actions. [Automation Guide →](../../user-guide/usage/automation/)

## Common Issues

### "Command not found: llm-answer-watcher"

Make sure you've activated your virtual environment:

```bash
source .venv/bin/activate # On macOS/Linux
# or
.venv\Scripts\activate # On Windows
```

### "Configuration error: API key not found"

Ensure your API keys are exported:

```bash
echo $OPENAI_API_KEY # Should print your key
```

If empty, export it:

```bash
export OPENAI_API_KEY=sk-your-key-here
```

### "ImportError: No module named 'llm_answer_watcher'"

Re-install the package:

```bash
pip install -e .
``` ## Explore More Examples The `examples/` directory is organized by use case: - **[01-quickstart/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart)** - Minimal examples for first-time users - **[02-providers/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers)** - All 6 LLM providers (OpenAI, Anthropic, Google, Mistral, Grok, Perplexity) - **[03-web-search/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/03-web-search)** - Real-time web search integration - **[04-extraction/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/04-extraction)** - Brand extraction methods (regex, function calling, hybrid) - **[05-operations/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/05-operations)** - Automated analysis and insights - **[06-advanced/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced)** - Budget controls, high concurrency, production configs - **[07-real-world/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world)** - Complete use case templates Each directory includes a README with detailed explanations. ## Getting Help - **Documentation**: Browse this site for comprehensive guides - **Examples**: Check the `examples/` directory in the repository - **Issues**: [Report bugs or ask questions](https://github.com/nibzard/llm-answer-watcher/issues) - **Contributing**: [Read the contributing guide](../../contributing/development-setup/) ______________________________________________________________________ Ready to dive deeper? Continue to the [Installation Guide](../installation/) for more installation options. # Your First Run This guide walks you through running LLM Answer Watcher for the first time and understanding the results. 
## Before You Start

Ensure you have:

- ✅ Installed LLM Answer Watcher ([Installation Guide](../installation/))
- ✅ Set up at least one API key
- ✅ Activated your virtual environment

## Step 1: Verify Installation

Check that everything is working:

```bash
# Verify the CLI is available
llm-answer-watcher --version

# Check help documentation
llm-answer-watcher --help
```

## Step 2: Validate Example Configuration

Before running, validate the configuration file:

```bash
llm-answer-watcher validate --config examples/watcher.config.yaml
```

Expected output:

```text
✅ Configuration valid
├── Models: 1 configured (openai gpt-4o-mini)
├── Brands: 2 mine, 4 competitors
├── Intents: 2 queries
└── Estimated cost: $0.004
```

If validation fails, you'll see specific error messages about what needs to be fixed.

## Step 3: Run Your First Monitoring Job

Execute a monitoring run:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml
```

### What Happens During a Run

#### 1. Configuration Loading

```text
🔍 Loading configuration from examples/watcher.config.yaml...
├── ✅ YAML syntax valid
├── ✅ Schema validation passed
├── ✅ API keys found
└── ✅ Output directory accessible
```

#### 2. Cost Estimation

```text
💰 Estimated cost breakdown:
├── OpenAI gpt-4o-mini: $0.002 × 2 intents = $0.004
└── Total estimated cost: $0.004

Continue with this run? [Y/n]:
```

Press `Y` to continue, or `n` to abort.

**Skip confirmation prompts:** Use the `--yes` flag to auto-confirm in automated scripts:

```bash
llm-answer-watcher run --config config.yaml --yes
```

#### 3. Query Execution

You'll see progress for each query:

```text
📤 Query 1/2: "What are the best email warmup tools?"
├── Provider: OpenAI (gpt-4o-mini)
├── Sending request... ⏳
├── ✅ Response received (1.2s)
├── Tokens: 145 input, 387 output
├── Cost: $0.002
└── Brands detected: 3 found (Lemwarm, Instantly, HubSpot)

📤 Query 2/2: "Compare the top email warmup tools"
├── Provider: OpenAI (gpt-4o-mini)
├── Sending request... ⏳
├── ✅ Response received (1.4s)
├── Tokens: 152 input, 421 output
├── Cost: $0.002
└── Brands detected: 4 found (Lemwarm, Lemlist, Instantly, Apollo.io)
```

#### 4. Results Summary

```text
✅ Run completed successfully!

📊 Summary:
├── Run ID: 2025-11-05T14-30-00Z
├── Queries: 2/2 completed (100%)
├── Total cost: $0.004
├── Brands found: 5 unique
├── Your brands mentioned: 2/2 queries
├── Competitor mentions: 4/2 queries
└── Output: ./output/2025-11-05T14-30-00Z/

📁 Artifacts created:
├── report.html - Interactive HTML report
├── run_meta.json - Run summary and metadata
├── *.raw.json - Raw LLM responses
├── *.parsed.json - Extracted brand mentions
└── watcher.db - Historical SQLite database
```

## Step 4: Explore the Results

### HTML Report

Open the interactive report:

```bash
# macOS
open ./output/2025-11-05T14-30-00Z/report.html

# Linux
xdg-open ./output/2025-11-05T14-30-00Z/report.html

# Windows
start ./output/2025-11-05T14-30-00Z/report.html
```

The report contains:

#### Summary Section

- Total cost breakdown
- Queries completed vs failed
- Unique brands detected
- Your brand mention rate

#### Brand Mentions Table

| Intent                  | Model       | Your Brand                 | Competitors                    | Rank |
| ----------------------- | ----------- | -------------------------- | ------------------------------ | ---- |
| best-email-warmup-tools | gpt-4o-mini | Lemwarm (#1)               | Instantly (#2), HubSpot (#3)   | 1    |
| email-warmup-comparison | gpt-4o-mini | Lemwarm (#1), Lemlist (#2) | Instantly (#3), Apollo.io (#4) | 1    |

#### Rank Distribution Chart

Visual representation of where your brand appears in ranked lists.
#### Historical Trends If you've run multiple times, you'll see trend charts showing: - Brand mention frequency over time - Average ranking position changes - Competitor appearance patterns #### Raw Responses Expandable sections showing the full LLM response for each query. ### JSON Artifacts Each run creates structured JSON files: #### `run_meta.json` Summary of the entire run: ```json { "run_id": "2025-11-05T14-30-00Z", "timestamp_utc": "2025-11-05T14:30:00Z", "config_path": "examples/watcher.config.yaml", "total_cost_usd": 0.004, "queries_completed": 2, "queries_failed": 0, "brands_detected": { "mine": ["Lemwarm", "Lemlist"], "competitors": ["Instantly", "HubSpot", "Apollo.io"] } } ``` #### `intent_*_raw_*.json` Raw LLM response with metadata: ```json { "intent_id": "best-email-warmup-tools", "provider": "openai", "model_name": "gpt-4o-mini", "prompt": "What are the best email warmup tools?", "answer_text": "Here are the best email warmup tools:\n\n1. Lemwarm...", "tokens_used": 532, "cost_usd": 0.002, "timestamp_utc": "2025-11-05T14:30:00Z" } ``` #### `intent_*_parsed_*.json` Extracted brand mentions and ranks: ```json { "intent_id": "best-email-warmup-tools", "provider": "openai", "model_name": "gpt-4o-mini", "brands_found": { "mine": [ { "brand": "Lemwarm", "normalized": "lemwarm", "rank_position": 1, "context": "1. Lemwarm - Best for automated warmup" } ], "competitors": [ { "brand": "Instantly", "normalized": "instantly", "rank_position": 2, "context": "2. 
Instantly - Great deliverability features"
      }
    ]
  }
}
```

### SQLite Database

All data is stored in `./output/watcher.db` for historical tracking:

```bash
# Open the database in an interactive shell
sqlite3 ./output/watcher.db

# Or pass a query directly, for example to view recent runs
sqlite3 ./output/watcher.db \
  "SELECT run_id, timestamp_utc, total_cost_usd, queries_completed
   FROM runs
   ORDER BY timestamp_utc DESC
   LIMIT 5;"
```

## Step 5: Run with Different Modes

### Agent Mode (Structured JSON Output)

Perfect for automation and AI agents:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml --format json
```

Output:

```json
{
  "run_id": "2025-11-05T14-30-00Z",
  "status": "success",
  "queries_completed": 2,
  "queries_failed": 0,
  "total_cost_usd": 0.004,
  "output_dir": "./output/2025-11-05T14-30-00Z",
  "brands_detected": {
    "mine": ["Lemwarm", "Lemlist"],
    "competitors": ["Instantly", "HubSpot", "Apollo.io"]
  }
}
```

### Quiet Mode (Minimal Output)

For scripts and pipelines:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml --quiet
```

Output (tab-separated):

```text
2025-11-05T14-30-00Z success 2 0.004 ./output/2025-11-05T14-30-00Z
```

### Automation Mode (No Prompts)

Skip confirmation prompts:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml --yes --format json
```

## Understanding Exit Codes

LLM Answer Watcher uses exit codes for automation:

```bash
llm-answer-watcher run --config config.yaml
echo $? # Print exit code
```

| Exit Code | Meaning             | When It Happens                            |
| --------- | ------------------- | ------------------------------------------ |
| **0**     | Success             | All queries completed successfully         |
| **1**     | Configuration Error | Invalid YAML, missing API keys, bad schema |
| **2**     | Database Error      | Cannot create/access SQLite database       |
| **3**     | Partial Failure     | Some queries failed, but run completed     |
| **4**     | Complete Failure    | No queries succeeded                       |

Use in scripts:

```bash
#!/bin/bash
llm-answer-watcher run --config config.yaml --yes

case $? in
  0) echo "✅ Success!" ;;
  1) echo "❌ Configuration error" && exit 1 ;;
  2) echo "❌ Database error" && exit 1 ;;
  3) echo "⚠️ Partial failure" ;;
  4) echo "❌ Complete failure" && exit 1 ;;
esac
```

## Common First-Run Issues

### Issue: "API key not found"

**Solution**: Ensure API keys are exported:

```bash
echo $OPENAI_API_KEY # Should print your key
export OPENAI_API_KEY=sk-your-key-here
```

### Issue: "Permission denied: ./output/"

**Solution**: Create output directory with correct permissions:

```bash
mkdir -p output
chmod 755 output
```

### Issue: "No brands detected"

**Possible causes**:

1. **Brand name mismatch**: LLM used different name (e.g., "HubSpot CRM" vs "HubSpot")
1. **Not mentioned**: Brand wasn't included in LLM response
1. **Word boundary issue**: Brand name contains special characters

**Solution**: Check raw response and add brand aliases:

```yaml
brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
    - "YourBrand CRM" # Add variations
```

### Issue: "Rate limit exceeded"

**Solution**: The LLM API rate limit was hit.
Wait and retry, or add retry configuration:

```yaml
run_settings:
  retry_max_attempts: 5
  retry_wait_exponential_multiplier: 2
```

## Next Steps

Now that you've completed your first run:

- **Customize Configuration**: Create your own config with your brands and intents. [Basic Configuration →](../basic-configuration/)
- **Query Your Data**: Use SQL to analyze results and track trends. [Data Analytics →](../../data-analytics/sqlite-database/)
- **Add More Providers**: Compare results across OpenAI, Anthropic, Mistral, and more. [Provider Guide →](../../providers/overview/)
- **Automate Runs**: Set up scheduled monitoring with cron or GitHub Actions. [Automation →](../../user-guide/usage/automation/)

# Basic Configuration

Learn how to create your first custom configuration file for LLM Answer Watcher.

## Configuration File Structure

LLM Answer Watcher uses YAML configuration files with three main sections:

```yaml
run_settings:
  # How and where to run

brands:
  # What brands to monitor

intents:
  # What questions to ask
```

## Minimal Configuration

The simplest possible configuration:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine:
    - "YourBrand"
  competitors:
    - "CompetitorA"
    - "CompetitorB"

intents:
  - id: "best-tools"
    prompt: "What are the best [your category] tools?"
``` This configuration: - Uses OpenAI's `gpt-4o-mini` (cost-effective) - Monitors 1 brand vs 2 competitors - Asks 1 intent question - Stores results in `./output/` ## Run Settings Section ### Basic Run Settings ```yaml run_settings: # Where to store output files output_dir: "./output" # SQLite database path for historical tracking sqlite_db_path: "./output/watcher.db" # Use LLM for rank extraction (more accurate but costs more) use_llm_rank_extraction: false ``` ### Model Configuration Define which LLM providers and models to use: ```yaml run_settings: models: # OpenAI configuration - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # Anthropic configuration - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" ``` **Key Points:** - **provider**: Must be one of: `openai`, `anthropic`, `mistral`, `grok`, `google`, `perplexity` - **model_name**: Specific model identifier (see [Provider Guide](../../providers/overview/)) - **env_api_key**: Environment variable name containing your API key Model Selection Start with cost-effective models: - **OpenAI**: `gpt-4o-mini` ($0.15/1M input tokens) - **Anthropic**: `claude-3-5-haiku-20241022` ($0.80/1M input tokens) - **Mistral**: `mistral-small-latest` ($0.20/1M input tokens) - **Grok**: `grok-2-1212` ($2.00/1M input tokens) - **Google**: `gemini-2.0-flash-exp` (free tier available) ### Optional System Prompts Customize the system prompt for each model: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" system_prompt: "openai/gpt-4-default" # Uses built-in prompt ``` If not specified, uses the provider's default prompt. 
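To put the input-token prices listed under Model Selection in perspective, the arithmetic can be sketched as below. This is an illustration only: the function name `input_cost_usd` is hypothetical, the prices are the ones quoted in that list and may be out of date, and output tokens (billed separately and not listed there) are ignored, so real costs will be higher.

```python
# Input-token prices in USD per 1M tokens, as quoted in the Model Selection list.
INPUT_PRICE_PER_MILLION_USD = {
    "gpt-4o-mini": 0.15,
    "claude-3-5-haiku-20241022": 0.80,
    "mistral-small-latest": 0.20,
    "grok-2-1212": 2.00,
}


def input_cost_usd(model_name, input_tokens):
    """Estimate the input-side cost of one request (output tokens excluded)."""
    price = INPUT_PRICE_PER_MILLION_USD[model_name]
    return input_tokens * price / 1_000_000


if __name__ == "__main__":
    # 150 input tokens is an arbitrary example size for a short prompt.
    for model in INPUT_PRICE_PER_MILLION_USD:
        print(f"{model}: ${input_cost_usd(model, 150):.6f}")
```

For short buyer-intent prompts the input side is a small fraction of a cent per query on any of these models; the `validate` command's estimate remains the better guide to what a full run will cost.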
## Brands Section ### Your Brands Define all variations of your brand name: ```yaml brands: mine: - "YourBrand" - "YourBrand.io" - "YourBrand CRM" - "yourbrand.com" ``` **Why multiple aliases?** LLMs might reference your brand differently: - "HubSpot" vs "HubSpot CRM" - "Lemwarm" vs "Lemwarm.io" - Domain names: "acme.com" Word Boundary Matching Brands are matched using word boundaries. "Hub" will NOT match in "GitHub". Add specific variations if needed. ### Competitors List all competitors to track: ```yaml brands: competitors: - "CompetitorA" - "CompetitorB" - "IndustryTool" - "Alternative.io" - "BigPlayer CRM" ``` **Tips:** - Include direct competitors (same category) - Include indirect competitors (adjacent use cases) - Use specific names, not generic terms - Add variations for well-known competitors ### Complete Brands Example ```yaml brands: mine: - "Lemwarm" - "Lemwarm.io" - "Lemlist" - "Lemlist.com" competitors: - "Instantly" - "Instantly.ai" - "Warmbox" - "Warmbox.ai" - "MailReach" - "HubSpot" - "Apollo.io" - "Woodpecker" ``` ## Intents Section Intents are the questions you want to ask LLMs. ### Basic Intent ```yaml intents: - id: "best-tools" prompt: "What are the best email warmup tools?" ``` **Intent Structure:** - **id**: Unique identifier (used in filenames and database) - **prompt**: The exact question to ask the LLM ### Multiple Intents Test different question types: ```yaml intents: # Direct question - id: "best-email-warmup-tools" prompt: "What are the best email warmup tools?" # Comparison query - id: "comparison-warmup-tools" prompt: "Compare the top email warmup tools for improving deliverability" # Specific use case - id: "cold-outreach-tools" prompt: "Which email warmup tools are best for cold outreach campaigns?" # Alternative phrasing - id: "recommended-warmup-services" prompt: "What email warmup services do you recommend for startups?" 
```

### Intent ID Best Practices

Use descriptive, URL-safe IDs:

✅ Good IDs:

- `best-crm-tools`
- `email-automation-comparison`
- `startup-friendly-options`

❌ Avoid:

- `query1` (not descriptive)
- `best CRM tools` (spaces)
- `what's-best?` (special characters)

### Crafting Effective Prompts

**Good prompts are:**

1. **Natural**: How a real user would ask
1. **Specific**: Target a particular use case or category
1. **Open-ended**: Allow for varied responses
1. **Buyer-intent**: Imply readiness to evaluate/purchase

Examples:

```yaml
intents:
  # ✅ Good: Natural buyer-intent query
  - id: "saas-analytics-tools"
    prompt: "What are the best analytics tools for SaaS companies?"

  # ✅ Good: Specific use case
  - id: "startup-crm-budget"
    prompt: "Which CRM is best for startups on a tight budget?"

  # ❌ Too broad
  - id: "software"
    prompt: "Tell me about software"

  # ❌ Not buyer-intent
  - id: "history"
    prompt: "What is the history of CRM software?"
```

## Complete Basic Configuration Example

Here's a complete, production-ready configuration:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    # Cost-effective model for regular monitoring
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    # High-quality model for comparison
    - provider: "anthropic"
      model_name: "claude-3-5-haiku-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

  # Use regex-based extraction (faster, cheaper)
  use_llm_rank_extraction: false

brands:
  mine:
    - "Lemwarm"
    - "Lemwarm.io"
    - "Lemlist"
    - "lemlist.com"
  competitors:
    - "Instantly"
    - "Instantly.ai"
    - "Warmbox"
    - "Warmbox.ai"
    - "MailReach"
    - "HubSpot"
    - "Apollo.io"
    - "Woodpecker"

intents:
  # Direct question
  - id: "best-email-warmup-tools"
    prompt: "What are the best email warmup tools?"
  # Comparison query
  - id: "warmup-tools-comparison"
    prompt: "Compare the top email warmup tools for improving email deliverability"

  # Use case specific
  - id: "cold-outreach-warmup"
    prompt: "Which email warmup tools are best for cold outreach campaigns?"

  # Budget-conscious
  - id: "affordable-warmup-tools"
    prompt: "What are the most affordable email warmup tools for startups?"
```

## Testing Your Configuration

Always validate before running:

```bash
llm-answer-watcher validate --config my-config.yaml
```

Expected output:

```text
✅ Configuration valid
├── Models: 2 configured
│   ├── openai: gpt-4o-mini
│   └── anthropic: claude-3-5-haiku-20241022
├── Brands: 4 mine, 8 competitors
├── Intents: 4 queries
└── Estimated cost: $0.016 (8 queries total)
```

## Next Steps

Now that you understand basic configuration:

- **Advanced Configuration**: Budget controls, web search, custom operations. [Configuration Guide →](../../user-guide/configuration/overview/)
- **Run Your Config**: Execute monitoring with your custom configuration. [First Run →](../first-run/)
- **Add More Providers**: Learn about Mistral, Grok, Google, Perplexity. [Providers →](../../providers/overview/)
- **See Examples**: Browse organized configuration examples. [Quickstart Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart) | [All Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples)

# Configuration

# Configuration Overview

LLM Answer Watcher uses a YAML configuration file to control all aspects of monitoring: which LLMs to query, which brands to track, what questions to ask, and how to manage costs.
## Configuration Structure A complete configuration file has these main sections: ```yaml run_settings: # Output paths, models, and run behavior extraction_settings: # Optional: advanced extraction configuration brands: # Your brand and competitors to track intents: # Questions to ask LLMs global_operations: # Optional: operations run for every intent ``` ## Quick Start Example Here's a minimal configuration to get started: ```yaml run_settings: output_dir: "./output" sqlite_db_path: "./output/watcher.db" models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" brands: mine: - "MyBrand" - "MyBrand.io" competitors: - "CompetitorA" - "CompetitorB" intents: - id: "best-tools" prompt: "What are the best tools for [your category]?" ``` Environment Variables Set your API keys as environment variables before running: ```bash export OPENAI_API_KEY=sk-your-key-here export ANTHROPIC_API_KEY=sk-ant-your-key-here ``` ## Configuration Sections Explained ### Run Settings Controls where output is stored, which models to query, and runtime behavior. **Key fields:** - `output_dir`: Directory for run results (JSON files, HTML reports) - `sqlite_db_path`: SQLite database path for historical tracking - `models`: List of LLM models to query (see [Model Configuration](../models/)) - `use_llm_rank_extraction`: Use LLM to extract rankings (slower, more accurate) - `budget`: Optional cost controls (see [Budget Configuration](../budget/)) **Example:** ```yaml run_settings: output_dir: "./output" sqlite_db_path: "./output/watcher.db" models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" use_llm_rank_extraction: false budget: enabled: true max_per_run_usd: 1.00 warn_threshold_usd: 0.50 ``` ### Extraction Settings (Optional) Advanced configuration for brand mention and rank extraction using function calling. 
**Key fields:** - `extraction_model`: Dedicated model for extraction (faster, cheaper than main models) - `method`: Extraction method (`function_calling`, `regex`, or `hybrid`) - `fallback_to_regex`: Fall back to regex if function calling fails - `min_confidence`: Minimum confidence threshold (0.0-1.0) **Example:** ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" system_prompt: "openai/extraction-default" method: "function_calling" fallback_to_regex: true min_confidence: 0.7 ``` When to Use Extraction Settings Use extraction settings when: - Regex extraction misses complex brand mentions - You need higher accuracy for ranking positions - You want to extract additional structured data Skip it when: - You're optimizing for cost (regex is free) - Your brand names are simple and unambiguous - You're running frequent monitoring jobs ### Brands Defines which brands to track in LLM responses. **Two categories:** 1. **mine**: Your brand aliases (at least one required) 1. **competitors**: Competitor brands to monitor **Example:** ```yaml brands: mine: - "Warmly" - "Warmly.io" - "Warmly AI" competitors: - "Instantly" - "Lemwarm" - "HubSpot" - "Apollo.io" - "Woodpecker" ``` Brand Alias Best Practices - Include all variations (with/without TLD, with/without product name) - Use word-boundary matching to avoid false positives - Add common misspellings if relevant - Keep list focused (10-20 competitors maximum) See [Brand Configuration](../brands/) for detailed strategies. ### Intents Questions you want to ask LLMs to test brand visibility. **Key fields:** - `id`: Unique identifier (alphanumeric, hyphens, underscores) - `prompt`: Natural language question to ask - `operations`: Optional post-query operations (see [Operations](../operations/)) **Example:** ```yaml intents: - id: "best-email-warmup-tools" prompt: "What are the best email warmup tools?" 
- id: "email-warmup-comparison" prompt: "Compare the top email warmup tools for improving deliverability" - id: "hubspot-alternatives" prompt: "What are the best alternatives to HubSpot for small sales teams?" ``` Intent Prompt Design Good prompts are: - **Natural**: How a real user would ask - **Specific**: Target a clear use case - **Buyer-focused**: Imply purchase intent - **Ranking-friendly**: Ask for "best" or "top" tools Bad prompts: - Too generic: "Tell me about CRM tools" - No ranking signal: "What is HubSpot?" - Biased: "Why is MyBrand better than CompetitorA?" ### Global Operations (Optional) Operations that run for **every** intent across all models. **Use cases:** - Quality scoring for all LLM responses - Sentiment analysis - Content gap detection - Consistent post-processing **Example:** ```yaml global_operations: - id: "quality-score" description: "Rate LLM response quality" prompt: | Rate this LLM response on a scale of 1-10 for accuracy and completeness: Question: {intent:prompt} Response: {intent:response} Provide a single number score (1-10) and brief justification. model: "gpt-4o-mini" enabled: true ``` Global vs Intent-Specific Operations Use **global operations** for: - Consistent quality checks - Standard metrics across all intents - Cost-effective batch analysis Use **intent-specific operations** for: - Detailed competitive analysis - Context-specific insights - Intent-dependent workflows ## Configuration Validation Validate your configuration before running: ```bash llm-answer-watcher validate --config watcher.config.yaml ``` **Common validation errors:** 1. **Missing API keys**: Environment variable not set 1. **Duplicate intent IDs**: Intent IDs must be unique 1. **Invalid provider**: Unsupported provider name 1. **Empty brand list**: At least one brand in `mine` required 1. 
**Invalid intent ID**: Must be alphanumeric with hyphens/underscores Validation Output ```text ✅ Configuration valid ├── 3 intents configured ├── 2 models configured (OpenAI, Anthropic) ├── 2 brands monitored ├── 5 competitors tracked └── Estimated cost: $0.0142 per run ``` ## Configuration Best Practices ### 1. Start Small Begin with one model and a few intents: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" # Cheapest option env_api_key: "OPENAI_API_KEY" intents: - id: "primary-intent" prompt: "Your most important question" ``` ### 2. Use Budget Controls Prevent unexpected costs: ```yaml run_settings: budget: enabled: true max_per_run_usd: 1.00 warn_threshold_usd: 0.50 ``` ### 3. Keep Brand Lists Focused Track 10-20 competitors maximum: ```yaml brands: mine: - "YourBrand" # Exact name - "YourBrand.io" # With TLD competitors: # Top 5 direct competitors - "CompetitorA" - "CompetitorB" # Top 3 category leaders - "MarketLeader" ``` ### 4. Design Intent Prompts Carefully Ask natural questions with ranking signals: ```yaml intents: # Good: Natural, specific, implies ranking - id: "best-crm-for-startups" prompt: "What are the best CRM tools for early-stage startups?" # Bad: Generic, no ranking signal - id: "crm-info" prompt: "Tell me about CRM software" ``` ### 5. Use System Prompts Customize model behavior with system prompts: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" system_prompt: "openai/gpt-4-default" # Uses default prompt ``` System prompts are stored in `llm_answer_watcher/system_prompts/{provider}/{prompt_name}.json`. ### 6. Enable Web Search for Fresh Data Use web search for queries needing current information: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" ``` Web Search Costs Web search adds $10-$25 per 1,000 calls depending on the model.
See [Web Search Configuration](../web-search/). ### 7. Version Your Config Track configuration changes with git: ```bash git add watcher.config.yaml git commit -m "feat: add new competitor tracking" ``` This creates an audit trail of what you were monitoring when. ## Configuration Examples ### Production Monitoring Multi-model, budget-controlled, comprehensive tracking: ```yaml run_settings: output_dir: "./output" sqlite_db_path: "./output/watcher.db" models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" budget: enabled: true max_per_run_usd: 5.00 warn_threshold_usd: 2.50 extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" method: "function_calling" fallback_to_regex: true min_confidence: 0.7 brands: mine: - "MyBrand" - "MyBrand.io" competitors: - "TopCompetitor" - "MainRival" - "IndustryLeader" intents: - id: "best-tools-general" prompt: "What are the best [category] tools?" - id: "best-tools-startups" prompt: "What are the best [category] tools for startups?" - id: "best-tools-enterprise" prompt: "What are the best [category] tools for enterprises?" ``` ### Development Testing Minimal config for fast iteration: ```yaml run_settings: output_dir: "./output" sqlite_db_path: "./output/watcher.db" models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" budget: enabled: true max_per_run_usd: 0.10 brands: mine: - "TestBrand" competitors: - "CompetitorA" intents: - id: "test-intent" prompt: "What are the best tools for testing?" 
``` ### CI/CD Regression Testing Automated monitoring with strict controls: ```yaml run_settings: output_dir: "./ci-output" sqlite_db_path: "./ci-output/watcher.db" models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" budget: enabled: true max_per_run_usd: 0.50 brands: mine: - "MyBrand" competitors: - "TopCompetitor" intents: - id: "regression-test" prompt: "What are the best [category] tools?" ``` ## Configuration File Location LLM Answer Watcher looks for configuration in these locations (in order): 1. Path specified with `--config` flag 1. `watcher.config.yaml` in current directory 1. `~/.config/llm-answer-watcher/config.yaml` **Best practice**: Keep config files in your project directory and specify explicitly: ```bash llm-answer-watcher run --config watcher.config.yaml ``` ## Environment Variables ### API Keys All API keys are loaded from environment variables for security: ```bash # OpenAI export OPENAI_API_KEY=sk-your-key-here # Anthropic export ANTHROPIC_API_KEY=sk-ant-your-key-here # Mistral export MISTRAL_API_KEY=your-mistral-key-here # X.AI Grok export XAI_API_KEY=xai-your-key-here # Google Gemini export GOOGLE_API_KEY=AIza-your-key-here # Perplexity export PERPLEXITY_API_KEY=pplx-your-key-here ``` ### Configuration Overrides Override config values with environment variables: ```bash # Override output directory export LLM_WATCHER_OUTPUT_DIR="./custom-output" # Override database path export LLM_WATCHER_DB_PATH="./custom.db" # Disable budget checks export LLM_WATCHER_BUDGET_ENABLED=false ``` ## Security Considerations ### Never Commit API Keys Add `.env` files to `.gitignore`: ```text # .gitignore .env .env.local *.env watcher.config.local.yaml ``` ### Use Environment-Specific Configs Create separate configs for each environment: ```text configs/ ├── watcher.config.dev.yaml # Development ├── watcher.config.staging.yaml # Staging └── watcher.config.prod.yaml # Production ``` Load the appropriate config:
```bash llm-answer-watcher run --config configs/watcher.config.prod.yaml ``` ### Rotate API Keys Regularly Update API keys in your environment: ```bash # Update key export OPENAI_API_KEY=sk-new-key-here # Verify it works llm-answer-watcher validate --config watcher.config.yaml ``` ## Troubleshooting ### Configuration Validation Fails **Problem**: `Configuration error: Invalid YAML syntax` **Solution**: Check YAML syntax with a validator: ```bash python -c "import yaml; yaml.safe_load(open('watcher.config.yaml'))" ``` Common YAML errors: - Inconsistent indentation (use 2 spaces) - Missing colons after keys - Unquoted strings with special characters - Mixing tabs and spaces ______________________________________________________________________ **Problem**: `API key not found: OPENAI_API_KEY` **Solution**: Set the environment variable: ```bash export OPENAI_API_KEY=sk-your-key-here ``` Verify it's set: ```bash echo $OPENAI_API_KEY ``` ______________________________________________________________________ **Problem**: `Duplicate intent IDs found: best-tools` **Solution**: Make each intent ID unique: ```yaml intents: - id: "best-tools-general" # Changed from "best-tools" prompt: "What are the best tools?" - id: "best-tools-startups" # Changed from "best-tools" prompt: "What are the best tools for startups?" 
``` ### Output Directory Issues **Problem**: `Cannot write to output directory: Permission denied` **Solution**: Check directory permissions: ```bash mkdir -p ./output chmod 755 ./output ``` Or change to a directory you own: ```yaml run_settings: output_dir: "~/llm-watcher-output" ``` ______________________________________________________________________ **Problem**: `SQLite database is locked` **Solution**: Ensure no other processes are using the database: ```bash # Check for locks lsof ./output/watcher.db # Kill blocking processes if safe (replace <PID> with the process ID reported by lsof) kill -9 <PID> ``` Or use a separate database: ```yaml run_settings: sqlite_db_path: "./output/watcher-$(date +%s).db" ``` ### Model Configuration Issues **Problem**: `Unsupported provider: openai-gpt4` **Solution**: Use correct provider names: ```yaml # ❌ Wrong provider: "openai-gpt4" # ✅ Correct provider: "openai" model_name: "gpt-4o-mini" ``` Supported providers: `openai`, `anthropic`, `mistral`, `grok`, `google`, `perplexity` ______________________________________________________________________ **Problem**: `Model not found: gpt-4o-mini-turbo` **Solution**: Use valid model names: ```yaml # ❌ Wrong (doesn't exist) model_name: "gpt-4o-mini-turbo" # ✅ Correct model_name: "gpt-4o-mini" ``` Check [Model Configuration](../models/) for valid model names.
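Several of the validation errors covered in this guide (unsupported providers, missing API keys, an empty `mine` list, duplicate or malformed intent IDs) can be caught with a quick pre-flight check of your own. The sketch below is only an illustration and is not the tool's validator: `preflight`, the ID pattern, and the error wording are assumptions, and the config is passed as a plain dictionary rather than loaded from YAML.

```python
import re
from collections import Counter

# Provider names as listed under "Supported providers" in this guide.
SUPPORTED_PROVIDERS = {"openai", "anthropic", "mistral", "grok", "google", "perplexity"}

# Assumed reading of "alphanumeric with hyphens/underscores".
INTENT_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def preflight(config: dict, env: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    for model in config.get("run_settings", {}).get("models", []):
        if model.get("provider") not in SUPPORTED_PROVIDERS:
            problems.append(f"Unsupported provider: {model.get('provider')}")
        key_name = model.get("env_api_key", "")
        if not env.get(key_name):
            problems.append(f"API key not found: {key_name}")

    if not config.get("brands", {}).get("mine"):
        problems.append("Empty brand list: at least one brand in 'mine' is required")

    ids = [intent.get("id", "") for intent in config.get("intents", [])]
    for intent_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"Duplicate intent ID: {intent_id}")
        if not INTENT_ID_PATTERN.match(intent_id):
            problems.append(f"Invalid intent ID: {intent_id!r}")

    return problems


# Deliberately broken example: wrong provider, no key set, duplicate and malformed IDs.
example_config = {
    "run_settings": {
        "models": [
            {"provider": "openai-gpt4", "model_name": "gpt-4o-mini", "env_api_key": "OPENAI_API_KEY"}
        ]
    },
    "brands": {"mine": ["MyBrand"], "competitors": ["CompetitorA"]},
    "intents": [
        {"id": "best-tools", "prompt": "What are the best tools?"},
        {"id": "best-tools", "prompt": "What are the best tools for startups?"},
        {"id": "best CRM tools", "prompt": "Which CRM is best?"},
    ],
}
example_problems = preflight(example_config, env={})
for problem in example_problems:
    print(problem)
```

In real use you would pass `dict(os.environ)` as `env`. The built-in `llm-answer-watcher validate` command remains the authoritative check.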
## Next Steps Now that you understand the configuration structure, dive into specific sections: - **[Model Configuration](../models/)**: Choose the right models for your use case - **[Brand Configuration](../brands/)**: Optimize brand detection strategies - **[Intent Configuration](../intents/)**: Design effective prompts - **[Budget Configuration](../budget/)**: Control costs and prevent overruns - **[Web Search Configuration](../web-search/)**: Enable real-time information retrieval - **[Operations Configuration](../operations/)**: Automate post-query analysis Or jump to usage guides: - **[CLI Commands](../../usage/cli-commands/)**: Run your first monitoring job - **[Output Modes](../../usage/output-modes/)**: Understand output formats - **[Automation](../../usage/automation/)**: Set up scheduled monitoring # Model Configuration Model configuration controls which LLMs to query and how they're accessed. LLM Answer Watcher supports multiple providers with unified configuration. ## Supported Providers | Provider | Models Available | Pricing | Best For | | -------------- | -------------------------------------------------- | ------------------- | -------------------------------- | | **OpenAI** | gpt-4o-mini, gpt-4o, gpt-4-turbo | $0.15-$10/1M tokens | Fast, cost-effective, production | | **Anthropic** | claude-3-5-haiku, claude-3-5-sonnet, claude-3-opus | $0.80-$75/1M tokens | High-quality reasoning | | **Mistral** | mistral-large, mistral-medium, mistral-small | $2-$8/1M tokens | European compliance | | **X.AI Grok** | grok-beta, grok-2-1212, grok-3 | $2-$25/1M tokens | Real-time X integration | | **Google** | gemini-2.0-flash, gemini-1.5-pro | $0.075-$7/1M tokens | Multimodal, fast | | **Perplexity** | sonar, sonar-pro, sonar-reasoning | $1-$15/1M tokens | Web-grounded answers | ## Basic Model Configuration ### Single Model Setup Minimal configuration with one model: ```yaml run_settings: models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key:
"OPENAI_API_KEY" ``` **Required fields:** - `provider`: Provider name (see supported providers above) - `model_name`: Specific model identifier - `env_api_key`: Environment variable name containing API key ### Multi-Model Setup Query multiple models for comparison: ```yaml run_settings: models: # Fast and cheap - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # High quality - provider: "anthropic" model_name: "claude-3-5-sonnet-20241022" env_api_key: "ANTHROPIC_API_KEY" # Web-grounded - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" ``` Multi-Model Benefits Querying multiple models helps you: - **Compare providers**: See which LLMs favor your brand - **Reduce variance**: Average rankings across models - **Hedge risk**: Don't depend on one provider's algorithm - **Track trends**: Monitor provider-specific changes over time ## Provider-Specific Configuration ### OpenAI **Supported models:** - `gpt-4o-mini`: Fast, cheap, production-ready ((0.15/)0.60 per 1M input/output tokens) - `gpt-4o`: High quality, balanced cost ((2.50/)10 per 1M tokens) - `gpt-4-turbo`: Fast GPT-4, good for complex tasks ((10/)30 per 1M tokens) - `gpt-3.5-turbo`: Legacy, very cheap ((0.50/)1.50 per 1M tokens) **Basic configuration:** ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" ``` **With custom system prompt:** ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" system_prompt: "openai/gpt-4-default" ``` **With web search enabled:** ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" ``` Web Search Costs OpenAI web search adds (10-)25 per 1,000 calls. See [Web Search Configuration](../web-search/). 
**API key setup:** ```bash export OPENAI_API_KEY=sk-your-openai-key-here ``` Get your API key from: [platform.openai.com/api-keys](https://platform.openai.com/api-keys) ______________________________________________________________________ ### Anthropic (Claude) **Supported models:** - `claude-3-5-haiku-20241022`: Fast, cheap, smart ($0.80/$4 per 1M tokens) - `claude-3-5-sonnet-20241022`: Balanced quality/cost ($3/$15 per 1M tokens) - `claude-3-opus-20240229`: Highest quality ($15/$75 per 1M tokens) **Basic configuration:** ```yaml models: - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" ``` **With custom system prompt:** ```yaml models: - provider: "anthropic" model_name: "claude-3-5-sonnet-20241022" env_api_key: "ANTHROPIC_API_KEY" system_prompt: "anthropic/default" ``` **API key setup:** ```bash export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here ``` Get your API key from: [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys) Claude Strengths Claude models excel at: - **Nuanced reasoning**: Better at understanding context - **Longer responses**: More comprehensive answers - **Safety**: Strong content moderation - **Instruction following**: Precise adherence to prompts ______________________________________________________________________ ### Mistral **Supported models:** - `mistral-large-latest`: Flagship model ($2/$6 per 1M tokens) - `mistral-medium-latest`: Balanced ($2.50/$7.50 per 1M tokens) - `mistral-small-latest`: Fast and cheap ($0.20/$0.60 per 1M tokens) **Basic configuration:** ```yaml models: - provider: "mistral" model_name: "mistral-large-latest" env_api_key: "MISTRAL_API_KEY" ``` **API key setup:** ```bash export MISTRAL_API_KEY=your-mistral-api-key-here ``` Get your API key from: [console.mistral.ai/api-keys](https://console.mistral.ai/api-keys) Mistral Strengths Mistral models are ideal for: - **European compliance**: GDPR-friendly European provider -
**Multilingual**: Strong performance in French, German, Spanish - **Cost efficiency**: Competitive pricing - **Open weights**: Some models have open weights available ______________________________________________________________________ ### X.AI (Grok) **Supported models:** - `grok-beta`: Beta access model ($2/$10 per 1M tokens) - `grok-2-1212`: Latest stable version ($2/$10 per 1M tokens) - `grok-2-latest`: Always latest version ($2/$10 per 1M tokens) - `grok-3`: Next-generation model ($5/$25 per 1M tokens) - `grok-3-mini`: Fast, lightweight ($2/$8 per 1M tokens) **Basic configuration:** ```yaml models: - provider: "grok" model_name: "grok-2-1212" env_api_key: "XAI_API_KEY" ``` **API key setup:** ```bash export XAI_API_KEY=xai-your-grok-key-here ``` Get your API key from: [console.x.ai](https://console.x.ai) Grok Strengths Grok models offer: - **X platform integration**: Real-time data from X (Twitter) - **OpenAI compatibility**: Drop-in replacement for OpenAI API - **Current events**: Up-to-date information - **Humor**: Unique personality in responses ______________________________________________________________________ ### Google (Gemini) **Supported models:** | Model | Cost (Input/Output) | Grounding | Best For | | ----------------------- | ------------------- | --------------- | ---------------------------- | | `gemini-2.5-flash` | $0.04/$0.12 per 1M | ✅ Yes | **Recommended** - production | | `gemini-2.5-flash-lite` | $0.02/$0.06 per 1M | ❌ No | High-volume, non-grounded | | `gemini-2.5-pro` | $0.60/$1.80 per 1M | ✅ Yes | Highest quality | | `gemini-2.0-flash-exp` | $0.075/$0.30 per 1M | ⚠️ Experimental | Testing | | `gemini-1.5-pro` | $1.25/$5 per 1M | ❌ No | Legacy (not recommended) | **Basic configuration** (without grounding): ```yaml models: - provider: "google" model_name: "gemini-2.5-flash-lite" env_api_key: "GEMINI_API_KEY" ``` **With Google Search grounding** (recommended for brand monitoring): ```yaml models: - provider: "google" model_name:
"gemini-2.5-flash" env_api_key: "GEMINI_API_KEY" system_prompt: "google/gemini-grounding" tools: - google_search: {} # Enable Google Search ``` **API key setup:** ```bash export GEMINI_API_KEY=AIza-your-google-api-key-here ``` Get your API key from: [aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey) Gemini Strengths Gemini models excel at: - **Google Search grounding**: Real-time web data with no per-request fees - **Speed**: Very fast inference - **Cost**: Most cost-effective for web-grounded queries - **Multimodal**: Built for text, image, video, audio - **Long context**: Up to 2M token context window Configuration Format Difference Google uses `google_search: {}` (dictionary format) while OpenAI uses `type: "web_search"` (typed format). This reflects different provider API specifications. See [Google provider docs](../../../providers/google/) for details. ______________________________________________________________________ ### Perplexity **Supported models:** - `sonar`: Fast, web-grounded ((1/)1 per 1M tokens + request fees) - `sonar-pro`: High-quality grounded ((3/)15 per 1M tokens + request fees) - `sonar-reasoning`: Enhanced reasoning ((1/)5 per 1M tokens + request fees) - `sonar-reasoning-pro`: Best reasoning ((3/)15 per 1M tokens + request fees) - `sonar-deep-research`: In-depth research ((3/)15 per 1M tokens + request fees) **Basic configuration:** ```yaml models: - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" ``` **API key setup:** ```bash export PERPLEXITY_API_KEY=pplx-your-perplexity-key-here ``` Get your API key from: [perplexity.ai/settings/api](https://www.perplexity.ai/settings/api) Perplexity Request Fees Perplexity charges additional **request fees** based on search context: - Basic searches: ~$0.005 per request - Complex searches: ~(0.01-)0.03 per request These fees are **not yet included** in cost estimates. Budget accordingly. 
Perplexity Strengths Perplexity models offer: - **Web grounding**: All answers cite web sources - **Fresh data**: Real-time web search - **Citations**: Transparent source attribution - **Research mode**: Deep-dive analysis ## Advanced Model Configuration ### Custom System Prompts System prompts customize model behavior. LLM Answer Watcher includes default prompts for each provider. **Using default provider prompt:** ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # Uses openai/default.json automatically ``` **Using custom prompt:** ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" system_prompt: "openai/extraction-default" ``` **Prompt file structure:** System prompts are stored in `llm_answer_watcher/system_prompts/{provider}/{prompt_name}.json`: ```json { "role": "system", "content": "You are a helpful assistant that provides accurate, comprehensive answers to user questions about software tools and services. When asked for recommendations, provide a balanced view of multiple options with their strengths and weaknesses." } ``` **Creating custom prompts:** 1. Create a new prompt file in the provider directory 1. Reference it in your configuration 1. 
Test with validation: ```bash llm-answer-watcher validate --config watcher.config.yaml ``` System Prompt Best Practices - **Be specific**: Clear instructions produce better results - **Stay neutral**: Don't bias toward your brand - **Request structure**: Ask for ranked lists, numbered items - **Test variations**: Try different prompts, measure impact ### Temperature and Sampling Control response randomness (some providers only): ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" temperature: 0.7 # 0.0 = deterministic, 1.0 = creative top_p: 0.9 # Nucleus sampling ``` Temperature Guide - **0.0-0.3**: Deterministic, consistent answers (recommended for monitoring) - **0.4-0.7**: Balanced creativity and consistency - **0.8-1.0**: Creative, varied responses (not recommended for tracking) ### Max Tokens Limit response length: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" max_tokens: 1000 # Limit to ~750 words ``` Max Tokens and Cost Setting `max_tokens` limits output cost but may truncate responses. For monitoring, allow enough tokens for complete answers (500-2000 recommended). ### Tools and Function Calling Enable tools like web search: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" # or "required", "none" ``` **Tool choice options:** - `auto`: Model decides when to use tools (recommended) - `required`: Model must use tools for every query - `none`: Disable tools for this query See [Web Search Configuration](../web-search/) for details. 
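When web search is enabled, the per-call fee quoted earlier in this guide for OpenAI (roughly $10-$25 per 1,000 calls) comes on top of token costs. With `tool_choice: "required"` it applies to every query; with `auto` it is closer to an upper bound. The arithmetic can be sketched as follows, assuming one search call per query and one query per intent per search-enabled model. `web_search_fee_range` is a hypothetical helper, not part of the CLI.

```python
def web_search_fee_range(num_intents, num_search_models=1,
                         low_per_1000_usd=10.0, high_per_1000_usd=25.0):
    """Return (low, high) web search fees in USD for one run.

    Assumes each query makes exactly one search call; a model may make more.
    """
    calls = num_intents * num_search_models
    return calls * low_per_1000_usd / 1000, calls * high_per_1000_usd / 1000


low, high = web_search_fee_range(num_intents=3)
print(f"Web search fees per run: ${low:.3f} to ${high:.3f}")
```

For a small run the fee is a few cents, but on an hourly schedule it can exceed the token cost of a cheap model, so it is worth including when you set `max_per_run_usd`.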
## Model Selection Strategies ### Cost-Optimized Minimize costs with cheap models: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" # $0.15/$0.60 per 1M tokens env_api_key: "OPENAI_API_KEY" - provider: "google" model_name: "gemini-2.0-flash-exp" # $0.075/$0.30 per 1M tokens env_api_key: "GOOGLE_API_KEY" ``` **Estimated cost per run** (3 intents): ~$0.003-$0.005 **Use when:** - Running frequent monitoring (hourly/daily) - Testing configuration changes - Limited budget - High query volume ### Quality-Optimized Best accuracy with premium models: ```yaml models: - provider: "openai" model_name: "gpt-4o" # $2.50/$10 per 1M tokens env_api_key: "OPENAI_API_KEY" - provider: "anthropic" model_name: "claude-3-5-sonnet-20241022" # $3/$15 per 1M tokens env_api_key: "ANTHROPIC_API_KEY" ``` **Estimated cost per run** (3 intents): ~$0.05-$0.10 **Use when:** - Weekly/monthly executive reports - Competitive intelligence deep-dives - High-stakes positioning decisions - Complex queries requiring reasoning ### Balanced Mix of cost and quality: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" # Fast, cheap baseline env_api_key: "OPENAI_API_KEY" - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" # Quality check env_api_key: "ANTHROPIC_API_KEY" - provider: "perplexity" model_name: "sonar-pro" # Web-grounded env_api_key: "PERPLEXITY_API_KEY" ``` **Estimated cost per run** (3 intents): ~$0.02-$0.04 **Use when:** - Regular monitoring (daily/weekly) - Comparing provider perspectives - Balanced budget - Production use cases ### Fresh Data Web-grounded models for current information: ```yaml models: - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" ``` **Use when:** - Monitoring recent product launches - Tracking current events impact - Detecting real-time ranking changes - Competitive news
monitoring ### Regional Compliance Models for specific regulatory requirements: ```yaml models: # European providers for GDPR - provider: "mistral" model_name: "mistral-large-latest" env_api_key: "MISTRAL_API_KEY" # Baseline comparison - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" ``` **Use when:** - GDPR compliance required - Data residency requirements - Regional preference testing ## Model Pricing Comparison Current pricing as of November 2024: | Model | Input (per 1M tokens) | Output (per 1M tokens) | Cost per Query\* | | ----------------- | --------------------- | ---------------------- | ---------------- | | gpt-4o-mini | $0.15 | $0.60 | $0.0004 | | gpt-4o | $2.50 | $10.00 | $0.0056 | | claude-3-5-haiku | $0.80 | $4.00 | $0.0022 | | claude-3-5-sonnet | $3.00 | $15.00 | $0.0090 | | mistral-large | $2.00 | $6.00 | $0.0040 | | grok-2-1212 | $2.00 | $10.00 | $0.0054 | | gemini-2.0-flash | $0.075 | $0.30 | $0.0002 | | sonar-pro | $3.00 | $15.00 | $0.0090\*\* | \* Assumes ~150 input tokens + ~500 output tokens per query \*\* Plus request fees (~$0.005-$0.03 per query) Dynamic Pricing LLM Answer Watcher automatically loads current pricing from [llm-prices.com](https://www.llm-prices.com) with 24-hour caching. Prices may change.
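The per-query column in the pricing table follows from the token assumption in its first footnote. As a rough worked example (not the tool's pricing code; `cost_per_query` is a hypothetical helper and the prices are copied from the table), the cost is input tokens times the input price plus output tokens times the output price, divided by one million. Figures computed this way land close to the table's values but not always exactly on them, which may reflect different rounding or token counts.

```python
# Prices in USD per 1M tokens (input, output), copied from the table in this guide.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-haiku": (0.80, 4.00),
    "claude-3-5-sonnet": (3.00, 15.00),
}


def cost_per_query(input_price_per_1m, output_price_per_1m,
                   input_tokens=150, output_tokens=500):
    """Estimate the token cost in USD of a single query."""
    total = input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    return total / 1_000_000


for name, (input_price, output_price) in PRICES.items():
    print(f"{name}: ${cost_per_query(input_price, output_price):.4f}")
```

Multiplying the per-query figure by the number of intents and models gives a per-run token estimate. Web search and Perplexity request fees are extra.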
Check current pricing: ```bash llm-answer-watcher prices show ``` ## Extraction Model Configuration Use a dedicated model for extraction (faster, cheaper than querying main models): ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" # Fast, cheap model env_api_key: "OPENAI_API_KEY" system_prompt: "openai/extraction-default" method: "function_calling" fallback_to_regex: true min_confidence: 0.7 ``` **Benefits:** - **Cost savings**: Use cheap model for extraction - **Speed**: Fast models for quick parsing - **Separation**: Main models for quality, extraction model for structure - **Accuracy**: Function calling more accurate than regex **Recommended extraction models:** - `gpt-4o-mini`: Best balance of speed, cost, accuracy - `gpt-4.1-nano`: Ultra-fast, ultra-cheap (OpenAI only) - `gemini-2.0-flash-exp`: Very fast, very cheap - `claude-3-5-haiku-20241022`: High accuracy, reasonable cost See [Function Calling](../../features/function-calling/) for details. 
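To make the interaction between `fallback_to_regex` and `min_confidence` concrete, here is a minimal sketch of one way such a fallback could behave. It is an assumption about the behavior, not the project's implementation: `regex_mentions`, `extract_mentions`, and the shape of the `llm_result` dictionary are hypothetical. The regex path matches on word boundaries, so `Instantly` is not detected inside a longer word.

```python
import re


def regex_mentions(text, brands):
    """Return the brands that appear in text as whole words (case-insensitive)."""
    found = []
    for brand in brands:
        # Lookarounds act as word boundaries and also work for names like "Apollo.io".
        pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(brand)
    return found


def extract_mentions(text, brands, llm_result=None,
                     min_confidence=0.7, fallback_to_regex=True):
    """Prefer a sufficiently confident LLM extraction; otherwise fall back to regex.

    llm_result is a hypothetical dict such as {"brands": [...], "confidence": 0.9},
    or None if the function call failed.
    """
    if llm_result is not None and llm_result.get("confidence", 0.0) >= min_confidence:
        return llm_result["brands"], "function_calling"
    if fallback_to_regex:
        return regex_mentions(text, brands), "regex"
    return [], "none"


answer = "Try Lemwarm or Apollo.io today."
tracked = ["Lemwarm", "Apollo.io", "HubSpot"]
print(extract_mentions(answer, tracked, llm_result={"brands": ["Lemwarm"], "confidence": 0.4}))
```

In this sketch a low-confidence extraction is discarded entirely rather than merged with the regex result. A hybrid approach could combine the two, at the cost of more false positives.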
## Multi-Model Comparison Strategies ### A/B Testing Compare two providers: ```yaml models: # Variant A: OpenAI - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # Variant B: Anthropic - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" ``` Analyze results: ```sql -- Compare brand mentions by provider SELECT model_provider, COUNT(*) as total_mentions, AVG(rank_position) as avg_rank FROM mentions WHERE normalized_name = 'yourbrand' GROUP BY model_provider; ``` ### Provider Diversity Query multiple providers for comprehensive coverage: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" - provider: "google" model_name: "gemini-2.0-flash-exp" env_api_key: "GOOGLE_API_KEY" ``` **Benefits:** - Reduce algorithm dependence - Hedge against provider changes - Capture diverse perspectives - Build comprehensive dataset ### Model Size Comparison Compare model sizes within a provider: ```yaml models: # Small model - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # Large model - provider: "openai" model_name: "gpt-4o" env_api_key: "OPENAI_API_KEY" ``` Analyze cost vs. 
quality trade-offs:

```sql
-- Compare cost and mention rates by model
SELECT
  model_name,
  COUNT(*) as queries,
  SUM(estimated_cost_usd) as total_cost,
  AVG(estimated_cost_usd) as avg_cost_per_query,
  SUM(CASE WHEN brand IN (SELECT * FROM mine_brands) THEN 1 ELSE 0 END) as my_brand_mentions
FROM answers_raw
GROUP BY model_name;
```

## Troubleshooting

### API Key Issues

**Problem**: `API key not found: OPENAI_API_KEY`

**Solution**: Set the environment variable:

```bash
export OPENAI_API_KEY=sk-your-key-here
```

Verify:

```bash
echo $OPENAI_API_KEY
llm-answer-watcher validate --config watcher.config.yaml
```

______________________________________________________________________

**Problem**: `Invalid API key for provider openai`

**Solution**: Check API key format and validity:

```bash
# Test with curl (OpenAI)
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Test with curl (Anthropic): list models, a read-only GET request
curl https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"
```

Get a new API key from your provider's console.

### Model Not Found

**Problem**: `Model not found: gpt-4-mini`

**Solution**: Use the correct model name:

```yaml
# ❌ Wrong (doesn't exist)
model_name: "gpt-4-mini"

# ✅ Correct
model_name: "gpt-4o-mini"
```

Check [provider documentation](https://platform.openai.com/docs/models) for valid models.

### Rate Limiting

**Problem**: `Rate limit exceeded for provider openai`

**Solution**: LLM Answer Watcher automatically retries with exponential backoff. If the problem persists:

1. Upgrade to higher rate limits (pay-as-you-go tier)
1. Reduce concurrent queries
1. Add delays between queries:

   ```yaml
   run_settings:
     rate_limit_delay_seconds: 1  # Delay between queries
   ```

### Cost Overruns

**Problem**: Unexpected high costs

**Solution**: Enable budget controls:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 1.00
    warn_threshold_usd: 0.50
```

Check estimated costs before running:

```bash
llm-answer-watcher run --config watcher.config.yaml --dry-run
```

See [Budget Configuration](../budget/) for details.

## Best Practices

### 1. Start with One Model

Begin with a single cheap model:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
```

Validate your configuration, then expand to multiple models.

### 2. Use Cost-Optimized Models for Frequent Runs

Daily/hourly monitoring:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # ~$0.0004 per query
    env_api_key: "OPENAI_API_KEY"
```

Weekly reports:

```yaml
models:
  - provider: "anthropic"
    model_name: "claude-3-5-sonnet-20241022"  # ~$0.009 per query
    env_api_key: "ANTHROPIC_API_KEY"
```

### 3. Enable Web Search for Fresh Data

When tracking current events:

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

Or:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

### 4. Separate Extraction Models

Use a dedicated model for extraction:

```yaml
# Main models for quality answers
run_settings:
  models:
    - provider: "anthropic"
      model_name: "claude-3-5-sonnet-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

# Cheap model for extraction
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
  method: "function_calling"
```

### 5. Version Control Model Configs

Track model changes in git:

```bash
git add watcher.config.yaml
git commit -m "feat: add Claude 3.5 Sonnet for quality comparison"
```

This creates an audit trail of which models you were using when.

### 6. Monitor Provider Changes

Providers update models frequently. Subscribe to:

- [OpenAI Blog](https://openai.com/blog)
- [Anthropic Blog](https://www.anthropic.com/news)
- [Mistral Announcements](https://mistral.ai/news)
- [Google AI Blog](https://ai.google/blog)

Update your config when new models release.

### 7. Test Before Production

Validate new model configurations:

```bash
# Dry run to check costs
llm-answer-watcher run --config watcher.config.yaml --dry-run

# Validate configuration
llm-answer-watcher validate --config watcher.config.yaml

# Test with single intent
llm-answer-watcher run --config watcher.config.yaml --intent best-tools
```

## Next Steps

- **[Brand Configuration](../brands/)**: Optimize brand detection
- **[Intent Configuration](../intents/)**: Design effective prompts
- **[Budget Configuration](../budget/)**: Control costs
- **[Web Search Configuration](../web-search/)**: Enable real-time data
- **[Cost Management](../../features/cost-management/)**: Track spending

# Brand Configuration

Brand configuration defines which brands to track in LLM responses. Proper brand configuration is critical for accurate mention detection and false-positive prevention.

## Brand Categories

LLM Answer Watcher tracks two categories of brands:

### Mine

Your brand(s) that you want to monitor. **At least one required.**

```yaml
brands:
  mine:
    - "MyBrand"
    - "MyBrand.io"
    - "MyBrand CRM"
```

### Competitors

Competitor brands you want to track for comparison.
```yaml
brands:
  competitors:
    - "CompetitorA"
    - "CompetitorB"
    - "MarketLeader"
```

## Basic Brand Configuration

### Minimal Example

Simplest configuration with single brand:

```yaml
brands:
  mine:
    - "Warmly"
  competitors:
    - "Instantly"
    - "Lemwarm"
```

### Comprehensive Example

Full configuration with aliases:

```yaml
brands:
  mine:
    - "Warmly"     # Base name
    - "Warmly.io"  # With TLD
    - "Warmly AI"  # With product descriptor

  competitors:
    # Direct competitors
    - "Instantly"
    - "Lemwarm"
    - "Smartlead"

    # Indirect competitors
    - "HubSpot"
    - "Salesforce"

    # Category leaders
    - "Apollo.io"
```

How Many Competitors?

**Recommended**: 5-15 competitors

- **Too few**: Miss important context
- **Too many**: Dilutes focus, increases noise

Focus on competitors that:

- Directly compete for the same customers
- Appear frequently in buyer comparisons
- Represent different market segments

## Brand Alias Strategies

### Why Use Aliases?

LLMs may refer to your brand in different ways:

- With/without TLD: "Warmly" vs "Warmly.io"
- With/without product name: "HubSpot" vs "HubSpot CRM"
- Common variations: "Salesforce" vs "SFDC"
- Capitalization: "GitHub" vs "Github"

### Alias Best Practices

**Include common variations:**

```yaml
brands:
  mine:
    - "GitHub"
    - "Github"          # Common misspelling
    - "GitHub.com"
    - "GitHub Actions"  # Product line
```

**TLD variations:**

```yaml
brands:
  mine:
    - "Stripe"
    - "Stripe.com"
    - "stripe.io"  # If you own it
```

**Product family variations:**

```yaml
brands:
  mine:
    - "HubSpot"
    - "HubSpot CRM"
    - "HubSpot Marketing Hub"
    - "HubSpot Sales Hub"
```

**Abbreviations and acronyms:**

```yaml
brands:
  mine:
    - "Salesforce"
    - "SFDC"  # Common abbreviation
    - "Salesforce.com"
```

Avoid Over-Aliasing

Don't include:

- Generic terms: "CRM" (too broad)
- Common words: "Hub" (false positives)
- Competitor names: Track separately in competitors list

### Case Sensitivity

Brand matching is **case-insensitive** by default:

```yaml
brands:
  mine:
    - "GitHub"  # Matches: GitHub, github, GITHUB, GiTHuB
```

You
only need one capitalization variant:

```yaml
# ❌ Redundant
brands:
  mine:
    - "GitHub"
    - "github"
    - "GITHUB"

# ✅ Sufficient
brands:
  mine:
    - "GitHub"
```

## Word-Boundary Matching

LLM Answer Watcher uses **word-boundary regex** to prevent false positives.

### How It Works

Word boundaries (`\b`) ensure brands match only as complete words:

```python
pattern = r'\b' + re.escape(brand_alias) + r'\b'
```

**Examples:**

| Text                   | Brand Alias | Matches? | Reason           |
| ---------------------- | ----------- | -------- | ---------------- |
| "Use HubSpot daily"    | "HubSpot"   | ✅ Yes   | Complete word    |
| "GitHub and HubSpot"   | "HubSpot"   | ✅ Yes   | Complete word    |
| "Hubspot is great"     | "HubSpot"   | ✅ Yes   | Case-insensitive |
| "Use hub for projects" | "Hub"       | ✅ Yes   | Complete word    |
| "GitHub has features"  | "Hub"       | ❌ No    | Inside "GitHub"  |
| "rehub your content"   | "Hub"       | ❌ No    | Inside "rehub"   |

### Why Word Boundaries Matter

**Without word boundaries** (naive substring matching):

```yaml
brands:
  mine:
    - "Hub"  # ❌ BAD: Matches "GitHub", "HubSpot", "rehub", etc.
```

**With word boundaries** (LLM Answer Watcher default):

```yaml
brands:
  mine:
    - "Hub"  # ✅ GOOD: Only matches "Hub" as complete word
```

Special Characters

Word boundaries work with special characters:

- `Apollo.io` matches "Apollo.io" but not "Apolloio"
- `Slack-Bot` matches "Slack-Bot" but not "SlackBot"

### Testing Word Boundaries

Test your brand aliases:

```python
import re

def test_brand_match(text: str, brand: str) -> bool:
    pattern = r'\b' + re.escape(brand) + r'\b'
    return bool(re.search(pattern, text, re.IGNORECASE))

# Test cases
print(test_brand_match("Use HubSpot daily", "HubSpot"))  # True
print(test_brand_match("GitHub and GitLab", "Git"))  # False
```

## Brand Normalization

Brands are normalized for deduplication and analysis.

### Normalization Process

1. **Case folding**: Convert to lowercase
1. **TLD removal**: Strip `.com`, `.io`, etc.
1. **Whitespace normalization**: Collapse multiple spaces
1. **Punctuation handling**: Preserve hyphens, remove others

**Examples:**

| Original         | Normalized       | Rationale         |
| ---------------- | ---------------- | ----------------- |
| "HubSpot"        | "hubspot"        | Lowercase         |
| "HubSpot.com"    | "hubspot"        | TLD removed       |
| "Apollo.io"      | "apollo"         | TLD removed       |
| "Slack Bot"      | "slackbot"       | Spaces collapsed  |
| "GitHub-Actions" | "github-actions" | Hyphens preserved |

### Why Normalization Matters

Prevents duplicate counting:

```yaml
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
```

LLM response: "I recommend Warmly and Warmly.io for outreach."

**Without normalization**: 2 mentions counted

**With normalization**: 1 mention counted (both normalize to "warmly")

### Normalized Name in Database

The SQLite database stores both:

- `brand`: Original matched text
- `normalized_name`: Normalized version for deduplication

```sql
SELECT brand, normalized_name, COUNT(*) as mentions
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z'
GROUP BY normalized_name;
```

## Competitor Selection Strategies

### Direct Competitors

Brands solving the same problem for the same audience:

```yaml
brands:
  competitors:
    # Email warmup tools (if you're Warmly)
    - "Instantly"
    - "Lemwarm"
    - "Smartlead"
    - "Woodpecker"
```

### Indirect Competitors

Brands in adjacent categories:

```yaml
brands:
  competitors:
    # If you're an email warmup tool
    - "HubSpot"     # Full sales platform
    - "Apollo.io"   # Sales intelligence
    - "Salesforce"  # Enterprise CRM
```

### Category Leaders

Market-defining brands to benchmark against:

```yaml
brands:
  competitors:
    # Category leaders (if you're a startup CRM)
    - "Salesforce"  # Enterprise standard
    - "HubSpot"     # SMB leader
    - "Pipedrive"   # Sales-focused
```

### Segment-Specific Competitors

Brands targeting different segments:

```yaml
brands:
  competitors:
    # Startup segment
    - "Attio"
    - "Folk"

    # SMB segment
    - "Pipedrive"
    - "Copper"

    # Enterprise segment
    - "Salesforce"
    - "Microsoft Dynamics"
```

## Brand Configuration Patterns

### Single Product Company

Simple brand
with variations:

```yaml
brands:
  mine:
    - "MyProduct"
    - "MyProduct.io"
    - "MyProduct.com"
  competitors:
    - "CompetitorA"
    - "CompetitorB"
    - "CompetitorC"
```

### Multi-Product Company

Track different product lines:

```yaml
brands:
  mine:
    - "MyCompany"
    - "MyCompany CRM"
    - "MyCompany Marketing"
    - "MyCompany Sales Hub"
  competitors:
    # CRM competitors
    - "Salesforce"
    - "HubSpot"

    # Marketing automation competitors
    - "Marketo"
    - "Pardot"
```

### Parent Company + Subsidiaries

Track corporate structure:

```yaml
brands:
  mine:
    - "ParentCo"
    - "ProductA"  # Subsidiary
    - "ProductB"  # Subsidiary
  competitors:
    - "CompetitorCorp"
    - "CompetitorProduct"
```

### Rebranded Company

Track both old and new names:

```yaml
brands:
  mine:
    - "NewBrand"  # Current name
    - "OldBrand"  # Legacy name (still in training data)
    - "NewBrand.io"
  competitors:
    - "Competitor"
```

### Regional Variations

Track region-specific brands:

```yaml
brands:
  mine:
    - "MyBrand"  # Global
    - "MyBrand US"
    - "MyBrand EU"
  competitors:
    - "GlobalCompetitor"
    - "USCompetitor"
    - "EUCompetitor"
```

## Advanced Brand Configuration

### Fuzzy Matching

Enable fuzzy matching for misspellings (optional):

```yaml
extraction_settings:
  fuzzy_matching:
    enabled: true
    threshold: 0.9  # Similarity threshold (0.0-1.0)

brands:
  mine:
    - "Warmly"  # Also matches: "Warmley", "Warmlly"
```

Fuzzy Matching Trade-offs

**Pros:**

- Catches misspellings
- More comprehensive tracking

**Cons:**

- Higher false-positive rate
- Slower extraction
- May match unrelated words

**Recommended threshold**: 0.9 (very strict)

### Brand Exclusions

Exclude certain patterns (advanced):

```yaml
brands:
  mine:
    - "Apple"
  exclusions:
    - "apple pie"  # Don't match "apple" in "apple pie"
    - "apple juice"
```

Exclusions Not Yet Implemented

This feature is planned for a future release. Currently, use word boundaries to minimize false positives.
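Until exclusions are supported, similar filtering can be applied outside the tool, for example to answer text exported from the database. The sketch below is one possible approach, not part of LLM Answer Watcher: `match_with_exclusions` is a hypothetical helper that reuses the word-boundary pattern shown earlier and drops hits that fall inside an excluded phrase.

```python
import re


def match_with_exclusions(text: str, brand: str, exclusions: list[str]) -> list[int]:
    """Return start offsets of brand matches that are not inside an excluded phrase."""
    # Collect the character spans covered by any excluded phrase.
    blocked = []
    for phrase in exclusions:
        phrase_pattern = r"\b" + re.escape(phrase) + r"\b"
        blocked.extend(m.span() for m in re.finditer(phrase_pattern, text, re.IGNORECASE))

    brand_pattern = r"\b" + re.escape(brand) + r"\b"
    hits = []
    for m in re.finditer(brand_pattern, text, re.IGNORECASE):
        # Skip the match if it lies entirely within a blocked span.
        inside = any(start <= m.start() and m.end() <= end for start, end in blocked)
        if not inside:
            hits.append(m.start())
    return hits


sample = "Apple makes laptops, but apple pie is a dessert."
print(match_with_exclusions(sample, "Apple", ["apple pie", "apple juice"]))
```

Only the first "Apple" survives here, because the second occurrence sits inside the excluded phrase "apple pie".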
### Brand Categories

Group brands by category (for analysis):

```yaml
brands:
  mine:
    - "MyBrand"
  competitors:
    # Tag with category (custom metadata)
    - name: "CompetitorA"
      category: "direct"
    - name: "CompetitorB"
      category: "direct"
    - name: "MarketLeader"
      category: "aspirational"
```

Categories Not Yet Implemented

This feature is planned for a future release. Currently, track categories externally.

## Brand Mention Analysis

### Viewing Mentions

Query SQLite database:

```sql
-- All mentions for a run
SELECT brand, COUNT(*) as mentions
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z'
GROUP BY normalized_name
ORDER BY mentions DESC;
```

```sql
-- My brand mentions over time
SELECT DATE(timestamp_utc) as date, COUNT(*) as mentions
FROM mentions
WHERE normalized_name = 'mybrand'
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

```sql
-- Competitor comparison
SELECT brand, COUNT(*) as total_mentions, AVG(rank_position) as avg_rank
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z'
GROUP BY normalized_name
ORDER BY avg_rank ASC;
```

### Mention Metrics

Key metrics to track:

- **Mention rate**: % of queries where brand appears
- **Average rank**: Mean position in ranked lists
- **Top-3 rate**: % of mentions in top 3
- **Share of voice**: Your mentions / total mentions

Calculate in SQL:

```sql
-- Mention rate
SELECT
  (COUNT(DISTINCT CASE WHEN normalized_name = 'mybrand' THEN intent_id END) * 100.0 /
   COUNT(DISTINCT intent_id)) as mention_rate
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z';
```

```sql
-- Top-3 rate
SELECT
  (SUM(CASE WHEN rank_position <= 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) as top3_rate
FROM mentions
WHERE normalized_name = 'mybrand'
  AND run_id = '2025-11-01T08-00-00Z';
```

## Validation and Testing

### Validate Brand Configuration

Check for common issues:

```bash
llm-answer-watcher validate --config watcher.config.yaml
```

**Validation checks:**

- At least one brand in `mine` list
- No empty brand aliases
- No duplicate aliases (warning)
- Brand
aliases >= 3 characters (warning)

### Test Brand Matching

Test your brands against sample text:

```bash
# Create test file
echo "I recommend HubSpot, Salesforce, and Warmly for CRM." > test.txt

# Test matching (hypothetical command)
llm-answer-watcher test-brands --config watcher.config.yaml --text test.txt
```

Expected output:

```text
✅ Found 3 brand mentions:
- HubSpot (competitor, position 16)
- Salesforce (competitor, position 26)
- Warmly (mine, position 42)
```

### Common Validation Errors

**Error**: `At least one brand required in 'mine'`

```yaml
# ❌ Wrong
brands:
  mine: []

# ✅ Correct
brands:
  mine:
    - "MyBrand"
```

______________________________________________________________________

**Warning**: `Brand alias too short: "io"`

```yaml
# ❌ Warning (high false-positive risk)
brands:
  mine:
    - "io"

# ✅ Better
brands:
  mine:
    - "MyBrand.io"
```

______________________________________________________________________

**Warning**: `Duplicate brand alias: "HubSpot"`

```yaml
# ❌ Redundant
brands:
  mine:
    - "HubSpot"
  competitors:
    - "HubSpot"  # Same brand in both categories!

# ✅ Correct
brands:
  mine:
    - "MyBrand"
  competitors:
    - "HubSpot"
```

## Best Practices

### 1. Start with Core Brand Names

Begin with unambiguous brand names:

```yaml
brands:
  mine:
    - "Warmly"  # Clear, unambiguous
  competitors:
    - "Instantly"
    - "Lemwarm"
```

### 2. Add TLD Variations Gradually

Monitor results, then add TLDs if needed:

```yaml
# Week 1: Start simple
brands:
  mine:
    - "Warmly"

# Week 2: Add TLD after seeing LLM responses
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
```

### 3. Use Specific Names, Not Generic Terms

```yaml
# ❌ Bad (too generic)
brands:
  mine:
    - "CRM"
    - "Email"
    - "Sales Tool"

# ✅ Good (specific)
brands:
  mine:
    - "Warmly CRM"
    - "Warmly Email"
```

### 4. Track 10-15 Competitors Maximum

Focus on key competitors:

```yaml
brands:
  competitors:
    # Top 5 direct competitors
    - "DirectA"
    - "DirectB"
    - "DirectC"
    - "DirectD"
    - "DirectE"

    # Top 3 category leaders
    - "LeaderA"
    - "LeaderB"
    - "LeaderC"
```

### 5. Review Mentions Regularly

Check for unexpected matches:

```sql
-- Find unexpected brand mentions
SELECT brand, answer_text
FROM mentions
JOIN answers_raw USING (run_id, intent_id)
WHERE normalized_name = 'mybrand'
  AND run_id = '2025-11-01T08-00-00Z';
```

Look for false positives or missing variations.

### 6. Version Brand Lists

Track brand list changes:

```bash
git add watcher.config.yaml
git commit -m "feat: add HubSpot as competitor"
```

### 7. Test Before Production

Validate brand configuration:

```bash
llm-answer-watcher validate --config watcher.config.yaml
llm-answer-watcher run --config watcher.config.yaml --dry-run
```

## Troubleshooting

### False Positives

**Problem**: Brand matches where it shouldn't

**Example**: The alias "Hub" matches the ordinary word in "use a hub for projects" (word boundaries already stop it from matching inside "GitHub")

**Solution**: Use more specific aliases:

```yaml
# ❌ Too generic
brands:
  mine:
    - "Hub"

# ✅ More specific
brands:
  mine:
    - "MyHub"
    - "MyHub.io"
```

### False Negatives

**Problem**: Brand doesn't match when it should

**Example**: LLM says "Warmly.ai" but you only track "Warmly.io"

**Solution**: Add missing variation:

```yaml
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
    - "Warmly.ai"  # Add missing TLD
```

### Duplicate Counting

**Problem**: Same brand counted multiple times

**Example**: "Warmly" and "Warmly.io" counted separately

**Solution**: This is expected! Normalization prevents duplicates in analysis:

```sql
-- Use normalized_name for deduplication
SELECT normalized_name, COUNT(*) as mentions
FROM mentions
GROUP BY normalized_name;

-- Use brand to see exact matches
SELECT brand, COUNT(*) as raw_mentions
FROM mentions
GROUP BY brand;
```

### Brand Not Found

**Problem**: Brand not detected in LLM response

**Possible causes:**

1. **LLM didn't mention it**: Check raw response
1. **Misspelling**: Add variations or enable fuzzy matching
1. **Different phrasing**: LLM used different name

**Debug:**

```sql
-- Check raw response
SELECT answer_text
FROM answers_raw
WHERE run_id = '2025-11-01T08-00-00Z'
  AND intent_id = 'best-tools';
```

Look for how the LLM referred to your brand.

## Next Steps

- **[Intent Configuration](../intents/)**: Design prompts that surface your brand
- **[Rank Extraction](../../features/rank-extraction/)**: Understand how ranking works
- **[Brand Detection](../../features/brand-detection/)**: Deep dive into detection algorithms
- **[Historical Tracking](../../features/historical-tracking/)**: Analyze brand trends over time

# Intent Configuration

Intents are the questions you ask LLMs to test brand visibility. Well-designed intents produce actionable insights about how LLMs recommend your brand versus competitors.

## What is an Intent?

An **intent** represents a buyer-journey question that prospects might ask an LLM when researching solutions.

**Examples:**

- "What are the best CRM tools for startups?"
- "Compare HubSpot vs Salesforce for small teams"
- "How do I improve email deliverability?"

## Basic Intent Configuration

### Minimal Intent

Simplest intent with required fields:

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best tools for my category?"
```

**Required fields:**

- `id`: Unique identifier (alphanumeric, hyphens, underscores)
- `prompt`: Natural language question

### Multiple Intents

Test different buyer scenarios:

```yaml
intents:
  - id: "best-tools-general"
    prompt: "What are the best email warmup tools?"

  - id: "best-tools-startups"
    prompt: "What are the best email warmup tools for startups?"

  - id: "comparison-with-competitor"
    prompt: "Compare Instantly vs Warmly for email warmup"
```

How Many Intents?

**Recommended**: 3-10 intents

- **Too few**: Limited coverage of buyer journey
- **Too many**: High costs, slow execution

Focus on intents that represent actual buyer questions.
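The required-field rules above (a unique `id` made of alphanumerics, hyphens, and underscores, plus a non-empty `prompt`) can be checked in a few lines before a run. This is a minimal sketch under assumptions: `check_intents` and the regular expression are hypothetical, and the tool's own `validate` command may apply different or additional rules.

```python
import re

# Assumed reading of "alphanumeric, hyphens, underscores"; the tool's real rule may differ.
INTENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_intents(intents: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the intents look valid."""
    problems = []
    seen_ids = set()
    for intent in intents:
        intent_id = intent.get("id", "")
        prompt = intent.get("prompt", "")
        if not INTENT_ID_PATTERN.fullmatch(intent_id):
            problems.append(f"invalid id: {intent_id!r}")
        if intent_id in seen_ids:
            problems.append(f"duplicate id: {intent_id!r}")
        seen_ids.add(intent_id)
        if not prompt.strip():
            problems.append(f"empty prompt for id: {intent_id!r}")
    return problems


example_intents = [
    {"id": "best-tools-general", "prompt": "What are the best email warmup tools?"},
    {"id": "best tools", "prompt": "What are the best email warmup tools for startups?"},
]
for problem in check_intents(example_intents):
    print(problem)
```

The input is a plain list of dictionaries, so it works with whatever YAML loader you already use to read `watcher.config.yaml`.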
## Intent Design Principles

### 1. Natural Language

Write prompts as real users would ask:

```yaml
# ✅ Good: Natural question
intents:
  - id: "best-crm-startups"
    prompt: "What's the best CRM for early-stage startups?"

# ❌ Bad: Unnatural phrasing
intents:
  - id: "crm-query"
    prompt: "List CRM software products ranked by quality for startup segment"
```

### 2. Buyer-Focused

Imply purchase intent:

```yaml
# ✅ Good: Clear purchase intent
intents:
  - id: "best-email-tools"
    prompt: "What are the best email warmup tools to buy?"

# ❌ Bad: Informational query
intents:
  - id: "email-info"
    prompt: "What is email warming?"
```

### 3. Ranking-Friendly

Ask for ranked or ordered lists:

```yaml
# ✅ Good: Implies ranking
intents:
  - id: "top-tools"
    prompt: "What are the top 5 email warmup tools?"

# ❌ Bad: No ranking signal
intents:
  - id: "tools-info"
    prompt: "Tell me about email warmup tools"
```

### 4. Specific Use Cases

Target specific scenarios:

```yaml
# ✅ Good: Specific use case
intents:
  - id: "best-for-cold-email"
    prompt: "What are the best email warmup tools for cold outreach campaigns?"

# ❌ Bad: Too generic
intents:
  - id: "email-tools"
    prompt: "What are email tools?"
```

## Intent Patterns

### Category Leadership

Test if your brand is considered a category leader:

```yaml
intents:
  - id: "best-in-category"
    prompt: "What are the best [category] tools?"

  - id: "top-choices"
    prompt: "What are the top [category] platforms?"

  - id: "leading-solutions"
    prompt: "What are the leading [category] solutions?"
```

### Segment-Specific

Target different customer segments:

```yaml
intents:
  # Startup segment
  - id: "best-for-startups"
    prompt: "What are the best CRM tools for early-stage startups?"

  # SMB segment
  - id: "best-for-smb"
    prompt: "What are the best CRM tools for small businesses?"

  # Enterprise segment
  - id: "best-for-enterprise"
    prompt: "What are the best CRM tools for large enterprises?"
```

### Use-Case Specific

Target specific jobs-to-be-done:

```yaml
intents:
  - id: "improve-deliverability"
    prompt: "What tools can help me improve email deliverability?"

  - id: "warm-cold-emails"
    prompt: "How can I warm up my email domain for cold outreach?"

  - id: "avoid-spam"
    prompt: "What tools help me avoid the spam folder?"
```

### Competitive Comparison

Test head-to-head comparisons:

```yaml
intents:
  - id: "vs-main-competitor"
    prompt: "Compare [YourBrand] vs [MainCompetitor] for [use case]"

  - id: "alternatives-to-competitor"
    prompt: "What are the best alternatives to [Competitor]?"

  - id: "hubspot-replacement"
    prompt: "What's the best replacement for HubSpot for small teams?"
```

### Problem-Solution

Frame around customer pain points:

```yaml
intents:
  - id: "solve-deliverability"
    prompt: "My emails are going to spam. What tools can help?"

  - id: "improve-open-rates"
    prompt: "How can I improve my email open rates?"

  - id: "scale-outreach"
    prompt: "What tools help me scale cold email outreach?"
```

### Buying Journey Stages

Target different stages:

```yaml
intents:
  # Awareness: "What is...?"
  - id: "awareness"
    prompt: "What is email warmup and why do I need it?"

  # Consideration: "What are the options?"
  - id: "consideration"
    prompt: "What are the best email warmup tools?"

  # Decision: "Which should I choose?"
  - id: "decision"
    prompt: "Should I use Warmly or Instantly for email warmup?"
```

## Advanced Intent Configuration

### Intent with Operations

Run custom operations after each query:

```yaml
intents:
  - id: "best-email-tools"
    prompt: "What are the best email warmup tools?"
    operations:
      - id: "content-gaps"
        description: "Identify content opportunities"
        prompt: |
          Analyze this LLM response and identify content gaps
          that could improve our ranking.

          My brand: {brand:mine}
          Current rank: {rank:mine}
          Response: {intent:response}

          Provide 3 specific content recommendations.
        model: "gpt-4o-mini"

      - id: "competitor-analysis"
        description: "Extract competitor strengths"
        prompt: |
          What strengths are mentioned for each competitor?

          Competitors: {competitors:mentioned}
          Response: {intent:response}
        model: "gpt-4o-mini"
```

See [Operations Configuration](../operations/) for details.

### Intent with Dependencies

Chain operations with dependencies:

```yaml
intents:
  - id: "best-crm-tools"
    prompt: "What are the best CRM tools for startups?"
    operations:
      - id: "extract-features"
        description: "Extract features mentioned"
        prompt: "Extract features mentioned for each tool: {intent:response}"
        model: "gpt-4o-mini"

      - id: "gap-analysis"
        description: "Identify feature gaps"
        prompt: |
          Based on these features: {operation:extract-features}

          What features are missing from {brand:mine} compared to competitors?
        depends_on: ["extract-features"]
        model: "gpt-4o-mini"
```

### Intent with Custom Metadata

Add metadata for analysis (future feature):

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"
    metadata:
      stage: "consideration"
      priority: "high"
      segment: "smb"
```

Metadata Not Yet Implemented

Custom metadata is planned for a future release.

## Intent ID Naming Conventions

Intent IDs must be:

- Unique across the configuration
- Alphanumeric with hyphens and underscores
- Descriptive and readable

**Good intent IDs:**

```yaml
intents:
  - id: "best-email-warmup-tools"
  - id: "hubspot-alternatives-smb"
  - id: "improve-deliverability-2025"
  - id: "warmly-vs-instantly"
```

**Bad intent IDs:**

```yaml
# ❌ Not descriptive
intents:
  - id: "intent1"
  - id: "test"
  - id: "query"

# ❌ Invalid characters
intents:
  - id: "best tools"   # Space not allowed
  - id: "best-tools!"  # Special char not allowed
  - id: "best/tools"   # Slash not allowed
```

Intent ID Best Practices

- Use descriptive names that explain the intent
- Include segment/use-case in ID if relevant
- Use hyphens for readability: `best-crm-for-startups`
- Keep under 50 characters
- Avoid special characters except `-` and `_`

## Prompt Engineering for Intents

### Effective Prompt Patterns

**Pattern 1: Top N Format**

```yaml
intents:
  - id: "top-5-crm"
    prompt: "What are the top 5 CRM tools for startups in 2025?"
```

Benefits:

- Clear ranking expectation
- Limited scope (5 items)
- Time-bound (2025)

______________________________________________________________________

**Pattern 2: Use-Case Specific**

```yaml
intents:
  - id: "best-for-cold-email"
    prompt: "What are the best email warmup tools specifically for cold email campaigns?"
```

Benefits:

- Targets specific use case
- Filters out generic responses
- Relevant to your positioning

______________________________________________________________________

**Pattern 3: Comparison**

```yaml
intents:
  - id: "compare-top-tools"
    prompt: "Compare the top email warmup tools for improving deliverability"
```

Benefits:

- Encourages detailed analysis
- Shows relative positioning
- Highlights differentiators

______________________________________________________________________

**Pattern 4: Problem-Oriented**

```yaml
intents:
  - id: "solve-spam-problem"
    prompt: "My sales emails are going to spam. What tools can help me fix this?"
```

Benefits:

- Natural buyer question
- Solution-focused
- Real pain point

______________________________________________________________________

**Pattern 5: Segment-Specific**

```yaml
intents:
  - id: "best-for-startups"
    prompt: "What's the best CRM for a 10-person startup with limited budget?"
```

Benefits:

- Targets specific segment
- Includes constraints (budget)
- Realistic buyer scenario

### Prompt Length

**Recommended**: 10-30 words

```yaml
# ✅ Good: Clear and concise
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools for cold outreach in 2025?"

# ❌ Too short: Lacks context
intents:
  - id: "tools"
    prompt: "Email tools?"

# ❌ Too long: Overly specific
intents:
  - id: "detailed-query"
    prompt: "I am a sales development representative at a B2B SaaS startup with 5 SDRs sending approximately 500 cold emails per day and we're experiencing deliverability issues with 40% of our emails going to spam, what are the absolute best email warmup tools that can help us improve our domain reputation and inbox placement rate while being cost-effective for a startup budget?"
```

### Time-Bounding Prompts

Include year for current recommendations:

```yaml
# ✅ Good: Time-bound
intents:
  - id: "best-tools-2025"
    prompt: "What are the best email warmup tools in 2025?"

# ⚠️ Generic: May return outdated info
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"
```

Training Data Cutoff

Most LLMs have training data cutoffs (e.g., October 2023 for GPT-4o). Time-bounding may not help unless:

- Using web search-enabled models
- Using Perplexity (real-time web search)
- Using models with recent training data

### Neutral vs. Biased Prompts

**Neutral prompts** (recommended):

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"
```

**Biased prompts** (avoid):

```yaml
# ❌ Biased toward your brand
intents:
  - id: "why-warmly-best"
    prompt: "Why is Warmly the best email warmup tool?"

# ❌ Biased against competitor
intents:
  - id: "hubspot-problems"
    prompt: "What are the problems with HubSpot?"
```

Neutral prompts give you realistic brand positioning data.
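The 10-30 word guideline from the Prompt Length section is easy to check mechanically. The sketch below is only a hint generator under assumptions: `prompt_length_status` is a hypothetical helper, it counts whitespace-separated words, and the range is a recommendation rather than a rule the tool enforces.

```python
def prompt_length_status(prompt: str, min_words: int = 10, max_words: int = 30) -> str:
    """Classify a prompt against the recommended word range."""
    word_count = len(prompt.split())
    if word_count < min_words:
        return "too short"
    if word_count > max_words:
        return "too long"
    return "ok"


candidate_prompts = {
    "best-tools": "What are the best email warmup tools for cold outreach in 2025?",
    "tools": "Email tools?",
}
for intent_id, prompt in candidate_prompts.items():
    print(intent_id, prompt_length_status(prompt))
```

A short prompt is not automatically a bad one, so treat a "too short" result as a cue to consider adding a segment or use case.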
## Intent Validation

### Validate Intent Configuration

Check for common issues:

```bash
llm-answer-watcher validate --config watcher.config.yaml
```

**Validation checks:**

- At least one intent configured
- Intent IDs are unique
- Intent IDs are valid (alphanumeric, hyphens, underscores)
- Prompts are non-empty
- Prompts are at least 10 characters

### Common Validation Errors

**Error**: `At least one intent must be configured`

```yaml
# ❌ Wrong
intents: []

# ✅ Correct
intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
```

______________________________________________________________________

**Error**: `Duplicate intent IDs found: best-tools`

```yaml
# ❌ Wrong
intents:
  - id: "best-tools"
    prompt: "What are the best CRM tools?"
  - id: "best-tools"  # Duplicate!
    prompt: "What are the best email tools?"

# ✅ Correct
intents:
  - id: "best-crm-tools"
    prompt: "What are the best CRM tools?"
  - id: "best-email-tools"
    prompt: "What are the best email tools?"
```

______________________________________________________________________

**Error**: `Intent ID must be alphanumeric with hyphens/underscores: best tools!`

```yaml
# ❌ Wrong (space and special char)
intents:
  - id: "best tools!"
    prompt: "What are the best tools?"

# ✅ Correct
intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
```

## Intent Organization Strategies

### By Buyer Journey Stage

Organize intents by funnel stage:

```yaml
intents:
  # Awareness stage
  - id: "awareness-what-is-email-warmup"
    prompt: "What is email warmup and why is it important?"

  - id: "awareness-deliverability-problems"
    prompt: "Why are my emails going to spam?"

  # Consideration stage
  - id: "consideration-best-tools"
    prompt: "What are the best email warmup tools?"

  - id: "consideration-tool-comparison"
    prompt: "Compare the top email warmup platforms"

  # Decision stage
  - id: "decision-warmly-vs-instantly"
    prompt: "Should I use Warmly or Instantly?"
  - id: "decision-pricing"
    prompt: "What's the most cost-effective email warmup tool?"
```

### By Customer Segment

Organize intents by target segment:

```yaml
intents:
  # Startup segment
  - id: "startup-best-crm"
    prompt: "What's the best CRM for early-stage startups?"

  - id: "startup-affordable-tools"
    prompt: "What are affordable CRM options for startups?"

  # SMB segment
  - id: "smb-best-crm"
    prompt: "What's the best CRM for small businesses?"

  - id: "smb-easy-setup"
    prompt: "What's the easiest CRM to set up for a 20-person team?"

  # Enterprise segment
  - id: "enterprise-best-crm"
    prompt: "What's the best enterprise CRM platform?"

  - id: "enterprise-scalable"
    prompt: "What CRM platforms scale to 1000+ users?"
```

### By Use Case

Organize intents by jobs-to-be-done:

```yaml
intents:
  # Use case: Cold email
  - id: "cold-email-best-tools"
    prompt: "What are the best tools for cold email outreach?"

  - id: "cold-email-deliverability"
    prompt: "How can I improve cold email deliverability?"

  # Use case: Account-based sales
  - id: "abs-best-tools"
    prompt: "What are the best tools for account-based sales?"

  - id: "abs-personalization"
    prompt: "What tools help personalize outreach at scale?"

  # Use case: Lead nurturing
  - id: "nurture-best-tools"
    prompt: "What are the best tools for lead nurturing?"
```

### By Competitor

Track competitive positioning:

```yaml
intents:
  # vs. Main Competitor
  - id: "vs-instantly"
    prompt: "Compare Warmly vs Instantly for email warmup"

  - id: "alternatives-to-instantly"
    prompt: "What are the best alternatives to Instantly?"

  # vs. Market Leader
  - id: "vs-hubspot"
    prompt: "Compare Warmly vs HubSpot for sales outreach"

  - id: "alternatives-to-hubspot"
    prompt: "What are the best alternatives to HubSpot for startups?"
```

## Testing Intent Prompts

### Manual Testing

Test prompts with ChatGPT/Claude before adding:

1. Ask the prompt directly
1. Check if response includes ranked lists
1. Verify brand mentions
1. Adjust prompt as needed

### A/B Testing Intents

Compare prompt variations:

```yaml
intents:
  # Variation A: Generic
  - id: "best-tools-generic"
    prompt: "What are the best email warmup tools?"

  # Variation B: Specific
  - id: "best-tools-specific"
    prompt: "What are the best email warmup tools for cold outreach in 2025?"
```

Compare results to see which prompt surfaces your brand better.

### Iteration Process

1. **Start broad**: Test generic prompts
1. **Analyze results**: Check brand mention rates
1. **Refine prompts**: Add specificity where needed
1. **Test again**: Compare refined vs. original
1. **Keep winners**: Use prompts with best brand visibility

## Intent Metrics

Track intent performance. Because normalization strips TLDs, both "MyBrand" and "MyBrand.io" are stored under the normalized name `mybrand`:

```sql
-- Mention rate by intent
SELECT
  intent_id,
  COUNT(DISTINCT run_id) as runs,
  SUM(CASE WHEN normalized_name = 'mybrand' THEN 1 ELSE 0 END) as my_brand_mentions,
  (SUM(CASE WHEN normalized_name = 'mybrand' THEN 1 ELSE 0 END) * 100.0 /
   COUNT(DISTINCT run_id)) as mention_rate
FROM mentions
GROUP BY intent_id
ORDER BY mention_rate DESC;
```

```sql
-- Average rank by intent
SELECT
  intent_id,
  AVG(rank_position) as avg_rank,
  MIN(rank_position) as best_rank,
  COUNT(*) as total_mentions
FROM mentions
WHERE normalized_name = 'mybrand'
GROUP BY intent_id
ORDER BY avg_rank ASC;
```

```sql
-- Top-performing intents
SELECT
  intent_id,
  COUNT(*) as queries,
  SUM(CASE WHEN normalized_name = 'mybrand' AND rank_position <= 3 THEN 1 ELSE 0 END) as top3_mentions,
  (SUM(CASE WHEN normalized_name = 'mybrand' AND rank_position <= 3 THEN 1 ELSE 0 END) * 100.0 /
   COUNT(*)) as top3_rate
FROM mentions
GROUP BY intent_id
ORDER BY top3_rate DESC;
```

## Best Practices

### 1. Start with 3-5 Core Intents

Begin with essential buyer questions:

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best [category] tools?"

  - id: "best-for-startups"
    prompt: "What's the best [category] tool for startups?"
- id: "vs-main-competitor" prompt: "Compare [YourBrand] vs [MainCompetitor]" ``` ### 2. Test Prompts Manually First Before adding to config, test with ChatGPT/Claude: - Does it produce ranked lists? - Does it mention your brand? - Is the response format consistent? ### 3. Use Natural Language Write prompts as real users would ask: ```yaml # ✅ Good intents: - id: "improve-deliverability" prompt: "How can I improve my email deliverability?" # ❌ Bad intents: - id: "deliverability" prompt: "EMAIL_DELIVERABILITY_TOOLS_QUERY" ``` ### 4. Include Ranking Signals Ask for "best", "top", or "recommended": ```yaml intents: - id: "best-tools" prompt: "What are the best email warmup tools?" # "best" = ranking signal - id: "top-tools" prompt: "What are the top 5 CRM platforms?" # "top 5" = ranking signal ``` ### 5. Version Control Intents Track intent changes with git: ```bash git add watcher.config.yaml git commit -m "feat: add cold-email intent for startup segment" ``` ### 6. Monitor Intent Performance Review which intents surface your brand: ```sql SELECT intent_id, COUNT(*) as my_brand_mentions FROM mentions WHERE normalized_name = 'mybrand' GROUP BY intent_id ORDER BY my_brand_mentions DESC; ``` Focus on high-performing intents, retire low-performers. ### 7. Update Prompts Based on Results Iterate on prompts: ```yaml # Original (low brand mentions) - id: "best-tools" prompt: "What are email tools?" # Improved (higher brand mentions) - id: "best-tools" prompt: "What are the best email warmup tools for cold outreach?" ``` ## Troubleshooting ### Brand Not Mentioned **Problem**: Your brand doesn't appear in LLM responses **Possible causes:** 1. **Generic prompt**: Too broad, LLM focuses on market leaders 1. **Wrong segment**: Prompt targets different customer segment 1. 
**Outdated training data**: LLM trained before your brand existed **Solutions:** - Make prompt more specific to your use case - Target your niche/segment explicitly - Use web search-enabled models for fresh data ______________________________________________________________________ ### Inconsistent Responses **Problem**: Different responses for same intent across runs **Cause**: LLM non-determinism (temperature > 0) **Solution**: Use lower temperature for consistency: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" temperature: 0.0 # Minimizes randomness (not strictly deterministic) ``` ______________________________________________________________________ ### No Ranked Lists **Problem**: LLM doesn't provide ranked lists **Cause**: Prompt doesn't request ranking **Solution**: Add ranking signal: ```yaml # ❌ Before - id: "tools" prompt: "Tell me about email warmup tools" # ✅ After - id: "top-tools" prompt: "What are the top 5 email warmup tools ranked by quality?" ``` ## Next Steps - **[Brand Configuration](../brands/)**: Optimize brand detection - **[Operations Configuration](../operations/)**: Automate post-query analysis - **[Rank Extraction](../../features/rank-extraction/)**: Understand ranking detection - **[HTML Reports](../../features/html-reports/)**: Visualize intent results # Budget Configuration Budget controls prevent runaway costs by setting spending limits before execution starts. LLM Answer Watcher validates estimated costs against your budget and aborts if limits would be exceeded. ## Why Budget Controls? LLM API costs can add up quickly: - **Testing**: Multiple intents × multiple models = high query volume - **Mistakes**: Accidental loops or configuration errors - **Provider changes**: Pricing updates or model changes - **Experimentation**: Trying new configurations without cost awareness Budget controls ensure you never spend more than intended.
## Basic Budget Configuration ### Enabling Budget Controls Add a `budget` section to `run_settings`: ```yaml run_settings: output_dir: "./output" sqlite_db_path: "./output/watcher.db" models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" budget: enabled: true max_per_run_usd: 1.00 # Hard limit: abort if total > $1.00 max_per_intent_usd: 0.10 # Hard limit: abort if any intent > $0.10 warn_threshold_usd: 0.50 # Warning: log if total > $0.50 (but continue) ``` ### Disabling Budget Controls For unlimited spending: ```yaml run_settings: budget: enabled: false # No cost limits ``` Disabled Budgets Only disable budgets when: - You fully understand costs - Running production monitoring with known costs - Budget controls interfere with automation **Recommendation**: Keep budgets enabled even in production. ## Budget Parameters ### `enabled` (boolean) Enable or disable budget enforcement. ```yaml budget: enabled: true # Enforce budget limits ``` **Default**: `false` (budgets disabled) **Recommended**: `true` for all use cases ______________________________________________________________________ ### `max_per_run_usd` (float) Maximum total cost per run (all intents × all models). ```yaml budget: max_per_run_usd: 1.00 # Abort if total estimated cost > $1.00 ``` **Calculation** (the estimate that is compared against `max_per_run_usd`): ```text estimated_run_cost = (num_intents × num_models) × avg_cost_per_query ``` **Example**: - 3 intents × 2 models = 6 queries - Average cost: $0.005 per query - Total estimated cost: $0.03 - Budget limit: $1.00 - ✅ **Result**: Run proceeds ______________________________________________________________________ ### `max_per_intent_usd` (float) Maximum cost per single intent (across all models).
```yaml budget: max_per_intent_usd: 0.10 # Abort if any single intent > $0.10 ``` **Calculation** (the estimate that is compared against `max_per_intent_usd`): ```text estimated_intent_cost = num_models × avg_cost_per_query ``` **Example**: - 3 models for one intent - Average cost: $0.005 per query - Intent cost: $0.015 - Budget limit: $0.10 - ✅ **Result**: Intent proceeds **Use case**: Prevent expensive intents with long prompts or web search. ______________________________________________________________________ ### `warn_threshold_usd` (float) Warning threshold (logs warning but continues). ```yaml budget: warn_threshold_usd: 0.50 # Log warning if total > $0.50 ``` **Behavior**: - If `estimated_cost <= warn_threshold`: Silent execution - If `warn_threshold < estimated_cost <= max_per_run`: Log warning, continue - If `estimated_cost > max_per_run`: Abort execution **Example output**: ```text ⚠️ Cost warning: Estimated run cost $0.75 exceeds warning threshold of $0.50 Budget limit: $1.00 (OK to proceed) Run will execute 12 queries across 3 intents and 4 models. 
``` ## Budget Configuration Patterns ### Development / Testing Strict limits for experimentation: ```yaml run_settings: budget: enabled: true max_per_run_usd: 0.10 # Very low limit max_per_intent_usd: 0.05 # Catch expensive intents early warn_threshold_usd: 0.05 # Warn at same level as max ``` **Use when:** - Testing configuration changes - Developing new intents - Running frequent test runs - Learning the tool ______________________________________________________________________ ### Production Monitoring Balanced limits for regular monitoring: ```yaml run_settings: budget: enabled: true max_per_run_usd: 5.00 # Reasonable daily limit max_per_intent_usd: 0.50 # Prevent runaway intent costs warn_threshold_usd: 2.50 # Alert if > $2.50 ``` **Use when:** - Daily/weekly monitoring - Established configuration - Known cost profile - Production use ______________________________________________________________________ ### CI/CD Pipelines Conservative limits for automated runs: ```yaml run_settings: budget: enabled: true max_per_run_usd: 0.50 # Low limit for automated runs max_per_intent_usd: 0.10 warn_threshold_usd: 0.25 ``` **Use when:** - Automated testing - Pull request checks - Continuous monitoring - High-frequency runs ______________________________________________________________________ ### Executive Reports Higher limits for comprehensive analysis: ```yaml run_settings: budget: enabled: true max_per_run_usd: 25.00 # Higher limit for quality models max_per_intent_usd: 2.00 warn_threshold_usd: 10.00 ``` **Use when:** - Monthly executive reports - Using premium models (GPT-4, Claude Opus) - Comprehensive competitive analysis - Deep-dive research ______________________________________________________________________ ### Warning-Only Mode Logs warnings but never aborts: ```yaml run_settings: budget: enabled: true max_per_run_usd: 999999.99 # Effectively unlimited max_per_intent_usd: 999999.99 warn_threshold_usd: 1.00 # But warn at $1 ``` **Use when:** - Production 
monitoring with known costs - Don't want aborts to break automation - Still want cost visibility Use with Caution This defeats the purpose of budget controls. Only use when you fully understand cost implications. ## Cost Estimation LLM Answer Watcher estimates costs **before** execution using: ### Estimation Formula ```python estimated_cost = ( (input_tokens * input_price_per_token) + (output_tokens * output_price_per_token) ) * safety_buffer ``` **Parameters:** - `input_tokens`: Estimated from prompt length (~150 tokens) - `output_tokens`: Estimated average response (~500 tokens) - `input_price_per_token`: From llm-prices.com (cached 24h) - `output_price_per_token`: From llm-prices.com (cached 24h) - `safety_buffer`: 1.2 (20% buffer for variance) ### Estimation Accuracy Cost estimates are **approximate**: - **Actual costs**: May vary ±20% from estimates - **Factors affecting accuracy**: - Prompt length (longer = higher input cost) - Response length (varies by model and prompt) - Web search usage (adds $10-$25 per 1k calls) - Function calling (may increase token usage) Estimation Accuracy Estimates are conservative (tend to overestimate). Actual costs are typically 10-20% lower than estimated. 
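The estimation formula above and the thresholds described under `warn_threshold_usd` can be combined into a short sketch. This is an illustration under stated assumptions, not the tool's actual implementation: the helper names `estimate_query_cost` and `check_budget` are hypothetical, and the token counts, safety buffer, and prices are the example figures quoted in this guide.

```python
def estimate_query_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    input_tokens: int = 150,      # guide's assumed prompt size
    output_tokens: int = 500,     # guide's assumed response size
    safety_buffer: float = 1.2,   # 20% buffer for variance
) -> float:
    """Estimate the USD cost of one query from per-1M-token prices."""
    token_cost = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return token_cost * safety_buffer


def check_budget(estimated: float, warn_threshold: float, max_per_run: float) -> str:
    """Classify an estimate as 'ok', 'warn', or 'abort' (hypothetical labels)."""
    if estimated > max_per_run:
        return "abort"
    if estimated > warn_threshold:
        return "warn"
    return "ok"


# Example: gpt-4o-mini prices from this guide ($0.15 / $0.60 per 1M tokens),
# 3 intents x 2 models = 6 queries.
per_query = estimate_query_cost(0.15, 0.60)
run_cost = per_query * 3 * 2
print(f"per query: ${per_query:.6f}, run: ${run_cost:.6f}")
print(check_budget(run_cost, warn_threshold=0.50, max_per_run=1.00))
```

Keeping the estimate and the threshold check as separate functions makes it easy to reuse the same check for per-intent limits by passing a different maximum.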
### Checking Estimated Costs **Before running:** ```bash llm-answer-watcher run --config watcher.config.yaml --dry-run ``` Output: ```text 💰 Cost Estimation: ├── OpenAI gpt-4o-mini: $0.0004 per query × 3 intents = $0.0012 ├── Anthropic claude-3-5-haiku: $0.0022 per query × 3 intents = $0.0066 ├── Safety buffer (20%): +$0.0016 └── Total estimated cost: $0.0094 ✅ Budget check passed: ├── Estimated cost: $0.0094 ├── Budget limit: $1.00 └── Remaining budget: $0.9906 ``` **After running:** Check actual costs in `run_meta.json`: ```json { "run_id": "2025-11-01T08-00-00Z", "total_cost_usd": 0.0087, "estimated_cost_usd": 0.0094, "cost_accuracy": 92.6 } ``` ## Dynamic Pricing LLM Answer Watcher automatically loads current pricing from [llm-prices.com](https://www.llm-prices.com). ### How Dynamic Pricing Works 1. **On first run**: Fetch pricing from llm-prices.com 1. **Cache for 24 hours**: Store in `~/.cache/llm-answer-watcher/pricing.json` 1. **Auto-refresh**: Re-fetch after 24 hours 1. 
**Fallback**: Use hardcoded prices if API unavailable ### Viewing Current Pricing ```bash # Show all models llm-answer-watcher prices show # Show specific provider llm-answer-watcher prices show --provider openai # Show specific model llm-answer-watcher prices show --model gpt-4o-mini # Export as JSON llm-answer-watcher prices list --format json ``` Example output: ```text 💰 Current LLM Pricing (as of 2025-11-01): OpenAI: gpt-4o-mini: Input: $0.15 per 1M tokens Output: $0.60 per 1M tokens gpt-4o: Input: $2.50 per 1M tokens Output: $10.00 per 1M tokens Anthropic: claude-3-5-haiku-20241022: Input: $0.80 per 1M tokens Output: $4.00 per 1M tokens ``` ### Forcing Pricing Refresh ```bash # Force refresh (ignore cache) llm-answer-watcher prices refresh --force # Verify pricing updated llm-answer-watcher prices show ``` ### Pricing Cache Location - **Linux/Mac**: `~/.cache/llm-answer-watcher/pricing.json` - **Windows**: `%LOCALAPPDATA%/llm-answer-watcher/pricing.json` Clear cache: ```bash rm ~/.cache/llm-answer-watcher/pricing.json ``` ## Web Search Costs Web search adds additional costs beyond token usage.
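As a rough sketch of how a web search surcharge stacks on top of token costs, the snippet below adds a per-call fee to the cost of the search content tokens. It assumes the figures this guide quotes for `gpt-4o-mini` ($10 per 1,000 calls, a fixed 8k content tokens, $0.15 per 1M input tokens, and a $0.0004 base query); the helper `web_search_cost` is hypothetical and is not part of the CLI.

```python
def web_search_cost(
    num_searches: int,
    search_tokens: int,
    input_price_per_m: float,
    fee_per_1k_calls: float = 10.0,  # assumed tier: $10 per 1,000 calls
) -> float:
    """Return the extra USD cost that web search adds to one query."""
    call_fee = num_searches * fee_per_1k_calls / 1000
    content_cost = search_tokens * input_price_per_m / 1_000_000
    return call_fee + content_cost


# One search on gpt-4o-mini with the fixed 8k content tokens.
surcharge = web_search_cost(num_searches=1, search_tokens=8000, input_price_per_m=0.15)
base_query = 0.0004  # token-only estimate quoted in this guide
print(f"surcharge: ${surcharge:.4f}, total: ${base_query + surcharge:.4f}")
```

For the higher-priced tier, pass `fee_per_1k_calls=25.0`; the per-call fee usually dominates the total, which is why enabling web search raises per-query cost so sharply.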
### OpenAI Web Search Pricing | Model Tier | Cost per 1,000 Calls | Content Tokens | | -------------------------- | -------------------- | --------------- | | Standard (all models) | $10 | @ model rate | | gpt-4o-mini, gpt-4.1-mini | $10 | Fixed 8k tokens | | Preview reasoning (o1, o3) | $10 | @ model rate | | Preview non-reasoning | $25 | **FREE** | ### Web Search Cost Calculation ```python # Standard model web_search_cost = ( (num_searches * 0.01) + # $10 per 1k calls (search_tokens * input_price_per_token) ) # Mini models (fixed 8k tokens) web_search_cost = ( (num_searches * 0.01) + (8000 * input_price_per_token) ) ``` ### Estimating Web Search Costs ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" ``` **Estimated cost per query** (with web search): - Base query: $0.0004 (tokens) - Web search call: $0.01 (per call) - Web search content: $0.0012 (8k tokens @ $0.15/1M) - **Total**: ~$0.0116 per query See [Web Search Configuration](../web-search/) for details. ### Perplexity Request Fees Perplexity charges **request fees** in addition to token costs: - Basic searches: ~$0.005 per request - Complex searches: ~$0.01-$0.03 per request Perplexity Costs Not Fully Estimated Request fees are **not yet included** in cost estimates. Budget accordingly when using Perplexity: ```yaml budget: max_per_run_usd: 2.00 # Higher buffer for Perplexity ``` ## Budget Enforcement Behavior ### Pre-Execution Validation Budget validation happens **before** any LLM calls: 1. Load configuration 1. Estimate total cost 1. Check against budget limits 1. If budget exceeded: **Abort immediately** 1. If budget OK: Proceed with execution **No LLM calls are made if budget would be exceeded.** ### Abort on Budget Exceeded When budget is exceeded: ```bash llm-answer-watcher run --config watcher.config.yaml ``` Output: ```text ❌ Budget exceeded: Estimated run cost $1.25 exceeds max_per_run_usd budget of $1.00. 
Run would execute 12 queries: ├── 3 intents × 4 models = 12 queries ├── Estimated cost: $1.25 ├── Budget limit: $1.00 └── Overage: $0.25 Options: 1. Reduce number of models or intents 2. Increase budget limit in watcher.config.yaml 3. Use --force to override budget (not recommended) ``` **Exit code**: `1` (configuration error) ### Force Override Override budget limits (use with caution): ```bash llm-answer-watcher run --config watcher.config.yaml --force ``` Output: ```text ⚠️ Budget check OVERRIDDEN with --force flag Estimated cost: $1.25 Budget limit: $1.00 Overage: $0.25 Proceeding anyway... ``` Force Override Only use `--force` when: - You understand exact costs - Budget limit is incorrect - Emergency production run **Never** use `--force` in automated scripts. ### Warning Threshold Behavior When cost exceeds warning threshold (but not max): ```bash llm-answer-watcher run --config watcher.config.yaml ``` Output: ```text ⚠️ Cost warning: Estimated run cost $0.75 exceeds warning threshold of $0.50 ├── Estimated cost: $0.75 ├── Warning threshold: $0.50 ├── Budget limit: $1.00 └── Status: OK to proceed Run will execute 12 queries. Continue? 
[Y/n] ``` **Behavior**: - In human mode: Prompt for confirmation - With `--yes` flag: Continue automatically - In agent mode: Continue automatically (warning logged) ## Cost Tracking ### Per-Run Cost Summary After each run, check `run_meta.json`: ```json { "run_id": "2025-11-01T08-00-00Z", "timestamp_utc": "2025-11-01T08:00:00Z", "total_cost_usd": 0.0142, "estimated_cost_usd": 0.0168, "cost_accuracy_percent": 84.5, "queries_completed": 6, "queries_failed": 0, "cost_by_provider": { "openai": 0.0048, "anthropic": 0.0094 }, "cost_by_model": { "gpt-4o-mini": 0.0048, "claude-3-5-haiku-20241022": 0.0094 } } ``` ### Historical Cost Analysis Query SQLite database: ```sql -- Total spending SELECT SUM(total_cost_usd) as total_spent FROM runs; -- Spending by week SELECT strftime('%Y-W%W', timestamp_utc) as week, SUM(total_cost_usd) as weekly_cost, COUNT(*) as runs FROM runs GROUP BY week ORDER BY week DESC; -- Spending by model SELECT model_name, COUNT(*) as queries, SUM(estimated_cost_usd) as total_cost, AVG(estimated_cost_usd) as avg_cost_per_query FROM answers_raw GROUP BY model_name ORDER BY total_cost DESC; -- Spending by intent SELECT intent_id, COUNT(*) as queries, SUM(estimated_cost_usd) as total_cost, AVG(estimated_cost_usd) as avg_cost_per_query FROM answers_raw GROUP BY intent_id ORDER BY total_cost DESC; ``` ### Monthly Budget Tracking Track spending vs. monthly budget: ```sql -- Current month spending SELECT SUM(total_cost_usd) as month_to_date FROM runs WHERE strftime('%Y-%m', timestamp_utc) = strftime('%Y-%m', 'now'); -- Monthly trend SELECT strftime('%Y-%m', timestamp_utc) as month, SUM(total_cost_usd) as monthly_cost, COUNT(*) as runs, AVG(total_cost_usd) as avg_cost_per_run FROM runs GROUP BY month ORDER BY month DESC; ``` ## Best Practices ### 1. Always Enable Budgets Even in production: ```yaml run_settings: budget: enabled: true max_per_run_usd: 10.00 # Reasonable safety limit ``` ### 2. 
Set Conservative Limits Start low, increase as needed: ```yaml # Week 1: Very conservative budget: max_per_run_usd: 0.10 # Week 2: Based on actual usage budget: max_per_run_usd: 0.50 # Production: 2x observed average budget: max_per_run_usd: 1.00 ``` ### 3. Use Warning Thresholds Get alerts before hitting limits: ```yaml budget: max_per_run_usd: 1.00 warn_threshold_usd: 0.50 # Alert at 50% of limit ``` ### 4. Separate Budgets by Environment Different limits for different environments: ```yaml # dev.config.yaml run_settings: budget: max_per_run_usd: 0.10 # prod.config.yaml run_settings: budget: max_per_run_usd: 5.00 ``` ### 5. Monitor Actual vs. Estimated Costs Track estimation accuracy: ```sql SELECT AVG(total_cost_usd / estimated_cost_usd) as avg_accuracy, MIN(total_cost_usd / estimated_cost_usd) as min_accuracy, MAX(total_cost_usd / estimated_cost_usd) as max_accuracy FROM runs WHERE estimated_cost_usd > 0; ``` Adjust safety buffer if needed. ### 6. Account for Web Search Costs Budget higher when using web search: ```yaml # Without web search budget: max_per_run_usd: 0.50 # With web search (10x higher) budget: max_per_run_usd: 5.00 ``` ### 7. Use Dry Runs Check costs before running: ```bash llm-answer-watcher run --config watcher.config.yaml --dry-run ``` ## Troubleshooting ### Budget Always Exceeded **Problem**: Every run exceeds budget **Possible causes:** 1. Too many intents or models 1. Budget limit too low 1. Expensive models (GPT-4, Claude Opus) 1. 
Web search enabled **Solutions:** ```yaml # Reduce intents intents: - id: "primary-intent" prompt: "Most important question" # Reduce models models: - provider: "openai" model_name: "gpt-4o-mini" # Cheapest option # Increase budget budget: max_per_run_usd: 2.00 # Higher limit # Disable web search models: - provider: "openai" model_name: "gpt-4o-mini" # Remove tools section ``` ______________________________________________________________________ ### Estimated Costs Inaccurate **Problem**: Actual costs differ significantly from estimates **Possible causes:** 1. Longer/shorter responses than expected 1. Web search usage not estimated correctly 1. Pricing data outdated 1. Function calling adds tokens **Solutions:** ```bash # Refresh pricing llm-answer-watcher prices refresh --force # Check estimation accuracy # (in run_meta.json after run) cat output/2025-11-01T08-00-00Z/run_meta.json | jq '.cost_accuracy_percent' # Adjust safety buffer if needed (future feature) ``` ______________________________________________________________________ ### Budget Blocks Valid Runs **Problem**: Budget blocks run that should be allowed **Cause**: Budget limit too conservative **Solution**: Increase limit based on historical data: ```sql -- Check average run cost SELECT AVG(total_cost_usd) as avg_cost, MAX(total_cost_usd) as max_cost FROM runs; ``` Set budget to 2x average or 1.5x max: ```yaml budget: max_per_run_usd: 0.50 # 2x average of $0.25 ``` ## Next Steps - **[Cost Management](../../features/cost-management/)**: Deep dive into cost tracking - **[Web Search Configuration](../web-search/)**: Understand web search costs - **[Model Configuration](../models/)**: Choose cost-effective models - **[Automation](../../usage/automation/)**: Budget controls in CI/CD # Web Search Configuration Web search enables LLMs to access real-time information from the web, providing current data beyond their training cutoff dates. 
This is crucial for monitoring brand visibility in fresh, up-to-date LLM responses. ## Why Use Web Search? ### Benefits **Fresh Data**: Access information after LLM training cutoff - Track recent product launches - Monitor current competitive landscape - Detect real-time ranking changes - Capture latest industry trends **Accurate Information**: Grounded in current web sources - Real-time pricing and features - Current company positioning - Latest product updates - Active competitor status **Citations**: Transparent source attribution (Perplexity) - See exactly which sources LLMs used - Verify information accuracy - Understand ranking drivers - Track source patterns ### Trade-offs **Higher Costs**: Web search adds significant costs - OpenAI: +$10-$25 per 1,000 calls - Perplexity: +$0.005-$0.03 per request - 10-30x cost increase vs. base queries **Slower Responses**: Web search takes longer - Base query: ~1-2 seconds - With web search: ~3-10 seconds - May impact automation pipelines **Variability**: Results can change frequently - Web content changes constantly - Less reproducible than static responses - Harder to track trends ## Supported Providers ### OpenAI Web Search OpenAI offers web search through the Responses API with the `web_search` tool. **Configuration**: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" ``` **How it works**: 1. LLM receives user prompt 1. Decides whether to use web search (if `tool_choice: auto`) 1. Searches the web if needed 1. Incorporates search results into response 1. 
Returns answer with web context **Pricing** (per 1,000 calls): | Model Tier | Cost | Content Tokens | | -------------------------- | ---- | --------------- | | Standard (all models) | $10 | @ model rate | | gpt-4o-mini, gpt-4.1-mini | $10 | Fixed 8k tokens | | Preview reasoning (o1, o3) | $10 | @ model rate | | Preview non-reasoning | $25 | **FREE** | ______________________________________________________________________ ### Perplexity (Native Web Search) Perplexity models have built-in web search - no configuration needed. **Configuration**: ```yaml models: - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" ``` **How it works**: 1. Every query automatically searches the web 1. LLM synthesizes answer from sources 1. Returns response with citations 1. Provides source URLs for verification **Models**: - `sonar`: Fast, web-grounded ($1/$1 per 1M tokens + request fees) - `sonar-pro`: High-quality grounded ($3/$15 per 1M tokens + request fees) - `sonar-reasoning`: Enhanced reasoning ($1/$5 per 1M tokens + request fees) - `sonar-deep-research`: In-depth analysis ($3/$15 per 1M tokens + request fees) **Pricing**: Token costs + request fees (~$0.005-$0.03 per request) Perplexity Request Fees Request fees are **not yet included** in cost estimates. Budget accordingly. ______________________________________________________________________ ### Google Search Grounding Google Gemini models support Google Search grounding, which enables the LLM to search the web and ground responses in current information. **Configuration**: ```yaml models: - provider: "google" model_name: "gemini-2.5-flash" env_api_key: "GEMINI_API_KEY" system_prompt: "google/gemini-grounding" # Recommended tools: - google_search: {} # Enable Google Search ``` **How it works**: 1. LLM receives user prompt 1. Gemini automatically decides if search is needed 1. Performs Google Search if beneficial 1. Grounds response in search results 1. 
Returns answer with grounding metadata **Models**: - `gemini-2.0-flash-lite`: Not supported (no grounding) - `gemini-2.0-flash-exp`: Supported (experimental) - `gemini-2.5-flash`: Supported (best for grounding) - `gemini-2.5-flash-lite`: Not supported - `gemini-2.5-pro`: Supported (highest quality) **Pricing**: Base model token costs (no additional fees for grounding) Configuration Format Difference Google uses `google_search: {}` (dictionary format) while OpenAI uses `type: "web_search"` (typed format). This reflects different provider API specifications. See [detailed configuration](#google-search-grounding-configuration) below. ## OpenAI Web Search Configuration ### Basic Configuration Enable web search with automatic activation: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" # Let model decide ``` ### Tool Choice Options Control when web search is used: **`auto` (Recommended)**: Model decides when to search ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" ``` **Use when**: You want LLM to determine if fresh data is needed. ______________________________________________________________________ **`required`**: Force web search for every query ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "required" ``` **Use when**: You always want current information. **Warning**: Significantly increases costs (every query uses web search). ______________________________________________________________________ **`none`**: Disable web search for specific queries ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # No tools specified - web search disabled ``` **Use when**: Training data is sufficient, cost optimization priority. 
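The three `tool_choice` options above can be summarized in a small sketch that classifies a model entry by its effective web search mode. It assumes the model entry has already been loaded from YAML into a plain dict, that `tool_choice` sits alongside `tools` on the model entry, and that a missing `tool_choice` behaves like `auto`; the function name `web_search_mode` is hypothetical and not part of the tool.

```python
def web_search_mode(model: dict) -> str:
    """Return 'auto', 'required', or 'none' for an OpenAI-style model entry."""
    tools = model.get("tools") or []
    has_web_search = any(
        isinstance(tool, dict) and tool.get("type") == "web_search"
        for tool in tools
    )
    if not has_web_search:
        # No web_search tool configured: search is effectively disabled.
        return "none"
    # Assumption: omitting tool_choice behaves like "auto".
    return model.get("tool_choice", "auto")


with_search = {
    "provider": "openai",
    "model_name": "gpt-4o-mini",
    "tools": [{"type": "web_search"}],
    "tool_choice": "required",
}
without_search = {"provider": "openai", "model_name": "gpt-4o-mini"}

print(web_search_mode(with_search))     # forced search on every query
print(web_search_mode(without_search))  # no tools, so no web search
```

A check like this can be handy when auditing a config for entries that force web search on every query, since those are the ones that drive costs up.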
### Comparing With and Without Web Search Test impact of web search: ```yaml models: # With web search - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" # Without web search (control) - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # No tools ``` Compare results to see web search impact on brand visibility. ### Web Search Metadata LLM Answer Watcher tracks web search usage: ```json { "intent_id": "best-email-tools", "model_provider": "openai", "model_name": "gpt-4o-mini", "answer_text": "The best email warmup tools are...", "web_search_used": true, "web_search_count": 3, "web_search_results": [ { "url": "https://example.com/best-email-tools", "title": "Top Email Warmup Tools 2025", "snippet": "..." } ], "usage_meta": { "prompt_tokens": 150, "completion_tokens": 520, "web_search_tokens": 8000 } } ``` ## Perplexity Configuration ### Basic Configuration Use Perplexity for automatic web grounding: ```yaml models: - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" ``` **No additional configuration needed** - web search is automatic. 
### Perplexity Model Selection Choose model based on use case: **`sonar`**: Fast, cost-effective ```yaml models: - provider: "perplexity" model_name: "sonar" env_api_key: "PERPLEXITY_API_KEY" ``` - **Cost**: $1/$1 per 1M tokens + ~$0.005 per request - **Speed**: ~2-4 seconds per query - **Use when**: Daily monitoring, high-volume queries ______________________________________________________________________ **`sonar-pro`**: High-quality grounded answers ```yaml models: - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" ``` - **Cost**: $3/$15 per 1M tokens + ~$0.01 per request - **Speed**: ~3-6 seconds per query - **Use when**: Weekly reports, competitive analysis ______________________________________________________________________ **`sonar-reasoning`**: Enhanced reasoning with web search ```yaml models: - provider: "perplexity" model_name: "sonar-reasoning" env_api_key: "PERPLEXITY_API_KEY" ``` - **Cost**: $1/$5 per 1M tokens + ~$0.015 per request - **Speed**: ~4-8 seconds per query - **Use when**: Complex queries, deep analysis ______________________________________________________________________ **`sonar-deep-research`**: Comprehensive research mode ```yaml models: - provider: "perplexity" model_name: "sonar-deep-research" env_api_key: "PERPLEXITY_API_KEY" ``` - **Cost**: $3/$15 per 1M tokens + ~$0.02-$0.03 per request - **Speed**: ~8-15 seconds per query - **Use when**: Monthly executive reports, thorough research ### Perplexity Citations Perplexity provides source citations: ```json { "intent_id": "best-email-tools", "model_provider": "perplexity", "model_name": "sonar-pro", "answer_text": "The best email warmup tools are...", "citations": [ { "index": 1, "url": "https://www.g2.com/categories/email-warmup", "title": "Best Email Warmup Software 2025", "used_in_response": true }, { "index": 2, "url": "https://blog.competitor.com/warmup-guide", "title": "Email Warmup Best Practices", "used_in_response": true } ] } ``` Citation 
Analysis Track which sources influence LLM recommendations: - Identify key industry publications - Monitor competitor content - Find content opportunities - Track source diversity ## Google Search Grounding Configuration ### Basic Configuration Enable Google Search grounding for Gemini models: ```yaml models: - provider: "google" model_name: "gemini-2.5-flash" env_api_key: "GEMINI_API_KEY" system_prompt: "google/gemini-grounding" tools: - google_search: {} ``` **Key configuration points**: - **`model_name`**: Must be a grounding-capable model (see [supported models](#supported-models) below) - **`system_prompt`**: Use `"google/gemini-grounding"` for optimized grounding behavior - **`tools`**: Use `google_search: {}` format (Google API specification) ### Configuration Format Google uses a different tools format than OpenAI: **Google format** (dictionary with tool name as key): ```yaml tools: - google_search: {} ``` **OpenAI format** (dictionary with `type` field): ```yaml tools: - type: "web_search" tool_choice: "auto" ``` **Why the difference?** - Each provider has different API specifications - OpenAI uses typed tool specification with `tool_choice` control - Google uses named tool objects with automatic decision-making - The config does direct passthrough to each provider's API No Tool Choice Google Gemini automatically decides when to use Google Search based on the query. There's no `tool_choice` parameter - the model intelligently determines when grounding would improve the response. 
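To make the format difference concrete, here is a minimal sketch that tells the two tool-entry shapes apart once the YAML has been loaded into Python dicts. The function name `tool_format` and its return labels are hypothetical; the sketch only inspects the keys shown in the examples above and does not reflect how the tool itself dispatches to providers.

```python
def tool_format(tool: dict) -> str:
    """Classify a tools entry as 'openai', 'google', or 'unknown'."""
    if "type" in tool:
        # OpenAI style: {"type": "web_search"}
        return "openai"
    if "google_search" in tool:
        # Google style: {"google_search": {}}
        return "google"
    return "unknown"


openai_tools = [{"type": "web_search"}]
google_tools = [{"google_search": {}}]

for entry in openai_tools + google_tools:
    print(entry, "->", tool_format(entry))
```

A check of this kind can catch a common copy-paste mistake, such as placing an OpenAI-style `type: "web_search"` entry under a `provider: "google"` model.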
### Supported Models

Not all Gemini models support Google Search grounding:

| Model                   | Grounding Support | Best For                                 |
| ----------------------- | ----------------- | ---------------------------------------- |
| `gemini-2.0-flash-lite` | ❌ No             | Fast, non-grounded queries               |
| `gemini-2.0-flash-exp`  | ⚠️ Experimental   | Testing new features                     |
| `gemini-2.5-flash`      | ✅ Yes            | **Recommended** - balanced speed/quality |
| `gemini-2.5-flash-lite` | ❌ No             | Fast, non-grounded queries               |
| `gemini-2.5-pro`        | ✅ Yes            | Highest quality grounding                |

**Recommendation**: Use `gemini-2.5-flash` for production. It provides excellent grounding quality at reasonable cost.

### System Prompt Optimization

Use the specialized `google/gemini-grounding` system prompt:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"  # Optimized for grounding
    tools:
      - google_search: {}
```

**What it does**:

- Instructs Gemini to use Google Search when beneficial
- Emphasizes grounding responses in search results
- Requests comprehensive source coverage
- Improves answer quality for brand monitoring

**Default system prompt** (`google/default.json`) also works but is less optimized for web search use cases.

### Configuration Examples

**With grounding** (recommended for brand monitoring):

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"
    tools:
      - google_search: {}

intents:
  - id: "email-warmup-tools"
    prompt: "What are the best email warmup tools in 2025?"
```

**Without grounding** (faster, uses only training data):

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash-lite"
    env_api_key: "GEMINI_API_KEY"
    # No tools or system_prompt specified

intents:
  - id: "email-warmup-tools"
    prompt: "What are the best email warmup tools?"
```

### Grounding Metadata

When Google Search is used, the response includes grounding metadata:

```json
{
  "intent_id": "email-warmup-tools",
  "model_provider": "google",
  "model_name": "gemini-2.5-flash",
  "answer_text": "Based on current research, the best email warmup tools are...",
  "web_search_results": {
    "web_search_queries": [
      "best email warmup tools 2025",
      "email warmup service comparison"
    ],
    "grounding_chunks": [
      {
        "web_source": "https://www.g2.com/categories/email-warmup",
        "retrieved_context": "Top-rated email warmup tools include..."
      }
    ],
    "grounding_supports": [
      {
        "segment": {
          "start_index": 150,
          "end_index": 200,
          "text": "Warmly is a leading email warmup solution"
        },
        "grounding_chunk_indices": [0, 2],
        "confidence_scores": [0.95, 0.88]
      }
    ]
  },
  "web_search_count": 2
}
```

**Key fields**:

- **`web_search_queries`**: Google Search queries Gemini performed
- **`grounding_chunks`**: Source URLs and retrieved context
- **`grounding_supports`**: Which text segments were grounded in which sources
- **`confidence_scores`**: How confident Gemini is in the grounding (0.0-1.0)

### Pricing

**Good news**: Google Search grounding has **no additional per-request fees**. You only pay standard token costs:

| Model              | Input Cost        | Output Cost       |
| ------------------ | ----------------- | ----------------- |
| `gemini-2.5-flash` | $0.04 / 1M tokens | $0.12 / 1M tokens |
| `gemini-2.5-pro`   | $0.60 / 1M tokens | $1.80 / 1M tokens |

**Example cost** (email warmup query with grounding):

```text
Query: 100 tokens input
Response: 300 tokens output (with grounding context)

gemini-2.5-flash cost:
= (100 × $0.04/1M) + (300 × $0.12/1M)
= $0.000004 + $0.000036
= $0.00004 per query
```

**vs. OpenAI with web search**:

```text
OpenAI gpt-4o-mini with web_search:
= $0.0116 per query (~290x more expensive)
```

**Cost Advantage**: Google Search grounding is significantly cheaper than OpenAI web search for high-volume monitoring. Grounding tokens are included in base pricing.
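The arithmetic in the pricing example can be reproduced with a few lines of Python. This is a minimal sketch: the function name `token_cost_usd` is made up for illustration, and the prices are simply the figures quoted in the table and comparison above, not values read from the tool.

```python
def token_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Token-only cost in USD; prices are quoted per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# gemini-2.5-flash figures from the table above: $0.04 input, $0.12 output
gemini_cost = token_cost_usd(100, 300, input_price_per_m=0.04, output_price_per_m=0.12)

# Per-query figure quoted above for gpt-4o-mini with web_search
openai_cost = 0.0116

print(f"gemini-2.5-flash: ${gemini_cost:.5f} per query")
print(f"OpenAI is roughly {openai_cost / gemini_cost:.0f}x more expensive")
```

Swapping in the `gemini-2.5-pro` prices from the same table gives the equivalent per-query estimate for that model.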
### When to Use Google Search Grounding

**Use Google Search Grounding when**:

- ✅ You need current, real-time information
- ✅ You want Google's search quality and coverage
- ✅ You're running high-volume monitoring (cost-effective)
- ✅ You want automatic search decision-making
- ✅ You need grounding metadata with source attribution

**Use OpenAI web search when**:

- ✅ You need explicit `tool_choice` control (force or disable search)
- ✅ You prefer OpenAI's LLM reasoning quality
- ✅ You're already invested in the OpenAI ecosystem

**Use Perplexity when**:

- ✅ You need explicit source citations with URLs
- ✅ You want always-on web search (no configuration)
- ✅ You prefer Perplexity's citation format

### Complete Example Configuration

Multi-provider comparison with side-by-side grounding:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    # Google with grounding (cost-effective, automatic)
    - provider: "google"
      model_name: "gemini-2.5-flash"
      env_api_key: "GEMINI_API_KEY"
      system_prompt: "google/gemini-grounding"
      tools:
        - google_search: {}

    # OpenAI with controlled web search
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"
      tools:
        - type: "web_search"
      tool_choice: "auto"

    # Perplexity with always-on citations
    - provider: "perplexity"
      model_name: "sonar-pro"
      env_api_key: "PERPLEXITY_API_KEY"

brands:
  mine:
    - "Warmly"
    - "Lemlist"
  competitors:
    - "HubSpot"
    - "Instantly"

intents:
  - id: "best-email-tools-2025"
    prompt: "What are the best email warmup tools in 2025?"
```

**This configuration enables**:

- Google: Automatic grounding with lowest cost
- OpenAI: LLM-controlled web search with reasoning
- Perplexity: Always-on search with explicit citations

Compare results across all three to understand:

- How each provider uses web search
- Cost vs. quality trade-offs
- Grounding vs.
citation differences

## Cost Management for Web Search

### Web Search Cost Breakdown

**OpenAI gpt-4o-mini with web search**:

```text
Base query: $0.0004 (tokens only)
+ Web search call: $0.01 ($10 per 1k calls)
+ Web search content: $0.0012 (8k tokens @ $0.15/1M)
= Total: ~$0.0116 per query
```

**Perplexity sonar-pro**:

```text
Base tokens: $0.0050 (500 output tokens @ $3/$15 per 1M)
+ Request fee: $0.01 (varies by complexity)
= Total: ~$0.015 per query
```

### Budget Configuration for Web Search

Adjust budgets to account for higher costs:

```yaml
run_settings:
  # Without web search
  budget:
    max_per_run_usd: 0.50

  # With web search (10-30x higher)
  budget:
    max_per_run_usd: 5.00
```

**Example calculation**:

- 3 intents × 2 models with web search = 6 queries
- ~$0.015 per query
- Total: $0.09 per run
- Recommended budget: $0.50 (5x safety margin)

### Optimizing Web Search Costs

**1. Use `auto` tool choice**:

```yaml
tools:
  - type: "web_search"
tool_choice: "auto"  # Only search when needed
```

The model only uses web search when beneficial, which reduces unnecessary searches.

______________________________________________________________________

**2. Mix web and non-web models**:

```yaml
models:
  # Web-grounded for fresh data
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"

  # Base model for comparison
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    # No web search
```

Compare web vs. non-web responses to validate web search value.

______________________________________________________________________

**3. Use web search selectively**:

```yaml
intents:
  # Fresh data needed
  - id: "current-best-tools"
    prompt: "What are the best email tools in 2025?"
    # Use web search models for this intent

  # Historical query
  - id: "email-warmup-concept"
    prompt: "What is email warmup?"
    # No web search needed
```

Use separate configs for different intent types.

______________________________________________________________________

**4.
Track web search usage**:

```sql
-- Web search usage rate
SELECT
    model_name,
    COUNT(*) as total_queries,
    SUM(web_search_used) as web_searches,
    (SUM(web_search_used) * 100.0 / COUNT(*)) as usage_rate,
    AVG(estimated_cost_usd) as avg_cost
FROM answers_raw
WHERE model_provider = 'openai'
GROUP BY model_name;
```

Optimize based on actual usage patterns.

## Use Cases for Web Search

### 1. Recent Product Launches

Track brand visibility after launches:

```yaml
intents:
  - id: "best-tools-2025"
    prompt: "What are the best email warmup tools launched in 2025?"

models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

Web search ensures the LLM knows about recent launches.

### 2. Current Competitive Landscape

Monitor live market positioning:

```yaml
intents:
  - id: "current-market-leaders"
    prompt: "Who are the current market leaders in email warmup?"

models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "required"  # Always search
```

### 3. Pricing and Features

Track current pricing mentions:

```yaml
intents:
  - id: "pricing-comparison"
    prompt: "Compare pricing for email warmup tools in 2025"

models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

### 4. News and Events

Monitor impact of news on brand visibility:

```yaml
intents:
  - id: "post-acquisition"
    prompt: "What are the best email tools after HubSpot's recent acquisition?"

models:
  - provider: "perplexity"
    model_name: "sonar-reasoning"
    env_api_key: "PERPLEXITY_API_KEY"
```

### 5. Trend Analysis

Track emerging trends:

```yaml
intents:
  - id: "ai-email-tools"
    prompt: "What are the best AI-powered email warmup tools?"

models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

## Analyzing Web Search Results

### Web Search Metadata

Check if web search was used:

```python
import json

with open("output/2025-11-01T08-00-00Z/intent_best-tools_raw_openai_gpt-4o-mini.json") as f:
    data = json.load(f)

print(f"Web search used: {data.get('web_search_used')}")
print(f"Searches performed: {data.get('web_search_count')}")
```

### Citation Analysis (Perplexity)

Extract and analyze citations:

```python
import json

with open("output/2025-11-01T08-00-00Z/intent_best-tools_raw_perplexity_sonar-pro.json") as f:
    data = json.load(f)

for citation in data.get('citations', []):
    print(f"{citation['index']}: {citation['title']}")
    print(f"  {citation['url']}\n")
```

### Source Patterns

Track which sources LLMs cite:

```sql
-- Citation frequency (future feature)
SELECT
    citation_domain,
    COUNT(*) as citation_count,
    COUNT(DISTINCT intent_id) as intents_cited_in
FROM citations
GROUP BY citation_domain
ORDER BY citation_count DESC
LIMIT 10;
```

**Citation Tracking**: Full citation tracking is planned for a future release. Currently, citations are stored in JSON artifacts.

## Best Practices

### 1. Test With and Without Web Search

Compare to measure impact:

```yaml
models:
  # Baseline (no web search)
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  # Test (with web search)
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

### 2. Use Auto Tool Choice

Let the model decide when to search:

```yaml
tools:
  - type: "web_search"
tool_choice: "auto"  # More cost-effective
```

### 3. Budget Appropriately

Account for a 10-30x cost increase:

```yaml
budget:
  max_per_run_usd: 5.00  # vs. $0.50 without web search
```

### 4. Use for Time-Sensitive Queries

Enable web search when freshness matters:

- Recent product launches
- Current pricing
- Latest competitive moves
- Industry news impact

### 5. Track Citation Sources

Monitor which sources influence rankings:

- Identify key industry publications
- Find content gaps
- Track competitor content
- Understand ranking factors

### 6. Combine Providers

Use multiple web search approaches:

```yaml
models:
  # OpenAI: Selective web search
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"

  # Perplexity: Always web-grounded
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

## Troubleshooting

### Web Search Not Working

**Problem**: Web search tool not being used

**Check**:

1. Tool configuration is correct:

   ```yaml
   tools:
     - type: "web_search"  # Correct
     # Not: tool_type or search_tool
   ```

1. Tool choice is set:

   ```yaml
   tool_choice: "auto"  # or "required"
   ```

1. Model supports web search:

   1. OpenAI: All chat models
   1. Perplexity: All models (automatic)

______________________________________________________________________

### High Costs

**Problem**: Web search costs higher than expected

**Solutions**:

1. Check tool choice:

   ```yaml
   tool_choice: "auto"  # Not "required"
   ```

1. Monitor usage:

   ```sql
   SELECT
       COUNT(*) as total,
       SUM(web_search_used) as searches,
       AVG(estimated_cost_usd) as avg_cost
   FROM answers_raw;
   ```

1. Use cheaper models:

   ```yaml
   models:
     - provider: "openai"
       model_name: "gpt-4o-mini"  # Cheapest with web search
   ```

______________________________________________________________________

### Inconsistent Results

**Problem**: Results vary between runs

**Cause**: Web content changes frequently

**Expected behavior**: Web-grounded responses will vary as web content updates.

**Mitigation**:

- Run multiple queries, average results
- Track trends over time vs.
point-in-time snapshots
- Use non-web models for baseline comparison

## Next Steps

- **[Model Configuration](../models/)**: Choose models with web search
- **[Budget Configuration](../budget/)**: Budget for web search costs
- **[Cost Management](../../features/cost-management/)**: Track web search spending
- **[HTML Reports](../../features/html-reports/)**: View web search metadata

# Post-Intent Operations

Post-intent operations allow you to execute custom actions after each intent query completes. This advanced feature enables dynamic workflows like discovering competitors mentioned by LLMs.

## Overview

Operations are defined per-intent and execute after the LLM response is received:

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
    operations:
      - type: "extract_competitors"
        save_to: "discovered_competitors"
```

## Supported Operation Types

### `extract_competitors`

Automatically extracts brand names mentioned in LLM responses that aren't in your configured brand lists.

**Use Case**: Discover new competitors you weren't tracking.

**Configuration**:

```yaml
intents:
  - id: "market-research"
    prompt: "What are all the tools in this category?"
    operations:
      - type: "extract_competitors"
        save_to: "discovered_brands"
        params:
          min_confidence: 0.7
          exclude_generic_terms: true
```

**Parameters**:

- `save_to` (required): Variable name to store results
- `min_confidence`: Minimum confidence threshold (0.0-1.0)
- `exclude_generic_terms`: Filter out generic words

**Output**:

Results saved to `intent_*_operation_extract_competitors.json`:

```json
{
  "operation_type": "extract_competitors",
  "discovered_brands": [
    {"name": "NewCompetitor", "confidence": 0.95},
    {"name": "EmergingTool", "confidence": 0.82}
  ]
}
```

## Operation Chaining

Execute multiple operations in sequence:

```yaml
intents:
  - id: "comprehensive-analysis"
    prompt: "Analyze the market landscape"
    operations:
      # Step 1: Extract competitors
      - type: "extract_competitors"
        save_to: "new_competitors"

      # Step 2: Could add more operations in future
      # - type: "sentiment_analysis"
      #   depends_on: "new_competitors"
```

Operations execute in order and can depend on previous results.

## Real-World Examples

### Market Discovery

Find competitors you didn't know about:

```yaml
intents:
  - id: "discover-market"
    prompt: "List all email marketing tools you know"
    operations:
      - type: "extract_competitors"
        save_to: "market_scan"
        params:
          min_confidence: 0.8
```

### Quarterly Expansion

Update your competitor list quarterly:

```yaml
intents:
  - id: "q1-market-scan"
    prompt: "What are the top 20 tools in our category as of Q1 2025?"
    operations:
      - type: "extract_competitors"
        save_to: "q1_competitors"
```

Then review `q1_competitors.json` and add new brands to your config.

## Best Practices

### 1. Use High Confidence Thresholds

Avoid false positives:

```yaml
params:
  min_confidence: 0.8  # Only very confident extractions
```

### 2. Review Before Adding to Config

Operations discover candidates - manually review before adding to your brand list.

### 3. Separate Discovery Intents

Create dedicated intents for competitor discovery:

```yaml
intents:
  # Regular monitoring
  - id: "best-tools"
    prompt: "What are the best tools?"

  # Discovery (run monthly)
  - id: "market-discovery"
    prompt: "Comprehensive list of all tools in category"
    operations:
      - type: "extract_competitors"
        save_to: "monthly_scan"
```

## Accessing Operation Results

Results are stored in the output directory:

```text
output/2025-11-05T14-30-00Z/
├── intent_market-discovery_operation_extract_competitors.json
└── ...
```

Also queryable from SQLite:

```sql
SELECT operation_type, operation_results
FROM intent_operations
WHERE intent_id = 'market-discovery';
```

## Future Operation Types

Planned for future releases:

- `sentiment_analysis`: Analyze tone of brand mentions
- `feature_extraction`: Extract mentioned features/capabilities
- `pricing_detection`: Detect pricing information
- `use_case_mapping`: Map brands to specific use cases

## Limitations

- Operations run synchronously (no parallel execution yet)
- Limited to extraction tasks (no API calls or external actions)
- Results require manual review before acting on them

## Next Steps

- [Learn about intent configuration](../intents/)
- [See complete examples](../../../examples/ci-cd-integration/)

# Core Features

# Brand Mention Detection

Brand mention detection is the core feature of LLM Answer Watcher. It uses word-boundary regex matching to accurately identify brand mentions while preventing false positives.
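To make the idea concrete, here is a minimal sketch of word-boundary, case-insensitive matching using Python's standard `re` module. It is not the project's actual detector; the function name `find_brand_mentions` and its return shape are assumptions chosen for illustration.

```python
import re


def find_brand_mentions(text: str, aliases: list[str]) -> list[str]:
    """Return the aliases that appear in text as whole words (case-insensitive)."""
    found = []
    for alias in aliases:
        # re.escape keeps characters like "." in "Warmly.io" literal;
        # \b on both sides rejects substring hits such as "hub" in "GitHub".
        pattern = re.compile(rf"\b{re.escape(alias)}\b", re.IGNORECASE)
        if pattern.search(text):
            found.append(alias)
    return found


print(find_brand_mentions("I use HUBSPOT daily", ["HubSpot", "Hub"]))
print(find_brand_mentions("The code lives on GitHub", ["hub"]))
```

One caveat of this simple form: `\b` only marks a transition between word and non-word characters, so an alias that starts or ends with punctuation would need a different boundary check.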
## How It Works

### Word-Boundary Matching

The system uses **word-boundary regex** (`\b`) to ensure accurate matching:

```python
# Pattern: \bHubSpot\b
# Matches: "I use HubSpot daily"
# Doesn't match: "I use HubSpotter" or "hub" in "GitHub"
```

This prevents common false positives:

- ✅ "HubSpot" matches "HubSpot" exactly
- ❌ "Hub" does NOT match "HubSpot"
- ❌ "Spot" does NOT match "HubSpot"
- ❌ "hub" does NOT match "GitHub"

### Case-Insensitive Matching

All matching is case-insensitive:

```python
# All these match "HubSpot"
"HubSpot", "hubspot", "HUBSPOT", "HuBsPoT"
```

### Brand Aliases

Configure multiple aliases for each brand:

```yaml
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
    - "Warmly AI"
  competitors:
    - "HubSpot"
    - "HubSpot CRM"
    - "Instantly"
    - "Instantly.ai"
```

## Configuration

### Basic Brand Configuration

Minimal configuration with your brand and competitors:

```yaml
brands:
  mine:
    - "YourBrand"
  competitors:
    - "CompetitorA"
    - "CompetitorB"
```

### Advanced Brand Configuration

Include all variations and common misspellings:

```yaml
brands:
  mine:
    - "Acme Corp"
    - "Acme"
    - "AcmeCorp"
    - "Acme.io"
    - "Acme Software"
  competitors:
    # Direct competitors
    - "Competitor One"
    - "CompetitorOne"
    - "Competitor1"
    # Market leaders
    - "Industry Leader"
    - "Big Player Inc"
    # Adjacent competitors
    - "Alternative Tool"
```

### Brand Normalization

Brands are normalized for storage and analysis:

```python
# "HubSpot CRM"  → "hubspot-crm"
# "Instantly.ai" → "instantly-ai"
# "Apollo.io"    → "apollo-io"
```

This ensures consistent matching across different formats.

## Detection Methods

### Method 1: Regex (Default)

Fast, free, pattern-based detection.
**Advantages:**

- Zero cost (no API calls)
- Instant results
- 100% consistent
- Works offline

**Limitations:**

- May miss contextual mentions
- Requires exact alias match
- No semantic understanding

**Configuration:**

```yaml
run_settings:
  use_llm_rank_extraction: false
```

### Method 2: Function Calling

LLM-assisted detection using function calling for higher accuracy.

**Advantages:**

- Understands context
- Catches variations
- Semantic understanding
- Confidence scores

**Limitations:**

- Costs money per query
- Slower than regex
- Requires extraction model

**Configuration:**

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
  method: "function_calling"
  fallback_to_regex: true
  min_confidence: 0.7
```

### Method 3: Hybrid

Combines regex and function calling for best results.

**How it works:**

1. Try regex first (fast, free)
1. If regex fails, use function calling
1. Merge results with de-duplication

**Configuration:**

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
  method: "hybrid"
  fallback_to_regex: true
  min_confidence: 0.7
```

## Detection Results

### Mention Object

Each detected mention includes:

```json
{
  "brand": "HubSpot",
  "normalized_name": "hubspot",
  "is_mine": false,
  "rank_position": 1,
  "snippet": "...I recommend HubSpot for CRM needs...",
  "confidence": 1.0,
  "detection_method": "regex"
}
```

### My Brands vs Competitors

Mentions are categorized:

```json
{
  "my_mentions": [
    {
      "brand": "Warmly",
      "is_mine": true,
      "rank_position": 2
    }
  ],
  "competitor_mentions": [
    {
      "brand": "HubSpot",
      "is_mine": false,
      "rank_position": 1
    },
    {
      "brand": "Instantly",
      "is_mine": false,
      "rank_position": 3
    }
  ]
}
```

## Common Detection Patterns

### Pattern 1: Exact Brand Name

**LLM Response:**

> "The best email warmup tools are Warmly, Instantly, and Lemwarm."

**Detected:**

- ✅ Warmly
- ✅ Instantly
- ✅ Lemwarm

### Pattern 2: Brand with TLD

**LLM Response:**

> "Check out Warmly.io for email warmup."

**Detected:**

- ✅ Warmly.io

**Note:** Add both "Warmly" and "Warmly.io" as aliases to catch both.

### Pattern 3: Brand in Context

**LLM Response:**

> "Many sales teams use HubSpot CRM to manage leads."

**Detected:**

- ✅ HubSpot CRM
- ✅ HubSpot (if both aliases configured)

### Pattern 4: Case Variations

**LLM Response:**

> "HUBSPOT and hubspot are the same product."

**Detected:**

- ✅ HubSpot (both instances)

## Preventing False Positives

### Use Word Boundaries

**❌ Bad - Substring Matching:**

```yaml
brands:
  mine:
    - "Hub"  # Matches "GitHub", "HubSpot", "hub"
```

This creates false positives.

**✅ Good - Full Word Matching:**

```yaml
brands:
  mine:
    - "HubSpot"  # Only matches "HubSpot"
```

Word boundaries prevent substring matches.

### Avoid Overly Generic Names

**❌ Bad:**

```yaml
brands:
  competitors:
    - "AI"  # Too generic
    - "The"
    - "Pro"
```

**✅ Good:**

```yaml
brands:
  competitors:
    - "OpenAI"
    - "The Sales Platform"
    - "Pro CRM"
```

### Test Your Aliases

```bash
# Validate configuration
llm-answer-watcher validate --config watcher.config.yaml

# Run with example intents
llm-answer-watcher run --config watcher.config.yaml
```

## Detection Accuracy

### Evaluation Metrics

LLM Answer Watcher tracks detection accuracy:

| Metric        | Description                           | Target |
| ------------- | ------------------------------------- | ------ |
| **Precision** | Correct mentions / Total detected     | ≥ 90%  |
| **Recall**    | Correct mentions / Expected mentions  | ≥ 80%  |
| **F1 Score**  | Harmonic mean of precision and recall | ≥ 85%  |

### Run Evaluations

```bash
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

See [Evaluation Framework](../../../evaluation/overview/) for details.
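The three metrics in the table follow directly from the sets of detected and expected brands. The sketch below computes them for a single test case; the function name `detection_metrics` and the sample brand sets are invented for illustration and are not taken from the evaluation framework itself.

```python
def detection_metrics(detected: set[str], expected: set[str]) -> dict[str, float]:
    """Precision, recall, and F1 for one test case, as defined in the table above."""
    correct = len(detected & expected)
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(expected) if expected else 0.0
    # F1 is the harmonic mean; define it as 0 when both inputs are 0.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical case: one false positive ("GitHub") and two missed brands.
metrics = detection_metrics(
    detected={"HubSpot", "Warmly", "GitHub"},
    expected={"HubSpot", "Warmly", "Instantly", "Lemwarm"},
)
print({name: round(value, 3) for name, value in metrics.items()})
```

In this example precision is 2/3 and recall is 1/2, so both fall below the targets listed above, which is the kind of result that would prompt a review of the alias list.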
## Advanced Detection

### Special Characters

Special characters in brand names are escaped for you:

```yaml
brands:
  mine:
    - "Brand (TM)"  # Automatically escaped
    - "Brand.io"
    - "Brand-Name"
```

The system handles escaping automatically.

### Multi-Word Brands

```yaml
brands:
  competitors:
    - "Acme Corp"
    - "Big Company Inc"
    - "The Sales Platform"
```

Word boundaries work across multiple words.

### Abbreviations

Add both full name and abbreviation:

```yaml
brands:
  competitors:
    - "Customer Relationship Management"
    - "CRM"
    - "HubSpot CRM"
```

## Debugging Detection Issues

### Issue: Brand Not Detected

**Problem:** Your brand appears in the response but isn't detected.

**Solutions:**

1. Check brand alias spelling:

   ```bash
   # View raw response
   cat output/2025-11-05T14-30-00Z/intent_*_raw_*.json | jq '.answer_text'
   ```

1. Add alias variation:

   ```yaml
   brands:
     mine:
       - "YourBrand"
       - "YourBrand.io"
       - "Your Brand"  # Add this
   ```

1. Check for special formatting:

   ```json
   "Check out **YourBrand**"  // Bold formatting
   "Visit `YourBrand.io`"     // Code formatting
   ```

### Issue: False Positives

**Problem:** Unrelated words are detected as brand mentions.

**Solutions:**

1. Remove overly generic aliases:

   ```yaml
   # ❌ Remove this
   brands:
     mine:
       - "AI"

   # ✅ Use this instead
   brands:
     mine:
       - "YourBrand AI"
   ```

1. Check word boundaries are working:

   ```bash
   # Test with evaluation suite
   llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
   ```

### Issue: Case Sensitivity

**Problem:** Brand detected with wrong capitalization.

**Solution:** Matching is already case-insensitive, but display preserves the original case from the LLM response.

```python
# All match the same brand
# "HubSpot" → normalized to "hubspot"
# "hubspot" → normalized to "hubspot"
# "HUBSPOT" → normalized to "hubspot"
```

## Best Practices

### 1. Start with Core Aliases

```yaml
brands:
  mine:
    - "YourBrand"     # Exact name
    - "YourBrand.io"  # With TLD
```

### 2. Add Variations Incrementally

Run monitoring, review results, add missing aliases:

```yaml
brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
    - "YourBrand AI"  # Added after reviewing results
    - "YB"            # Abbreviation if commonly used
```

### 3. Limit Competitor List

Track 10-20 key competitors:

```yaml
brands:
  competitors:
    # Top 5 direct competitors
    - "Competitor A"
    - "Competitor B"
    # Top 3 market leaders
    - "Market Leader"
```

### 4. Monitor Detection Metrics

```sql
-- Check detection rates
SELECT
    brand,
    COUNT(*) as total_mentions,
    COUNT(DISTINCT run_id) as runs_appeared,
    -- Share of runs in which the brand appeared at least once
    COUNT(DISTINCT run_id) * 100.0 / (SELECT COUNT(*) FROM runs) as appearance_rate
FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand
ORDER BY total_mentions DESC;
```

### 5. Use Evaluation Suite

```bash
# Test detection before deploying
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml

# Add custom test cases for your brands
# See: evals/testcases/fixtures.yaml
```

## Next Steps

- **Rank Extraction**

  ______________________________________________________________________

  Learn how ranking positions are extracted

  [Rank Extraction →](../rank-extraction/)

- **Function Calling**

  ______________________________________________________________________

  Use LLM-assisted detection for higher accuracy

  [Function Calling →](../function-calling/)

- **Evaluation Framework**

  ______________________________________________________________________

  Test and validate detection accuracy

  [Evaluation Guide →](../../../evaluation/overview/)

- **Brand Configuration**

  ______________________________________________________________________

  Deep dive into brand configuration strategies

  [Brand Config →](../../configuration/brands/)

# Rank Extraction

Rank extraction identifies where brands appear in ranked lists within LLM responses. This feature helps track competitive positioning and brand visibility.

## Overview

When LLMs generate lists like "The best tools are:", rank extraction determines:

1. **Position**: Where each brand appears (1st, 2nd, 3rd, etc.)
1. **Context**: Whether it's an explicit ranking or casual mention
1. **Competitors**: How your brand ranks against competitors

## How It Works

### Pattern-Based Extraction (Default)

Uses regex patterns to detect numbered or bulleted lists:

**Supported Patterns:**

```text
# Numbered lists
1. HubSpot
2. Salesforce
3. Pipedrive

# With parentheses
1) HubSpot
2) Salesforce

# With dashes
- HubSpot
- Salesforce

# With asterisks
* HubSpot
* Salesforce

# With letters
a. HubSpot
b. Salesforce
```

### Ranking Algorithm

1. **Detect List Structure**: Find numbered/bulleted lists in response
1. **Extract Brand Names**: Match brands within list items
1. **Assign Positions**: Number brands sequentially (1, 2, 3...)
1. **Handle Ties**: Brands in same list item get same rank

## Configuration

### Use Regex Extraction (Free)

Default method - no additional configuration needed:

```yaml
run_settings:
  use_llm_rank_extraction: false  # Use pattern-based extraction
```

**Advantages:**

- ✅ Zero cost
- ✅ Fast
- ✅ Deterministic
- ✅ Works offline

**Limitations:**

- ❌ May miss implicit rankings
- ❌ Requires explicit list structure
- ❌ No semantic understanding

### Use LLM Extraction (Paid)

LLM-assisted extraction for complex rankings:

```yaml
run_settings:
  use_llm_rank_extraction: true  # Use LLM for extraction

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
  method: "function_calling"
  min_confidence: 0.7
```

**Advantages:**

- ✅ Understands context
- ✅ Extracts implicit rankings
- ✅ Handles complex formats
- ✅ Semantic understanding

**Limitations:**

- ❌ Costs money per query
- ❌ Slower than regex
- ❌ May be inconsistent

## Ranking Examples

### Example 1: Simple Numbered List

**LLM Response:**

```text
The best email warmup tools are:
1. Instantly
2. Warmly
3. Lemwarm
```

**Extracted Rankings:**

```json
[
  {"brand": "Instantly", "rank_position": 1},
  {"brand": "Warmly", "rank_position": 2},
  {"brand": "Lemwarm", "rank_position": 3}
]
```

### Example 2: Descriptive List

**LLM Response:**

```text
Top CRM tools:
1. HubSpot - Great for startups
2. Salesforce - Enterprise solution
3. Pipedrive - Sales-focused
```

**Extracted Rankings:**

```json
[
  {"brand": "HubSpot", "rank_position": 1},
  {"brand": "Salesforce", "rank_position": 2},
  {"brand": "Pipedrive", "rank_position": 3}
]
```

### Example 3: Multiple Brands Per Item

**LLM Response:**

```text
Best tools for sales teams:
1. HubSpot and Salesforce for enterprise
2. Pipedrive for small teams
```

**Extracted Rankings:**

```json
[
  {"brand": "HubSpot", "rank_position": 1},
  {"brand": "Salesforce", "rank_position": 1},
  {"brand": "Pipedrive", "rank_position": 2}
]
```

### Example 4: Bulleted List

**LLM Response:**

```text
- Instantly: Best for cold email
- Warmly: Great for personalization
- Lemwarm: Simple and effective
```

**Extracted Rankings:**

```json
[
  {"brand": "Instantly", "rank_position": 1},
  {"brand": "Warmly", "rank_position": 2},
  {"brand": "Lemwarm", "rank_position": 3}
]
```

### Example 5: Prose (No Ranking)

**LLM Response:**

```text
I've used HubSpot, Salesforce, and Pipedrive. They're all good options.
```

**Extracted Rankings:**

```json
[
  {"brand": "HubSpot", "rank_position": null},
  {"brand": "Salesforce", "rank_position": null},
  {"brand": "Pipedrive", "rank_position": null}
]
```

**Note:** Mentions detected but no ranking assigned (not in a list).

## Rank Position Meanings

### Position 1

**Highest visibility** - First recommendation.

```sql
-- Count #1 rankings
SELECT
    brand,
    COUNT(*) as first_place_count
FROM mentions
WHERE rank_position = 1
    AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand
ORDER BY first_place_count DESC;
```

### Positions 2-5

**High visibility** - Listed in top recommendations.
### Positions 6-10 **Medium visibility** - Included in comprehensive lists. ### Position NULL **Mentioned but not ranked** - Appears in prose or examples. ## Analyzing Rankings ### Average Rank Position Lower is better (1 is best): ```sql SELECT brand, AVG(rank_position) as avg_rank, COUNT(*) as mentions, COUNT(CASE WHEN rank_position = 1 THEN 1 END) as first_place_count FROM mentions WHERE rank_position IS NOT NULL AND timestamp_utc >= datetime('now', '-30 days') GROUP BY brand ORDER BY avg_rank ASC; ``` ### Rank Distribution See where brands typically appear: ```sql SELECT rank_position, COUNT(*) as mention_count, COUNT(DISTINCT brand) as unique_brands FROM mentions WHERE rank_position IS NOT NULL GROUP BY rank_position ORDER BY rank_position; ``` ### Competitor Comparison Compare your rank against competitors: ```sql SELECT m1.run_id, m1.intent_id, my_brand.rank_position as my_rank, competitor.brand as competitor_name, competitor.rank_position as competitor_rank FROM mentions m1 JOIN mentions my_brand ON m1.run_id = my_brand.run_id AND m1.intent_id = my_brand.intent_id AND my_brand.is_mine = 1 JOIN mentions competitor ON m1.run_id = competitor.run_id AND m1.intent_id = competitor.intent_id AND competitor.is_mine = 0 WHERE m1.timestamp_utc >= datetime('now', '-7 days') ORDER BY m1.timestamp_utc DESC; ``` ## Rank Trends Track how rankings change over time: ```sql SELECT DATE(timestamp_utc) as date, brand, AVG(rank_position) as avg_rank, COUNT(*) as mentions FROM mentions WHERE rank_position IS NOT NULL AND brand = 'YourBrand' AND timestamp_utc >= datetime('now', '-30 days') GROUP BY DATE(timestamp_utc), brand ORDER BY date DESC; ``` ## Common Ranking Patterns ### Pattern 1: Direct Recommendation **Prompt:** "What's the best CRM?" **Response:** "I recommend HubSpot for most teams." **Rank:** Position 1 (single recommendation) ### Pattern 2: Top 3 List **Prompt:** "Top 3 CRM tools?" **Response:** ```text 1. HubSpot 2. Salesforce 3. 
Pipedrive ``` **Rank:** Explicit positions 1-3 ### Pattern 3: Comprehensive List **Prompt:** "List all major CRM tools" **Response:** Lists 10+ tools **Rank:** All assigned positions, less emphasis on specific rank ### Pattern 4: Categorized Lists **Prompt:** "Best CRM by company size?" **Response:** ```text For startups: 1. HubSpot 2. Pipedrive For enterprise: 1. Salesforce 2. Microsoft Dynamics ``` **Rank:** Multiple brands at position 1 (different categories) ## Debugging Ranking Issues ### Issue: No Rankings Detected **Problem:** Brands detected but `rank_position` is `null`. **Cause:** Response doesn't contain explicit lists. **Example Response:** ```text I've used HubSpot and Salesforce. Both are great options. ``` **Solution:** 1. Update intent prompts to encourage rankings: ```yaml # ❌ Generic prompt: "Tell me about CRM tools" # ✅ Ranking-focused prompt: "What are the top 5 CRM tools ranked by popularity?" ``` 1. Enable LLM rank extraction: ```yaml run_settings: use_llm_rank_extraction: true ``` ### Issue: Incorrect Rankings **Problem:** Rankings don't match actual LLM response order. **Debugging:** ```bash # View raw response cat output/2025-11-05T14-30-00Z/intent_*_raw_*.json | jq '.answer_text' # View extracted rankings cat output/2025-11-05T14-30-00Z/intent_*_parsed_*.json | jq '.ranked_list' ``` **Solutions:** 1. Check for unusual list formatting 1. Enable LLM rank extraction 1. Add evaluation test case ### Issue: All Brands Ranked #1 **Problem:** Multiple brands get `rank_position: 1`. **Cause:** Brands appear in separate lists or categories. **Example:** ```text Best for startups: HubSpot Best for enterprise: Salesforce ``` Both get rank 1 (different contexts). **This is correct behavior** - each is #1 in its category. ## Best Practices ### 1. Design Ranking-Friendly Prompts ```yaml intents: # ✅ Good - Encourages ranking - id: "top-5-crm-tools" prompt: "What are the top 5 CRM tools ranked by market share?"
# ✅ Good - Specific ranking criteria - id: "best-for-startups" prompt: "Rank the best CRM tools for early-stage startups" # ❌ Bad - No ranking signal - id: "crm-info" prompt: "Tell me about CRM software" ``` ### 2. Use Regex First, LLM as Fallback ```yaml extraction_settings: method: "hybrid" # Try regex, fallback to LLM fallback_to_regex: true ``` ### 3. Track Rank Changes ```sql -- Alert when rank drops WITH latest_ranks AS ( SELECT brand, AVG(rank_position) as current_avg FROM mentions WHERE timestamp_utc >= datetime('now', '-7 days') AND brand = 'YourBrand' GROUP BY brand ), previous_ranks AS ( SELECT brand, AVG(rank_position) as previous_avg FROM mentions WHERE timestamp_utc >= datetime('now', '-14 days') AND timestamp_utc < datetime('now', '-7 days') AND brand = 'YourBrand' GROUP BY brand ) SELECT l.brand, p.previous_avg as previous_rank, l.current_avg as current_rank, l.current_avg - p.previous_avg as rank_change FROM latest_ranks l JOIN previous_ranks p ON l.brand = p.brand WHERE l.current_avg > p.previous_avg; -- Rank got worse (higher number) ``` ### 4. Analyze by Intent Some intents may favor certain brands: ```sql SELECT intent_id, brand, AVG(rank_position) as avg_rank, COUNT(*) as mentions FROM mentions WHERE rank_position IS NOT NULL GROUP BY intent_id, brand ORDER BY intent_id, avg_rank; ``` ### 5.
Monitor First-Place Wins ```sql -- Track #1 rankings over time SELECT DATE(timestamp_utc) as date, COUNT(CASE WHEN is_mine = 1 THEN 1 END) as my_first_place, COUNT(CASE WHEN is_mine = 0 THEN 1 END) as competitor_first_place FROM mentions WHERE rank_position = 1 AND timestamp_utc >= datetime('now', '-30 days') GROUP BY DATE(timestamp_utc) ORDER BY date DESC; ``` ## Next Steps - **Brand Detection** ______________________________________________________________________ Learn how brands are detected [Brand Detection →](../brand-detection/) - **Function Calling** ______________________________________________________________________ Use LLM-assisted ranking extraction [Function Calling →](../function-calling/) - **Query Examples** ______________________________________________________________________ SQL queries for ranking analysis [Query Examples →](../../../data-analytics/query-examples/) - **Trends Analysis** ______________________________________________________________________ Track ranking changes over time [Trends Analysis →](../../../data-analytics/trends-analysis/) # Function Calling for Extraction Function calling uses LLMs to extract structured data from responses with higher accuracy than regex-based extraction. This feature enables semantic understanding of brand mentions and rankings. ## Overview Function calling instructs the LLM to output structured JSON matching a specific schema, ensuring consistent, parseable extraction results.
### When to Use ✅ **Use function calling when:** - Regex extraction misses complex mentions - You need contextual understanding - Rankings are implicit (not in explicit lists) - Budget allows for additional API calls ❌ **Skip function calling when:** - Regex works well for your use case - Optimizing for cost (regex is free) - Brand names are simple and unambiguous - Running frequent monitoring (hourly/daily) ## Configuration ### Basic Setup ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" system_prompt: "openai/extraction-default" method: "function_calling" fallback_to_regex: true min_confidence: 0.7 ``` ### Advanced Configuration ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" # Fast, cheap extraction model env_api_key: "OPENAI_API_KEY" system_prompt: "openai/extraction-default" # Extraction method method: "function_calling" # Options: function_calling, regex, hybrid # Fall back to regex if function calling fails fallback_to_regex: true # Minimum confidence threshold (0.0-1.0) min_confidence: 0.7 # Maximum extraction attempts max_retries: 2 ``` ## Extraction Methods ### Method 1: Function Calling Only Use LLM for all extraction: ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" method: "function_calling" fallback_to_regex: false # Don't fall back ``` **Cost:** ~$0.001-0.003 per extraction ### Method 2: Regex Only Use pattern matching (no LLM): ```yaml run_settings: use_llm_rank_extraction: false # No extraction_settings needed ``` **Cost:** Free ### Method 3: Hybrid (Recommended) Try regex first, use LLM as fallback: ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" method: "hybrid" fallback_to_regex: true ``` **Cost:** Variable (free for regex hits, paid for LLM fallback) ## Function Schema ### Competitor
Detection Function ```json { "name": "extract_competitor_mentions", "description": "Extract mentions of competitor brands from LLM response", "parameters": { "type": "object", "properties": { "competitors": { "type": "array", "items": { "type": "object", "properties": { "brand": { "type": "string", "description": "Exact brand name as mentioned" }, "rank_position": { "type": "integer", "description": "Position in ranked list (1=first, null=not ranked)" }, "confidence": { "type": "number", "description": "Confidence score 0.0-1.0" }, "context": { "type": "string", "description": "Surrounding context of the mention" } }, "required": ["brand", "confidence"] } } }, "required": ["competitors"] } } ``` ### Example LLM Response **Input (LLM answer):** ```text The best email warmup tools are: 1. Instantly - Great for cold email 2. Warmly - Excellent personalization 3. Lemwarm - Simple and effective ``` **Function Call Output:** ```json { "competitors": [ { "brand": "Instantly", "rank_position": 1, "confidence": 0.95, "context": "Great for cold email" }, { "brand": "Warmly", "rank_position": 2, "confidence": 0.95, "context": "Excellent personalization" }, { "brand": "Lemwarm", "rank_position": 3, "confidence": 0.90, "context": "Simple and effective" } ] } ``` ## Confidence Scores ### Confidence Threshold Only accept extractions above confidence threshold: ```yaml extraction_settings: min_confidence: 0.7 # Reject extractions < 70% confidence ``` ### Confidence Levels | Range | Quality | Action | | --------- | -------- | ------------------------- | | 0.90-1.00 | High | Accept automatically | | 0.70-0.89 | Medium | Accept with review | | 0.50-0.69 | Low | Reject or flag for review | | 0.00-0.49 | Very Low | Reject | ### Interpreting Confidence **High confidence (0.9+):** - Clear, unambiguous mention - Explicit ranking - Standard brand name **Medium confidence (0.7-0.9):** - Slight ambiguity - Implicit ranking - Brand name variation **Low confidence (\<0.7):** - Ambiguous 
mention - Unclear ranking - Possible false positive ## Cost Management ### Extraction Costs Function calling adds extra API calls: | Model | Cost per 1M tokens | Typical Extraction Cost | | ---------------- | --------------------------- | ----------------------- | | gpt-4o-mini | $0.15 input / $0.60 output | $0.001-0.002 | | gpt-4o | $2.50 input / $10.00 output | $0.010-0.020 | | claude-3-5-haiku | $0.80 input / $4.00 output | $0.003-0.005 | ### Cost Optimization **1. Use cheap extraction models:** ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" # Cheapest option ``` **2. Use hybrid method:** ```yaml extraction_settings: method: "hybrid" # Free regex first, LLM fallback ``` **3. Cache extraction results:** Extraction results are stored in SQLite and reused. **4. Limit extraction to important intents:** ```yaml intents: - id: "high-priority" prompt: "..." use_extraction: true # Enable for this intent - id: "low-priority" prompt: "..." use_extraction: false # Skip for this intent ``` ## Advantages Over Regex ### 1. Semantic Understanding **Regex:** ```text "I recommend HubSpot" → Detected "HubSpot is not recommended" → Detected (false positive) ``` **Function Calling:** ```text "I recommend HubSpot" → Detected with positive context "HubSpot is not recommended" → Not detected (understands negation) ``` ### 2. Implicit Rankings **LLM Response:** ```text "While Salesforce is the market leader, I prefer HubSpot for startups." ``` **Regex:** No ranking detected (no list structure) **Function Calling:** Detects HubSpot as preferred (rank 1) ### 3. Context Extraction Function calling extracts surrounding context: ```json { "brand": "HubSpot", "rank_position": 1, "context": "Great for startups with limited budget", "confidence": 0.92 } ``` ### 4.
Handles Variations **LLM mentions:** "HS CRM", "HubSpot's CRM", "HubSpot platform" **Regex:** Misses variations **Function Calling:** Normalizes all to "HubSpot" ## Debugging Function Calling ### View Function Call Logs ```bash # Enable verbose logging export LOG_LEVEL=DEBUG llm-answer-watcher run --config watcher.config.yaml --verbose ``` ### Check Extraction Results ```bash # View parsed results cat output/2025-11-05T14-30-00Z/intent_*_parsed_*.json | jq '.extraction_method' # Output: "function_calling" or "regex" ``` ### Common Issues **Issue: Low confidence scores** **Solution:** Adjust threshold: ```yaml extraction_settings: min_confidence: 0.6 # Lower threshold ``` **Issue: High costs** **Solution:** Switch to hybrid: ```yaml extraction_settings: method: "hybrid" # Use regex when possible ``` **Issue: Inconsistent results** **Solution:** Use specific system prompt: ```yaml extraction_settings: extraction_model: system_prompt: "openai/extraction-strict" # More consistent ``` ## Best Practices ### 1. Start with Regex Test regex extraction first: ```yaml run_settings: use_llm_rank_extraction: false ``` If accuracy is insufficient, enable function calling. ### 2. Use Hybrid Method Best of both worlds: ```yaml extraction_settings: method: "hybrid" fallback_to_regex: true ``` ### 3. Monitor Extraction Costs ```sql SELECT DATE(timestamp_utc) as date, SUM(estimated_cost_usd) as total_cost, COUNT(*) as extractions FROM answers_raw WHERE extraction_method = 'function_calling' AND timestamp_utc >= datetime('now', '-30 days') GROUP BY DATE(timestamp_utc); ``` ### 4. Test with Eval Suite ```bash llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml ``` ### 5. 
Use Dedicated Extraction Model Don't use expensive models for extraction: ```yaml # ❌ Bad - expensive extraction_model: model_name: "gpt-4o" # ✅ Good - cheap and fast extraction_model: model_name: "gpt-4o-mini" ``` ## Next Steps - **Brand Detection** ______________________________________________________________________ Understanding brand mention detection [Brand Detection →](../brand-detection/) - **Rank Extraction** ______________________________________________________________________ How rankings are extracted [Rank Extraction →](../rank-extraction/) - **Cost Management** ______________________________________________________________________ Managing LLM costs [Cost Management →](../cost-management/) - **Evaluation** ______________________________________________________________________ Test extraction accuracy [Evaluation →](../../../evaluation/overview/) # Sentiment Analysis & Intent Classification Advanced analysis features that extract sentiment, context, and intent from brand mentions and user queries using LLM function calling. New in v0.1.0 These features were added to enhance brand mention analysis and enable prioritization of high-value queries. ## Overview LLM Answer Watcher includes two powerful analysis features: 1. **Sentiment Analysis**: Analyzes the tone and context of each brand mention 1. **Intent Classification**: Determines the user's intent and buyer journey stage for each query Both features use OpenAI's function calling API for accurate, structured extraction.
## Sentiment Analysis ### What It Analyzes For each brand mention, the system extracts: **Sentiment** - Emotional tone: - `positive`: Brand recommended or praised - `neutral`: Brand mentioned without judgment - `negative`: Brand criticized or not recommended **Mention Context** - How the brand was mentioned: - `primary_recommendation`: Brand is the top recommendation - `alternative_listing`: Brand listed as one of several options - `competitor_negative`: Brand mentioned as inferior to others - `competitor_neutral`: Brand compared without negative bias - `passing_reference`: Brief mention without detail ### Example Query: *"What are the best email warmup tools?"* LLM Response: *"The best tools are Lemwarm for automated warmup and Instantly for cold outreach. HubSpot is also an option but quite expensive."* **Extracted Sentiments:** | Brand | Sentiment | Context | Reasoning | | --------- | ---------- | ------------------------ | -------------------------------------- | | Lemwarm | `positive` | `primary_recommendation` | Listed first with positive qualifier | | Instantly | `positive` | `primary_recommendation` | Listed alongside Lemwarm with use case | | HubSpot | `neutral` | `alternative_listing` | Mentioned as option with cost caveat | ### Configuration Enable sentiment analysis in `extraction_settings`: ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" method: "function_calling" # Enable sentiment analysis (default: true) enable_sentiment_analysis: true ``` Function Calling Required Sentiment analysis only works with `method: "function_calling"`. Regex extraction does not support sentiment analysis (fields will be `None`). 
### Cost Impact Sentiment analysis is integrated into function calling extraction: - **No extra LLM calls** - sentiment extracted in same call as brand mentions - **Cost increase**: ~33% per extraction call due to larger response schema - **Example**: $0.0002 → $0.00027 per extraction with gpt-4o-mini ### Database Storage Sentiments are stored in the `mentions` table: ```sql SELECT brand, sentiment, mention_context, timestamp_utc FROM mentions WHERE sentiment = 'positive' AND mention_context = 'primary_recommendation' ORDER BY timestamp_utc DESC; ``` Schema: ```sql ALTER TABLE mentions ADD COLUMN sentiment TEXT; ALTER TABLE mentions ADD COLUMN mention_context TEXT; ``` ## Intent Classification ### What It Classifies For each user query, the system determines: **Intent Type** - What the user wants: - `transactional`: Ready to buy/use a tool - `commercial_investigation`: Researching options before purchase - `informational`: Learning about a topic - `navigational`: Looking for a specific brand/site **Buyer Journey Stage** - Where they are in the purchase process: - `awareness`: Learning about the category - `consideration`: Evaluating options - `decision`: Ready to choose/purchase **Urgency Signal** - How urgent is the need: - `high`: Immediate need ("now", "urgent", "today") - `medium`: Near-term need ("soon", "this week") - `low`: Future or casual exploration **Classification Confidence** - How confident the model is (0.0-1.0) **Reasoning** - Explanation of why it was classified this way ### Examples #### High-Value Query Query: *"What are the best email warmup tools to buy now for my outreach campaign?"* Classification: ```json { "intent_type": "transactional", "buyer_stage": "decision", "urgency_signal": "high", "classification_confidence": 0.95, "reasoning": "Query contains 'buy now' and specific use case, indicating ready-to-purchase intent with high urgency" } ``` #### Research Query Query: *"How do email warmup tools work?"* Classification: ```json { 
"intent_type": "informational", "buyer_stage": "awareness", "urgency_signal": "low", "classification_confidence": 0.92, "reasoning": "Query seeks explanation, indicating learning phase without purchase intent" } ``` #### Comparison Query Query: *"Compare Lemwarm vs Instantly for cold email"* Classification: ```json { "intent_type": "commercial_investigation", "buyer_stage": "consideration", "urgency_signal": "medium", "classification_confidence": 0.88, "reasoning": "Direct comparison of specific brands indicates evaluation phase before purchase decision" } ``` ### Configuration Enable intent classification in `extraction_settings`: ```yaml extraction_settings: extraction_model: provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # Enable intent classification (default: true) enable_intent_classification: true ``` ### Cost Impact Intent classification adds one extra LLM call per unique query: - **Cost**: ~$0.00012 per query with gpt-4o-mini - **When**: Before extracting brand mentions - **Caching**: Results are cached by query hash, so repeated queries are free **Example cost breakdown:** - 3 intents × 1 model = 3 queries - Intent classification: 3 × $0.00012 = $0.00036 - Extraction: 3 × $0.0002 = $0.0006 - **Total**: ~$0.001 per run ### Database Storage Intent classifications are stored in `intent_classifications` table: ```sql SELECT intent_id, intent_type, buyer_stage, urgency_signal, reasoning FROM intent_classifications WHERE buyer_stage = 'decision' AND urgency_signal = 'high' ORDER BY classification_confidence DESC; ``` Schema: ```sql CREATE TABLE intent_classifications ( id INTEGER PRIMARY KEY AUTOINCREMENT, run_id TEXT NOT NULL, intent_id TEXT NOT NULL, query_text TEXT NOT NULL, query_hash TEXT NOT NULL, intent_type TEXT NOT NULL, buyer_stage TEXT NOT NULL, urgency_signal TEXT NOT NULL, classification_confidence REAL NOT NULL, reasoning TEXT, timestamp_utc TEXT NOT NULL, UNIQUE(run_id, intent_id) ); ``` ### Query Hash Caching
Intent classifications are cached by query hash: ```python # Normalized query → hash "What are the best email warmup tools?" → "5d41402abc4b2a76b9719d911017c592..." # Same hash for queries that differ only in whitespace or case " what are the BEST email warmup tools? " → "5d41402abc4b2a76b9719d911017c592..." (same hash) ``` Caching benefits: - **Saves API calls**: Repeated queries use cached results - **Normalizes variations**: Whitespace/case differences don't matter - **Persistent cache**: Stored in database across runs ## Use Cases ### 1. Prioritize High-Value Queries Focus on queries with high buyer intent: ```sql SELECT m.brand, ic.intent_type, ic.buyer_stage, ic.urgency_signal FROM mentions m JOIN intent_classifications ic ON m.intent_id = ic.intent_id WHERE ic.intent_type = 'transactional' AND ic.buyer_stage = 'decision' AND ic.urgency_signal = 'high' AND m.sentiment = 'positive'; ``` ### 2. Track Sentiment Trends Monitor how sentiment changes over time: ```sql SELECT DATE(timestamp_utc) as date, sentiment, COUNT(*) as mentions FROM mentions WHERE normalized_name = 'yourbrand' GROUP BY DATE(timestamp_utc), sentiment ORDER BY date DESC; ``` ### 3. Identify Context Patterns See how your brand is typically mentioned: ```sql SELECT mention_context, COUNT(*) as count, ROUND(AVG(CASE sentiment WHEN 'positive' THEN 1.0 WHEN 'neutral' THEN 0.5 WHEN 'negative' THEN 0.0 END), 2) as sentiment_score FROM mentions WHERE normalized_name = 'yourbrand' GROUP BY mention_context ORDER BY count DESC; ``` ### 4.
ROI Analysis Calculate value of brand mentions by intent: ```sql SELECT ic.buyer_stage, COUNT(DISTINCT m.brand) as brands_mentioned, COUNT(*) as total_mentions FROM mentions m JOIN intent_classifications ic ON m.intent_id = ic.intent_id WHERE m.is_mine = 1 GROUP BY ic.buyer_stage ORDER BY CASE ic.buyer_stage WHEN 'decision' THEN 1 WHEN 'consideration' THEN 2 WHEN 'awareness' THEN 3 END; ``` ## Disabling Features ### Disable Sentiment Analysis ```yaml extraction_settings: enable_sentiment_analysis: false ``` **Result**: `sentiment` and `mention_context` fields will be `None` in database. ### Disable Intent Classification ```yaml extraction_settings: enable_intent_classification: false ``` **Result**: No rows in `intent_classifications` table, queries classified as `None`. ### Disable Both ```yaml extraction_settings: enable_sentiment_analysis: false enable_intent_classification: false ``` **Benefit**: Reduces costs by ~33% for extraction calls and eliminates intent classification calls. ## Limitations ### Function Calling Only Both features require `method: "function_calling"`: ```yaml extraction_settings: method: "function_calling" # Required enable_sentiment_analysis: true enable_intent_classification: true ``` Regex extraction does not support these features. ### Provider Support Currently only OpenAI supports function calling for extraction: ```yaml extraction_model: provider: "openai" # Required model_name: "gpt-4o-mini" ``` Anthropic, Mistral, and other providers coming soon. ### Confidence Thresholds Low confidence classifications may be inaccurate: ```sql -- Filter by confidence SELECT * FROM intent_classifications WHERE classification_confidence >= 0.8; ``` ## Best Practices ### 1. Enable for High-Value Monitoring Use sentiment/intent for business-critical queries: ```yaml # Production config - full analysis extraction_settings: method: "function_calling" enable_sentiment_analysis: true enable_intent_classification: true ``` ### 2. 
Disable for Cost Optimization Skip for budget-constrained or high-frequency monitoring: ```yaml # Cost-optimized config extraction_settings: method: "regex" # No function calling enable_sentiment_analysis: false enable_intent_classification: false ``` ### 3. Review Classification Reasoning Check why queries were classified: ```sql SELECT query_text, intent_type, buyer_stage, reasoning FROM intent_classifications WHERE classification_confidence < 0.8; ``` ### 4. Track Sentiment Distribution Monitor the health of your brand's mentions: ```sql SELECT sentiment, COUNT(*) as mentions, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) as percentage FROM mentions WHERE normalized_name = 'yourbrand' GROUP BY sentiment; ``` **Healthy distribution**: 70%+ positive, \<10% negative ## Next Steps - **Function Calling** ______________________________________________________________________ Learn how function calling works [Function Calling →](../function-calling/) - **Query Examples** ______________________________________________________________________ SQL queries for sentiment analysis [Query Examples →](../../../data-analytics/query-examples/) - **Cost Management** ______________________________________________________________________ Understand cost implications [Cost Management →](../cost-management/) - **Trends Analysis** ______________________________________________________________________ Track sentiment over time [Trends →](../../../data-analytics/trends-analysis/) # Historical Tracking LLM Answer Watcher stores all query results in a local SQLite database for historical trend analysis.
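
The database can also be read from Python with the standard `sqlite3` module. The sketch below counts daily mentions for one brand. To stay runnable on its own, it uses an in-memory database with a deliberately reduced `mentions` table (only the two columns the query needs); this table definition is an assumption for illustration, and the real schema has more columns. Against real data you would connect to the tool's database file instead of `:memory:`.

```python
import sqlite3


def daily_mentions(conn: sqlite3.Connection, normalized_name: str) -> list[tuple[str, int]]:
    """Return (date, mention_count) rows for one brand, newest first."""
    query = """
        SELECT DATE(timestamp_utc) AS date, COUNT(*) AS mentions
        FROM mentions
        WHERE normalized_name = ?
        GROUP BY DATE(timestamp_utc)
        ORDER BY date DESC
    """
    return conn.execute(query, (normalized_name,)).fetchall()


# Stand-in database for illustration; the table is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (normalized_name TEXT, timestamp_utc TEXT)")
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?)",
    [
        ("yourbrand", "2025-11-04T09:00:00Z"),
        ("yourbrand", "2025-11-05T14:30:00Z"),
        ("yourbrand", "2025-11-05T18:00:00Z"),
        ("otherbrand", "2025-11-05T14:30:00Z"),
    ],
)
for date, count in daily_mentions(conn, "yourbrand"):
    print(date, count)
```

Passing the brand name as a bound parameter (`?`) rather than formatting it into the SQL string avoids quoting problems with brand names that contain apostrophes.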
## Features - **Long-term Storage**: All responses saved indefinitely - **Trend Analysis**: Track brand visibility over time - **Comparative Analysis**: Compare performance across dates - **Data Export**: Query via SQL or export to CSV ## Database Location ```text ./output/watcher.db ``` ## Querying Historical Data ```sql -- Brand mentions over time SELECT DATE(timestamp_utc) as date, COUNT(*) as mentions FROM mentions WHERE normalized_name = 'yourbrand' GROUP BY DATE(timestamp_utc) ORDER BY date DESC; ``` See [SQLite Database](../../../data-analytics/sqlite-database/) for more queries. # Cost Management Control and monitor LLM API costs with built-in budget protection. ## Features - Pre-run cost estimation - Budget limits (per run, per intent) - Real-time cost tracking - Cost breakdowns by provider/model ## Budget Configuration ```yaml run_settings: budget: enabled: true max_per_run_usd: 1.00 max_per_intent_usd: 0.10 warn_threshold_usd: 0.50 ``` ## Cost Estimation Before running, the tool estimates costs based on: - Number of intents - Number of models - Average tokens per query - Provider pricing See [Budget Controls](../../configuration/budget/) for detailed configuration. # HTML Reports Auto-generated interactive HTML reports for each monitoring run. ## Features - Brand mention visualization - Rank distribution charts - Cost breakdowns - Raw response inspection - Historical trends (if multiple runs) ## Report Location ```text output/YYYY-MM-DDTHH-MM-SSZ/report.html ``` ## Opening Reports ```bash # macOS open output/2025-11-05T14-30-00Z/report.html # Linux xdg-open output/2025-11-05T14-30-00Z/report.html ``` ## Report Sections 1. **Summary**: Costs, queries, brands found 1. **Brand Mentions**: Detailed mention tables 1. **Rank Distribution**: Visual charts 1. **Historical Trends**: Performance over time 1. **Raw Responses**: Full LLM outputs # CLI Usage # CLI Commands Complete reference for all LLM Answer Watcher CLI commands. 
## Command Structure ```bash llm-answer-watcher [COMMAND] [OPTIONS] ``` ## Global Options Available for all commands: | Option | Description | | ----------- | ------------------------ | | `--help` | Show help message | | `--version` | Show version information | ## Commands ### `run` Execute a monitoring run with configured models and intents. **Usage**: ```bash llm-answer-watcher run --config CONFIG_PATH [OPTIONS] ``` **Required Arguments**: - `--config PATH` - Path to YAML configuration file **Options**: | Option | Default | Description | | ------------------------------- | ------- | ------------------------- | | `--format [human\|json\|quiet]` | `human` | Output format | | `--yes, -y` | `false` | Skip confirmation prompts | | `--force` | `false` | Override budget limits | | `--verbose, -v` | `false` | Enable verbose logging | **Examples**: ```bash # Human-friendly output (default) llm-answer-watcher run --config config.yaml # JSON output for automation llm-answer-watcher run --config config.yaml --format json # Quiet mode for scripts llm-answer-watcher run --config config.yaml --quiet # Auto-confirm (no prompts) llm-answer-watcher run --config config.yaml --yes # Override budget limits llm-answer-watcher run --config config.yaml --force # Verbose logging llm-answer-watcher run --config config.yaml --verbose ``` **Exit Codes**: - `0`: Success - `1`: Configuration error - `2`: Database error - `3`: Partial failure - `4`: Complete failure ### `validate` Validate configuration file without running queries. 
**Usage**: ```bash llm-answer-watcher validate --config CONFIG_PATH [OPTIONS] ``` **Required Arguments**: - `--config PATH` - Path to YAML configuration file **Options**: | Option | Default | Description | | --------------- | ------- | ------------------------ | | `--verbose, -v` | `false` | Show detailed validation | **Examples**: ```bash # Basic validation llm-answer-watcher validate --config config.yaml # Detailed validation llm-answer-watcher validate --config config.yaml --verbose ``` **Output**: ```text ✅ Configuration valid ├── Models: 2 configured (openai gpt-4o-mini, anthropic claude-3-5-haiku) ├── Brands: 3 mine, 8 competitors ├── Intents: 4 queries └── Estimated cost: $0.024 (8 queries total) ``` ### `eval` Run evaluation framework to test extraction accuracy. **Usage**: ```bash llm-answer-watcher eval --fixtures FIXTURES_PATH [OPTIONS] ``` **Required Arguments**: - `--fixtures PATH` - Path to test fixtures YAML file **Options**: | Option | Default | Description | | ------------------------ | ------------------- | ------------------------ | | `--db PATH` | `./eval_results.db` | Evaluation database path | | `--format [human\|json]` | `human` | Output format | **Examples**: ```bash # Run evaluation suite llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml # Custom database llm-answer-watcher eval --fixtures fixtures.yaml --db my_evals.db # JSON output for CI/CD llm-answer-watcher eval --fixtures fixtures.yaml --format json ``` **Exit Codes**: - `0`: All tests passed - `1`: Tests failed (below thresholds) - `2`: Configuration error ### `prices show` Display current LLM pricing information.
**Usage**: ```bash llm-answer-watcher prices show [OPTIONS] ``` **Options**: | Option | Description | | ------------------------ | ------------------ | | `--provider NAME` | Filter by provider | | `--format [human\|json]` | Output format | **Examples**: ```bash # Show all pricing llm-answer-watcher prices show # OpenAI pricing only llm-answer-watcher prices show --provider openai # JSON format llm-answer-watcher prices show --format json ``` ### `prices refresh` Refresh pricing cache from llm-prices.com. **Usage**: ```bash llm-answer-watcher prices refresh [OPTIONS] ``` **Options**: | Option | Description | | --------- | ---------------------------- | | `--force` | Force refresh (ignore cache) | **Examples**: ```bash # Refresh if cache expired llm-answer-watcher prices refresh # Force refresh llm-answer-watcher prices refresh --force ``` ### `prices list` List all available models with pricing. **Usage**: ```bash llm-answer-watcher prices list [OPTIONS] ``` **Options**: | Option | Description | | ------------------------ | ------------------ | | `--provider NAME` | Filter by provider | | `--format [human\|json]` | Output format | **Examples**: ```bash # List all models llm-answer-watcher prices list # Anthropic models only llm-answer-watcher prices list --provider anthropic # Export as JSON llm-answer-watcher prices list --format json > models.json ``` ## Output Modes ### Human Mode (Default) Beautiful Rich-formatted output with colors, spinners, and tables. ```bash llm-answer-watcher run --config config.yaml ``` **Features**: - Progress spinners - Colored status indicators - Formatted tables - Visual charts **Best for**: Interactive terminal use ### JSON Mode Structured JSON output for programmatic consumption. ```bash llm-answer-watcher run --config config.yaml --format json ``` **Features**: - Valid JSON output - No ANSI codes - Machine-readable - Complete metadata **Best for**: AI agents, scripts, APIs ### Quiet Mode Minimal tab-separated output. 
```bash llm-answer-watcher run --config config.yaml --quiet ``` **Output format**: ```text RUN_ID STATUS QUERIES COST OUTPUT_DIR ``` **Best for**: Shell scripts, pipelines ## Common Workflows ### Development ```bash # Validate config llm-answer-watcher validate --config dev.yaml # Run with verbose logging llm-answer-watcher run --config dev.yaml --verbose ``` ### Production ```bash # Auto-confirm, JSON output llm-answer-watcher run --config prod.yaml --yes --format json ``` ### CI/CD ```bash # Quiet mode with exit code checking llm-answer-watcher run --config ci.yaml --quiet --yes if [ $? -eq 0 ]; then echo "Success" else echo "Failed" && exit 1 fi ``` ## Next Steps - [Learn about output modes](../output-modes/) - [Understand exit codes](../exit-codes/) - [Automate runs](../automation/) # Output Modes LLM Answer Watcher supports three output modes to serve different use cases: humans, AI agents, and shell scripts. ## Human Mode (Default) Beautiful Rich-formatted output designed for interactive terminal use. ### Usage ```bash llm-answer-watcher run --config config.yaml # or explicitly: llm-answer-watcher run --config config.yaml --format human ``` ### Features - **Progress Spinners**: Real-time progress indication - **Colors**: Status indicators (✅ green success, ❌ red errors) - **Tables**: Formatted data presentation - **Panels**: Organized information display - **Live Updates**: Dynamic progress tracking ### Example Output ```text 🔍 Running LLM Answer Watcher... ├── Configuration loaded from config.yaml ├── Models: 2 configured ├── Intents: 3 queries └── Estimated cost: $0.012 📤 Query 1/3: "What are the best email warmup tools?" ├── Provider: OpenAI (gpt-4o-mini) ├── Sending request... ⏳ ├── ✅ Response received (1.2s) ├── Tokens: 145 input, 387 output ├── Cost: $0.004 └── Brands detected: 3 found ✅ Run completed successfully!
📊 Summary: ├── Run ID: 2025-11-05T14-30-00Z ├── Queries: 3/3 completed (100%) ├── Total cost: $0.012 └── Output: ./output/2025-11-05T14-30-00Z/ ``` ## JSON Mode Structured JSON output for programmatic consumption and AI agent automation. ### Usage ```bash llm-answer-watcher run --config config.yaml --format json ``` ### Features - **Valid JSON**: Parseable by any JSON library - **No ANSI Codes**: Clean output for parsing - **Complete Metadata**: All run information included - **Deterministic**: Same format every time ### Output Structure ```json { "run_id": "2025-11-05T14-30-00Z", "status": "success", "timestamp_utc": "2025-11-05T14:30:00Z", "queries_completed": 3, "queries_failed": 0, "total_cost_usd": 0.012, "output_dir": "./output/2025-11-05T14-30-00Z", "brands_detected": { "mine": ["Lemwarm", "Lemlist"], "competitors": ["Instantly", "HubSpot", "Apollo.io"] }, "per_intent_results": [ { "intent_id": "best-email-warmup-tools", "status": "success", "cost_usd": 0.004, "brands_found": ["Lemwarm", "Instantly", "HubSpot"] } ] } ``` ### Use Cases #### AI Agent Automation ```python import subprocess import json result = subprocess.run([ "llm-answer-watcher", "run", "--config", "config.yaml", "--format", "json", "--yes" ], capture_output=True, text=True) data = json.loads(result.stdout) if data["status"] == "success": print(f"Found {len(data['brands_detected']['mine'])} of our brands") ``` #### CI/CD Integration ```yaml # .github/workflows/brand-monitoring.yml - name: Run Brand Monitoring id: monitor run: | # Compact the JSON to one line: multi-line values need delimiter syntax in GITHUB_OUTPUT set -o pipefail OUTPUT=$(llm-answer-watcher run --config config.yaml --format json --yes | jq -c .) echo "result=$OUTPUT" >> "$GITHUB_OUTPUT" - name: Check Results run: | STATUS=$(echo '${{ steps.monitor.outputs.result }}' | jq -r '.status') if [ "$STATUS" != "success" ]; then exit 1 fi ``` ## Quiet Mode Minimal tab-separated output for shell scripts and pipelines.
### Usage ```bash llm-answer-watcher run --config config.yaml --quiet ``` ### Output Format ```text RUN_ID STATUS QUERIES_COMPLETED COST_USD OUTPUT_DIR ``` ### Example Output ```text 2025-11-05T14-30-00Z success 3 0.012 ./output/2025-11-05T14-30-00Z ``` ### Use Cases #### Shell Scripts ```bash #!/bin/bash OUTPUT=$(llm-answer-watcher run --config config.yaml --quiet --yes) RUN_ID=$(echo "$OUTPUT" | cut -f1) STATUS=$(echo "$OUTPUT" | cut -f2) COST=$(echo "$OUTPUT" | cut -f4) echo "Run $RUN_ID completed with status $STATUS (cost: \$$COST)" ``` #### CSV Export ```bash # Write the header once, then append each run as a comma-separated row [ -f monitoring_log.csv ] || echo "run_id,status,queries_completed,cost_usd,output_dir" > monitoring_log.csv llm-answer-watcher run --config config.yaml --quiet --yes | tr '\t' ',' >> monitoring_log.csv ``` #### Pipeline Processing ```bash # Process multiple configs for config in configs/*.yaml; do llm-answer-watcher run --config "$config" --quiet --yes | \ awk '{print $1 "\t" $2 "\t" $4}' done ``` ## Comparing Output Modes | Feature | Human | JSON | Quiet | | ----------------------- | ----------- | ---------- | ------- | | **Colors/Emojis** | ✅ Yes | ❌ No | ❌ No | | **Progress Indicators** | ✅ Yes | ❌ No | ❌ No | | **Machine Parseable** | ❌ No | ✅ Yes | ✅ Yes | | **Size** | Large | Medium | Minimal | | **Use Case** | Interactive | Automation | Scripts | | **ANSI Codes** | ✅ Yes | ❌ No | ❌ No | ## Verbose Logging Enable verbose logging in any mode: ```bash llm-answer-watcher run --config config.yaml --verbose ``` Adds detailed logging information: ```text [2025-11-05 14:30:00] INFO: Loading configuration from config.yaml [2025-11-05 14:30:00] DEBUG: Validating YAML schema [2025-11-05 14:30:00] DEBUG: Resolving environment variables [2025-11-05 14:30:00] INFO: API key loaded for provider: openai [2025-11-05 14:30:01] DEBUG: Sending request to OpenAI API [2025-11-05 14:30:02] DEBUG: Response received: 200 OK ``` ## Mode Selection Guide ### Choose Human Mode When: - Running manually in terminal - Debugging configuration issues -
Watching progress in real-time - Presenting to stakeholders ### Choose JSON Mode When: - Integrating with AI agents - Building dashboards/UIs - Processing results programmatically - CI/CD automation ### Choose Quiet Mode When: - Shell script automation - Logging to files - CSV/TSV export - Minimal bandwidth/storage ## Next Steps - [Learn about exit codes](../exit-codes/) - [Automate monitoring runs](../automation/) - [See CI/CD examples](../../../examples/ci-cd-integration/) # Exit Codes LLM Answer Watcher uses standardized exit codes for automation and error handling. ## Exit Code Reference | Code | Status | Meaning | When It Occurs | | ----- | ------------------- | ---------------------------------- | ---------------------------------------------------- | | **0** | Success | All queries completed successfully | No errors encountered | | **1** | Configuration Error | Invalid configuration | YAML syntax errors, missing API keys, invalid schema | | **2** | Database Error | Cannot access database | SQLite file locked, permissions issue, disk full | | **3** | Partial Failure | Some queries failed | LLM API errors, rate limits, timeouts | | **4** | Complete Failure | No queries succeeded | All queries failed, fatal errors | ## Exit Code 0: Success All queries completed without errors. **When**: - All LLM API calls succeeded - All brands extracted successfully - All data written to database - Reports generated **Example**: ```bash llm-answer-watcher run --config config.yaml echo $? # Prints: 0 ``` ## Exit Code 1: Configuration Error Configuration file has issues. **When**: - YAML syntax errors - Missing required fields - Invalid provider names - API keys not found in environment - Invalid model names - Budget misconfiguration **Examples**: ```yaml # Missing required field run_settings: # output_dir missing! 
models: - provider: "openai" ``` ```yaml # Invalid provider models: - provider: "invalid_provider" # Not supported ``` **Handling**: ```bash llm-answer-watcher run --config config.yaml if [ $? -eq 1 ]; then echo "Configuration error - check your YAML file" exit 1 fi ``` ## Exit Code 2: Database Error Cannot create or access SQLite database. **When**: - Database file locked by another process - Insufficient disk space - Permission denied on output directory - Corrupted database file **Handling**: ```bash llm-answer-watcher run --config config.yaml case $? in 2) echo "Database error - check permissions and disk space" # Try to fix permissions chmod 755 output/ # Retry llm-answer-watcher run --config config.yaml ;; esac ``` ## Exit Code 3: Partial Failure Some queries succeeded, others failed. **When**: - Rate limits hit mid-run - Network timeouts - Invalid API responses - Model-specific errors **Example Scenario**: ```text 3 intents × 2 models = 6 total queries ✅ 4 succeeded ❌ 2 failed (rate limit) Exit code: 3 (partial failure) ``` **Handling**: ```bash llm-answer-watcher run --config config.yaml --format json > result.json if [ $? -eq 3 ]; then echo "⚠️ Partial failure - some queries failed" # Check which queries failed jq '.per_intent_results[] | select(.status=="failed")' result.json # Continue with successful results fi ``` **Best Practice**: Accept partial failures in production. The queries that succeeded still provide value. ## Exit Code 4: Complete Failure All queries failed. **When**: - All API keys invalid - Network completely down - All models unreachable - Severe runtime errors **Handling**: ```bash llm-answer-watcher run --config config.yaml if [ $? -eq 4 ]; then echo "❌ Complete failure - no queries succeeded" # Alert on-call engineer # Don't continue pipeline exit 1 fi ``` ## Practical Examples ### Basic Error Handling ```bash #!/bin/bash llm-answer-watcher run --config config.yaml --yes case $?
in 0) echo "✅ Success - all queries completed" ;; 1) echo "❌ Configuration error - fix YAML file" exit 1 ;; 2) echo "❌ Database error - check permissions" exit 1 ;; 3) echo "⚠️ Partial failure - continuing" # Partial success is OK ;; 4) echo "❌ Complete failure - aborting" exit 1 ;; esac ``` ### Retry Logic ```bash #!/bin/bash MAX_RETRIES=3 RETRY_COUNT=0 while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do llm-answer-watcher run --config config.yaml --yes EXIT_CODE=$? case $EXIT_CODE in 0|3) # Success or partial success exit 0 ;; 1|2) # Config or DB error - don't retry exit $EXIT_CODE ;; 4) # Complete failure - retry RETRY_COUNT=$((RETRY_COUNT + 1)) echo "Retry $RETRY_COUNT/$MAX_RETRIES after complete failure" sleep $((2 ** RETRY_COUNT)) # Exponential backoff ;; esac done echo "Max retries exceeded" exit 4 ``` ### CI/CD Integration ```yaml # .github/workflows/monitoring.yml - name: Run Monitoring id: monitor run: | # Disable errexit so a non-zero exit code can still be recorded set +e llm-answer-watcher run --config config.yaml --format json --yes echo "exit_code=$?" >> $GITHUB_OUTPUT continue-on-error: true - name: Handle Result run: | case ${{ steps.monitor.outputs.exit_code }} in 0) echo "✅ Success" ;; 1) echo "❌ Configuration error" exit 1 ;; 2) echo "❌ Database error" exit 1 ;; 3) echo "⚠️ Partial failure (acceptable)" ;; 4) echo "❌ Complete failure" exit 1 ;; esac ``` ### Alerting Based on Exit Codes ```bash #!/bin/bash llm-answer-watcher run --config config.yaml --yes EXIT_CODE=$? if [ $EXIT_CODE -eq 4 ]; then # Send alert for complete failure curl -X POST https://alerts.example.com/webhook \ -d '{"alert": "LLM monitoring complete failure", "severity": "critical"}' elif [ $EXIT_CODE -eq 3 ]; then # Log partial failure (no alert) echo "$(date): Partial failure" >> /var/log/monitoring.log fi ``` ## Testing Exit Codes ### Simulate Errors Test your error handling: ```bash # Force configuration error llm-answer-watcher run --config nonexistent.yaml echo $?
# Should be 1 # Invalid API key export OPENAI_API_KEY=invalid llm-answer-watcher run --config config.yaml echo $? # Should be 1 or 4 ``` ### Validation Testing ```bash # This should exit 0 (validation success) llm-answer-watcher validate --config config.yaml echo $? ``` ## Best Practices ### 1. Always Check Exit Codes ```bash # ❌ Bad - ignores errors llm-answer-watcher run --config config.yaml # ✅ Good - checks exit code llm-answer-watcher run --config config.yaml if [ $? -ne 0 ]; then handle_error fi ``` ### 2. Differentiate Error Types Don't treat all non-zero exits the same: ```bash # ✅ Good - handles each error type case $? in 1|2) exit 1 ;; # Fatal - abort 3) continue ;; # Partial - OK 4) retry ;; # Complete - retry esac ``` ### 3. Log Exit Codes ```bash EXIT_CODE=$? echo "$(date): Exit code $EXIT_CODE" >> monitoring.log ``` ### 4. Accept Partial Failures In production, partial success is often acceptable: ```bash if [ $EXIT_CODE -eq 0 ] || [ $EXIT_CODE -eq 3 ]; then echo "Run completed with usable results" continue_pipeline fi ``` ## Next Steps - [Learn about output modes](../output-modes/) - [Automate monitoring runs](../automation/) - [See CI/CD examples](../../../examples/ci-cd-integration/) # Automation Automate LLM Answer Watcher runs with cron, GitHub Actions, or custom schedulers.
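
For a custom scheduler written in Python, the exit-code policy from the Exit Codes reference (0 and 3 are usable, 1 and 2 are fatal, 4 is worth retrying) can be captured in a small helper. This is a minimal sketch under stated assumptions: `classify_exit_code`, `run_once`, and the action names `"ok"`, `"abort"`, and `"retry"` are hypothetical and not part of LLM Answer Watcher. Only the documented `run`, `--config`, `--yes`, and `--format json` arguments are used.

```python
import subprocess

# Hypothetical policy table built from the documented exit codes:
# 0 = success, 1 = configuration error, 2 = database error,
# 3 = partial failure, 4 = complete failure.
ACTIONS = {0: "ok", 1: "abort", 2: "abort", 3: "ok", 4: "retry"}


def classify_exit_code(code: int) -> str:
    """Map an exit code to an action; unknown codes are treated as fatal."""
    return ACTIONS.get(code, "abort")


def run_once(config_path: str) -> str:
    """Run the CLI non-interactively and return the action for its exit code."""
    result = subprocess.run(
        ["llm-answer-watcher", "run", "--config", config_path,
         "--yes", "--format", "json"],
        capture_output=True,
        text=True,
    )
    return classify_exit_code(result.returncode)


if __name__ == "__main__":
    for code in (0, 1, 2, 3, 4):
        print(code, classify_exit_code(code))
```

Keeping the policy in one table means a scheduler can change how it reacts to, say, partial failures without touching the code that launches the CLI.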
## Quick Start ```bash # Run with no prompts llm-answer-watcher run --config config.yaml --yes --format json ``` ## Cron Jobs ### Basic Cron Setup Edit crontab: ```bash crontab -e ``` Add scheduled job: ```text # Run daily at 9 AM 0 9 * * * /path/to/.venv/bin/llm-answer-watcher run --config /path/to/config.yaml --yes --quiet >> /var/log/monitoring.log 2>&1 # Run weekly on Monday 0 9 * * 1 /path/to/.venv/bin/llm-answer-watcher run --config /path/to/config.yaml --yes --format json > /path/to/results/$(date +\%Y-\%m-\%d).json ``` ### Production Cron Script ```bash #!/bin/bash # /usr/local/bin/run-monitoring.sh set -euo pipefail # Configuration CONFIG="/home/user/monitoring/config.yaml" VENV="/home/user/llm-answer-watcher/.venv" LOG_DIR="/var/log/monitoring" # Load environment source "$VENV/bin/activate" source /home/user/.env # API keys # Run with error handling (errexit is disabled so the exit code can be captured) set +e "$VENV/bin/llm-answer-watcher" run \ --config "$CONFIG" \ --yes \ --format json \ > "$LOG_DIR/$(date +%Y-%m-%d).json" 2>&1 EXIT_CODE=$?
# Alert on failure if [ $EXIT_CODE -eq 4 ]; then echo "Monitoring failed" | mail -s "Alert: Monitoring Failure" ops@example.com fi exit $EXIT_CODE ``` Make executable and schedule: ```bash chmod +x /usr/local/bin/run-monitoring.sh # Add to crontab 0 9 * * * /usr/local/bin/run-monitoring.sh ``` ## GitHub Actions ### Basic Workflow `.github/workflows/brand-monitoring.yml`: ```yaml name: Brand Monitoring on: schedule: # Run daily at 9 AM UTC - cron: '0 9 * * *' workflow_dispatch: # Manual trigger jobs: monitor: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Set up Python uses: actions/setup-python@v5 with: python-version: '3.12' - name: Install dependencies run: | pip install uv uv sync - name: Run monitoring env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} run: | uv run llm-answer-watcher run \ --config configs/production.yaml \ --yes \ --format json \ > results.json - name: Upload results uses: actions/upload-artifact@v4 with: name: monitoring-results path: | results.json output/ - name: Commit database run: | git config --local user.email "bot@example.com" git config --local user.name "Monitoring Bot" git add output/watcher.db git commit -m "Update monitoring data" git push ``` ### Advanced Workflow with Notifications ```yaml name: Advanced Brand Monitoring on: schedule: - cron: '0 9 * * *' jobs: monitor: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Set up Python uses: actions/setup-python@v5 with: python-version: '3.12' - name: Install run: | pip install uv uv sync - name: Run monitoring id: monitor env: OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} run: | uv run llm-answer-watcher run \ --config config.yaml \ --yes \ --format json | tee results.json # PIPESTATUS[0] is the CLI's exit code; $? would report tee's echo "exit_code=${PIPESTATUS[0]}"
>> $GITHUB_OUTPUT continue-on-error: true - name: Parse results id: parse run: | COST=$(jq -r '.total_cost_usd' results.json) BRANDS=$(jq -r '.brands_detected.mine | length' results.json) echo "cost=$COST" >> $GITHUB_OUTPUT echo "brands_found=$BRANDS" >> $GITHUB_OUTPUT - name: Slack notification if: steps.monitor.outputs.exit_code == '0' uses: slackapi/slack-github-action@v1 with: payload: | { "text": "✅ Brand monitoring completed", "blocks": [ { "type": "section", "text": { "type": "mrkdwn", "text": "*Brand Monitoring Results*\n• Cost: $${{ steps.parse.outputs.cost }}\n• Brands found: ${{ steps.parse.outputs.brands_found }}" } } ] } env: SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }} - name: Alert on failure if: steps.monitor.outputs.exit_code == '4' uses: slackapi/slack-github-action@v1 with: payload: | { "text": "❌ Brand monitoring failed completely" } env: SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }} ``` ## Docker Automation ### Dockerfile ```dockerfile FROM python:3.12-slim WORKDIR /app # Install uv RUN pip install uv # Copy project COPY . . # Install dependencies RUN uv sync # Set entrypoint ENTRYPOINT ["uv", "run", "llm-answer-watcher"] CMD ["run", "--config", "config.yaml", "--yes", "--format", "json"] ``` ### Docker Compose ```yaml # docker-compose.yml version: '3.8' services: monitoring: build: . environment: - OPENAI_API_KEY=${OPENAI_API_KEY} - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY} volumes: - ./output:/app/output - ./configs:/app/configs command: run --config configs/production.yaml --yes --format json ``` Run: ```bash docker-compose up ``` ## Best Practices ### 1. Use --yes Flag Skip confirmation prompts: ```bash llm-answer-watcher run --config config.yaml --yes ``` ### 2. Use JSON or Quiet Mode For parsing: ```bash llm-answer-watcher run --config config.yaml --yes --format json ``` ### 3. Handle Exit Codes ```bash llm-answer-watcher run --config config.yaml --yes case $?
in 0|3) echo "Success or partial success" ;; *) echo "Error occurred" && exit 1 ;; esac ``` ### 4. Secure API Keys Never hardcode API keys: ```bash # ✅ Good - from environment export OPENAI_API_KEY=sk-... # ✅ Good - from secrets management (extract the secret string, not the full JSON response) OPENAI_API_KEY=$(aws secretsmanager get-secret-value --secret-id openai-key --query SecretString --output text) ``` ### 5. Log Output ```bash llm-answer-watcher run --config config.yaml --yes \ --format json \ > /var/log/monitoring/$(date +%Y-%m-%d).json 2>&1 ``` ### 6. Rotate Logs ```bash # Keep last 30 days find /var/log/monitoring -name "*.json" -mtime +30 -delete ``` ## Next Steps - [See CI/CD examples](../../../examples/ci-cd-integration/) - [Learn about output modes](../output-modes/) - [Understand exit codes](../exit-codes/) # Supported Providers # Provider Overview LLM Answer Watcher supports 6 major LLM providers with a unified interface. Choose providers based on cost, performance, and feature requirements. > **🌐 New in v0.2.0**: Browser Runners - Access ChatGPT and Perplexity via web UI automation to capture the true user experience. See [Browser vs API Access](#browser-vs-api-access) below.
## Supported Providers | Provider | Models | Cost Range | Web Search | Best For | | --------------- | ------------------------------ | ---------------------- | ----------- | --------------------------- | | **OpenAI** | gpt-4o-mini, gpt-4o, more | $0.15-$10/1M tokens | ✅ Yes | General use, cost-effective | | **Anthropic** | Claude 3.5 Haiku, Claude 3.5 Sonnet, Claude 3 Opus | $0.80-$75/1M tokens | ❌ No | High-quality responses | | **Mistral** | mistral-large, mistral-small | $0.30-$2/1M tokens | ❌ No | European alternative | | **X.AI (Grok)** | grok-beta, grok-2, grok-3 | $2-$25/1M tokens | ❌ No | X platform integration | | **Google** | Gemini 2.0 Flash | $0.075-$0.30/1M tokens | ❌ No | Low-cost option | | **Perplexity** | Sonar, Sonar Pro | $1-$15/1M tokens | ✅ Built-in | Grounded responses | ## Quick Configuration ### Single Provider ```yaml run_settings: models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" ``` ### Multiple Providers ```yaml run_settings: models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" ``` ## Provider Selection Guide ### By Budget **Ultra Low Cost (\<$0.005 per query):** - Google Gemini 2.0 Flash - OpenAI gpt-4o-mini **Low Cost ($0.005-0.01 per query):** - Mistral mistral-small - Anthropic Claude 3.5 Haiku **Medium Cost ($0.01-0.05 per query):** - OpenAI gpt-4o - Anthropic Claude 3.5 Sonnet - Perplexity Sonar Pro **High Cost (>$0.05 per query):** - Anthropic Claude 3 Opus - Grok grok-3 - OpenAI gpt-4-turbo ### By Feature **Web Search Required:** - ✅ OpenAI (with tools configuration) - ✅ Perplexity (built-in) **No Web Search:** - Anthropic, Mistral, Grok, Google **Grounded Responses:** - ✅ Perplexity (best) - ✅ OpenAI with web search **High Quality:** - Anthropic Claude 3.5 Sonnet, Claude 3 Opus - OpenAI gpt-4o -
Perplexity Sonar Pro **Fast Response:** - OpenAI gpt-4o-mini - Google Gemini Flash - Mistral mistral-small ### By Use Case **Cost-Optimized Monitoring:** ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" ``` **High-Quality Analysis:** ```yaml models: - provider: "anthropic" model_name: "claude-3-5-sonnet-20241022" ``` **Multi-Provider Comparison:** ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" - provider: "perplexity" model_name: "sonar" ``` **Web Search Required:** ```yaml models: - provider: "perplexity" model_name: "sonar-pro" ``` ## API Key Setup ### OpenAI ```bash export OPENAI_API_KEY=sk-your-openai-key-here ``` Get key: https://platform.openai.com/api-keys ### Anthropic ```bash export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here ``` Get key: https://console.anthropic.com/ ### Mistral ```bash export MISTRAL_API_KEY=your-mistral-key-here ``` Get key: https://console.mistral.ai/ ### X.AI (Grok) ```bash export XAI_API_KEY=xai-your-grok-key-here ``` Get key: https://console.x.ai/ ### Google Gemini ```bash export GOOGLE_API_KEY=AIza-your-google-api-key-here ``` Get key: https://aistudio.google.com/apikey ### Perplexity ```bash export PERPLEXITY_API_KEY=pplx-your-perplexity-key-here ``` Get key: https://www.perplexity.ai/settings/api ## Provider Comparison ### Response Quality **Best to Good:** 1. Anthropic Claude 3 Opus 1. Anthropic Claude 3.5 Sonnet 1. OpenAI gpt-4o 1. Perplexity Sonar Pro 1. Mistral mistral-large 1. Grok grok-3 1. Anthropic Claude 3.5 Haiku 1. OpenAI gpt-4o-mini 1. Google Gemini 2.0 Flash 1. Mistral mistral-small ### Cost Efficiency **Best value (quality per dollar):** 1. OpenAI gpt-4o-mini 1. Google Gemini 2.0 Flash 1. Anthropic Claude 3.5 Haiku 1. Mistral mistral-small 1. Perplexity Sonar ### Speed **Fastest to Slowest:** 1. Google Gemini Flash 1. OpenAI gpt-4o-mini 1. Mistral models 1. Perplexity Sonar 1. Anthropic Haiku 1. OpenAI gpt-4o 1.
Anthropic Sonnet 1. Grok models 1. Anthropic Opus ## Rate Limits Default rate limits (check provider docs for current limits): | Provider | Requests/Min | Tokens/Min | | ---------- | ------------ | ---------- | | OpenAI | 500 | 90,000 | | Anthropic | 50 | 100,000 | | Mistral | 5-60 | Varies | | X.AI | 60 | 120,000 | | Google | 15 | 32,000 | | Perplexity | 20 | Varies | **Recommendation:** Add delays between queries if hitting rate limits: ```yaml run_settings: delay_between_queries: 2 # seconds ``` ## Provider-Specific Features ### OpenAI - ✅ Web search via tools - ✅ Function calling - ✅ JSON mode - ✅ Vision support (not used) See [OpenAI Provider](../openai/) ### Anthropic - ✅ Extended context (200K tokens) - ✅ Function calling - ✅ JSON mode - ✅ Thinking mode (not used) See [Anthropic Provider](../anthropic/) ### Mistral - ✅ European data residency - ✅ Function calling - ✅ JSON mode - ✅ Competitive pricing See [Mistral Provider](../mistral/) ### X.AI (Grok) - ✅ X platform integration - ✅ OpenAI-compatible API - ✅ Real-time information - ⚠️ Limited model selection See [Grok Provider](../grok/) ### Google - ✅ Very low cost - ✅ Fast responses - ✅ Long context (1M tokens) - ⚠️ Newer platform See [Google Provider](../google/) ### Perplexity - ✅ Built-in web search - ✅ Grounded responses - ✅ Citations included - ✅ Real-time information - ⚠️ Request fees (not in cost estimate) See [Perplexity Provider](../perplexity/) ## Multi-Provider Strategies ### Strategy 1: Cost vs Quality Cheap model for volume, expensive for quality: ```yaml models: # High volume, low cost - provider: "openai" model_name: "gpt-4o-mini" # Occasional high-quality check - provider: "anthropic" model_name: "claude-3-5-sonnet-20241022" enabled_for: ["critical-intent"] ``` ### Strategy 2: Provider Diversity Avoid single-provider dependency: ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" - provider: "anthropic" model_name:
"claude-3-5-haiku-20241022" - provider: "google" model_name: "gemini-2.0-flash-exp" ``` ### Strategy 3: Web Search + Standard ```yaml models: # Standard queries - provider: "openai" model_name: "gpt-4o-mini" # Web-search enabled - provider: "perplexity" model_name: "sonar-pro" ``` ## Common Issues ### API Key Errors ```text ❌ API key not found: OPENAI_API_KEY ``` **Solution:** ```bash export OPENAI_API_KEY=sk-your-key-here ``` ### Rate Limit Exceeded ```text ⚠️ Rate limit exceeded for openai/gpt-4o-mini ``` **Solutions:** 1. Add delay: `delay_between_queries: 2` 1. Reduce concurrent requests 1. Upgrade API tier ### Model Not Found ```text ❌ Model not found: gpt-4-mini ``` **Solution:** Use correct model name: `gpt-4o-mini` See provider docs for valid models. ### Authentication Failed ```text ❌ Authentication failed: Invalid API key ``` **Solutions:** 1. Check key spelling 1. Regenerate key at provider console 1. Verify key has correct permissions ## Browser vs API Access ### Two Ways to Access Providers Starting in v0.2.0, LLM Answer Watcher supports **two access methods** for supported providers: | Access Method | Providers | How It Works | Use Cases | | ------------------------- | ------------------- | ---------------------------------- | ------------------------------------------------- | | **API Access** | All 6 providers | Direct API calls with your API key | Production monitoring, cost-optimized, fast | | **Browser Access (BETA)** | ChatGPT, Perplexity | Headless browser via Steel API | True user experience, screenshots, web UI testing | ### Key Differences **API Access:** - ✅ Faster (no browser overhead) - ✅ Accurate cost tracking - ✅ Token usage metrics - ✅ Programmatic control - ❌ May differ from web UI behavior - ❌ No visual evidence **Browser Access:** - ✅ Captures actual user experience - ✅ Screenshots and HTML snapshots - ✅ Tests web UI behavior - ✅ Free tier usage (no API costs) - ❌ Slower (10-30s overhead) - ❌ No cost tracking yet
(shows $0.00) - ❌ Subject to UI changes ### When to Use Each **Use API Access when:** - You need fast, automated monitoring - Cost tracking is important - You're running high-volume queries - You need programmatic control **Use Browser Access when:** - You want to verify web UI behavior - You need visual evidence (screenshots) - You're testing free tier experience - You want to see what actual users see ### Example: Comparing Both ```yaml runners: # API access for production monitoring - runner_plugin: "api" config: provider: "openai" model_name: "gpt-4o-mini" api_key: "${OPENAI_API_KEY}" # Browser access to verify web UI - runner_plugin: "steel-chatgpt" config: steel_api_key: "${STEEL_API_KEY}" take_screenshots: true ``` This configuration runs the same query through both methods, letting you compare: - Does the API response match what users see in ChatGPT? - Are citations/sources displayed differently? - Does the web UI recommend different brands? See [Browser Runners Guide](../../BROWSER_RUNNERS/) for complete details. ## Next Steps - **OpenAI**: Complete OpenAI provider guide. [OpenAI Provider →](../openai/) - **Anthropic**: Claude models documentation. [Anthropic Provider →](../anthropic/) - **Perplexity**: Grounded LLMs with web search. [Perplexity Provider →](../perplexity/) - **Browser Runners**: Web UI automation guide. [Browser Runners →](../../BROWSER_RUNNERS/) # OpenAI Provider Integration with OpenAI's GPT models.
## Supported Models - `gpt-4o` - Latest GPT-4 Optimized - `gpt-4o-mini` - Cost-effective model (recommended) - `gpt-4-turbo` - Fast GPT-4 - `gpt-3.5-turbo` - Legacy model ## Configuration ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" ``` ## Getting API Key 1. Visit [platform.openai.com](https://platform.openai.com/api-keys) 1. Create new secret key 1. Export: `export OPENAI_API_KEY=sk-your-key` ## Pricing - **gpt-4o-mini**: $0.15/1M input, $0.60/1M output - **gpt-4o**: $2.50/1M input, $10/1M output ## Web Search Tool OpenAI supports web search through the `web_search` tool in the Responses API. ### Basic Web Search Configuration ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" tools: - type: "web_search" tool_choice: "auto" # Model decides when to search ``` ### Tool Choice Options - **`auto`** (recommended): Model decides when web search is needed - **`required`**: Force web search for every query - **`none`**: Disable web search ### Web Search Pricing Web search adds **$10 per 1,000 calls** plus content token costs. **Example cost** (gpt-4o-mini): ```text Base query: $0.0004 (tokens only) + Web search call: $0.01 + Search content: $0.0012 (8k tokens @ $0.15/1M) = Total: ~$0.0116 per query ``` ### When to Use OpenAI Web Search **Use OpenAI when**: - ✅ Need explicit `tool_choice` control - ✅ Prefer OpenAI's LLM reasoning quality - ✅ Already invested in OpenAI ecosystem **Consider alternatives**: - **Google Gemini grounding**: 290x cheaper (~$0.00004 vs $0.0116) - **Perplexity**: Built-in citations, always-on search See [Web Search Configuration](../../user-guide/configuration/web-search/) for detailed setup and comparison.
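
The web search cost example above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function `estimate_web_search_query_cost` and its parameter names are illustrative assumptions and not part of LLM Answer Watcher, and the rates are the figures quoted in this section.

```python
# Illustrative sketch only; not part of the LLM Answer Watcher codebase.
# Rates are the figures quoted in this section.
WEB_SEARCH_FEE_PER_CALL = 10.0 / 1_000      # $10 per 1,000 calls
INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000    # gpt-4o-mini input price


def estimate_web_search_query_cost(
    base_cost_usd: float,
    search_calls: int,
    search_content_tokens: int,
) -> float:
    """Add per-call search fees and search-content token cost to the base cost."""
    call_fees = search_calls * WEB_SEARCH_FEE_PER_CALL
    content_cost = search_content_tokens * INPUT_PRICE_PER_TOKEN
    return base_cost_usd + call_fees + content_cost


# Same inputs as the example: $0.0004 base, one search call, 8k content tokens.
total = estimate_web_search_query_cost(0.0004, 1, 8_000)
print(f"${total:.4f} per query")
```

With `tool_choice: "required"` every query pays the per-call fee, so the fee term usually dominates the total for a small model such as `gpt-4o-mini`.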
## Further Reading - [Web Search Configuration](../../user-guide/configuration/web-search/) - Detailed web search setup - [Model Configuration](../../user-guide/configuration/models/) - Model selection guide - [Providers Overview](../overview/) - Compare all providers # Anthropic Provider Integration with Anthropic's Claude models. ## Supported Models - `claude-3-5-sonnet-20241022` - Latest Sonnet - `claude-3-5-haiku-20241022` - Fast and affordable - `claude-3-opus-20240229` - Most capable ## Configuration ```yaml models: - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" ``` ## Getting API Key 1. Visit [console.anthropic.com](https://console.anthropic.com/) 1. Get your API key 1. Export: `export ANTHROPIC_API_KEY=sk-ant-your-key` ## Pricing - **Haiku**: $0.80/1M input, $4/1M output - **Sonnet**: $3/1M input, $15/1M output See [Providers Overview](../overview/) for comparison. # Mistral AI Provider Integration with Mistral's models. ## Supported Models - `mistral-large-latest` - `mistral-small-latest` ## Configuration ```yaml models: - provider: "mistral" model_name: "mistral-small-latest" env_api_key: "MISTRAL_API_KEY" ``` ## Getting API Key 1. Visit [console.mistral.ai](https://console.mistral.ai/) 1. Generate API key 1. Export: `export MISTRAL_API_KEY=your-key` See [Providers Overview](../overview/) for comparison. # X.AI Grok Provider Integration with X.AI's Grok models. ## Supported Models - `grok-beta` - `grok-2-1212` - `grok-2-latest` - `grok-3` - `grok-3-mini` ## Configuration ```yaml models: - provider: "grok" model_name: "grok-2-1212" env_api_key: "XAI_API_KEY" ``` ## Getting API Key 1. Visit [x.ai/api](https://x.ai/api) 1. Get API access 1. Export: `export XAI_API_KEY=xai-your-key` See [Providers Overview](../overview/) for comparison. # Google Gemini Provider Integration with Google's Gemini models, including support for Google Search grounding. 
## Overview Google Gemini is a family of multimodal AI models that excels at understanding and generating text. Gemini models are available through Google AI Studio and support **Google Search grounding** for real-time web information. **Key Features**: - **Google Search Grounding**: Access real-time web data with no additional per-request fees - **Competitive Pricing**: Among the most cost-effective LLMs with high quality - **Automatic Search Decision**: Gemini intelligently decides when to use Google Search - **Grounding Metadata**: Rich attribution showing which sources influenced responses ## Supported Models ### Gemini 2.5 Series (Recommended) | Model | Speed | Quality | Grounding | Best For | | ----------------------- | ------- | ------- | --------- | -------------------------------------------- | | `gemini-2.5-flash` | Fast | High | ✅ Yes | **Production** - balanced speed/quality/cost | | `gemini-2.5-flash-lite` | Fastest | Medium | ❌ No | High-volume, non-grounded queries | | `gemini-2.5-pro` | Slower | Highest | ✅ Yes | Complex reasoning, highest quality | ### Gemini 2.0 Series | Model | Speed | Quality | Grounding | Best For | | ----------------------- | ------- | ------- | --------------- | -------------------------- | | `gemini-2.0-flash-exp` | Fast | High | ⚠️ Experimental | Testing new features | | `gemini-2.0-flash-lite` | Fastest | Medium | ❌ No | Fast, non-grounded queries | ### Legacy Models (Not Recommended) - `gemini-1.5-pro` - Superseded by 2.5-pro - `gemini-1.5-flash` - Superseded by 2.5-flash **Recommendation**: Use `gemini-2.5-flash` for production workloads. It provides excellent performance with Google Search grounding support at competitive pricing.
## Basic Configuration ### Without Google Search Grounding Standard Gemini usage with training data only: ```yaml models: - provider: "google" model_name: "gemini-2.5-flash-lite" env_api_key: "GEMINI_API_KEY" ``` **Use when**: - You don't need real-time information - Faster response times are critical - Cost optimization is priority ### With Google Search Grounding Enable real-time web information: ```yaml models: - provider: "google" model_name: "gemini-2.5-flash" env_api_key: "GEMINI_API_KEY" system_prompt: "google/gemini-grounding" tools: - google_search: {} ``` **Use when**: - Brand monitoring requires current data - Tracking real-time competitive landscape - Need to detect recent changes - Want Google's search quality ## Google Search Grounding ### Configuration Format Google uses a unique tools configuration format: ```yaml tools: - google_search: {} # Dictionary with tool name as key ``` This differs from OpenAI's format: ```yaml tools: - type: "web_search" # Dictionary with 'type' field tool_choice: "auto" ``` **Why the difference?** Each provider has different API specifications. Google uses named tool objects, OpenAI uses typed specifications. The config does direct passthrough to each API. 
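
To make the difference between the two shapes concrete, the sketch below shows how each `tools` entry looks once the YAML has been loaded into Python dictionaries, and how a tool's name would be read from each. The helper `tool_name` is hypothetical and written only for illustration; it is not a function of LLM Answer Watcher, which according to the text above passes the configuration through unchanged.

```python
# Illustrative only: `tool_name` is a hypothetical helper, not part of the tool.

# The two `tools` entries from the YAML above, as plain Python data.
OPENAI_TOOLS = [{"type": "web_search"}]   # typed specification
GOOGLE_TOOLS = [{"google_search": {}}]    # tool name used as the key


def tool_name(provider: str, tool: dict) -> str:
    """Return the tool's name from a provider-specific tools entry."""
    if provider == "openai":
        # OpenAI style: the name lives in the "type" field.
        return tool["type"]
    if provider == "google":
        # Google style: the single dictionary key is the tool name.
        if len(tool) != 1:
            raise ValueError("expected exactly one key naming the tool")
        return next(iter(tool))
    raise ValueError(f"no tools format known for provider: {provider}")


print(tool_name("openai", OPENAI_TOOLS[0]))
print(tool_name("google", GOOGLE_TOOLS[0]))
```

Mixing the formats, for example giving Google a `type: "web_search"` entry, is an easy mistake to make because both are lists of dictionaries.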
### Supported Models for Grounding

| Model                   | Grounding Support            |
| ----------------------- | ---------------------------- |
| `gemini-2.5-flash`      | ✅ **Yes** (recommended)     |
| `gemini-2.5-flash-lite` | ❌ No                        |
| `gemini-2.5-pro`        | ✅ **Yes** (highest quality) |
| `gemini-2.0-flash-exp`  | ⚠️ Experimental              |
| `gemini-2.0-flash-lite` | ❌ No                        |

### System Prompt Optimization

Use the specialized `google/gemini-grounding` system prompt for best results:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding" # Optimized for grounding
    tools:
      - google_search: {}
```

This prompt:

- Instructs Gemini to use Google Search when beneficial
- Emphasizes grounding responses in search results
- Requests comprehensive source coverage
- Improves answer quality for brand monitoring

### How Grounding Works

1. Gemini receives your query prompt
1. **Automatically decides** if Google Search would improve the answer
1. Performs search if beneficial (no `tool_choice` parameter needed)
1. Grounds response in search results
1. Returns answer with grounding metadata

**No explicit control**: Unlike OpenAI's `tool_choice: "required"`, Gemini intelligently determines when grounding helps. This is intentional - Gemini optimizes for quality and cost.

### Grounding Metadata

Responses include rich grounding attribution:

```json
{
  "web_search_results": {
    "web_search_queries": ["best email warmup tools 2025"],
    "grounding_chunks": [
      {
        "web_source": "https://www.g2.com/categories/email-warmup",
        "retrieved_context": "Top email warmup tools..."
      }
    ],
    "grounding_supports": [
      {
        "segment": { "text": "Warmly is a leading solution" },
        "grounding_chunk_indices": [0],
        "confidence_scores": [0.95]
      }
    ]
  },
  "web_search_count": 1
}
```

**Key fields**:

- `web_search_queries`: What Gemini searched for
- `grounding_chunks`: Source URLs and context
- `grounding_supports`: Which text segments came from which sources
- `confidence_scores`: How confident Gemini is (0.0-1.0)

## Pricing

### Token Costs

| Model                   | Input             | Output            |
| ----------------------- | ----------------- | ----------------- |
| `gemini-2.5-flash`      | $0.04 / 1M tokens | $0.12 / 1M tokens |
| `gemini-2.5-flash-lite` | $0.02 / 1M tokens | $0.06 / 1M tokens |
| `gemini-2.5-pro`        | $0.60 / 1M tokens | $1.80 / 1M tokens |

### Google Search Grounding Costs

**Good news**: No additional fees for grounding. You only pay token costs.

**Example** (email warmup query with grounding):

```text
Input:  100 tokens @ $0.04/1M = $0.000004
Output: 300 tokens @ $0.12/1M = $0.000036
Total:  $0.00004 per query
```

**Comparison**:

- **Gemini with grounding**: $0.00004 per query
- **OpenAI web search**: $0.0116 per query (~290x more)
- **Perplexity sonar-pro**: $0.005-$0.03 per query (125-750x more)

**Cost Advantage**: Google Search grounding is **significantly cheaper** than alternatives. Grounding tokens are included in base pricing with no per-request fees.
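The per-query arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the helper name `query_cost_usd` is hypothetical, and the prices and token counts are the ones quoted in this section rather than values read from the tool.

```python
def query_cost_usd(input_tokens: int, output_tokens: int,
                   input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one query given per-1M-token prices (hypothetical helper)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# gemini-2.5-flash figures from the pricing table above
gemini = query_cost_usd(100, 300, 0.04, 0.12)
print(f"Gemini with grounding: ${gemini:.6f} per query")

# Ratios against the per-query figures quoted in the comparison list
print(f"OpenAI web search: ~{0.0116 / gemini:.0f}x more")
print(f"Perplexity sonar-pro: {0.005 / gemini:.0f}-{0.03 / gemini:.0f}x more")
```

Swapping in your own typical token counts gives a quick estimate before committing to a model tier.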
## Complete Configuration Example

### Multi-Model Strategy

Use different models for different use cases:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    # High-volume: Fast + cheap without grounding
    - provider: "google"
      model_name: "gemini-2.5-flash-lite"
      env_api_key: "GEMINI_API_KEY"

    # Brand monitoring: Balanced with grounding
    - provider: "google"
      model_name: "gemini-2.5-flash"
      env_api_key: "GEMINI_API_KEY"
      system_prompt: "google/gemini-grounding"
      tools:
        - google_search: {}

    # Premium: Highest quality with grounding
    - provider: "google"
      model_name: "gemini-2.5-pro"
      env_api_key: "GEMINI_API_KEY"
      system_prompt: "google/gemini-grounding"
      tools:
        - google_search: {}

brands:
  mine:
    - "Warmly"
  competitors:
    - "HubSpot"
    - "Instantly"

intents:
  - id: "email-warmup-tools"
    prompt: "What are the best email warmup tools in 2025?"
```

## Getting API Key

1. Visit [Google AI Studio](https://aistudio.google.com/app/apikey)
1. Sign in with your Google account
1. Click "Create API key"
1. Copy the key (format: `AIza...`)
1. Export to environment:

```bash
export GEMINI_API_KEY=AIza-your-key-here
```

**API Key Security**:

- Never commit API keys to version control
- Use environment variables or secret management
- Rotate keys periodically
- Monitor usage in AI Studio dashboard

## When to Use Gemini

### Choose Gemini When:

- ✅ **Cost optimization**: Among the cheapest high-quality LLMs
- ✅ **Google Search quality**: Want Google's search coverage and accuracy
- ✅ **High-volume monitoring**: Grounding with no per-request fees
- ✅ **Automatic search decision**: Trust Gemini to decide when to ground
- ✅ **Grounding metadata**: Need detailed source attribution

### Choose Other Providers When:

**OpenAI**:

- Need explicit `tool_choice` control (force/disable search)
- Prefer OpenAI's reasoning quality
- Already invested in OpenAI ecosystem

**Perplexity**:

- Need explicit source URLs in every response
- Want always-on web search with citations
- Prefer Perplexity's citation format

**Anthropic**:

- Need longest context windows (200K+)
- Prefer Claude's reasoning style
- Don't need web search

## Best Practices

### 1. Use Appropriate Model Tiers

```yaml
# High-volume, non-grounded queries
- model_name: "gemini-2.5-flash-lite"

# Production brand monitoring (recommended)
- model_name: "gemini-2.5-flash"
  tools: [google_search: {}]

# Premium quality for critical queries
- model_name: "gemini-2.5-pro"
  tools: [google_search: {}]
```

### 2. Enable Grounding for Brand Monitoring

```yaml
# ✅ GOOD - Grounding for current brand data
- provider: "google"
  model_name: "gemini-2.5-flash"
  system_prompt: "google/gemini-grounding"
  tools:
    - google_search: {}

intents:
  - id: "current-tools"
    prompt: "What are the best email tools in 2025?"
```

### 3. Skip Grounding for Historical/Generic Queries

```yaml
# ✅ GOOD - No grounding for general knowledge
- provider: "google"
  model_name: "gemini-2.5-flash-lite"

intents:
  - id: "email-best-practices"
    prompt: "What are email deliverability best practices?"
```

### 4. Use Grounding-Optimized System Prompt

```yaml
# ✅ GOOD
system_prompt: "google/gemini-grounding" # Optimized

# ❌ SUBOPTIMAL
# (no system_prompt or using "google/default")
```

### 5. Monitor Grounding Usage

Track when Gemini uses grounding:

```python
# Check if grounding was used
if result["web_search_count"] > 0:
    print(f"Grounding used: {result['web_search_count']} searches")
    print(f"Queries: {result['web_search_results']['web_search_queries']}")
```

## Troubleshooting

### Grounding Not Working

**Problem**: `web_search_count` is always 0

**Solutions**:

1. Check you're using a grounding-capable model:

   ```yaml
   # ✅ Grounding supported
   model_name: "gemini-2.5-flash"

   # ❌ Grounding NOT supported
   model_name: "gemini-2.5-flash-lite"
   ```

1. Verify tools configuration format:

   ```yaml
   # ✅ Correct
   tools:
     - google_search: {}

   # ❌ Wrong (OpenAI format)
   tools:
     - type: "web_search"
   ```

1. Use grounding-optimized system prompt:

   ```yaml
   system_prompt: "google/gemini-grounding"
   ```

### API Authentication Errors

**Problem**: `401 Unauthorized` or `403 Forbidden`

**Solutions**:

1. Verify API key is correct:

   ```bash
   echo $GEMINI_API_KEY # Should show AIza...
   ```

1. Check key is active in [AI Studio](https://aistudio.google.com/app/apikey)
1. Verify key has correct permissions

### Rate Limiting

**Problem**: `429 Too Many Requests`

**Solutions**:

1. Reduce `max_concurrent_requests` in config:

   ```yaml
   run_settings:
     max_concurrent_requests: 3 # Google limit
   ```

1. Add delay between requests
1. Upgrade to higher quota tier in AI Studio

## Further Reading

- [Web Search Configuration](../../user-guide/configuration/web-search/) - Detailed grounding setup
- [Model Configuration](../../user-guide/configuration/models/) - Model selection guide
- [Providers Overview](../overview/) - Compare all providers
- [Google AI Studio](https://aistudio.google.com) - Official documentation

# Perplexity Provider

Integration with Perplexity's search-grounded models.
## Supported Models - `sonar` - `sonar-pro` - `sonar-reasoning` - `sonar-reasoning-pro` - `sonar-deep-research` ## Configuration ```yaml models: - provider: "perplexity" model_name: "sonar-pro" env_api_key: "PERPLEXITY_API_KEY" ``` ## Getting API Key 1. Visit [perplexity.ai/settings/api](https://www.perplexity.ai/settings/api) 1. Generate API key 1. Export: `export PERPLEXITY_API_KEY=pplx-your-key` ## Features - Built-in web search - Real-time information - Citations included See [Providers Overview](../overview/) for comparison. # Examples # Basic Monitoring Example A complete, production-ready guide for monitoring brand visibility across multiple LLM providers. ## Quick Start The easiest way to get started is with the pre-built examples: ### 1. Minimal Example (First-Time Users) **File**: [`examples/01-quickstart/minimal.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart/minimal.config.yaml) ```bash # Set API key export OPENAI_API_KEY="sk-..." # Run minimal example llm-answer-watcher run --config examples/01-quickstart/minimal.config.yaml # View results open ./output/*/report.html ``` **Cost**: ~$0.001 | **Time**: ~5 seconds ### 2. Real-World SaaS Monitoring **File**: [`examples/07-real-world/saas-brand-monitoring.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world/saas-brand-monitoring.config.yaml) This template demonstrates complete production monitoring with: - Multiple providers for comprehensive coverage - Buyer-intent queries across different use cases - Budget controls and cost management - Competitor tracking ```bash # Set required API keys export OPENAI_API_KEY="sk-..." export ANTHROPIC_API_KEY="sk-ant-..." 
# Run monitoring llm-answer-watcher run --config examples/07-real-world/saas-brand-monitoring.config.yaml ``` **Cost**: ~$0.05-0.20 per run depending on providers and intents ## Configuration Overview For a detailed explanation of each configuration option, see: - [`examples/01-quickstart/explained.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart/explained.config.yaml) - Same minimal config with inline comments ## Use Case Examples The examples directory includes ready-to-use templates: | Use Case | Example Config | Description | | ----------------------------- | ---------------------------------------------------- | -------------------------------- | | **Quick Testing** | `01-quickstart/minimal.config.yaml` | Single provider, single intent | | **Multi-Provider Comparison** | `02-providers/multi-provider-comparison.config.yaml` | Compare all 6 providers | | **Real-Time Data** | `03-web-search/websearch-comparison.config.yaml` | Web search across providers | | **High Accuracy** | `04-extraction/function-calling.config.yaml` | LLM-based brand extraction | | **Automated Insights** | `05-operations/content-strategy.config.yaml` | Generate content recommendations | | **Budget Controls** | `06-advanced/budget-controls.config.yaml` | Cost management features | | **Production Ready** | `07-real-world/saas-brand-monitoring.config.yaml` | Complete monitoring setup | ## Environment Setup Copy the environment template: ```bash cp examples/.env.example .env ``` Edit `.env` and add your API keys: ```bash OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... GEMINI_API_KEY=... MISTRAL_API_KEY=... GROK_API_KEY=xai-... PERPLEXITY_API_KEY=pplx-... 
```

## Understanding Output

Each run creates a timestamped directory with:

```text
output/2025-11-05T14-30-00Z/
├── run_meta.json              # Run summary and stats
├── report.html                # Interactive HTML report
├── intent_*_raw_*.json        # Raw LLM responses
├── intent_*_parsed_*.json     # Extracted brand mentions
└── intent_*_error_*.json      # Error details (if any)
```

### HTML Report

Open the report in your browser:

```bash
open ./output/2025-11-05T14-30-00Z/report.html
```

**Report includes:**

- Summary statistics (costs, queries, mentions)
- Brand mention tables with ranks
- Rank distribution charts
- Cost breakdown by provider
- Raw LLM responses for verification

### JSON Results

View structured output:

```bash
# View run summary
cat ./output/2025-11-05T14-30-00Z/run_meta.json | jq '.'

# View specific intent results
cat ./output/*/intent_best-email-warmup-tools_parsed_openai_gpt-4o-mini.json | jq '.'
```

### SQLite Database

All data is stored in SQLite for historical tracking:

```bash
sqlite3 ./output/watcher.db

# View latest run
SELECT * FROM runs ORDER BY timestamp_utc DESC LIMIT 1;

# View your brand mentions
SELECT * FROM mentions WHERE is_mine = 1 ORDER BY timestamp_utc DESC;

# Compare competitors
SELECT brand, COUNT(*) as mentions, AVG(rank_position) as avg_rank
FROM mentions
WHERE is_mine = 0 AND rank_position IS NOT NULL
GROUP BY brand
ORDER BY mentions DESC;
```

See [SQLite Database Guide](../../data-analytics/sqlite-database/) for more queries.

## Analyzing Results

### Check Brand Visibility

```sql
-- Did we appear in any responses?
SELECT intent_id, model_provider, model_name, brand, rank_position
FROM mentions
WHERE is_mine = 1
  AND run_id = '2025-11-05T14-30-00Z'
ORDER BY intent_id, rank_position;
```

### Compare vs Competitors

```sql
-- How do we rank vs competitors?
SELECT
  brand,
  COUNT(*) as total_mentions,
  COUNT(DISTINCT intent_id) as intents_appeared,
  AVG(rank_position) as avg_rank,
  MIN(rank_position) as best_rank
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND rank_position IS NOT NULL
GROUP BY brand
ORDER BY total_mentions DESC, avg_rank ASC;
```

### Identify Gaps

```sql
-- Which intents didn't mention us?
SELECT DISTINCT intent_id
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND intent_id NOT IN (
    SELECT DISTINCT intent_id
    FROM mentions
    WHERE run_id = '2025-11-05T14-30-00Z'
      AND is_mine = 1
  );
```

## Schedule Regular Monitoring

### Daily Cron Job

See [`examples/code-examples/automated_monitoring.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py) for a complete automation script.

Basic cron setup (the path assumes the `.venv` directory created during installation):

```bash
# Run daily at 9 AM
0 9 * * * cd /path/to/llm-answer-watcher && ./.venv/bin/llm-answer-watcher run --config examples/07-real-world/saas-brand-monitoring.config.yaml --yes --quiet >> logs/monitoring.log 2>&1
```

See [Automation Guide](../../user-guide/usage/automation/) for more options.

## Cost Analysis

### Actual Costs

```sql
-- Total cost last 30 days
SELECT SUM(total_cost_usd) as total_cost
FROM runs
WHERE timestamp_utc >= datetime('now', '-30 days');

-- Cost by provider
SELECT
  model_provider,
  SUM(estimated_cost_usd) as provider_cost,
  COUNT(*) as queries
FROM answers_raw
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider;
```

### Cost Optimization

**Budget Examples:**

- **Minimal**: Use `01-quickstart/minimal.config.yaml` (~$0.001 per run)
- **Budget-Constrained**: Use `06-advanced/budget-controls.config.yaml` (~$0.01 per run)
- **Production**: Use `07-real-world/saas-brand-monitoring.config.yaml` (~$0.05-0.20 per run)

See [Budget Controls](../../user-guide/configuration/budget/) for more details.

## Troubleshooting

### No Brand Mentions

**Problem:** Your brand never appears

**Solutions:**

1. Check brand aliases in your config:

   ```yaml
   brands:
     mine:
       - "YourBrand"
       - "YourBrand.io"
       - "YourBrand AI"
       - "yourbrand.com" # Add domain variations
   ```

1. View raw responses to verify:

   ```bash
   cat output/*/intent_*_raw_*.json | jq '.answer_text' | grep -i "yourbrand"
   ```

1. Try more specific prompts:

   ```yaml
   intents:
     - id: "branded-comparison"
       prompt: "Compare YourBrand vs Competitor for [use case]"
   ```

### High Costs

**Problem:** Costs exceed budget

**Solutions:**

1. Use the budget-controls example:

   ```bash
   llm-answer-watcher run --config examples/06-advanced/budget-controls.config.yaml
   ```

1. Switch to cheaper models:

   ```yaml
   models:
     - provider: "openai"
       model_name: "gpt-4o-mini" # Cheapest option
   ```

1. Reduce intent count or providers

### Rate Limiting

**Problem:** API rate limits hit

**Solution:** Reduce concurrency:

```yaml
run_settings:
  max_concurrent_requests: 1 # Sequential processing
  delay_between_queries: 2 # 2 second delay
```

## Next Steps

- **Multi-Provider Comparison**: Compare multiple LLM providers side-by-side. [Multi-Provider Example →](../multi-provider/)
- **Competitor Analysis**: Deep dive into competitor positioning. [Competitor Analysis →](../competitor-analysis/)
- **Historical Trends**: Track changes over time with SQL. [Trends Analysis →](../../data-analytics/trends-analysis/)
- **Automation**: Set up scheduled monitoring with cron or CI/CD. [Automation Guide →](../../user-guide/usage/automation/)

## Additional Resources

- **[Examples Directory](https://github.com/nibzard/llm-answer-watcher/tree/main/examples)** - All configuration examples
- **[Code Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - Python automation
scripts
- **[Configuration Reference](../../reference/configuration-schema/)** - Complete config schema
- **[Database Schema](../../reference/database-schema/)** - SQLite database structure

# Multi-Provider Monitoring

Compare how different LLM providers represent your brand.

## Quick Start

The easiest way to compare multiple providers is with the pre-built multi-provider example:

**File**: [`examples/02-providers/multi-provider-comparison.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers/multi-provider-comparison.config.yaml)

```bash
# Set API keys for providers you want to test
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="..."
export MISTRAL_API_KEY="..."
export GROK_API_KEY="xai-..."
export PERPLEXITY_API_KEY="pplx-..."

# Run comparison across all 6 providers
llm-answer-watcher run --config examples/02-providers/multi-provider-comparison.config.yaml
```

**Cost**: ~$0.037 for 3 intents × 6 providers = 18 queries

## Supported Providers

All 6 providers are demonstrated in the `examples/02-providers/` directory:

| Provider       | Example Config              | Model                | Cost/Query | Notes                |
| -------------- | --------------------------- | -------------------- | ---------- | -------------------- |
| **OpenAI**     | `openai.config.yaml`        | gpt-4o-mini          | ~$0.0008   | Fastest, cheapest    |
| **Anthropic**  | `anthropic.config.yaml`     | claude-3-5-haiku     | ~$0.002    | Great quality/price  |
| **Google**     | `google-gemini.config.yaml` | gemini-2.0-flash-exp | ~$0.0005   | Very fast, free tier |
| **Mistral**    | `mistral.config.yaml`       | mistral-large-latest | ~$0.003    | European provider    |
| **Grok**       | `grok.config.yaml`          | grok-beta            | ~$0.005    | X.AI model           |
| **Perplexity** | `perplexity.config.yaml`    | sonar                | ~$0.001    | Built-in citations   |

See the [Providers README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers) for detailed documentation.
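The ~$0.037 figure quoted in the Quick Start follows directly from the per-query costs in the table. The sketch below reproduces it; the dictionary and the `estimate_run_cost` helper are illustrative only (not part of the tool), and the numbers are the approximate values listed above.

```python
# Approximate cost per query in USD, copied from the Supported Providers table
COST_PER_QUERY_USD = {
    "openai": 0.0008,
    "anthropic": 0.002,
    "google": 0.0005,
    "mistral": 0.003,
    "grok": 0.005,
    "perplexity": 0.001,
}


def estimate_run_cost(num_intents: int,
                      cost_per_query: dict[str, float]) -> tuple[int, float]:
    """Return (query count, estimated total USD); each intent is sent to every provider."""
    queries = num_intents * len(cost_per_query)
    total = num_intents * sum(cost_per_query.values())
    return queries, total


queries, total = estimate_run_cost(3, COST_PER_QUERY_USD)
print(f"{queries} queries, ~${total:.3f}")  # 18 queries, ~$0.037
```

Dropping a provider from the dictionary or changing the intent count shows how quickly the estimate moves, which is useful when sizing a budget limit.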
## Individual Provider Examples ### Test a Single Provider Each provider has its own example config for isolated testing: ```bash # OpenAI (recommended for first test) llm-answer-watcher run --config examples/02-providers/openai.config.yaml # Anthropic (Claude) llm-answer-watcher run --config examples/02-providers/anthropic.config.yaml # Google Gemini llm-answer-watcher run --config examples/02-providers/google-gemini.config.yaml # Mistral llm-answer-watcher run --config examples/02-providers/mistral.config.yaml # Grok llm-answer-watcher run --config examples/02-providers/grok.config.yaml # Perplexity llm-answer-watcher run --config examples/02-providers/perplexity.config.yaml ``` ## Configuration Example Here's a simplified multi-provider configuration: ```yaml run_settings: output_dir: "./output" sqlite_db_path: "./output/watcher.db" models: # Fast and cheap - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # High quality - provider: "anthropic" model_name: "claude-3-5-haiku-20241022" env_api_key: "ANTHROPIC_API_KEY" # Free tier available - provider: "google" model_name: "gemini-2.0-flash-exp" env_api_key: "GEMINI_API_KEY" brands: mine: ["YourBrand"] competitors: ["CompetitorA", "CompetitorB"] intents: - id: "best-tools" prompt: "What are the best tools in this category?" ``` ## Benefits of Multi-Provider Monitoring - **See which providers favor your brand** - Different LLMs have different training data and biases - **Identify provider-specific biases** - Track which providers consistently rank competitors higher - **Optimize for specific LLM platforms** - If your users primarily use ChatGPT, focus on OpenAI optimization - **Comprehensive coverage** - Different users use different LLMs, monitor them all ## Analyzing Multi-Provider Results ### Compare Brand Mentions Across Providers ```sql -- How often does each provider mention us? 
SELECT model_provider, COUNT(*) as total_queries, SUM(CASE WHEN EXISTS ( SELECT 1 FROM mentions m WHERE m.run_id = answers_raw.run_id AND m.intent_id = answers_raw.intent_id AND m.model_provider = answers_raw.model_provider AND m.is_mine = 1 ) THEN 1 ELSE 0 END) as queries_with_brand, ROUND(100.0 * SUM(CASE WHEN EXISTS ( SELECT 1 FROM mentions m WHERE m.run_id = answers_raw.run_id AND m.intent_id = answers_raw.intent_id AND m.model_provider = answers_raw.model_provider AND m.is_mine = 1 ) THEN 1 ELSE 0 END) / COUNT(*), 2) as mention_rate_pct FROM answers_raw WHERE run_id = '2025-11-05T14-30-00Z' GROUP BY model_provider ORDER BY mention_rate_pct DESC; ``` ### Compare Average Rankings by Provider ```sql -- Which provider ranks us highest? SELECT model_provider, AVG(rank_position) as avg_rank, MIN(rank_position) as best_rank, COUNT(*) as total_mentions FROM mentions WHERE run_id = '2025-11-05T14-30-00Z' AND is_mine = 1 AND rank_position IS NOT NULL GROUP BY model_provider ORDER BY avg_rank ASC; ``` ### Provider Cost Comparison ```sql -- Cost efficiency by provider SELECT model_provider, COUNT(*) as queries, SUM(estimated_cost_usd) as total_cost, AVG(estimated_cost_usd) as avg_cost_per_query, SUM(tokens_used) as total_tokens FROM answers_raw WHERE run_id = '2025-11-05T14-30-00Z' GROUP BY model_provider ORDER BY total_cost ASC; ``` ## Which Providers Should You Use? 
### For Testing/Development

- Use **OpenAI gpt-4o-mini** or **Google gemini-flash** (fastest, cheapest)

### For Production Monitoring

- Use **multi-provider comparison** to see all perspectives
- Track which providers consistently mention your brand

### For Specific Needs

- **Best quality**: Anthropic claude-3-5-sonnet, OpenAI gpt-4o
- **Cheapest**: Google gemini-flash, OpenAI gpt-4o-mini
- **Fastest**: Google gemini-flash
- **Citations**: Perplexity sonar
- **European data**: Mistral
- **Real-time data**: Grok (Twitter/X context)

## Provider-Specific Features

See the [Providers README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers) for detailed documentation on each provider's unique features:

- **OpenAI**: Web search via Responses API
- **Anthropic**: Tool use, 200K context
- **Google**: Search grounding
- **Mistral**: Function calling
- **Grok**: Twitter/X integration
- **Perplexity**: Built-in web search

## Next Steps

- **Add Web Search**: Enable real-time web search for current data. [Web Search Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/03-web-search)
- **Compare Results**: Analyze differences across providers. [Basic Monitoring →](../basic-monitoring/)
- **Provider Guides**: Deep dive into each provider's features. [Provider Documentation →](../../providers/overview/)
- **Advanced Config**: Budget controls, operations, extraction. [Advanced Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced)

## Additional Resources

- **[Examples Directory](https://github.com/nibzard/llm-answer-watcher/tree/main/examples)** - All configuration examples
- **[Provider
Comparison](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers)** - Detailed provider documentation - **[Provider Guides](../../providers/overview/)** - Complete provider reference docs # Competitor Analysis Track competitors comprehensively across multiple queries and LLM providers. ## Quick Start The best example for comprehensive competitive intelligence: **File**: [`examples/07-real-world/competitive-intelligence.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world/competitive-intelligence.config.yaml) ```bash # Set API keys export OPENAI_API_KEY="sk-..." export ANTHROPIC_API_KEY="sk-ant-..." # Run competitive intelligence monitoring llm-answer-watcher run --config examples/07-real-world/competitive-intelligence.config.yaml ``` This template demonstrates: - Comprehensive competitor tracking across multiple providers - Diverse buyer-intent queries - Competitive positioning analysis - Rank comparison ## Use Case Templates The `examples/07-real-world/` directory includes several competitive analysis templates: | Template | Use Case | File | | ---------------------------- | --------------------------------------------------------- | -------------------------------------- | | **Competitive Intelligence** | Monitor how competitors are positioned | `competitive-intelligence.config.yaml` | | **Content Gap Analysis** | Find opportunities where competitors appear but you don't | `content-gap-analysis.config.yaml` | | **Brand Monitoring** | Track your brand vs competitors | `saas-brand-monitoring.config.yaml` | | **LLM SEO** | Optimize for LLM visibility | `llm-seo-optimization.config.yaml` | See the [Real-World Examples README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world) for details. 
## Example Configuration

Here's a simplified competitive analysis config:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine: ["YourBrand"]

  # Comprehensive competitor list
  competitors:
    - "TopCompetitor" # Direct competitor #1
    - "RisingStartup" # Emerging threat
    - "IndustryLeader" # Established player
    - "NichePlayer" # Specialized competitor
    - "AlternativeTool" # Adjacent category
    - "LegacyProvider" # Traditional option

intents:
  # General category query
  - id: "best-overall"
    prompt: "What are the best tools in the category?"

  # Segment-specific queries
  - id: "for-startups"
    prompt: "Best tools for startups?"
  - id: "for-enterprise"
    prompt: "Best enterprise tools?"

  # Feature-specific queries
  - id: "affordable-options"
    prompt: "Most affordable tools?"
  - id: "easiest-to-use"
    prompt: "Which tools are easiest to use?"

  # Comparison queries
  - id: "vs-leader"
    prompt: "How does YourBrand compare to TopCompetitor?"
```

## Analyzing Competitive Results

### 1. Competitor Appearance Frequency

```sql
-- How often does each brand appear? (your brand is included for comparison)
SELECT
  brand,
  COUNT(*) as total_mentions,
  COUNT(DISTINCT intent_id) as intents_appeared,
  ROUND(100.0 * COUNT(DISTINCT intent_id) / (
    SELECT COUNT(DISTINCT intent_id)
    FROM mentions
    WHERE run_id = '2025-11-05T14-30-00Z'
  ), 2) as coverage_pct
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
GROUP BY brand
ORDER BY total_mentions DESC;
```

**Example output:**

```text
TopCompetitor  | 12 | 5 | 83.33%
IndustryLeader | 9  | 4 | 66.67%
RisingStartup  | 6  | 3 | 50.00%
YourBrand      | 5  | 3 | 50.00%
```

**Interpretation:**

- TopCompetitor appears most frequently (83% of intents)
- You're tied with RisingStartup (50% coverage)
- Opportunity: Increase visibility in missing intent categories

### 2. 
Average Rankings by Competitor ```sql -- Compare average rank positions SELECT brand, COUNT(*) as mentions, AVG(rank_position) as avg_rank, MIN(rank_position) as best_rank, MAX(rank_position) as worst_rank FROM mentions WHERE run_id = '2025-11-05T14-30-00Z' AND rank_position IS NOT NULL GROUP BY brand ORDER BY avg_rank ASC; ``` **Example output:** ```text TopCompetitor | 12 | 1.8 | 1 | 4 YourBrand | 5 | 2.4 | 1 | 5 IndustryLeader | 9 | 2.9 | 1 | 6 RisingStartup | 6 | 3.2 | 2 | 5 ``` **Interpretation:** - TopCompetitor has best average rank (1.8) - You rank 2.4 on average (room for improvement) - Focus on improving from #2-3 to #1 ### 3. Head-to-Head Comparisons ```sql -- When you both appear, who ranks higher? SELECT m1.intent_id, m1.brand as your_brand, m1.rank_position as your_rank, m2.brand as competitor_brand, m2.rank_position as competitor_rank, CASE WHEN m1.rank_position < m2.rank_position THEN 'You win' WHEN m1.rank_position > m2.rank_position THEN 'Competitor wins' ELSE 'Tie' END as outcome FROM mentions m1 JOIN mentions m2 ON m1.run_id = m2.run_id AND m1.intent_id = m2.intent_id AND m1.model_provider = m2.model_provider AND m1.model_name = m2.model_name WHERE m1.run_id = '2025-11-05T14-30-00Z' AND m1.is_mine = 1 AND m2.brand = 'TopCompetitor' AND m1.rank_position IS NOT NULL AND m2.rank_position IS NOT NULL ORDER BY m1.intent_id; ``` ### 4. Identify Content Gaps ```sql -- Which intents do competitors appear in but you don't? 
SELECT
  intent_id,
  GROUP_CONCAT(DISTINCT brand) as competitors_mentioned
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND is_mine = 0
  AND intent_id NOT IN (
    SELECT DISTINCT intent_id
    FROM mentions
    WHERE run_id = '2025-11-05T14-30-00Z'
      AND is_mine = 1
  )
GROUP BY intent_id;
```

**Example output:**

```text
for-enterprise     | TopCompetitor, IndustryLeader
affordable-options | RisingStartup, NichePlayer
```

**Interpretation:**

- You're missing in "enterprise" queries → Create enterprise content
- Missing in "affordable" queries → Highlight pricing

### 5. Provider-Specific Competitive Positioning

```sql
-- Which providers favor which competitors?
SELECT
  model_provider,
  brand,
  COUNT(*) as mentions,
  AVG(rank_position) as avg_rank
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND rank_position IS NOT NULL
GROUP BY model_provider, brand
ORDER BY model_provider, avg_rank ASC;
```

## Competitive Monitoring Strategies

### 1. Daily Competitive Tracking

Monitor key competitors daily:

```bash
# Run competitive intelligence
llm-answer-watcher run --config examples/07-real-world/competitive-intelligence.config.yaml --yes --quiet

# Analyze changes
python examples/code-examples/analyze_results.py
```

See [`examples/code-examples/automated_monitoring.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py) for automation.

### 2. Weekly Deep Dives

Run comprehensive analysis weekly:

```bash
# Multi-provider comparison
llm-answer-watcher run --config examples/02-providers/multi-provider-comparison.config.yaml

# With web search for current data
llm-answer-watcher run --config examples/03-web-search/websearch-comparison.config.yaml
```

### 3. Content Gap Analysis

Identify where competitors appear but you don't:

```bash
llm-answer-watcher run --config examples/07-real-world/content-gap-analysis.config.yaml
```

### 4. 
Sentiment Comparison Track how you're described vs competitors: ```bash llm-answer-watcher run --config examples/04-extraction/sentiment-analysis.config.yaml ``` ## Competitive Intelligence Dashboard ### Key Metrics to Track 1. **Mention Rate**: % of queries where you appear 1. **Win Rate**: % of head-to-head comparisons where you rank higher 1. **Average Rank**: Your mean position when mentioned 1. **Coverage Gap**: Intents where competitors appear but you don't 1. **Provider Bias**: Which LLMs favor which brands ### SQL Dashboard Query ```sql -- Comprehensive competitive dashboard WITH competitor_stats AS ( SELECT brand, COUNT(*) as mentions, AVG(rank_position) as avg_rank, MIN(rank_position) as best_rank, COUNT(DISTINCT intent_id) as intent_coverage FROM mentions WHERE run_id = '2025-11-05T14-30-00Z' AND rank_position IS NOT NULL GROUP BY brand ) SELECT brand, mentions, ROUND(avg_rank, 2) as avg_rank, best_rank, intent_coverage, ROUND(100.0 * intent_coverage / ( SELECT COUNT(DISTINCT intent_id) FROM mentions WHERE run_id = '2025-11-05T14-30-00Z' ), 1) as coverage_pct FROM competitor_stats ORDER BY mentions DESC, avg_rank ASC; ``` ## Actionable Insights ### If a Competitor Consistently Ranks Higher 1. **Analyze their positioning**: Read raw responses to understand why 1. **Create targeted content**: Address the specific use cases they dominate 1. **Monitor trends**: Track if gap is widening or narrowing ### If You're Missing in Key Intents 1. **Update your content**: Create pages targeting those queries 1. **Adjust brand aliases**: Add variations that LLMs might use 1. **Test different prompts**: Try alternative phrasings ### If Provider Bias Exists 1. **Optimize for specific LLMs**: If users primarily use ChatGPT, focus there 1. **Diversify content**: Different LLMs have different preferences 1. 
**Track changes**: Monitor if bias shifts over time ## Next Steps - **Content Gap Analysis** ______________________________________________________________________ Find opportunities where competitors appear but you don't [Content Gap Template β†’](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world/content-gap-analysis.config.yaml) - **Historical Trends** ______________________________________________________________________ Track competitive position over time [Trends Analysis β†’](../../data-analytics/trends-analysis/) - **Automate Monitoring** ______________________________________________________________________ Set up daily competitive tracking [Automation Guide β†’](../../user-guide/usage/automation/) - **Operations** ______________________________________________________________________ Generate competitive insights automatically [Operations Examples β†’](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/05-operations) ## Additional Resources - **[Real-World Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world)** - Complete use case templates - **[Code Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - Python analysis scripts - **[Database Queries](../../data-analytics/query-examples/)** - More SQL query examples - **[Trends Analysis](../../data-analytics/trends-analysis/)** - Historical tracking guide # Budget-Constrained Monitoring Minimize costs while maintaining monitoring quality. ## Quick Start The best example for cost-optimized monitoring: **File**: [`examples/06-advanced/budget-controls.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced/budget-controls.config.yaml) ```bash # Set API key export OPENAI_API_KEY="sk-..." 
# Run with budget controls llm-answer-watcher run --config examples/06-advanced/budget-controls.config.yaml ``` **Features:** - Strict budget limits (abort if exceeded) - Warning thresholds - Cost-effective model selection - Optimized configuration ## Budget Control Options ### Example Configuration ```yaml run_settings: output_dir: "./output" models: # Use cheapest effective model - provider: "openai" model_name: "gpt-4o-mini" env_api_key: "OPENAI_API_KEY" # Regex extraction (no extra LLM calls) use_llm_rank_extraction: false # Set budget limits budget: enabled: true max_per_run_usd: 0.10 # Abort if total exceeds 10 cents warn_threshold_usd: 0.05 # Warn at 5 cents max_per_intent_usd: 0.02 # Abort if single intent exceeds 2 cents brands: mine: ["YourBrand"] # Focus on top 3 competitors only competitors: ["TopCompetitor1", "TopCompetitor2", "TopCompetitor3"] intents: # Single most valuable intent - id: "main-query" prompt: "What are the best tools in my category?" ``` ## Cost Optimization Strategies ### 1. Use Cheapest Models | Provider | Model | Cost per 1M input tokens | Recommended for | | -------------- | -------------------- | ------------------------ | --------------------- | | **Google** | gemini-2.0-flash-exp | Free tier available | Testing, development | | **OpenAI** | gpt-4o-mini | $0.15 | Production monitoring | | **Perplexity** | sonar | ~$0.20 | With web search | | **Anthropic** | claude-3-5-haiku | $0.80 | High quality needed | ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" # Best value ``` ### 2. Minimize Intent Count Focus on highest-value buyer-intent queries: ```yaml intents: # Single most important query - id: "primary-buyer-intent" prompt: "What are the best [your category] tools?" # Optional: Add 1-2 more if budget allows # - id: "secondary-query" # prompt: "..." ``` ### 3. 
Use Regex Extraction (Not LLM) Disable LLM-based rank extraction to save costs: ```yaml run_settings: use_llm_rank_extraction: false # Use regex only (~85% accuracy) ``` This eliminates extra LLM calls for rank extraction. ### 4. Reduce Providers Start with 1-2 providers instead of all 6: ```yaml models: # Single provider for budget monitoring - provider: "openai" model_name: "gpt-4o-mini" ``` ### 5. Enable Budget Controls Set strict limits to prevent cost overruns: ```yaml budget: enabled: true max_per_run_usd: 0.05 # Hard limit warn_threshold_usd: 0.02 # Early warning ``` ## Cost Estimates ### Ultra-Minimal Config **Config**: `examples/01-quickstart/minimal.config.yaml` - 1 intent Γ— 1 model (gpt-4o-mini) - Cost: ~$0.001 per run - Monthly (daily): ~$0.03/month ### Budget-Constrained Config **Config**: `examples/06-advanced/budget-controls.config.yaml` - 3 intents Γ— 1 model (gpt-4o-mini) - Cost: ~$0.006 per run - Monthly (daily): ~$0.18/month ### Moderate Budget Config - 3 intents Γ— 2 models (gpt-4o-mini + claude-haiku) - Cost: ~$0.012 per run - Monthly (daily): ~$0.36/month ## Monitoring Actual Costs ### Track Costs in Database ```sql -- Total cost last 30 days SELECT SUM(total_cost_usd) as total_cost FROM runs WHERE timestamp_utc >= datetime('now', '-30 days'); -- Cost by provider SELECT model_provider, SUM(estimated_cost_usd) as provider_cost, COUNT(*) as queries, AVG(estimated_cost_usd) as avg_per_query FROM answers_raw WHERE timestamp_utc >= datetime('now', '-30 days') GROUP BY model_provider; -- Cost trend over time SELECT DATE(timestamp_utc) as date, SUM(total_cost_usd) as daily_cost FROM runs WHERE timestamp_utc >= datetime('now', '-30 days') GROUP BY DATE(timestamp_utc) ORDER BY date DESC; ``` ### Set Budget Alerts If costs exceed thresholds, the tool will: - **Warn** at `warn_threshold_usd` - **Abort** at `max_per_run_usd` Example output: ```text ⚠️ Warning: Cost approaching budget limit Current: $0.048 Limit: $0.05 Queries remaining: ~2 ``` ## 
Trade-offs: Cost vs. Features | Feature | Cost Impact | Alternative | | ----------------------- | ------------- | --------------------------- | | **LLM rank extraction** | +$0.001/query | Use regex (85% accuracy) | | **Web search** | +$0.01/query | Skip for non-time-sensitive | | **Operations** | +$0.005/query | Run separately when needed | | **Multiple providers** | Γ—N providers | Use 1-2 providers | | **Function calling** | +$0.001/query | Use regex extraction | ## When to Increase Budget Consider increasing your budget if: 1. **Low visibility**: Your brand rarely appears 1. Solution: Add more intents, try different phrasing 1. **Missing competitors**: Important competitors not tracked 1. Solution: Add more competitor brands 1. **Limited provider coverage**: Only testing 1 provider 1. Solution: Add 1-2 more providers for comparison 1. **Need real-time data**: Using stale LLM knowledge 1. Solution: Enable web search (see `examples/03-web-search/`) ## Free Tier Options ### Google Gemini Free Tier Google offers free tier for Gemini models: ```yaml models: - provider: "google" model_name: "gemini-2.0-flash-exp" env_api_key: "GEMINI_API_KEY" ``` **Free tier limits:** - 15 requests/minute - 1,500 requests/day - 1 million requests/month Perfect for testing and low-volume monitoring. 
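
The limits above translate directly into a run schedule. The following is a rough sketch of that arithmetic, not part of LLM Answer Watcher: the helper names are hypothetical, the limits are the figures quoted above (which may change), and it assumes one API request per intent and model pair, so web search or LLM-based extraction would add to the count.

```python
# Free-tier figures quoted above; treat them as assumptions and confirm
# them against Google's current pricing page.
REQUESTS_PER_MINUTE = 15
REQUESTS_PER_DAY = 1_500


def requests_per_run(intents: int, models: int) -> int:
    """Assumes each intent is sent once to each configured model."""
    return intents * models


def max_runs_per_day(intents: int, models: int) -> int:
    """Number of complete runs that fit under the daily request cap."""
    return REQUESTS_PER_DAY // requests_per_run(intents, models)


def min_seconds_between_requests() -> float:
    """Spacing that keeps a steady stream under the per-minute cap."""
    return 60 / REQUESTS_PER_MINUTE


if __name__ == "__main__":
    # 3 intents against a single Gemini model
    print(max_runs_per_day(3, 1))          # 500
    print(min_seconds_between_requests())  # 4.0
```

Under these assumptions a daily three-intent run uses a small fraction of the free allowance, so the per-minute cap only matters if many intents are queried in quick succession.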
See [`examples/02-providers/google-gemini.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers/google-gemini.config.yaml)

## Next Steps

- **Cost Management**: Learn more about budget controls and cost optimization. [Cost Management Guide →](../../user-guide/features/cost-management/)
- **Start Minimal**: Try the absolute minimum config first. [Quickstart Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart)
- **Scale Up**: When ready, add more providers and features. [Multi-Provider Example →](../multi-provider/)
- **Track Costs**: Query cost history in SQLite. [Database Guide →](../../data-analytics/sqlite-database/)

## Additional Resources

- **[Budget Controls Example](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced/budget-controls.config.yaml)** - Complete budget config
- **[Cost Management](../../user-guide/features/cost-management/)** - Full cost management documentation
- **[Provider Pricing](../../providers/overview/)** - Compare provider costs

# CI/CD Integration

Integrate brand monitoring into your continuous integration pipeline.
## Quick Start

See the automation examples:

- **[automated_monitoring.py](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py)** - Complete Python script for scheduled monitoring
- **[Code Examples README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - All automation examples

## GitHub Actions Example

`.github/workflows/monitoring.yml`:

```yaml
name: Brand Monitoring

on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 9 AM UTC
  workflow_dispatch:      # Allow manual triggers

jobs:
  monitor:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          echo "$HOME/.cargo/bin" >> $GITHUB_PATH

      - name: Install dependencies
        run: |
          uv sync

      - name: Run monitoring
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          uv run llm-answer-watcher run \
            --config examples/07-real-world/saas-brand-monitoring.config.yaml \
            --yes \
            --format json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: monitoring-results-${{ github.run_id }}
          path: |
            output/
            !output/*.db

      - name: Check for visibility drops
        run: |
          # Custom script to analyze results and alert on issues
          python examples/code-examples/analyze_results.py
```

## Exit Code Handling

The CLI returns specific exit codes that can be used in CI/CD:

| Exit Code | Meaning             | Action                             |
| --------- | ------------------- | ---------------------------------- |
| `0`       | Success             | All queries completed successfully |
| `1`       | Configuration error | Fix config file or API keys        |
| `2`       | Database error      | Check database path/permissions    |
| `3`       | Partial failure     | Some queries failed, investigate   |
| `4`       | Complete failure    | All queries failed, critical issue |

### Example with Exit Code Handling

```yaml
- name: Run monitoring
  id: monitor
  run: |
    # GitHub Actions runs bash with -e, so a failing command would end the
    # script before the exit code is recorded. Disable that for this step.
    set +e
    uv run llm-answer-watcher run --config config.yaml --yes --format json
    echo "exit_code=$?" >> "$GITHUB_OUTPUT"
  continue-on-error: true

- name: Check result
  run: |
    if [ "${{ steps.monitor.outputs.exit_code }}" == "0" ]; then
      echo "✅ Monitoring completed successfully"
    elif [ "${{ steps.monitor.outputs.exit_code }}" == "3" ]; then
      echo "⚠️ Partial failure - some queries failed"
      exit 0  # Don't fail the workflow
    else
      echo "❌ Monitoring failed with exit code ${{ steps.monitor.outputs.exit_code }}"
      exit 1
    fi
```

## Python Automation Script

Complete automation example with notifications:

```python
#!/usr/bin/env python3
"""
Automated brand monitoring with Slack notifications.
See: examples/code-examples/automated_monitoring.py for full implementation
"""

import json
import sqlite3
import subprocess


def run_monitoring():
    """Run LLM Answer Watcher and return (exit_code, parsed JSON output)."""
    result = subprocess.run([
        "llm-answer-watcher", "run",
        "--config", "examples/07-real-world/saas-brand-monitoring.config.yaml",
        "--yes",
        "--format", "json"
    ], capture_output=True, text=True)

    try:
        payload = json.loads(result.stdout)
    except json.JSONDecodeError:
        # A failed run may not print valid JSON
        payload = {}
    return result.returncode, payload


def check_visibility_drop(db_path, threshold=0.5):
    """Check if brand visibility has dropped."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get recent visibility rate
    cursor.execute("""
        SELECT
            COUNT(DISTINCT CASE WHEN is_mine = 1 THEN intent_id END) * 1.0 /
            COUNT(DISTINCT intent_id) as visibility_rate
        FROM mentions
        WHERE run_id IN (
            SELECT run_id FROM runs
            ORDER BY timestamp_utc DESC
            LIMIT 1
        )
    """)

    # The rate is NULL when the latest run has no mentions at all
    current_rate = cursor.fetchone()[0] or 0
    conn.close()

    return current_rate < threshold


def send_slack_alert(message):
    """Send alert to Slack (implement based on your setup)."""
    # See examples/code-examples/ for Slack integration
    pass


def main():
    exit_code, results = run_monitoring()

    if exit_code == 0:
        print(f"✅ Monitoring completed: {results['run_id']}")

        # Check for visibility drops
        if check_visibility_drop(results['sqlite_db_path']):
            send_slack_alert("⚠️ Brand visibility has dropped below 50%")
    else:
        print(f"❌ Monitoring failed with exit code {exit_code}")
        send_slack_alert(f"Monitoring failed: {exit_code}")


if __name__ == "__main__":
    main()
```

**Full implementation**: [`examples/code-examples/automated_monitoring.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py)

## Cron Job Setup

### Daily Monitoring

```bash
# Edit crontab
crontab -e

# Add this line (runs daily at 9 AM)
0 9 * * * cd /path/to/llm-answer-watcher && .venv/bin/python examples/code-examples/automated_monitoring.py >> logs/cron.log 2>&1
```

### Weekly Report

```bash
# Weekly comprehensive analysis (Mondays at 9 AM)
0 9 * * 1 cd /path/to/llm-answer-watcher && .venv/bin/llm-answer-watcher run --config examples/02-providers/multi-provider-comparison.config.yaml --yes --quiet
```

## Docker Integration

### Dockerfile

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install uv

# Copy project
COPY . .

# Install dependencies
RUN uv sync

# Set entrypoint
ENTRYPOINT ["uv", "run", "llm-answer-watcher"]
CMD ["--help"]
```

### Docker Compose

```yaml
version: '3.8'

services:
  monitoring:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./output:/app/output
    command: >
      run
      --config examples/07-real-world/saas-brand-monitoring.config.yaml
      --yes
      --format json
```

Run with:

```bash
docker-compose run monitoring
```

## Monitoring Multiple Brands

### Matrix Strategy in GitHub Actions

```yaml
jobs:
  monitor:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        brand:
          - brand-a
          - brand-b
          - brand-c

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      # Python, uv, and dependency setup steps are omitted here;
      # reuse them from the full workflow above.

      - name: Run monitoring for ${{ matrix.brand }}
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          llm-answer-watcher run \
            --config configs/${{ matrix.brand }}.config.yaml \
            --yes
```

## Data Export and Analysis

### Export Results to CSV

See [`examples/code-examples/export_to_csv.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/export_to_csv.py):

```python
#!/usr/bin/env python3
"""Export monitoring results to CSV for analysis."""

import csv
import sqlite3


def export_mentions_to_csv(db_path, output_path):
    """Export mentions table to CSV."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    cursor.execute("""
        SELECT
            run_id,
            intent_id,
            model_provider,
            brand,
            rank_position,
            is_mine,
            timestamp_utc
        FROM mentions
        ORDER BY timestamp_utc DESC
    """)

    with open(output_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['run_id', 'intent_id', 'provider', 'brand', 'rank', 'is_mine', 'timestamp'])
        writer.writerows(cursor.fetchall())

    conn.close()


if __name__ == "__main__":
    export_mentions_to_csv("output/watcher.db", "output/mentions.csv")
```

### Analyze Results

See [`examples/code-examples/analyze_results.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/analyze_results.py):

```python
#!/usr/bin/env python3
"""Analyze monitoring results and generate insights."""

import sqlite3


def analyze_latest_run(db_path):
    """Analyze the most recent monitoring run."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get latest run
    cursor.execute("SELECT run_id FROM runs ORDER BY timestamp_utc DESC LIMIT 1")
    row = cursor.fetchone()
    if row is None:
        conn.close()
        print("No runs found")
        return
    run_id = row[0]

    # Calculate metrics
    cursor.execute("""
        SELECT
            COUNT(DISTINCT CASE WHEN is_mine = 1 THEN intent_id END) as my_coverage,
            COUNT(DISTINCT intent_id) as total_intents,
            AVG(CASE WHEN is_mine = 1 THEN rank_position END) as my_avg_rank
        FROM mentions
        WHERE run_id = ?
    """, (run_id,))

    my_coverage, total_intents, my_avg_rank = cursor.fetchone()
    conn.close()

    # Guard against runs with no mentions or no ranked mentions
    coverage_pct = my_coverage / total_intents * 100 if total_intents else 0.0
    rank_text = f"{my_avg_rank:.2f}" if my_avg_rank is not None else "n/a"

    print(f"📊 Analysis for {run_id}")
    print(f"   Coverage: {my_coverage}/{total_intents} intents ({coverage_pct:.1f}%)")
    print(f"   Average rank: {rank_text}")


if __name__ == "__main__":
    analyze_latest_run("output/watcher.db")
```

## Alerting and Notifications

### Slack Webhook Integration

```python
import os

import requests


def send_slack_notification(webhook_url, message):
    """Send notification to Slack."""
    payload = {
        "text": message,
        "blocks": [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": message}
            }
        ]
    }
    requests.post(webhook_url, json=payload, timeout=10)


# Usage (brand_visibility_dropped comes from your own analysis step)
if brand_visibility_dropped:
    send_slack_notification(
        os.getenv("SLACK_WEBHOOK_URL"),
        "⚠️ Brand visibility dropped to 30% (threshold: 50%)"
    )
```

### Email Alerts

```python
import os
import smtplib
from email.message import EmailMessage


def send_email_alert(subject, body):
    """Send email alert."""
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = 'monitoring@yourdomain.com'
    msg['To'] = 'team@yourdomain.com'
    msg.set_content(body)

    with smtplib.SMTP('smtp.gmail.com', 587) as smtp:
        smtp.starttls()
        smtp.login(os.getenv('EMAIL_USER'), os.getenv('EMAIL_PASS'))
        smtp.send_message(msg)
```

## Best Practices

### 1. Store Secrets Securely

Use environment variables or secret managers:

```yaml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  # Never hardcode API keys!
```

### 2. Rate Limiting

Avoid hitting API rate limits:

```yaml
run_settings:
  max_concurrent_requests: 2
  delay_between_queries: 1
```

### 3. Cost Controls

Enable budget limits in CI/CD:

```yaml
budget:
  enabled: true
  max_per_run_usd: 0.50  # Prevent runaway costs
```

### 4. Artifact Retention

Upload results but manage storage:

```yaml
- name: Upload results
  uses: actions/upload-artifact@v4
  with:
    name: results
    path: output/
    retention-days: 30  # Auto-delete after 30 days
```

## Next Steps

- **Code Examples**: Explore all automation scripts. [Code Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)
- **Automation Guide**: Complete automation documentation. [Automation Guide →](../../user-guide/usage/automation/)
- **Database Queries**: SQL examples for analysis. [Query Examples →](../../data-analytics/query-examples/)
- **Trends Analysis**: Track changes over time. [Trends Guide →](../../data-analytics/trends-analysis/)

## Additional Resources

- **[Code Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - Python automation scripts
- **[Automation Guide](../../user-guide/usage/automation/)** - Complete automation documentation
- **[CLI Reference](../../reference/cli-reference/)** - All CLI options and exit codes
- **[Python API](../../reference/python-api/)** - Programmatic usage guide

# Data & Analytics

# Output Structure

Understanding the file and directory structure of monitoring runs.
## Directory Layout

```text
output/
├── watcher.db                   # SQLite database
└── YYYY-MM-DDTHH-MM-SSZ/        # Run directory
    ├── run_meta.json            # Run summary
    ├── report.html              # HTML report
    ├── intent_*_raw_*.json      # Raw LLM responses
    ├── intent_*_parsed_*.json   # Extracted data
    └── intent_*_error_*.json    # Errors (if any)
```

## File Descriptions

### `run_meta.json`

Summary of the entire run with costs and stats.

### `report.html`

Interactive HTML report with visualizations.

### `intent_*_raw_*.json`

Raw LLM response with metadata.

### `intent_*_parsed_*.json`

Extracted brand mentions and ranks.

### `watcher.db`

SQLite database with all historical data.

See [SQLite Database](../sqlite-database/) for database schema.

# SQLite Database

LLM Answer Watcher stores all monitoring data in a local SQLite database for historical tracking and trend analysis.

## Database Location

Default path: `./output/watcher.db`

Configure in `watcher.config.yaml`:

```yaml
run_settings:
  sqlite_db_path: "./output/watcher.db"
```

## Schema Overview

The database has 4 main tables plus schema versioning:

```text
schema_version  → Track database migrations
runs            → One row per CLI execution
answers_raw     → Full LLM responses with metadata
mentions        → Exploded brand mentions for analysis
operations      → Post-intent operation results (optional)
```

## Schema Details

### Table: runs

One row per `llm-answer-watcher run` execution.
**Columns:**

```sql
CREATE TABLE runs (
    run_id TEXT PRIMARY KEY,             -- YYYY-MM-DDTHH-MM-SSZ
    timestamp_utc TEXT NOT NULL,         -- ISO 8601 with Z suffix
    config_file TEXT,                    -- Path to config file used
    total_cost_usd REAL NOT NULL,        -- Sum of all query costs
    queries_completed INTEGER NOT NULL,  -- Successful queries
    queries_failed INTEGER NOT NULL,     -- Failed queries
    status TEXT NOT NULL,                -- "success", "partial", "failed"
    output_dir TEXT NOT NULL             -- Directory with run artifacts
);
```

**Example Query:**

```sql
-- View recent runs
SELECT
    run_id,
    timestamp_utc,
    status,
    total_cost_usd,
    queries_completed
FROM runs
ORDER BY timestamp_utc DESC
LIMIT 10;
```

### Table: answers_raw

One row per intent × model combination.

**Columns:**

```sql
CREATE TABLE answers_raw (
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,         -- "openai", "anthropic", etc.
    model_name TEXT NOT NULL,             -- "gpt-4o-mini", etc.
    timestamp_utc TEXT NOT NULL,
    answer_text TEXT NOT NULL,            -- Full LLM response
    tokens_used INTEGER,                  -- Total tokens (input + output)
    estimated_cost_usd REAL,              -- Query cost
    extraction_method TEXT,               -- "regex" or "function_calling"
    web_search_count INTEGER DEFAULT 0,   -- Number of web searches
    error_message TEXT,                   -- NULL if successful
    PRIMARY KEY (run_id, intent_id, model_provider, model_name),
    FOREIGN KEY (run_id) REFERENCES runs(run_id)
);
```

**Example Query:**

```sql
-- Cost by provider
SELECT
    model_provider,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost_per_query
FROM answers_raw
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider
ORDER BY total_cost DESC;
```

### Table: mentions

One row per brand mention. Denormalized for fast queries.
**Columns:**

```sql
CREATE TABLE mentions (
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,
    model_name TEXT NOT NULL,
    timestamp_utc TEXT NOT NULL,
    brand TEXT NOT NULL,                -- Original brand name
    normalized_name TEXT NOT NULL,      -- Lowercase, hyphenated
    is_mine INTEGER NOT NULL,           -- 1 = your brand, 0 = competitor
    rank_position INTEGER,              -- 1, 2, 3... or NULL
    detection_method TEXT NOT NULL,     -- "regex" or "function_calling"
    confidence REAL DEFAULT 1.0,        -- 0.0-1.0 confidence score
    PRIMARY KEY (run_id, intent_id, model_provider, model_name, normalized_name),
    FOREIGN KEY (run_id) REFERENCES runs(run_id)
);

CREATE INDEX idx_mentions_timestamp ON mentions(timestamp_utc);
CREATE INDEX idx_mentions_brand ON mentions(brand);
CREATE INDEX idx_mentions_normalized ON mentions(normalized_name);
CREATE INDEX idx_mentions_rank ON mentions(rank_position);
CREATE INDEX idx_mentions_is_mine ON mentions(is_mine);
```

**Example Query:**

```sql
-- Brand mentions over time
SELECT
    DATE(timestamp_utc) as date,
    brand,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE normalized_name = 'warmly'
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc), brand
ORDER BY date DESC;
```

### Table: schema_version

Tracks database migrations.

**Columns:**

```sql
CREATE TABLE schema_version (
    version INTEGER PRIMARY KEY,
    applied_at TEXT NOT NULL
);
```

**Current version:** 3

## Common Queries

### Basic Analytics

**Your brand visibility:**

```sql
-- How often do we appear?
SELECT
    COUNT(DISTINCT run_id) as runs_appeared,
    COUNT(*) as total_mentions,
    AVG(rank_position) as average_rank
FROM mentions
WHERE is_mine = 1
  AND timestamp_utc >= datetime('now', '-30 days');
```

**Competitor comparison:**

```sql
SELECT
    brand,
    COUNT(*) as mentions,
    COUNT(DISTINCT intent_id) as intents_appeared,
    AVG(rank_position) as avg_rank,
    MIN(rank_position) as best_rank,
    COUNT(CASE WHEN rank_position = 1 THEN 1 END) as first_place_count
FROM mentions
WHERE rank_position IS NOT NULL
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand
ORDER BY mentions DESC;
```

### Trend Analysis

**Daily brand mentions:**

```sql
SELECT
    DATE(timestamp_utc) as date,
    COUNT(CASE WHEN is_mine = 1 THEN 1 END) as my_mentions,
    COUNT(CASE WHEN is_mine = 0 THEN 1 END) as competitor_mentions,
    COUNT(*) as total_mentions
FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

**Ranking trends:**

```sql
SELECT
    DATE(timestamp_utc) as date,
    AVG(CASE WHEN is_mine = 1 THEN rank_position END) as my_avg_rank,
    AVG(CASE WHEN is_mine = 0 THEN rank_position END) as competitor_avg_rank
FROM mentions
WHERE rank_position IS NOT NULL
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

### Intent Analysis

**Which intents work best for your brand?**

```sql
SELECT
    intent_id,
    COUNT(*) as total_mentions,
    COUNT(DISTINCT model_provider) as providers,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE is_mine = 1
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY intent_id
ORDER BY total_mentions DESC;
```

**Intents where you're NOT mentioned:**

```sql
-- Get all intent IDs from recent runs
WITH recent_intents AS (
    SELECT DISTINCT intent_id
    FROM answers_raw
    WHERE timestamp_utc >= datetime('now', '-7 days')
),
-- Get intents where you appeared
appeared_intents AS (
    SELECT DISTINCT intent_id
    FROM mentions
    WHERE is_mine = 1
      AND timestamp_utc >= datetime('now', '-7 days')
)
-- Find the difference
SELECT ri.intent_id
FROM recent_intents ri
LEFT JOIN appeared_intents ai ON ri.intent_id = ai.intent_id
WHERE ai.intent_id IS NULL;
```

### Cost Analysis

**Total spending:**

```sql
SELECT
    SUM(total_cost_usd) as total_spent,
    COUNT(*) as total_runs,
    AVG(total_cost_usd) as avg_cost_per_run
FROM runs
WHERE timestamp_utc >= datetime('now', '-30 days');
```

**Cost by provider:**

```sql
SELECT
    model_provider,
    model_name,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost
FROM answers_raw
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider, model_name
ORDER BY total_cost DESC;
```

**Cost per brand mention:**

```sql
SELECT
    r.run_id,
    r.total_cost_usd,
    COUNT(m.brand) as mentions,
    r.total_cost_usd / COUNT(m.brand) as cost_per_mention
FROM runs r
JOIN mentions m ON r.run_id = m.run_id
WHERE r.timestamp_utc >= datetime('now', '-30 days')
  AND m.is_mine = 1
GROUP BY r.run_id, r.total_cost_usd
ORDER BY cost_per_mention ASC;
```

### Provider Comparison

**Which provider mentions you more?**

```sql
SELECT
    model_provider,
    COUNT(CASE WHEN is_mine = 1 THEN 1 END) as my_mentions,
    COUNT(*) as total_mentions,
    CAST(COUNT(CASE WHEN is_mine = 1 THEN 1 END) AS REAL) / COUNT(*) * 100 as my_mention_rate
FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider
ORDER BY my_mention_rate DESC;
```

**Average ranking by provider:**

```sql
SELECT
    model_provider,
    model_name,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE is_mine = 1
  AND rank_position IS NOT NULL
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider, model_name
ORDER BY avg_rank ASC;
```

## Exporting Data

### CSV Export

```bash
# Export mentions to CSV
sqlite3 -header -csv output/watcher.db \
  "SELECT * FROM mentions WHERE timestamp_utc >= datetime('now', '-30 days')" \
  > mentions_30days.csv

# Export runs summary
sqlite3 -header -csv output/watcher.db \
  "SELECT * FROM runs ORDER BY timestamp_utc DESC" \
  > runs_summary.csv
```

### JSON Export

```bash
# Export as JSON (.mode json needs sqlite3 3.33 or newer)
# Adjust the table and columns to what you want to export
sqlite3 output/watcher.db <<SQL
.mode json
SELECT * FROM mentions
WHERE timestamp_utc >= datetime('now', '-7 days');
SQL
```

### Excel/Google Sheets

1. Export to CSV:

    ```bash
    sqlite3 -header -csv output/watcher.db \
      "SELECT * FROM mentions" > mentions.csv
    ```

1. Import CSV into Excel or Google Sheets

## Database Maintenance

### Vacuum Database

Reclaim space after deletions:

```bash
sqlite3 output/watcher.db "VACUUM;"
```

### Delete Old Data

```sql
-- Delete runs older than 90 days
DELETE FROM runs
WHERE timestamp_utc < datetime('now', '-90 days');

-- Vacuum to reclaim space
VACUUM;
```

### Check Database Size

```bash
ls -lh output/watcher.db
# Example: -rw-r--r-- 1 user user 2.5M Nov 5 14:30 watcher.db
```

### Backup Database

```bash
# Simple copy
cp output/watcher.db output/watcher.backup.db

# Or use SQLite backup command
sqlite3 output/watcher.db ".backup output/watcher.backup.db"

# Compress backup
gzip output/watcher.backup.db
```

## Schema Migrations

### Check Schema Version

```sql
SELECT * FROM schema_version ORDER BY version DESC;
```

**Output:**

```text
version | applied_at
--------|---------------------
3       | 2025-11-05T14:30:00Z
2       | 2025-10-25T10:15:00Z
1       | 2025-10-20T09:00:00Z
```

### Migration Process

Migrations run automatically on startup. No manual intervention needed.

**What happens:**

1. Check current schema version
1. Compare to required version
1. Apply migrations sequentially
1. Update schema_version table

### Manual Migration (Advanced)

If needed, manually upgrade:

```python
from llm_answer_watcher.storage.db import init_db_if_needed

init_db_if_needed("./output/watcher.db")
```

## Connecting with BI Tools

### Metabase

1. Add SQLite database
1. Point to `./output/watcher.db`
1. Create dashboards

### Tableau

1. Use SQLite connector
1. Connect to `watcher.db`
1. Create visualizations

### Python/Pandas

```python
import sqlite3

import pandas as pd

# Connect to database
conn = sqlite3.connect("output/watcher.db")

# Load mentions into DataFrame
df = pd.read_sql_query(
    "SELECT * FROM mentions WHERE timestamp_utc >= datetime('now', '-30 days')",
    conn
)

# Analyze
print(df.groupby('brand')['rank_position'].mean())

# Close connection
conn.close()
```

### R

```r
library(DBI)
library(RSQLite)

# Connect
conn <- dbConnect(RSQLite::SQLite(), "output/watcher.db")

# Query
mentions <- dbGetQuery(conn,
  "SELECT * FROM mentions WHERE timestamp_utc >= datetime('now', '-30 days')"
)

# Analyze
aggregate(rank_position ~ brand, data=mentions, FUN=mean)

# Disconnect
dbDisconnect(conn)
```

## Performance Tips

### Indexes

Indexes already exist on:

- `timestamp_utc`
- `brand`
- `normalized_name`
- `rank_position`
- `is_mine`

### Query Optimization

**Use indexed columns in WHERE:**

```sql
-- ✅ Fast - uses index
WHERE timestamp_utc >= datetime('now', '-30 days')

-- ❌ Slow - no index
WHERE DATE(timestamp_utc) = '2025-11-05'
```

**Limit result sets:**

```sql
-- ✅ Good - only get what you need
SELECT brand, rank_position FROM mentions WHERE is_mine = 1 LIMIT 100;

-- ❌ Bad - retrieves all columns
SELECT * FROM mentions;
```

### Analyze Query Plans

```sql
EXPLAIN QUERY PLAN
SELECT brand, COUNT(*)
FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand;
```

## Troubleshooting

### Database Locked

**Problem:** `database is locked`

**Solution:**

```bash
# Check for locks
lsof output/watcher.db

# Kill process if safe
kill -9 <PID>

# Or wait and retry
```

### Corrupted Database

**Problem:** Database errors on queries

**Solution:**

```bash
# Check integrity
sqlite3 output/watcher.db "PRAGMA integrity_check;"

# If corrupted, restore from backup
cp output/watcher.backup.db output/watcher.db
```

### Schema Version Mismatch

**Problem:** "Schema version X is newer than expected Y"

**Solution:** Update LLM Answer Watcher to the latest version:

```bash
pip install --upgrade llm-answer-watcher
```

## Next Steps

- **Query Examples**: More SQL query examples. [Query Examples →](../query-examples/)
- **Trends Analysis**: Track changes over time. [Trends Analysis →](../trends-analysis/)
- **Output Structure**: Understand JSON output files. [Output Structure →](../output-structure/)
- **Database Schema**: Complete schema reference. [Schema Reference →](../../reference/database-schema/)

# SQL Query Examples

Useful SQL queries for analyzing monitoring data.

## Brand Performance

```sql
-- Your brand's mention rate
SELECT
    COUNT(DISTINCT run_id) as total_runs,
    COUNT(*) as total_mentions,
    CAST(COUNT(*) AS FLOAT) / COUNT(DISTINCT run_id) as mentions_per_run
FROM mentions
WHERE normalized_name = 'yourbrand';
```

## Competitor Analysis

```sql
-- Top mentioned competitors
SELECT
    brand,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE normalized_name != 'yourbrand'
GROUP BY brand
ORDER BY mentions DESC
LIMIT 10;
```

## Trends Over Time

```sql
-- Weekly mention trends
SELECT
    strftime('%Y-W%W', timestamp_utc) as week,
    COUNT(*) as mentions
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY week
ORDER BY week DESC;
```

See [SQLite Database](../sqlite-database/) for schema details.

# Trends Analysis

Analyze brand visibility trends over time.
## Time-Series Analysis

```sql
-- Daily mention count
SELECT
    DATE(timestamp_utc) as date,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

## Comparative Trends

```sql
-- Your brand vs top competitor
SELECT
    DATE(m.timestamp_utc) as date,
    m.brand,
    COUNT(*) as mentions
FROM mentions m
WHERE m.normalized_name IN ('yourbrand', 'competitor')
GROUP BY DATE(m.timestamp_utc), m.brand
ORDER BY date DESC, mentions DESC;
```

## Export to CSV

```bash
sqlite3 -header -csv output/watcher.db "SELECT * FROM mentions WHERE normalized_name = 'yourbrand'" > brand_data.csv
```

See [Query Examples](../query-examples/) for more queries.

# Evaluation Framework

# Evaluation Framework

Quality control and accuracy testing for brand extraction.

## Purpose

The evaluation framework validates:

- Mention detection accuracy
- Rank extraction correctness
- False positive/negative rates

## Running Evaluations

```bash
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

## Metrics Tracked

- **Mention Precision**: Correct mentions / total found
- **Mention Recall**: Correct mentions / expected mentions
- **Rank Accuracy**: Correctly ranked brands
- **F1 Score**: Harmonic mean of precision/recall

See [Running Evals](../running-evals/) for detailed usage.

# Running Evaluations

How to run the evaluation suite and interpret results.

## Basic Usage

```bash
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

## Command Options

```bash
llm-answer-watcher eval --fixtures fixtures.yaml --db eval_results.db --format json
```

## Example Output

```text
✅ Evaluation completed
├── Test cases: 15
├── Passed: 14
├── Failed: 1
└── Pass rate: 93.3%

Metrics:
├── Mention Precision: 95.2%
├── Mention Recall: 91.8%
├── Rank Accuracy: 88.5%
└── F1 Score: 93.5%
```

See [Metrics](../metrics/) for metric definitions.
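As a sanity check on the sample output, the F1 score can be recomputed from the reported precision and recall. The sketch below is illustrative only: `f1_score` is a hypothetical helper written for this example, not a function exposed by the tool.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the example output (95.2% and 91.8%)
f1 = f1_score(0.952, 0.918)
print(f"F1 Score: {f1:.1%}")  # F1 Score: 93.5%
```

Because the harmonic mean is pulled toward the smaller value, a large gap between precision and recall lowers F1 more than a simple average would.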
# Evaluation Metrics

Understanding evaluation metrics and thresholds.

## Core Metrics

### Mention Precision

Ratio of correct mentions to total mentions found.

**Threshold**: ≥ 90%

### Mention Recall

Ratio of correct mentions to expected mentions.

**Threshold**: ≥ 80%

### Mention F1

Harmonic mean of precision and recall.

**Threshold**: ≥ 85%

### Rank Accuracy

Percentage of correctly ranked brands.

**Threshold**: ≥ 85%

## Interpreting Results

- **High precision, low recall**: Missing mentions
- **Low precision, high recall**: False positives
- **Low both**: Extraction needs improvement

See [Test Cases](../test-cases/) for creating fixtures.

# Test Cases

Creating evaluation test fixtures.

## Fixture Format

```yaml
test_cases:
  - description: "Brand detection test"
    intent_id: "test-intent"
    llm_answer_text: |
      The best tools are:
      1. YourBrand
      2. CompetitorA
    brands_mine: ["YourBrand"]
    brands_competitors: ["CompetitorA"]
    expected_my_mentions: ["YourBrand"]
    expected_competitor_mentions: ["CompetitorA"]
    expected_ranked_list:
      - "YourBrand"
      - "CompetitorA"
```

## Running Custom Fixtures

```bash
llm-answer-watcher eval --fixtures my_tests.yaml
```

See [CI Integration](../ci-integration/) for automated testing.

# CI Integration

Run evaluations in continuous integration.

## GitHub Actions

```yaml
- name: Run Evaluation Suite
  run: |
    # Check the exit code in the same step: each step runs in its own shell,
    # so $? in a later step would not reflect this command.
    if ! uv run llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml --format json; then
      echo "Evaluations failed"
      exit 1
    fi
```

## Exit Codes

- `0`: All tests passed
- `1`: Tests failed (below thresholds)
- `2`: Configuration error

See [Running Evals](../running-evals/) for usage details.

# Advanced Topics

# Architecture

LLM Answer Watcher follows Domain-Driven Design principles with strict separation of concerns.
## Core Domains

```text
llm_answer_watcher/
├── config/       # Configuration loading and validation
├── llm_runner/   # LLM client abstraction
├── extractor/    # Brand mention detection
├── storage/      # SQLite and JSON persistence
├── report/       # HTML report generation
├── utils/        # Shared utilities
└── cli.py        # CLI interface
```

## Design Patterns

### 1. Provider Abstraction

```python
from typing import Protocol

class LLMClient(Protocol):
    def generate_answer(self, prompt: str) -> LLMResponse: ...

def build_client(provider: str, model: str) -> LLMClient: ...
```

### 2. API-First Contract

```python
def run_all(config: RuntimeConfig) -> dict:
    # Internal "POST /run" contract
    # OSS CLI calls in-process
    # Cloud will expose over HTTP
    return {"run_id": "...", "total_cost_usd": 0.01}
```

### 3. Dual-Mode CLI

```python
from enum import Enum

class OutputMode(Enum):
    HUMAN = "human"  # Rich formatting
    JSON = "json"    # Structured output
    QUIET = "quiet"  # Minimal output
```

See [API Contract](../api-contract/) for internal API details.

# API Contract

Internal API designed for future HTTP exposure.

## Core Contract

```python
def run_all(config: RuntimeConfig) -> dict:
    """
    Execute monitoring run.

    Args:
        config: Validated runtime configuration

    Returns:
        {
            "run_id": "YYYY-MM-DDTHH-MM-SSZ",
            "status": "success" | "partial" | "failed",
            "queries_completed": int,
            "queries_failed": int,
            "total_cost_usd": float,
            "output_dir": str,
            "brands_detected": {...}
        }
    """
```

## Future HTTP API

The internal contract is designed to become an HTTP API:

```http
POST /api/v1/run
Content-Type: application/json

{
  "config": {...},
  "return_format": "json"
}
```

## Provider Interface

```python
from dataclasses import dataclass

@dataclass
class LLMResponse:
    answer_text: str
    tokens_used: int
    cost_usd: float
    provider: str
    model_name: str
    timestamp_utc: str
```

See [Architecture](../architecture/) for overall design.

# Extending Providers

Add support for new LLM providers.
## Provider Interface

```python
from llm_answer_watcher.llm_runner.models import LLMClient, LLMResponse

class MyCustomClient:
    def __init__(self, model_name: str, api_key: str, system_prompt: str):
        self.model = model_name
        self.api_key = api_key
        self.system_prompt = system_prompt

    def generate_answer(self, prompt: str) -> LLMResponse:
        # Call your LLM API
        response = call_my_llm_api(prompt)

        return LLMResponse(
            answer_text=response.text,
            tokens_used=response.tokens,
            cost_usd=calculate_cost(response),
            provider="my_provider",
            model_name=self.model,
            timestamp_utc=utc_timestamp()
        )
```

## Registering Provider

```python
# llm_runner/models.py
def build_client(provider: str, model_name: str, ...) -> LLMClient:
    if provider == "my_provider":
        return MyCustomClient(...)
    # ...
```

## Testing Your Provider

```python
def test_my_provider(httpx_mock):
    httpx_mock.add_response(...)
    client = MyCustomClient(...)
    response = client.generate_answer("test")
    assert response.provider == "my_provider"
```

See [Architecture](../architecture/) for design patterns.

# Custom System Prompts

Customize system prompts for LLM queries.

## Built-in Prompts

Located in `llm_answer_watcher/system_prompts/`:

```text
system_prompts/
├── openai/
│   ├── gpt-4-default.json
│   └── extraction-default.json
├── anthropic/
│   └── default.json
└── mistral/
    └── default.json
```

## Using Custom Prompts

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/custom-prompt"
```

## Creating Custom Prompts

1. Create JSON file in `system_prompts/provider/`:

   ```json
   {
     "role": "system",
     "content": "You are a helpful assistant focused on..."
   }
   ```

1. Reference in config:

   ```yaml
   system_prompt: "openai/custom-prompt"
   ```

## Prompt Guidelines

- Keep prompts neutral (avoid biasing toward your brand)
- Be concise yet comprehensive
- Test with evaluation framework

See [API Contract](../api-contract/) for technical details.
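To illustrate how a reference such as `openai/custom-prompt` could map onto the JSON file format shown above, here is a minimal sketch. It is not the tool's actual loader: `load_system_prompt`, its `base_dir` argument, and the rule of appending `.json` to the reference are all assumptions made for this example.

```python
import json
from pathlib import Path


def load_system_prompt(prompt_ref: str, base_dir: Path) -> str:
    """Resolve 'provider/name' to base_dir/provider/name.json and return its content.

    Hypothetical helper: the real loader in llm_answer_watcher may differ.
    """
    path = base_dir / f"{prompt_ref}.json"
    data = json.loads(path.read_text(encoding="utf-8"))

    # Validate the two fields used in the documented prompt file format
    if data.get("role") != "system":
        raise ValueError(f"{path}: expected role 'system'")
    content = data.get("content")
    if not isinstance(content, str) or not content.strip():
        raise ValueError(f"{path}: 'content' must be a non-empty string")
    return content
```

Validating the file when it is loaded surfaces a malformed prompt before any paid API call is made.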
# Security

Security best practices for LLM Answer Watcher.

## API Key Management

### ✅ Do This

```bash
# Use environment variables
export OPENAI_API_KEY=sk-your-key

# Use secrets management
OPENAI_API_KEY=$(aws secretsmanager get-secret-value ...)

# Use .env files (add to .gitignore)
echo "OPENAI_API_KEY=sk-..." > .env
echo ".env" >> .gitignore
```

### ❌ Don't Do This

```yaml
# NEVER hardcode API keys in config files
models:
  - provider: "openai"
    api_key: "sk-hardcoded-key" # DON'T DO THIS!
```

## SQL Injection Prevention

The tool uses parameterized queries:

```python
# ✅ Safe - parameterized
cursor.execute("SELECT * FROM runs WHERE run_id=?", (run_id,))

# ❌ Never done - string interpolation
cursor.execute(f"SELECT * FROM runs WHERE run_id='{run_id}'")
```

## XSS Prevention

Jinja2 autoescaping is enabled:

```python
# ✅ Safe - autoescaping on
env = Environment(loader=..., autoescape=True)
```

## Best Practices

1. **Never commit secrets**
1. **Rotate API keys** regularly
1. **Use read-only file permissions** for configs
1. **Review logs** before sharing
1. **Keep dependencies updated**

## Reporting Security Issues

Email: [security contact] (replace with actual contact)

See [Contributing](../../contributing/development-setup/).

# Performance

Optimizing LLM Answer Watcher for speed and efficiency.

## Query Performance

### Parallel Queries (Future)

Currently synchronous.
Async support planned: ```python # Future: parallel execution await asyncio.gather(*[ query_model(intent, model) for intent in intents for model in models ]) ``` ### Current: Sequential ```python # Current: one at a time for intent in intents: for model in models: query_model(intent, model) ``` ## Cost Optimization ### Use Cheaper Models ```yaml models: - provider: "openai" model_name: "gpt-4o-mini" # $0.15/1M vs $2.50/1M ``` ### Regex vs LLM Extraction ```yaml # Fast and cheap (recommended) use_llm_rank_extraction: false # Accurate but costly use_llm_rank_extraction: true ``` ## Database Performance ### Indexes SQLite indexes on: - `timestamp_utc` - `intent_id` - `brand` - `rank_position` ### Vacuum Periodically compact database: ```bash sqlite3 output/watcher.db "VACUUM;" ``` ## Caching ### Pricing Cache LLM prices cached for 24 hours to reduce API calls. ### Future Caching Planned: - Response caching (identical queries) - Extracted data caching See [Architecture](../architecture/) for design details. # Reference # CLI Reference Complete command-line interface reference. ## Commands ### `run` Execute monitoring run. ```bash llm-answer-watcher run --config PATH [OPTIONS] ``` **Options**: - `--config PATH` (required): Configuration file - `--format [human|json|quiet]`: Output format - `--yes, -y`: Skip prompts - `--force`: Override budget limits - `--verbose, -v`: Verbose logging ### `validate` Validate configuration. ```bash llm-answer-watcher validate --config PATH [OPTIONS] ``` ### `eval` Run evaluation suite. ```bash llm-answer-watcher eval --fixtures PATH [OPTIONS] ``` ### `prices show` Display LLM pricing. ```bash llm-answer-watcher prices show [OPTIONS] ``` See [CLI Commands](../../user-guide/usage/cli-commands/) for detailed usage. # Configuration Schema Complete YAML configuration schema reference. 
## Root Structure

```yaml
run_settings: # Required
brands: # Required
intents: # Required
```

## `run_settings`

```yaml
run_settings:
  output_dir: string # Required
  sqlite_db_path: string # Required
  models: [ModelConfig] # Required
  use_llm_rank_extraction: bool # Optional, default: false
  extraction_settings: ExtractionSettings # Optional
  budget: BudgetConfig # Optional
  web_search: WebSearchConfig # Optional
```

## `ModelConfig`

```yaml
provider: string # Required: openai, anthropic, etc.
model_name: string # Required
env_api_key: string # Required
system_prompt: string # Optional
```

## `brands`

```yaml
brands:
  mine: [string] # Required
  competitors: [string] # Required
```

## `intents`

```yaml
intents:
  - id: string # Required
    prompt: string # Required
    operations: [Operation] # Optional
```

See [Configuration Overview](../../user-guide/configuration/overview/).

# Database Schema

SQLite database schema reference.

## Tables

### `schema_version`

```sql
CREATE TABLE schema_version (
    version INTEGER PRIMARY KEY
);
```

### `runs`

```sql
CREATE TABLE runs (
    run_id TEXT PRIMARY KEY,
    timestamp_utc TEXT NOT NULL,
    config_path TEXT,
    total_cost_usd REAL,
    queries_completed INTEGER,
    queries_failed INTEGER
);
```

### `answers_raw`

```sql
CREATE TABLE answers_raw (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,
    model_name TEXT NOT NULL,
    answer_text TEXT NOT NULL,
    tokens_used INTEGER,
    estimated_cost_usd REAL,
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id, model_provider, model_name)
);
```

### `mentions`

```sql
CREATE TABLE mentions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,
    model_name TEXT NOT NULL,
    brand TEXT NOT NULL,
    normalized_name TEXT NOT NULL,
    is_mine BOOLEAN NOT NULL,
    rank_position INTEGER,
    context_snippet TEXT,
    sentiment TEXT,        -- NEW: positive/neutral/negative
    mention_context TEXT,  -- NEW: primary_recommendation, alternative_listing, etc.
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id, model_provider, model_name, normalized_name)
);
```

**New Columns (v0.1.0+)**:

- `sentiment`: Emotional tone - `positive`, `neutral`, `negative`, or `NULL`
- `mention_context`: How brand was mentioned - `primary_recommendation`, `alternative_listing`, `competitor_negative`, `competitor_neutral`, `passing_reference`, or `NULL`

### `intent_classifications`

```sql
CREATE TABLE intent_classifications (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    query_text TEXT NOT NULL,
    query_hash TEXT NOT NULL,               -- SHA256 hash for caching
    intent_type TEXT NOT NULL,              -- transactional, informational, navigational, commercial_investigation
    buyer_stage TEXT NOT NULL,              -- awareness, consideration, decision
    urgency_signal TEXT NOT NULL,           -- high, medium, low
    classification_confidence REAL NOT NULL, -- 0.0-1.0
    reasoning TEXT,                         -- Explanation of classification
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id)
);
```

**Purpose**: Stores query intent classifications for prioritizing high-value mentions.

**Query Hash**: Normalized SHA256 hash enables caching - same query text always produces same hash, avoiding redundant LLM calls.
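The caching idea behind `query_hash` can be sketched as follows. The source states only that the hash is a normalized SHA256; the normalization used here (lowercasing and collapsing whitespace) and the name `query_hash` for the function are assumptions for illustration, and the tool's real rule may differ.

```python
import hashlib


def query_hash(query_text: str) -> str:
    """Return the SHA256 hex digest of a normalized query string."""
    # Assumed normalization: lowercase, trim, and collapse runs of whitespace
    normalized = " ".join(query_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


a = query_hash("What are the best CRM tools?")
b = query_hash("  what are the   BEST crm tools? ")
print(a == b)  # True: both inputs normalize to the same text
print(len(a))  # 64 hex characters
```

A classifier could look up this digest in `intent_classifications.query_hash` and reuse a stored classification instead of calling the LLM again.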
## Indexes ```sql -- Original indexes CREATE INDEX idx_mentions_timestamp ON mentions(timestamp_utc); CREATE INDEX idx_mentions_brand ON mentions(normalized_name); CREATE INDEX idx_mentions_intent ON mentions(intent_id); -- Sentiment/Intent indexes (NEW in v0.1.0+) CREATE INDEX idx_mentions_sentiment ON mentions(sentiment); CREATE INDEX idx_mentions_context ON mentions(mention_context); CREATE INDEX idx_intent_type ON intent_classifications(intent_type); CREATE INDEX idx_buyer_stage ON intent_classifications(buyer_stage); CREATE INDEX idx_urgency_signal ON intent_classifications(urgency_signal); ``` ## Schema Versioning The database schema uses versioning for migrations: ```sql SELECT version FROM schema_version; -- Returns: 1 (current version) ``` Future schema changes will increment this version and provide migration scripts. ## Example Queries ### Sentiment Analysis ```sql -- Brand mentions by sentiment SELECT sentiment, COUNT(*) as count FROM mentions WHERE normalized_name = 'yourbrand' GROUP BY sentiment; ``` ### High-Value Intent Filtering ```sql -- High-intent brand mentions SELECT m.brand, ic.intent_type, ic.buyer_stage, ic.urgency_signal FROM mentions m JOIN intent_classifications ic ON m.intent_id = ic.intent_id AND m.run_id = ic.run_id WHERE ic.intent_type = 'transactional' AND ic.buyer_stage = 'decision' AND ic.urgency_signal = 'high' AND m.sentiment = 'positive'; ``` See [SQLite Database](../../data-analytics/sqlite-database/) for more queries. # Python API Using LLM Answer Watcher as a Python library. 
## Programmatic Usage ```python from llm_answer_watcher.config.loader import load_config_from_file from llm_answer_watcher.llm_runner.runner import run_all # Load configuration config = load_config_from_file("config.yaml") # Run monitoring result = run_all(config) print(f"Run ID: {result['run_id']}") print(f"Cost: ${result['total_cost_usd']:.4f}") print(f"Brands: {result['brands_detected']}") ``` ## Core Modules ### Config Loading ```python from llm_answer_watcher.config.loader import load_config_from_file from llm_answer_watcher.config.schema import RuntimeConfig config: RuntimeConfig = load_config_from_file("config.yaml") ``` ### LLM Clients ```python from llm_answer_watcher.llm_runner.models import build_client client = build_client( provider="openai", model_name="gpt-4o-mini", api_key=api_key, system_prompt=prompt ) response = client.generate_answer("What are the best tools?") ``` ### Extraction ```python from llm_answer_watcher.extractor.mention_detector import detect_mentions mentions = detect_mentions( text=llm_response, brands_mine=["YourBrand"], brands_competitors=["CompetitorA"] ) ``` See [Architecture](../../advanced/architecture/) for design details. # Contributing # Development Setup Set up your development environment for contributing. ## Prerequisites - Python 3.12 or 3.13 - Git - uv or pip ## Clone and Install ```bash # Clone repository git clone https://github.com/nibzard/llm-answer-watcher.git cd llm-answer-watcher # Install with uv (recommended) uv sync --dev # Or with pip pip install -e ".[dev]" ``` ## Development Tools ### Testing ```bash # Run all tests pytest # Run with coverage pytest --cov=llm_answer_watcher --cov-report=html # Run specific test pytest tests/test_config_loader.py::test_load_valid_config ``` ### Linting ```bash # Check code quality ruff check . # Auto-fix issues ruff check . --fix # Format code ruff format . ``` ### Documentation ```bash # Build docs mkdocs build # Serve docs locally mkdocs serve ``` ## Making Changes 1. 
Create a branch: `git checkout -b feature/my-feature` 1. Make changes 1. Run tests: `pytest` 1. Run linting: `ruff check .` 1. Commit: `git commit -m "feat: add feature"` 1. Push: `git push origin feature/my-feature` 1. Create Pull Request See [Code Standards](../code-standards/) for coding guidelines. # Code Standards Coding standards and best practices. ## Python Style ### Modern Type Hints (Python 3.12+) ```python # βœ… Good - use | for unions def process(data: dict | None = None) -> str | None: pass # ❌ Bad - old style from typing import Union, Optional def process(data: Optional[dict] = None) -> Union[str, None]: pass ``` ### Docstrings ```python def detect_mentions(text: str, brands: list[str]) -> list[Mention]: """ Detect brand mentions in text. Args: text: Text to search brands: List of brand names Returns: List of detected mentions """ ``` ### Word Boundaries ```python # βœ… Good - word boundary matching pattern = r'\b' + re.escape(brand) + r'\b' # ❌ Bad - substring matching if brand.lower() in text.lower(): ... ``` ## Testing ### Coverage Requirements - Core modules: 80%+ coverage - Critical paths: 100% coverage ### Test Structure ```python def test_feature(): # Arrange config = create_test_config() # Act result = run_feature(config) # Assert assert result.status == "success" ``` ## Commits Use Conventional Commits: ```text feat: add new provider fix: correct rank extraction docs: update README test: add coverage for extractor chore: update dependencies ``` See [Testing](../testing/) for testing guidelines. # Testing Guidelines Writing and running tests. ## Test Structure ```text tests/ β”œβ”€β”€ test_config_loader.py β”œβ”€β”€ test_openai_client.py β”œβ”€β”€ test_mention_detector.py β”œβ”€β”€ test_rank_extractor.py └── ... 
``` ## Writing Tests ### Unit Tests ```python def test_brand_detection(): text = "Use HubSpot for CRM" brands = ["HubSpot", "Salesforce"] mentions = detect_mentions(text, brands) assert len(mentions) == 1 assert mentions[0].brand == "HubSpot" ``` ### Mocking LLM APIs ```python def test_openai_client(httpx_mock): httpx_mock.add_response( method="POST", url="https://api.openai.com/v1/chat/completions", json={"choices": [{"message": {"content": "..."}}]} ) client = OpenAIClient(...) response = client.generate_answer("test") assert response.provider == "openai" ``` ### Time Mocking ```python from freezegun import freeze_time @freeze_time("2025-11-01 08:00:00") def test_timestamp(): run_id = run_id_from_timestamp() assert run_id == "2025-11-01T08-00-00Z" ``` ## Running Tests ```bash # All tests pytest # With coverage pytest --cov=llm_answer_watcher # Specific test pytest tests/test_config_loader.py # Verbose pytest -v # Skip slow tests pytest -m "not slow" ``` ## Coverage Requirements - **Core modules**: 80%+ - **Critical paths**: 100% ```bash pytest --cov=llm_answer_watcher --cov-report=html open htmlcov/index.html ``` See [Code Standards](../code-standards/) for style guidelines. # Testing Utilities LLM Answer Watcher provides specialized testing utilities to help you write reliable tests without making real API calls or dealing with brittle HTTP mocking. 
## Overview The testing utilities follow patterns inspired by modern LLM abstraction layers: - **MockLLMClient**: Deterministic responses for testing extraction logic - **ChaosLLMClient**: Resilience testing with controlled failure injection - **Protocol-based**: Both implement the `LLMClient` protocol ## MockLLMClient ### Basic Usage The `MockLLMClient` provides deterministic responses without making real API calls: ```python from llm_answer_watcher.llm_runner.mock_client import MockLLMClient # Create client with configured responses client = MockLLMClient( responses={ "What are the best CRM tools?": "HubSpot and Salesforce are leading CRM platforms.", "best email warmup": "Warmly, HubSpot, and Instantly are top choices." }, default_response="No specific answer available.", tokens_per_response=300, cost_per_response=0.001 ) # Use in tests response = await client.generate_answer("What are the best CRM tools?") assert response.answer_text == "HubSpot and Salesforce are leading CRM platforms." assert response.tokens_used == 300 assert response.cost_usd == 0.001 ``` ### Configuration Options ```python MockLLMClient( responses={"prompt": "answer"}, # Dict mapping prompts to answers default_response="Default answer", # Fallback when prompt not found model_name="mock-gpt-4", # Model name in responses provider="mock-openai", # Provider name in responses tokens_per_response=100, # Token count to report cost_per_response=0.0, # Cost to report streaming_chunk_size=None, # Enable streaming (see below) streaming_delay_ms=50 # Delay between chunks ) ``` ### Integration Testing MockLLMClient works seamlessly with the extraction pipeline: ```python from llm_answer_watcher.config.schema import Brands from llm_answer_watcher.extractor.parser import parse_answer # Create mock client client = MockLLMClient( responses={"best CRM": "1. HubSpot\n2. Salesforce\n3. 
Warmly"} ) # Generate answer response = await client.generate_answer("best CRM") # Test extraction brands = Brands(mine=["Warmly"], competitors=["HubSpot", "Salesforce"]) extraction = parse_answer(response.answer_text, brands) assert extraction.appeared_mine is True assert len(extraction.my_mentions) == 1 assert len(extraction.competitor_mentions) == 2 ``` ### Streaming Support MockLLMClient supports optional streaming for testing streaming workflows: ```python chunks = [] client = MockLLMClient( responses={"test": "Hello world from LLM"}, streaming_chunk_size=5, # Stream in 5-char chunks streaming_delay_ms=10 # 10ms delay between chunks ) response = await client.generate_answer( "test", on_chunk=lambda chunk: chunks.append(chunk) ) # Chunks received during streaming assert chunks == ['Hello', ' worl', 'd fro', 'm LLM'] # Full response still returned assert response.answer_text == "Hello world from LLM" ``` ## ChaosLLMClient ### Basic Usage The `ChaosLLMClient` wraps any `LLMClient` and probabilistically injects failures: ```python from llm_answer_watcher.llm_runner.chaos_client import ChaosLLMClient # Wrap a base client (e.g., MockLLMClient) base = MockLLMClient(responses={"test": "answer"}) chaos = ChaosLLMClient( base_client=base, success_rate=0.7, # 70% success, 30% failure rate_limit_prob=0.1, # 10% chance of 429 error server_error_prob=0.1, # 10% chance of 5xx error timeout_prob=0.05, # 5% chance of timeout auth_error_prob=0.05, # 5% chance of 401 error seed=42 # Optional: reproducible chaos ) # May succeed or fail try: response = await chaos.generate_answer("test") print("Success!") except RuntimeError as e: print(f"Chaos injected: {e}") ``` ### Factory Function Use `create_chaos_client()` for balanced error distribution: ```python from llm_answer_watcher.llm_runner.chaos_client import create_chaos_client chaos = create_chaos_client( base_client=base, failure_rate=0.3, # 30% overall failures seed=42 ) # Failures distributed evenly: # - 7.5% rate limit (429) 
# - 7.5% server errors (500/502/503) # - 7.5% timeout # - 7.5% auth error (401) ``` ### Testing Retry Logic Validate your retry logic handles transient failures: ```python # High failure rate to force retries chaos = ChaosLLMClient( base_client=base, success_rate=0.3, # 70% failure rate seed=42 ) # Retry loop max_attempts = 3 for attempt in range(max_attempts): try: response = await chaos.generate_answer("test") break # Success! except RuntimeError as e: if attempt == max_attempts - 1: raise # Give up after max attempts # Otherwise retry ``` ### Reproducible Chaos Use `seed` for deterministic test runs: ```python # Two clients with same seed produce identical behavior chaos1 = ChaosLLMClient(base_client=base, success_rate=0.5, seed=123) chaos2 = ChaosLLMClient(base_client=base, success_rate=0.5, seed=123) # Same sequence of successes/failures for i in range(10): result1 = await chaos1.generate_answer("test") result2 = await chaos2.generate_answer("test") # Both succeed or both fail identically ``` ## Error Types Injected ChaosLLMClient injects realistic errors: | Error Type | Status Code | Description | Retryable? | | ------------ | ----------- | ------------------ | ---------- | | Rate Limit | 429 | Too many requests | Yes | | Server Error | 500/502/503 | Server-side issues | Yes | | Timeout | - | Network timeout | Yes | | Auth Error | 401 | Invalid API key | No | ## Best Practices ### 1. Use MockLLMClient for Logic Tests Test extraction, parsing, and business logic: ```python def test_brand_detection(): client = MockLLMClient( responses={"test": "Warmly and HubSpot are great tools."} ) # Test extraction logic ``` ### 2. Use ChaosLLMClient for Resilience Tests Test error handling and retry logic: ```python def test_retry_on_rate_limit(): chaos = ChaosLLMClient( base_client=base, rate_limit_prob=1.0 # Always 429 ) # Test retry behavior ``` ### 3. 
Avoid HTTP Mocking

Instead of:

```python
# ❌ Brittle HTTP mocking
httpx_mock.add_response(
    url="https://api.openai.com/...",
    json={"choices": [{"message": {"content": "..."}}]}
)
```

Use:

```python
# ✅ Clean protocol-based mocking
client = MockLLMClient(responses={"prompt": "answer"})
```

### 4. Test Statistical Distribution

For chaos testing, validate statistical properties:

```python
successes = 0
failures = 0
trials = 1000

chaos = ChaosLLMClient(base_client=base, success_rate=0.7, seed=42)

for _ in range(trials):
    try:
        await chaos.generate_answer("test")
        successes += 1
    except RuntimeError:
        failures += 1

success_rate = successes / trials
assert 0.65 <= success_rate <= 0.75  # Allow ±5 percentage points
```

## Migration from HTTP Mocking

### Before (pytest-httpx)

```python
# async def is needed because the body uses await; running it requires
# an async-capable pytest plugin (e.g. pytest-asyncio)
async def test_openai_client(httpx_mock):
    httpx_mock.add_response(
        method="POST",
        url="https://api.openai.com/v1/chat/completions",
        json={
            "choices": [{"message": {"content": "test answer"}}],
            "usage": {"total_tokens": 100}
        }
    )

    client = OpenAIClient(...)
    response = await client.generate_answer("test")
    assert response.answer_text == "test answer"
```

### After (MockLLMClient)

```python
async def test_extraction_pipeline():
    client = MockLLMClient(responses={"test": "test answer"})
    response = await client.generate_answer("test")
    assert response.answer_text == "test answer"

    # Now test the entire pipeline
    extraction = parse_answer(response.answer_text, brands)
    # ... test extraction logic
```

## See Also

- [Development Setup](../development-setup/) - Setting up your dev environment
- [Testing Guide](../testing/) - Overall testing strategy
- [Code Standards](../code-standards/) - Code quality requirements

# Documentation Guidelines

Contributing to documentation.
## Documentation Structure ```text docs/ β”œβ”€β”€ index.md β”œβ”€β”€ getting-started/ β”œβ”€β”€ user-guide/ β”œβ”€β”€ providers/ β”œβ”€β”€ examples/ β”œβ”€β”€ data-analytics/ β”œβ”€β”€ evaluation/ β”œβ”€β”€ advanced/ β”œβ”€β”€ reference/ └── contributing/ ``` ## Writing Guidelines ### Style - Use clear, concise language - Write in active voice - Include code examples - Add links to related pages ### Formatting ```markdown # Page Title Brief introduction paragraph. ## Section Heading Content with examples: \`\`\`python # Code example config = load_config("config.yaml") \`\`\` ### Subsection More detailed content. ``` ### Material for MkDocs Features ```markdown !!! tip "Pro Tip" Use this feature for better results. !!! warning This operation costs money. === "Python" \`\`\`python import module \`\`\` === "Bash" \`\`\`bash command --flag \`\`\` ``` ## Building Docs ```bash # Install dependencies uv sync --dev # Build docs mkdocs build # Serve locally mkdocs serve # Open browser to http://localhost:8000 ``` ## Previewing Changes ```bash mkdocs serve --watch docs/ ``` See [Development Setup](../development-setup/) for environment setup. # Special Optional # LLM Answer Watcher **Monitor how Large Language Models talk about your brand versus competitors in buyer-intent queries** [Get Started](getting-started/quick-start/) [View on GitHub](https://github.com/nibzard/llm-answer-watcher) ______________________________________________________________________ ## What is LLM Answer Watcher? LLM Answer Watcher is a production-ready CLI tool that helps you understand how AI models like ChatGPT, Claude, and others represent your brand when users ask buyer-intent questions. 
As AI-powered search becomes mainstream, monitoring your brand's presence in LLM responses is crucial for: - **Brand Visibility**: Track if your product appears in AI recommendations - **Competitive Intelligence**: See which competitors are mentioned alongside you - **Market Positioning**: Understand your ranking compared to alternatives - **Trend Analysis**: Historical data shows how your presence changes over time ## Demo See LLM Answer Watcher in action: **What you're seeing:** - Configuration validation with brand and competitor definitions - Real-time progress bars showing query execution across LLM providers - Brand mention extraction and ranking from AI responses - Cost tracking and results summary **Try it yourself:** Run `llm-answer-watcher demo` for an interactive demo (no API keys needed!) ______________________________________________________________________ ## Key Features ### πŸ” Brand Mention Detection Advanced word-boundary regex matching prevents false positives while accurately identifying your brand and competitors in LLM responses. ### πŸ“Š Historical Tracking All responses are stored in a local SQLite database, enabling powerful trend analysis and long-term visibility tracking. ### πŸ€– Multi-Provider Support Works with **6+ LLM providers**: OpenAI, Anthropic, Mistral, X.AI Grok, Google Gemini, and Perplexity, with an extensible architecture for adding more. ### 🌐 Browser Runners (BETA - New in v0.2.0) Interact with web-based LLM interfaces (ChatGPT, Perplexity) using headless browser automation via Steel API. Captures true user experience with screenshots and HTML snapshots. ### ⚑ Async Parallelization (New in v0.2.0) 3-4x faster performance with async/await parallel query execution across multiple models and providers. ### πŸ“ˆ Intelligent Rank Extraction Automatically detects where your brand appears in ranked lists using pattern-based extraction and optional LLM-assisted ranking. 
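To make pattern-based rank extraction concrete, here is a minimal sketch that reads positions from a numbered list and uses word-boundary matching for brand names. It illustrates the idea under assumptions and is not the tool's extractor: `extract_ranks` and its numbered-list pattern are hypothetical.

```python
import re


def extract_ranks(answer_text: str, brands: list[str]) -> dict[str, int]:
    """Map each brand to the position of the first numbered item that mentions it."""
    ranks: dict[str, int] = {}
    # Match list items such as "1. HubSpot" or "2) Salesforce"
    item_pattern = re.compile(r"^\s*(\d+)[.)]\s+(.*)$", re.MULTILINE)
    for match in item_pattern.finditer(answer_text):
        position, item_text = int(match.group(1)), match.group(2)
        for brand in brands:
            # Word boundaries keep "Hub" from matching inside "HubSpot"
            brand_pattern = r"\b" + re.escape(brand) + r"\b"
            if brand not in ranks and re.search(brand_pattern, item_text, re.IGNORECASE):
                ranks[brand] = position
    return ranks


answer = "The best tools are:\n1. HubSpot\n2. Salesforce\n3. Warmly"
print(extract_ranks(answer, ["Warmly", "HubSpot", "Hub"]))
# {'HubSpot': 1, 'Warmly': 3}
```

Answers written as prose rather than numbered lists would return no positions with this approach, which is the kind of case the optional LLM-assisted ranking is meant to cover.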
### 🎭 Sentiment Analysis & Intent Classification - **Sentiment Analysis**: Analyze the tone (positive/neutral/negative) and context of each brand mention - **Intent Classification**: Determine user intent type, buyer journey stage, and urgency signals - **Prioritization**: Focus on high-value queries with ready-to-buy intent - **ROI Tracking**: Understand which mentions drive real business value ### πŸ’° Dynamic Pricing & Budget Protection - Real-time pricing from [llm-prices.com](https://www.llm-prices.com) - Pre-run cost estimation - Configurable spending limits - Accurate web search cost calculation ### 🎯 Dual-Mode CLI - **Human Mode**: Beautiful Rich output with spinners, colors, and formatted tables - **Agent Mode**: Structured JSON output for AI agent automation - **Quiet Mode**: Minimal tab-separated output for scripts ### πŸ“‹ Professional HTML Reports Auto-generated reports with: - Brand mention visualizations - Rank distribution charts - Historical trends - Raw response inspection ### πŸ”’ Local-First & Secure - All data stored locally on your machine - BYOK (Bring Your Own Keys) - use your own API keys - No external dependencies except LLM APIs - Built-in SQL injection and XSS protection ## Quick Example ```bash # Set your API keys export OPENAI_API_KEY=sk-your-key-here export ANTHROPIC_API_KEY=sk-ant-your-key-here # Run with a config file llm-answer-watcher run --config watcher.config.yaml ``` **Output:** ```text πŸ” Running LLM Answer Watcher... β”œβ”€β”€ Query: "What are the best email warmup tools?" β”œβ”€β”€ Models: OpenAI gpt-4o-mini, Anthropic claude-3-5-haiku β”œβ”€β”€ Brands: 2 monitored, 5 competitors └── Output: ./output/2025-11-01T14-30-00Z/ βœ… Queries completed: 6/6 πŸ’° Total cost: $0.0142 πŸ“Š Report: ./output/2025-11-01T14-30-00Z/report.html ``` ## Use Cases ### 1. Brand Monitoring Track your product's visibility in AI-powered search results across multiple LLM providers. ### 2. 
Competitive Analysis

See which competitors appear most frequently and in what context they're recommended.

### 3. SEO for AI Era

Optimize your brand presence in LLM training data and real-time retrieval systems.

### 4. Market Research

Understand how AI models categorize and compare products in your space.

### 5. Product Development

Identify gaps where competitors are mentioned but your product isn't.

### 6. Sales Intelligence

Know what alternatives prospects might be comparing you against.

## Architecture Highlights

LLM Answer Watcher is built with production-ready patterns:

- **Domain-Driven Design**: Clear separation between config, LLM clients, extraction, storage, and reporting
- **Provider Abstraction**: Easy to add new LLM providers behind a unified interface
- **Plugin System**: Extensible runner architecture supporting both API and browser-based runners
- **Async/Await**: Parallel query execution for 3-4x performance improvement (v0.2.0+)
- **Retry Logic**: Exponential backoff with tenacity for resilient API calls
- **Type Safety**: Full Pydantic validation and modern Python 3.12+ type hints
- **Testability**: 750+ test cases with 100% coverage on critical paths
- **API-First Contract**: Internal structure designed to become an HTTP API for the Cloud product

## Documentation Structure

This documentation is organized progressively from beginner to advanced:

### [Getting Started](getting-started/quick-start/)

Everything you need to get up and running in 5 minutes.

### [User Guide](user-guide/configuration/overview/)

Comprehensive guides for configuration, usage, and features.

### [Supported Providers](providers/overview/)

Detailed information about each LLM provider integration.

### [Examples](examples/basic-monitoring/)

Real-world examples and use cases with complete configurations.

### [Data & Analytics](data-analytics/output-structure/)

Understanding output structure and running SQL analytics.
### [Evaluation Framework](evaluation/overview/)

Quality control and accuracy testing for extraction logic.

### [Advanced Topics](advanced/architecture/)

Deep dives into architecture, security, and extending the system.

### [Reference](reference/cli-reference/)

Complete CLI command reference and configuration schemas.

## For LLMs & AI Agents

This documentation is available in LLM-optimized formats following the [llmstxt.org](https://llmstxt.org) standard:

- **[llms.txt](https://nibzard.github.io/llm-answer-watcher/llms.txt)** - Concise navigation index (~800 tokens)
- **[llms-full.txt](https://nibzard.github.io/llm-answer-watcher/llms-full.txt)** - Complete documentation (~59K tokens)

These files are auto-generated on every documentation build and provide structured, markdown-formatted content optimized for LLM context injection.

## Philosophy

LLM Answer Watcher is built on these principles:

- **Boring is Good**: Simple, readable code over clever abstractions
- **Local-First**: Your data stays on your machine
- **Production-Ready**: Proper error handling, retry logic, and security from day one
- **Data is the Moat**: Historical SQLite tracking provides long-term value
- **Developer Experience**: Both human-friendly and AI agent-ready interfaces

## Next Steps

- **Quick Start**: Install and run your first monitoring job in minutes. [Get Started →](getting-started/quick-start/)
- **Configuration**: Learn how to configure models, brands, and intents. [Configuration Guide →](user-guide/configuration/overview/)
- **Examples**: See real-world examples for common use cases. [View Examples →](examples/basic-monitoring/)
- **Analytics**: Query your data with SQL for powerful insights.
[Data & Analytics →](data-analytics/sqlite-database/)

## Community & Support

- **GitHub Issues**: [Report bugs or request features](https://github.com/nibzard/llm-answer-watcher/issues)
- **Contributing**: [Read our contributing guide](contributing/development-setup/)
- **License**: MIT - see [LICENSE](https://github.com/nibzard/llm-answer-watcher/blob/main/LICENSE)

______________________________________________________________________

Built with ❤️ by [Nikola Balić](https://github.com/nibzard)

# Frequently Asked Questions

## General

### What is LLM Answer Watcher?

LLM Answer Watcher is a CLI tool that monitors how large language models (such as ChatGPT and Claude) talk about your brand versus competitors when answering buyer-intent queries.

### Why should I use this?

As AI-powered search becomes mainstream (ChatGPT, Perplexity, Google AI Overview), understanding your brand's presence in LLM responses is crucial for:

- Brand visibility tracking
- Competitive intelligence
- SEO for the AI era
- Market positioning

### Is it free?

The tool is **open source** (MIT license) and free to use. However, you pay for:

- LLM API calls (typically $0.001-$0.01 per query)
- Your own compute resources

### How much does it cost to run?

**Example costs per run**:

- 3 intents × 1 model (gpt-4o-mini): ~$0.006
- 5 intents × 2 models: ~$0.020
- 10 intents × 5 models: ~$0.150

See [Cost Management](../user-guide/features/cost-management/) for details.

## Installation & Setup

### What Python version do I need?

**Python 3.12 or 3.13** is required. The tool uses modern Python features.

### Can I use pip instead of uv?

Yes! Both work:

```bash
# With uv (recommended - faster)
uv sync

# With pip (traditional)
pip install -e .
```

### Which LLM providers are supported?

- OpenAI (GPT models)
- Anthropic (Claude models)
- Mistral AI
- X.AI (Grok models)
- Google (Gemini models)
- Perplexity

See [Providers](../providers/overview/) for the complete list.

### Do I need API keys for all providers?

No!
You only need API keys for the providers you want to use. Start with just OpenAI if you want.

## Configuration

### How do I create a configuration file?

See [Basic Configuration](../getting-started/basic-configuration/). Minimum config:

```yaml
run_settings:
  output_dir: "./output"
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine: ["YourBrand"]
  competitors: ["CompetitorA"]

intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
```

### How many brands should I track?

**Your brands**: Include all variations (e.g., "HubSpot", "HubSpot CRM", "hubspot.com")

**Competitors**: Start with your top 5-10 direct competitors. You can always add more.

### What makes a good intent prompt?

Good prompts are:

- **Natural**: How real users ask
- **Buyer-intent**: Imply evaluation/purchase
- **Specific**: Target a use case

Examples:

- ✅ "What are the best email warmup tools for startups?"
- ❌ "Tell me about email"

### Can I use the same config for multiple runs?

Yes! Configs are reusable. All data is timestamped and stored separately.

## Usage

### Why aren't my brands being detected?

Common causes:

1. **Name mismatch**: LLM used "HubSpot CRM" but you only configured "HubSpot"
   1. **Solution**: Add all brand variations
1. **Brand not mentioned**: LLM didn't include your brand
   1. **Solution**: This is valuable data! Your brand isn't top-of-mind for that query
1. **Word boundary issue**: "Hub" won't match in "GitHub"
   1. **Solution**: This is intentional to prevent false positives

### How do I track historical trends?

All data is stored in SQLite at `./output/watcher.db`:

```sql
SELECT DATE(timestamp_utc), AVG(rank_position)
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY DATE(timestamp_utc);
```

See [Data Analytics](../data-analytics/sqlite-database/).

### Can I run this in CI/CD?

Yes!
Use `--yes --format json` for automation:

```bash
llm-answer-watcher run --config config.yaml --yes --format json
```

See [Automation Guide](../user-guide/usage/automation/).

### What are the exit codes?

- `0`: Success
- `1`: Configuration error
- `2`: Database error
- `3`: Partial failure (acceptable)
- `4`: Complete failure

See [Exit Codes](../user-guide/usage/exit-codes/).

## Features

### What's the difference between regex and LLM extraction?

**Regex extraction** (default):

- Fast and cheap
- Pattern-based matching
- 90%+ accuracy

**LLM extraction** (`use_llm_rank_extraction: true`):

- More accurate for complex cases
- Costs extra (additional LLM calls)
- 95%+ accuracy

Start with regex. Only use LLM extraction if you need it.

### What is function calling?

Function calling uses the LLM's built-in structured output feature for extraction. It is more accurate than regex. Enable it:

```yaml
extraction_settings:
  method: "function_calling"
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
```

See [Function Calling](../user-guide/features/function-calling/).

### How do budget controls work?

Set spending limits:

```yaml
budget:
  enabled: true
  max_per_run_usd: 1.00
  max_per_intent_usd: 0.10
```

The tool validates the estimated cost **before running** and aborts if it exceeds the limits. See [Budget Controls](../user-guide/configuration/budget/).

### Can I use web search?

Yes, but it increases costs significantly ($10-$25 per 1,000 calls):

```yaml
web_search:
  enabled: true
  max_results: 10
```

See [Web Search](../user-guide/configuration/web-search/).

## Data & Privacy

### Where is my data stored?

**Locally on your machine**:

- SQLite database: `./output/watcher.db`
- JSON files: `./output/YYYY-MM-DDTHH-MM-SSZ/`
- HTML reports: `./output/YYYY-MM-DDTHH-MM-SSZ/report.html`

**No data leaves your machine** except LLM API calls.

### Is my data sent anywhere?

Only to configured LLM providers (OpenAI, Anthropic, etc.) for query processing. We don't collect any data.
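Because every run lives in a timestamped directory under the local output folder described above, ordinary scripting is enough to inspect what has been stored. The following is a minimal sketch, not part of the tool: the helper name `list_runs` is hypothetical, and it assumes run directories follow the `YYYY-MM-DDTHH-MM-SSZ` naming shown in this FAQ.

```python
from datetime import datetime, timezone
from pathlib import Path

# Directory naming assumed from the FAQ, e.g. 2025-11-01T14-30-00Z
RUN_DIR_FORMAT = "%Y-%m-%dT%H-%M-%SZ"


def list_runs(output_dir: str) -> list[tuple[datetime, Path]]:
    """Return (UTC timestamp, path) pairs for run directories, oldest first."""
    root = Path(output_dir)
    if not root.is_dir():
        return []  # nothing stored yet
    runs = []
    for child in root.iterdir():
        if not child.is_dir():
            continue  # skip files such as watcher.db
        try:
            stamp = datetime.strptime(child.name, RUN_DIR_FORMAT)
        except ValueError:
            continue  # directory name is not a run timestamp
        runs.append((stamp.replace(tzinfo=timezone.utc), child))
    return sorted(runs)


if __name__ == "__main__":
    for stamp, path in list_runs("./output"):
        has_report = (path / "report.html").exists()
        print(f"{stamp.isoformat()}  {path}  report={'yes' if has_report else 'no'}")
```

Parsing the directory name rather than relying on file modification times keeps the listing stable even if the output folder is copied or restored from a backup.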
### Are API keys secure?

API keys are:

- Loaded from environment variables
- Never logged or written to disk
- Never sent anywhere except the respective LLM provider

See [Security](../advanced/security/).

### Can I delete old data?

Yes! Simply delete directories or database records:

```bash
# Delete run directories from 2024 that were last modified more than 90 days ago
find output/ -name "2024-*" -type d -mtime +90 -exec rm -rf {} +
```

## Troubleshooting

### "Configuration error: API key not found"

**Solution**:

```bash
# Check if key is set
echo $OPENAI_API_KEY

# If empty, export it
export OPENAI_API_KEY=sk-your-key-here
```

### "Rate limit exceeded"

**Solution**: You hit the LLM provider's rate limit. Options:

1. Wait and retry
1. Reduce number of queries
1. Use slower model tiers
1. Upgrade API plan

### "No brands detected"

**Causes**:

1. Brand not mentioned by LLM
1. Brand name mismatch (add aliases)
1. Case sensitivity (should work - file a bug)

### "Database locked"

**Solution**: Another process is using the database:

```bash
# Find process
lsof output/watcher.db

# Kill if needed (replace <PID> with the process ID reported by lsof)
kill -9 <PID>
```

### Build/Import Errors

**Solution**:

```bash
# Reinstall
pip install -e .

# Check Python version
python --version  # Should be 3.12+
```

## Advanced

### Can I extend it with new providers?

Yes! See [Extending Providers](../advanced/extending-providers/).

### Can I customize system prompts?

Yes! See [Custom System Prompts](../advanced/custom-system-prompts/).

### Is there a Python API?

Yes! See [Python API Reference](../reference/python-api/).

### Can I contribute?

Absolutely! See [Contributing Guide](../contributing/development-setup/).

## Still Have Questions?

- **GitHub Issues**: [Report bugs or ask questions](https://github.com/nibzard/llm-answer-watcher/issues)
- **Documentation**: Browse this site
- **Examples**: Check the `examples/` directory in the repository

# Changelog

All notable changes to LLM Answer Watcher will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Planned

- Additional browser runners (Claude, Gemini web UIs)
- Enhanced cost tracking for browser runners
- DeepEval integration for quality metrics
- Trends command for historical analysis

## [0.2.0] - 2025-11-08

### Added - Major Features

- **🌐 Browser Runners (BETA)**: Steel API integration for web-based LLM interfaces
  - ChatGPT web UI runner with session management
  - Perplexity web UI runner with citation extraction
  - Screenshot capture and HTML snapshot support
  - Session reuse for cost optimization
  - Plugin system for extensible browser automation
  - See [Browser Runners Guide](../BROWSER_RUNNERS/) for details
- **⚡ Async/Await Parallelization**: 3-4x performance improvement
  - Parallel query execution across models
  - Async progress callbacks
  - RuntimeWarning fixes for async operations
- **🔍 Google Search Grounding**: Enhanced Gemini model support
  - Google Search grounding for Gemini models
  - Accurate web search cost calculation
  - Grounded responses with citations
- **🎯 Post-Intent Operations**: Dynamic workflow support
  - Configurable operations to run after each intent
  - Operation models with validation
  - Config filename tracking in reports
  - Model capability detection
- **📊 Advanced Analysis Features**:
  - **Sentiment Analysis**: Analyze tone (positive/neutral/negative) and context of each brand mention
  - **Intent Classification**: Classify user queries by intent type, buyer journey stage, and urgency signals
    - Intent types: transactional, informational, navigational, commercial_investigation
    - Buyer stages: awareness, consideration, decision
    - Urgency signals: high, medium, low
    - Confidence scoring and reasoning explanations
  - Brand visibility score in reports
  - HTML report filtering and web search badges
- **📚 Documentation Expansion**:
  - Comprehensive MkDocs
documentation with Material theme (60+ pages)
  - Browser runners guide with Steel integration
  - Google Search grounding documentation
  - 44 example configurations across 8 directories

### Added - Database & Storage

- New database tables and columns for sentiment and intent data
  - `mentions` table: `sentiment` and `mention_context` columns
  - `intent_classifications` table with query hash caching
  - 5 new indexes for filtering by sentiment, context, intent type, buyer stage, and urgency
- SQLite schema version 5 (migration support included)

### Added - Configuration

- Configuration options: `enable_sentiment_analysis` and `enable_intent_classification` (both default true)
- Runner plugin configuration system
- Browser runner specific settings (Steel API, screenshots, sessions)

### Changed

- **Breaking**: Configuration format updated to support runner plugins
- Improved test coverage to 100% for core modules
- Enhanced error messages for better debugging
- Function calling extraction schema expanded with sentiment/context fields
- Correct Responses API format with required type field
- Improved validation, error handling, and config validation

### Fixed

- Database schema mismatches and exception handling in CLI
- Rank display in HTML reports (now shows actual positions, not match positions)
- GPT-4.1 model support in OpenAI client
- Code review findings (validation, error handling, config)
- RuntimeWarnings for async operations
- Indentation in runner loop to process all models

### Cost Impact

- Intent classification: ~$0.00012 per query (one-time per unique query, cached)
- Sentiment extraction: ~33% increase per extraction call (integrated into function calling)
- Browser runners: $0.10-0.30/hour via Steel (not yet tracked in cost estimates)

### Known Limitations (v0.2.0)

- Browser runner cost tracking returns $0.00 (placeholder - actual Steel costs not calculated)
- Browser runners are BETA quality (added Nov 8, 2025)
- CSS selectors for browser runners may break if web
UIs change
- No authentication handling documented for ChatGPT login
- Response completion detection is heuristic-based

## [0.1.0] - 2025-11-05

### Added

- Initial release of LLM Answer Watcher
- Multi-provider support: OpenAI, Anthropic, Mistral, X.AI Grok, Google Gemini, Perplexity
- Brand mention detection with word-boundary matching
- Rank extraction (pattern-based and LLM-assisted)
- SQLite database for historical tracking
- HTML report generation with Jinja2
- Dual-mode CLI (human-friendly Rich output, structured JSON for automation)
- Budget controls and cost estimation
- Dynamic pricing from llm-prices.com with 24-hour caching
- Web search cost calculation for OpenAI models
- Retry logic with exponential backoff
- Evaluation framework for extraction accuracy
- Configuration validation with Pydantic
- Exit codes for automation (0-4)
- Example configurations
- Comprehensive test suite (750+ tests)
- GitHub Actions CI/CD pipeline

### Core Modules

- `config/`: YAML loading and Pydantic validation
- `llm_runner/`: Multi-provider LLM client abstraction
- `extractor/`: Brand mention detection and rank extraction
- `storage/`: SQLite schema and JSON writers
- `report/`: HTML report generation
- `utils/`: Time utilities, logging, cost estimation, Rich console
- `evals/`: Evaluation framework

### Supported Models

- **OpenAI**: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
- **Anthropic**: claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus
- **Mistral**: mistral-large-latest, mistral-small-latest
- **X.AI**: grok-beta, grok-2-1212, grok-2-latest, grok-3
- **Google**: gemini-2.0-flash-exp, gemini-1.5-pro, gemini-1.5-flash
- **Perplexity**: sonar, sonar-pro, sonar-reasoning

### Documentation

- README with quick start and examples
- CLAUDE.md with development guidelines
- CONTRIBUTING.md with contribution guidelines
- SPECS.md with complete engineering specification
- TODO.md with milestone tracking

### Security

- Environment variable-based API key management
- SQL
injection prevention (parameterized queries)
- XSS prevention (Jinja2 autoescaping)
- No API key logging

## Release Notes

### Version 0.1.0 - Production Ready

This is the first production-ready release of LLM Answer Watcher. The tool is feature-complete for core brand monitoring use cases:

**Highlights**:

- ✅ 8,200 lines of production Python code
- ✅ 17,400 lines of test code (750+ tests)
- ✅ 100% coverage on critical paths
- ✅ 6 LLM providers supported
- ✅ Complete evaluation framework
- ✅ Full documentation

**What's Working**:

- All core features tested and validated
- Multi-provider queries with retry logic
- Accurate brand mention detection (90%+ precision)
- Historical tracking in SQLite
- Professional HTML reports
- Budget protection
- CI/CD integration

**Known Limitations** (v0.1.0 - resolved in v0.2.0):

- ~~No async support (intentionally - keeping it simple)~~ - **ADDED in v0.2.0**
- ~~Web search only for OpenAI models~~ - **Google Search grounding added in v0.2.0**
- Perplexity request fees not yet in cost estimates
- Trends command not yet implemented (data collection works)

**Upgrade Notes**:

- This is the initial release - no upgrades needed
- SQLite schema version 1
- Configuration format stable

## Future Roadmap

### Planned Features

**v0.2.0** - ✅ **RELEASED 2025-11-08**:

- ✅ Async support for parallel queries (3-4x faster)
- ✅ Enhanced web search support (Google Search grounding)
- ✅ Browser runners (BETA)
- ⏳ `trends` command for historical analysis (moved to v0.3.0)
- ⏳ Dashboard UI for visualizing trends (moved to v0.3.0)
- ⏳ DeepEval integration for quality metrics (moved to v0.3.0)

**v0.3.0** (Q1 2025):

- `trends` command for historical analysis
- Dashboard UI for visualizing trends
- DeepEval integration for quality metrics
- Production-ready browser runners (cost tracking, authentication)
- Additional browser runners (Claude, Gemini web UIs)
- Cloud deployment option
- HTTP API (expose internal contract)
- Real-time alerts
and webhooks
- Advanced analytics and insights
- Multi-user support

**v1.0.0** (Q3 2025):

- Enterprise features
- Advanced provider integrations
- Custom model support
- White-label options
- SaaS offering

## Contributing

We welcome contributions! See [CONTRIBUTING.md](../contributing/development-setup/) for guidelines.

## Links

- **Repository**: [github.com/nibzard/llm-answer-watcher](https://github.com/nibzard/llm-answer-watcher)
- **Issues**: [github.com/nibzard/llm-answer-watcher/issues](https://github.com/nibzard/llm-answer-watcher/issues)
- **Documentation**: This site
- **License**: MIT