# LLM Answer Watcher

> Monitor how Large Language Models talk about your brand versus competitors in buyer-intent queries

Production-ready CLI tool that monitors how large language models mention brands versus competitors in buyer-intent queries.

Key Features:
- Multi-provider support: OpenAI, Anthropic, Mistral, Grok, Google Gemini, Perplexity
- Local-first SQLite storage with historical tracking
- Dual-mode CLI: Beautiful Rich output for humans, structured JSON for AI agents
- BYOK model: Bring Your Own API Keys
- Word-boundary brand detection to prevent false positives
- Cost management and budget controls
- HTML reports with Jinja2
- Evaluation framework with metrics


# Quick Start

# Installation

This guide covers all installation methods for LLM Answer Watcher.

## System Requirements

### Python Version

LLM Answer Watcher requires **Python 3.12 or 3.13**. It uses modern Python features including:

- Native union type syntax (`|` instead of `Union`)
- Improved type hints
- Performance optimizations

Check your Python version:

```bash
python3 --version
# Should output: Python 3.12.x or Python 3.13.x
```
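The same check can be scripted, for example in a setup or CI step. This is a minimal sketch; the helper name `is_supported` is illustrative and not part of LLM Answer Watcher:

```python
import sys

# Minor versions named in the requirement above.
SUPPORTED = {(3, 12), (3, 13)}


def is_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the (major, minor) pair is a supported interpreter."""
    return tuple(version) in SUPPORTED


if __name__ == "__main__":
    state = "supported" if is_supported() else "unsupported"
    print(f"Python {sys.version.split()[0]} is {state}")
```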

### Installing Python 3.12+

```bash
# macOS, using Homebrew
brew install python@3.12

# Verify installation
python3.12 --version
```

```bash
# Ubuntu: add the deadsnakes PPA
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update

# Install Python 3.12
sudo apt install python3.12 python3.12-venv python3.12-dev

# Verify installation
python3.12 --version
```

On Windows, download Python 3.12 from [python.org](https://www.python.org/downloads/).

During installation:

- ✅ Check "Add Python to PATH"
- ✅ Check "Install pip"

Verify installation:

```text
python --version
```

## Installation Methods

### Method 1: uv (Recommended)

[uv](https://github.com/astral-sh/uv) is a fast, modern Python package installer written in Rust. It's significantly faster than pip and handles virtual environments automatically.

#### Install uv

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
```

```powershell
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

```bash
# Any platform, via pip
pip install uv
```

#### Install LLM Answer Watcher

```bash
# Clone the repository
git clone https://github.com/nibzard/llm-answer-watcher.git
cd llm-answer-watcher

# Install all dependencies (creates .venv automatically)
uv sync

# For development with extra dependencies
uv sync --dev
```

#### Activate Virtual Environment

uv creates a `.venv` directory automatically. You can optionally activate it:

```bash
# macOS/Linux
source .venv/bin/activate

# Windows
.venv\Scripts\activate
```

**Note: uv handles activation automatically**

When you run `uv run llm-answer-watcher`, uv automatically uses the virtual environment. Explicit activation is optional.

### Method 2: pip

Traditional pip installation with manual virtual environment management.

```bash
# Clone the repository
git clone https://github.com/nibzard/llm-answer-watcher.git
cd llm-answer-watcher

# Create virtual environment
python3.12 -m venv .venv

# Activate virtual environment
source .venv/bin/activate  # macOS/Linux
# or
.venv\Scripts\activate     # Windows

# Install package in editable mode
pip install -e .

# For development with extra dependencies
pip install -e ".[dev]"
```

### Method 3: PyPI (Coming Soon)

Once published to PyPI, you'll be able to install directly:

```bash
# Future installation method
pip install llm-answer-watcher
```

## Verify Installation

Check that the installation was successful:

```bash
llm-answer-watcher --version
```

You should see output like:

```text
llm-answer-watcher version 0.1.0
```

Test the CLI help:

```bash
llm-answer-watcher --help
```

## API Keys Setup

LLM Answer Watcher requires API keys from LLM providers. You need at least one provider configured.

### Supported Providers

| Provider            | Environment Variable | Get API Key                                                              |
| ------------------- | -------------------- | ------------------------------------------------------------------------ |
| **OpenAI**          | `OPENAI_API_KEY`     | [platform.openai.com](https://platform.openai.com/api-keys)              |
| **Anthropic**       | `ANTHROPIC_API_KEY`  | [console.anthropic.com](https://console.anthropic.com/)                  |
| **Mistral**         | `MISTRAL_API_KEY`    | [console.mistral.ai](https://console.mistral.ai/)                        |
| **X.AI (Grok)**     | `XAI_API_KEY`        | [x.ai/api](https://x.ai/api)                                             |
| **Google (Gemini)** | `GOOGLE_API_KEY`     | [aistudio.google.com](https://aistudio.google.com/app/apikey)            |
| **Perplexity**      | `PERPLEXITY_API_KEY` | [www.perplexity.ai/settings/api](https://www.perplexity.ai/settings/api) |

### Setting API Keys

#### Temporary (Current Session)

```bash
export OPENAI_API_KEY=sk-your-key-here
export ANTHROPIC_API_KEY=sk-ant-your-key-here
```

#### Persistent (`.env` file)

Create a `.env` file in your project directory:

```bash
# .env file
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
MISTRAL_API_KEY=mistral-your-key
XAI_API_KEY=xai-your-grok-key
GOOGLE_API_KEY=AIza-your-google-key
PERPLEXITY_API_KEY=pplx-your-perplexity-key
```

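Load the file before running. The file contains plain `KEY=value` lines, so enable auto-export while sourcing it; otherwise the variables stay local to the shell and the CLI cannot read them:

```bash
set -a        # export every variable assigned from here on
source .env
set +a        # stop auto-exporting
```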

**Security:** Never commit API keys.

Add `.env` to `.gitignore`:

```bash
echo ".env" >> .gitignore
```

#### Using direnv (Recommended for Development)

[direnv](https://direnv.net/) automatically loads `.env` when you enter the directory:

```bash
# Install direnv
brew install direnv  # macOS
# or
sudo apt install direnv  # Ubuntu/Debian

# Create .envrc file (dotenv is direnv's built-in loader for .env files)
echo 'dotenv' > .envrc

# Allow direnv to load it
direnv allow
```

Now your keys load automatically when you `cd` into the directory.
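To confirm which keys a process can actually see, a short script can check the environment variables from the table above. This is a sketch; `configured_providers` is a hypothetical helper, not a command shipped with the tool:

```python
import os

# Environment variable names taken from the Supported Providers table.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Mistral": "MISTRAL_API_KEY",
    "X.AI (Grok)": "XAI_API_KEY",
    "Google (Gemini)": "GOOGLE_API_KEY",
    "Perplexity": "PERPLEXITY_API_KEY",
}


def configured_providers(env=None) -> list[str]:
    """List providers whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured:", ", ".join(found) if found else "none")
```

Because the script runs as a child process of your shell, it only reports keys that were exported, which is the same view the CLI gets.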

## Development Dependencies

If you're contributing or want to run tests:

```bash
# With uv
uv sync --dev

# With pip
pip install -e ".[dev]"
```

This installs additional tools:

- **pytest** - Test runner
- **pytest-httpx** - HTTP mocking for tests
- **pytest-cov** - Coverage reporting
- **pytest-mock** - Advanced mocking
- **freezegun** - Time mocking
- **ruff** - Fast Python linter and formatter
- **mkdocs** - Documentation builder
- **mkdocs-material** - Material theme for MkDocs

## Docker Installation (Optional)

For containerized deployment:

```dockerfile
# Dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install uv

# Copy project files
COPY . .

# Install dependencies
RUN uv sync

# Set entrypoint
ENTRYPOINT ["uv", "run", "llm-answer-watcher"]
```

Build and run:

```bash
docker build -t llm-answer-watcher .
docker run -e OPENAI_API_KEY=$OPENAI_API_KEY \
           -v $(pwd)/output:/app/output \
           llm-answer-watcher run --config config.yaml
```

## Troubleshooting

### Python Version Issues

If you have multiple Python versions:

```bash
# Use specific Python version
python3.12 -m venv .venv
source .venv/bin/activate
python --version  # Verify it's 3.12.x
```

### Permission Errors

If you get permission errors during installation:

```bash
# Don't use sudo with pip in virtual environments
# Instead, ensure your virtual environment is activated
source .venv/bin/activate
pip install -e .
```

### SSL Certificate Errors

On macOS, you might need to install certificates:

```bash
/Applications/Python\ 3.12/Install\ Certificates.command
```

### Module Not Found Errors

If you get `ModuleNotFoundError` after installation:

```bash
# Ensure you're in the virtual environment
which python  # Should point to .venv/bin/python

# Re-install the package
pip install -e .
```

### uv Installation Issues

If `uv sync` fails:

```bash
# Try updating uv (use `uv self update` instead if you installed it with the standalone script)
pip install --upgrade uv

# Or fall back to pip
pip install -e .
```

## Next Steps

Now that LLM Answer Watcher is installed:

1. [Run your first monitoring job](../first-run/)
1. [Learn about configuration](../basic-configuration/)
1. [Explore supported providers](../../providers/overview/)

## Uninstallation

To remove LLM Answer Watcher:

```bash
# Remove the package
pip uninstall llm-answer-watcher

# Remove the virtual environment
rm -rf .venv

# Remove output data (optional)
rm -rf output/
```

# Quick Start

Get LLM Answer Watcher up and running in 5 minutes.

## Prerequisites

Before you begin, ensure you have:

- **Python 3.12 or 3.13** installed
- **API keys** for at least one LLM provider (OpenAI recommended for getting started)
- **Basic terminal knowledge**

## Installation

### Option 1: Using uv (Recommended)

[uv](https://github.com/astral-sh/uv) is a fast Python package installer and resolver.

```bash
# Clone the repository
git clone https://github.com/nibzard/llm-answer-watcher.git
cd llm-answer-watcher

# Install dependencies
uv sync

# Activate virtual environment (optional, uv handles this automatically)
source .venv/bin/activate
```

### Option 2: Using pip

```bash
# Clone the repository
git clone https://github.com/nibzard/llm-answer-watcher.git
cd llm-answer-watcher

# Create virtual environment
python3.12 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
pip install -e .
```

## Set Up API Keys

LLM Answer Watcher uses environment variables for API keys. Set up at least one:

```bash
# OpenAI (recommended for getting started)
export OPENAI_API_KEY=sk-your-openai-key-here

# Optional: Add more providers
export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
export MISTRAL_API_KEY=mistral-your-key-here
export XAI_API_KEY=xai-your-grok-key-here
export GOOGLE_API_KEY=AIza-your-google-api-key-here
export PERPLEXITY_API_KEY=pplx-your-perplexity-key-here
```

**Persistent API Keys**

Create a `.env` file to persist your keys:

```bash
echo "OPENAI_API_KEY=sk-your-key" > .env
set -a; source .env; set +a  # auto-export so the CLI can read the key
```

Add `.env` to your `.gitignore` to avoid accidentally committing secrets!

## Your First Run

LLM Answer Watcher includes example configurations you can use immediately.

### 1. Choose an Example Config

The repository includes organized example configs in the `examples/` directory:

- **Quick Start**: `examples/01-quickstart/minimal.config.yaml` - Simplest possible config (1 provider, 1 intent)
- **Explained**: `examples/01-quickstart/explained.config.yaml` - Same config with detailed comments
- **Multi-Provider**: `examples/02-providers/multi-provider-comparison.config.yaml` - Compare all 6 providers

Start with the minimal example.

### 2. Run the Tool

```bash
llm-answer-watcher run --config examples/01-quickstart/minimal.config.yaml
```

### 3. View the Output

You'll see a progress display similar to the following (the exact path, queries, and cost depend on the config you run):

```text
🔍 Running LLM Answer Watcher...
├── Configuration loaded from examples/watcher.config.yaml
├── Query 1/2: "What are the best email warmup tools?"
├── Query 2/2: "Compare the top email warmup tools"
├── Models: OpenAI gpt-4o-mini
├── Brands: 2 monitored, 4 competitors
└── Output: ./output/2025-11-05T14-30-00Z/

✅ Queries completed: 2/2
💰 Total cost: $0.0042
📊 Report: ./output/2025-11-05T14-30-00Z/report.html
```

### 4. Explore Results

Open the HTML report in your browser:

```bash
open ./output/2025-11-05T14-30-00Z/report.html
# Or on Linux:
xdg-open ./output/2025-11-05T14-30-00Z/report.html
```

The report shows:

- **Summary**: Total costs, queries completed, brands found
- **Brand Mentions**: Which brands appeared in each response
- **Rank Distribution**: Visual charts of ranking positions
- **Raw Responses**: Full LLM outputs for inspection

## Understanding the Output

Each run creates a timestamped directory with:

```text
output/2025-11-05T14-30-00Z/
├── run_meta.json                    # Run summary and stats
├── report.html                      # Interactive HTML report
├── intent_*_raw_*.json             # Raw LLM responses
├── intent_*_parsed_*.json          # Extracted brand mentions
└── intent_*_error_*.json           # Error details (if any)
```

All data is also stored in a SQLite database at `./output/watcher.db` for historical analysis.

## What's Next?

Now that you've run your first monitoring job, here are suggested next steps:

### Create Your Own Configuration

Create `my-watcher.config.yaml`:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"

  competitors:
    - "CompetitorA"
    - "CompetitorB"
    - "IndustryTool"

intents:
  - id: "best-tools-in-category"
    prompt: "What are the best [your category] tools?"

  - id: "comparison-query"
    prompt: "Compare the top [your category] tools"
```

Then run:

```bash
llm-answer-watcher run --config my-watcher.config.yaml
```

### Explore More Features

- **Configuration Deep Dive**: Learn about all configuration options. [Configuration Guide →](../../user-guide/configuration/overview/)
- **Multiple Providers**: Add Anthropic, Mistral, Grok, and more. [Provider Guide →](../../providers/overview/)
- **Query Your Data**: Use SQL to analyze historical trends. [Data Analytics →](../../data-analytics/sqlite-database/)
- **Automate Monitoring**: Set up scheduled runs with cron or GitHub Actions. [Automation Guide →](../../user-guide/usage/automation/)

## Common Issues

### "Command not found: llm-answer-watcher"

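Make sure you've activated your virtual environment, or, if you installed with uv, prefix the command with `uv run` (for example, `uv run llm-answer-watcher --help`):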

```bash
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate     # On Windows
```

### "Configuration error: API key not found"

Ensure your API keys are exported:

```bash
echo $OPENAI_API_KEY  # Should print your key
```

If empty, export it:

```bash
export OPENAI_API_KEY=sk-your-key-here
```

### "ImportError: No module named 'llm_answer_watcher'"

Re-install the package:

```bash
pip install -e .
```

## Explore More Examples

The `examples/` directory is organized by use case:

- **[01-quickstart/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart)** - Minimal examples for first-time users
- **[02-providers/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers)** - All 6 LLM providers (OpenAI, Anthropic, Google, Mistral, Grok, Perplexity)
- **[03-web-search/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/03-web-search)** - Real-time web search integration
- **[04-extraction/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/04-extraction)** - Brand extraction methods (regex, function calling, hybrid)
- **[05-operations/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/05-operations)** - Automated analysis and insights
- **[06-advanced/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced)** - Budget controls, high concurrency, production configs
- **[07-real-world/](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world)** - Complete use case templates

Each directory includes a README with detailed explanations.

## Getting Help

- **Documentation**: Browse this site for comprehensive guides
- **Examples**: Check the `examples/` directory in the repository
- **Issues**: [Report bugs or ask questions](https://github.com/nibzard/llm-answer-watcher/issues)
- **Contributing**: [Read the contributing guide](../../contributing/development-setup/)

______________________________________________________________________

Ready to dive deeper? Continue to the [Installation Guide](../installation/) for more installation options.

# Your First Run

This guide walks you through running LLM Answer Watcher for the first time and understanding the results.

## Before You Start

Ensure you have:

- ✅ Installed LLM Answer Watcher ([Installation Guide](../installation/))
- ✅ Set up at least one API key
- ✅ Activated your virtual environment

## Step 1: Verify Installation

Check that everything is working:

```bash
# Verify the CLI is available
llm-answer-watcher --version

# Check help documentation
llm-answer-watcher --help
```

## Step 2: Validate Example Configuration

Before running, validate the configuration file:

```bash
llm-answer-watcher validate --config examples/watcher.config.yaml
```

Expected output:

```text
✅ Configuration valid
├── Models: 1 configured (openai gpt-4o-mini)
├── Brands: 2 mine, 4 competitors
├── Intents: 2 queries
└── Estimated cost: $0.004
```

If validation fails, you'll see specific error messages about what needs to be fixed.

## Step 3: Run Your First Monitoring Job

Execute a monitoring run:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml
```

### What Happens During a Run

#### 1. Configuration Loading

```text
🔍 Loading configuration from examples/watcher.config.yaml...
├── ✅ YAML syntax valid
├── ✅ Schema validation passed
├── ✅ API keys found
└── ✅ Output directory accessible
```

#### 2. Cost Estimation

```text
💰 Estimated cost breakdown:
├── OpenAI gpt-4o-mini: $0.002 × 2 intents = $0.004
└── Total estimated cost: $0.004

Continue with this run? [Y/n]:
```

Press `Y` to continue, or `n` to abort.

**Tip: Skip confirmation prompts**

Use the `--yes` flag to auto-confirm in automated scripts:

```bash
llm-answer-watcher run --config config.yaml --yes
```

#### 3. Query Execution

You'll see progress for each query:

```text
📤 Query 1/2: "What are the best email warmup tools?"
├── Provider: OpenAI (gpt-4o-mini)
├── Sending request... ⏳
├── ✅ Response received (1.2s)
├── Tokens: 145 input, 387 output
├── Cost: $0.002
└── Brands detected: 3 found (Lemwarm, Instantly, HubSpot)

📤 Query 2/2: "Compare the top email warmup tools"
├── Provider: OpenAI (gpt-4o-mini)
├── Sending request... ⏳
├── ✅ Response received (1.4s)
├── Tokens: 152 input, 421 output
├── Cost: $0.002
└── Brands detected: 4 found (Lemwarm, Lemlist, Instantly, Apollo.io)
```

#### 4. Results Summary

```text
✅ Run completed successfully!

📊 Summary:
├── Run ID: 2025-11-05T14-30-00Z
├── Queries: 2/2 completed (100%)
├── Total cost: $0.004
├── Brands found: 5 unique
├── Your brands mentioned: 2/2 queries
├── Competitor mentions: 4 across 2 queries
└── Output: ./output/2025-11-05T14-30-00Z/

📁 Artifacts created:
├── report.html - Interactive HTML report
├── run_meta.json - Run summary and metadata
├── intent_*_raw_*.json - Raw LLM responses
├── intent_*_parsed_*.json - Extracted brand mentions
└── watcher.db - Historical SQLite database
```

## Step 4: Explore the Results

### HTML Report

Open the interactive report:

```bash
# macOS
open ./output/2025-11-05T14-30-00Z/report.html

# Linux
xdg-open ./output/2025-11-05T14-30-00Z/report.html

# Windows
start ./output/2025-11-05T14-30-00Z/report.html
```

The report contains:

#### Summary Section

- Total cost breakdown
- Queries completed vs failed
- Unique brands detected
- Your brand mention rate

#### Brand Mentions Table

| Intent                  | Model       | Your Brand                 | Competitors                    | Rank |
| ----------------------- | ----------- | -------------------------- | ------------------------------ | ---- |
| best-email-warmup-tools | gpt-4o-mini | Lemwarm (#1)               | Instantly (#2), HubSpot (#3)   | 1    |
| email-warmup-comparison | gpt-4o-mini | Lemwarm (#1), Lemlist (#2) | Instantly (#3), Apollo.io (#4) | 1    |

#### Rank Distribution Chart

Visual representation of where your brand appears in ranked lists.

#### Historical Trends

If you've run multiple times, you'll see trend charts showing:

- Brand mention frequency over time
- Average ranking position changes
- Competitor appearance patterns

#### Raw Responses

Expandable sections showing the full LLM response for each query.

### JSON Artifacts

Each run creates structured JSON files:

#### `run_meta.json`

Summary of the entire run:

```json
{
  "run_id": "2025-11-05T14-30-00Z",
  "timestamp_utc": "2025-11-05T14:30:00Z",
  "config_path": "examples/watcher.config.yaml",
  "total_cost_usd": 0.004,
  "queries_completed": 2,
  "queries_failed": 0,
  "brands_detected": {
    "mine": ["Lemwarm", "Lemlist"],
    "competitors": ["Instantly", "HubSpot", "Apollo.io"]
  }
}
```
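Since `run_meta.json` is plain JSON, it is easy to post-process. The sketch below builds a one-line summary from the fields shown above; the function names are illustrative, and it assumes every run writes the same keys as this example:

```python
import json
from pathlib import Path


def load_run_meta(run_dir: str) -> dict:
    """Read run_meta.json from a run's output directory."""
    return json.loads(Path(run_dir, "run_meta.json").read_text(encoding="utf-8"))


def summarize_run(meta: dict) -> str:
    """Build a one-line summary from the fields shown in run_meta.json."""
    total = meta["queries_completed"] + meta["queries_failed"]
    mine = ", ".join(meta["brands_detected"]["mine"]) or "none"
    return (
        f"{meta['run_id']}: {meta['queries_completed']}/{total} queries, "
        f"${meta['total_cost_usd']:.3f}, my brands: {mine}"
    )
```

For example, `print(summarize_run(load_run_meta("./output/2025-11-05T14-30-00Z")))` prints a summary for the run shown in this guide.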

#### `intent_*_raw_*.json`

Raw LLM response with metadata:

```json
{
  "intent_id": "best-email-warmup-tools",
  "provider": "openai",
  "model_name": "gpt-4o-mini",
  "prompt": "What are the best email warmup tools?",
  "answer_text": "Here are the best email warmup tools:\n\n1. Lemwarm...",
  "tokens_used": 532,
  "cost_usd": 0.002,
  "timestamp_utc": "2025-11-05T14:30:00Z"
}
```

#### `intent_*_parsed_*.json`

Extracted brand mentions and ranks:

```json
{
  "intent_id": "best-email-warmup-tools",
  "provider": "openai",
  "model_name": "gpt-4o-mini",
  "brands_found": {
    "mine": [
      {
        "brand": "Lemwarm",
        "normalized": "lemwarm",
        "rank_position": 1,
        "context": "1. Lemwarm - Best for automated warmup"
      }
    ],
    "competitors": [
      {
        "brand": "Instantly",
        "normalized": "instantly",
        "rank_position": 2,
        "context": "2. Instantly - Great deliverability features"
      }
    ]
  }
}
```
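The parsed file can be flattened into a single ranking that covers your brands and competitors together. This is a sketch based only on the structure shown above; `rank_table` is a hypothetical helper, and skipping mentions without a `rank_position` is an assumption about how unranked mentions might appear:

```python
def rank_table(parsed: dict) -> list[tuple[int, str, str]]:
    """Return (rank_position, brand, group) rows sorted by rank."""
    rows = []
    for group in ("mine", "competitors"):
        for mention in parsed["brands_found"].get(group, []):
            rank = mention.get("rank_position")
            if rank is not None:  # assumption: unranked mentions carry no position
                rows.append((rank, mention["brand"], group))
    return sorted(rows)
```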

### SQLite Database

All data is stored in `./output/watcher.db` for historical tracking:

```bash
# View recent runs (the SQL is passed to sqlite3 as a single argument)
sqlite3 ./output/watcher.db "
  SELECT run_id, timestamp_utc, total_cost_usd, queries_completed
  FROM runs
  ORDER BY timestamp_utc DESC
  LIMIT 5;
"
```
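The same query can be run from Python with the standard-library `sqlite3` module. This is a sketch that assumes only the `runs` table and the four columns used above; the rest of the schema is not shown here, and `recent_runs` is an illustrative name:

```python
import sqlite3


def recent_runs(conn: sqlite3.Connection, limit: int = 5) -> list[tuple]:
    """Return the most recent runs, newest first."""
    query = (
        "SELECT run_id, timestamp_utc, total_cost_usd, queries_completed "
        "FROM runs ORDER BY timestamp_utc DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()
```

Call it with a connection to your database, for example `recent_runs(sqlite3.connect("./output/watcher.db"))`.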

## Step 5: Run with Different Modes

### Agent Mode (Structured JSON Output)

Perfect for automation and AI agents:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml --format json
```

Output:

```json
{
  "run_id": "2025-11-05T14-30-00Z",
  "status": "success",
  "queries_completed": 2,
  "queries_failed": 0,
  "total_cost_usd": 0.004,
  "output_dir": "./output/2025-11-05T14-30-00Z",
  "brands_detected": {
    "mine": ["Lemwarm", "Lemlist"],
    "competitors": ["Instantly", "HubSpot", "Apollo.io"]
  }
}
```
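An agent or script can consume this output directly. The following sketch calls the CLI with the flags shown in this guide and parses the result; it assumes stdout contains only the JSON document, and the function names are illustrative:

```python
import json
import subprocess


def parse_summary(stdout: str) -> dict:
    """Parse the JSON summary that agent mode prints."""
    return json.loads(stdout)


def run_watcher(config_path: str) -> tuple[int, dict]:
    """Run the CLI non-interactively; return (exit code, parsed summary)."""
    proc = subprocess.run(
        ["llm-answer-watcher", "run", "--config", config_path,
         "--format", "json", "--yes"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, parse_summary(proc.stdout)
```

Keeping `parse_summary` separate from the subprocess call makes the parsing step easy to test without spending API credits.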

### Quiet Mode (Minimal Output)

For scripts and pipelines:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml --quiet
```

Output (tab-separated):

```text
2025-11-05T14-30-00Z    success 2   0.004   ./output/2025-11-05T14-30-00Z
```
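A pipeline can split that line on tabs. In this sketch the field names are assumptions inferred from the JSON output above (the quiet-mode columns are not labeled), and `parse_quiet_line` is a hypothetical helper:

```python
from typing import NamedTuple


class QuietResult(NamedTuple):
    # Field names are assumed; they mirror the keys of the JSON output.
    run_id: str
    status: str
    queries_completed: int
    total_cost_usd: float
    output_dir: str


def parse_quiet_line(line: str) -> QuietResult:
    """Split one tab-separated quiet-mode line into typed fields."""
    run_id, status, completed, cost, output_dir = line.rstrip("\n").split("\t")
    return QuietResult(run_id, status, int(completed), float(cost), output_dir)
```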

### Automation Mode (No Prompts)

Skip confirmation prompts:

```bash
llm-answer-watcher run --config examples/watcher.config.yaml --yes --format json
```

## Understanding Exit Codes

LLM Answer Watcher uses exit codes for automation:

```bash
llm-answer-watcher run --config config.yaml
echo $?  # Print exit code
```

| Exit Code | Meaning             | When It Happens                            |
| --------- | ------------------- | ------------------------------------------ |
| **0**     | Success             | All queries completed successfully         |
| **1**     | Configuration Error | Invalid YAML, missing API keys, bad schema |
| **2**     | Database Error      | Cannot create/access SQLite database       |
| **3**     | Partial Failure     | Some queries failed, but run completed     |
| **4**     | Complete Failure    | No queries succeeded                       |

Use in scripts:

```bash
#!/bin/bash
llm-answer-watcher run --config config.yaml --yes

case $? in
    0) echo "✅ Success!" ;;
    1) echo "❌ Configuration error" && exit 1 ;;
    2) echo "❌ Database error" && exit 1 ;;
    3) echo "⚠️  Partial failure" ;;
    4) echo "❌ Complete failure" && exit 1 ;;
esac
```
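When the CLI is driven from Python instead of a shell script, the same table can be encoded as an enum. This is a sketch; the names are illustrative, and treating unknown codes as fatal is a design choice made here, not documented tool behavior:

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    CONFIG_ERROR = 1
    DATABASE_ERROR = 2
    PARTIAL_FAILURE = 3
    COMPLETE_FAILURE = 4


# Mirrors the shell script above: only these two codes let the job continue.
NON_FATAL = {ExitCode.SUCCESS, ExitCode.PARTIAL_FAILURE}


def is_fatal(code: int) -> bool:
    """Return True when the run should fail the surrounding job."""
    try:
        return ExitCode(code) not in NON_FATAL
    except ValueError:
        return True  # unknown exit code: fail safe
```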

## Common First-Run Issues

### Issue: "API key not found"

**Solution**: Ensure API keys are exported:

```bash
echo $OPENAI_API_KEY  # Should print your key
export OPENAI_API_KEY=sk-your-key-here
```

### Issue: "Permission denied: ./output/"

**Solution**: Create output directory with correct permissions:

```bash
mkdir -p output
chmod 755 output
```

### Issue: "No brands detected"

**Possible causes**:

1. **Brand name mismatch**: LLM used different name (e.g., "HubSpot CRM" vs "HubSpot")
1. **Not mentioned**: Brand wasn't included in LLM response
1. **Word boundary issue**: Brand name contains special characters

**Solution**: Check raw response and add brand aliases:

```yaml
brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
    - "YourBrand CRM"  # Add variations
```

### Issue: "Rate limit exceeded"

**Solution**: You hit the provider's API rate limit. Wait and retry, or add retry configuration:

```yaml
run_settings:
  retry_max_attempts: 5
  retry_wait_exponential_multiplier: 2
```

## Next Steps

Now that you've completed your first run:

- **Customize Configuration**: Create your own config with your brands and intents. [Basic Configuration →](../basic-configuration/)
- **Query Your Data**: Use SQL to analyze results and track trends. [Data Analytics →](../../data-analytics/sqlite-database/)
- **Add More Providers**: Compare results across OpenAI, Anthropic, Mistral, and more. [Provider Guide →](../../providers/overview/)
- **Automate Runs**: Set up scheduled monitoring with cron or GitHub Actions. [Automation →](../../user-guide/usage/automation/)

# Basic Configuration

Learn how to create your first custom configuration file for LLM Answer Watcher.

## Configuration File Structure

LLM Answer Watcher uses YAML configuration files with three main sections:

```yaml
run_settings:    # How and where to run
brands:          # What brands to monitor
intents:         # What questions to ask
```

## Minimal Configuration

The simplest possible configuration:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine:
    - "YourBrand"

  competitors:
    - "CompetitorA"
    - "CompetitorB"

intents:
  - id: "best-tools"
    prompt: "What are the best [your category] tools?"
```

This configuration:

- Uses OpenAI's `gpt-4o-mini` (cost-effective)
- Monitors 1 brand vs 2 competitors
- Asks 1 intent question
- Stores results in `./output/`

## Run Settings Section

### Basic Run Settings

```yaml
run_settings:
  # Where to store output files
  output_dir: "./output"

  # SQLite database path for historical tracking
  sqlite_db_path: "./output/watcher.db"

  # Use LLM for rank extraction (more accurate but costs more)
  use_llm_rank_extraction: false
```

### Model Configuration

Define which LLM providers and models to use:

```yaml
run_settings:
  models:
    # OpenAI configuration
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    # Anthropic configuration
    - provider: "anthropic"
      model_name: "claude-3-5-haiku-20241022"
      env_api_key: "ANTHROPIC_API_KEY"
```

**Key Points:**

- **provider**: Must be one of: `openai`, `anthropic`, `mistral`, `grok`, `google`, `perplexity`
- **model_name**: Specific model identifier (see [Provider Guide](../../providers/overview/))
- **env_api_key**: Environment variable name containing your API key

**Model Selection**

Start with cost-effective models:

- **OpenAI**: `gpt-4o-mini` ($0.15/1M input tokens)
- **Anthropic**: `claude-3-5-haiku-20241022` ($0.80/1M input tokens)
- **Mistral**: `mistral-small-latest` ($0.20/1M input tokens)
- **Grok**: `grok-2-1212` ($2.00/1M input tokens)
- **Google**: `gemini-2.0-flash-exp` (free tier available)

### Optional System Prompts

Customize the system prompt for each model:

```yaml
run_settings:
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"
      system_prompt: "openai/gpt-4-default"  # Uses built-in prompt
```

If `system_prompt` is not specified, the provider's default prompt is used.

## Brands Section

### Your Brands

Define all variations of your brand name:

```yaml
brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
    - "YourBrand CRM"
    - "yourbrand.com"
```

**Why multiple aliases?**

LLMs might reference your brand differently:

- "HubSpot" vs "HubSpot CRM"
- "Lemwarm" vs "Lemwarm.io"
- Domain names: "acme.com"

**Word Boundary Matching**

Brands are matched using word boundaries. "Hub" will NOT match in "GitHub". Add specific variations if needed.
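To illustrate the idea, here is a minimal sketch of word-boundary matching with Python's `re` module. It is not the tool's actual implementation; the function name and the case-insensitive comparison are assumptions made for the example:

```python
import re


def mentions_brand(text: str, brand: str) -> bool:
    """Whole-word, case-insensitive match for a brand name."""
    # Lookarounds rather than \b so names ending in punctuation still work;
    # re.escape keeps the dot in names like "Apollo.io" literal.
    pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

With this pattern, `mentions_brand("I use GitHub daily", "Hub")` is `False`, while `mentions_brand("The Hub is great", "Hub")` is `True`.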

### Competitors

List all competitors to track:

```yaml
brands:
  competitors:
    - "CompetitorA"
    - "CompetitorB"
    - "IndustryTool"
    - "Alternative.io"
    - "BigPlayer CRM"
```

**Tips:**

- Include direct competitors (same category)
- Include indirect competitors (adjacent use cases)
- Use specific names, not generic terms
- Add variations for well-known competitors

### Complete Brands Example

```yaml
brands:
  mine:
    - "Lemwarm"
    - "Lemwarm.io"
    - "Lemlist"
    - "Lemlist.com"

  competitors:
    - "Instantly"
    - "Instantly.ai"
    - "Warmbox"
    - "Warmbox.ai"
    - "MailReach"
    - "HubSpot"
    - "Apollo.io"
    - "Woodpecker"
```

## Intents Section

Intents are the questions you want to ask LLMs.

### Basic Intent

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"
```

**Intent Structure:**

- **id**: Unique identifier (used in filenames and database)
- **prompt**: The exact question to ask the LLM

### Multiple Intents

Test different question types:

```yaml
intents:
  # Direct question
  - id: "best-email-warmup-tools"
    prompt: "What are the best email warmup tools?"

  # Comparison query
  - id: "comparison-warmup-tools"
    prompt: "Compare the top email warmup tools for improving deliverability"

  # Specific use case
  - id: "cold-outreach-tools"
    prompt: "Which email warmup tools are best for cold outreach campaigns?"

  # Alternative phrasing
  - id: "recommended-warmup-services"
    prompt: "What email warmup services do you recommend for startups?"
```

### Intent ID Best Practices

Use descriptive, URL-safe IDs:

✅ Good IDs:

- `best-crm-tools`
- `email-automation-comparison`
- `startup-friendly-options`

❌ Avoid:

- `query1` (not descriptive)
- `best CRM tools` (spaces)
- `what's-best?` (special characters)
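A quick way to check IDs before adding them to a config is a small validator. This sketch enforces lowercase letters and digits separated by single hyphens, which is stricter than the tool may require; the pattern and function name are assumptions:

```python
import re

# Lowercase letters and digits, separated by single hyphens.
INTENT_ID_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_intent_id(intent_id: str) -> bool:
    """Return True if the ID is URL-safe under the pattern above."""
    return INTENT_ID_RE.fullmatch(intent_id) is not None
```

Note that `query1` passes this check: it is URL-safe, just not descriptive, so naming quality still needs a human eye.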

### Crafting Effective Prompts

**Good prompts are:**

1. **Natural**: How a real user would ask
1. **Specific**: Target a particular use case or category
1. **Open-ended**: Allow for varied responses
1. **Buyer-intent**: Imply readiness to evaluate/purchase

Examples:

```yaml
intents:
  # ✅ Good: Natural buyer-intent query
  - id: "saas-analytics-tools"
    prompt: "What are the best analytics tools for SaaS companies?"

  # ✅ Good: Specific use case
  - id: "startup-crm-budget"
    prompt: "Which CRM is best for startups on a tight budget?"

  # ❌ Too broad
  - id: "software"
    prompt: "Tell me about software"

  # ❌ Not buyer-intent
  - id: "history"
    prompt: "What is the history of CRM software?"
```

## Complete Basic Configuration Example

Here's a complete, production-ready configuration:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    # Cost-effective model for regular monitoring
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    # Second provider for cross-model comparison
    - provider: "anthropic"
      model_name: "claude-3-5-haiku-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

  # Use regex-based extraction (faster, cheaper)
  use_llm_rank_extraction: false

brands:
  mine:
    - "Lemwarm"
    - "Lemwarm.io"
    - "Lemlist"
    - "lemlist.com"

  competitors:
    - "Instantly"
    - "Instantly.ai"
    - "Warmbox"
    - "Warmbox.ai"
    - "MailReach"
    - "HubSpot"
    - "Apollo.io"
    - "Woodpecker"

intents:
  # Direct question
  - id: "best-email-warmup-tools"
    prompt: "What are the best email warmup tools?"

  # Comparison query
  - id: "warmup-tools-comparison"
    prompt: "Compare the top email warmup tools for improving email deliverability"

  # Use case specific
  - id: "cold-outreach-warmup"
    prompt: "Which email warmup tools are best for cold outreach campaigns?"

  # Budget-conscious
  - id: "affordable-warmup-tools"
    prompt: "What are the most affordable email warmup tools for startups?"
```

## Testing Your Configuration

Always validate before running:

```bash
llm-answer-watcher validate --config my-config.yaml
```

Expected output:

```text
✅ Configuration valid
├── Models: 2 configured
│   ├── openai: gpt-4o-mini
│   └── anthropic: claude-3-5-haiku-20241022
├── Brands: 4 mine, 8 competitors
├── Intents: 4 queries
└── Estimated cost: $0.016 (8 queries total)
```
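
The "8 queries total" figure is consistent with every intent being sent to every configured model. A small worked example of that arithmetic, assuming the estimate is spread evenly across queries (the helper name is illustrative only):

```python
def total_queries(model_count: int, intent_count: int) -> int:
    """Each intent is assumed to be sent once to each configured model."""
    return model_count * intent_count


queries = total_queries(model_count=2, intent_count=4)
estimated_cost_usd = 0.016  # figure taken from the sample validation output

print(queries)  # 8
print(round(estimated_cost_usd / queries, 4))  # 0.002 (average per query)
```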

## Next Steps

Now that you understand basic configuration:

- **[Advanced Configuration](../../user-guide/configuration/overview/)**: Budget controls, web search, custom operations
- **[Run Your Config](../first-run/)**: Execute monitoring with your custom configuration
- **[Add More Providers](../../providers/overview/)**: Learn about Mistral, Grok, Google, Perplexity
- **[See Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart)**: Browse organized configuration examples ([all examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples))
# Configuration

# Configuration Overview

LLM Answer Watcher uses a YAML configuration file to control all aspects of monitoring: which LLMs to query, which brands to track, what questions to ask, and how to manage costs.

## Configuration Structure

A complete configuration file has these main sections:

```yaml
run_settings:        # Output paths, models, and run behavior
extraction_settings: # Optional: advanced extraction configuration
brands:              # Your brand and competitors to track
intents:             # Questions to ask LLMs
global_operations:   # Optional: operations run for every intent
```

## Quick Start Example

Here's a minimal configuration to get started:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine:
    - "MyBrand"
    - "MyBrand.io"

  competitors:
    - "CompetitorA"
    - "CompetitorB"

intents:
  - id: "best-tools"
    prompt: "What are the best tools for [your category]?"
```

Environment Variables

Set your API keys as environment variables before running:

```bash
export OPENAI_API_KEY=sk-your-key-here
export ANTHROPIC_API_KEY=sk-ant-your-key-here
```
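
Before a run, it can help to confirm that every variable named in an `env_api_key` field is actually set. A minimal sketch using only the standard library; `missing_api_keys` is a hypothetical helper and not a function the tool exposes.

```python
import os
from collections.abc import Mapping


def missing_api_keys(
    env_var_names: list[str],
    environ: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the variable names that are unset or empty."""
    return [name for name in env_var_names if not environ.get(name)]


# Names taken from the `env_api_key` fields in the example configs.
print(missing_api_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]))
```

An empty list means all referenced keys are present. The check reports names only, so no key value is ever printed.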

## Configuration Sections Explained

### Run Settings

Controls where output is stored, which models to query, and runtime behavior.

**Key fields:**

- `output_dir`: Directory for run results (JSON files, HTML reports)
- `sqlite_db_path`: SQLite database path for historical tracking
- `models`: List of LLM models to query (see [Model Configuration](../models/))
- `use_llm_rank_extraction`: Use LLM to extract rankings (slower, more accurate)
- `budget`: Optional cost controls (see [Budget Configuration](../budget/))

**Example:**

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    - provider: "anthropic"
      model_name: "claude-3-5-haiku-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

  use_llm_rank_extraction: false

  budget:
    enabled: true
    max_per_run_usd: 1.00
    warn_threshold_usd: 0.50
```

### Extraction Settings (Optional)

Advanced configuration for brand mention and rank extraction using function calling.

**Key fields:**

- `extraction_model`: Dedicated model for extraction (faster, cheaper than main models)
- `method`: Extraction method (`function_calling`, `regex`, or `hybrid`)
- `fallback_to_regex`: Fall back to regex if function calling fails
- `min_confidence`: Minimum confidence threshold (0.0-1.0)

**Example:**

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/extraction-default"

  method: "function_calling"
  fallback_to_regex: true
  min_confidence: 0.7
```

When to Use Extraction Settings

Use extraction settings when:

- Regex extraction misses complex brand mentions
- You need higher accuracy for ranking positions
- You want to extract additional structured data

Skip it when:

- You're optimizing for cost (regex is free)
- Your brand names are simple and unambiguous
- You're running frequent monitoring jobs

### Brands

Defines which brands to track in LLM responses.

**Two categories:**

1. **mine**: Your brand aliases (at least one required)
1. **competitors**: Competitor brands to monitor

**Example:**

```yaml
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
    - "Warmly AI"

  competitors:
    - "Instantly"
    - "Lemwarm"
    - "HubSpot"
    - "Apollo.io"
    - "Woodpecker"
```

Brand Alias Best Practices

- Include all variations (with/without TLD, with/without product name)
- List each alias as a complete name; the tool's word-boundary matching prevents partial-word false positives
- Add common misspellings if relevant
- Keep list focused (10-20 competitors maximum)

See [Brand Configuration](../brands/) for detailed strategies.
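
To illustrate why word-boundary matching matters for alias lists, here is a minimal sketch using Python's `re` module. It is an approximation of the idea, not the tool's actual detection code; case-insensitive matching and the `mentions_brand` name are assumptions.

```python
import re


def mentions_brand(text: str, alias: str) -> bool:
    """Match the alias only as a whole word, not inside a longer word.

    Assumes the alias starts and ends with a letter or digit, so that
    the \\b anchors on both sides behave as expected.
    """
    pattern = r"\b" + re.escape(alias) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(mentions_brand("Try Apollo.io for prospecting.", "Apollo.io"))  # True
print(mentions_brand("An Apollonian design ethos.", "Apollo"))        # False
```

`re.escape` keeps the dot in `Apollo.io` literal, so the alias does not accidentally match strings such as `Apollo-io`.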

### Intents

Questions you want to ask LLMs to test brand visibility.

**Key fields:**

- `id`: Unique identifier (alphanumeric, hyphens, underscores)
- `prompt`: Natural language question to ask
- `operations`: Optional post-query operations (see [Operations](../operations/))

**Example:**

```yaml
intents:
  - id: "best-email-warmup-tools"
    prompt: "What are the best email warmup tools?"

  - id: "email-warmup-comparison"
    prompt: "Compare the top email warmup tools for improving deliverability"

  - id: "hubspot-alternatives"
    prompt: "What are the best alternatives to HubSpot for small sales teams?"
```

Intent Prompt Design

Good prompts are:

- **Natural**: How a real user would ask
- **Specific**: Target a clear use case
- **Buyer-focused**: Imply purchase intent
- **Ranking-friendly**: Ask for "best" or "top" tools

Bad prompts:

- Too generic: "Tell me about CRM tools"
- No ranking signal: "What is HubSpot?"
- Biased: "Why is MyBrand better than CompetitorA?"

### Global Operations (Optional)

Operations that run for **every** intent across all models.

**Use cases:**

- Quality scoring for all LLM responses
- Sentiment analysis
- Content gap detection
- Consistent post-processing

**Example:**

```yaml
global_operations:
  - id: "quality-score"
    description: "Rate LLM response quality"
    prompt: |
      Rate this LLM response on a scale of 1-10 for accuracy and completeness:

      Question: {intent:prompt}
      Response: {intent:response}

      Provide a single number score (1-10) and brief justification.
    model: "gpt-4o-mini"
    enabled: true
```
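
The `{intent:prompt}` and `{intent:response}` placeholders in the operation prompt are filled in per query. As a rough sketch of how such a template could be rendered (the tool's real rendering logic may differ, and `render_operation_prompt` is a hypothetical name):

```python
import re


def render_operation_prompt(template: str, values: dict[str, str]) -> str:
    """Replace {intent:<name>} placeholders in a single pass.

    Unknown placeholders are left untouched rather than raising.
    """
    def substitute(match: re.Match[str]) -> str:
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\{intent:(\w+)\}", substitute, template)


template = "Question: {intent:prompt}\nResponse: {intent:response}"
print(render_operation_prompt(
    template,
    {"prompt": "What are the best email warmup tools?", "response": "1. ..."},
))
```

A single-pass substitution avoids re-expanding placeholder-like text that happens to appear inside an LLM response.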

Global vs Intent-Specific Operations

Use **global operations** for:

- Consistent quality checks
- Standard metrics across all intents
- Cost-effective batch analysis

Use **intent-specific operations** for:

- Detailed competitive analysis
- Context-specific insights
- Intent-dependent workflows

## Configuration Validation

Validate your configuration before running:

```bash
llm-answer-watcher validate --config watcher.config.yaml
```

**Common validation errors:**

1. **Missing API keys**: Environment variable not set
1. **Duplicate intent IDs**: Intent IDs must be unique
1. **Invalid provider**: Unsupported provider name
1. **Empty brand list**: At least one brand in `mine` required
1. **Invalid intent ID**: Must be alphanumeric with hyphens/underscores

Validation Output

```text
✅ Configuration valid
├── 3 intents configured
├── 2 models configured (OpenAI, Anthropic)
├── 2 brands monitored
├── 5 competitors tracked
└── Estimated cost: $0.0142 per run
```
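
Several of the validation errors listed above can be reproduced on an already-parsed configuration dictionary. This is a simplified sketch under the assumption that the config has been loaded into plain Python dicts and lists; it is not the tool's validator, and `find_config_problems` is a hypothetical name.

```python
from collections import Counter

# Provider names as listed in this guide.
SUPPORTED_PROVIDERS = {"openai", "anthropic", "mistral", "grok", "google", "perplexity"}


def find_config_problems(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: list[str] = []

    # Duplicate intent IDs
    ids = [intent.get("id", "") for intent in config.get("intents", [])]
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    if duplicates:
        problems.append(f"Duplicate intent IDs: {', '.join(duplicates)}")

    # Empty brand list
    if not config.get("brands", {}).get("mine"):
        problems.append("At least one brand in `mine` is required")

    # Invalid provider
    for model in config.get("run_settings", {}).get("models", []):
        provider = model.get("provider")
        if provider not in SUPPORTED_PROVIDERS:
            problems.append(f"Unsupported provider: {provider}")

    return problems


broken = {
    "run_settings": {"models": [{"provider": "openai-gpt4"}]},
    "brands": {"mine": []},
    "intents": [{"id": "best-tools"}, {"id": "best-tools"}],
}
for problem in find_config_problems(broken):
    print(problem)
```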

## Configuration Best Practices

### 1. Start Small

Begin with one model and a few intents:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # Cheapest option
    env_api_key: "OPENAI_API_KEY"

intents:
  - id: "primary-intent"
    prompt: "Your most important question"
```

### 2. Use Budget Controls

Prevent unexpected costs:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 1.00
    warn_threshold_usd: 0.50
```

### 3. Keep Brand Lists Focused

Track 10-20 competitors maximum:

```yaml
brands:
  mine:
    - "YourBrand"      # Exact name
    - "YourBrand.io"   # With TLD

  competitors:
    # Top 5 direct competitors
    - "CompetitorA"
    - "CompetitorB"
    # Top 3 category leaders
    - "MarketLeader"
```

### 4. Design Intent Prompts Carefully

Ask natural questions with ranking signals:

```yaml
intents:
  # Good: Natural, specific, implies ranking
  - id: "best-crm-for-startups"
    prompt: "What are the best CRM tools for early-stage startups?"

  # Bad: Generic, no ranking signal
  - id: "crm-info"
    prompt: "Tell me about CRM software"
```

### 5. Use System Prompts

Customize model behavior with system prompts:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/gpt-4-default"  # Uses default prompt
```

System prompts are stored in `llm_answer_watcher/system_prompts/{provider}/{prompt_name}.json`.

### 6. Enable Web Search for Fresh Data

Use web search for queries needing current information:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

Web Search Costs

Web search adds $10-$25 per 1,000 calls depending on the model. See [Web Search Configuration](../web-search/).

### 7. Version Your Config

Track configuration changes with git:

```bash
git add watcher.config.yaml
git commit -m "feat: add new competitor tracking"
```

This creates an audit trail of what you were monitoring when.

## Configuration Examples

### Production Monitoring

Multi-model, budget-controlled, comprehensive tracking:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    - provider: "anthropic"
      model_name: "claude-3-5-haiku-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

    - provider: "perplexity"
      model_name: "sonar-pro"
      env_api_key: "PERPLEXITY_API_KEY"

  budget:
    enabled: true
    max_per_run_usd: 5.00
    warn_threshold_usd: 2.50

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
  method: "function_calling"
  fallback_to_regex: true
  min_confidence: 0.7

brands:
  mine:
    - "MyBrand"
    - "MyBrand.io"

  competitors:
    - "TopCompetitor"
    - "MainRival"
    - "IndustryLeader"

intents:
  - id: "best-tools-general"
    prompt: "What are the best [category] tools?"

  - id: "best-tools-startups"
    prompt: "What are the best [category] tools for startups?"

  - id: "best-tools-enterprise"
    prompt: "What are the best [category] tools for enterprises?"
```

### Development Testing

Minimal config for fast iteration:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

  budget:
    enabled: true
    max_per_run_usd: 0.10

brands:
  mine:
    - "TestBrand"

  competitors:
    - "CompetitorA"

intents:
  - id: "test-intent"
    prompt: "What are the best tools for testing?"
```

### CI/CD Regression Testing

Automated monitoring with strict controls:

```yaml
run_settings:
  output_dir: "./ci-output"
  sqlite_db_path: "./ci-output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

  budget:
    enabled: true
    max_per_run_usd: 0.50

brands:
  mine:
    - "MyBrand"

  competitors:
    - "TopCompetitor"

intents:
  - id: "regression-test"
    prompt: "What are the best [category] tools?"
```

## Configuration File Location

LLM Answer Watcher looks for configuration in these locations (in order):

1. Path specified with `--config` flag
1. `watcher.config.yaml` in current directory
1. `~/.config/llm-answer-watcher/config.yaml`

**Best practice**: Keep config files in your project directory and specify explicitly:

```bash
llm-answer-watcher run --config watcher.config.yaml
```

## Environment Variables

### API Keys

All API keys are loaded from environment variables for security:

```bash
# OpenAI
export OPENAI_API_KEY=sk-your-key-here

# Anthropic
export ANTHROPIC_API_KEY=sk-ant-your-key-here

# Mistral
export MISTRAL_API_KEY=your-mistral-key-here

# X.AI Grok
export XAI_API_KEY=xai-your-key-here

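# Google Gemini (the name must match `env_api_key` in your config; some examples in this guide use GEMINI_API_KEY instead)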
export GOOGLE_API_KEY=AIza-your-key-here

# Perplexity
export PERPLEXITY_API_KEY=pplx-your-key-here
```

### Configuration Overrides

Override config values with environment variables:

```bash
# Override output directory
export LLM_WATCHER_OUTPUT_DIR="./custom-output"

# Override database path
export LLM_WATCHER_DB_PATH="./custom.db"

# Disable budget checks
export LLM_WATCHER_BUDGET_ENABLED=false
```
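
Conceptually, these overrides replace individual `run_settings` values after the YAML file is loaded. The sketch below shows one plausible mapping; which field each variable targets and how booleans are parsed are assumptions, and `apply_env_overrides` is a hypothetical helper.

```python
from collections.abc import Mapping


def apply_env_overrides(run_settings: dict, environ: Mapping[str, str]) -> dict:
    """Return a copy of run_settings with LLM_WATCHER_* overrides applied."""
    merged = {**run_settings, "budget": dict(run_settings.get("budget", {}))}

    if "LLM_WATCHER_OUTPUT_DIR" in environ:
        merged["output_dir"] = environ["LLM_WATCHER_OUTPUT_DIR"]
    if "LLM_WATCHER_DB_PATH" in environ:
        merged["sqlite_db_path"] = environ["LLM_WATCHER_DB_PATH"]
    if "LLM_WATCHER_BUDGET_ENABLED" in environ:
        # Assumed parsing: "1", "true", and "yes" enable; anything else disables.
        value = environ["LLM_WATCHER_BUDGET_ENABLED"].strip().lower()
        merged["budget"]["enabled"] = value in {"1", "true", "yes"}
    return merged


settings = {"output_dir": "./output", "budget": {"enabled": True, "max_per_run_usd": 1.0}}
print(apply_env_overrides(settings, {"LLM_WATCHER_BUDGET_ENABLED": "false"}))
```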

## Security Considerations

### Never Commit API Keys

Add `.env` files to `.gitignore`:

```text
# .gitignore
.env
.env.local
*.env
watcher.config.local.yaml
```

### Use Environment-Specific Configs

Create separate configs for each environment:

```text
configs/
├── watcher.config.dev.yaml      # Development
├── watcher.config.staging.yaml  # Staging
└── watcher.config.prod.yaml     # Production
```

Load the appropriate config:

```bash
llm-answer-watcher run --config configs/watcher.config.prod.yaml
```

### Rotate API Keys Regularly

Update API keys in your environment:

```bash
# Update key
export OPENAI_API_KEY=sk-new-key-here

# Verify it works
llm-answer-watcher validate --config watcher.config.yaml
```

## Troubleshooting

### Configuration Validation Fails

**Problem**: `Configuration error: Invalid YAML syntax`

**Solution**: Check YAML syntax with a validator:

```bash
python -c "import yaml; yaml.safe_load(open('watcher.config.yaml'))"
```

Common YAML errors:

- Inconsistent indentation (use 2 spaces)
- Missing colons after keys
- Unquoted strings with special characters
- Mixing tabs and spaces

______________________________________________________________________

**Problem**: `API key not found: OPENAI_API_KEY`

**Solution**: Set the environment variable:

```bash
export OPENAI_API_KEY=sk-your-key-here
```

Verify it's set:

```bash
# Confirms the variable is set without printing the key itself
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set"
```

______________________________________________________________________

**Problem**: `Duplicate intent IDs found: best-tools`

**Solution**: Make each intent ID unique:

```yaml
intents:
  - id: "best-tools-general"      # Changed from "best-tools"
    prompt: "What are the best tools?"

  - id: "best-tools-startups"     # Changed from "best-tools"
    prompt: "What are the best tools for startups?"
```

### Output Directory Issues

**Problem**: `Cannot write to output directory: Permission denied`

**Solution**: Check directory permissions:

```bash
mkdir -p ./output
chmod 755 ./output
```

Or change to a directory you own:

```yaml
run_settings:
  output_dir: "~/llm-watcher-output"
```

______________________________________________________________________

**Problem**: `SQLite database is locked`

**Solution**: Ensure no other processes are using the database:

```bash
# Check for locks
lsof ./output/watcher.db

# Stop the blocking process if it is safe to do so
# (plain kill sends SIGTERM; reserve kill -9 for a process that will not exit)
kill <PID>
```

Or use a separate database:

```yaml
run_settings:
  # YAML is not shell-expanded, so $(date +%s) would be taken literally; use a fixed, distinct path
  sqlite_db_path: "./output/watcher-alt.db"
```
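
If `lsof` is not available, a lock can also be probed from Python with the standard `sqlite3` module. This is a diagnostic sketch, not something the tool ships: it tries to take a write lock briefly and reports whether another connection is holding one.

```python
import os
import sqlite3


def is_locked(db_path: str, wait_seconds: float = 0.5) -> bool:
    """Return True if another connection currently holds a write lock."""
    if not os.path.exists(db_path):
        return False  # a database that does not exist yet cannot be locked

    # isolation_level=None lets us issue BEGIN/ROLLBACK explicitly.
    conn = sqlite3.connect(db_path, timeout=wait_seconds, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # fails if a write lock is already held
        conn.execute("ROLLBACK")
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc).lower():
            return True
        raise
    finally:
        conn.close()
```

For example, `is_locked("./output/watcher.db")` returning `True` suggests another run or a database browser still has the file open for writing.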

### Model Configuration Issues

**Problem**: `Unsupported provider: openai-gpt4`

**Solution**: Use correct provider names:

```yaml
# ❌ Wrong
provider: "openai-gpt4"

# ✅ Correct
provider: "openai"
model_name: "gpt-4o-mini"
```

Supported providers: `openai`, `anthropic`, `mistral`, `grok`, `google`, `perplexity`

______________________________________________________________________

**Problem**: `Model not found: gpt-4o-mini-turbo`

**Solution**: Use valid model names:

```yaml
# ❌ Wrong (doesn't exist)
model_name: "gpt-4o-mini-turbo"

# ✅ Correct
model_name: "gpt-4o-mini"
```

Check [Model Configuration](../models/) for valid model names.

## Next Steps

Now that you understand the configuration structure, dive into specific sections:

- **[Model Configuration](../models/)**: Choose the right models for your use case
- **[Brand Configuration](../brands/)**: Optimize brand detection strategies
- **[Intent Configuration](../intents/)**: Design effective prompts
- **[Budget Configuration](../budget/)**: Control costs and prevent overruns
- **[Web Search Configuration](../web-search/)**: Enable real-time information retrieval
- **[Operations Configuration](../operations/)**: Automate post-query analysis

Or jump to usage guides:

- **[CLI Commands](../../usage/cli-commands/)**: Run your first monitoring job
- **[Output Modes](../../usage/output-modes/)**: Understand output formats
- **[Automation](../../usage/automation/)**: Set up scheduled monitoring

# Model Configuration

Model configuration controls which LLMs to query and how they're accessed. LLM Answer Watcher supports multiple providers with unified configuration.

## Supported Providers

| Provider       | Models Available                                   | Pricing             | Best For                         |
| -------------- | -------------------------------------------------- | ------------------- | -------------------------------- |
| **OpenAI**     | gpt-4o-mini, gpt-4o, gpt-4-turbo                   | $0.15-$10/1M tokens | Fast, cost-effective, production |
| **Anthropic**  | claude-3-5-haiku, claude-3-5-sonnet, claude-3-opus | $0.80-$75/1M tokens | High-quality reasoning           |
| **Mistral**    | mistral-large, mistral-medium, mistral-small       | $2-$8/1M tokens     | European compliance              |
| **X.AI Grok**  | grok-beta, grok-2-1212, grok-3                     | $2-$25/1M tokens    | Real-time X integration          |
| **Google**     | gemini-2.0-flash, gemini-1.5-pro                   | $0.075-$7/1M tokens | Multimodal, fast                 |
| **Perplexity** | sonar, sonar-pro, sonar-reasoning                  | $1-$15/1M tokens    | Web-grounded answers             |

## Basic Model Configuration

### Single Model Setup

Minimal configuration with one model:

```yaml
run_settings:
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"
```

**Required fields:**

- `provider`: Provider name (see supported providers above)
- `model_name`: Specific model identifier
- `env_api_key`: Environment variable name containing API key

### Multi-Model Setup

Query multiple models for comparison:

```yaml
run_settings:
  models:
    # Fast and cheap
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    # High quality
    - provider: "anthropic"
      model_name: "claude-3-5-sonnet-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

    # Web-grounded
    - provider: "perplexity"
      model_name: "sonar-pro"
      env_api_key: "PERPLEXITY_API_KEY"
```

Multi-Model Benefits

Querying multiple models helps you:

- **Compare providers**: See which LLMs favor your brand
- **Reduce variance**: Average rankings across models
- **Hedge risk**: Don't depend on one provider's algorithm
- **Track trends**: Monitor provider-specific changes over time

## Provider-Specific Configuration

### OpenAI

**Supported models:**

- `gpt-4o-mini`: Fast, cheap, production-ready ($0.15/$0.60 per 1M input/output tokens)
- `gpt-4o`: High quality, balanced cost ($2.50/$10 per 1M tokens)
- `gpt-4-turbo`: Fast GPT-4, good for complex tasks ($10/$30 per 1M tokens)
- `gpt-3.5-turbo`: Legacy, very cheap ($0.50/$1.50 per 1M tokens)

**Basic configuration:**

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
```

**With custom system prompt:**

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/gpt-4-default"
```

**With web search enabled:**

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

Web Search Costs

OpenAI web search adds $10-$25 per 1,000 calls. See [Web Search Configuration](../web-search/).

**API key setup:**

```bash
export OPENAI_API_KEY=sk-your-openai-key-here
```

Get your API key from: [platform.openai.com/api-keys](https://platform.openai.com/api-keys)

______________________________________________________________________

### Anthropic (Claude)

**Supported models:**

- `claude-3-5-haiku-20241022`: Fast, cheap, smart ($0.80/$4 per 1M tokens)
- `claude-3-5-sonnet-20241022`: Balanced quality/cost ($3/$15 per 1M tokens)
- `claude-3-opus-20240229`: Highest quality ($15/$75 per 1M tokens)

**Basic configuration:**

```yaml
models:
  - provider: "anthropic"
    model_name: "claude-3-5-haiku-20241022"
    env_api_key: "ANTHROPIC_API_KEY"
```

**With custom system prompt:**

```yaml
models:
  - provider: "anthropic"
    model_name: "claude-3-5-sonnet-20241022"
    env_api_key: "ANTHROPIC_API_KEY"
    system_prompt: "anthropic/default"
```

**API key setup:**

```bash
export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
```

Get your API key from: [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)

Claude Strengths

Claude models excel at:

- **Nuanced reasoning**: Better at understanding context
- **Longer responses**: More comprehensive answers
- **Safety**: Strong content moderation
- **Instruction following**: Precise adherence to prompts

______________________________________________________________________

### Mistral

**Supported models:**

- `mistral-large-latest`: Flagship model ($2/$6 per 1M tokens)
- `mistral-medium-latest`: Balanced ($2.50/$7.50 per 1M tokens)
- `mistral-small-latest`: Fast and cheap ($0.20/$0.60 per 1M tokens)

**Basic configuration:**

```yaml
models:
  - provider: "mistral"
    model_name: "mistral-large-latest"
    env_api_key: "MISTRAL_API_KEY"
```

**API key setup:**

```bash
export MISTRAL_API_KEY=your-mistral-api-key-here
```

Get your API key from: [console.mistral.ai/api-keys](https://console.mistral.ai/api-keys)

Mistral Strengths

Mistral models are ideal for:

- **European compliance**: GDPR-friendly European provider
- **Multilingual**: Strong performance in French, German, Spanish
- **Cost efficiency**: Competitive pricing
- **Open weights**: Some models have open weights available

______________________________________________________________________

### X.AI (Grok)

**Supported models:**

- `grok-beta`: Beta access model ($2/$10 per 1M tokens)
- `grok-2-1212`: Latest stable version ($2/$10 per 1M tokens)
- `grok-2-latest`: Always latest version ($2/$10 per 1M tokens)
- `grok-3`: Next-generation model ($5/$25 per 1M tokens)
- `grok-3-mini`: Fast, lightweight ($2/$8 per 1M tokens)

**Basic configuration:**

```yaml
models:
  - provider: "grok"
    model_name: "grok-2-1212"
    env_api_key: "XAI_API_KEY"
```

**API key setup:**

```bash
export XAI_API_KEY=xai-your-grok-key-here
```

Get your API key from: [console.x.ai](https://console.x.ai)

Grok Strengths

Grok models offer:

- **X platform integration**: Real-time data from X (Twitter)
- **OpenAI compatibility**: Drop-in replacement for OpenAI API
- **Current events**: Up-to-date information
- **Humor**: Unique personality in responses

______________________________________________________________________

### Google (Gemini)

**Supported models:**

| Model                   | Cost (Input/Output) | Grounding       | Best For                     |
| ----------------------- | ------------------- | --------------- | ---------------------------- |
| `gemini-2.5-flash`      | $0.04/$0.12 per 1M  | ✅ Yes          | **Recommended** - production |
| `gemini-2.5-flash-lite` | $0.02/$0.06 per 1M  | ❌ No           | High-volume, non-grounded    |
| `gemini-2.5-pro`        | $0.60/$1.80 per 1M  | ✅ Yes          | Highest quality              |
| `gemini-2.0-flash-exp`  | $0.075/$0.30 per 1M | ⚠️ Experimental | Testing                      |
| `gemini-1.5-pro`        | $1.25/$5 per 1M     | ❌ No           | Legacy (not recommended)     |

**Basic configuration** (without grounding):

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash-lite"
    env_api_key: "GEMINI_API_KEY"
```

**With Google Search grounding** (recommended for brand monitoring):

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"
    tools:
      - google_search: {}  # Enable Google Search
```

**API key setup:**

```bash
export GEMINI_API_KEY=AIza-your-google-api-key-here
```

Get your API key from: [aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)

Gemini Strengths

Gemini models excel at:

- **Google Search grounding**: Real-time web data with no per-request fees
- **Speed**: Very fast inference
- **Cost**: Most cost-effective for web-grounded queries
- **Multimodal**: Built for text, image, video, audio
- **Long context**: Up to 2M token context window

Configuration Format Difference

Google uses `google_search: {}` (dictionary format) while OpenAI uses `type: "web_search"` (typed format). This reflects different provider API specifications. See [Google provider docs](../../../providers/google/) for details.
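
To make the format difference concrete, the sketch below treats a model entry as a parsed dictionary and checks for either tool style. It only mirrors the two YAML shapes shown in this guide; `has_web_search` is a hypothetical helper, not part of the tool.

```python
def has_web_search(model_config: dict) -> bool:
    """Detect web search in either the OpenAI-style or Google-style tool entry."""
    for tool in model_config.get("tools", []):
        if tool.get("type") == "web_search":  # OpenAI: typed format
            return True
        if "google_search" in tool:           # Google: dictionary format
            return True
    return False


openai_model = {"provider": "openai", "tools": [{"type": "web_search"}]}
google_model = {"provider": "google", "tools": [{"google_search": {}}]}
plain_model = {"provider": "mistral"}

print(has_web_search(openai_model), has_web_search(google_model), has_web_search(plain_model))
# True True False
```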

______________________________________________________________________

### Perplexity

**Supported models:**

- `sonar`: Fast, web-grounded ($1/$1 per 1M tokens + request fees)
- `sonar-pro`: High-quality grounded ($3/$15 per 1M tokens + request fees)
- `sonar-reasoning`: Enhanced reasoning ($1/$5 per 1M tokens + request fees)
- `sonar-reasoning-pro`: Best reasoning ($3/$15 per 1M tokens + request fees)
- `sonar-deep-research`: In-depth research ($3/$15 per 1M tokens + request fees)

**Basic configuration:**

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

**API key setup:**

```bash
export PERPLEXITY_API_KEY=pplx-your-perplexity-key-here
```

Get your API key from: [perplexity.ai/settings/api](https://www.perplexity.ai/settings/api)

Perplexity Request Fees

Perplexity charges additional **request fees** based on search context:

- Basic searches: ~$0.005 per request
- Complex searches: ~$0.01-$0.03 per request

These fees are **not yet included** in cost estimates. Budget accordingly.
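
Because request fees are excluded from the built-in estimate, it can be useful to add them by hand when budgeting. A minimal sketch of that arithmetic; the token counts are illustrative guesses and the function is not part of the tool.

```python
def perplexity_query_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    request_fee_usd: float,
) -> float:
    """Token cost plus the flat per-request fee, in USD."""
    token_cost = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return token_cost + request_fee_usd


# sonar-pro at $3/$15 per 1M tokens, assuming 100 input and 500 output tokens
# and a basic-search fee of about $0.005.
cost = perplexity_query_cost(100, 500, 3.00, 15.00, 0.005)
print(f"${cost:.4f}")  # $0.0128
```

In this example the request fee is roughly 40% of the total, so leaving it out would noticeably understate the budget.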

Perplexity Strengths

Perplexity models offer:

- **Web grounding**: All answers cite web sources
- **Fresh data**: Real-time web search
- **Citations**: Transparent source attribution
- **Research mode**: Deep-dive analysis

## Advanced Model Configuration

### Custom System Prompts

System prompts customize model behavior. LLM Answer Watcher includes default prompts for each provider.

**Using default provider prompt:**

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    # Uses openai/default.json automatically
```

**Using custom prompt:**

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/extraction-default"
```

**Prompt file structure:**

System prompts are stored in `llm_answer_watcher/system_prompts/{provider}/{prompt_name}.json`:

```json
{
  "role": "system",
  "content": "You are a helpful assistant that provides accurate, comprehensive answers to user questions about software tools and services. When asked for recommendations, provide a balanced view of multiple options with their strengths and weaknesses."
}
```

**Creating custom prompts:**

1. Create a new prompt file in the provider directory
1. Reference it in your configuration
1. Test with validation:

```bash
llm-answer-watcher validate --config watcher.config.yaml
```

System Prompt Best Practices

- **Be specific**: Clear instructions produce better results
- **Stay neutral**: Don't bias toward your brand
- **Request structure**: Ask for ranked lists, numbered items
- **Test variations**: Try different prompts, measure impact

### Temperature and Sampling

Control response randomness (some providers only):

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    temperature: 0.7  # 0.0 = deterministic, 1.0 = creative
    top_p: 0.9        # Nucleus sampling
```

Temperature Guide

- **0.0-0.3**: Deterministic, consistent answers (recommended for monitoring)
- **0.4-0.7**: Balanced creativity and consistency
- **0.8-1.0**: Creative, varied responses (not recommended for tracking)

### Max Tokens

Limit response length:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    max_tokens: 1000  # Limit to ~750 words
```

Max Tokens and Cost

Setting `max_tokens` limits output cost but may truncate responses. For monitoring, allow enough tokens for complete answers (500-2000 recommended).
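
The effect of `max_tokens` on cost can be bounded with simple arithmetic: the output cost of one query cannot exceed `max_tokens` times the output price. A short worked example (the helper name is illustrative):

```python
def max_output_cost_usd(max_tokens: int, output_price_per_m: float) -> float:
    """Upper bound on the output-token cost of a single query, in USD."""
    return max_tokens * output_price_per_m / 1_000_000


# gpt-4o-mini output price: $0.60 per 1M tokens
for limit in (500, 1000, 2000):
    print(limit, f"${max_output_cost_usd(limit, 0.60):.4f}")
```

With `max_tokens: 1000` on `gpt-4o-mini`, the output side of a query is capped at $0.0006, so truncation risk usually matters more than the savings from a tighter limit.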

### Tools and Function Calling

Enable tools like web search:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"  # or "required", "none"
```

**Tool choice options:**

- `auto`: Model decides when to use tools (recommended)
- `required`: Model must use tools for every query
- `none`: Disable tools for this query

See [Web Search Configuration](../web-search/) for details.

## Model Selection Strategies

### Cost-Optimized

Minimize costs with cheap models:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # $0.15/$0.60 per 1M tokens
    env_api_key: "OPENAI_API_KEY"

  - provider: "google"
    model_name: "gemini-2.0-flash-exp"  # $0.075/$0.30 per 1M tokens
    env_api_key: "GOOGLE_API_KEY"
```

**Estimated cost per run** (3 intents): ~$0.003-$0.005

**Use when:**

- Running frequent monitoring (hourly/daily)
- Testing configuration changes
- Limited budget
- High query volume

### Quality-Optimized

Best accuracy with premium models:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o"  # $2.50/$10 per 1M tokens
    env_api_key: "OPENAI_API_KEY"

  - provider: "anthropic"
    model_name: "claude-3-5-sonnet-20241022"  # $3/$15 per 1M tokens
    env_api_key: "ANTHROPIC_API_KEY"
```

**Estimated cost per run** (3 intents): ~$0.05-$0.10

**Use when:**

- Weekly/monthly executive reports
- Competitive intelligence deep-dives
- High-stakes positioning decisions
- Complex queries requiring reasoning

### Balanced

Mix of cost and quality:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # Fast, cheap baseline
    env_api_key: "OPENAI_API_KEY"

  - provider: "anthropic"
    model_name: "claude-3-5-haiku-20241022"  # Quality check
    env_api_key: "ANTHROPIC_API_KEY"

  - provider: "perplexity"
    model_name: "sonar-pro"  # Web-grounded
    env_api_key: "PERPLEXITY_API_KEY"
```

**Estimated cost per run** (3 intents): ~$0.02-$0.04

**Use when:**

- Regular monitoring (daily/weekly)
- Comparing provider perspectives
- Balanced budget
- Production use cases

### Fresh Data

Web-grounded models for current information:

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"

  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

**Use when:**

- Monitoring recent product launches
- Tracking current events impact
- Detecting real-time ranking changes
- Competitive news monitoring

### Regional Compliance

Models for specific regulatory requirements:

```yaml
models:
  # European providers for GDPR
  - provider: "mistral"
    model_name: "mistral-large-latest"
    env_api_key: "MISTRAL_API_KEY"

  # Baseline comparison
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
```

**Use when:**

- GDPR compliance required
- Data residency requirements
- Regional preference testing

## Model Pricing Comparison

Current pricing as of November 2024:

| Model             | Input (per 1M tokens) | Output (per 1M tokens) | Cost per Query\* |
| ----------------- | --------------------- | ---------------------- | ---------------- |
| gpt-4o-mini       | $0.15                 | $0.60                  | $0.0004          |
| gpt-4o            | $2.50                 | $10.00                 | $0.0056          |
| claude-3-5-haiku  | $0.80                 | $4.00                  | $0.0022          |
| claude-3-5-sonnet | $3.00                 | $15.00                 | $0.0090          |
| mistral-large     | $2.00                 | $6.00                  | $0.0040          |
| grok-2-1212       | $2.00                 | $10.00                 | $0.0054          |
| gemini-2.0-flash  | $0.075                | $0.30                  | $0.0002          |
| sonar-pro         | $3.00                 | $15.00                 | $0.0090\*\*      |

\* Assumes ~150 input tokens + ~500 output tokens per query

\*\* Plus request fees (~$0.005-$0.03 per query)
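
The per-query figures follow from a simple formula: token counts multiplied by the per-million-token price, summed over input and output. The sketch below assumes the token counts from the footnote; `estimate_query_cost` and `PRICES` are illustrative names, not part of LLM Answer Watcher. The table's figures are approximate, so results may differ slightly from its last column.

```python
def estimate_query_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Return the estimated USD cost of a single query."""
    return (
        input_tokens * input_price_per_1m + output_tokens * output_price_per_1m
    ) / 1_000_000


# (input, output) prices in USD per 1M tokens, copied from the table.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-sonnet": (3.00, 15.00),
}

for model, (input_price, output_price) in PRICES.items():
    cost = estimate_query_cost(150, 500, input_price, output_price)
    print(f"{model}: ${cost:.4f} per query")
```

Multiply the result by the number of intents and models to approximate the cost of a full run.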

Dynamic Pricing

LLM Answer Watcher automatically loads current pricing from [llm-prices.com](https://www.llm-prices.com) with 24-hour caching. Prices may change.

Check current pricing:

```bash
llm-answer-watcher prices show
```

## Extraction Model Configuration

Use a dedicated model for extraction (faster, cheaper than querying main models):

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"  # Fast, cheap model
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/extraction-default"

  method: "function_calling"
  fallback_to_regex: true
  min_confidence: 0.7
```

**Benefits:**

- **Cost savings**: Use cheap model for extraction
- **Speed**: Fast models for quick parsing
- **Separation**: Main models for quality, extraction model for structure
- **Accuracy**: Function calling more accurate than regex

**Recommended extraction models:**

- `gpt-4o-mini`: Best balance of speed, cost, accuracy
- `gpt-4.1-nano`: Ultra-fast, ultra-cheap (OpenAI only)
- `gemini-2.0-flash-exp`: Very fast, very cheap
- `claude-3-5-haiku-20241022`: High accuracy, reasonable cost

See [Function Calling](../../features/function-calling/) for details.

## Multi-Model Comparison Strategies

### A/B Testing

Compare two providers:

```yaml
models:
  # Variant A: OpenAI
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  # Variant B: Anthropic
  - provider: "anthropic"
    model_name: "claude-3-5-haiku-20241022"
    env_api_key: "ANTHROPIC_API_KEY"
```

Analyze results:

```sql
-- Compare brand mentions by provider
SELECT
    model_provider,
    COUNT(*) as total_mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY model_provider;
```

### Provider Diversity

Query multiple providers for comprehensive coverage:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  - provider: "anthropic"
    model_name: "claude-3-5-haiku-20241022"
    env_api_key: "ANTHROPIC_API_KEY"

  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"

  - provider: "google"
    model_name: "gemini-2.0-flash-exp"
    env_api_key: "GOOGLE_API_KEY"
```

**Benefits:**

- Reduce algorithm dependence
- Hedge against provider changes
- Capture diverse perspectives
- Build comprehensive dataset

### Model Size Comparison

Compare model sizes within a provider:

```yaml
models:
  # Small model
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  # Large model
  - provider: "openai"
    model_name: "gpt-4o"
    env_api_key: "OPENAI_API_KEY"
```

Analyze cost vs. quality trade-offs:

```sql
-- Compare cost and mention rates by model
SELECT
    model_name,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost_per_query,
    SUM(CASE WHEN brand IN (SELECT * FROM mine_brands) THEN 1 ELSE 0 END) as my_brand_mentions
FROM answers_raw
GROUP BY model_name;
```

## Troubleshooting

### API Key Issues

**Problem**: `API key not found: OPENAI_API_KEY`

**Solution**: Set the environment variable:

```bash
export OPENAI_API_KEY=sk-your-key-here
```

Verify:

```bash
echo $OPENAI_API_KEY
llm-answer-watcher validate --config watcher.config.yaml
```

______________________________________________________________________

**Problem**: `Invalid API key for provider openai`

**Solution**: Check API key format and validity:

```bash
# Test with curl (OpenAI)
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Test with curl (Anthropic)
curl https://api.anthropic.com/v1/models \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"
```

Get a new API key from your provider's console.

### Model Not Found

**Problem**: `Model not found: gpt-4-mini`

**Solution**: Use correct model name:

```yaml
# ❌ Wrong (doesn't exist)
model_name: "gpt-4-mini"

# ✅ Correct
model_name: "gpt-4o-mini"
```

Check [provider documentation](https://platform.openai.com/docs/models) for valid models.

### Rate Limiting

**Problem**: `Rate limit exceeded for provider openai`

**Solution**: LLM Answer Watcher automatically retries with exponential backoff. If persistent:

1. Upgrade to higher rate limits (pay-as-you-go tier)
1. Reduce concurrent queries
1. Add delays between queries:

```yaml
run_settings:
  rate_limit_delay_seconds: 1  # Delay between queries
```
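
For intuition, exponential backoff can be sketched as follows. This is not LLM Answer Watcher's actual retry code: `RateLimitError`, `call_with_backoff`, and the default delays are assumptions made for illustration.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # The delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Up to 10% random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the function can be exercised without real waiting.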

### Cost Overruns

**Problem**: Unexpectedly high costs

**Solution**: Enable budget controls:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 1.00
    warn_threshold_usd: 0.50
```

Check estimated costs before running:

```bash
llm-answer-watcher run --config watcher.config.yaml --dry-run
```

See [Budget Configuration](../budget/) for details.
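
The two thresholds can be read as a three-way decision. The sketch below assumes that a run is blocked above `max_per_run_usd` and only flagged above `warn_threshold_usd`; the `Budget` class and `check_budget` function are hypothetical and may not match the tool's exact behavior.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Mirrors the YAML keys shown above (assumed semantics)."""
    enabled: bool = True
    max_per_run_usd: float = 1.00
    warn_threshold_usd: float = 0.50


def check_budget(estimated_cost_usd: float, budget: Budget) -> str:
    """Return 'ok', 'warn', or 'block' for an estimated run cost."""
    if not budget.enabled:
        return "ok"
    if estimated_cost_usd > budget.max_per_run_usd:
        return "block"  # over the hard limit: do not start the run
    if estimated_cost_usd > budget.warn_threshold_usd:
        return "warn"   # under the limit, but worth a warning
    return "ok"


for cost in (0.10, 0.75, 1.50):
    print(f"${cost:.2f} -> {check_budget(cost, Budget())}")
```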

## Best Practices

### 1. Start with One Model

Begin with a single cheap model:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
```

Validate your configuration, then expand to multiple models.

### 2. Use Cost-Optimized Models for Frequent Runs

Daily/hourly monitoring:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # ~$0.0004 per query
    env_api_key: "OPENAI_API_KEY"
```

Weekly reports:

```yaml
models:
  - provider: "anthropic"
    model_name: "claude-3-5-sonnet-20241022"  # ~$0.009 per query
    env_api_key: "ANTHROPIC_API_KEY"
```

### 3. Enable Web Search for Fresh Data

When tracking current events:

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

Or:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

### 4. Separate Extraction Models

Use a dedicated model for extraction:

```yaml
# Main models for quality answers
run_settings:
  models:
    - provider: "anthropic"
      model_name: "claude-3-5-sonnet-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

# Cheap model for extraction
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
  method: "function_calling"
```

### 5. Version Control Model Configs

Track model changes in git:

```bash
git add watcher.config.yaml
git commit -m "feat: add Claude 3.5 Sonnet for quality comparison"
```

This creates an audit trail of which models you used and when.

### 6. Monitor Provider Changes

Providers update models frequently. Subscribe to:

- [OpenAI Blog](https://openai.com/blog)
- [Anthropic Blog](https://www.anthropic.com/news)
- [Mistral Announcements](https://mistral.ai/news)
- [Google AI Blog](https://ai.google/blog)

Update your config when new models release.

### 7. Test Before Production

Validate new model configurations:

```bash
# Dry run to check costs
llm-answer-watcher run --config watcher.config.yaml --dry-run

# Validate configuration
llm-answer-watcher validate --config watcher.config.yaml

# Test with single intent
llm-answer-watcher run --config watcher.config.yaml --intent best-tools
```

## Next Steps

- **[Brand Configuration](../brands/)**: Optimize brand detection
- **[Intent Configuration](../intents/)**: Design effective prompts
- **[Budget Configuration](../budget/)**: Control costs
- **[Web Search Configuration](../web-search/)**: Enable real-time data
- **[Cost Management](../../features/cost-management/)**: Track spending

# Brand Configuration

Brand configuration defines which brands to track in LLM responses. Proper brand configuration is critical for accurate mention detection and false-positive prevention.

## Brand Categories

LLM Answer Watcher tracks two categories of brands:

### Mine

Your brand(s) that you want to monitor. **At least one required.**

```yaml
brands:
  mine:
    - "MyBrand"
    - "MyBrand.io"
    - "MyBrand CRM"
```

### Competitors

Competitor brands you want to track for comparison.

```yaml
brands:
  competitors:
    - "CompetitorA"
    - "CompetitorB"
    - "MarketLeader"
```

## Basic Brand Configuration

### Minimal Example

Simplest configuration with a single brand:

```yaml
brands:
  mine:
    - "Warmly"

  competitors:
    - "Instantly"
    - "Lemwarm"
```

### Comprehensive Example

Full configuration with aliases:

```yaml
brands:
  mine:
    - "Warmly"          # Base name
    - "Warmly.io"       # With TLD
    - "Warmly AI"       # With product descriptor

  competitors:
    # Direct competitors
    - "Instantly"
    - "Lemwarm"
    - "Smartlead"

    # Indirect competitors
    - "HubSpot"
    - "Salesforce"

    # Category leaders
    - "Apollo.io"
```

How Many Competitors?

**Recommended**: 5-15 competitors

- **Too few**: Miss important context
- **Too many**: Dilutes focus, increases noise

Focus on competitors that:

- Directly compete for the same customers
- Appear frequently in buyer comparisons
- Represent different market segments

## Brand Alias Strategies

### Why Use Aliases?

LLMs may refer to your brand in different ways:

- With/without TLD: "Warmly" vs "Warmly.io"
- With/without product name: "HubSpot" vs "HubSpot CRM"
- Common variations: "Salesforce" vs "SFDC"
- Capitalization: "GitHub" vs "Github" (handled automatically, see Case Sensitivity below)

### Alias Best Practices

**Include common variations:**

```yaml
brands:
  mine:
    - "GitHub"         # Also matches "Github" (matching is case-insensitive)
    - "GitHub.com"
    - "GitHub Actions" # Product line
```

**TLD variations:**

```yaml
brands:
  mine:
    - "Stripe"
    - "Stripe.com"
    - "stripe.io"     # If you own it
```

**Product family variations:**

```yaml
brands:
  mine:
    - "HubSpot"
    - "HubSpot CRM"
    - "HubSpot Marketing Hub"
    - "HubSpot Sales Hub"
```

**Abbreviations and acronyms:**

```yaml
brands:
  mine:
    - "Salesforce"
    - "SFDC"          # Common abbreviation
    - "Salesforce.com"
```

Avoid Over-Aliasing

Don't include:

- Generic terms: "CRM" (too broad)
- Common words: "Hub" (false positives)
- Competitor names: Track separately in competitors list

### Case Sensitivity

Brand matching is **case-insensitive** by default:

```yaml
brands:
  mine:
    - "GitHub"  # Matches: GitHub, github, GITHUB, GiTHuB
```

You only need one capitalization variant:

```yaml
# ❌ Redundant
brands:
  mine:
    - "GitHub"
    - "github"
    - "GITHUB"

# ✅ Sufficient
brands:
  mine:
    - "GitHub"
```

## Word-Boundary Matching

LLM Answer Watcher uses **word-boundary regex** to prevent false positives.

### How It Works

Word boundaries (`\b`) ensure brands match only as complete words:

```python
pattern = r'\b' + re.escape(brand_alias) + r'\b'
```

**Examples:**

| Text                   | Brand Alias | Matches? | Reason           |
| ---------------------- | ----------- | -------- | ---------------- |
| "Use HubSpot daily"    | "HubSpot"   | ✅ Yes   | Complete word    |
| "GitHub and HubSpot"   | "HubSpot"   | ✅ Yes   | Complete word    |
| "Hubspot is great"     | "HubSpot"   | ✅ Yes   | Case-insensitive |
| "Use hub for projects" | "Hub"       | ✅ Yes   | Complete word    |
| "GitHub has features"  | "Hub"       | ❌ No    | Inside "GitHub"  |
| "rehub your content"   | "Hub"       | ❌ No    | Inside "rehub"   |

### Why Word Boundaries Matter

**Without word boundaries** (naive substring matching):

```yaml
brands:
  mine:
    - "Hub"  # ❌ BAD: Matches "GitHub", "HubSpot", "rehub", etc.
```

**With word boundaries** (LLM Answer Watcher default):

```yaml
brands:
  mine:
    - "Hub"  # ✅ GOOD: Only matches "Hub" as complete word
```

Special Characters

Word boundaries work with special characters:

- `Apollo.io` matches "Apollo.io" but not "Apolloio"
- `Slack-Bot` matches "Slack-Bot" but not "SlackBot"

### Testing Word Boundaries

Test your brand aliases:

```python
import re

def test_brand_match(text: str, brand: str) -> bool:
    pattern = r'\b' + re.escape(brand) + r'\b'
    return bool(re.search(pattern, text, re.IGNORECASE))

# Test cases
print(test_brand_match("Use HubSpot daily", "HubSpot"))  # True
print(test_brand_match("GitHub and GitLab", "Git"))      # False
```
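
The same helper can be turned into a table-driven check that covers the rows of the examples table and the special-character cases. This is a sketch for experimenting with aliases; `brand_matches` and `CASES` are illustrative names, and the sample sentences for `Apollo.io` and `Slack-Bot` are made up.

```python
import re


def brand_matches(text: str, alias: str) -> bool:
    """Case-insensitive, word-boundary match of one alias against a text."""
    pattern = r"\b" + re.escape(alias) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


# (text, alias, expected result)
CASES = [
    ("Use HubSpot daily", "HubSpot", True),
    ("Hubspot is great", "HubSpot", True),
    ("Use hub for projects", "Hub", True),
    ("GitHub has features", "Hub", False),
    ("rehub your content", "Hub", False),
    ("Try Apollo.io today", "Apollo.io", True),
    ("Try Apolloio today", "Apollo.io", False),
    ("Ask Slack-Bot", "Slack-Bot", True),
    ("Ask SlackBot", "Slack-Bot", False),
    # Caveat: \b needs a word character on one side, so an alias that
    # ends in punctuation does not match when followed by a space.
    ("We use C++ daily", "C++", False),
]

for text, alias, expected in CASES:
    status = "ok" if brand_matches(text, alias) == expected else "MISMATCH"
    print(f"{status}: {alias!r} in {text!r}")
```

The last case is worth keeping in mind for aliases that end in a symbol: a plain `\b` pattern will miss them.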

## Brand Normalization

Brands are normalized for deduplication and analysis.

### Normalization Process

1. **Case folding**: Convert to lowercase
1. **TLD removal**: Strip `.com`, `.io`, etc.
1. **Whitespace normalization**: Collapse multiple spaces
1. **Punctuation handling**: Preserve hyphens, remove others

**Examples:**

| Original         | Normalized       | Rationale         |
| ---------------- | ---------------- | ----------------- |
| "HubSpot"        | "hubspot"        | Lowercase         |
| "HubSpot.com"    | "hubspot"        | TLD removed       |
| "Apollo.io"      | "apollo"         | TLD removed       |
| "Slack Bot"      | "slackbot"       | Spaces collapsed  |
| "GitHub-Actions" | "github-actions" | Hyphens preserved |

### Why Normalization Matters

Prevents duplicate counting:

```yaml
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
```

LLM response: "I recommend Warmly and Warmly.io for outreach."

**Without normalization**: 2 mentions counted

**With normalization**: 1 mention counted (both normalize to "warmly")
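
The normalization steps and the deduplication effect can be sketched in Python. This is an approximation, not the tool's implementation: `normalize_brand` and the TLD list are assumptions, and whitespace is removed entirely because the examples table maps "Slack Bot" to "slackbot".

```python
import re

# Assumed TLD list for illustration; the tool's real list may differ.
TLD_SUFFIX = re.compile(r"\.(com|io|ai|net|org|co)$")


def normalize_brand(name: str) -> str:
    """Approximate the normalization steps described above."""
    s = name.strip().lower()        # case folding
    s = TLD_SUFFIX.sub("", s)       # TLD removal
    s = re.sub(r"[^\w\s-]", "", s)  # drop punctuation except hyphens
    s = re.sub(r"\s+", "", s)       # remove whitespace (per the table's "slackbot")
    return s


aliases = ["Warmly", "Warmly.io"]
unique = {normalize_brand(alias) for alias in aliases}
print(f"{len(aliases)} aliases -> {len(unique)} normalized brand(s)")
```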

### Normalized Name in Database

The SQLite database stores both:

- `brand`: Original matched text
- `normalized_name`: Normalized version for deduplication

```sql
SELECT brand, normalized_name, COUNT(*) as mentions
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z'
GROUP BY normalized_name;
```

## Competitor Selection Strategies

### Direct Competitors

Brands solving the same problem for the same audience:

```yaml
brands:
  competitors:
    # Email warmup tools (if you're Warmly)
    - "Instantly"
    - "Lemwarm"
    - "Smartlead"
    - "Woodpecker"
```

### Indirect Competitors

Brands in adjacent categories:

```yaml
brands:
  competitors:
    # If you're an email warmup tool
    - "HubSpot"        # Full sales platform
    - "Apollo.io"      # Sales intelligence
    - "Salesforce"     # Enterprise CRM
```

### Category Leaders

Market-defining brands to benchmark against:

```yaml
brands:
  competitors:
    # Category leaders (if you're a startup CRM)
    - "Salesforce"     # Enterprise standard
    - "HubSpot"        # SMB leader
    - "Pipedrive"      # Sales-focused
```

### Segment-Specific Competitors

Brands targeting different segments:

```yaml
brands:
  competitors:
    # Startup segment
    - "Attio"
    - "Folk"

    # SMB segment
    - "Pipedrive"
    - "Copper"

    # Enterprise segment
    - "Salesforce"
    - "Microsoft Dynamics"
```

## Brand Configuration Patterns

### Single Product Company

Simple brand with variations:

```yaml
brands:
  mine:
    - "MyProduct"
    - "MyProduct.io"
    - "MyProduct.com"

  competitors:
    - "CompetitorA"
    - "CompetitorB"
    - "CompetitorC"
```

### Multi-Product Company

Track different product lines:

```yaml
brands:
  mine:
    - "MyCompany"
    - "MyCompany CRM"
    - "MyCompany Marketing"
    - "MyCompany Sales Hub"

  competitors:
    # CRM competitors
    - "Salesforce"
    - "HubSpot"

    # Marketing automation competitors
    - "Marketo"
    - "Pardot"
```

### Parent Company + Subsidiaries

Track corporate structure:

```yaml
brands:
  mine:
    - "ParentCo"
    - "ProductA"       # Subsidiary
    - "ProductB"       # Subsidiary

  competitors:
    - "CompetitorCorp"
    - "CompetitorProduct"
```

### Rebranded Company

Track both old and new names:

```yaml
brands:
  mine:
    - "NewBrand"       # Current name
    - "OldBrand"       # Legacy name (still in training data)
    - "NewBrand.io"

  competitors:
    - "Competitor"
```

### Regional Variations

Track region-specific brands:

```yaml
brands:
  mine:
    - "MyBrand"        # Global
    - "MyBrand US"
    - "MyBrand EU"

  competitors:
    - "GlobalCompetitor"
    - "USCompetitor"
    - "EUCompetitor"
```

## Advanced Brand Configuration

### Fuzzy Matching

Enable fuzzy matching for misspellings (optional):

```yaml
extraction_settings:
  fuzzy_matching:
    enabled: true
    threshold: 0.9     # Similarity threshold (0.0-1.0)

brands:
  mine:
    - "Warmly"         # Also matches: "Warmley", "Warmlly"
```

Fuzzy Matching Trade-offs

**Pros:**

- Catches misspellings
- More comprehensive tracking

**Cons:**

- Higher false-positive rate
- Slower extraction
- May match unrelated words

**Recommended threshold**: 0.9 (very strict)
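
To get a feel for what a 0.9 threshold accepts, the sketch below scores words with `difflib.SequenceMatcher` from the Python standard library. The similarity metric LLM Answer Watcher uses is not stated here, so treat the metric and the `fuzzy_brand_hits` helper as assumptions.

```python
import re
from difflib import SequenceMatcher


def fuzzy_brand_hits(text: str, alias: str, threshold: float = 0.9) -> list[str]:
    """Return the words in text whose similarity to alias meets the threshold."""
    hits = []
    for word in re.findall(r"[\w.-]+", text):
        ratio = SequenceMatcher(None, word.lower(), alias.lower()).ratio()
        if ratio >= threshold:
            hits.append(word)
    return hits


sample = "Try Warmley or Warmlly, not warmth"
print(fuzzy_brand_hits(sample, "Warmly"))
```

Under this metric "Warmley" and "Warmlly" each score about 0.92 against "Warmly", while "warmth" scores about 0.67. Lowering the threshold toward that range is how unrelated words start to match.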

### Brand Exclusions

Exclude certain patterns (advanced):

```yaml
brands:
  mine:
    - "Apple"

  exclusions:
    - "apple pie"      # Don't match "apple" in "apple pie"
    - "apple juice"
```

Exclusions Not Yet Implemented

This feature is planned for a future release. Currently, use word boundaries to minimize false positives.

### Brand Categories

Group brands by category (for analysis):

```yaml
brands:
  mine:
    - "MyBrand"

  competitors:
    # Tag with category (custom metadata)
    - name: "CompetitorA"
      category: "direct"

    - name: "CompetitorB"
      category: "direct"

    - name: "MarketLeader"
      category: "aspirational"
```

Categories Not Yet Implemented

This feature is planned for a future release. Currently, track categories externally.

## Brand Mention Analysis

### Viewing Mentions

Query the SQLite database:

```sql
-- All mentions for a run
SELECT brand, COUNT(*) as mentions
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z'
GROUP BY normalized_name
ORDER BY mentions DESC;
```

```sql
-- My brand mentions over time
SELECT DATE(timestamp_utc) as date, COUNT(*) as mentions
FROM mentions
WHERE normalized_name = 'mybrand'
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

```sql
-- Competitor comparison
SELECT
    brand,
    COUNT(*) as total_mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z'
GROUP BY normalized_name
ORDER BY avg_rank ASC;
```

### Mention Metrics

Key metrics to track:

- **Mention rate**: % of queries where brand appears
- **Average rank**: Mean position in ranked lists
- **Top-3 rate**: % of mentions in top 3
- **Share of voice**: Your mentions / total mentions

Calculate in SQL:

```sql
-- Mention rate
SELECT
    (COUNT(DISTINCT CASE WHEN normalized_name = 'mybrand' THEN intent_id END) * 100.0 / COUNT(DISTINCT intent_id)) as mention_rate
FROM mentions
WHERE run_id = '2025-11-01T08-00-00Z';
```

```sql
-- Top-3 rate
SELECT
    (SUM(CASE WHEN rank_position <= 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) as top3_rate
FROM mentions
WHERE normalized_name = 'mybrand'
  AND run_id = '2025-11-01T08-00-00Z';
```
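
Share of voice is listed above without a query. A minimal sketch using Python's built-in `sqlite3` module follows; it assumes a `mentions` table with only the columns used in this section's queries, and the in-memory table, sample rows, and `share_of_voice` helper are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the real database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mentions ("
    "run_id TEXT, intent_id TEXT, normalized_name TEXT, rank_position INTEGER)"
)
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?, ?, ?)",
    [
        ("run-1", "best-tools", "mybrand", 2),
        ("run-1", "best-tools", "competitora", 1),
        ("run-1", "alternatives", "competitora", 1),
        ("run-1", "alternatives", "competitorb", 2),
    ],
)


def share_of_voice(conn: sqlite3.Connection, run_id: str, brand: str):
    """Return the brand's mentions as a percentage of all mentions in the run."""
    query = """
        SELECT SUM(CASE WHEN normalized_name = ? THEN 1 ELSE 0 END) * 100.0 / COUNT(*)
        FROM mentions
        WHERE run_id = ?
    """
    (value,) = conn.execute(query, (brand, run_id)).fetchone()
    return value  # None when the run has no mentions at all


print(share_of_voice(conn, "run-1", "mybrand"))
```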

## Validation and Testing

### Validate Brand Configuration

Check for common issues:

```bash
llm-answer-watcher validate --config watcher.config.yaml
```

**Validation checks:**

- At least one brand in `mine` list
- No empty brand aliases
- No duplicate aliases (warning)
- Brand aliases >= 3 characters (warning)

### Test Brand Matching

Test your brands against sample text:

```bash
# Create test file
echo "I recommend HubSpot, Salesforce, and Warmly for CRM." > test.txt

# Test matching (hypothetical command)
llm-answer-watcher test-brands --config watcher.config.yaml --text test.txt
```

Expected output (illustrative; positions are 0-based character offsets):

```text
✅ Found 3 brand mentions:
   - HubSpot (competitor, position 12)
   - Salesforce (competitor, position 21)
   - Warmly (mine, position 37)
```
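
Because `test-brands` is marked above as a hypothetical command, a similar check can be approximated with a few lines of Python. The sketch reuses the word-boundary pattern from this page; `BRANDS` and `find_mentions` are illustrative names, not part of the CLI.

```python
import re

BRANDS = {
    "mine": ["Warmly"],
    "competitors": ["HubSpot", "Salesforce"],
}


def find_mentions(text: str, brands: dict[str, list[str]]) -> list[tuple[str, str, int]]:
    """Return (alias, category, character offset) for each match, in text order."""
    found = []
    for category, aliases in brands.items():
        for alias in aliases:
            pattern = r"\b" + re.escape(alias) + r"\b"
            for match in re.finditer(pattern, text, re.IGNORECASE):
                found.append((alias, category, match.start()))
    return sorted(found, key=lambda item: item[2])


text = "I recommend HubSpot, Salesforce, and Warmly for CRM."
for alias, category, offset in find_mentions(text, BRANDS):
    print(f"{alias} ({category}, offset {offset})")
```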

### Common Validation Errors

**Error**: `At least one brand required in 'mine'`

```yaml
# ❌ Wrong
brands:
  mine: []

# ✅ Correct
brands:
  mine:
    - "MyBrand"
```

______________________________________________________________________

**Warning**: `Brand alias too short: "io"`

```yaml
# ❌ Warning (high false-positive risk)
brands:
  mine:
    - "io"

# ✅ Better
brands:
  mine:
    - "MyBrand.io"
```

______________________________________________________________________

**Warning**: `Duplicate brand alias: "HubSpot"`

```yaml
# ❌ Redundant
brands:
  mine:
    - "HubSpot"
  competitors:
    - "HubSpot"  # Same brand in both categories!

# ✅ Correct
brands:
  mine:
    - "MyBrand"
  competitors:
    - "HubSpot"
```

## Best Practices

### 1. Start with Core Brand Names

Begin with unambiguous brand names:

```yaml
brands:
  mine:
    - "Warmly"       # Clear, unambiguous

  competitors:
    - "Instantly"
    - "Lemwarm"
```

### 2. Add TLD Variations Gradually

Monitor results, then add TLDs if needed:

```yaml
# Week 1: Start simple
brands:
  mine:
    - "Warmly"

# Week 2: Add TLD after seeing LLM responses
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
```

### 3. Use Specific Names, Not Generic Terms

```yaml
# ❌ Bad (too generic)
brands:
  mine:
    - "CRM"
    - "Email"
    - "Sales Tool"

# ✅ Good (specific)
brands:
  mine:
    - "Warmly CRM"
    - "Warmly Email"
```

### 4. Track No More Than 15 Competitors

Focus on key competitors:

```yaml
brands:
  competitors:
    # Top 5 direct competitors
    - "DirectA"
    - "DirectB"
    - "DirectC"
    - "DirectD"
    - "DirectE"

    # Top 3 category leaders
    - "LeaderA"
    - "LeaderB"
    - "LeaderC"
```

### 5. Review Mentions Regularly

Check for unexpected matches:

```sql
-- Find unexpected brand mentions
SELECT brand, answer_text
FROM mentions
JOIN answers_raw USING (run_id, intent_id)
WHERE normalized_name = 'mybrand'
  AND run_id = '2025-11-01T08-00-00Z';
```

Look for false positives or missing variations.

### 6. Version Brand Lists

Track brand list changes:

```bash
git add watcher.config.yaml
git commit -m "feat: add HubSpot as competitor"
```

### 7. Test Before Production

Validate brand configuration:

```bash
llm-answer-watcher validate --config watcher.config.yaml
llm-answer-watcher run --config watcher.config.yaml --dry-run
```

## Troubleshooting

### False Positives

**Problem**: Brand matches where it shouldn't

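**Example**: The alias "Hub" matches the standalone word "hub" in "a central hub for your team" (word boundaries already prevent a match inside "GitHub")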

**Solution**: Use more specific aliases:

```yaml
# ❌ Too generic
brands:
  mine:
    - "Hub"

# ✅ More specific
brands:
  mine:
    - "MyHub"
    - "MyHub.io"
```

### False Negatives

**Problem**: Brand doesn't match when it should

**Example**: LLM says "Warmly.ai" but you only track "Warmly.io"

**Solution**: Add missing variation:

```yaml
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
    - "Warmly.ai"    # Add missing TLD
```

### Duplicate Counting

**Problem**: Same brand counted multiple times

**Example**: "Warmly" and "Warmly.io" counted separately

**Solution**: This is expected in raw counts. Each alias is stored as matched, and grouping by `normalized_name` deduplicates them in analysis:

```sql
-- Use normalized_name for deduplication
SELECT normalized_name, COUNT(*) as mentions
FROM mentions
GROUP BY normalized_name;

-- Use brand to see exact matches
SELECT brand, COUNT(*) as raw_mentions
FROM mentions
GROUP BY brand;
```

### Brand Not Found

**Problem**: Brand not detected in LLM response

**Possible causes:**

1. **LLM didn't mention it**: Check raw response
1. **Misspelling**: Add variations or enable fuzzy matching
1. **Different phrasing**: LLM used different name

**Debug:**

```sql
-- Check raw response
SELECT answer_text
FROM answers_raw
WHERE run_id = '2025-11-01T08-00-00Z'
  AND intent_id = 'best-tools';
```

Look for how the LLM referred to your brand.

## Next Steps

- **[Intent Configuration](../intents/)**: Design prompts that surface your brand
- **[Rank Extraction](../../features/rank-extraction/)**: Understand how ranking works
- **[Brand Detection](../../features/brand-detection/)**: Deep dive into detection algorithms
- **[Historical Tracking](../../features/historical-tracking/)**: Analyze brand trends over time

# Intent Configuration

Intents are the questions you ask LLMs to test brand visibility. Well-designed intents produce actionable insights about how LLMs recommend your brand versus competitors.

## What is an Intent?

An **intent** represents a buyer-journey question that prospects might ask an LLM when researching solutions.

**Examples:**

- "What are the best CRM tools for startups?"
- "Compare HubSpot vs Salesforce for small teams"
- "How do I improve email deliverability?"

## Basic Intent Configuration

### Minimal Intent

Simplest intent with required fields:

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best tools for my category?"
```

**Required fields:**

- `id`: Unique identifier (alphanumeric, hyphens, underscores)
- `prompt`: Natural language question

### Multiple Intents

Test different buyer scenarios:

```yaml
intents:
  - id: "best-tools-general"
    prompt: "What are the best email warmup tools?"

  - id: "best-tools-startups"
    prompt: "What are the best email warmup tools for startups?"

  - id: "comparison-with-competitor"
    prompt: "Compare Instantly vs Warmly for email warmup"
```

How Many Intents?

**Recommended**: 3-10 intents

- **Too few**: Limited coverage of buyer journey
- **Too many**: High costs, slow execution

Focus on intents that represent actual buyer questions.

## Intent Design Principles

### 1. Natural Language

Write prompts as real users would ask:

```yaml
# ✅ Good: Natural question
intents:
  - id: "best-crm-startups"
    prompt: "What's the best CRM for early-stage startups?"

# ❌ Bad: Unnatural phrasing
intents:
  - id: "crm-query"
    prompt: "List CRM software products ranked by quality for startup segment"
```

### 2. Buyer-Focused

Imply purchase intent:

```yaml
# ✅ Good: Clear purchase intent
intents:
  - id: "best-email-tools"
    prompt: "What are the best email warmup tools to buy?"

# ❌ Bad: Informational query
intents:
  - id: "email-info"
    prompt: "What is email warming?"
```

### 3. Ranking-Friendly

Ask for ranked or ordered lists:

```yaml
# ✅ Good: Implies ranking
intents:
  - id: "top-tools"
    prompt: "What are the top 5 email warmup tools?"

# ❌ Bad: No ranking signal
intents:
  - id: "tools-info"
    prompt: "Tell me about email warmup tools"
```

### 4. Specific Use Cases

Target specific scenarios:

```yaml
# ✅ Good: Specific use case
intents:
  - id: "best-for-cold-email"
    prompt: "What are the best email warmup tools for cold outreach campaigns?"

# ❌ Bad: Too generic
intents:
  - id: "email-tools"
    prompt: "What are email tools?"
```

## Intent Patterns

### Category Leadership

Test if your brand is considered a category leader:

```yaml
intents:
  - id: "best-in-category"
    prompt: "What are the best [category] tools?"

  - id: "top-choices"
    prompt: "What are the top [category] platforms?"

  - id: "leading-solutions"
    prompt: "What are the leading [category] solutions?"
```

### Segment-Specific

Target different customer segments:

```yaml
intents:
  # Startup segment
  - id: "best-for-startups"
    prompt: "What are the best CRM tools for early-stage startups?"

  # SMB segment
  - id: "best-for-smb"
    prompt: "What are the best CRM tools for small businesses?"

  # Enterprise segment
  - id: "best-for-enterprise"
    prompt: "What are the best CRM tools for large enterprises?"
```

### Use-Case Specific

Target specific jobs-to-be-done:

```yaml
intents:
  - id: "improve-deliverability"
    prompt: "What tools can help me improve email deliverability?"

  - id: "warm-cold-emails"
    prompt: "How can I warm up my email domain for cold outreach?"

  - id: "avoid-spam"
    prompt: "What tools help me avoid the spam folder?"
```

### Competitive Comparison

Test head-to-head comparisons:

```yaml
intents:
  - id: "vs-main-competitor"
    prompt: "Compare [YourBrand] vs [MainCompetitor] for [use case]"

  - id: "alternatives-to-competitor"
    prompt: "What are the best alternatives to [Competitor]?"

  - id: "hubspot-replacement"
    prompt: "What's the best replacement for HubSpot for small teams?"
```

### Problem-Solution

Frame around customer pain points:

```yaml
intents:
  - id: "solve-deliverability"
    prompt: "My emails are going to spam. What tools can help?"

  - id: "improve-open-rates"
    prompt: "How can I improve my email open rates?"

  - id: "scale-outreach"
    prompt: "What tools help me scale cold email outreach?"
```

### Buying Journey Stages

Target different stages:

```yaml
intents:
  # Awareness: "What is...?"
  - id: "awareness"
    prompt: "What is email warmup and why do I need it?"

  # Consideration: "What are the options?"
  - id: "consideration"
    prompt: "What are the best email warmup tools?"

  # Decision: "Which should I choose?"
  - id: "decision"
    prompt: "Should I use Warmly or Instantly for email warmup?"
```

## Advanced Intent Configuration

### Intent with Operations

Run custom operations after each query:

```yaml
intents:
  - id: "best-email-tools"
    prompt: "What are the best email warmup tools?"

    operations:
      - id: "content-gaps"
        description: "Identify content opportunities"
        prompt: |
          Analyze this LLM response and identify content gaps that could improve our ranking.

          My brand: {brand:mine}
          Current rank: {rank:mine}
          Response: {intent:response}

          Provide 3 specific content recommendations.
        model: "gpt-4o-mini"

      - id: "competitor-analysis"
        description: "Extract competitor strengths"
        prompt: |
          What strengths are mentioned for each competitor?

          Competitors: {competitors:mentioned}
          Response: {intent:response}
        model: "gpt-4o-mini"
```

See [Operations Configuration](../operations/) for details.

### Intent with Dependencies

Chain operations with dependencies:

```yaml
intents:
  - id: "best-crm-tools"
    prompt: "What are the best CRM tools for startups?"

    operations:
      - id: "extract-features"
        description: "Extract features mentioned"
        prompt: "Extract features mentioned for each tool: {intent:response}"
        model: "gpt-4o-mini"

      - id: "gap-analysis"
        description: "Identify feature gaps"
        prompt: |
          Based on these features: {operation:extract-features}

          What features are missing from {brand:mine} compared to competitors?
        depends_on: ["extract-features"]
        model: "gpt-4o-mini"
```
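One way to reason about `depends_on` is as a dependency graph that must be sorted before execution. The sketch below uses the standard-library `graphlib` module to derive a valid order and to reject unknown or circular dependencies. The function name `execution_order` and the dictionary shape are assumptions; the tool may resolve dependencies differently.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(operations: list[dict]) -> list[str]:
    """Return operation IDs ordered so each runs after its dependencies."""
    # Map each operation ID to the set of IDs it depends on.
    graph = {op["id"]: set(op.get("depends_on", [])) for op in operations}

    known = set(graph)
    for op_id, deps in graph.items():
        missing = deps - known
        if missing:
            raise ValueError(f"{op_id} depends on unknown operations: {sorted(missing)}")

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"Circular dependency between operations: {exc.args[1]}") from exc


operations = [
    {"id": "gap-analysis", "depends_on": ["extract-features"]},
    {"id": "extract-features"},
]
print(execution_order(operations))
```

Declaration order in the config does not matter in this sketch: `gap-analysis` is listed first but is still scheduled after `extract-features`.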

### Intent with Custom Metadata

Add metadata for analysis (future feature):

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"
    metadata:
      stage: "consideration"
      priority: "high"
      segment: "smb"
```

Metadata Not Yet Implemented

Custom metadata is planned for a future release.

## Intent ID Naming Conventions

Intent IDs must be:

- Unique across the configuration
- Alphanumeric with hyphens and underscores
- Descriptive and readable

**Good intent IDs:**

```yaml
intents:
  - id: "best-email-warmup-tools"
  - id: "hubspot-alternatives-smb"
  - id: "improve-deliverability-2025"
  - id: "warmly-vs-instantly"
```

**Bad intent IDs:**

```yaml
# ❌ Not descriptive
intents:
  - id: "intent1"
  - id: "test"
  - id: "query"

# ❌ Invalid characters
intents:
  - id: "best tools"        # Space not allowed
  - id: "best-tools!"       # Special char not allowed
  - id: "best/tools"        # Slash not allowed
```

Intent ID Best Practices

- Use descriptive names that explain the intent
- Include segment/use-case in ID if relevant
- Use hyphens for readability: `best-crm-for-startups`
- Keep under 50 characters
- Avoid special characters except `-` and `_`

## Prompt Engineering for Intents

### Effective Prompt Patterns

**Pattern 1: Top N Format**

```yaml
intents:
  - id: "top-5-crm"
    prompt: "What are the top 5 CRM tools for startups in 2025?"
```

Benefits:

- Clear ranking expectation
- Limited scope (5 items)
- Time-bound (2025)

______________________________________________________________________

**Pattern 2: Use-Case Specific**

```yaml
intents:
  - id: "best-for-cold-email"
    prompt: "What are the best email warmup tools specifically for cold email campaigns?"
```

Benefits:

- Targets specific use case
- Filters out generic responses
- Relevant to your positioning

______________________________________________________________________

**Pattern 3: Comparison**

```yaml
intents:
  - id: "compare-top-tools"
    prompt: "Compare the top email warmup tools for improving deliverability"
```

Benefits:

- Encourages detailed analysis
- Shows relative positioning
- Highlights differentiators

______________________________________________________________________

**Pattern 4: Problem-Oriented**

```yaml
intents:
  - id: "solve-spam-problem"
    prompt: "My sales emails are going to spam. What tools can help me fix this?"
```

Benefits:

- Natural buyer question
- Solution-focused
- Real pain point

______________________________________________________________________

**Pattern 5: Segment-Specific**

```yaml
intents:
  - id: "best-for-startups"
    prompt: "What's the best CRM for a 10-person startup with limited budget?"
```

Benefits:

- Targets specific segment
- Includes constraints (budget)
- Realistic buyer scenario

### Prompt Length

**Recommended**: 10-30 words

```yaml
# ✅ Good: Clear and concise
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools for cold outreach in 2025?"

# ❌ Too short: Lacks context
intents:
  - id: "tools"
    prompt: "Email tools?"

# ❌ Too long: Overly specific
intents:
  - id: "detailed-query"
    prompt: "I am a sales development representative at a B2B SaaS startup with 5 SDRs sending approximately 500 cold emails per day and we're experiencing deliverability issues with 40% of our emails going to spam, what are the absolute best email warmup tools that can help us improve our domain reputation and inbox placement rate while being cost-effective for a startup budget?"
```
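If you maintain many intents, a small helper can flag prompts that fall outside the recommended range. This is an illustrative sketch only: `prompt_length_status` is a hypothetical helper, and it counts whitespace-separated words, which is an assumption about how "words" should be measured.

```python
def prompt_length_status(prompt: str, low: int = 10, high: int = 30) -> str:
    """Classify a prompt by word count against the recommended 10-30 range."""
    words = len(prompt.split())
    if words < low:
        return "too short"
    if words > high:
        return "too long"
    return "ok"


for prompt in (
    "Email tools?",
    "What are the best email warmup tools for cold outreach in 2025?",
):
    print(f"{prompt_length_status(prompt):>9}: {prompt}")
```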

### Time-Bounding Prompts

Include year for current recommendations:

```yaml
# ✅ Good: Time-bound
intents:
  - id: "best-tools-2025"
    prompt: "What are the best email warmup tools in 2025?"

# ⚠️ Generic: May return outdated info
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"
```

Training Data Cutoff

Most LLMs have a training data cutoff (e.g., October 2023 for GPT-4o). Adding a year to the prompt may not surface newer information unless you are:

- Using web search-enabled models
- Using Perplexity (real-time web search)
- Using models with recent training data

### Neutral vs. Biased Prompts

**Neutral prompts** (recommended):

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"
```

**Biased prompts** (avoid):

```yaml
# ❌ Biased toward your brand
intents:
  - id: "why-warmly-best"
    prompt: "Why is Warmly the best email warmup tool?"

# ❌ Biased against competitor
intents:
  - id: "hubspot-problems"
    prompt: "What are the problems with HubSpot?"
```

Neutral prompts give you realistic brand positioning data.

## Intent Validation

### Validate Intent Configuration

Check for common issues:

```bash
llm-answer-watcher validate --config watcher.config.yaml
```

**Validation checks:**

- At least one intent configured
- Intent IDs are unique
- Intent IDs are valid (alphanumeric, hyphens, underscores)
- Prompts are non-empty
- Prompts are at least 10 characters
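The checks listed above can be approximated in a few lines of Python, which is handy for linting a config in a pre-commit hook before invoking the CLI. This is a sketch under assumptions: `validate_intents` is a hypothetical helper, and its messages are not the tool's exact error strings.

```python
import re

# Letters, digits, hyphens, and underscores only.
INTENT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
MIN_PROMPT_LENGTH = 10


def validate_intents(intents: list[dict[str, str]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if not intents:
        return ["At least one intent must be configured"]

    errors: list[str] = []
    seen: set[str] = set()
    for intent in intents:
        intent_id = intent.get("id", "")
        prompt = intent.get("prompt", "").strip()

        if not INTENT_ID_PATTERN.fullmatch(intent_id):
            errors.append(f"Invalid intent ID: {intent_id!r}")
        if intent_id in seen:
            errors.append(f"Duplicate intent ID: {intent_id!r}")
        seen.add(intent_id)

        if not prompt:
            errors.append(f"Empty prompt for intent {intent_id!r}")
        elif len(prompt) < MIN_PROMPT_LENGTH:
            errors.append(f"Prompt too short for intent {intent_id!r}")
    return errors


print(validate_intents([{"id": "best tools!", "prompt": "What are the best tools?"}]))
```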

### Common Validation Errors

**Error**: `At least one intent must be configured`

```yaml
# ❌ Wrong
intents: []

# ✅ Correct
intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
```

______________________________________________________________________

**Error**: `Duplicate intent IDs found: best-tools`

```yaml
# ❌ Wrong
intents:
  - id: "best-tools"
    prompt: "What are the best CRM tools?"

  - id: "best-tools"  # Duplicate!
    prompt: "What are the best email tools?"

# ✅ Correct
intents:
  - id: "best-crm-tools"
    prompt: "What are the best CRM tools?"

  - id: "best-email-tools"
    prompt: "What are the best email tools?"
```

______________________________________________________________________

**Error**: `Intent ID must be alphanumeric with hyphens/underscores: best tools!`

```yaml
# ❌ Wrong (space and special char)
intents:
  - id: "best tools!"
    prompt: "What are the best tools?"

# ✅ Correct
intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
```

## Intent Organization Strategies

### By Buyer Journey Stage

Organize intents by funnel stage:

```yaml
intents:
  # Awareness stage
  - id: "awareness-what-is-email-warmup"
    prompt: "What is email warmup and why is it important?"

  - id: "awareness-deliverability-problems"
    prompt: "Why are my emails going to spam?"

  # Consideration stage
  - id: "consideration-best-tools"
    prompt: "What are the best email warmup tools?"

  - id: "consideration-tool-comparison"
    prompt: "Compare the top email warmup platforms"

  # Decision stage
  - id: "decision-warmly-vs-instantly"
    prompt: "Should I use Warmly or Instantly?"

  - id: "decision-pricing"
    prompt: "What's the most cost-effective email warmup tool?"
```

### By Customer Segment

Organize intents by target segment:

```yaml
intents:
  # Startup segment
  - id: "startup-best-crm"
    prompt: "What's the best CRM for early-stage startups?"

  - id: "startup-affordable-tools"
    prompt: "What are affordable CRM options for startups?"

  # SMB segment
  - id: "smb-best-crm"
    prompt: "What's the best CRM for small businesses?"

  - id: "smb-easy-setup"
    prompt: "What's the easiest CRM to set up for a 20-person team?"

  # Enterprise segment
  - id: "enterprise-best-crm"
    prompt: "What's the best enterprise CRM platform?"

  - id: "enterprise-scalable"
    prompt: "What CRM platforms scale to 1000+ users?"
```

### By Use Case

Organize intents by jobs-to-be-done:

```yaml
intents:
  # Use case: Cold email
  - id: "cold-email-best-tools"
    prompt: "What are the best tools for cold email outreach?"

  - id: "cold-email-deliverability"
    prompt: "How can I improve cold email deliverability?"

  # Use case: Account-based sales
  - id: "abs-best-tools"
    prompt: "What are the best tools for account-based sales?"

  - id: "abs-personalization"
    prompt: "What tools help personalize outreach at scale?"

  # Use case: Lead nurturing
  - id: "nurture-best-tools"
    prompt: "What are the best tools for lead nurturing?"
```

### By Competitor

Track competitive positioning:

```yaml
intents:
  # vs. Main Competitor
  - id: "vs-instantly"
    prompt: "Compare Warmly vs Instantly for email warmup"

  - id: "alternatives-to-instantly"
    prompt: "What are the best alternatives to Instantly?"

  # vs. Market Leader
  - id: "vs-hubspot"
    prompt: "Compare Warmly vs HubSpot for sales outreach"

  - id: "alternatives-to-hubspot"
    prompt: "What are the best alternatives to HubSpot for startups?"
```

## Testing Intent Prompts

### Manual Testing

Test prompts in ChatGPT or Claude before adding them to your configuration:

1. Ask the prompt directly
1. Check if response includes ranked lists
1. Verify brand mentions
1. Adjust prompt as needed

### A/B Testing Intents

Compare prompt variations:

```yaml
intents:
  # Variation A: Generic
  - id: "best-tools-generic"
    prompt: "What are the best email warmup tools?"

  # Variation B: Specific
  - id: "best-tools-specific"
    prompt: "What are the best email warmup tools for cold outreach in 2025?"
```

Compare results to see which prompt surfaces your brand better.

### Iteration Process

1. **Start broad**: Test generic prompts
1. **Analyze results**: Check brand mention rates
1. **Refine prompts**: Add specificity where needed
1. **Test again**: Compare refined vs. original
1. **Keep winners**: Use prompts with best brand visibility

## Intent Metrics

Track intent performance:

```sql
-- Mention rate by intent (share of runs in which your brand appears at least once)
SELECT
    intent_id,
    COUNT(DISTINCT run_id) as runs,
    COUNT(DISTINCT CASE WHEN normalized_name IN ('mybrand', 'mybrand.io') THEN run_id END) as runs_with_my_brand,
    (COUNT(DISTINCT CASE WHEN normalized_name IN ('mybrand', 'mybrand.io') THEN run_id END) * 100.0 / COUNT(DISTINCT run_id)) as mention_rate
FROM mentions
GROUP BY intent_id
ORDER BY mention_rate DESC;
```

```sql
-- Average rank by intent
SELECT
    intent_id,
    AVG(rank_position) as avg_rank,
    MIN(rank_position) as best_rank,
    COUNT(*) as total_mentions
FROM mentions
WHERE normalized_name IN ('mybrand', 'mybrand.io')
GROUP BY intent_id
ORDER BY avg_rank ASC;
```

```sql
-- Top-performing intents
SELECT
    intent_id,
    COUNT(*) as total_mentions,  -- all brand mentions recorded for this intent
    SUM(CASE WHEN normalized_name IN ('mybrand', 'mybrand.io') AND rank_position <= 3 THEN 1 ELSE 0 END) as top3_mentions,
    (SUM(CASE WHEN normalized_name IN ('mybrand', 'mybrand.io') AND rank_position <= 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) as top3_rate
FROM mentions
GROUP BY intent_id
ORDER BY top3_rate DESC;
```
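Queries like these can also be run from Python with the standard-library `sqlite3` module. The sketch below runs the average-rank query against an in-memory database so it is safe to experiment with. The simplified `mentions` schema and the sample rows are assumptions for illustration; the real database may have more columns.

```python
import sqlite3

AVG_RANK_SQL = """
SELECT intent_id,
       AVG(rank_position) AS avg_rank,
       MIN(rank_position) AS best_rank,
       COUNT(*) AS total_mentions
FROM mentions
WHERE normalized_name IN ('mybrand', 'mybrand.io')
GROUP BY intent_id
ORDER BY avg_rank ASC;
"""


def average_rank_by_intent(conn: sqlite3.Connection) -> list[tuple]:
    """Return (intent_id, avg_rank, best_rank, total_mentions) rows."""
    return conn.execute(AVG_RANK_SQL).fetchall()


# In-memory stand-in for the real database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mentions ("
    "run_id TEXT, intent_id TEXT, normalized_name TEXT, rank_position INTEGER)"
)
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?, ?, ?)",
    [
        ("run-1", "best-tools", "mybrand", 2),
        ("run-2", "best-tools", "mybrand", 4),
        ("run-1", "best-tools", "competitor", 1),
        ("run-1", "top-5-crm", "mybrand.io", 1),
    ],
)
for row in average_rank_by_intent(conn):
    print(row)
```

To query real data, replace `":memory:"` with the path configured as `sqlite_db_path` and drop the table setup.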

## Best Practices

### 1. Start with 3-5 Core Intents

Begin with essential buyer questions:

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best [category] tools?"

  - id: "best-for-startups"
    prompt: "What's the best [category] tool for startups?"

  - id: "vs-main-competitor"
    prompt: "Compare [YourBrand] vs [MainCompetitor]"
```

### 2. Test Prompts Manually First

Before adding to config, test with ChatGPT/Claude:

- Does it produce ranked lists?
- Does it mention your brand?
- Is the response format consistent?

### 3. Use Natural Language

Write prompts as real users would ask:

```yaml
# ✅ Good
intents:
  - id: "improve-deliverability"
    prompt: "How can I improve my email deliverability?"

# ❌ Bad
intents:
  - id: "deliverability"
    prompt: "EMAIL_DELIVERABILITY_TOOLS_QUERY"
```

### 4. Include Ranking Signals

Ask for "best", "top", or "recommended":

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best email warmup tools?"  # "best" = ranking signal

  - id: "top-tools"
    prompt: "What are the top 5 CRM platforms?"  # "top 5" = ranking signal
```

### 5. Version Control Intents

Track intent changes with git:

```bash
git add watcher.config.yaml
git commit -m "feat: add cold-email intent for startup segment"
```

### 6. Monitor Intent Performance

Review which intents surface your brand:

```sql
SELECT intent_id, COUNT(*) as my_brand_mentions
FROM mentions
WHERE normalized_name = 'mybrand'
GROUP BY intent_id
ORDER BY my_brand_mentions DESC;
```

Focus on high-performing intents and retire low performers.

### 7. Update Prompts Based on Results

Iterate on prompts:

```yaml
# Original (low brand mentions)
- id: "best-tools"
  prompt: "What are email tools?"

# Improved (higher brand mentions)
- id: "best-tools"
  prompt: "What are the best email warmup tools for cold outreach?"
```

## Troubleshooting

### Brand Not Mentioned

**Problem**: Your brand doesn't appear in LLM responses

**Possible causes:**

1. **Generic prompt**: Too broad, LLM focuses on market leaders
1. **Wrong segment**: Prompt targets different customer segment
1. **Outdated training data**: LLM trained before your brand existed

**Solutions:**

- Make prompt more specific to your use case
- Target your niche/segment explicitly
- Use web search-enabled models for fresh data

______________________________________________________________________

### Inconsistent Responses

**Problem**: Different responses for same intent across runs

**Cause**: LLM sampling is non-deterministic, and variation grows with temperature

**Solution**: Lower the temperature to reduce variation:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    temperature: 0.0  # Most consistent; small variations can still occur
```

______________________________________________________________________

### No Ranked Lists

**Problem**: LLM doesn't provide ranked lists

**Cause**: Prompt doesn't request ranking

**Solution**: Add ranking signal:

```yaml
# ❌ Before
- id: "tools"
  prompt: "Tell me about email warmup tools"

# ✅ After
- id: "top-tools"
  prompt: "What are the top 5 email warmup tools ranked by quality?"
```

## Next Steps

- **[Brand Configuration](../brands/)**: Optimize brand detection
- **[Operations Configuration](../operations/)**: Automate post-query analysis
- **[Rank Extraction](../../features/rank-extraction/)**: Understand ranking detection
- **[HTML Reports](../../features/html-reports/)**: Visualize intent results

# Budget Configuration

Budget controls prevent runaway costs by setting spending limits before execution starts. LLM Answer Watcher validates estimated costs against your budget and aborts if limits would be exceeded.

## Why Budget Controls?

LLM API costs can add up quickly:

- **Testing**: Multiple intents × multiple models = high query volume
- **Mistakes**: Accidental loops or configuration errors
- **Provider changes**: Pricing updates or model changes
- **Experimentation**: Trying new configurations without cost awareness

Budget controls keep spending within the limits you set, based on pre-run cost estimates.

## Basic Budget Configuration

### Enabling Budget Controls

Add a `budget` section to `run_settings`:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

  budget:
    enabled: true
    max_per_run_usd: 1.00        # Hard limit: abort if total > $1.00
    max_per_intent_usd: 0.10     # Hard limit: abort if any intent > $0.10
    warn_threshold_usd: 0.50     # Warning: log if total > $0.50 (but continue)
```

### Disabling Budget Controls

For unlimited spending:

```yaml
run_settings:
  budget:
    enabled: false  # No cost limits
```

Disabled Budgets

Only disable budgets when:

- You fully understand costs
- Running production monitoring with known costs
- Budget controls interfere with automation

**Recommendation**: Keep budgets enabled even in production.

## Budget Parameters

### `enabled` (boolean)

Enable or disable budget enforcement.

```yaml
budget:
  enabled: true  # Enforce budget limits
```

**Default**: `false` (budgets disabled)

**Recommended**: `true` for all use cases

______________________________________________________________________

### `max_per_run_usd` (float)

Maximum total cost per run (all intents × all models).

```yaml
budget:
  max_per_run_usd: 1.00  # Abort if total estimated cost > $1.00
```

**Calculation**:

```text
estimated_run_cost = (num_intents × num_models) × avg_cost_per_query
# The run aborts if estimated_run_cost > max_per_run_usd
```

**Example**:

- 3 intents × 2 models = 6 queries
- Average cost: $0.005 per query
- Total estimated cost: $0.03
- Budget limit: $1.00
- ✅ **Result**: Run proceeds

______________________________________________________________________

### `max_per_intent_usd` (float)

Maximum cost per single intent (across all models).

```yaml
budget:
  max_per_intent_usd: 0.10  # Abort if any single intent > $0.10
```

**Calculation**:

```text
estimated_intent_cost = num_models × avg_cost_per_query
# The run aborts if estimated_intent_cost > max_per_intent_usd
```

**Example**:

- 3 models for one intent
- Average cost: $0.005 per query
- Intent cost: $0.015
- Budget limit: $0.10
- ✅ **Result**: Intent proceeds

**Use case**: Prevent expensive intents with long prompts or web search.

______________________________________________________________________

### `warn_threshold_usd` (float)

Warning threshold (logs warning but continues).

```yaml
budget:
  warn_threshold_usd: 0.50  # Log warning if total > $0.50
```

**Behavior**:

- If `estimated_cost <= warn_threshold`: Silent execution
- If `warn_threshold < estimated_cost <= max_per_run`: Log warning, continue
- If `estimated_cost > max_per_run`: Abort execution

**Example output**:

```text
⚠️  Cost warning: Estimated run cost $0.75 exceeds warning threshold of $0.50
   Budget limit: $1.00 (OK to proceed)
   Run will execute 12 queries across 3 intents and 4 models.
```
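The three-way behavior of `warn_threshold_usd` and `max_per_run_usd` can be summarized as a small decision function. This is a sketch of the rules described above, not the tool's source code; `BudgetStatus` and `check_run_budget` are hypothetical names.

```python
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"        # at or below the warning threshold: silent execution
    WARN = "warn"    # above the threshold but within the limit: warn, continue
    ABORT = "abort"  # above the hard limit: do not run


def check_run_budget(
    estimated_cost: float, warn_threshold: float, max_per_run: float
) -> BudgetStatus:
    """Classify an estimated run cost against the configured limits."""
    if estimated_cost > max_per_run:
        return BudgetStatus.ABORT
    if estimated_cost > warn_threshold:
        return BudgetStatus.WARN
    return BudgetStatus.OK


for cost in (0.03, 0.75, 1.25):
    status = check_run_budget(cost, warn_threshold=0.50, max_per_run=1.00)
    print(f"${cost:.2f} -> {status.value}")
```

Note that a cost exactly equal to a limit does not trip it in this sketch, matching the `<=` comparisons in the behavior list.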

## Budget Configuration Patterns

### Development / Testing

Strict limits for experimentation:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 0.10         # Very low limit
    max_per_intent_usd: 0.05      # Catch expensive intents early
    warn_threshold_usd: 0.05      # Warn at half the run limit
```

**Use when:**

- Testing configuration changes
- Developing new intents
- Running frequent test runs
- Learning the tool

______________________________________________________________________

### Production Monitoring

Balanced limits for regular monitoring:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 5.00         # Reasonable limit for a daily run
    max_per_intent_usd: 0.50      # Prevent runaway intent costs
    warn_threshold_usd: 2.50      # Alert if > $2.50
```

**Use when:**

- Daily/weekly monitoring
- Established configuration
- Known cost profile
- Production use

______________________________________________________________________

### CI/CD Pipelines

Conservative limits for automated runs:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 0.50         # Low limit for automated runs
    max_per_intent_usd: 0.10
    warn_threshold_usd: 0.25
```

**Use when:**

- Automated testing
- Pull request checks
- Continuous monitoring
- High-frequency runs

______________________________________________________________________

### Executive Reports

Higher limits for comprehensive analysis:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 25.00        # Higher limit for quality models
    max_per_intent_usd: 2.00
    warn_threshold_usd: 10.00
```

**Use when:**

- Monthly executive reports
- Using premium models (GPT-4, Claude Opus)
- Comprehensive competitive analysis
- Deep-dive research

______________________________________________________________________

### Warning-Only Mode

Logs warnings but never aborts:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 999999.99    # Effectively unlimited
    max_per_intent_usd: 999999.99
    warn_threshold_usd: 1.00      # But warn at $1
```

**Use when:**

- Production monitoring with known costs
- Aborts would break automation
- You still want cost visibility

Use with Caution

This defeats the purpose of budget controls. Only use when you fully understand cost implications.

## Cost Estimation

LLM Answer Watcher estimates costs **before** execution from estimated token counts, cached pricing data, and a safety buffer.

### Estimation Formula

```python
estimated_cost = (
    (input_tokens * input_price_per_token)
    + (output_tokens * output_price_per_token)
) * safety_buffer
```

**Parameters:**

- `input_tokens`: Estimated from prompt length (~150 tokens)
- `output_tokens`: Estimated average response (~500 tokens)
- `input_price_per_token`: From llm-prices.com (cached 24h)
- `output_price_per_token`: From llm-prices.com (cached 24h)
- `safety_buffer`: 1.2 (20% buffer for variance)
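As a worked example, the formula can be evaluated with the default token estimates listed above. This is a minimal sketch: `estimate_query_cost` is a hypothetical helper, and the prices passed in are example values in USD per 1M tokens that should be checked against current pricing.

```python
def estimate_query_cost(
    input_price_per_million: float,
    output_price_per_million: float,
    input_tokens: int = 150,
    output_tokens: int = 500,
    safety_buffer: float = 1.2,
) -> float:
    """Estimated USD cost of one query, including the safety buffer."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return (input_cost + output_cost) * safety_buffer


# Example prices (USD per 1M tokens); verify before relying on them.
per_query = estimate_query_cost(
    input_price_per_million=0.15, output_price_per_million=0.60
)

# 3 intents x 2 models, all at the same price for simplicity.
run_total = per_query * 3 * 2
print(f"per query: ${per_query:.6f}, run: ${run_total:.6f}")
```

At these prices the output tokens dominate: 500 output tokens cost more than ten times as much as 150 input tokens, so response length is the main driver of estimation error.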

### Estimation Accuracy

Cost estimates are **approximate**:

- **Actual costs**: May vary ±20% from estimates
- **Factors affecting accuracy**:
  - Prompt length (longer = higher input cost)
  - Response length (varies by model and prompt)
  - Web search usage (adds $10-$25 per 1,000 calls)
  - Function calling (may increase token usage)

Estimation Accuracy

Estimates are conservative (tend to overestimate). Actual costs are typically 10-20% lower than estimated.

### Checking Estimated Costs

**Before running:**

```bash
llm-answer-watcher run --config watcher.config.yaml --dry-run
```

Output:

```text
💰 Cost Estimation:
   ├── OpenAI gpt-4o-mini: $0.0004 per query × 3 intents = $0.0012
   ├── Anthropic claude-3-5-haiku: $0.0022 per query × 3 intents = $0.0066
   ├── Safety buffer (20%): +$0.0016
   └── Total estimated cost: $0.0094

✅ Budget check passed:
   ├── Estimated cost: $0.0094
   ├── Budget limit: $1.00
   └── Remaining budget: $0.9906
```

**After running:**

Check actual costs in `run_meta.json`:

```json
{
  "run_id": "2025-11-01T08-00-00Z",
  "total_cost_usd": 0.0087,
  "estimated_cost_usd": 0.0094,
  "cost_accuracy_percent": 92.6
}
```

## Dynamic Pricing

LLM Answer Watcher automatically loads current pricing from [llm-prices.com](https://www.llm-prices.com).

### How Dynamic Pricing Works

1. **On first run**: Fetch pricing from llm-prices.com
1. **Cache for 24 hours**: Store in `~/.cache/llm-answer-watcher/pricing.json`
1. **Auto-refresh**: Re-fetch after 24 hours
1. **Fallback**: Use hardcoded prices if API unavailable

### Viewing Current Pricing

```bash
# Show all models
llm-answer-watcher prices show

# Show specific provider
llm-answer-watcher prices show --provider openai

# Show specific model
llm-answer-watcher prices show --model gpt-4o-mini

# Export as JSON
llm-answer-watcher prices list --format json
```

Example output:

```text
💰 Current LLM Pricing (as of 2025-11-01):

OpenAI:
  gpt-4o-mini:
    Input:  $0.15 per 1M tokens
    Output: $0.60 per 1M tokens

  gpt-4o:
    Input:  $2.50 per 1M tokens
    Output: $10.00 per 1M tokens

Anthropic:
  claude-3-5-haiku-20241022:
    Input:  $0.80 per 1M tokens
    Output: $4.00 per 1M tokens
```

### Forcing Pricing Refresh

```bash
# Force refresh (ignore cache)
llm-answer-watcher prices refresh --force

# Verify pricing updated
llm-answer-watcher prices show
```

### Pricing Cache Location

- **Linux/Mac**: `~/.cache/llm-answer-watcher/pricing.json`
- **Windows**: `%LOCALAPPDATA%/llm-answer-watcher/pricing.json`

Clear cache (Linux/Mac):

```bash
rm ~/.cache/llm-answer-watcher/pricing.json
```

## Web Search Costs

Web search adds additional costs beyond token usage.

### OpenAI Web Search Pricing

| Model Tier                 | Cost per 1,000 Calls | Content Tokens  |
| -------------------------- | -------------------- | --------------- |
| Standard (all models)      | $10                  | @ model rate    |
| gpt-4o-mini, gpt-4.1-mini  | $10                  | Fixed 8k tokens |
| Preview reasoning (o1, o3) | $10                  | @ model rate    |
| Preview non-reasoning      | $25                  | **FREE**        |

### Web Search Cost Calculation

```python
# Standard model
web_search_cost = (
    (num_searches * 0.01)  # $10 per 1,000 calls
    + (search_tokens * input_price_per_token)
)

# Mini models (fixed 8k tokens)
web_search_cost = (
    (num_searches * 0.01)
    + (8000 * input_price_per_token)
)
```

### Estimating Web Search Costs

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
```

**Estimated cost per query** (with web search):

- Base query: $0.0004 (tokens)
- Web search call: $0.01 (per call)
- Web search content: $0.0012 (8k tokens @ $0.15/1M)
- **Total**: ~$0.0116 per query

See [Web Search Configuration](../web-search/) for details.
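The per-query arithmetic above can be reproduced with a short function. This is a sketch using the figures from this section; `web_search_query_cost` is a hypothetical helper, and treating the 8k content tokens as a fixed total per query follows the mini-model formula shown earlier.

```python
def web_search_query_cost(
    base_token_cost: float,
    input_price_per_million: float,
    searches: int = 1,
    search_tokens: int = 8_000,
    cost_per_1k_calls: float = 10.0,
) -> float:
    """Estimated USD cost of one query that uses web search."""
    call_cost = searches * cost_per_1k_calls / 1_000
    content_cost = search_tokens * input_price_per_million / 1_000_000
    return base_token_cost + call_cost + content_cost


# gpt-4o-mini figures from this section: $0.0004 base, $0.15 per 1M input tokens.
total = web_search_query_cost(base_token_cost=0.0004, input_price_per_million=0.15)
print(f"${total:.4f} per query")
```

The per-call fee accounts for most of the total, which is why enabling web search raises per-query cost by an order of magnitude for inexpensive models.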

### Perplexity Request Fees

Perplexity charges **request fees** in addition to token costs:

- Basic searches: ~$0.005 per request
- Complex searches: ~$0.01-$0.03 per request

Perplexity Costs Not Fully Estimated

Request fees are **not yet included** in cost estimates. Budget accordingly when using Perplexity:

```yaml
budget:
  max_per_run_usd: 2.00  # Higher buffer for Perplexity
```

## Budget Enforcement Behavior

### Pre-Execution Validation

Budget validation happens **before** any LLM calls:

1. Load configuration
1. Estimate total cost
1. Check against budget limits
1. If budget exceeded: **Abort immediately**
1. If budget OK: Proceed with execution

**No LLM calls are made if budget would be exceeded.**

### Abort on Budget Exceeded

When budget is exceeded:

```bash
llm-answer-watcher run --config watcher.config.yaml
```

Output:

```text
❌ Budget exceeded: Estimated run cost $1.25 exceeds max_per_run_usd budget of $1.00.

   Run would execute 12 queries:
   ├── 3 intents × 4 models = 12 queries
   ├── Estimated cost: $1.25
   ├── Budget limit: $1.00
   └── Overage: $0.25

   Options:
   1. Reduce number of models or intents
   2. Increase budget limit in watcher.config.yaml
   3. Use --force to override budget (not recommended)
```

**Exit code**: `1` (configuration error)

### Force Override

Override budget limits (use with caution):

```bash
llm-answer-watcher run --config watcher.config.yaml --force
```

Output:

```text
⚠️  Budget check OVERRIDDEN with --force flag

   Estimated cost: $1.25
   Budget limit: $1.00
   Overage: $0.25

   Proceeding anyway...
```

Force Override

Only use `--force` when:

- You understand exact costs
- Budget limit is incorrect
- Emergency production run

**Never** use `--force` in automated scripts.

### Warning Threshold Behavior

When cost exceeds warning threshold (but not max):

```bash
llm-answer-watcher run --config watcher.config.yaml
```

Output:

```text
⚠️  Cost warning: Estimated run cost $0.75 exceeds warning threshold of $0.50

   ├── Estimated cost: $0.75
   ├── Warning threshold: $0.50
   ├── Budget limit: $1.00
   └── Status: OK to proceed

   Run will execute 12 queries. Continue? [Y/n]
```

**Behavior**:

- In human mode: Prompt for confirmation
- With `--yes` flag: Continue automatically
- In agent mode: Continue automatically (warning logged)

## Cost Tracking

### Per-Run Cost Summary

After each run, check `run_meta.json`:

```json
{
  "run_id": "2025-11-01T08-00-00Z",
  "timestamp_utc": "2025-11-01T08:00:00Z",
  "total_cost_usd": 0.0142,
  "estimated_cost_usd": 0.0168,
  "cost_accuracy_percent": 84.5,
  "queries_completed": 6,
  "queries_failed": 0,
  "cost_by_provider": {
    "openai": 0.0048,
    "anthropic": 0.0094
  },
  "cost_by_model": {
    "gpt-4o-mini": 0.0048,
    "claude-3-5-haiku-20241022": 0.0094
  }
}
```

### Historical Cost Analysis

Query SQLite database:

```sql
-- Total spending
SELECT SUM(total_cost_usd) as total_spent
FROM runs;

-- Spending by week
SELECT
    strftime('%Y-W%W', timestamp_utc) as week,
    SUM(total_cost_usd) as weekly_cost,
    COUNT(*) as runs
FROM runs
GROUP BY week
ORDER BY week DESC;

-- Spending by model
SELECT
    model_name,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost_per_query
FROM answers_raw
GROUP BY model_name
ORDER BY total_cost DESC;

-- Spending by intent
SELECT
    intent_id,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost_per_query
FROM answers_raw
GROUP BY intent_id
ORDER BY total_cost DESC;
```

### Monthly Budget Tracking

Track spending vs. monthly budget:

```sql
-- Current month spending
SELECT SUM(total_cost_usd) as month_to_date
FROM runs
WHERE strftime('%Y-%m', timestamp_utc) = strftime('%Y-%m', 'now');

-- Monthly trend
SELECT
    strftime('%Y-%m', timestamp_utc) as month,
    SUM(total_cost_usd) as monthly_cost,
    COUNT(*) as runs,
    AVG(total_cost_usd) as avg_cost_per_run
FROM runs
GROUP BY month
ORDER BY month DESC;
```

## Best Practices

### 1. Always Enable Budgets

Even in production:

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 10.00  # Reasonable safety limit
```

### 2. Set Conservative Limits

Start low, increase as needed:

```yaml
# Week 1: Very conservative
budget:
  max_per_run_usd: 0.10

# Week 2: Based on actual usage
budget:
  max_per_run_usd: 0.50

# Production: 2x observed average
budget:
  max_per_run_usd: 1.00
```

### 3. Use Warning Thresholds

Get alerts before hitting limits:

```yaml
budget:
  max_per_run_usd: 1.00
  warn_threshold_usd: 0.50  # Alert at 50% of limit
```

### 4. Separate Budgets by Environment

Different limits for different environments:

```yaml
# dev.config.yaml
run_settings:
  budget:
    max_per_run_usd: 0.10

# prod.config.yaml
run_settings:
  budget:
    max_per_run_usd: 5.00
```

### 5. Monitor Actual vs. Estimated Costs

Track estimation accuracy:

```sql
SELECT
    AVG(total_cost_usd / estimated_cost_usd) as avg_accuracy,
    MIN(total_cost_usd / estimated_cost_usd) as min_accuracy,
    MAX(total_cost_usd / estimated_cost_usd) as max_accuracy
FROM runs
WHERE estimated_cost_usd > 0;
```

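If actual costs consistently diverge from estimates, adjust your budget limits to match; a configurable safety buffer is a future feature.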

### 6. Account for Web Search Costs

Budget higher when using web search:

```yaml
# Without web search
budget:
  max_per_run_usd: 0.50

# With web search (10x higher)
budget:
  max_per_run_usd: 5.00
```

### 7. Use Dry Runs

Check costs before running:

```bash
llm-answer-watcher run --config watcher.config.yaml --dry-run
```

## Troubleshooting

### Budget Always Exceeded

**Problem**: Every run exceeds budget

**Possible causes:**

1. Too many intents or models
1. Budget limit too low
1. Expensive models (GPT-4, Claude Opus)
1. Web search enabled

**Solutions:**

```yaml
# Reduce intents
intents:
  - id: "primary-intent"
    prompt: "Most important question"

# Reduce models
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # Cheapest option

# Increase budget
budget:
  max_per_run_usd: 2.00  # Higher limit

# Disable web search
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    # Remove tools section
```

______________________________________________________________________

### Estimated Costs Inaccurate

**Problem**: Actual costs differ significantly from estimates

**Possible causes:**

1. Longer/shorter responses than expected
1. Web search usage not estimated correctly
1. Pricing data outdated
1. Function calling adds tokens

**Solutions:**

```bash
# Refresh pricing
llm-answer-watcher prices refresh --force

# Check estimation accuracy
# (in run_meta.json after run)
jq '.cost_accuracy_percent' output/2025-11-01T08-00-00Z/run_meta.json

# Adjust safety buffer if needed (future feature)
```

______________________________________________________________________

### Budget Blocks Valid Runs

**Problem**: Budget blocks run that should be allowed

**Cause**: Budget limit too conservative

**Solution**: Increase limit based on historical data:

```sql
-- Check average run cost
SELECT AVG(total_cost_usd) as avg_cost, MAX(total_cost_usd) as max_cost
FROM runs;
```

Set budget to 2x average or 1.5x max:

```yaml
budget:
  max_per_run_usd: 0.50  # 2x average of $0.25
```
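The "2x average or 1.5x max" rule of thumb can be applied to the costs returned by the query above. In this sketch, `suggest_run_budget` is a hypothetical helper, and choosing the larger of the two candidates is an assumption, since the guidance does not say which one to prefer.

```python
from statistics import mean


def suggest_run_budget(run_costs: list[float]) -> float:
    """Suggest max_per_run_usd from historical run costs (USD)."""
    if not run_costs:
        raise ValueError("At least one historical run cost is required")
    # Take the larger of 2x the average and 1.5x the most expensive run.
    candidate = max(2 * mean(run_costs), 1.5 * max(run_costs))
    return round(candidate, 2)


# Hypothetical historical costs for illustration.
print(suggest_run_budget([0.20, 0.25, 0.30]))
```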

## Next Steps

- **[Cost Management](../../features/cost-management/)**: Deep dive into cost tracking
- **[Web Search Configuration](../web-search/)**: Understand web search costs
- **[Model Configuration](../models/)**: Choose cost-effective models
- **[Automation](../../usage/automation/)**: Budget controls in CI/CD

# Web Search Configuration

Web search enables LLMs to access real-time information from the web, providing current data beyond their training cutoff dates. This is crucial for monitoring brand visibility in fresh, up-to-date LLM responses.

## Why Use Web Search?

### Benefits

**Fresh Data**: Access information after LLM training cutoff

- Track recent product launches
- Monitor current competitive landscape
- Detect real-time ranking changes
- Capture latest industry trends

**Accurate Information**: Grounded in current web sources

- Real-time pricing and features
- Current company positioning
- Latest product updates
- Active competitor status

**Citations**: Transparent source attribution (Perplexity)

- See exactly which sources LLMs used
- Verify information accuracy
- Understand ranking drivers
- Track source patterns

### Trade-offs

**Higher Costs**: Web search adds significant costs

- OpenAI: +$10-$25 per 1,000 calls
- Perplexity: +$0.005-$0.03 per request
- 10-30x cost increase vs. base queries

**Slower Responses**: Web search takes longer

- Base query: ~1-2 seconds
- With web search: ~3-10 seconds
- May impact automation pipelines

**Variability**: Results can change frequently

- Web content changes constantly
- Less reproducible than static responses
- Harder to track trends

## Supported Providers

### OpenAI Web Search

OpenAI offers web search through the Responses API with the `web_search` tool.

**Configuration**:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

**How it works**:

1. LLM receives user prompt
1. Decides whether to use web search (if `tool_choice: auto`)
1. Searches the web if needed
1. Incorporates search results into response
1. Returns answer with web context

**Pricing** (per 1,000 calls):

| Model Tier                 | Cost | Content Tokens  |
| -------------------------- | ---- | --------------- |
| Standard (all models)      | $10  | @ model rate    |
| gpt-4o-mini, gpt-4.1-mini  | $10  | Fixed 8k tokens |
| Preview reasoning (o1, o3) | $10  | @ model rate    |
| Preview non-reasoning      | $25  | **FREE**        |

______________________________________________________________________

### Perplexity (Native Web Search)

Perplexity models have built-in web search - no configuration needed.

**Configuration**:

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

**How it works**:

1. Every query automatically searches the web
1. LLM synthesizes answer from sources
1. Returns response with citations
1. Provides source URLs for verification

**Models**:

- `sonar`: Fast, web-grounded ($1/$1 per 1M tokens + request fees)
- `sonar-pro`: High-quality grounded ($3/$15 per 1M tokens + request fees)
- `sonar-reasoning`: Enhanced reasoning ($1/$5 per 1M tokens + request fees)
- `sonar-deep-research`: In-depth analysis ($3/$15 per 1M tokens + request fees)

**Pricing**: Token costs + request fees (~$0.005-$0.03 per request)

**Perplexity Request Fees**:

Request fees are **not yet included** in cost estimates. Budget accordingly.

______________________________________________________________________

### Google Search Grounding

Google Gemini models support Google Search grounding, which enables the LLM to search the web and ground responses in current information.

**Configuration**:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"  # Recommended
    tools:
      - google_search: {}  # Enable Google Search
```

**How it works**:

1. LLM receives user prompt
1. Gemini automatically decides if search is needed
1. Performs Google Search if beneficial
1. Grounds response in search results
1. Returns answer with grounding metadata

**Models**:

- `gemini-2.0-flash-lite`: Not supported (no grounding)
- `gemini-2.0-flash-exp`: Supported (experimental)
- `gemini-2.5-flash`: Supported (best for grounding)
- `gemini-2.5-flash-lite`: Not supported
- `gemini-2.5-pro`: Supported (highest quality)

**Pricing**: Base model token costs (no additional fees for grounding)

**Configuration Format Difference**:

Google uses `google_search: {}` (dictionary format) while OpenAI uses `type: "web_search"` (typed format). This reflects different provider API specifications. See [detailed configuration](#google-search-grounding-configuration) below.

## OpenAI Web Search Configuration

### Basic Configuration

Enable web search with automatic activation:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"  # Let model decide
```

### Tool Choice Options

Control when web search is used:

**`auto` (Recommended)**: Model decides when to search

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

**Use when**: You want the LLM to determine whether fresh data is needed.

______________________________________________________________________

**`required`**: Force web search for every query

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "required"
```

**Use when**: You always want current information.

**Warning**: Significantly increases costs (every query uses web search).

______________________________________________________________________

**`none`**: Disable web search (the example below does this by omitting the `tools` section entirely)

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    # No tools specified - web search disabled
```

**Use when**: Training data is sufficient, cost optimization priority.

### Comparing With and Without Web Search

Test impact of web search:

```yaml
models:
  # With web search
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"

  # Without web search (control)
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    # No tools
```

Compare results to see web search impact on brand visibility.

### Web Search Metadata

LLM Answer Watcher tracks web search usage:

```json
{
  "intent_id": "best-email-tools",
  "model_provider": "openai",
  "model_name": "gpt-4o-mini",
  "answer_text": "The best email warmup tools are...",
  "web_search_used": true,
  "web_search_count": 3,
  "web_search_results": [
    {
      "url": "https://example.com/best-email-tools",
      "title": "Top Email Warmup Tools 2025",
      "snippet": "..."
    }
  ],
  "usage_meta": {
    "prompt_tokens": 150,
    "completion_tokens": 520,
    "web_search_tokens": 8000
  }
}
```

## Perplexity Configuration

### Basic Configuration

Use Perplexity for automatic web grounding:

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

**No additional configuration needed** - web search is automatic.

### Perplexity Model Selection

Choose model based on use case:

**`sonar`**: Fast, cost-effective

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar"
    env_api_key: "PERPLEXITY_API_KEY"
```

- **Cost**: $1/$1 per 1M tokens + ~$0.005 per request
- **Speed**: ~2-4 seconds per query
- **Use when**: Daily monitoring, high-volume queries

______________________________________________________________________

**`sonar-pro`**: High-quality grounded answers

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

- **Cost**: $3/$15 per 1M tokens + ~$0.01 per request
- **Speed**: ~3-6 seconds per query
- **Use when**: Weekly reports, competitive analysis

______________________________________________________________________

**`sonar-reasoning`**: Enhanced reasoning with web search

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-reasoning"
    env_api_key: "PERPLEXITY_API_KEY"
```

- **Cost**: $1/$5 per 1M tokens + ~$0.015 per request
- **Speed**: ~4-8 seconds per query
- **Use when**: Complex queries, deep analysis

______________________________________________________________________

**`sonar-deep-research`**: Comprehensive research mode

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-deep-research"
    env_api_key: "PERPLEXITY_API_KEY"
```

- **Cost**: $3/$15 per 1M tokens + ~$0.02-$0.03 per request
- **Speed**: ~8-15 seconds per query
- **Use when**: Monthly executive reports, thorough research

### Perplexity Citations

Perplexity provides source citations:

```json
{
  "intent_id": "best-email-tools",
  "model_provider": "perplexity",
  "model_name": "sonar-pro",
  "answer_text": "The best email warmup tools are...",
  "citations": [
    {
      "index": 1,
      "url": "https://www.g2.com/categories/email-warmup",
      "title": "Best Email Warmup Software 2025",
      "used_in_response": true
    },
    {
      "index": 2,
      "url": "https://blog.competitor.com/warmup-guide",
      "title": "Email Warmup Best Practices",
      "used_in_response": true
    }
  ]
}
```

**Citation Analysis**:

Track which sources influence LLM recommendations:

- Identify key industry publications
- Monitor competitor content
- Find content opportunities
- Track source diversity

## Google Search Grounding Configuration

### Basic Configuration

Enable Google Search grounding for Gemini models:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"
    tools:
      - google_search: {}
```

**Key configuration points**:

- **`model_name`**: Must be a grounding-capable model (see [supported models](#supported-models) below)
- **`system_prompt`**: Use `"google/gemini-grounding"` for optimized grounding behavior
- **`tools`**: Use `google_search: {}` format (Google API specification)

### Configuration Format

Google uses a different tools format than OpenAI:

**Google format** (dictionary with tool name as key):

```yaml
tools:
  - google_search: {}
```

**OpenAI format** (dictionary with `type` field):

```yaml
tools:
  - type: "web_search"
tool_choice: "auto"
```

**Why the difference?**

- Each provider has different API specifications
- OpenAI uses typed tool specification with `tool_choice` control
- Google uses named tool objects with automatic decision-making
- The config does direct passthrough to each provider's API

**No Tool Choice**:

Google Gemini automatically decides when to use Google Search based on the query. There's no `tool_choice` parameter - the model intelligently determines when grounding would improve the response.
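Because the two formats are easy to mix up, a small pre-flight check can catch a tool entry written in the wrong provider's style. This is a hedged sketch: `EXPECTED_KEY` and `misconfigured_tools` are hypothetical helpers, and the only format knowledge they encode is the two shapes shown above.

```python
# Key that each provider's tool entries are expected to contain,
# based only on the two formats shown above.
EXPECTED_KEY = {"openai": "type", "google": "google_search"}


def misconfigured_tools(provider: str, tools: list[dict]) -> list[dict]:
    """Return tool entries that do not use the provider's expected format."""
    key = EXPECTED_KEY.get(provider)
    if key is None:  # provider without a documented tools format
        return []
    return [tool for tool in tools if key not in tool]


# An OpenAI-style entry under a Google model is flagged
print(misconfigured_tools("google", [{"type": "web_search"}]))
# A correctly formatted Google entry passes
print(misconfigured_tools("google", [{"google_search": {}}]))
```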

### Supported Models

Not all Gemini models support Google Search grounding:

| Model                   | Grounding Support | Best For                                 |
| ----------------------- | ----------------- | ---------------------------------------- |
| `gemini-2.0-flash-lite` | ❌ No             | Fast, non-grounded queries               |
| `gemini-2.0-flash-exp`  | ⚠️ Experimental   | Testing new features                     |
| `gemini-2.5-flash`      | ✅ Yes            | **Recommended** - balanced speed/quality |
| `gemini-2.5-flash-lite` | ❌ No             | Fast, non-grounded queries               |
| `gemini-2.5-pro`        | ✅ Yes            | Highest quality grounding                |

**Recommendation**: Use `gemini-2.5-flash` for production. It provides excellent grounding quality at reasonable cost.

### System Prompt Optimization

Use the specialized `google/gemini-grounding` system prompt:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"  # Optimized for grounding
    tools:
      - google_search: {}
```

**What it does**:

- Instructs Gemini to use Google Search when beneficial
- Emphasizes grounding responses in search results
- Requests comprehensive source coverage
- Improves answer quality for brand monitoring

**Default system prompt** (`google/default.json`) also works but is less optimized for web search use cases.

### Configuration Examples

**With grounding** (recommended for brand monitoring):

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"
    tools:
      - google_search: {}

intents:
  - id: "email-warmup-tools"
    prompt: "What are the best email warmup tools in 2025?"
```

**Without grounding** (faster, uses only training data):

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash-lite"
    env_api_key: "GEMINI_API_KEY"
    # No tools or system_prompt specified

intents:
  - id: "email-warmup-tools"
    prompt: "What are the best email warmup tools?"
```

### Grounding Metadata

When Google Search is used, the response includes grounding metadata:

```json
{
  "intent_id": "email-warmup-tools",
  "model_provider": "google",
  "model_name": "gemini-2.5-flash",
  "answer_text": "Based on current research, the best email warmup tools are...",
  "web_search_results": {
    "web_search_queries": [
      "best email warmup tools 2025",
      "email warmup service comparison"
    ],
    "grounding_chunks": [
      {
        "web_source": "https://www.g2.com/categories/email-warmup",
        "retrieved_context": "Top-rated email warmup tools include..."
      }
    ],
    "grounding_supports": [
      {
        "segment": {
          "start_index": 150,
          "end_index": 200,
          "text": "Warmly is a leading email warmup solution"
        },
        "grounding_chunk_indices": [0, 2],
        "confidence_scores": [0.95, 0.88]
      }
    ]
  },
  "web_search_count": 2
}
```

**Key fields**:

- **`web_search_queries`**: Google Search queries Gemini performed
- **`grounding_chunks`**: Source URLs and retrieved context
- **`grounding_supports`**: Which text segments were grounded in which sources
- **`confidence_scores`**: How confident Gemini is in the grounding (0.0-1.0)
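
The fields above can be joined to show which source backs each grounded segment. This is a minimal sketch that assumes the JSON layout from the example; `grounded_segments` is a hypothetical helper. Note that the example references chunk index `2` while only one chunk is listed, so the sketch skips indices that have no stored chunk.

```python
def grounded_segments(record: dict) -> list[dict]:
    """Pair each grounded text segment with its source URLs and scores."""
    results = record.get("web_search_results", {})
    chunks = results.get("grounding_chunks", [])
    segments = []
    for support in results.get("grounding_supports", []):
        indices = support.get("grounding_chunk_indices", [])
        scores = support.get("confidence_scores", [])
        sources = [
            {"url": chunks[i]["web_source"], "confidence": score}
            for i, score in zip(indices, scores)
            if 0 <= i < len(chunks)  # skip indices with no stored chunk
        ]
        segments.append({"text": support["segment"]["text"], "sources": sources})
    return segments


# Trimmed copy of the example record above
record = {
    "web_search_results": {
        "grounding_chunks": [
            {"web_source": "https://www.g2.com/categories/email-warmup"}
        ],
        "grounding_supports": [
            {
                "segment": {"text": "Warmly is a leading email warmup solution"},
                "grounding_chunk_indices": [0, 2],
                "confidence_scores": [0.95, 0.88],
            }
        ],
    }
}

for seg in grounded_segments(record):
    print(seg["text"], "->", seg["sources"])
```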

### Pricing

**Good news**: Google Search grounding has **no additional per-request fees**.

You only pay standard token costs:

| Model              | Input Cost        | Output Cost       |
| ------------------ | ----------------- | ----------------- |
| `gemini-2.5-flash` | $0.04 / 1M tokens | $0.12 / 1M tokens |
| `gemini-2.5-pro`   | $0.60 / 1M tokens | $1.80 / 1M tokens |

**Example cost** (email warmup query with grounding):

```text
Query: 100 tokens input
Response: 300 tokens output (with grounding context)

gemini-2.5-flash cost:
= (100 × $0.04/1M) + (300 × $0.12/1M)
= $0.000004 + $0.000036
= $0.00004 per query
```

**vs. OpenAI with web search**:

```text
OpenAI gpt-4o-mini with web_search:
= $0.0116 per query (~290x more expensive)
```

**Cost Advantage**:

Google Search grounding is significantly cheaper than OpenAI web search for high-volume monitoring. Grounding tokens are included in base pricing.
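
The arithmetic in this section can be reproduced with a few lines of Python. This is an illustrative sketch using the prices and token counts quoted above; `token_cost_usd` is a hypothetical helper, not part of the tool.

```python
def token_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_1m: float,
    output_per_1m: float,
) -> float:
    """Token cost in USD, given prices per 1M input and output tokens."""
    return (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000


# gemini-2.5-flash with the rates from the table above
gemini_cost = token_cost_usd(100, 300, input_per_1m=0.04, output_per_1m=0.12)

# Per-query OpenAI web search figure quoted above
openai_cost = 0.0116
ratio = openai_cost / gemini_cost

print(f"gemini-2.5-flash: ${gemini_cost:.5f} per query")
print(f"OpenAI with web search: ~{ratio:.0f}x more expensive")
```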

### When to Use Google Search Grounding

**Use Google Search Grounding when**:

- ✅ You need current, real-time information
- ✅ You want Google's search quality and coverage
- ✅ You're running high-volume monitoring (cost-effective)
- ✅ You want automatic search decision-making
- ✅ You need grounding metadata with source attribution

**Use OpenAI web search when**:

- ✅ You need explicit `tool_choice` control (force or disable search)
- ✅ You prefer OpenAI's LLM reasoning quality
- ✅ You're already invested in OpenAI ecosystem

**Use Perplexity when**:

- ✅ You need explicit source citations with URLs
- ✅ You want always-on web search (no configuration)
- ✅ You prefer Perplexity's citation format

### Complete Example Configuration

Multi-provider comparison with side-by-side grounding:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    # Google with grounding (cost-effective, automatic)
    - provider: "google"
      model_name: "gemini-2.5-flash"
      env_api_key: "GEMINI_API_KEY"
      system_prompt: "google/gemini-grounding"
      tools:
        - google_search: {}

    # OpenAI with controlled web search
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"
      tools:
        - type: "web_search"
      tool_choice: "auto"

    # Perplexity with always-on citations
    - provider: "perplexity"
      model_name: "sonar-pro"
      env_api_key: "PERPLEXITY_API_KEY"

brands:
  mine:
    - "Warmly"
    - "Lemlist"
  competitors:
    - "HubSpot"
    - "Instantly"

intents:
  - id: "best-email-tools-2025"
    prompt: "What are the best email warmup tools in 2025?"
```

**This configuration enables**:

- Google: Automatic grounding with lowest cost
- OpenAI: LLM-controlled web search with reasoning
- Perplexity: Always-on search with explicit citations

Compare results across all three to understand:

- How each provider uses web search
- Cost vs. quality trade-offs
- Grounding vs. citation differences

## Cost Management for Web Search

### Web Search Cost Breakdown

**OpenAI gpt-4o-mini with web search**:

```text
Base query: $0.0004 (tokens only)
+ Web search call: $0.01 ($10 per 1k calls)
+ Web search content: $0.0012 (8k tokens @ $0.15/1M)
= Total: ~$0.0116 per query
```

**Perplexity sonar-pro**:

```text
Base tokens: ~$0.0050 (input + output tokens @ $3/$15 per 1M)
+ Request fee: $0.01 (varies by complexity)
= Total: ~$0.015 per query
```
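
The two breakdowns above can be expressed as small functions, which makes it easy to plug in your own numbers. This is a sketch under the figures quoted in this section; the function names and default values are assumptions for illustration only.

```python
def openai_web_search_cost(
    base_usd: float,
    fee_per_1k_calls: float = 10.0,
    content_tokens: int = 8_000,
    input_per_1m: float = 0.15,
) -> float:
    """Per-query cost: base tokens + one search call + search content tokens."""
    call_fee = fee_per_1k_calls / 1_000
    content_cost = content_tokens * input_per_1m / 1_000_000
    return base_usd + call_fee + content_cost


def perplexity_cost(token_usd: float, request_fee_usd: float) -> float:
    """Per-query cost: token cost + per-request fee."""
    return token_usd + request_fee_usd


openai_total = openai_web_search_cost(base_usd=0.0004)
perplexity_total = perplexity_cost(token_usd=0.005, request_fee_usd=0.01)

print(f"OpenAI gpt-4o-mini with web search: ~${openai_total:.4f}")
print(f"Perplexity sonar-pro: ~${perplexity_total:.3f}")
```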

### Budget Configuration for Web Search

Adjust budgets to account for higher costs:

```yaml
run_settings:
  budget:
    # Without web search: 0.50
    # With web search (10-30x higher): 5.00
    max_per_run_usd: 5.00
```

**Example calculation**:

- 3 intents × 2 models with web search = 6 queries
- ~$0.015 per query
- Total: $0.09 per run
- Recommended budget: $0.50 (5x safety margin)
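
The example calculation generalizes to any number of intents and models. The sketch below is illustrative: both helpers are hypothetical, and rounding the budget up to the next $0.10 is an assumption made to match the $0.50 figure above.

```python
import math


def estimate_run_cost(num_intents: int, num_models: int, cost_per_query_usd: float) -> float:
    """One query per intent/model pair."""
    return num_intents * num_models * cost_per_query_usd


def budget_with_margin(
    run_cost_usd: float, safety_factor: float = 5.0, step_usd: float = 0.10
) -> float:
    """Apply a safety factor, then round up to the next step (assumed $0.10)."""
    return math.ceil(run_cost_usd * safety_factor / step_usd) * step_usd


run_cost = estimate_run_cost(num_intents=3, num_models=2, cost_per_query_usd=0.015)
budget = budget_with_margin(run_cost)

print(f"Estimated run cost: ${run_cost:.2f}")
print(f"Recommended budget: ${budget:.2f}")
```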

### Optimizing Web Search Costs

**1. Use `auto` tool choice**:

```yaml
tools:
  - type: "web_search"
tool_choice: "auto"  # Only search when needed
```

Model only uses web search when beneficial, reducing unnecessary searches.

______________________________________________________________________

**2. Mix web and non-web models**:

```yaml
models:
  # Web-grounded for fresh data
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"

  # Base model for comparison
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    # No web search
```

Compare web vs. non-web responses to validate web search value.

______________________________________________________________________

**3. Use web search selectively**:

```yaml
intents:
  # Fresh data needed
  - id: "current-best-tools"
    prompt: "What are the best email tools in 2025?"
    # Use web search models for this intent

  # Historical query
  - id: "email-warmup-concept"
    prompt: "What is email warmup?"
    # No web search needed
```

Use separate configs for different intent types: one with web search models for fresh-data intents, one without for the rest.

______________________________________________________________________

**4. Track web search usage**:

```sql
-- Web search usage rate
SELECT
    model_name,
    COUNT(*) as total_queries,
    SUM(web_search_used) as web_searches,
    (SUM(web_search_used) * 100.0 / COUNT(*)) as usage_rate,
    AVG(estimated_cost_usd) as avg_cost
FROM answers_raw
WHERE model_provider = 'openai'
GROUP BY model_name;
```

Optimize based on actual usage patterns.

## Use Cases for Web Search

### 1. Recent Product Launches

Track brand visibility after launches:

```yaml
intents:
  - id: "best-tools-2025"
    prompt: "What are the best email warmup tools launched in 2025?"

models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

Web search helps the LLM pick up recent launches that postdate its training data.

### 2. Current Competitive Landscape

Monitor live market positioning:

```yaml
intents:
  - id: "current-market-leaders"
    prompt: "Who are the current market leaders in email warmup?"

models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "required"  # Always search
```

### 3. Pricing and Features

Track current pricing mentions:

```yaml
intents:
  - id: "pricing-comparison"
    prompt: "Compare pricing for email warmup tools in 2025"

models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

### 4. News and Events

Monitor impact of news on brand visibility:

```yaml
intents:
  - id: "post-acquisition"
    prompt: "What are the best email tools after HubSpot's recent acquisition?"

models:
  - provider: "perplexity"
    model_name: "sonar-reasoning"
    env_api_key: "PERPLEXITY_API_KEY"
```

### 5. Trend Analysis

Track emerging trends:

```yaml
intents:
  - id: "ai-email-tools"
    prompt: "What are the best AI-powered email warmup tools?"

models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

## Analyzing Web Search Results

### Web Search Metadata

Check if web search was used:

```python
import json

with open("output/2025-11-01T08-00-00Z/intent_best-tools_raw_openai_gpt-4o-mini.json") as f:
    data = json.load(f)
    print(f"Web search used: {data.get('web_search_used')}")
    print(f"Searches performed: {data.get('web_search_count')}")
```

### Citation Analysis (Perplexity)

Extract and analyze citations:

```python
import json

with open("output/2025-11-01T08-00-00Z/intent_best-tools_raw_perplexity_sonar-pro.json") as f:
    data = json.load(f)
    for citation in data.get('citations', []):
        print(f"{citation['index']}: {citation['title']}")
        print(f"   {citation['url']}\n")
```

### Source Patterns

Track which sources LLMs cite:

```sql
-- Citation frequency (future feature)
SELECT
    citation_domain,
    COUNT(*) as citation_count,
    COUNT(DISTINCT intent_id) as intents_cited_in
FROM citations
GROUP BY citation_domain
ORDER BY citation_count DESC
LIMIT 10;
```

**Citation Tracking**:

Full citation tracking is planned for a future release. Currently, citations are stored in JSON artifacts.
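
Until citations are queryable in SQL, a similar frequency count can be computed from the JSON artifacts. This is a minimal sketch: it assumes the `citations` layout shown in the Perplexity example and the `intent_*_raw_*.json` file naming used elsewhere in this guide; `load_records` and `citation_domains` are hypothetical helpers.

```python
import json
from collections import Counter
from pathlib import Path
from urllib.parse import urlparse


def load_records(run_dir: str) -> list[dict]:
    """Load raw answer artifacts from one run directory (naming is assumed)."""
    records = []
    for path in Path(run_dir).glob("intent_*_raw_*.json"):
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict):
            records.append(data)
    return records


def citation_domains(records: list[dict]) -> Counter:
    """Count how often each domain appears across all citations."""
    counts: Counter = Counter()
    for record in records:
        for citation in record.get("citations", []):
            domain = urlparse(citation["url"]).netloc.removeprefix("www.")
            counts[domain] += 1
    return counts


# Inline sample mirroring the Perplexity example; use load_records() on real runs
sample = [
    {
        "citations": [
            {"index": 1, "url": "https://www.g2.com/categories/email-warmup"},
            {"index": 2, "url": "https://blog.competitor.com/warmup-guide"},
        ]
    }
]
domain_counts = citation_domains(sample)
for domain, count in domain_counts.most_common(10):
    print(f"{domain}: {count}")
```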

## Best Practices

### 1. Test With and Without Web Search

Compare to measure impact:

```yaml
models:
  # Baseline (no web search)
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  # Test (with web search)
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"
```

### 2. Use Auto Tool Choice

Let model decide when to search:

```yaml
tools:
  - type: "web_search"
tool_choice: "auto"  # More cost-effective
```

### 3. Budget Appropriately

Account for 10-30x cost increase:

```yaml
budget:
  max_per_run_usd: 5.00  # vs. $0.50 without web search
```

### 4. Use for Time-Sensitive Queries

Enable web search when freshness matters:

- Recent product launches
- Current pricing
- Latest competitive moves
- Industry news impact

### 5. Track Citation Sources

Monitor which sources influence rankings:

- Identify key industry publications
- Find content gaps
- Track competitor content
- Understand ranking factors

### 6. Combine Providers

Use multiple web search approaches:

```yaml
models:
  # OpenAI: Selective web search
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"

  # Perplexity: Always web-grounded
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

## Troubleshooting

### Web Search Not Working

**Problem**: Web search tool not being used

**Check**:

1. Tool configuration is correct:

   ```yaml
   tools:
     - type: "web_search"  # Correct
   # Not: tool_type or search_tool
   ```

1. Tool choice is set:

   ```yaml
   tool_choice: "auto"  # or "required"
   ```

1. Model supports web search:

   - OpenAI: All chat models
   - Perplexity: All models (automatic)

______________________________________________________________________

### High Costs

**Problem**: Web search costs higher than expected

**Solutions**:

1. Check tool choice:

   ```yaml
   tool_choice: "auto"  # Not "required"
   ```

1. Monitor usage:

   ```sql
   SELECT
       COUNT(*) as total,
       SUM(web_search_used) as searches,
       AVG(estimated_cost_usd) as avg_cost
   FROM answers_raw;
   ```

1. Use cheaper models:

   ```yaml
   models:
     - provider: "openai"
       model_name: "gpt-4o-mini"  # Cheapest with web search
   ```

______________________________________________________________________

### Inconsistent Results

**Problem**: Results vary between runs

**Cause**: Web content changes frequently

**Expected behavior**: Web-grounded responses will vary as web content updates.

**Mitigation**:

- Run multiple queries, average results
- Track trends over time vs. point-in-time snapshots
- Use non-web models for baseline comparison

## Next Steps

- **[Model Configuration](../models/)**: Choose models with web search
- **[Budget Configuration](../budget/)**: Budget for web search costs
- **[Cost Management](../../features/cost-management/)**: Track web search spending
- **[HTML Reports](../../features/html-reports/)**: View web search metadata

# Post-Intent Operations

Post-intent operations allow you to execute custom actions after each intent query completes. This advanced feature enables dynamic workflows like discovering competitors mentioned by LLMs.

## Overview

Operations are defined per-intent and execute after the LLM response is received:

```yaml
intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
    operations:
      - type: "extract_competitors"
        save_to: "discovered_competitors"
```

## Supported Operation Types

### `extract_competitors`

Automatically extracts brand names mentioned in LLM responses that aren't in your configured brand lists.

**Use Case**: Discover new competitors you weren't tracking.

**Configuration**:

```yaml
intents:
  - id: "market-research"
    prompt: "What are all the tools in this category?"
    operations:
      - type: "extract_competitors"
        save_to: "discovered_brands"
        params:
          min_confidence: 0.7
          exclude_generic_terms: true
```

**Parameters**:

- `save_to` (required): Variable name to store results
- `min_confidence`: Minimum confidence threshold (0.0-1.0)
- `exclude_generic_terms`: Filter out generic words

**Output**:

Results saved to `intent_*_operation_extract_competitors.json`:

```json
{
  "operation_type": "extract_competitors",
  "discovered_brands": [
    {"name": "NewCompetitor", "confidence": 0.95},
    {"name": "EmergingTool", "confidence": 0.82}
  ]
}
```
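
To turn this output into a review list, you can filter it by confidence and drop brands you already track. This is a hedged sketch: `new_competitors` is a hypothetical helper, and it assumes the result key matches the example above (`discovered_brands`).

```python
import json


def new_competitors(
    operation_result: dict,
    known_brands: list[str],
    min_confidence: float = 0.8,
    key: str = "discovered_brands",  # assumed to match the example output
) -> list[str]:
    """Return discovered brand names that are confident and not yet tracked."""
    known = {brand.casefold() for brand in known_brands}
    return [
        item["name"]
        for item in operation_result.get(key, [])
        if item["confidence"] >= min_confidence
        and item["name"].casefold() not in known
    ]


sample = json.loads(
    """
    {
      "operation_type": "extract_competitors",
      "discovered_brands": [
        {"name": "NewCompetitor", "confidence": 0.95},
        {"name": "EmergingTool", "confidence": 0.82}
      ]
    }
    """
)

candidates = new_competitors(sample, known_brands=["HubSpot", "emergingtool"])
print(candidates)
```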

## Operation Chaining

Execute multiple operations in sequence:

```yaml
intents:
  - id: "comprehensive-analysis"
    prompt: "Analyze the market landscape"
    operations:
      # Step 1: Extract competitors
      - type: "extract_competitors"
        save_to: "new_competitors"

      # Step 2: Could add more operations in future
      # - type: "sentiment_analysis"
      #   depends_on: "new_competitors"
```

Operations execute in order and can depend on previous results.

## Real-World Examples

### Market Discovery

Find competitors you didn't know about:

```yaml
intents:
  - id: "discover-market"
    prompt: "List all email marketing tools you know"
    operations:
      - type: "extract_competitors"
        save_to: "market_scan"
        params:
          min_confidence: 0.8
```

### Quarterly Expansion

Update your competitor list quarterly:

```yaml
intents:
  - id: "q1-market-scan"
    prompt: "What are the top 20 tools in our category as of Q1 2025?"
    operations:
      - type: "extract_competitors"
        save_to: "q1_competitors"
```

Then review the `q1_competitors` results in the operation output file and add new brands to your config.

## Best Practices

### 1. Use High Confidence Thresholds

Avoid false positives:

```yaml
params:
  min_confidence: 0.8  # Only very confident extractions
```

### 2. Review Before Adding to Config

Operations discover candidates - manually review before adding to your brand list.

### 3. Separate Discovery Intents

Create dedicated intents for competitor discovery:

```yaml
intents:
  # Regular monitoring
  - id: "best-tools"
    prompt: "What are the best tools?"

  # Discovery (run monthly)
  - id: "market-discovery"
    prompt: "Comprehensive list of all tools in category"
    operations:
      - type: "extract_competitors"
        save_to: "monthly_scan"
```

## Accessing Operation Results

Results are stored in the output directory:

```text
output/2025-11-05T14-30-00Z/
├── intent_market-discovery_operation_extract_competitors.json
└── ...
```

Also queryable from SQLite:

```sql
SELECT operation_type, operation_results
FROM intent_operations
WHERE intent_id = 'market-discovery';
```

## Future Operation Types

Planned for future releases:

- `sentiment_analysis`: Analyze tone of brand mentions
- `feature_extraction`: Extract mentioned features/capabilities
- `pricing_detection`: Detect pricing information
- `use_case_mapping`: Map brands to specific use cases

## Limitations

- Operations run synchronously (no parallel execution yet)
- Limited to extraction tasks (no API calls or external actions)
- Results require manual review before acting on them

## Next Steps

- [Learn about intent configuration](../intents/)
- [See complete examples](../../../examples/ci-cd-integration/)

# Core Features

# Brand Mention Detection

Brand mention detection is the core feature of LLM Answer Watcher. It uses word-boundary regex matching to accurately identify brand mentions while preventing false positives.

## How It Works

### Word-Boundary Matching

The system uses **word-boundary regex** (`\b`) to ensure accurate matching:

```python
# Pattern: \bHubSpot\b
# Matches: "I use HubSpot daily"
# Doesn't match: "I use HubSpotter" (no word boundary after "HubSpot")
# Likewise, \bhub\b doesn't match the "hub" inside "GitHub"
```

This prevents common false positives:

- ✅ "HubSpot" matches "HubSpot" exactly
- ❌ "Hub" does NOT match "HubSpot"
- ❌ "Spot" does NOT match "HubSpot"
- ❌ "hub" does NOT match "GitHub"

### Case-Insensitive Matching

All matching is case-insensitive:

```python
# All these match "HubSpot"
"HubSpot", "hubspot", "HUBSPOT", "HuBsPoT"
```
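
Word boundaries and case-insensitivity combine naturally in Python's `re` module. The following is a minimal sketch of the technique, not the tool's actual implementation; `brand_pattern` is a hypothetical helper.

```python
import re


def brand_pattern(brand: str) -> re.Pattern[str]:
    """Compile a case-insensitive, word-bounded pattern for one brand."""
    return re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)


hubspot = brand_pattern("HubSpot")

print(bool(hubspot.search("I use HubSpot daily")))   # exact name
print(bool(hubspot.search("i use hubspot daily")))   # different case
print(bool(hubspot.search("I use HubSpotter")))      # longer word, no boundary
print(bool(brand_pattern("hub").search("GitHub")))   # substring inside a word
```

`re.escape` matters for aliases such as `Warmly.io`, where an unescaped `.` would match any character.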

### Brand Aliases

Configure multiple aliases for each brand:

```yaml
brands:
  mine:
    - "Warmly"
    - "Warmly.io"
    - "Warmly AI"

  competitors:
    - "HubSpot"
    - "HubSpot CRM"
    - "Instantly"
    - "Instantly.ai"
```
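
One way to match several aliases in a single pass is a combined alternation, sorted longest-first so that `HubSpot CRM` wins over `HubSpot` when both could match. This is an illustrative sketch, not the tool's actual matcher; `build_alias_pattern` is a hypothetical helper.

```python
import re


def build_alias_pattern(aliases: list[str]) -> re.Pattern[str]:
    """Combine aliases into one word-bounded, case-insensitive pattern."""
    # Longest first, so longer aliases are tried before their prefixes
    ordered = sorted(aliases, key=len, reverse=True)
    alternation = "|".join(re.escape(alias) for alias in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


competitors = ["HubSpot", "HubSpot CRM", "Instantly", "Instantly.ai"]
pattern = build_alias_pattern(competitors)

text = "We compared HubSpot CRM with instantly.ai and HubSpot."
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```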

## Configuration

### Basic Brand Configuration

Minimal configuration with your brand and competitors:

```yaml
brands:
  mine:
    - "YourBrand"

  competitors:
    - "CompetitorA"
    - "CompetitorB"
```

### Advanced Brand Configuration

Include all variations and common misspellings:

```yaml
brands:
  mine:
    - "Acme Corp"
    - "Acme"
    - "AcmeCorp"
    - "Acme.io"
    - "Acme Software"

  competitors:
    # Direct competitors
    - "Competitor One"
    - "CompetitorOne"
    - "Competitor1"

    # Market leaders
    - "Industry Leader"
    - "Big Player Inc"

    # Adjacent competitors
    - "Alternative Tool"
```

### Brand Normalization

Brands are normalized for storage and analysis:

```text
"HubSpot CRM" → "hubspot-crm"
"Instantly.ai" → "instantly-ai"
"Apollo.io" → "apollo-io"
```

This ensures consistent matching across different formats.
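
A normalization rule consistent with the three examples above is: lowercase the name, then collapse every run of non-alphanumeric characters into a single hyphen. This is a sketch inferred from those examples; the tool's actual rule may differ, and `normalize_brand` is a hypothetical name.

```python
import re


def normalize_brand(name: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


for brand in ["HubSpot CRM", "Instantly.ai", "Apollo.io"]:
    print(f"{brand!r} -> {normalize_brand(brand)!r}")
```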

## Detection Methods

### Method 1: Regex (Default)

Fast, free, pattern-based detection.

**Advantages:**

- Zero cost (no API calls)
- Instant results
- 100% consistent
- Works offline

**Limitations:**

- May miss contextual mentions
- Requires exact alias match
- No semantic understanding

**Configuration:**

```yaml
run_settings:
  use_llm_rank_extraction: false
```

### Method 2: Function Calling

LLM-assisted detection using function calling for higher accuracy.

**Advantages:**

- Understands context
- Catches variations
- Semantic understanding
- Confidence scores

**Limitations:**

- Costs money per query
- Slower than regex
- Requires extraction model

**Configuration:**

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "function_calling"
  fallback_to_regex: true
  min_confidence: 0.7
```

### Method 3: Hybrid

Combines regex and function calling for best results.

**How it works:**

1. Try regex first (fast, free)
1. If regex fails, use function calling
1. Merge results with de-duplication
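
The three steps can be sketched as follows. This is an illustration only: it assumes that "regex fails" means "regex returned no mentions", and `regex_detect`, `hybrid_detect`, and `fake_llm` are hypothetical names, with `fake_llm` standing in for a real function-calling request.

```python
import re
from collections.abc import Callable


def regex_detect(text: str, aliases: list[str]) -> list[str]:
    """Return the configured aliases that appear in text as whole words."""
    return [
        alias
        for alias in aliases
        if re.search(rf"\b{re.escape(alias)}\b", text, re.IGNORECASE)
    ]


def hybrid_detect(
    text: str,
    aliases: list[str],
    llm_detect: Callable[[str], list[str]],
) -> list[str]:
    """Try regex first; call the paid LLM extractor only when regex finds nothing."""
    mentions = regex_detect(text, aliases)
    if not mentions:
        # Assumption: an empty regex result is what triggers the fallback.
        mentions = llm_detect(text)

    # De-duplicate case-insensitively while preserving first-seen order.
    seen: set[str] = set()
    unique: list[str] = []
    for brand in mentions:
        key = brand.lower()
        if key not in seen:
            seen.add(key)
            unique.append(brand)
    return unique


def fake_llm(text: str) -> list[str]:
    """Stand-in for a function-calling extractor; returns canned output."""
    return ["HubSpot", "hubspot"]


print(hybrid_detect("I recommend HubSpot.", ["HubSpot", "Warmly"], fake_llm))
print(hybrid_detect("Try HubSpot's CRM platform.", ["Warmly"], fake_llm))
```

In the first call regex succeeds and the stub is never invoked; in the second call regex finds nothing, so the stub's duplicated output is collapsed to a single entry.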

**Configuration:**

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "hybrid"
  fallback_to_regex: true
  min_confidence: 0.7
```

## Detection Results

### Mention Object

Each detected mention includes:

```json
{
  "brand": "HubSpot",
  "normalized_name": "hubspot",
  "is_mine": false,
  "rank_position": 1,
  "snippet": "...I recommend HubSpot for CRM needs...",
  "confidence": 1.0,
  "detection_method": "regex"
}
```

### My Brands vs Competitors

Mentions are categorized:

```json
{
  "my_mentions": [
    {
      "brand": "Warmly",
      "is_mine": true,
      "rank_position": 2
    }
  ],
  "competitor_mentions": [
    {
      "brand": "HubSpot",
      "is_mine": false,
      "rank_position": 1
    },
    {
      "brand": "Instantly",
      "is_mine": false,
      "rank_position": 3
    }
  ]
}
```

## Common Detection Patterns

### Pattern 1: Exact Brand Name

**LLM Response:**

> "The best email warmup tools are Warmly, Instantly, and Lemwarm."

**Detected:**

- ✅ Warmly
- ✅ Instantly
- ✅ Lemwarm

### Pattern 2: Brand with TLD

**LLM Response:**

> "Check out Warmly.io for email warmup."

**Detected:**

- ✅ Warmly.io

**Note:** Add both "Warmly" and "Warmly.io" as aliases to catch both.

### Pattern 3: Brand in Context

**LLM Response:**

> "Many sales teams use HubSpot CRM to manage leads."

**Detected:**

- ✅ HubSpot CRM
- ✅ HubSpot (if both aliases configured)

### Pattern 4: Case Variations

**LLM Response:**

> "HUBSPOT and hubspot are the same product."

**Detected:**

- ✅ HubSpot (both instances)

## Preventing False Positives

### Use Word Boundaries

**❌ Bad - Substring Matching:**

```yaml
brands:
  mine:
    - "Hub"  # Under plain substring matching: hits "GitHub", "HubSpot", "hub"
```

Plain substring matching on a short fragment like this creates false positives.

**✅ Good - Full Word Matching:**

```yaml
brands:
  mine:
    - "HubSpot"  # Only matches "HubSpot"
```

Word boundaries prevent substring matches.

### Avoid Overly Generic Names

**❌ Bad:**

```yaml
brands:
  competitors:
    - "AI"  # Too generic
    - "The"
    - "Pro"
```

**✅ Good:**

```yaml
brands:
  competitors:
    - "OpenAI"
    - "The Sales Platform"
    - "Pro CRM"
```

### Test Your Aliases

```bash
# Validate configuration
llm-answer-watcher validate --config watcher.config.yaml

# Run with example intents
llm-answer-watcher run --config watcher.config.yaml
```

## Detection Accuracy

### Evaluation Metrics

LLM Answer Watcher tracks detection accuracy:

| Metric        | Description                           | Target |
| ------------- | ------------------------------------- | ------ |
| **Precision** | Correct mentions / Total detected     | ≥ 90%  |
| **Recall**    | Correct mentions / Expected mentions  | ≥ 80%  |
| **F1 Score**  | Harmonic mean of precision and recall | ≥ 85%  |
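
As a worked example of the three formulas, the sketch below scores one hypothetical test case. The function name and the sample brand sets are made up for illustration and are not taken from the eval suite.

```python
def detection_metrics(
    detected: set[str], expected: set[str]
) -> tuple[float, float, float]:
    """Return (precision, recall, f1) for one test case."""
    correct = len(detected & expected)
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = detection_metrics(
    detected={"HubSpot", "Instantly", "GitHub"},
    expected={"HubSpot", "Instantly", "Warmly", "Lemwarm"},
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.50 f1=0.57
```

Here two of three detections are correct (precision 0.67) and two of four expected brands are found (recall 0.50), so this case would miss all three targets.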

### Run Evaluations

```bash
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

See [Evaluation Framework](../../../evaluation/overview/) for details.

## Advanced Detection

### Special Characters

Brand names can contain regex special characters such as parentheses, dots, and hyphens. You do not need to escape them yourself:

```yaml
brands:
  mine:
    - "Brand (TM)"  # Automatically escaped
    - "Brand.io"
    - "Brand-Name"
```

The system handles escaping automatically.

### Multi-Word Brands

```yaml
brands:
  competitors:
    - "Acme Corp"
    - "Big Company Inc"
    - "The Sales Platform"
```

Word boundaries work across multiple words.

### Abbreviations

Add both full name and abbreviation:

```yaml
brands:
  competitors:
    - "Customer Relationship Management"
    - "CRM"
    - "HubSpot CRM"
```

## Debugging Detection Issues

### Issue: Brand Not Detected

**Problem:** Your brand appears in response but isn't detected.

**Solutions:**

1. Check brand alias spelling:

```bash
# View raw response
cat output/2025-11-05T14-30-00Z/intent_*_raw_*.json | jq '.answer_text'
```

1. Add alias variation:

```yaml
brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
    - "Your Brand"  # Add this
```

1. Check for special formatting:

```text
"Check out **YourBrand**"  // Bold formatting
"Visit `YourBrand.io`"     // Code formatting
```

### Issue: False Positives

**Problem:** Unrelated words are detected as brand mentions.

**Solutions:**

1. Remove overly generic aliases:

```yaml
# ❌ Remove this
brands:
  mine:
    - "AI"

# ✅ Use this instead
brands:
  mine:
    - "YourBrand AI"
```

1. Check word boundaries are working:

```bash
# Test with evaluation suite
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

### Issue: Case Sensitivity

**Problem:** Brand detected with wrong capitalization.

**Solution:** This is expected behavior. Matching is case-insensitive, and the displayed brand preserves the capitalization used in the LLM response.

```text
# All match the same brand
"HubSpot" → normalized to "hubspot"
"hubspot" → normalized to "hubspot"
"HUBSPOT" → normalized to "hubspot"
```

## Best Practices

### 1. Start with Core Aliases

```yaml
brands:
  mine:
    - "YourBrand"      # Exact name
    - "YourBrand.io"   # With TLD
```

### 2. Add Variations Incrementally

Run monitoring, review results, add missing aliases:

```yaml
brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
    - "YourBrand AI"    # Added after reviewing results
    - "YB"              # Abbreviation if commonly used
```

### 3. Limit Competitor List

Track 10-20 key competitors:

```yaml
brands:
  competitors:
    # Top 5 direct competitors
    - "Competitor A"
    - "Competitor B"
    # Top 3 market leaders
    - "Market Leader"
```

### 4. Monitor Detection Metrics

```sql
-- Check detection rates
SELECT
    brand,
    COUNT(*) as total_mentions,
    COUNT(DISTINCT run_id) as runs_appeared,
    COUNT(DISTINCT run_id) * 100.0 / (SELECT COUNT(*) FROM runs) as appearance_rate  -- % of runs with at least one mention
FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand
ORDER BY total_mentions DESC;
```

### 5. Use Evaluation Suite

```bash
# Test detection before deploying
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml

# Add custom test cases for your brands
# See: evals/testcases/fixtures.yaml
```

## Next Steps

- **Rank Extraction**

  ______________________________________________________________________

  Learn how ranking positions are extracted

  [Rank Extraction →](../rank-extraction/)

- **Function Calling**

  ______________________________________________________________________

  Use LLM-assisted detection for higher accuracy

  [Function Calling →](../function-calling/)

- **Evaluation Framework**

  ______________________________________________________________________

  Test and validate detection accuracy

  [Evaluation Guide →](../../../evaluation/overview/)

- **Brand Configuration**

  ______________________________________________________________________

  Deep dive into brand configuration strategies

  [Brand Config →](../../configuration/brands/)

# Rank Extraction

Rank extraction identifies where brands appear in ranked lists within LLM responses. This feature helps track competitive positioning and brand visibility.

## Overview

When LLMs generate lists like "The best tools are:", rank extraction determines:

1. **Position**: Where each brand appears (1st, 2nd, 3rd, etc.)
1. **Context**: Whether it's an explicit ranking or a casual mention
1. **Competitors**: How your brand ranks against competitors

## How It Works

### Pattern-Based Extraction (Default)

Uses regex patterns to detect numbered or bulleted lists:

**Supported Patterns:**

```text
# Numbered lists
1. HubSpot
2. Salesforce
3. Pipedrive

# With parentheses
1) HubSpot
2) Salesforce

# With dashes
- HubSpot
- Salesforce

# With asterisks
* HubSpot
* Salesforce

# With letters
a. HubSpot
b. Salesforce
```

### Ranking Algorithm

1. **Detect List Structure**: Find numbered/bulleted lists in response
1. **Extract Brand Names**: Match brands within list items
1. **Assign Positions**: Number brands sequentially (1, 2, 3...)
1. **Handle Ties**: Brands in same list item get same rank
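
The four steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: `LIST_ITEM` and `extract_ranks` are hypothetical names, list items that contain no known brand are assumed not to consume a rank, and the whole response is treated as a single list.

```python
import re

# Matches "1. x", "1) x", "a. x", "- x", "* x" and captures the item text.
LIST_ITEM = re.compile(r"^\s*(?:\d+[.)]|[A-Za-z][.)]|[-*])\s+(.+)$")


def extract_ranks(text: str, brands: list[str]) -> dict[str, int]:
    """Map each brand found inside a list item to its 1-based position."""
    ranks: dict[str, int] = {}
    position = 0
    for line in text.splitlines():
        item = LIST_ITEM.match(line)
        if item is None:
            continue  # Not a list item: prose mentions get no rank.
        found = [
            brand
            for brand in brands
            if re.search(rf"\b{re.escape(brand)}\b", item.group(1), re.IGNORECASE)
        ]
        if not found:
            continue  # Assumption: brand-free items do not consume a rank.
        position += 1
        for brand in found:
            # Brands sharing one list item share one rank; keep the first seen.
            ranks.setdefault(brand, position)
    return ranks


response = """Best tools for sales teams:
1. HubSpot and Salesforce for enterprise
2. Pipedrive for small teams"""

print(extract_ranks(response, ["HubSpot", "Salesforce", "Pipedrive"]))
# {'HubSpot': 1, 'Salesforce': 1, 'Pipedrive': 2}
```

Because the sketch never resets `position`, it would not restart numbering for categorized lists; handling that case would need an extra rule for detecting where a new list begins.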

## Configuration

### Use Regex Extraction (Free)

Default method - no additional configuration needed:

```yaml
run_settings:
  use_llm_rank_extraction: false  # Use pattern-based extraction
```

**Advantages:**

- ✅ Zero cost
- ✅ Fast
- ✅ Deterministic
- ✅ Works offline

**Limitations:**

- ❌ May miss implicit rankings
- ❌ Requires explicit list structure
- ❌ No semantic understanding

### Use LLM Extraction (Paid)

LLM-assisted extraction for complex rankings:

```yaml
run_settings:
  use_llm_rank_extraction: true  # Use LLM for extraction

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
  method: "function_calling"
  min_confidence: 0.7
```

**Advantages:**

- ✅ Understands context
- ✅ Extracts implicit rankings
- ✅ Handles complex formats
- ✅ Semantic understanding

**Limitations:**

- ❌ Costs money per query
- ❌ Slower than regex
- ❌ May be inconsistent

## Ranking Examples

### Example 1: Simple Numbered List

**LLM Response:**

```text
The best email warmup tools are:
1. Instantly
2. Warmly
3. Lemwarm
```

**Extracted Rankings:**

```json
[
  {"brand": "Instantly", "rank_position": 1},
  {"brand": "Warmly", "rank_position": 2},
  {"brand": "Lemwarm", "rank_position": 3}
]
```

### Example 2: Descriptive List

**LLM Response:**

```text
Top CRM tools:
1. HubSpot - Great for startups
2. Salesforce - Enterprise solution
3. Pipedrive - Sales-focused
```

**Extracted Rankings:**

```json
[
  {"brand": "HubSpot", "rank_position": 1},
  {"brand": "Salesforce", "rank_position": 2},
  {"brand": "Pipedrive", "rank_position": 3}
]
```

### Example 3: Multiple Brands Per Item

**LLM Response:**

```text
Best tools for sales teams:
1. HubSpot and Salesforce for enterprise
2. Pipedrive for small teams
```

**Extracted Rankings:**

```json
[
  {"brand": "HubSpot", "rank_position": 1},
  {"brand": "Salesforce", "rank_position": 1},
  {"brand": "Pipedrive", "rank_position": 2}
]
```

### Example 4: Bulleted List

**LLM Response:**

```text
- Instantly: Best for cold email
- Warmly: Great for personalization
- Lemwarm: Simple and effective
```

**Extracted Rankings:**

```json
[
  {"brand": "Instantly", "rank_position": 1},
  {"brand": "Warmly", "rank_position": 2},
  {"brand": "Lemwarm", "rank_position": 3}
]
```

### Example 5: Prose (No Ranking)

**LLM Response:**

```text
I've used HubSpot, Salesforce, and Pipedrive. They're all good options.
```

**Extracted Rankings:**

```json
[
  {"brand": "HubSpot", "rank_position": null},
  {"brand": "Salesforce", "rank_position": null},
  {"brand": "Pipedrive", "rank_position": null}
]
```

**Note:** The mentions are detected, but no rank is assigned because the brands do not appear in a list.

## Rank Position Meanings

### Position 1

**Highest visibility** - First recommendation.

```sql
-- Count #1 rankings
SELECT brand, COUNT(*) as first_place_count
FROM mentions
WHERE rank_position = 1
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand
ORDER BY first_place_count DESC;
```

### Positions 2-5

**High visibility** - Listed in top recommendations.

### Positions 6-10

**Medium visibility** - Included in comprehensive lists.

### Position NULL

**Mentioned but not ranked** - Appears in prose or examples.

## Analyzing Rankings

### Average Rank Position

Lower is better (1 is best):

```sql
SELECT
    brand,
    AVG(rank_position) as avg_rank,
    COUNT(*) as mentions,
    COUNT(CASE WHEN rank_position = 1 THEN 1 END) as first_place_count
FROM mentions
WHERE rank_position IS NOT NULL
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand
ORDER BY avg_rank ASC;
```
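
To try the query without touching real data, it can be run against an in-memory SQLite database from Python. The table below is a hypothetical, minimal subset of `mentions` with made-up rows; the real schema has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal subset of the mentions table, for illustration only.
conn.execute(
    "CREATE TABLE mentions (brand TEXT, rank_position INTEGER, timestamp_utc TEXT)"
)
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?, datetime('now'))",
    [("HubSpot", 1), ("HubSpot", 1), ("Warmly", 2), ("Warmly", 3), ("Lemwarm", None)],
)

rows = conn.execute(
    """
    SELECT
        brand,
        AVG(rank_position) AS avg_rank,
        COUNT(*) AS mentions,
        COUNT(CASE WHEN rank_position = 1 THEN 1 END) AS first_place_count
    FROM mentions
    WHERE rank_position IS NOT NULL
      AND timestamp_utc >= datetime('now', '-30 days')
    GROUP BY brand
    ORDER BY avg_rank ASC
    """
).fetchall()

for row in rows:
    print(row)
# ('HubSpot', 1.0, 2, 2)
# ('Warmly', 2.5, 2, 0)
```

Lemwarm is absent from the output because its only mention has no rank, which is exactly what the `rank_position IS NOT NULL` filter is for.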

### Rank Distribution

See where brands typically appear:

```sql
SELECT
    rank_position,
    COUNT(*) as mention_count,
    COUNT(DISTINCT brand) as unique_brands
FROM mentions
WHERE rank_position IS NOT NULL
GROUP BY rank_position
ORDER BY rank_position;
```

### Competitor Comparison

Compare your rank against competitors:

```sql
SELECT
    my_brand.run_id,
    my_brand.intent_id,
    my_brand.rank_position as my_rank,
    competitor.brand as competitor_name,
    competitor.rank_position as competitor_rank
FROM mentions my_brand
JOIN mentions competitor ON my_brand.run_id = competitor.run_id
  AND my_brand.intent_id = competitor.intent_id
  AND competitor.is_mine = 0
WHERE my_brand.is_mine = 1
  AND my_brand.timestamp_utc >= datetime('now', '-7 days')
ORDER BY my_brand.timestamp_utc DESC;
```

## Rank Trends

Track how rankings change over time:

```sql
SELECT
    DATE(timestamp_utc) as date,
    brand,
    AVG(rank_position) as avg_rank,
    COUNT(*) as mentions
FROM mentions
WHERE rank_position IS NOT NULL
  AND brand = 'YourBrand'
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc), brand
ORDER BY date DESC;
```

## Common Ranking Patterns

### Pattern 1: Direct Recommendation

**Prompt:** "What's the best CRM?"

**Response:** "I recommend HubSpot for most teams."

**Rank:** Position 1 (single recommendation)

### Pattern 2: Top 3 List

**Prompt:** "Top 3 CRM tools?"

**Response:**

```text
1. HubSpot
2. Salesforce
3. Pipedrive
```

**Rank:** Explicit positions 1-3

### Pattern 3: Comprehensive List

**Prompt:** "List all major CRM tools"

**Response:** Lists 10+ tools

**Rank:** All assigned positions, less emphasis on specific rank

### Pattern 4: Categorized Lists

**Prompt:** "Best CRM by company size?"

**Response:**

```text
For startups:
1. HubSpot
2. Pipedrive

For enterprise:
1. Salesforce
2. Microsoft Dynamics
```

**Rank:** Multiple brands at position 1 (different categories)

## Debugging Ranking Issues

### Issue: No Rankings Detected

**Problem:** Brands detected but `rank_position` is `null`.

**Cause:** Response doesn't contain explicit lists.

**Example Response:**

```text
I've used HubSpot and Salesforce. Both are great options.
```

**Solution:**

1. Update intent prompts to encourage rankings:

```yaml
# ❌ Generic
prompt: "Tell me about CRM tools"

# ✅ Ranking-focused
prompt: "What are the top 5 CRM tools ranked by popularity?"
```

1. Enable LLM rank extraction:

```yaml
run_settings:
  use_llm_rank_extraction: true
```

### Issue: Incorrect Rankings

**Problem:** Rankings don't match actual LLM response order.

**Debugging:**

```bash
# View raw response
cat output/2025-11-05T14-30-00Z/intent_*_raw_*.json | jq '.answer_text'

# View extracted rankings
cat output/2025-11-05T14-30-00Z/intent_*_parsed_*.json | jq '.ranked_list'
```

**Solutions:**

1. Check for unusual list formatting
1. Enable LLM rank extraction
1. Add evaluation test case

### Issue: All Brands Ranked #1

**Problem:** Multiple brands get `rank_position: 1`.

**Cause:** Brands appear in separate lists or categories.

**Example:**

```text
Best for startups: HubSpot
Best for enterprise: Salesforce
```

Both get rank 1 (different contexts).

**This is correct behavior** - each is #1 in its category.

## Best Practices

### 1. Design Ranking-Friendly Prompts

```yaml
intents:
  # ✅ Good - Encourages ranking
  - id: "top-5-crm-tools"
    prompt: "What are the top 5 CRM tools ranked by market share?"

  # ✅ Good - Specific ranking criteria
  - id: "best-for-startups"
    prompt: "Rank the best CRM tools for early-stage startups"

  # ❌ Bad - No ranking signal
  - id: "crm-info"
    prompt: "Tell me about CRM software"
```

### 2. Use Regex First, LLM as Fallback

```yaml
extraction_settings:
  method: "hybrid"  # Try regex, fallback to LLM
  fallback_to_regex: true
```

### 3. Track Rank Changes

```sql
-- Alert when rank drops
WITH latest_ranks AS (
  SELECT brand, AVG(rank_position) as current_avg
  FROM mentions
  WHERE timestamp_utc >= datetime('now', '-7 days')
    AND brand = 'YourBrand'
  GROUP BY brand
),
previous_ranks AS (
  SELECT brand, AVG(rank_position) as previous_avg
  FROM mentions
  WHERE timestamp_utc >= datetime('now', '-14 days')
    AND timestamp_utc < datetime('now', '-7 days')
    AND brand = 'YourBrand'
  GROUP BY brand
)
SELECT
  l.brand,
  p.previous_avg as previous_rank,
  l.current_avg as current_rank,
  l.current_avg - p.previous_avg as rank_change
FROM latest_ranks l
JOIN previous_ranks p ON l.brand = p.brand
WHERE l.current_avg > p.previous_avg;  -- Rank got worse (higher number)
```

### 4. Analyze by Intent

Some intents may favor certain brands:

```sql
SELECT
    intent_id,
    brand,
    AVG(rank_position) as avg_rank,
    COUNT(*) as mentions
FROM mentions
WHERE rank_position IS NOT NULL
GROUP BY intent_id, brand
ORDER BY intent_id, avg_rank;
```

### 5. Monitor First-Place Wins

```sql
-- Track #1 rankings over time
SELECT
    DATE(timestamp_utc) as date,
    COUNT(CASE WHEN is_mine = 1 THEN 1 END) as my_first_place,
    COUNT(CASE WHEN is_mine = 0 THEN 1 END) as competitor_first_place
FROM mentions
WHERE rank_position = 1
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

## Next Steps

- **Brand Detection**

  ______________________________________________________________________

  Learn how brands are detected

  [Brand Detection →](../brand-detection/)

- **Function Calling**

  ______________________________________________________________________

  Use LLM-assisted ranking extraction

  [Function Calling →](../function-calling/)

- **Query Examples**

  ______________________________________________________________________

  SQL queries for ranking analysis

  [Query Examples →](../../../data-analytics/query-examples/)

- **Trends Analysis**

  ______________________________________________________________________

  Track ranking changes over time

  [Trends Analysis →](../../../data-analytics/trends-analysis/)

# Function Calling for Extraction

Function calling uses LLMs to extract structured data from responses with higher accuracy than regex-based extraction. This feature enables semantic understanding of brand mentions and rankings.

## Overview

Function calling instructs the LLM to output structured JSON matching a specific schema, ensuring consistent, parseable extraction results.

### When to Use

✅ **Use function calling when:**

- Regex extraction misses complex mentions
- You need contextual understanding
- Rankings are implicit (not in explicit lists)
- Budget allows for additional API calls

❌ **Skip function calling when:**

- Regex works well for your use case
- Optimizing for cost (regex is free)
- Brand names are simple and unambiguous
- Running frequent monitoring (hourly/daily)

## Configuration

### Basic Setup

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/extraction-default"

  method: "function_calling"
  fallback_to_regex: true
  min_confidence: 0.7
```

### Advanced Configuration

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"  # Fast, cheap extraction model
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/extraction-default"

  # Extraction method
  method: "function_calling"  # Options: function_calling, regex, hybrid

  # Fall back to regex if function calling fails
  fallback_to_regex: true

  # Minimum confidence threshold (0.0-1.0)
  min_confidence: 0.7

  # Maximum extraction attempts
  max_retries: 2
```

## Extraction Methods

### Method 1: Function Calling Only

Use LLM for all extraction:

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "function_calling"
  fallback_to_regex: false  # Don't fall back
```

**Cost:** ~$0.001-0.003 per extraction

### Method 2: Regex Only

Use pattern matching (no LLM):

```yaml
run_settings:
  use_llm_rank_extraction: false

# No extraction_settings needed
```

**Cost:** Free

### Method 3: Hybrid (Recommended)

Try regex first, use LLM as fallback:

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "hybrid"
  fallback_to_regex: true
```

**Cost:** Variable (free for regex hits, paid for LLM fallback)

## Function Schema

### Competitor Detection Function

```json
{
  "name": "extract_competitor_mentions",
  "description": "Extract mentions of competitor brands from LLM response",
  "parameters": {
    "type": "object",
    "properties": {
      "competitors": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "brand": {
              "type": "string",
              "description": "Exact brand name as mentioned"
            },
            "rank_position": {
              "type": "integer",
              "description": "Position in ranked list (1=first, null=not ranked)"
            },
            "confidence": {
              "type": "number",
              "description": "Confidence score 0.0-1.0"
            },
            "context": {
              "type": "string",
              "description": "Surrounding context of the mention"
            }
          },
          "required": ["brand", "confidence"]
        }
      }
    },
    "required": ["competitors"]
  }
}
```

### Example LLM Response

**Input (LLM answer):**

```text
The best email warmup tools are:
1. Instantly - Great for cold email
2. Warmly - Excellent personalization
3. Lemwarm - Simple and effective
```

**Function Call Output:**

```json
{
  "competitors": [
    {
      "brand": "Instantly",
      "rank_position": 1,
      "confidence": 0.95,
      "context": "Great for cold email"
    },
    {
      "brand": "Warmly",
      "rank_position": 2,
      "confidence": 0.95,
      "context": "Excellent personalization"
    },
    {
      "brand": "Lemwarm",
      "rank_position": 3,
      "confidence": 0.90,
      "context": "Simple and effective"
    }
  ]
}
```

## Confidence Scores

### Confidence Threshold

Only extractions at or above the confidence threshold are accepted:

```yaml
extraction_settings:
  min_confidence: 0.7  # Reject extractions < 70% confidence
```
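
Applied to a function-call result, the threshold amounts to a simple filter. The snippet below is a sketch with made-up sample data; it assumes a score exactly equal to the threshold is accepted, and `MIN_CONFIDENCE` and `raw_arguments` are hypothetical names.

```python
import json

MIN_CONFIDENCE = 0.7  # Mirrors min_confidence in the config above.

# Made-up function-call arguments, as the JSON string an API would return.
raw_arguments = """
{
  "competitors": [
    {"brand": "Instantly", "rank_position": 1, "confidence": 0.95},
    {"brand": "Warmly", "rank_position": 2, "confidence": 0.72},
    {"brand": "Instant", "confidence": 0.41}
  ]
}
"""

mentions = json.loads(raw_arguments)["competitors"]
accepted = [m for m in mentions if m["confidence"] >= MIN_CONFIDENCE]
rejected = [m for m in mentions if m["confidence"] < MIN_CONFIDENCE]

print([m["brand"] for m in accepted])  # ['Instantly', 'Warmly']
print([m["brand"] for m in rejected])  # ['Instant']
```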

### Confidence Levels

| Range     | Quality  | Action                    |
| --------- | -------- | ------------------------- |
| 0.90-1.00 | High     | Accept automatically      |
| 0.70-0.89 | Medium   | Accept with review        |
| 0.50-0.69 | Low      | Reject or flag for review |
| 0.00-0.49 | Very Low | Reject                    |
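
The table translates directly into a small lookup. The function and the action labels below are hypothetical names chosen for this sketch; scores that fall between the printed ranges (for example 0.895) are assigned to the lower band.

```python
def confidence_action(confidence: float) -> str:
    """Map a confidence score to the action suggested by the table above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within 0.0-1.0, got {confidence}")
    if confidence >= 0.90:
        return "accept"
    if confidence >= 0.70:
        return "accept_with_review"
    if confidence >= 0.50:
        return "reject_or_flag"
    return "reject"


for score in (0.95, 0.72, 0.55, 0.10):
    print(score, confidence_action(score))
```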

### Interpreting Confidence

**High confidence (0.9+):**

- Clear, unambiguous mention
- Explicit ranking
- Standard brand name

**Medium confidence (0.7-0.9):**

- Slight ambiguity
- Implicit ranking
- Brand name variation

**Low confidence (\<0.7):**

- Ambiguous mention
- Unclear ranking
- Possible false positive

## Cost Management

### Extraction Costs

Function calling adds extra API calls:

| Model            | Cost per 1M tokens          | Typical Extraction Cost |
| ---------------- | --------------------------- | ----------------------- |
| gpt-4o-mini      | $0.15 input / $0.60 output  | $0.001-0.002            |
| gpt-4o           | $2.50 input / $10.00 output | $0.010-0.020            |
| claude-3-5-haiku | $0.80 input / $4.00 output  | $0.003-0.005            |
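
To estimate the cost of a single extraction call from its token counts, the arithmetic looks like this. The sketch assumes the listed prices are quoted per one million tokens, which is how these providers publish pricing, and the token counts are hypothetical; check current provider pricing before relying on the numbers.

```python
# USD per one million tokens as (input, output), copied from the table above.
PRICES_PER_MILLION = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-haiku": (0.80, 4.00),
}


def extraction_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one extraction call."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical token counts for a single extraction call.
cost = extraction_cost("gpt-4o-mini", input_tokens=1_500, output_tokens=300)
print(f"${cost:.6f}")  # $0.000405
```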

### Cost Optimization

**1. Use cheap extraction models:**

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"  # Cheapest option
```

**2. Use hybrid method:**

```yaml
extraction_settings:
  method: "hybrid"  # Free regex first, LLM fallback
```

**3. Cache extraction results:**

Extraction results are stored in SQLite and reused.

**4. Limit extraction to important intents:**

```yaml
intents:
  - id: "high-priority"
    prompt: "..."
    use_extraction: true  # Enable for this intent

  - id: "low-priority"
    prompt: "..."
    use_extraction: false  # Skip for this intent
```

## Advantages Over Regex

### 1. Semantic Understanding

**Regex:**

```text
"I recommend HubSpot" → Detected
"HubSpot is not recommended" → Detected (false positive)
```

**Function Calling:**

```text
"I recommend HubSpot" → Detected with positive context
"HubSpot is not recommended" → Not detected (understands negation)
```

### 2. Implicit Rankings

**LLM Response:**

```text
"While Salesforce is the market leader, I prefer HubSpot for startups."
```

**Regex:** No ranking detected (no list structure)

**Function Calling:** Detects HubSpot as preferred (rank 1)

### 3. Context Extraction

Function calling extracts surrounding context:

```json
{
  "brand": "HubSpot",
  "rank_position": 1,
  "context": "Great for startups with limited budget",
  "confidence": 0.92
}
```

### 4. Handles Variations

**LLM mentions:** "HS CRM", "HubSpot's CRM", "HubSpot platform"

**Regex:** Misses variations

**Function Calling:** Normalizes all to "HubSpot"

## Debugging Function Calling

### View Function Call Logs

```bash
# Enable verbose logging
export LOG_LEVEL=DEBUG

llm-answer-watcher run --config watcher.config.yaml --verbose
```

### Check Extraction Results

```bash
# View parsed results
cat output/2025-11-05T14-30-00Z/intent_*_parsed_*.json | jq '.extraction_method'
# Output: "function_calling" or "regex"
```

### Common Issues

**Issue: Low confidence scores**

**Solution:** Adjust threshold:

```yaml
extraction_settings:
  min_confidence: 0.6  # Lower threshold
```

**Issue: High costs**

**Solution:** Switch to hybrid:

```yaml
extraction_settings:
  method: "hybrid"  # Use regex when possible
```

**Issue: Inconsistent results**

**Solution:** Use specific system prompt:

```yaml
extraction_settings:
  extraction_model:
    system_prompt: "openai/extraction-strict"  # More consistent
```

## Best Practices

### 1. Start with Regex

Test regex extraction first:

```yaml
run_settings:
  use_llm_rank_extraction: false
```

If accuracy is insufficient, enable function calling.

### 2. Use Hybrid Method

Best of both worlds:

```yaml
extraction_settings:
  method: "hybrid"
  fallback_to_regex: true
```

### 3. Monitor Extraction Costs

```sql
SELECT
    DATE(timestamp_utc) as date,
    SUM(estimated_cost_usd) as total_cost,
    COUNT(*) as extractions
FROM answers_raw
WHERE extraction_method = 'function_calling'
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc);
```

### 4. Test with Eval Suite

```bash
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

### 5. Use Dedicated Extraction Model

Don't use expensive models for extraction:

```yaml
# ❌ Bad - expensive
extraction_model:
  model_name: "gpt-4o"

# ✅ Good - cheap and fast
extraction_model:
  model_name: "gpt-4o-mini"
```

## Next Steps

- **Brand Detection**

  ______________________________________________________________________

  Understanding brand mention detection

  [Brand Detection →](../brand-detection/)

- **Rank Extraction**

  ______________________________________________________________________

  How rankings are extracted

  [Rank Extraction →](../rank-extraction/)

- **Cost Management**

  ______________________________________________________________________

  Managing LLM costs

  [Cost Management →](../cost-management/)

- **Evaluation**

  ______________________________________________________________________

  Test extraction accuracy

  [Evaluation →](../../../evaluation/overview/)

# Sentiment Analysis & Intent Classification

These advanced analysis features use LLM function calling to extract sentiment, context, and intent from brand mentions and user queries.

**New in v0.1.0:** These features were added to enhance brand mention analysis and enable prioritization of high-value queries.

## Overview

LLM Answer Watcher includes two powerful analysis features:

1. **Sentiment Analysis**: Analyzes the tone and context of each brand mention
1. **Intent Classification**: Determines the user's intent and buyer journey stage for each query

Both features use OpenAI's function calling API for accurate, structured extraction.

## Sentiment Analysis

### What It Analyzes

For each brand mention, the system extracts:

**Sentiment** - Emotional tone:

- `positive`: Brand recommended or praised
- `neutral`: Brand mentioned without judgment
- `negative`: Brand criticized or not recommended

**Mention Context** - How the brand was mentioned:

- `primary_recommendation`: Brand is the top recommendation
- `alternative_listing`: Brand listed as one of several options
- `competitor_negative`: Brand mentioned as inferior to others
- `competitor_neutral`: Brand compared without negative bias
- `passing_reference`: Brief mention without detail

### Example

Query: *"What are the best email warmup tools?"*

LLM Response: *"The best tools are Lemwarm for automated warmup and Instantly for cold outreach. HubSpot is also an option but quite expensive."*

**Extracted Sentiments:**

| Brand     | Sentiment  | Context                  | Reasoning                              |
| --------- | ---------- | ------------------------ | -------------------------------------- |
| Lemwarm   | `positive` | `primary_recommendation` | Listed first with positive qualifier   |
| Instantly | `positive` | `primary_recommendation` | Listed alongside Lemwarm with use case |
| HubSpot   | `neutral`  | `alternative_listing`    | Mentioned as option with cost caveat   |

### Configuration

Enable sentiment analysis in `extraction_settings`:

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "function_calling"

  # Enable sentiment analysis (default: true)
  enable_sentiment_analysis: true
```

**Function Calling Required:** Sentiment analysis only works with `method: "function_calling"`. Regex extraction does not support it, so the sentiment fields will be `None`.

### Cost Impact

Sentiment analysis is integrated into function calling extraction:

- **No extra LLM calls** - sentiment extracted in same call as brand mentions
- **Cost increase**: ~33% per extraction call due to larger response schema
- **Example**: $0.0002 → $0.00027 per extraction with gpt-4o-mini

### Database Storage

Sentiments are stored in the `mentions` table:

```sql
SELECT brand, sentiment, mention_context, timestamp_utc
FROM mentions
WHERE sentiment = 'positive'
  AND mention_context = 'primary_recommendation'
ORDER BY timestamp_utc DESC;
```

Schema:

```sql
ALTER TABLE mentions ADD COLUMN sentiment TEXT;
ALTER TABLE mentions ADD COLUMN mention_context TEXT;
```

## Intent Classification

### What It Classifies

For each user query, the system determines:

**Intent Type** - What the user wants:

- `transactional`: Ready to buy/use a tool
- `commercial_investigation`: Researching options before purchase
- `informational`: Learning about a topic
- `navigational`: Looking for a specific brand/site

**Buyer Journey Stage** - Where they are in the purchase process:

- `awareness`: Learning about the category
- `consideration`: Evaluating options
- `decision`: Ready to choose/purchase

**Urgency Signal** - How urgent the need is:

- `high`: Immediate need ("now", "urgent", "today")
- `medium`: Near-term need ("soon", "this week")
- `low`: Future or casual exploration

**Classification Confidence** - How confident the model is (0.0-1.0)

**Reasoning** - Explanation of why it was classified this way
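The fields above can be pictured as a small validated record. The following is a minimal sketch: the field names and allowed values come from the lists above, but the class name `IntentClassification` and its validation logic are assumptions made for illustration, not the tool's actual data model.

```python
from dataclasses import dataclass

# Allowed values, copied from the lists above.
INTENT_TYPES = {"transactional", "commercial_investigation", "informational", "navigational"}
BUYER_STAGES = {"awareness", "consideration", "decision"}
URGENCY_SIGNALS = {"high", "medium", "low"}


@dataclass(frozen=True)
class IntentClassification:
    """Hypothetical container for one classified query (illustrative only)."""

    intent_type: str
    buyer_stage: str
    urgency_signal: str
    classification_confidence: float
    reasoning: str = ""

    def __post_init__(self) -> None:
        # Reject any value outside the documented vocabularies.
        if self.intent_type not in INTENT_TYPES:
            raise ValueError(f"unknown intent_type: {self.intent_type!r}")
        if self.buyer_stage not in BUYER_STAGES:
            raise ValueError(f"unknown buyer_stage: {self.buyer_stage!r}")
        if self.urgency_signal not in URGENCY_SIGNALS:
            raise ValueError(f"unknown urgency_signal: {self.urgency_signal!r}")
        if not 0.0 <= self.classification_confidence <= 1.0:
            raise ValueError("classification_confidence must be in [0.0, 1.0]")


example = IntentClassification("transactional", "decision", "high", 0.95)
print(example.intent_type, example.buyer_stage, example.urgency_signal)
```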

### Examples

#### High-Value Query

Query: *"What are the best email warmup tools to buy now for my outreach campaign?"*

Classification:

```json
{
  "intent_type": "transactional",
  "buyer_stage": "decision",
  "urgency_signal": "high",
  "classification_confidence": 0.95,
  "reasoning": "Query contains 'buy now' and specific use case, indicating ready-to-purchase intent with high urgency"
}
```

#### Research Query

Query: *"How do email warmup tools work?"*

Classification:

```json
{
  "intent_type": "informational",
  "buyer_stage": "awareness",
  "urgency_signal": "low",
  "classification_confidence": 0.92,
  "reasoning": "Query seeks explanation, indicating learning phase without purchase intent"
}
```

#### Comparison Query

Query: *"Compare Lemwarm vs Instantly for cold email"*

Classification:

```json
{
  "intent_type": "commercial_investigation",
  "buyer_stage": "consideration",
  "urgency_signal": "medium",
  "classification_confidence": 0.88,
  "reasoning": "Direct comparison of specific brands indicates evaluation phase before purchase decision"
}
```

### Configuration

Enable intent classification in `extraction_settings`:

```yaml
extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  # Enable intent classification (default: true)
  enable_intent_classification: true
```

### Cost Impact

Intent classification adds one extra LLM call per unique query:

- **Cost**: ~$0.00012 per query with gpt-4o-mini
- **When**: Before extracting brand mentions
- **Caching**: Results are cached by query hash, so repeated queries are free

**Example cost breakdown:**

- 3 intents × 1 model = 3 queries
- Intent classification: 3 × $0.00012 = $0.00036
- Extraction: 3 × $0.0002 = $0.0006
- **Total**: ~$0.001 per run
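The breakdown above is simple arithmetic and can be reproduced in a few lines. This is a sketch under stated assumptions: the function name `estimate_analysis_cost` is hypothetical, the per-call prices are the gpt-4o-mini figures quoted in this section, and it assumes one classification per intent (the unique query text) and one extraction per intent and model pair. It ignores caching and the cost of the monitored LLM answers themselves.

```python
def estimate_analysis_cost(
    num_intents: int,
    num_models: int,
    classification_cost_usd: float = 0.00012,  # per unique query, figure quoted above
    extraction_cost_usd: float = 0.0002,       # per extraction call, figure quoted above
) -> float:
    """Estimate classification plus extraction cost for one uncached run."""
    classification_calls = num_intents            # assumption: one per unique query text
    extraction_calls = num_intents * num_models   # assumption: one per intent/model pair
    return (
        classification_calls * classification_cost_usd
        + extraction_calls * extraction_cost_usd
    )


total = estimate_analysis_cost(num_intents=3, num_models=1)
print(f"${total:.5f}")
```

On later runs with the same queries, the classification term would drop to zero because results are cached by query hash.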

### Database Storage

Intent classifications are stored in the `intent_classifications` table:

```sql
SELECT intent_id, intent_type, buyer_stage, urgency_signal, reasoning
FROM intent_classifications
WHERE buyer_stage = 'decision'
  AND urgency_signal = 'high'
ORDER BY classification_confidence DESC;
```

Schema:

```sql
CREATE TABLE intent_classifications (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    query_text TEXT NOT NULL,
    query_hash TEXT NOT NULL,
    intent_type TEXT NOT NULL,
    buyer_stage TEXT NOT NULL,
    urgency_signal TEXT NOT NULL,
    classification_confidence REAL NOT NULL,
    reasoning TEXT,
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id)
);
```

### Query Hash Caching

Intent classifications are cached by query hash:

```text
# Normalized query → hash (hash values are illustrative)
"What are the best email warmup tools?"
→ "5d41402abc4b2a76b9719d911017c592..."

# Queries that differ only in whitespace or case produce the same hash
"  what are the BEST email warmup tools?  "
→ "5d41402abc4b2a76b9719d911017c592..." (same hash)
```

Caching benefits:

- **Saves API calls**: Repeated queries use cached results
- **Normalizes variations**: Whitespace/case differences don't matter
- **Persistent cache**: Stored in database across runs
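To make the normalization concrete, here is a minimal sketch of how such a cache key could be computed. The helper names `normalize_query` and `query_hash` are hypothetical, and both the normalization rules (lowercasing, collapsing whitespace) and the choice of SHA-256 are assumptions; the tool's real implementation may differ.

```python
import hashlib


def normalize_query(query: str) -> str:
    """Lowercase the query and collapse runs of whitespace (assumed rules)."""
    return " ".join(query.lower().split())


def query_hash(query: str) -> str:
    """Hash the normalized query. SHA-256 is an assumption, not confirmed by the docs."""
    return hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()


a = query_hash("What are the best email warmup tools?")
b = query_hash("  what are the BEST email warmup tools?  ")
print(a == b)  # True: both normalize to the same string
```

Note that this kind of key only absorbs whitespace and case differences. Two queries that mean the same thing but use different words would still hash differently.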

## Use Cases

### 1. Prioritize High-Value Queries

Focus on queries with high buyer intent:

```sql
SELECT m.brand, ic.intent_type, ic.buyer_stage, ic.urgency_signal
FROM mentions m
JOIN intent_classifications ic ON m.intent_id = ic.intent_id
WHERE ic.intent_type = 'transactional'
  AND ic.buyer_stage = 'decision'
  AND ic.urgency_signal = 'high'
  AND m.sentiment = 'positive';
```

### 2. Track Sentiment Trends

Monitor how sentiment changes over time:

```sql
SELECT DATE(timestamp_utc) as date,
       sentiment,
       COUNT(*) as mentions
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY DATE(timestamp_utc), sentiment
ORDER BY date DESC;
```

### 3. Identify Context Patterns

See how your brand is typically mentioned:

```sql
SELECT mention_context,
       COUNT(*) as count,
       ROUND(AVG(CASE sentiment
           WHEN 'positive' THEN 1.0
           WHEN 'neutral' THEN 0.5
           WHEN 'negative' THEN 0.0
       END), 2) as sentiment_score
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY mention_context
ORDER BY count DESC;
```
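The scoring in this query (positive = 1.0, neutral = 0.5, negative = 0.0) can be tried out without a real run. The sketch below executes the same SQL against an in-memory SQLite database; the three-column `mentions` table is a hypothetical minimal subset created only for this illustration, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal subset of the mentions table, for illustration only.
conn.execute(
    "CREATE TABLE mentions (normalized_name TEXT, sentiment TEXT, mention_context TEXT)"
)
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?, ?)",
    [
        ("yourbrand", "positive", "primary_recommendation"),
        ("yourbrand", "neutral", "primary_recommendation"),
        ("yourbrand", "negative", "alternative_listing"),
    ],
)

rows = conn.execute(
    """
    SELECT mention_context,
           COUNT(*) AS count,
           ROUND(AVG(CASE sentiment
               WHEN 'positive' THEN 1.0
               WHEN 'neutral' THEN 0.5
               WHEN 'negative' THEN 0.0
           END), 2) AS sentiment_score
    FROM mentions
    WHERE normalized_name = ?
    GROUP BY mention_context
    ORDER BY count DESC
    """,
    ("yourbrand",),
).fetchall()

for context, count, score in rows:
    print(context, count, score)
```

For a real analysis, the connection would point at the SQLite file written by the tool instead of `:memory:`.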

### 4. ROI Analysis

Calculate value of brand mentions by intent:

```sql
SELECT ic.buyer_stage,
       COUNT(DISTINCT m.brand) as brands_mentioned,
       COUNT(*) as total_mentions
FROM mentions m
JOIN intent_classifications ic ON m.intent_id = ic.intent_id
WHERE m.is_mine = 1
GROUP BY ic.buyer_stage
ORDER BY CASE ic.buyer_stage
    WHEN 'decision' THEN 1
    WHEN 'consideration' THEN 2
    WHEN 'awareness' THEN 3
END;
```

## Disabling Features

### Disable Sentiment Analysis

```yaml
extraction_settings:
  enable_sentiment_analysis: false
```

**Result**: `sentiment` and `mention_context` will be `NULL` in the database (`None` in Python).

### Disable Intent Classification

```yaml
extraction_settings:
  enable_intent_classification: false
```

**Result**: No rows are written to the `intent_classifications` table, and queries carry no classification (`None`).

### Disable Both

```yaml
extraction_settings:
  enable_sentiment_analysis: false
  enable_intent_classification: false
```

**Benefit**: Avoids the ~33% sentiment overhead on extraction calls and eliminates intent classification calls.

## Limitations

### Function Calling Only

Both features require `method: "function_calling"`:

```yaml
extraction_settings:
  method: "function_calling"  # Required
  enable_sentiment_analysis: true
  enable_intent_classification: true
```

Regex extraction does not support these features.

### Provider Support

Currently only OpenAI supports function calling for extraction:

```yaml
extraction_model:
  provider: "openai"  # Required
  model_name: "gpt-4o-mini"
```

Support for Anthropic, Mistral, and other providers is coming soon.

### Confidence Thresholds

Low confidence classifications may be inaccurate:

```sql
-- Filter by confidence
SELECT *
FROM intent_classifications
WHERE classification_confidence >= 0.8;
```

## Best Practices

### 1. Enable for High-Value Monitoring

Use sentiment/intent for business-critical queries:

```yaml
# Production config - full analysis
extraction_settings:
  method: "function_calling"
  enable_sentiment_analysis: true
  enable_intent_classification: true
```

### 2. Disable for Cost Optimization

Skip for budget-constrained or high-frequency monitoring:

```yaml
# Cost-optimized config
extraction_settings:
  method: "regex"  # No function calling
  enable_sentiment_analysis: false
  enable_intent_classification: false
```

### 3. Review Classification Reasoning

Check why queries were classified:

```sql
SELECT query_text, intent_type, buyer_stage, reasoning
FROM intent_classifications
WHERE classification_confidence < 0.8;
```

### 4. Track Sentiment Distribution

Monitor the health of your brand's mentions:

```sql
SELECT sentiment,
       COUNT(*) as mentions,
       ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) as percentage
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY sentiment;
```

**Healthy distribution**: 70%+ positive, \<10% negative
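The rule of thumb above can be applied to a list of sentiment labels pulled from the database. This is a minimal sketch: the function names `sentiment_shares` and `is_healthy` are hypothetical, and the thresholds are just the 70% and 10% figures stated above, exposed as parameters.

```python
from collections import Counter


def sentiment_shares(sentiments: list[str]) -> dict[str, float]:
    """Return each sentiment label's share of all mentions, in percent."""
    counts = Counter(sentiments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


def is_healthy(
    sentiments: list[str],
    min_positive_pct: float = 70.0,  # "70%+ positive"
    max_negative_pct: float = 10.0,  # "<10% negative"
) -> bool:
    """Check the distribution against the rule of thumb stated above."""
    shares = sentiment_shares(sentiments)
    return (
        shares.get("positive", 0.0) >= min_positive_pct
        and shares.get("negative", 0.0) < max_negative_pct
    )


sample = ["positive"] * 15 + ["neutral"] * 4 + ["negative"]
print(sentiment_shares(sample))
print(is_healthy(sample))
```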

## Next Steps

- [Function Calling](../function-calling/): Learn how function calling works
- [Query Examples](../../../data-analytics/query-examples/): SQL queries for sentiment analysis
- [Cost Management](../cost-management/): Understand cost implications
- [Trends](../../../data-analytics/trends-analysis/): Track sentiment over time

# Historical Tracking

LLM Answer Watcher stores all query results in a local SQLite database for historical trend analysis.

## Features

- **Long-term Storage**: All responses saved indefinitely
- **Trend Analysis**: Track brand visibility over time
- **Comparative Analysis**: Compare performance across dates
- **Data Export**: Query via SQL or export to CSV

## Database Location

```text
./output/watcher.db
```

## Querying Historical Data

```sql
-- Brand mentions over time
SELECT DATE(timestamp_utc) as date,
       COUNT(*) as mentions
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

See [SQLite Database](../../../data-analytics/sqlite-database/) for more queries.
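As an illustration of the "query via SQL or export to CSV" workflow, the sketch below runs the query above from Python and renders the result as CSV text. The function name `export_daily_mentions` is hypothetical, and the demo uses an in-memory table with made-up rows and only the two columns the query needs; against a real run you would connect to `./output/watcher.db` instead.

```python
import csv
import io
import sqlite3


def export_daily_mentions(conn: sqlite3.Connection, brand: str) -> str:
    """Return daily mention counts for `brand` as CSV text."""
    rows = conn.execute(
        """
        SELECT DATE(timestamp_utc) AS date, COUNT(*) AS mentions
        FROM mentions
        WHERE normalized_name = ?
        GROUP BY DATE(timestamp_utc)
        ORDER BY date DESC
        """,
        (brand,),
    ).fetchall()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "mentions"])
    writer.writerows(rows)
    return buffer.getvalue()


# In-memory stand-in for the real database (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (normalized_name TEXT, timestamp_utc TEXT)")
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?)",
    [
        ("yourbrand", "2025-11-05T14:30:00Z"),
        ("yourbrand", "2025-11-05T15:00:00Z"),
        ("yourbrand", "2025-11-04T09:00:00Z"),
    ],
)
csv_text = export_daily_mentions(conn, "yourbrand")
print(csv_text)
```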

# Cost Management

Control and monitor LLM API costs with built-in budget protection.

## Features

- Pre-run cost estimation
- Budget limits (per run, per intent)
- Real-time cost tracking
- Cost breakdowns by provider/model

## Budget Configuration

```yaml
run_settings:
  budget:
    enabled: true
    max_per_run_usd: 1.00
    max_per_intent_usd: 0.10
    warn_threshold_usd: 0.50
```

## Cost Estimation

Before running, the tool estimates costs based on:

- Number of intents
- Number of models
- Average tokens per query
- Provider pricing
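A rough sketch of how an estimate built from these inputs could be checked against the budget keys shown earlier. Everything here is illustrative: the `Budget` class, the `check_budget` function, the flat per-query cost, and the order in which limits are compared are assumptions, not the tool's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Mirrors the `run_settings.budget` keys from the configuration above."""

    enabled: bool = True
    max_per_run_usd: float = 1.00
    max_per_intent_usd: float = 0.10
    warn_threshold_usd: float = 0.50


def check_budget(
    budget: Budget, num_intents: int, num_models: int, cost_per_query_usd: float
) -> str:
    """Return 'ok', 'warn', or 'blocked' for an estimated run (illustrative logic)."""
    if not budget.enabled:
        return "ok"
    per_intent = num_models * cost_per_query_usd  # each intent is sent to every model
    total = num_intents * per_intent
    if total > budget.max_per_run_usd or per_intent > budget.max_per_intent_usd:
        return "blocked"
    if total >= budget.warn_threshold_usd:
        return "warn"
    return "ok"


print(check_budget(Budget(), num_intents=3, num_models=2, cost_per_query_usd=0.004))
```

A real estimate would derive the per-query cost from token counts and provider pricing rather than a single flat number.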

See [Budget Controls](../../configuration/budget/) for detailed configuration.

# HTML Reports

Auto-generated interactive HTML reports for each monitoring run.

## Features

- Brand mention visualization
- Rank distribution charts
- Cost breakdowns
- Raw response inspection
- Historical trends (if multiple runs)

## Report Location

```text
output/YYYY-MM-DDTHH-MM-SSZ/report.html
```

## Opening Reports

```bash
# macOS
open output/2025-11-05T14-30-00Z/report.html

# Linux
xdg-open output/2025-11-05T14-30-00Z/report.html
```

## Report Sections

1. **Summary**: Costs, queries, brands found
1. **Brand Mentions**: Detailed mention tables
1. **Rank Distribution**: Visual charts
1. **Historical Trends**: Performance over time
1. **Raw Responses**: Full LLM outputs

# CLI Usage

# CLI Commands

Complete reference for all LLM Answer Watcher CLI commands.

## Command Structure

```bash
llm-answer-watcher [COMMAND] [OPTIONS]
```

## Global Options

Available for all commands:

| Option      | Description              |
| ----------- | ------------------------ |
| `--help`    | Show help message        |
| `--version` | Show version information |

## Commands

### `run`

Execute a monitoring run with configured models and intents.

**Usage**:

```bash
llm-answer-watcher run --config CONFIG_PATH [OPTIONS]
```

**Required Arguments**:

- `--config PATH` - Path to YAML configuration file

**Options**:

| Option                          | Default | Description               |
| ------------------------------- | ------- | ------------------------- |
| `--format [human\|json\|quiet]` | `human` | Output format             |
| `--yes, -y`                     | `false` | Skip confirmation prompts |
| `--force`                       | `false` | Override budget limits    |
| `--verbose, -v`                 | `false` | Enable verbose logging    |

**Examples**:

```bash
# Human-friendly output (default)
llm-answer-watcher run --config config.yaml

# JSON output for automation
llm-answer-watcher run --config config.yaml --format json

# Quiet mode for scripts
llm-answer-watcher run --config config.yaml --quiet

# Auto-confirm (no prompts)
llm-answer-watcher run --config config.yaml --yes

# Override budget limits
llm-answer-watcher run --config config.yaml --force

# Verbose logging
llm-answer-watcher run --config config.yaml --verbose
```

**Exit Codes**:

- `0`: Success
- `1`: Configuration error
- `2`: Database error
- `3`: Partial failure
- `4`: Complete failure

### `validate`

Validate configuration file without running queries.

**Usage**:

```bash
llm-answer-watcher validate --config CONFIG_PATH [OPTIONS]
```

**Required Arguments**:

- `--config PATH` - Path to YAML configuration file

**Options**:

| Option          | Default | Description              |
| --------------- | ------- | ------------------------ |
| `--verbose, -v` | `false` | Show detailed validation |

**Examples**:

```bash
# Basic validation
llm-answer-watcher validate --config config.yaml

# Detailed validation
llm-answer-watcher validate --config config.yaml --verbose
```

**Output**:

```text
✅ Configuration valid
├── Models: 2 configured (openai gpt-4o-mini, anthropic claude-3-5-haiku)
├── Brands: 3 mine, 8 competitors
├── Intents: 4 queries
└── Estimated cost: $0.024 (8 queries total)
```

### `eval`

Run evaluation framework to test extraction accuracy.

**Usage**:

```bash
llm-answer-watcher eval --fixtures FIXTURES_PATH [OPTIONS]
```

**Required Arguments**:

- `--fixtures PATH` - Path to test fixtures YAML file

**Options**:

| Option                   | Default             | Description              |
| ------------------------ | ------------------- | ------------------------ |
| `--db PATH`              | `./eval_results.db` | Evaluation database path |
| `--format [human\|json]` | `human`             | Output format            |

**Examples**:

```bash
# Run evaluation suite
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml

# Custom database
llm-answer-watcher eval --fixtures fixtures.yaml --db my_evals.db

# JSON output for CI/CD
llm-answer-watcher eval --fixtures fixtures.yaml --format json
```

**Exit Codes**:

- `0`: All tests passed
- `1`: Tests failed (below thresholds)
- `2`: Configuration error

### `prices show`

Display current LLM pricing information.

**Usage**:

```bash
llm-answer-watcher prices show [OPTIONS]
```

**Options**:

| Option                   | Description        |
| ------------------------ | ------------------ |
| `--provider NAME`        | Filter by provider |
| `--format [human\|json]` | Output format      |

**Examples**:

```bash
# Show all pricing
llm-answer-watcher prices show

# OpenAI pricing only
llm-answer-watcher prices show --provider openai

# JSON format
llm-answer-watcher prices show --format json
```

### `prices refresh`

Refresh pricing cache from llm-prices.com.

**Usage**:

```bash
llm-answer-watcher prices refresh [OPTIONS]
```

**Options**:

| Option    | Description                  |
| --------- | ---------------------------- |
| `--force` | Force refresh (ignore cache) |

**Examples**:

```bash
# Refresh if cache expired
llm-answer-watcher prices refresh

# Force refresh
llm-answer-watcher prices refresh --force
```

### `prices list`

List all available models with pricing.

**Usage**:

```bash
llm-answer-watcher prices list [OPTIONS]
```

**Options**:

| Option                   | Description        |
| ------------------------ | ------------------ |
| `--provider NAME`        | Filter by provider |
| `--format [human\|json]` | Output format      |

**Examples**:

```bash
# List all models
llm-answer-watcher prices list

# Anthropic models only
llm-answer-watcher prices list --provider anthropic

# Export as JSON
llm-answer-watcher prices list --format json > models.json
```

## Output Modes

### Human Mode (Default)

Beautiful Rich-formatted output with colors, spinners, and tables.

```bash
llm-answer-watcher run --config config.yaml
```

**Features**:

- Progress spinners
- Colored status indicators
- Formatted tables
- Visual charts

**Best for**: Interactive terminal use

### JSON Mode

Structured JSON output for programmatic consumption.

```bash
llm-answer-watcher run --config config.yaml --format json
```

**Features**:

- Valid JSON output
- No ANSI codes
- Machine-readable
- Complete metadata

**Best for**: AI agents, scripts, APIs

### Quiet Mode

Minimal tab-separated output.

```bash
llm-answer-watcher run --config config.yaml --quiet
```

**Output format**:

```text
RUN_ID  STATUS  QUERIES_COMPLETED   COST_USD    OUTPUT_DIR
```

**Best for**: Shell scripts, pipelines

## Common Workflows

### Development

```bash
# Validate config
llm-answer-watcher validate --config dev.yaml

# Run with verbose logging
llm-answer-watcher run --config dev.yaml --verbose
```

### Production

```bash
# Auto-confirm, JSON output
llm-answer-watcher run --config prod.yaml --yes --format json
```

### CI/CD

```bash
# Quiet mode with exit code checking
llm-answer-watcher run --config ci.yaml --quiet --yes
if [ $? -eq 0 ]; then
    echo "Success"
else
    echo "Failed" && exit 1
fi
```

## Next Steps

- [Learn about output modes](../output-modes/)
- [Understand exit codes](../exit-codes/)
- [Automate runs](../automation/)

# Output Modes

LLM Answer Watcher supports three output modes to serve different use cases: humans, AI agents, and shell scripts.

## Human Mode (Default)

Beautiful Rich-formatted output designed for interactive terminal use.

### Usage

```bash
llm-answer-watcher run --config config.yaml
# or explicitly:
llm-answer-watcher run --config config.yaml --format human
```

### Features

- **Progress Spinners**: Real-time progress indication
- **Colors**: Status indicators (✅ green success, ❌ red errors)
- **Tables**: Formatted data presentation
- **Panels**: Organized information display
- **Live Updates**: Dynamic progress tracking

### Example Output

```text
🔍 Running LLM Answer Watcher...
├── Configuration loaded from config.yaml
├── Models: 2 configured
├── Intents: 3 queries
└── Estimated cost: $0.012

📤 Query 1/3: "What are the best email warmup tools?"
├── Provider: OpenAI (gpt-4o-mini)
├── Sending request... ⏳
├── ✅ Response received (1.2s)
├── Tokens: 145 input, 387 output
├── Cost: $0.004
└── Brands detected: 3 found

✅ Run completed successfully!

📊 Summary:
├── Run ID: 2025-11-05T14-30-00Z
├── Queries: 3/3 completed (100%)
├── Total cost: $0.012
└── Output: ./output/2025-11-05T14-30-00Z/
```

## JSON Mode

Structured JSON output for programmatic consumption and AI agent automation.

### Usage

```bash
llm-answer-watcher run --config config.yaml --format json
```

### Features

- **Valid JSON**: Parseable by any JSON library
- **No ANSI Codes**: Clean output for parsing
- **Complete Metadata**: All run information included
- **Deterministic**: Same format every time

### Output Structure

```json
{
  "run_id": "2025-11-05T14-30-00Z",
  "status": "success",
  "timestamp_utc": "2025-11-05T14:30:00Z",
  "queries_completed": 3,
  "queries_failed": 0,
  "total_cost_usd": 0.012,
  "output_dir": "./output/2025-11-05T14-30-00Z",
  "brands_detected": {
    "mine": ["Lemwarm", "Lemlist"],
    "competitors": ["Instantly", "HubSpot", "Apollo.io"]
  },
  "per_intent_results": [
    {
      "intent_id": "best-email-warmup-tools",
      "status": "success",
      "cost_usd": 0.004,
      "brands_found": ["Lemwarm", "Instantly", "HubSpot"]
    }
  ]
}
```
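A consumer of this structure might reduce it to a few headline numbers. The following sketch parses a trimmed copy of the example above; the function name `summarize_run` and the derived `success_rate` field are hypothetical, while the JSON keys are those shown in the output structure.

```python
import json

# Trimmed copy of the example output above.
SAMPLE = """
{
  "run_id": "2025-11-05T14-30-00Z",
  "status": "success",
  "queries_completed": 3,
  "queries_failed": 0,
  "total_cost_usd": 0.012,
  "per_intent_results": [
    {"intent_id": "best-email-warmup-tools", "status": "success", "cost_usd": 0.004}
  ]
}
"""


def summarize_run(payload: str) -> dict:
    """Reduce a JSON-mode payload to run ID, success rate, cost, and failed intents."""
    data = json.loads(payload)
    total = data["queries_completed"] + data["queries_failed"]
    failed = [
        result["intent_id"]
        for result in data.get("per_intent_results", [])
        if result.get("status") != "success"
    ]
    return {
        "run_id": data["run_id"],
        "success_rate": data["queries_completed"] / total if total else 0.0,
        "cost_usd": data["total_cost_usd"],
        "failed_intents": failed,
    }


summary = summarize_run(SAMPLE)
print(summary)
```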

### Use Cases

#### AI Agent Automation

```python
import subprocess
import json

result = subprocess.run([
    "llm-answer-watcher", "run",
    "--config", "config.yaml",
    "--format", "json",
    "--yes"
], capture_output=True, text=True)

data = json.loads(result.stdout)

if data["status"] == "success":
    print(f"Found {len(data['brands_detected']['mine'])} of our brands")
```

#### CI/CD Integration

```yaml
# .github/workflows/brand-monitoring.yml
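- name: Run Brand Monitoring
  id: monitor
  run: |
    # Compact the JSON onto one line; a plain name=value entry in $GITHUB_OUTPUT must be single-line
    OUTPUT=$(llm-answer-watcher run --config config.yaml --format json --yes | jq -c .)
    echo "result=$OUTPUT" >> "$GITHUB_OUTPUT"

- name: Check Results
  env:
    # Pass the JSON through the environment so quotes in it cannot break the script
    RESULT: ${{ steps.monitor.outputs.result }}
  run: |
    STATUS=$(echo "$RESULT" | jq -r '.status')
    if [ "$STATUS" != "success" ]; then
      exit 1
    fi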
```

## Quiet Mode

Minimal tab-separated output for shell scripts and pipelines.

### Usage

```bash
llm-answer-watcher run --config config.yaml --quiet
```

### Output Format

```text
RUN_ID  STATUS  QUERIES_COMPLETED   COST_USD    OUTPUT_DIR
```

### Example Output

```text
2025-11-05T14-30-00Z    success 3   0.012   ./output/2025-11-05T14-30-00Z
```
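For scripts written in Python rather than shell, the same line can be split on tabs and converted to typed fields. This is a minimal sketch assuming the five tab-separated columns listed under Output Format; the `QuietResult` class and `parse_quiet_line` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class QuietResult:
    """One line of quiet-mode output, with numeric fields converted."""

    run_id: str
    status: str
    queries_completed: int
    cost_usd: float
    output_dir: str


def parse_quiet_line(line: str) -> QuietResult:
    """Split a tab-separated quiet-mode line into its five documented fields."""
    run_id, status, queries, cost, output_dir = line.rstrip("\n").split("\t")
    return QuietResult(run_id, status, int(queries), float(cost), output_dir)


result = parse_quiet_line(
    "2025-11-05T14-30-00Z\tsuccess\t3\t0.012\t./output/2025-11-05T14-30-00Z\n"
)
print(result)
```

A line with a different number of fields raises `ValueError` during unpacking, which is usually the desired behavior for a strict parser.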

### Use Cases

#### Shell Scripts

```bash
#!/bin/bash
OUTPUT=$(llm-answer-watcher run --config config.yaml --quiet --yes)

RUN_ID=$(echo "$OUTPUT" | cut -f1)
STATUS=$(echo "$OUTPUT" | cut -f2)
COST=$(echo "$OUTPUT" | cut -f4)

echo "Run $RUN_ID completed with status $STATUS (cost: \$$COST)"
```

#### CSV Export

```bash
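# Write the header once, then append each run as a comma-separated row
[ -f monitoring_log.csv ] || echo "run_id,status,queries_completed,cost_usd,output_dir" > monitoring_log.csv
llm-answer-watcher run --config config.yaml --quiet --yes | tr '\t' ',' >> monitoring_log.csv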
```

#### Pipeline Processing

```bash
# Process multiple configs
for config in configs/*.yaml; do
    llm-answer-watcher run --config "$config" --quiet --yes | \
        awk '{print $1 "\t" $2 "\t" $4}'
done
```

## Comparing Output Modes

| Feature                 | Human       | JSON       | Quiet   |
| ----------------------- | ----------- | ---------- | ------- |
| **Colors/Emojis**       | ✅ Yes      | ❌ No      | ❌ No   |
| **Progress Indicators** | ✅ Yes      | ❌ No      | ❌ No   |
| **Machine Parseable**   | ❌ No       | ✅ Yes     | ✅ Yes  |
| **Size**                | Large       | Medium     | Minimal |
| **Use Case**            | Interactive | Automation | Scripts |
| **ANSI Codes**          | ✅ Yes      | ❌ No      | ❌ No   |

## Verbose Logging

Enable verbose logging in any mode:

```bash
llm-answer-watcher run --config config.yaml --verbose
```

Adds detailed logging information:

```text
[2025-11-05 14:30:00] INFO: Loading configuration from config.yaml
[2025-11-05 14:30:00] DEBUG: Validating YAML schema
[2025-11-05 14:30:00] DEBUG: Resolving environment variables
[2025-11-05 14:30:00] INFO: API key loaded for provider: openai
[2025-11-05 14:30:01] DEBUG: Sending request to OpenAI API
[2025-11-05 14:30:02] DEBUG: Response received: 200 OK
```

## Mode Selection Guide

### Choose Human Mode When:

- Running manually in terminal
- Debugging configuration issues
- Watching progress in real-time
- Presenting to stakeholders

### Choose JSON Mode When:

- Integrating with AI agents
- Building dashboards/UIs
- Processing results programmatically
- CI/CD automation

### Choose Quiet Mode When:

- Shell script automation
- Logging to files
- CSV/TSV export
- Minimal bandwidth/storage

## Next Steps

- [Learn about exit codes](../exit-codes/)
- [Automate monitoring runs](../automation/)
- [See CI/CD examples](../../../examples/ci-cd-integration/)

# Exit Codes

LLM Answer Watcher uses standardized exit codes for automation and error handling.

## Exit Code Reference

| Code  | Status              | Meaning                            | When It Occurs                                       |
| ----- | ------------------- | ---------------------------------- | ---------------------------------------------------- |
| **0** | Success             | All queries completed successfully | No errors encountered                                |
| **1** | Configuration Error | Invalid configuration              | YAML syntax errors, missing API keys, invalid schema |
| **2** | Database Error      | Cannot access database             | SQLite file locked, permissions issue, disk full     |
| **3** | Partial Failure     | Some queries failed                | LLM API errors, rate limits, timeouts                |
| **4** | Complete Failure    | No queries succeeded               | All queries failed, fatal errors                     |
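When calling the CLI from Python, the table above maps naturally onto an enum. This is a small sketch: the numeric values come from the table, while the `ExitCode` class and the `has_usable_results` helper are hypothetical names for illustration.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the reference table above."""

    SUCCESS = 0
    CONFIG_ERROR = 1
    DATABASE_ERROR = 2
    PARTIAL_FAILURE = 3
    COMPLETE_FAILURE = 4


def has_usable_results(code: int) -> bool:
    """True when at least some queries succeeded (full or partial success)."""
    return ExitCode(code) in (ExitCode.SUCCESS, ExitCode.PARTIAL_FAILURE)


print(ExitCode(3).name, has_usable_results(3))
```

`ExitCode(code)` raises `ValueError` for any value outside the table, which surfaces unexpected exit statuses (for example, a missing executable) instead of silently misclassifying them.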

## Exit Code 0: Success

All queries completed without errors.

**When**:

- All LLM API calls succeeded
- All brands extracted successfully
- All data written to database
- Reports generated

**Example**:

```bash
llm-answer-watcher run --config config.yaml
echo $?  # Prints: 0
```

## Exit Code 1: Configuration Error

Configuration file has issues.

**When**:

- YAML syntax errors
- Missing required fields
- Invalid provider names
- API keys not found in environment
- Invalid model names
- Budget misconfiguration

**Examples**:

```yaml
# Missing required field
run_settings:
  # output_dir missing!
  models:
    - provider: "openai"
```

```yaml
# Invalid provider
models:
  - provider: "invalid_provider"  # Not supported
```

**Handling**:

```bash
llm-answer-watcher run --config config.yaml
if [ $? -eq 1 ]; then
    echo "Configuration error - check your YAML file"
    exit 1
fi
```

## Exit Code 2: Database Error

Cannot create or access SQLite database.

**When**:

- Database file locked by another process
- Insufficient disk space
- Permission denied on output directory
- Corrupted database file

**Handling**:

```bash
llm-answer-watcher run --config config.yaml
case $? in
    2)
        echo "Database error - check permissions and disk space"
        # Try to fix permissions
        chmod 755 output/
        # Retry
        llm-answer-watcher run --config config.yaml
        ;;
esac
```

## Exit Code 3: Partial Failure

Some queries succeeded, others failed.

**When**:

- Rate limits hit mid-run
- Network timeouts
- Invalid API responses
- Model-specific errors

**Example Scenario**:

```text
3 intents × 2 models = 6 total queries
✅ 4 succeeded
❌ 2 failed (rate limit)
Exit code: 3 (partial failure)
```
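The scenario above follows a simple rule based on how many queries succeeded and failed. The sketch below expresses that rule; the function name `exit_code_for` is hypothetical, and it covers only query outcomes, not configuration or database errors (codes 1 and 2).

```python
def exit_code_for(succeeded: int, failed: int) -> int:
    """Map query outcome counts to the documented exit codes 0, 3, and 4."""
    if failed == 0:
        return 0  # success: nothing failed
    if succeeded == 0:
        return 4  # complete failure: nothing succeeded
    return 3      # partial failure: a mix of both


# 3 intents x 2 models = 6 queries, 4 succeeded and 2 hit rate limits.
print(exit_code_for(succeeded=4, failed=2))  # 3
```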

**Handling**:

```bash
llm-answer-watcher run --config config.yaml --format json > result.json
if [ $? -eq 3 ]; then
    echo "⚠️ Partial failure - some queries failed"
    # Check which queries failed
    jq '.per_intent_results[] | select(.status=="failed")' result.json
    # Continue with successful results
fi
```

**Best Practice**: Accept partial failures in production. The succeeded queries still provide value.

## Exit Code 4: Complete Failure

All queries failed.

**When**:

- All API keys invalid
- Network completely down
- All models unreachable
- Severe runtime errors

**Handling**:

```bash
llm-answer-watcher run --config config.yaml
if [ $? -eq 4 ]; then
    echo "❌ Complete failure - no queries succeeded"
    # Alert on-call engineer
    # Don't continue pipeline
    exit 1
fi
```

## Practical Examples

### Basic Error Handling

```bash
#!/bin/bash
llm-answer-watcher run --config config.yaml --yes

case $? in
    0)
        echo "✅ Success - all queries completed"
        ;;
    1)
        echo "❌ Configuration error - fix YAML file"
        exit 1
        ;;
    2)
        echo "❌ Database error - check permissions"
        exit 1
        ;;
    3)
        echo "⚠️ Partial failure - continuing"
        # Partial success is OK
        ;;
    4)
        echo "❌ Complete failure - aborting"
        exit 1
        ;;
esac
```

### Retry Logic

```bash
#!/bin/bash
MAX_RETRIES=3
RETRY_COUNT=0

while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
    llm-answer-watcher run --config config.yaml --yes
    EXIT_CODE=$?

    case $EXIT_CODE in
        0|3)
            # Success or partial success
            exit 0
            ;;
        1|2)
            # Config or DB error - don't retry
            exit $EXIT_CODE
            ;;
        4)
            # Complete failure - retry
            RETRY_COUNT=$((RETRY_COUNT + 1))
            echo "Retry $RETRY_COUNT/$MAX_RETRIES after complete failure"
            sleep $((2 ** RETRY_COUNT))  # Exponential backoff
            ;;
        *)
            # Unexpected exit code - don't retry (avoids looping forever)
            exit $EXIT_CODE
            ;;
    esac
done

echo "Max retries exceeded"
exit 4
```
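The same retry policy can be written in Python, which makes it easier to unit-test. This sketch follows the shell script above: success or partial failure ends the loop with 0, complete failure (4) is retried with exponential backoff, and any other code is returned as-is. The function names are hypothetical, and the `run` and `sleep` parameters exist only so the logic can be exercised without invoking the CLI.

```python
import subprocess
import time
from collections.abc import Callable


def run_once() -> int:
    """Invoke the CLI once and return its exit code."""
    completed = subprocess.run(
        ["llm-answer-watcher", "run", "--config", "config.yaml", "--yes"]
    )
    return completed.returncode


def run_with_retries(
    run: Callable[[], int] = run_once,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry only on complete failure (4), backing off 2, 4, 8... seconds."""
    for attempt in range(1, max_retries + 1):
        code = run()
        if code in (0, 3):
            return 0      # success or partial success
        if code != 4:
            return code   # config, database, or unexpected error: don't retry
        print(f"Retry {attempt}/{max_retries} after complete failure")
        sleep(2 ** attempt)
    return 4


# Dry run with a fake runner: two complete failures, then success.
fake_codes = iter([4, 4, 0])
delays: list[float] = []
outcome = run_with_retries(run=lambda: next(fake_codes), sleep=delays.append)
print(outcome, delays)
```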

### CI/CD Integration

```yaml
# .github/workflows/monitoring.yml
- name: Run Monitoring
  id: monitor
  run: |
    # The default `bash -e` shell would abort on a non-zero exit before the code is recorded
    set +e
    llm-answer-watcher run --config config.yaml --format json --yes
    echo "exit_code=$?" >> "$GITHUB_OUTPUT"
  continue-on-error: true

- name: Handle Result
  run: |
    case ${{ steps.monitor.outputs.exit_code }} in
      0)
        echo "✅ Success"
        ;;
      1)
        echo "❌ Configuration error"
        exit 1
        ;;
      2)
        echo "❌ Database error"
        exit 1
        ;;
      3)
        echo "⚠️ Partial failure (acceptable)"
        ;;
      4)
        echo "❌ Complete failure"
        exit 1
        ;;
    esac
```

### Alerting Based on Exit Codes

```bash
#!/bin/bash
llm-answer-watcher run --config config.yaml --yes
EXIT_CODE=$?

if [ $EXIT_CODE -eq 4 ]; then
    # Send alert for complete failure
    curl -X POST https://alerts.example.com/webhook \
        -d '{"alert": "LLM monitoring complete failure", "severity": "critical"}'
elif [ $EXIT_CODE -eq 3 ]; then
    # Log partial failure (no alert)
    echo "$(date): Partial failure" >> /var/log/monitoring.log
fi
```

## Testing Exit Codes

### Simulate Errors

Test your error handling:

```bash
# Force configuration error
llm-answer-watcher run --config nonexistent.yaml
echo $?  # Should be 1

# Invalid API key
export OPENAI_API_KEY=invalid
llm-answer-watcher run --config config.yaml
echo $?  # Should be 1 or 4
```

### Validation Testing

```bash
# This should exit 0 (validation success)
llm-answer-watcher validate --config config.yaml
echo $?
```

## Best Practices

### 1. Always Check Exit Codes

```bash
# ❌ Bad - ignores errors
llm-answer-watcher run --config config.yaml

# ✅ Good - checks exit code
llm-answer-watcher run --config config.yaml
if [ $? -ne 0 ]; then
    handle_error
fi
```

### 2. Differentiate Error Types

Don't treat all non-zero exits the same:

```bash
# ✅ Good - handles each error type
case $? in
    1|2) exit 1 ;;      # Fatal - abort
    3) continue ;;      # Partial - OK
    4) retry ;;         # Complete - retry
esac
```

### 3. Log Exit Codes

```bash
EXIT_CODE=$?
echo "$(date): Exit code $EXIT_CODE" >> monitoring.log
```

### 4. Accept Partial Failures

In production, partial success is often acceptable:

```bash
if [ $EXIT_CODE -eq 0 ] || [ $EXIT_CODE -eq 3 ]; then
    echo "Run completed with usable results"
    continue_pipeline
fi
```

## Next Steps

- [Learn about output modes](../output-modes/)
- [Automate monitoring runs](../automation/)
- [See CI/CD examples](../../../examples/ci-cd-integration/)

# Automation

Automate LLM Answer Watcher runs with cron, GitHub Actions, or custom schedulers.

## Quick Start

```bash
# Run with no prompts
llm-answer-watcher run --config config.yaml --yes --format json
```

## Cron Jobs

### Basic Cron Setup

Edit crontab:

```bash
crontab -e
```

Add scheduled job:

```text
# Run daily at 9 AM
0 9 * * * /path/to/.venv/bin/llm-answer-watcher run --config /path/to/config.yaml --yes --quiet >> /var/log/monitoring.log 2>&1

# Run weekly on Monday
0 9 * * 1 /path/to/.venv/bin/llm-answer-watcher run --config /path/to/config.yaml --yes --format json > /path/to/results/$(date +\%Y-\%m-\%d).json
```

### Production Cron Script

```bash
#!/bin/bash
# /usr/local/bin/run-monitoring.sh

set -euo pipefail

# Configuration
CONFIG="/home/user/monitoring/config.yaml"
VENV="/home/user/llm-answer-watcher/.venv"
LOG_DIR="/var/log/monitoring"

# Load environment
source "$VENV/bin/activate"
source /home/user/.env  # API keys

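# Run with error handling. `|| EXIT_CODE=$?` captures a non-zero exit
# without tripping `set -e`, so the alert below can still run.
EXIT_CODE=0
"$VENV/bin/llm-answer-watcher" run \
    --config "$CONFIG" \
    --yes \
    --format json \
    > "$LOG_DIR/$(date +%Y-%m-%d).json" 2>&1 || EXIT_CODE=$?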

# Alert on failure
if [ $EXIT_CODE -eq 4 ]; then
    echo "Monitoring failed" | mail -s "Alert: Monitoring Failure" ops@example.com
fi

exit $EXIT_CODE
```

Make executable and schedule:

```bash
chmod +x /usr/local/bin/run-monitoring.sh

# Add to crontab
0 9 * * * /usr/local/bin/run-monitoring.sh
```

## GitHub Actions

### Basic Workflow

`.github/workflows/brand-monitoring.yml`:

```yaml
name: Brand Monitoring

on:
  schedule:
    # Run daily at 9 AM UTC
    - cron: '0 9 * * *'
  workflow_dispatch:  # Manual trigger

jobs:
  monitor:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: |
          pip install uv
          uv sync

      - name: Run monitoring
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          uv run llm-answer-watcher run \
            --config configs/production.yaml \
            --yes \
            --format json \
            > results.json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: monitoring-results
          path: |
            results.json
            output/

      - name: Commit database
        run: |
          git config --local user.email "bot@example.com"
          git config --local user.name "Monitoring Bot"
          git add output/watcher.db
          # Commit only when the database actually changed
          git diff --cached --quiet || git commit -m "Update monitoring data"
          git push
```

### Advanced Workflow with Notifications

```yaml
name: Advanced Brand Monitoring

on:
  schedule:
    - cron: '0 9 * * *'

jobs:
  monitor:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install
        run: |
          pip install uv
          uv sync

      - name: Run monitoring
        id: monitor
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
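        run: |
          set +e  # keep going so the exit code can be recorded
          uv run llm-answer-watcher run \
            --config config.yaml \
            --yes \
            --format json | tee results.json
          # $? would report tee's status; PIPESTATUS[0] is the CLI's exit code
          echo "exit_code=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"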
        continue-on-error: true

      - name: Parse results
        id: parse
        run: |
          COST=$(jq -r '.total_cost_usd' results.json)
          BRANDS=$(jq -r '.brands_detected.mine | length' results.json)
          echo "cost=$COST" >> $GITHUB_OUTPUT
          echo "brands_found=$BRANDS" >> $GITHUB_OUTPUT

      - name: Slack notification
        if: steps.monitor.outputs.exit_code == '0'
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "✅ Brand monitoring completed",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Brand Monitoring Results*\n• Cost: $${{ steps.parse.outputs.cost }}\n• Brands found: ${{ steps.parse.outputs.brands_found }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

      - name: Alert on failure
        if: steps.monitor.outputs.exit_code == '4'
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "❌ Brand monitoring failed completely"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
```
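The "Parse results" step above reads two fields with `jq`. The same extraction can be sketched in Python for cases where the summary is built in a script rather than in the workflow. This is a minimal sketch: the only fields assumed are the two the `jq` expressions reference (`total_cost_usd` and `brands_detected.mine`), and `summarize_results` is a hypothetical helper name, not part of the tool.

```python
import json


def summarize_results(raw: str) -> dict:
    """Extract the same two values the workflow's jq commands read."""
    data = json.loads(raw)
    mine = data.get("brands_detected", {}).get("mine", [])
    return {
        "cost": data.get("total_cost_usd"),
        "brands_found": len(mine),  # mirrors `.brands_detected.mine | length`
    }


# Hypothetical sample shaped after the jq paths used in the workflow
sample = '{"total_cost_usd": 0.0123, "brands_detected": {"mine": ["Warmly"]}}'
summary = summarize_results(sample)
print(f"cost={summary['cost']} brands_found={summary['brands_found']}")
```

Using `.get` with defaults keeps the summary step from crashing when a run produced no brand data.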

## Docker Automation

### Dockerfile

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install uv

# Copy project
COPY . .

# Install dependencies
RUN uv sync

# Set entrypoint
ENTRYPOINT ["uv", "run", "llm-answer-watcher"]
CMD ["run", "--config", "config.yaml", "--yes", "--format", "json"]
```

### Docker Compose

```yaml
# docker-compose.yml
version: '3.8'

services:
  monitoring:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./output:/app/output
      - ./configs:/app/configs
    command: run --config configs/production.yaml --yes --format json
```

Run:

```bash
docker-compose up
```

## Best Practices

### 1. Use --yes Flag

Skip confirmation prompts:

```bash
llm-answer-watcher run --config config.yaml --yes
```

### 2. Use JSON or Quiet Mode

For parsing:

```bash
llm-answer-watcher run --config config.yaml --yes --format json
```

### 3. Handle Exit Codes

```bash
llm-answer-watcher run --config config.yaml --yes
case $? in
    0|3) echo "Success or partial success" ;;
    *) echo "Error occurred" && exit 1 ;;
esac
```
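When the CLI is driven from a Python scheduler instead of a shell script, the same exit-code handling might look like the sketch below. The code meanings used here (0 = success, 3 = partial success, 4 = complete failure) are the ones this guide relies on; every other code is treated as a generic error, which is an assumption. `classify_exit_code` and `run_monitor` are hypothetical helper names.

```python
import subprocess
import sys

# Codes this guide treats as acceptable outcomes
OK_CODES = {0, 3}


def classify_exit_code(code: int) -> str:
    """Map a CLI exit code to a coarse status label."""
    if code == 0:
        return "success"
    if code == 3:
        return "partial"
    if code == 4:
        return "failed"
    return "error"  # assumption: anything else is a generic error


def run_monitor(config_path: str) -> int:
    """Run the CLI non-interactively; return 0 on success or partial success, else 1."""
    proc = subprocess.run(
        ["llm-answer-watcher", "run", "--config", config_path, "--yes"],
        check=False,  # inspect the exit code ourselves instead of raising
    )
    status = classify_exit_code(proc.returncode)
    print(f"monitoring status: {status} (exit code {proc.returncode})", file=sys.stderr)
    return 0 if proc.returncode in OK_CODES else 1
```

A caller would typically end with `sys.exit(run_monitor("config.yaml"))` so the scheduler sees the collapsed status.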

### 4. Secure API Keys

Never hardcode API keys:

```bash
# ✅ Good - from environment
export OPENAI_API_KEY=sk-...

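# ✅ Good - from secrets management (extract only the secret value, not the full JSON response)
export OPENAI_API_KEY=$(aws secretsmanager get-secret-value --secret-id openai-key --query SecretString --output text)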
```

### 5. Log Output

```bash
llm-answer-watcher run --config config.yaml --yes \
    --format json \
    > /var/log/monitoring/$(date +%Y-%m-%d).json \
    2>> /var/log/monitoring/errors.log
```

### 6. Rotate Logs

```bash
# Keep last 30 days
find /var/log/monitoring -name "*.json" -mtime +30 -delete
```

## Next Steps

- [See CI/CD examples](../../../examples/ci-cd-integration/)
- [Learn about output modes](../output-modes/)
- [Understand exit codes](../exit-codes/)

# Supported Providers

# Provider Overview

LLM Answer Watcher supports 6 major LLM providers with a unified interface. Choose providers based on cost, performance, and feature requirements.

> **🌐 New in v0.2.0**: Browser Runners - Access ChatGPT and Perplexity via web UI automation to capture the true user experience. See [Browser vs API Access](#browser-vs-api-access) below.

## Supported Providers

| Provider        | Models                                             | Cost Range              | Web Search                 | Best For                    |
| --------------- | -------------------------------------------------- | ----------------------- | -------------------------- | --------------------------- |
| **OpenAI**      | gpt-4o-mini, gpt-4o, more                          | $0.15-$10/1M tokens     | ✅ Yes                     | General use, cost-effective |
| **Anthropic**   | Claude 3.5 Haiku, Claude 3.5 Sonnet, Claude 3 Opus | $0.80-$75/1M tokens     | ❌ No                      | High-quality responses      |
| **Mistral**     | mistral-large, mistral-small                       | $0.30-$2/1M tokens      | ❌ No                      | European alternative        |
| **X.AI (Grok)** | grok-beta, grok-2, grok-3                          | $2-$25/1M tokens        | ❌ No                      | X platform integration      |
| **Google**      | Gemini 2.0 Flash                                   | $0.075-$0.30/1M tokens  | ✅ Grounding (2.5 models)  | Low-cost option             |
| **Perplexity**  | Sonar, Sonar Pro                                   | $1-$15/1M tokens        | ✅ Built-in                | Grounded responses          |

## Quick Configuration

### Single Provider

```yaml
run_settings:
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"
```

### Multiple Providers

```yaml
run_settings:
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    - provider: "anthropic"
      model_name: "claude-3-5-haiku-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

    - provider: "perplexity"
      model_name: "sonar-pro"
      env_api_key: "PERPLEXITY_API_KEY"
```

## Provider Selection Guide

### By Budget

**Ultra Low Cost (<$0.005 per query):**

- Google Gemini 2.0 Flash
- OpenAI gpt-4o-mini

**Low Cost ($0.005-0.01 per query):**

- Mistral mistral-small
- Anthropic Claude 3.5 Haiku

**Medium Cost ($0.01-0.05 per query):**

- OpenAI gpt-4o
- Anthropic Claude 3.5 Sonnet
- Perplexity Sonar Pro

**High Cost (>$0.05 per query):**

- Anthropic Claude 3 Opus
- Grok grok-3
- OpenAI gpt-4-turbo

### By Feature

**Web Search Required:**

- ✅ OpenAI (with tools configuration)
- ✅ Google Gemini (Google Search grounding on supported models)
- ✅ Perplexity (built-in)

**No Web Search:**

- Anthropic, Mistral, Grok

**Grounded Responses:**

- ✅ Perplexity (best)
- ✅ OpenAI with web search

**High Quality:**

- Anthropic Claude 3.5 Sonnet, Claude 3 Opus
- OpenAI gpt-4o
- Perplexity Sonar Pro

**Fast Response:**

- OpenAI gpt-4o-mini
- Google Gemini Flash
- Mistral mistral-small

### By Use Case

**Cost-Optimized Monitoring:**

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
```

**High-Quality Analysis:**

```yaml
models:
  - provider: "anthropic"
    model_name: "claude-3-5-sonnet-20241022"
```

**Multi-Provider Comparison:**

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
  - provider: "anthropic"
    model_name: "claude-3-5-haiku-20241022"
  - provider: "perplexity"
    model_name: "sonar"
```

**Web Search Required:**

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
```

## API Key Setup

### OpenAI

```bash
export OPENAI_API_KEY=sk-your-openai-key-here
```

Get key: https://platform.openai.com/api-keys

### Anthropic

```bash
export ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
```

Get key: https://console.anthropic.com/

### Mistral

```bash
export MISTRAL_API_KEY=your-mistral-key-here
```

Get key: https://console.mistral.ai/

### X.AI (Grok)

```bash
export XAI_API_KEY=xai-your-grok-key-here
```

Get key: https://console.x.ai/

### Google Gemini

```bash
export GOOGLE_API_KEY=AIza-your-google-api-key-here
```

Get key: https://aistudio.google.com/apikey

### Perplexity

```bash
export PERPLEXITY_API_KEY=pplx-your-perplexity-key-here
```

Get key: https://www.perplexity.ai/settings/api

## Provider Comparison

### Response Quality

**Best to Good:**

1. Anthropic Claude 3 Opus
1. Anthropic Claude 3.5 Sonnet
1. OpenAI gpt-4o
1. Perplexity Sonar Pro
1. Mistral mistral-large
1. Grok grok-3
1. Anthropic Claude 3.5 Haiku
1. OpenAI gpt-4o-mini
1. Google Gemini 2.0 Flash
1. Mistral mistral-small

### Cost Efficiency

**Best value (quality per dollar):**

1. OpenAI gpt-4o-mini
1. Google Gemini 2.0 Flash
1. Anthropic Claude 3.5 Haiku
1. Mistral mistral-small
1. Perplexity Sonar

### Speed

**Fastest to Slowest:**

1. Google Gemini Flash
1. OpenAI gpt-4o-mini
1. Mistral models
1. Perplexity Sonar
1. Anthropic Haiku
1. OpenAI gpt-4o
1. Anthropic Sonnet
1. Grok models
1. Anthropic Opus

## Rate Limits

Default rate limits (check provider docs for current limits):

| Provider   | Requests/Min | Tokens/Min |
| ---------- | ------------ | ---------- |
| OpenAI     | 500          | 90,000     |
| Anthropic  | 50           | 100,000    |
| Mistral    | 5-60         | Varies     |
| X.AI       | 60           | 120,000    |
| Google     | 15           | 32,000     |
| Perplexity | 20           | Varies     |

**Recommendation:** Add delays between queries if hitting rate limits:

```yaml
run_settings:
  delay_between_queries: 2  # seconds
```
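To pick a delay value, the requests-per-minute column above can be converted into a minimum spacing between requests. The sketch below assumes one request per query and sequential requests to a provider, which may not match how the tool schedules work. `min_delay_seconds` is a hypothetical helper, and the limits are copied from the table (Mistral is omitted because its limit is a range).

```python
import math

# Requests per minute, copied from the rate-limit table above
REQUESTS_PER_MINUTE = {
    "OpenAI": 500,
    "Anthropic": 50,
    "X.AI": 60,
    "Google": 15,
    "Perplexity": 20,
}


def min_delay_seconds(requests_per_minute: int) -> float:
    """Smallest spacing between requests that stays within the per-minute limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60 / requests_per_minute


for provider, rpm in REQUESTS_PER_MINUTE.items():
    # Round up to whole seconds for a conservative setting
    print(f"{provider}: delay >= {math.ceil(min_delay_seconds(rpm))}s")
```

Under these assumptions, the 2-second delay shown above is enough for every listed provider except Google (4 s) and Perplexity (3 s).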

## Provider-Specific Features

### OpenAI

- ✅ Web search via tools
- ✅ Function calling
- ✅ JSON mode
- ✅ Vision support (not used)

See [OpenAI Provider](../openai/)

### Anthropic

- ✅ Extended context (200K tokens)
- ✅ Function calling
- ✅ JSON mode
- ✅ Thinking mode (not used)

See [Anthropic Provider](../anthropic/)

### Mistral

- ✅ European data residency
- ✅ Function calling
- ✅ JSON mode
- ✅ Competitive pricing

See [Mistral Provider](../mistral/)

### X.AI (Grok)

- ✅ X platform integration
- ✅ OpenAI-compatible API
- ✅ Real-time information
- ⚠️ Limited model selection

See [Grok Provider](../grok/)

### Google

- ✅ Very low cost
- ✅ Fast responses
- ✅ Long context (1M tokens)
- ⚠️ Newer platform

See [Google Provider](../google/)

### Perplexity

- ✅ Built-in web search
- ✅ Grounded responses
- ✅ Citations included
- ✅ Real-time information
- ⚠️ Request fees (not in cost estimate)

See [Perplexity Provider](../perplexity/)

## Multi-Provider Strategies

### Strategy 1: Cost vs Quality

Use a cheap model for volume and an expensive one for occasional quality checks:

```yaml
models:
  # High volume, low cost
  - provider: "openai"
    model_name: "gpt-4o-mini"

  # Occasional high-quality check
  - provider: "anthropic"
    model_name: "claude-3-5-sonnet-20241022"
    enabled_for: ["critical-intent"]
```

### Strategy 2: Provider Diversity

Avoid single-provider dependency:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"

  - provider: "anthropic"
    model_name: "claude-3-5-haiku-20241022"

  - provider: "google"
    model_name: "gemini-2.0-flash-exp"
```

### Strategy 3: Web Search + Standard

```yaml
models:
  # Standard queries
  - provider: "openai"
    model_name: "gpt-4o-mini"

  # Web-search enabled
  - provider: "perplexity"
    model_name: "sonar-pro"
```

## Common Issues

### API Key Errors

```text
❌ API key not found: OPENAI_API_KEY
```

**Solution:**

```bash
export OPENAI_API_KEY=sk-your-key-here
```
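A missing key can be caught before a run starts by checking every `env_api_key` named in the configuration against the environment. The following is a minimal sketch that takes the `models` list as already-parsed Python data (loading the YAML is left out). `missing_api_keys` is a hypothetical helper, not part of the CLI.

```python
import os
from collections.abc import Mapping


def missing_api_keys(models: list[dict], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the env var names referenced by env_api_key that are unset or empty."""
    missing = []
    for model in models:
        name = model.get("env_api_key")
        if name and not env.get(name) and name not in missing:
            missing.append(name)
    return missing


models = [
    {"provider": "openai", "model_name": "gpt-4o-mini", "env_api_key": "OPENAI_API_KEY"},
    {"provider": "anthropic", "model_name": "claude-3-5-haiku-20241022", "env_api_key": "ANTHROPIC_API_KEY"},
]
# Pass an explicit mapping here for illustration; omit it to check the real environment
print(missing_api_keys(models, env={"OPENAI_API_KEY": "sk-test"}))
```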

### Rate Limit Exceeded

```text
⚠️ Rate limit exceeded for openai/gpt-4o-mini
```

**Solutions:**

1. Add delay: `delay_between_queries: 2`
1. Reduce concurrent requests
1. Upgrade API tier

### Model Not Found

```text
❌ Model not found: gpt-4-mini
```

**Solution:** Use correct model name: `gpt-4o-mini`

See provider docs for valid models.

### Authentication Failed

```text
❌ Authentication failed: Invalid API key
```

**Solutions:**

1. Check key spelling
1. Regenerate key at provider console
1. Verify key has correct permissions

## Browser vs API Access

### Two Ways to Access Providers

Starting in v0.2.0, LLM Answer Watcher supports **two access methods** for supported providers:

| Access Method             | Providers           | How It Works                       | Use Cases                                         |
| ------------------------- | ------------------- | ---------------------------------- | ------------------------------------------------- |
| **API Access**            | All 6 providers     | Direct API calls with your API key | Production monitoring, cost-optimized, fast       |
| **Browser Access (BETA)** | ChatGPT, Perplexity | Headless browser via Steel API     | True user experience, screenshots, web UI testing |

### Key Differences

**API Access:**

- ✅ Faster (no browser overhead)
- ✅ Accurate cost tracking
- ✅ Token usage metrics
- ✅ Programmatic control
- ❌ May differ from web UI behavior
- ❌ No visual evidence

**Browser Access:**

- ✅ Captures actual user experience
- ✅ Screenshots and HTML snapshots
- ✅ Tests web UI behavior
- ✅ Free tier usage (no API costs)
- ❌ Slower (10-30s overhead)
- ❌ No cost tracking yet (shows $0.00)
- ❌ Subject to UI changes

### When to Use Each

**Use API Access when:**

- You need fast, automated monitoring
- Cost tracking is important
- You're running high-volume queries
- You need programmatic control

**Use Browser Access when:**

- You want to verify web UI behavior
- You need visual evidence (screenshots)
- You're testing free tier experience
- You want to see what actual users see

### Example: Comparing Both

```yaml
runners:
  # API access for production monitoring
  - runner_plugin: "api"
    config:
      provider: "openai"
      model_name: "gpt-4o-mini"
      api_key: "${OPENAI_API_KEY}"

  # Browser access to verify web UI
  - runner_plugin: "steel-chatgpt"
    config:
      steel_api_key: "${STEEL_API_KEY}"
      take_screenshots: true
```

This configuration runs the same query through both methods, letting you compare:

- Does the API response match what users see in ChatGPT?
- Are citations/sources displayed differently?
- Does the web UI recommend different brands?
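
The last question can be checked mechanically once both answer texts are available. The sketch below is illustrative only and is not the tool's own detector: it uses a case-insensitive word-boundary match (an assumption), and `mentioned_brands` and `compare_mentions` are hypothetical names.

```python
import re


def mentioned_brands(text: str, brands: list[str]) -> set[str]:
    """Return the brands that appear in text as whole words (case-insensitive)."""
    found = set()
    for brand in brands:
        # Lookarounds act as word boundaries even for names with punctuation
        pattern = rf"(?<!\w){re.escape(brand)}(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(brand)
    return found


def compare_mentions(api_text: str, browser_text: str, brands: list[str]) -> dict[str, set[str]]:
    """Split brands into those mentioned by both runners, only the API, or only the browser."""
    api = mentioned_brands(api_text, brands)
    web = mentioned_brands(browser_text, brands)
    return {"both": api & web, "api_only": api - web, "browser_only": web - api}


diff = compare_mentions(
    "HubSpot and Instantly are popular choices.",
    "Many teams use HubSpot or Warmly.",
    ["Warmly", "HubSpot", "Instantly"],
)
print(diff)
```

The whole-word match means a shorter name such as "Hub" is not reported when the text only contains "HubSpot".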

See [Browser Runners Guide](../../BROWSER_RUNNERS/) for complete details.

## Next Steps

- [OpenAI Provider](../openai/) - Complete OpenAI provider guide
- [Anthropic Provider](../anthropic/) - Claude models documentation
- [Perplexity Provider](../perplexity/) - Grounded LLMs with web search
- [Browser Runners](../../BROWSER_RUNNERS/) - Web UI automation guide

# OpenAI Provider

Integration with OpenAI's GPT models.

## Supported Models

- `gpt-4o` - GPT-4o ("omni") flagship model
- `gpt-4o-mini` - Cost-effective model (recommended)
- `gpt-4-turbo` - Fast GPT-4
- `gpt-3.5-turbo` - Legacy model

## Configuration

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
```

## Getting API Key

1. Visit [platform.openai.com](https://platform.openai.com/api-keys)
1. Create new secret key
1. Export: `export OPENAI_API_KEY=sk-your-key`

## Pricing

- **gpt-4o-mini**: $0.15/1M input, $0.60/1M output
- **gpt-4o**: $2.50/1M input, $10/1M output

## Web Search Tool

OpenAI supports web search through the `web_search` tool in the Responses API.

### Basic Web Search Configuration

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    tools:
      - type: "web_search"
    tool_choice: "auto"  # Model decides when to search
```

### Tool Choice Options

- **`auto`** (recommended): Model decides when web search is needed
- **`required`**: Force web search for every query
- **`none`**: Disable web search

### Web Search Pricing

Web search adds **$10 per 1,000 calls** plus content token costs.

**Example cost** (gpt-4o-mini):

```text
Base query: $0.0004 (tokens only)
+ Web search call: $0.01
+ Search content: $0.0012 (8k tokens @ $0.15/1M)
= Total: ~$0.0116 per query
```
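The arithmetic behind that estimate can be reproduced with a small helper, which makes it easy to plug in other token counts or prices. This is a sketch using only the figures quoted above. `web_search_query_cost` is a hypothetical name, and the defaults mirror the gpt-4o-mini example rather than any live price list.

```python
def web_search_query_cost(
    base_token_cost: float,
    search_calls: int = 1,
    search_content_tokens: int = 8_000,
    input_price_per_million: float = 0.15,
    price_per_thousand_calls: float = 10.0,
) -> float:
    """Estimate the USD cost of one query that uses the web search tool."""
    call_cost = search_calls * price_per_thousand_calls / 1_000
    content_cost = search_content_tokens * input_price_per_million / 1_000_000
    return base_token_cost + call_cost + content_cost


cost = web_search_query_cost(base_token_cost=0.0004)
print(f"~${cost:.4f} per query")  # prints: ~$0.0116 per query
```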

### When to Use OpenAI Web Search

**Use OpenAI when**:

- ✅ Need explicit `tool_choice` control
- ✅ Prefer OpenAI's LLM reasoning quality
- ✅ Already invested in OpenAI ecosystem

**Consider alternatives**:

- **Google Gemini grounding**: 290x cheaper (~$0.00004 vs $0.0116)
- **Perplexity**: Built-in citations, always-on search

See [Web Search Configuration](../../user-guide/configuration/web-search/) for detailed setup and comparison.

## Further Reading

- [Web Search Configuration](../../user-guide/configuration/web-search/) - Detailed web search setup
- [Model Configuration](../../user-guide/configuration/models/) - Model selection guide
- [Providers Overview](../overview/) - Compare all providers

# Anthropic Provider

Integration with Anthropic's Claude models.

## Supported Models

- `claude-3-5-sonnet-20241022` - Latest Sonnet
- `claude-3-5-haiku-20241022` - Fast and affordable
- `claude-3-opus-20240229` - Most capable

## Configuration

```yaml
models:
  - provider: "anthropic"
    model_name: "claude-3-5-haiku-20241022"
    env_api_key: "ANTHROPIC_API_KEY"
```

## Getting API Key

1. Visit [console.anthropic.com](https://console.anthropic.com/)
1. Get your API key
1. Export: `export ANTHROPIC_API_KEY=sk-ant-your-key`

## Pricing

- **Haiku**: $0.80/1M input, $4/1M output
- **Sonnet**: $3/1M input, $15/1M output

See [Providers Overview](../overview/) for comparison.

# Mistral AI Provider

Integration with Mistral's models.

## Supported Models

- `mistral-large-latest`
- `mistral-small-latest`

## Configuration

```yaml
models:
  - provider: "mistral"
    model_name: "mistral-small-latest"
    env_api_key: "MISTRAL_API_KEY"
```

## Getting API Key

1. Visit [console.mistral.ai](https://console.mistral.ai/)
1. Generate API key
1. Export: `export MISTRAL_API_KEY=your-key`

See [Providers Overview](../overview/) for comparison.

# X.AI Grok Provider

Integration with X.AI's Grok models.

## Supported Models

- `grok-beta`
- `grok-2-1212`
- `grok-2-latest`
- `grok-3`
- `grok-3-mini`

## Configuration

```yaml
models:
  - provider: "grok"
    model_name: "grok-2-1212"
    env_api_key: "XAI_API_KEY"
```

## Getting API Key

1. Visit [x.ai/api](https://x.ai/api)
1. Get API access
1. Export: `export XAI_API_KEY=xai-your-key`

See [Providers Overview](../overview/) for comparison.

# Google Gemini Provider

Integration with Google's Gemini models, including support for Google Search grounding.

## Overview

Google Gemini is a family of multimodal AI models that excels at understanding and generating text. Gemini models are available through Google AI Studio and support **Google Search grounding** for real-time web information.

**Key Features**:

- **Google Search Grounding**: Access real-time web data with no additional per-request fees
- **Competitive Pricing**: Among the most cost-effective LLMs with high quality
- **Automatic Search Decision**: Gemini intelligently decides when to use Google Search
- **Grounding Metadata**: Rich attribution showing which sources influenced responses

## Supported Models

### Gemini 2.5 Series (Recommended)

| Model                   | Speed   | Quality | Grounding | Best For                                     |
| ----------------------- | ------- | ------- | --------- | -------------------------------------------- |
| `gemini-2.5-flash`      | Fast    | High    | ✅ Yes    | **Production** - balanced speed/quality/cost |
| `gemini-2.5-flash-lite` | Fastest | Medium  | ❌ No     | High-volume, non-grounded queries            |
| `gemini-2.5-pro`        | Slower  | Highest | ✅ Yes    | Complex reasoning, highest quality           |

### Gemini 2.0 Series

| Model                   | Speed   | Quality | Grounding       | Best For                   |
| ----------------------- | ------- | ------- | --------------- | -------------------------- |
| `gemini-2.0-flash-exp`  | Fast    | High    | ⚠️ Experimental | Testing new features       |
| `gemini-2.0-flash-lite` | Fastest | Medium  | ❌ No           | Fast, non-grounded queries |

### Legacy Models (Not Recommended)

- `gemini-1.5-pro` - Superseded by 2.5-pro
- `gemini-1.5-flash` - Superseded by 2.5-flash

**Recommendation**: Use `gemini-2.5-flash` for production workloads. It provides excellent performance with Google Search grounding support at competitive pricing.

## Basic Configuration

### Without Google Search Grounding

Standard Gemini usage with training data only:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash-lite"
    env_api_key: "GEMINI_API_KEY"
```

**Use when**:

- You don't need real-time information
- Faster response times are critical
- Cost optimization is the priority

### With Google Search Grounding

Enable real-time web information:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"
    tools:
      - google_search: {}
```

**Use when**:

- Brand monitoring requires current data
- Tracking real-time competitive landscape
- Need to detect recent changes
- Want Google's search quality

## Google Search Grounding

### Configuration Format

Google uses a unique tools configuration format:

```yaml
tools:
  - google_search: {}  # Dictionary with tool name as key
```

This differs from OpenAI's format:

```yaml
tools:
  - type: "web_search"  # Dictionary with 'type' field
tool_choice: "auto"
```

**Why the difference?** Each provider has a different API specification. Google uses named tool objects, while OpenAI uses typed specifications. The `tools` configuration is passed through to each API unchanged.

### Supported Models for Grounding

| Model                   | Grounding Support            |
| ----------------------- | ---------------------------- |
| `gemini-2.5-flash`      | ✅ **Yes** (recommended)     |
| `gemini-2.5-flash-lite` | ❌ No                        |
| `gemini-2.5-pro`        | ✅ **Yes** (highest quality) |
| `gemini-2.0-flash-exp`  | ⚠️ Experimental              |
| `gemini-2.0-flash-lite` | ❌ No                        |

### System Prompt Optimization

Use the specialized `google/gemini-grounding` system prompt for best results:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.5-flash"
    env_api_key: "GEMINI_API_KEY"
    system_prompt: "google/gemini-grounding"  # Optimized for grounding
    tools:
      - google_search: {}
```

This prompt:

- Instructs Gemini to use Google Search when beneficial
- Emphasizes grounding responses in search results
- Requests comprehensive source coverage
- Improves answer quality for brand monitoring

### How Grounding Works

1. Gemini receives your query prompt
1. **Automatically decides** if Google Search would improve the answer
1. Performs search if beneficial (no `tool_choice` parameter needed)
1. Grounds response in search results
1. Returns answer with grounding metadata

**No explicit control**: Unlike OpenAI's `tool_choice: "required"`, Gemini intelligently determines when grounding helps. This is intentional - Gemini optimizes for quality and cost.

### Grounding Metadata

Responses include rich grounding attribution:

```json
{
  "web_search_results": {
    "web_search_queries": ["best email warmup tools 2025"],
    "grounding_chunks": [
      {
        "web_source": "https://www.g2.com/categories/email-warmup",
        "retrieved_context": "Top email warmup tools..."
      }
    ],
    "grounding_supports": [
      {
        "segment": {
          "text": "Warmly is a leading solution"
        },
        "grounding_chunk_indices": [0],
        "confidence_scores": [0.95]
      }
    ]
  },
  "web_search_count": 1
}
```

**Key fields**:

- `web_search_queries`: What Gemini searched for
- `grounding_chunks`: Source URLs and context
- `grounding_supports`: Which text segments came from which sources
- `confidence_scores`: How confident Gemini is (0.0-1.0)

## Pricing

### Token Costs

| Model                   | Input             | Output            |
| ----------------------- | ----------------- | ----------------- |
| `gemini-2.5-flash`      | $0.04 / 1M tokens | $0.12 / 1M tokens |
| `gemini-2.5-flash-lite` | $0.02 / 1M tokens | $0.06 / 1M tokens |
| `gemini-2.5-pro`        | $0.60 / 1M tokens | $1.80 / 1M tokens |

### Google Search Grounding Costs

**Good news**: No additional fees for grounding. You only pay token costs.

**Example** (email warmup query with grounding):

```text
Input: 100 tokens @ $0.04/1M = $0.000004
Output: 300 tokens @ $0.12/1M = $0.000036
Total: $0.00004 per query
```

**Comparison**:

- **Gemini with grounding**: $0.00004 per query
- **OpenAI web search**: $0.0116 per query (~290x more)
- **Perplexity sonar-pro**: $0.005-$0.03 per query (125-750x more)
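
The multipliers in this comparison follow directly from the per-query figures. A short sketch that recomputes them from the prices quoted in this section (`token_cost` is a hypothetical helper, and the inputs are the document's example numbers, not live pricing):

```python
def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """USD cost of one request from token counts and per-million-token prices."""
    total = input_tokens * input_price_per_million + output_tokens * output_price_per_million
    return total / 1_000_000


# gemini-2.5-flash example from above: 100 input and 300 output tokens
gemini = token_cost(100, 300, 0.04, 0.12)

# Per-query figures quoted in this section for the alternatives
alternatives = {
    "OpenAI web search": 0.0116,
    "Perplexity sonar-pro (low)": 0.005,
    "Perplexity sonar-pro (high)": 0.03,
}
ratios = {name: cost / gemini for name, cost in alternatives.items()}
for name, ratio in ratios.items():
    print(f"{name}: ~{ratio:.0f}x the Gemini cost")
```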

**Cost Advantage**: Google Search grounding is **significantly cheaper** than alternatives. Grounding tokens are included in base pricing with no per-request fees.

## Complete Configuration Example

### Multi-Model Strategy

Use different models for different use cases:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    # High-volume: Fast + cheap without grounding
    - provider: "google"
      model_name: "gemini-2.5-flash-lite"
      env_api_key: "GEMINI_API_KEY"

    # Brand monitoring: Balanced with grounding
    - provider: "google"
      model_name: "gemini-2.5-flash"
      env_api_key: "GEMINI_API_KEY"
      system_prompt: "google/gemini-grounding"
      tools:
        - google_search: {}

    # Premium: Highest quality with grounding
    - provider: "google"
      model_name: "gemini-2.5-pro"
      env_api_key: "GEMINI_API_KEY"
      system_prompt: "google/gemini-grounding"
      tools:
        - google_search: {}

brands:
  mine:
    - "Warmly"
  competitors:
    - "HubSpot"
    - "Instantly"

intents:
  - id: "email-warmup-tools"
    prompt: "What are the best email warmup tools in 2025?"
```

## Getting API Key

1. Visit [Google AI Studio](https://aistudio.google.com/app/apikey)

1. Sign in with your Google account

1. Click "Create API key"

1. Copy the key (format: `AIza...`)

1. Export to environment:

   ```bash
   export GEMINI_API_KEY=AIza-your-key-here
   ```

**API Key Security**:

- Never commit API keys to version control
- Use environment variables or secret management
- Rotate keys periodically
- Monitor usage in AI Studio dashboard

## When to Use Gemini

### Choose Gemini When:

- ✅ **Cost optimization**: Among the cheapest high-quality LLMs
- ✅ **Google Search quality**: Want Google's search coverage and accuracy
- ✅ **High-volume monitoring**: Grounding with no per-request fees
- ✅ **Automatic search decision**: Trust Gemini to decide when to ground
- ✅ **Grounding metadata**: Need detailed source attribution

### Choose Other Providers When:

**OpenAI**:

- Need explicit `tool_choice` control (force/disable search)
- Prefer OpenAI's reasoning quality
- Already invested in OpenAI ecosystem

**Perplexity**:

- Need explicit source URLs in every response
- Want always-on web search with citations
- Prefer Perplexity's citation format

**Anthropic**:

- Need longest context windows (200K+)
- Prefer Claude's reasoning style
- Don't need web search

## Best Practices

### 1. Use Appropriate Model Tiers

```yaml
# High-volume, non-grounded queries
- model_name: "gemini-2.5-flash-lite"

# Production brand monitoring (recommended)
- model_name: "gemini-2.5-flash"
  tools: [google_search: {}]

# Premium quality for critical queries
- model_name: "gemini-2.5-pro"
  tools: [google_search: {}]
```

### 2. Enable Grounding for Brand Monitoring

```yaml
# ✅ GOOD - Grounding for current brand data
run_settings:
  models:
    - provider: "google"
      model_name: "gemini-2.5-flash"
      system_prompt: "google/gemini-grounding"
      tools:
        - google_search: {}

intents:
  - id: "current-tools"
    prompt: "What are the best email tools in 2025?"
```

### 3. Skip Grounding for Historical/Generic Queries

```yaml
# ✅ GOOD - No grounding for general knowledge
run_settings:
  models:
    - provider: "google"
      model_name: "gemini-2.5-flash-lite"

intents:
  - id: "email-best-practices"
    prompt: "What are email deliverability best practices?"
```

### 4. Use Grounding-Optimized System Prompt

```yaml
# ✅ GOOD
system_prompt: "google/gemini-grounding"  # Optimized

# ❌ SUBOPTIMAL
# (no system_prompt or using "google/default")
```

### 5. Monitor Grounding Usage

Track when Gemini uses grounding:

```python
# `result` is one parsed result record (see "Grounding Metadata" above)
searches = result.get("web_search_count", 0)
if searches > 0:
    queries = result["web_search_results"]["web_search_queries"]
    print(f"Grounding used: {searches} searches")
    print(f"Queries: {queries}")
```

## Troubleshooting

### Grounding Not Working

**Problem**: `web_search_count` is always 0

**Solutions**:

1. Check you're using a grounding-capable model:

   ```yaml
   # ✅ Grounding supported
   model_name: "gemini-2.5-flash"

   # ❌ Grounding NOT supported
   model_name: "gemini-2.5-flash-lite"
   ```

1. Verify tools configuration format:

   ```yaml
   # ✅ Correct
   tools:
     - google_search: {}

   # ❌ Wrong (OpenAI format)
   tools:
     - type: "web_search"
   ```

1. Use grounding-optimized system prompt:

   ```yaml
   system_prompt: "google/gemini-grounding"
   ```

### API Authentication Errors

**Problem**: `401 Unauthorized` or `403 Forbidden`

**Solutions**:

1. Verify API key is correct:

   ```bash
   echo $GEMINI_API_KEY  # Should show AIza...
   ```

1. Check key is active in [AI Studio](https://aistudio.google.com/app/apikey)
1. Verify key has correct permissions

### Rate Limiting

**Problem**: `429 Too Many Requests`

**Solutions**:

1. Reduce `max_concurrent_requests` in config:

   ```yaml
   run_settings:
     max_concurrent_requests: 3  # Google limit
   ```

1. Add delay between requests
1. Upgrade to higher quota tier in AI Studio

## Further Reading

- [Web Search Configuration](../../user-guide/configuration/web-search/) - Detailed grounding setup
- [Model Configuration](../../user-guide/configuration/models/) - Model selection guide
- [Providers Overview](../overview/) - Compare all providers
- [Google AI Studio](https://aistudio.google.com) - Official documentation

# Perplexity Provider

Integration with Perplexity's search-grounded models.

## Supported Models

- `sonar`
- `sonar-pro`
- `sonar-reasoning`
- `sonar-reasoning-pro`
- `sonar-deep-research`

## Configuration

```yaml
models:
  - provider: "perplexity"
    model_name: "sonar-pro"
    env_api_key: "PERPLEXITY_API_KEY"
```

## Getting API Key

1. Visit [perplexity.ai/settings/api](https://www.perplexity.ai/settings/api)
1. Generate API key
1. Export: `export PERPLEXITY_API_KEY=pplx-your-key`

## Features

- Built-in web search
- Real-time information
- Citations included

See [Providers Overview](../overview/) for comparison.

# Examples

# Basic Monitoring Example

A complete, production-ready guide for monitoring brand visibility across multiple LLM providers.

## Quick Start

The easiest way to get started is with the pre-built examples:

### 1. Minimal Example (First-Time Users)

**File**: [`examples/01-quickstart/minimal.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart/minimal.config.yaml)

```bash
# Set API key
export OPENAI_API_KEY="sk-..."

# Run minimal example
llm-answer-watcher run --config examples/01-quickstart/minimal.config.yaml

# View results (macOS; use xdg-open on Linux)
open ./output/*/report.html
```

**Cost**: ~$0.001 | **Time**: ~5 seconds

### 2. Real-World SaaS Monitoring

**File**: [`examples/07-real-world/saas-brand-monitoring.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world/saas-brand-monitoring.config.yaml)

This template demonstrates complete production monitoring with:

- Multiple providers for comprehensive coverage
- Buyer-intent queries across different use cases
- Budget controls and cost management
- Competitor tracking

```bash
# Set required API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Run monitoring
llm-answer-watcher run --config examples/07-real-world/saas-brand-monitoring.config.yaml
```

**Cost**: ~$0.05-0.20 per run depending on providers and intents

## Configuration Overview

For a detailed explanation of each configuration option, see:

- [`examples/01-quickstart/explained.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart/explained.config.yaml) - Same minimal config with inline comments

## Use Case Examples

The examples directory includes ready-to-use templates:

| Use Case                      | Example Config                                       | Description                      |
| ----------------------------- | ---------------------------------------------------- | -------------------------------- |
| **Quick Testing**             | `01-quickstart/minimal.config.yaml`                  | Single provider, single intent   |
| **Multi-Provider Comparison** | `02-providers/multi-provider-comparison.config.yaml` | Compare all 6 providers          |
| **Real-Time Data**            | `03-web-search/websearch-comparison.config.yaml`     | Web search across providers      |
| **High Accuracy**             | `04-extraction/function-calling.config.yaml`         | LLM-based brand extraction       |
| **Automated Insights**        | `05-operations/content-strategy.config.yaml`         | Generate content recommendations |
| **Budget Controls**           | `06-advanced/budget-controls.config.yaml`            | Cost management features         |
| **Production Ready**          | `07-real-world/saas-brand-monitoring.config.yaml`    | Complete monitoring setup        |

## Environment Setup

Copy the environment template:

```bash
cp examples/.env.example .env
```

Edit `.env` and add your API keys:

```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
MISTRAL_API_KEY=...
GROK_API_KEY=xai-...
PERPLEXITY_API_KEY=pplx-...
```
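
Before a run, it can help to confirm which keys are actually set. The following is a minimal sketch, not part of the tool: `missing_keys` is a hypothetical helper, and it only inspects the current process environment, so export the variables or load your `.env` first.

```python
import os

# Variable names taken from the .env template above.
PROVIDER_KEYS = [
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY",
    "MISTRAL_API_KEY", "GROK_API_KEY", "PERPLEXITY_API_KEY",
]


def missing_keys(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    for name in missing_keys(PROVIDER_KEYS):
        print(f"missing: {name}")
```

You only need keys for the providers listed in your config, so pass a shorter list when testing a single provider.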

## Understanding Output

Each run creates a timestamped directory with:

```text
output/2025-11-05T14-30-00Z/
├── run_meta.json                    # Run summary and stats
├── report.html                      # Interactive HTML report
├── intent_*_raw_*.json             # Raw LLM responses
├── intent_*_parsed_*.json          # Extracted brand mentions
└── intent_*_error_*.json           # Error details (if any)
```
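
To check a run at a glance, you could count the files of each kind. This sketch assumes only the filename patterns shown in the listing above; `summarize_run` is a hypothetical helper, not a function the tool provides.

```python
from pathlib import Path


def summarize_run(run_dir):
    """Count raw, parsed, and error JSON files in one run directory."""
    run_path = Path(run_dir)
    return {
        kind: len(list(run_path.glob(f"intent_*_{kind}_*.json")))
        for kind in ("raw", "parsed", "error")
    }
```

A non-zero `error` count is a cue to open the matching `intent_*_error_*.json` files before trusting the report.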

### HTML Report

Open the report in your browser:

```bash
open ./output/2025-11-05T14-30-00Z/report.html
```

**Report includes:**

- Summary statistics (costs, queries, mentions)
- Brand mention tables with ranks
- Rank distribution charts
- Cost breakdown by provider
- Raw LLM responses for verification

### JSON Results

View structured output:

```bash
# View run summary
cat ./output/2025-11-05T14-30-00Z/run_meta.json | jq '.'

# View specific intent results
cat ./output/*/intent_best-email-warmup-tools_parsed_openai_gpt-4o-mini.json | jq '.'
```

### SQLite Database

All data is stored in SQLite for historical tracking:

```bash
# Open the database (the statements below run at the sqlite> prompt)
sqlite3 ./output/watcher.db

-- View latest run
SELECT * FROM runs ORDER BY timestamp_utc DESC LIMIT 1;

-- View your brand mentions
SELECT * FROM mentions WHERE is_mine = 1 ORDER BY timestamp_utc DESC;

-- Compare competitors
SELECT brand, COUNT(*) as mentions, AVG(rank_position) as avg_rank
FROM mentions
WHERE is_mine = 0 AND rank_position IS NOT NULL
GROUP BY brand
ORDER BY mentions DESC;
```

See [SQLite Database Guide](../../data-analytics/sqlite-database/) for more queries.
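
The same competitor query can be run from Python with the standard `sqlite3` module. The sketch below uses an in-memory table containing only the columns the query touches (an assumed subset of the real schema) so it runs without an existing database; `competitor_summary` and `demo_connection` are hypothetical names.

```python
import sqlite3

COMPETITOR_SQL = """
SELECT brand, COUNT(*) AS mentions, AVG(rank_position) AS avg_rank
FROM mentions
WHERE is_mine = 0 AND rank_position IS NOT NULL
GROUP BY brand
ORDER BY mentions DESC
"""


def competitor_summary(conn):
    """Return (brand, mentions, avg_rank) rows for competitor brands."""
    return conn.execute(COMPETITOR_SQL).fetchall()


def demo_connection():
    """In-memory stand-in with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mentions (brand TEXT, is_mine INTEGER, rank_position INTEGER)"
    )
    conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", [
        ("TopCompetitor", 0, 1), ("TopCompetitor", 0, 3),
        ("RisingStartup", 0, 2), ("YourBrand", 1, 2),
    ])
    return conn


if __name__ == "__main__":
    # For real data, use sqlite3.connect("./output/watcher.db") instead.
    for row in competitor_summary(demo_connection()):
        print(row)
```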

## Analyzing Results

### Check Brand Visibility

```sql
-- Did we appear in any responses?
SELECT
    intent_id,
    model_provider,
    model_name,
    brand,
    rank_position
FROM mentions
WHERE is_mine = 1
  AND run_id = '2025-11-05T14-30-00Z'
ORDER BY intent_id, rank_position;
```

### Compare vs Competitors

```sql
-- How do we rank vs competitors?
SELECT
    brand,
    COUNT(*) as total_mentions,
    COUNT(DISTINCT intent_id) as intents_appeared,
    AVG(rank_position) as avg_rank,
    MIN(rank_position) as best_rank
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND rank_position IS NOT NULL
GROUP BY brand
ORDER BY total_mentions DESC, avg_rank ASC;
```

### Identify Gaps

```sql
-- Which intents didn't mention us?
SELECT DISTINCT intent_id
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND intent_id NOT IN (
    SELECT DISTINCT intent_id
    FROM mentions
    WHERE run_id = '2025-11-05T14-30-00Z'
      AND is_mine = 1
  );
```
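
Note that the query above only sees intents that have at least one row in `mentions`. An intent whose answers named no tracked brand at all would not show up. One way around this is to start from the intent IDs in your config, as in this sketch (`intents_missing_brand` is a hypothetical helper):

```python
def intents_missing_brand(configured_intents, mention_rows):
    """Return configured intent IDs with no mention of your brand.

    mention_rows: iterable of (intent_id, is_mine) pairs, for example the
    result of SELECT intent_id, is_mine FROM mentions WHERE run_id = ?.
    """
    with_mine = {intent for intent, is_mine in mention_rows if is_mine}
    return sorted(set(configured_intents) - with_mine)
```

For example, with intents `a`, `b`, `c` and rows `[("a", 1), ("b", 0)]`, both `b` (competitors only) and `c` (no mentions at all) are reported.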

## Schedule Regular Monitoring

### Daily Cron Job

See [`examples/code-examples/automated_monitoring.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py) for a complete automation script.

Basic cron setup:

```bash
# Run daily at 9 AM
0 9 * * * cd /path/to/llm-answer-watcher && ./venv/bin/llm-answer-watcher run --config examples/07-real-world/saas-brand-monitoring.config.yaml --yes --quiet >> logs/monitoring.log 2>&1
```

See [Automation Guide](../../user-guide/usage/automation/) for more options.
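
If you prefer a script over an inline cron command, a thin Python wrapper is one option. This is a minimal sketch and not the repository's `automated_monitoring.py`; it uses only the `run`, `--config`, `--yes`, and `--quiet` arguments shown above, and `build_command` and `run_monitoring` are hypothetical names.

```python
import subprocess


def build_command(config_path, executable="llm-answer-watcher"):
    """Build the argument list for a non-interactive run."""
    return [executable, "run", "--config", config_path, "--yes", "--quiet"]


def run_monitoring(config_path):
    """Run the CLI once and return its exit code."""
    result = subprocess.run(
        build_command(config_path), capture_output=True, text=True
    )
    return result.returncode
```

Calling `run_monitoring("examples/07-real-world/saas-brand-monitoring.config.yaml")` from a scheduled script lets you decide in Python how to react to a non-zero exit code.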

## Cost Analysis

### Actual Costs

```sql
-- Total cost last 30 days
SELECT SUM(total_cost_usd) as total_cost
FROM runs
WHERE timestamp_utc >= datetime('now', '-30 days');

-- Cost by provider
SELECT
    model_provider,
    SUM(estimated_cost_usd) as provider_cost,
    COUNT(*) as queries
FROM answers_raw
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider;
```

### Cost Optimization

**Budget Examples:**

- **Minimal**: Use `01-quickstart/minimal.config.yaml` (~$0.001 per run)
- **Budget-Constrained**: Use `06-advanced/budget-controls.config.yaml` (~$0.01 per run)
- **Production**: Use `07-real-world/saas-brand-monitoring.config.yaml` (~$0.05-0.20 per run)

See [Budget Controls](../../user-guide/configuration/budget/) for more details.

## Troubleshooting

### No Brand Mentions

**Problem:** Your brand never appears

**Solutions:**

1. Check brand aliases in your config:

```yaml
brands:
  mine:
    - "YourBrand"
    - "YourBrand.io"
    - "YourBrand AI"
    - "yourbrand.com"  # Add domain variations
```

1. View raw responses to verify:

```bash
cat output/*/intent_*_raw_*.json | jq '.answer_text' | grep -i "yourbrand"
```

1. Try more specific prompts:

```yaml
intents:
  - id: "branded-comparison"
    prompt: "Compare YourBrand vs Competitor for [use case]"
```
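
To check aliases against a raw answer without substring false positives, a whole-word match is a reasonable approach. The sketch below illustrates the idea only; it is not the tool's actual matcher, and `find_aliases` is a hypothetical helper.

```python
import re


def find_aliases(text, aliases):
    """Return the aliases that occur in text as whole words (case-insensitive)."""
    found = []
    for alias in aliases:
        # Lookarounds instead of \b so aliases like "YourBrand.io" work too.
        pattern = rf"(?<!\w){re.escape(alias)}(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(alias)
    return found
```

With the text `"Try YourBrand.io or HubSpot today"`, the aliases `YourBrand` and `YourBrand.io` match, while a short alias such as `Hub` does not match inside `HubSpot`.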

### High Costs

**Problem:** Costs exceed budget

**Solutions:**

1. Use the budget-controls example:

```bash
llm-answer-watcher run --config examples/06-advanced/budget-controls.config.yaml
```

1. Switch to cheaper models:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # Cheapest option
```

1. Reduce intent count or providers

### Rate Limiting

**Problem:** API rate limits hit

**Solution:** Reduce concurrency:

```yaml
run_settings:
  max_concurrent_requests: 1  # Sequential processing
  delay_between_queries: 2     # 2 second delay
```

## Next Steps

- **Multi-Provider Comparison**

  ______________________________________________________________________

  Compare multiple LLM providers side-by-side

  [Multi-Provider Example →](../multi-provider/)

- **Competitor Analysis**

  ______________________________________________________________________

  Deep dive into competitor positioning

  [Competitor Analysis →](../competitor-analysis/)

- **Historical Trends**

  ______________________________________________________________________

  Track changes over time with SQL

  [Trends Analysis →](../../data-analytics/trends-analysis/)

- **Automation**

  ______________________________________________________________________

  Set up scheduled monitoring with cron or CI/CD

  [Automation Guide →](../../user-guide/usage/automation/)

## Additional Resources

- **[Examples Directory](https://github.com/nibzard/llm-answer-watcher/tree/main/examples)** - All configuration examples
- **[Code Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - Python automation scripts
- **[Configuration Reference](../../reference/configuration-schema/)** - Complete config schema
- **[Database Schema](../../reference/database-schema/)** - SQLite database structure

# Multi-Provider Monitoring

Compare how different LLM providers represent your brand.

## Quick Start

The easiest way to compare multiple providers is with the pre-built multi-provider example:

**File**: [`examples/02-providers/multi-provider-comparison.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers/multi-provider-comparison.config.yaml)

```bash
# Set API keys for providers you want to test
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="..."
export MISTRAL_API_KEY="..."
export GROK_API_KEY="xai-..."
export PERPLEXITY_API_KEY="pplx-..."

# Run comparison across all 6 providers
llm-answer-watcher run --config examples/02-providers/multi-provider-comparison.config.yaml
```

**Cost**: ~$0.037 for 3 intents × 6 providers = 18 queries

## Supported Providers

All 6 providers are demonstrated in the `examples/02-providers/` directory:

| Provider       | Example Config              | Model                | Cost/Query | Notes                |
| -------------- | --------------------------- | -------------------- | ---------- | -------------------- |
| **OpenAI**     | `openai.config.yaml`        | gpt-4o-mini          | ~$0.0008   | Fastest, cheapest    |
| **Anthropic**  | `anthropic.config.yaml`     | claude-3-5-haiku     | ~$0.002    | Great quality/price  |
| **Google**     | `google-gemini.config.yaml` | gemini-2.0-flash-exp | ~$0.0005   | Very fast, free tier |
| **Mistral**    | `mistral.config.yaml`       | mistral-large-latest | ~$0.003    | European provider    |
| **Grok**       | `grok.config.yaml`          | grok-beta            | ~$0.005    | X.AI model           |
| **Perplexity** | `perplexity.config.yaml`    | sonar                | ~$0.001    | Built-in citations   |

See the [Providers README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers) for detailed documentation.
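
The Quick Start estimate follows from the per-query figures in the table: one query per intent per provider. A short worked calculation, using the table's approximate costs (`estimate_run_cost` is a hypothetical helper):

```python
# Approximate cost per query in USD, copied from the table above.
COST_PER_QUERY_USD = {
    "openai": 0.0008, "anthropic": 0.002, "google": 0.0005,
    "mistral": 0.003, "grok": 0.005, "perplexity": 0.001,
}


def estimate_run_cost(num_intents, per_query_costs):
    """Each intent is sent once to every provider."""
    return num_intents * sum(per_query_costs.values())


print(f"${estimate_run_cost(3, COST_PER_QUERY_USD):.4f}")  # 3 intents x 6 providers
```

The six per-query costs sum to about $0.0123, so three intents come to roughly $0.037. Actual costs depend on response length and current provider pricing.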

## Individual Provider Examples

### Test a Single Provider

Each provider has its own example config for isolated testing:

```bash
# OpenAI (recommended for first test)
llm-answer-watcher run --config examples/02-providers/openai.config.yaml

# Anthropic (Claude)
llm-answer-watcher run --config examples/02-providers/anthropic.config.yaml

# Google Gemini
llm-answer-watcher run --config examples/02-providers/google-gemini.config.yaml

# Mistral
llm-answer-watcher run --config examples/02-providers/mistral.config.yaml

# Grok
llm-answer-watcher run --config examples/02-providers/grok.config.yaml

# Perplexity
llm-answer-watcher run --config examples/02-providers/perplexity.config.yaml
```

## Configuration Example

Here's a simplified multi-provider configuration:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    # Fast and cheap
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

    # High quality
    - provider: "anthropic"
      model_name: "claude-3-5-haiku-20241022"
      env_api_key: "ANTHROPIC_API_KEY"

    # Free tier available
    - provider: "google"
      model_name: "gemini-2.0-flash-exp"
      env_api_key: "GEMINI_API_KEY"

brands:
  mine: ["YourBrand"]
  competitors: ["CompetitorA", "CompetitorB"]

intents:
  - id: "best-tools"
    prompt: "What are the best tools in this category?"
```

## Benefits of Multi-Provider Monitoring

- **See which providers favor your brand** - Different LLMs have different training data and biases
- **Identify provider-specific biases** - Track which providers consistently rank competitors higher
- **Optimize for specific LLM platforms** - If your users primarily use ChatGPT, focus on OpenAI optimization
- **Comprehensive coverage** - Different users use different LLMs, so monitor them all

## Analyzing Multi-Provider Results

### Compare Brand Mentions Across Providers

```sql
-- How often does each provider mention us?
SELECT
    model_provider,
    COUNT(*) as total_queries,
    SUM(CASE WHEN EXISTS (
        SELECT 1 FROM mentions m
        WHERE m.run_id = answers_raw.run_id
          AND m.intent_id = answers_raw.intent_id
          AND m.model_provider = answers_raw.model_provider
          AND m.is_mine = 1
    ) THEN 1 ELSE 0 END) as queries_with_brand,
    ROUND(100.0 * SUM(CASE WHEN EXISTS (
        SELECT 1 FROM mentions m
        WHERE m.run_id = answers_raw.run_id
          AND m.intent_id = answers_raw.intent_id
          AND m.model_provider = answers_raw.model_provider
          AND m.is_mine = 1
    ) THEN 1 ELSE 0 END) / COUNT(*), 2) as mention_rate_pct
FROM answers_raw
WHERE run_id = '2025-11-05T14-30-00Z'
GROUP BY model_provider
ORDER BY mention_rate_pct DESC;
```

### Compare Average Rankings by Provider

```sql
-- Which provider ranks us highest?
SELECT
    model_provider,
    AVG(rank_position) as avg_rank,
    MIN(rank_position) as best_rank,
    COUNT(*) as total_mentions
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND is_mine = 1
  AND rank_position IS NOT NULL
GROUP BY model_provider
ORDER BY avg_rank ASC;
```

### Provider Cost Comparison

```sql
-- Cost efficiency by provider
SELECT
    model_provider,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost_per_query,
    SUM(tokens_used) as total_tokens
FROM answers_raw
WHERE run_id = '2025-11-05T14-30-00Z'
GROUP BY model_provider
ORDER BY total_cost ASC;
```

## Which Providers Should You Use?

### For Testing/Development

- Use **OpenAI gpt-4o-mini** or **Google gemini-flash** (fastest, cheapest)

### For Production Monitoring

- Use **multi-provider comparison** to see all perspectives
- Track which providers consistently mention your brand

### For Specific Needs

- **Best quality**: Anthropic claude-3-5-sonnet, OpenAI gpt-4o
- **Cheapest**: Google gemini-flash, OpenAI gpt-4o-mini
- **Fastest**: Google gemini-flash
- **Citations**: Perplexity sonar
- **European data**: Mistral
- **Real-time data**: Grok (Twitter/X context)

## Provider-Specific Features

See the [Providers README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers) for detailed documentation on each provider's unique features:

- **OpenAI**: Web search via Responses API
- **Anthropic**: Tool use, 200K context
- **Google**: Search grounding
- **Mistral**: Function calling
- **Grok**: Twitter/X integration
- **Perplexity**: Built-in web search

## Next Steps

- **Add Web Search**

  ______________________________________________________________________

  Enable real-time web search for current data

  [Web Search Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/03-web-search)

- **Compare Results**

  ______________________________________________________________________

  Analyze differences across providers

  [Basic Monitoring →](../basic-monitoring/)

- **Provider Guides**

  ______________________________________________________________________

  Deep dive into each provider's features

  [Provider Documentation →](../../providers/overview/)

- **Advanced Config**

  ______________________________________________________________________

  Budget controls, operations, extraction

  [Advanced Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced)

## Additional Resources

- **[Examples Directory](https://github.com/nibzard/llm-answer-watcher/tree/main/examples)** - All configuration examples
- **[Provider Comparison](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers)** - Detailed provider documentation
- **[Provider Guides](../../providers/overview/)** - Complete provider reference docs

# Competitor Analysis

Track competitors comprehensively across multiple queries and LLM providers.

## Quick Start

The best example for comprehensive competitive intelligence:

**File**: [`examples/07-real-world/competitive-intelligence.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world/competitive-intelligence.config.yaml)

```bash
# Set API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Run competitive intelligence monitoring
llm-answer-watcher run --config examples/07-real-world/competitive-intelligence.config.yaml
```

This template demonstrates:

- Comprehensive competitor tracking across multiple providers
- Diverse buyer-intent queries
- Competitive positioning analysis
- Rank comparison

## Use Case Templates

The `examples/07-real-world/` directory includes several competitive analysis templates:

| Template                     | Use Case                                                  | File                                   |
| ---------------------------- | --------------------------------------------------------- | -------------------------------------- |
| **Competitive Intelligence** | Monitor how competitors are positioned                    | `competitive-intelligence.config.yaml` |
| **Content Gap Analysis**     | Find opportunities where competitors appear but you don't | `content-gap-analysis.config.yaml`     |
| **Brand Monitoring**         | Track your brand vs competitors                           | `saas-brand-monitoring.config.yaml`    |
| **LLM SEO**                  | Optimize for LLM visibility                               | `llm-seo-optimization.config.yaml`     |

See the [Real-World Examples README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world) for details.

## Example Configuration

Here's a simplified competitive analysis config:

```yaml
run_settings:
  output_dir: "./output"
  sqlite_db_path: "./output/watcher.db"

  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine: ["YourBrand"]

  # Comprehensive competitor list
  competitors:
    - "TopCompetitor"        # Direct competitor #1
    - "RisingStartup"        # Emerging threat
    - "IndustryLeader"       # Established player
    - "NichePlayer"          # Specialized competitor
    - "AlternativeTool"      # Adjacent category
    - "LegacyProvider"       # Traditional option

intents:
  # General category query
  - id: "best-overall"
    prompt: "What are the best tools in the category?"

  # Segment-specific queries
  - id: "for-startups"
    prompt: "Best tools for startups?"

  - id: "for-enterprise"
    prompt: "Best enterprise tools?"

  # Feature-specific queries
  - id: "affordable-options"
    prompt: "Most affordable tools?"

  - id: "easiest-to-use"
    prompt: "Which tools are easiest to use?"

  # Comparison queries
  - id: "vs-leader"
    prompt: "How does YourBrand compare to TopCompetitor?"
```

## Analyzing Competitive Results

### 1. Competitor Appearance Frequency

```sql
-- How often does each brand appear? (Your own brand is included for comparison.)
SELECT
    brand,
    COUNT(*) as total_mentions,
    COUNT(DISTINCT intent_id) as intents_appeared,
    ROUND(100.0 * COUNT(DISTINCT intent_id) / (
        SELECT COUNT(DISTINCT intent_id) FROM mentions WHERE run_id = '2025-11-05T14-30-00Z'
    ), 2) as coverage_pct
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
GROUP BY brand
ORDER BY total_mentions DESC;
```

**Example output:**

```text
TopCompetitor   | 12 | 5 | 83.33%
IndustryLeader  |  9 | 4 | 66.67%
RisingStartup   |  6 | 3 | 50.00%
YourBrand       |  5 | 3 | 50.00%
```

**Interpretation:**

- TopCompetitor appears most frequently (83% of intents)
- You're tied with RisingStartup (50% coverage)
- Opportunity: Increase visibility in missing intent categories

### 2. Average Rankings by Competitor

```sql
-- Compare average rank positions
SELECT
    brand,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank,
    MIN(rank_position) as best_rank,
    MAX(rank_position) as worst_rank
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND rank_position IS NOT NULL
GROUP BY brand
ORDER BY avg_rank ASC;
```

**Example output:**

```text
TopCompetitor   | 12 | 1.8 | 1 | 4
YourBrand       |  5 | 2.4 | 1 | 5
IndustryLeader  |  9 | 2.9 | 1 | 6
RisingStartup   |  6 | 3.2 | 2 | 5
```

**Interpretation:**

- TopCompetitor has the best average rank (1.8)
- You rank 2.4 on average (room for improvement)
- Focus on improving from #2-3 to #1

### 3. Head-to-Head Comparisons

```sql
-- When you both appear, who ranks higher?
SELECT
    m1.intent_id,
    m1.brand as your_brand,
    m1.rank_position as your_rank,
    m2.brand as competitor_brand,
    m2.rank_position as competitor_rank,
    CASE
        WHEN m1.rank_position < m2.rank_position THEN 'You win'
        WHEN m1.rank_position > m2.rank_position THEN 'Competitor wins'
        ELSE 'Tie'
    END as outcome
FROM mentions m1
JOIN mentions m2
    ON m1.run_id = m2.run_id
    AND m1.intent_id = m2.intent_id
    AND m1.model_provider = m2.model_provider
    AND m1.model_name = m2.model_name
WHERE m1.run_id = '2025-11-05T14-30-00Z'
  AND m1.is_mine = 1
  AND m2.brand = 'TopCompetitor'
  AND m1.rank_position IS NOT NULL
  AND m2.rank_position IS NOT NULL
ORDER BY m1.intent_id;
```

### 4. Identify Content Gaps

```sql
-- Which intents do competitors appear in but you don't?
SELECT
    intent_id,
    GROUP_CONCAT(DISTINCT brand) as competitors_mentioned
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND is_mine = 0
  AND intent_id NOT IN (
      SELECT DISTINCT intent_id
      FROM mentions
      WHERE run_id = '2025-11-05T14-30-00Z'
        AND is_mine = 1
  )
GROUP BY intent_id;
```

**Example output:**

```text
for-enterprise | TopCompetitor, IndustryLeader
affordable-options | RisingStartup, NichePlayer
```

**Interpretation:**

- You're missing in "enterprise" queries → Create enterprise content
- Missing in "affordable" queries → Highlight pricing

### 5. Provider-Specific Competitive Positioning

```sql
-- Which providers favor which competitors?
SELECT
    model_provider,
    brand,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE run_id = '2025-11-05T14-30-00Z'
  AND rank_position IS NOT NULL
GROUP BY model_provider, brand
ORDER BY model_provider, avg_rank ASC;
```

## Competitive Monitoring Strategies

### 1. Daily Competitive Tracking

Monitor key competitors daily:

```bash
# Run competitive intelligence
llm-answer-watcher run --config examples/07-real-world/competitive-intelligence.config.yaml --yes --quiet

# Analyze changes
python examples/code-examples/analyze_results.py
```

See [`examples/code-examples/automated_monitoring.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py) for automation.

### 2. Weekly Deep Dives

Run comprehensive analysis weekly:

```bash
# Multi-provider comparison
llm-answer-watcher run --config examples/02-providers/multi-provider-comparison.config.yaml

# With web search for current data
llm-answer-watcher run --config examples/03-web-search/websearch-comparison.config.yaml
```

### 3. Content Gap Analysis

Identify where competitors appear but you don't:

```bash
llm-answer-watcher run --config examples/07-real-world/content-gap-analysis.config.yaml
```

### 4. Sentiment Comparison

Track how you're described vs competitors:

```bash
llm-answer-watcher run --config examples/04-extraction/sentiment-analysis.config.yaml
```

## Competitive Intelligence Dashboard

### Key Metrics to Track

1. **Mention Rate**: % of queries where you appear
1. **Win Rate**: % of head-to-head comparisons where you rank higher
1. **Average Rank**: Your mean position when mentioned
1. **Coverage Gap**: Intents where competitors appear but you don't
1. **Provider Bias**: Which LLMs favor which brands
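
The first two metrics are simple ratios. A minimal sketch of how they could be computed from query results, assuming lower rank numbers are better (`mention_rate` and `win_rate` are hypothetical helpers):

```python
def mention_rate(answers_with_brand, total_answers):
    """Percentage of answers in which your brand appears."""
    if total_answers == 0:
        return 0.0
    return 100.0 * answers_with_brand / total_answers


def win_rate(pairs):
    """Percentage of head-to-head pairs where you rank higher.

    pairs: (your_rank, competitor_rank) for answers where both brands appear.
    Ties count as neither a win nor a loss.
    """
    if not pairs:
        return 0.0
    wins = sum(1 for mine, theirs in pairs if mine < theirs)
    return 100.0 * wins / len(pairs)
```

The `pairs` input corresponds to the `your_rank` and `competitor_rank` columns of the head-to-head query earlier on this page.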

### SQL Dashboard Query

```sql
-- Comprehensive competitive dashboard
WITH competitor_stats AS (
    SELECT
        brand,
        COUNT(*) as mentions,
        AVG(rank_position) as avg_rank,
        MIN(rank_position) as best_rank,
        COUNT(DISTINCT intent_id) as intent_coverage
    FROM mentions
    WHERE run_id = '2025-11-05T14-30-00Z'
      AND rank_position IS NOT NULL
    GROUP BY brand
)
SELECT
    brand,
    mentions,
    ROUND(avg_rank, 2) as avg_rank,
    best_rank,
    intent_coverage,
    ROUND(100.0 * intent_coverage / (
        SELECT COUNT(DISTINCT intent_id) FROM mentions WHERE run_id = '2025-11-05T14-30-00Z'
    ), 1) as coverage_pct
FROM competitor_stats
ORDER BY mentions DESC, avg_rank ASC;
```

## Actionable Insights

### If a Competitor Consistently Ranks Higher

1. **Analyze their positioning**: Read raw responses to understand why
1. **Create targeted content**: Address the specific use cases they dominate
1. **Monitor trends**: Track whether the gap is widening or narrowing

### If You're Missing in Key Intents

1. **Update your content**: Create pages targeting those queries
1. **Adjust brand aliases**: Add variations that LLMs might use
1. **Test different prompts**: Try alternative phrasings

### If Provider Bias Exists

1. **Optimize for specific LLMs**: If users primarily use ChatGPT, focus there
1. **Diversify content**: Different LLMs have different preferences
1. **Track changes**: Monitor if bias shifts over time

## Next Steps

- **Content Gap Analysis**

  ______________________________________________________________________

  Find opportunities where competitors appear but you don't

  [Content Gap Template →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world/content-gap-analysis.config.yaml)

- **Historical Trends**

  ______________________________________________________________________

  Track competitive position over time

  [Trends Analysis →](../../data-analytics/trends-analysis/)

- **Automate Monitoring**

  ______________________________________________________________________

  Set up daily competitive tracking

  [Automation Guide →](../../user-guide/usage/automation/)

- **Operations**

  ______________________________________________________________________

  Generate competitive insights automatically

  [Operations Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/05-operations)

## Additional Resources

- **[Real-World Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/07-real-world)** - Complete use case templates
- **[Code Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - Python analysis scripts
- **[Database Queries](../../data-analytics/query-examples/)** - More SQL query examples
- **[Trends Analysis](../../data-analytics/trends-analysis/)** - Historical tracking guide

# Budget-Constrained Monitoring

Minimize costs while maintaining monitoring quality.

## Quick Start

The best example for cost-optimized monitoring:

**File**: [`examples/06-advanced/budget-controls.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced/budget-controls.config.yaml)

```bash
# Set API key
export OPENAI_API_KEY="sk-..."

# Run with budget controls
llm-answer-watcher run --config examples/06-advanced/budget-controls.config.yaml
```

**Features:**

- Strict budget limits (abort if exceeded)
- Warning thresholds
- Cost-effective model selection
- Optimized configuration

## Budget Control Options

### Example Configuration

```yaml
run_settings:
  output_dir: "./output"

  models:
    # Use cheapest effective model
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

  # Regex extraction (no extra LLM calls)
  use_llm_rank_extraction: false

  # Set budget limits
  budget:
    enabled: true
    max_per_run_usd: 0.10       # Abort if total exceeds 10 cents
    warn_threshold_usd: 0.05    # Warn at 5 cents
    max_per_intent_usd: 0.02    # Abort if single intent exceeds 2 cents

brands:
  mine: ["YourBrand"]
  # Focus on top 3 competitors only
  competitors: ["TopCompetitor1", "TopCompetitor2", "TopCompetitor3"]

intents:
  # Single most valuable intent
  - id: "main-query"
    prompt: "What are the best tools in my category?"
```

## Cost Optimization Strategies

### 1. Use Cheapest Models

| Provider       | Model                | Cost per 1M input tokens | Recommended for       |
| -------------- | -------------------- | ------------------------ | --------------------- |
| **Google**     | gemini-2.0-flash-exp | Free tier available      | Testing, development  |
| **OpenAI**     | gpt-4o-mini          | $0.15                    | Production monitoring |
| **Perplexity** | sonar                | ~$0.20                   | With web search       |
| **Anthropic**  | claude-3-5-haiku     | $0.80                    | High quality needed   |

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # Best value
```

### 2. Minimize Intent Count

Focus on highest-value buyer-intent queries:

```yaml
intents:
  # Single most important query
  - id: "primary-buyer-intent"
    prompt: "What are the best [your category] tools?"

  # Optional: Add 1-2 more if budget allows
  # - id: "secondary-query"
  #   prompt: "..."
```

### 3. Use Regex Extraction (Not LLM)

Disable LLM-based rank extraction to save costs:

```yaml
run_settings:
  use_llm_rank_extraction: false  # Use regex only (~85% accuracy)
```

This eliminates extra LLM calls for rank extraction.
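
To give a feel for what regex-based extraction involves, here is a simplified sketch that reads ranks from a numbered list. It is an illustration under assumptions, not the tool's actual extractor: `extract_ranks` is a hypothetical helper, and it only handles answers formatted as `1.` or `1)` lists.

```python
import re

# Matches lines such as "1. Foo" or "2) Bar".
LIST_ITEM = re.compile(r"^\s*(\d+)[.)]\s+(.*)$", re.MULTILINE)


def extract_ranks(answer_text, brands):
    """Map each brand to the number of the first list item that names it."""
    ranks = {}
    for number, line in LIST_ITEM.findall(answer_text):
        for brand in brands:
            pattern = rf"(?<!\w){re.escape(brand)}(?!\w)"
            if brand not in ranks and re.search(pattern, line, re.IGNORECASE):
                ranks[brand] = int(number)
    return ranks
```

Brands mentioned only in free-flowing prose get no rank with this approach, which is the kind of case where LLM-based extraction can do better at extra cost.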

### 4. Reduce Providers

Start with 1-2 providers instead of all 6:

```yaml
models:
  # Single provider for budget monitoring
  - provider: "openai"
    model_name: "gpt-4o-mini"
```

### 5. Enable Budget Controls

Set strict limits to prevent cost overruns:

```yaml
budget:
  enabled: true
  max_per_run_usd: 0.05      # Hard limit
  warn_threshold_usd: 0.02   # Early warning
```

## Cost Estimates

### Ultra-Minimal Config

**Config**: `examples/01-quickstart/minimal.config.yaml`

- 1 intent × 1 model (gpt-4o-mini)
- Cost: ~$0.001 per run
- Monthly (daily): ~$0.03/month

### Budget-Constrained Config

**Config**: `examples/06-advanced/budget-controls.config.yaml`

- 3 intents × 1 model (gpt-4o-mini)
- Cost: ~$0.006 per run
- Monthly (daily): ~$0.18/month

### Moderate Budget Config

- 3 intents × 2 models (gpt-4o-mini + claude-haiku)
- Cost: ~$0.012 per run
- Monthly (daily): ~$0.36/month
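
The monthly figures above are the per-run cost multiplied by 30 daily runs. A short calculation you can adapt to your own schedule (`monthly_cost` is a hypothetical helper; the per-run costs are the estimates from this section):

```python
def monthly_cost(cost_per_run_usd, runs_per_day=1, days=30):
    """Projected spend for a fixed schedule."""
    return cost_per_run_usd * runs_per_day * days


for label, per_run in [
    ("ultra-minimal", 0.001),
    ("budget-constrained", 0.006),
    ("moderate", 0.012),
]:
    print(f"{label}: ~${monthly_cost(per_run):.2f}/month")
```

Running hourly instead of daily would be `monthly_cost(0.006, runs_per_day=24)`, about $4.32 per month for the budget-constrained config.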

## Monitoring Actual Costs

### Track Costs in Database

```sql
-- Total cost last 30 days
SELECT SUM(total_cost_usd) as total_cost
FROM runs
WHERE timestamp_utc >= datetime('now', '-30 days');

-- Cost by provider
SELECT
    model_provider,
    SUM(estimated_cost_usd) as provider_cost,
    COUNT(*) as queries,
    AVG(estimated_cost_usd) as avg_per_query
FROM answers_raw
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider;

-- Cost trend over time
SELECT
    DATE(timestamp_utc) as date,
    SUM(total_cost_usd) as daily_cost
FROM runs
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

### Set Budget Alerts

If costs exceed thresholds, the tool will:

- **Warn** at `warn_threshold_usd`
- **Abort** at `max_per_run_usd`

Example output:

```text
⚠️  Warning: Cost approaching budget limit
   Current: $0.048
   Limit: $0.05
   Queries remaining: ~2
```
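
The warn and abort behaviour can be pictured as a check on the running total. This is a sketch of the idea only; the tool's exact comparison rules are an assumption here, and `check_budget` is a hypothetical function.

```python
def check_budget(spent_usd, warn_threshold_usd, max_per_run_usd):
    """Return 'abort', 'warn', or 'ok' for the running total of a run."""
    if spent_usd > max_per_run_usd:
        return "abort"  # total exceeds the hard limit
    if spent_usd >= warn_threshold_usd:
        return "warn"   # at or past the early-warning threshold
    return "ok"
```

With the limits from the example config (`warn_threshold_usd: 0.02`, `max_per_run_usd: 0.05`), a running total of $0.048 yields `"warn"`, matching the sample output above.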

## Trade-offs: Cost vs. Features

| Feature                 | Cost Impact   | Alternative                 |
| ----------------------- | ------------- | --------------------------- |
| **LLM rank extraction** | +$0.001/query | Use regex (85% accuracy)    |
| **Web search**          | +$0.01/query  | Skip for non-time-sensitive |
| **Operations**          | +$0.005/query | Run separately when needed  |
| **Multiple providers**  | ×N providers  | Use 1-2 providers           |
| **Function calling**    | +$0.001/query | Use regex extraction        |

## When to Increase Budget

Consider increasing your budget if:

1. **Low visibility**: Your brand rarely appears
   - Solution: Add more intents, try different phrasing
1. **Missing competitors**: Important competitors not tracked
   - Solution: Add more competitor brands
1. **Limited provider coverage**: Only testing 1 provider
   - Solution: Add 1-2 more providers for comparison
1. **Need real-time data**: Using stale LLM knowledge
   - Solution: Enable web search (see `examples/03-web-search/`)

## Free Tier Options

### Google Gemini Free Tier

Google offers a free tier for Gemini models:

```yaml
models:
  - provider: "google"
    model_name: "gemini-2.0-flash-exp"
    env_api_key: "GEMINI_API_KEY"
```

**Free tier limits:**

- 15 requests/minute
- 1,500 requests/day
- 1 million tokens/minute

Perfect for testing and low-volume monitoring.

See [`examples/02-providers/google-gemini.config.yaml`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/02-providers/google-gemini.config.yaml)

## Next Steps

- **Cost Management**

  ______________________________________________________________________

  Learn more about budget controls and cost optimization

  [Cost Management Guide →](../../user-guide/features/cost-management/)

- **Start Minimal**

  ______________________________________________________________________

  Try the absolute minimum config first

  [Quickstart Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/01-quickstart)

- **Scale Up**

  ______________________________________________________________________

  When ready, add more providers and features

  [Multi-Provider Example →](../multi-provider/)

- **Track Costs**

  ______________________________________________________________________

  Query cost history in SQLite

  [Database Guide →](../../data-analytics/sqlite-database/)

## Additional Resources

- **[Budget Controls Example](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/06-advanced/budget-controls.config.yaml)** - Complete budget config
- **[Cost Management](../../user-guide/features/cost-management/)** - Full cost management documentation
- **[Provider Pricing](../../providers/overview/)** - Compare provider costs

# CI/CD Integration

Integrate brand monitoring into your continuous integration pipeline.

## Quick Start

See the automation examples:

- **[automated_monitoring.py](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py)** - Complete Python script for scheduled monitoring
- **[Code Examples README](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - All automation examples

## GitHub Actions Example

`.github/workflows/monitoring.yml`:

```yaml
name: Brand Monitoring

on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 9 AM UTC
  workflow_dispatch:      # Allow manual triggers

jobs:
  monitor:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"

      - name: Install dependencies
        run: |
          uv sync

      - name: Run monitoring
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          uv run llm-answer-watcher run \
            --config examples/07-real-world/saas-brand-monitoring.config.yaml \
            --yes \
            --format json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: monitoring-results-${{ github.run_id }}
          path: |
            output/
            !output/*.db

      - name: Check for visibility drops
        run: |
          # Custom script to analyze results and alert on issues
          python examples/code-examples/analyze_results.py
```

## Exit Code Handling

The CLI returns specific exit codes that can be used in CI/CD:

| Exit Code | Meaning             | Action                             |
| --------- | ------------------- | ---------------------------------- |
| `0`       | Success             | All queries completed successfully |
| `1`       | Configuration error | Fix config file or API keys        |
| `2`       | Database error      | Check database path/permissions    |
| `3`       | Partial failure     | Some queries failed, investigate   |
| `4`       | Complete failure    | All queries failed, critical issue |
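
A wrapper script can turn these codes into a pipeline decision. The sketch below copies the table into a lookup. Treating partial failure as non-fatal is a policy choice made here for illustration, not something the CLI enforces, and both helper names are hypothetical.

```python
# Exit codes and meanings copied from the table above
EXIT_CODE_MEANINGS = {
    0: "Success",
    1: "Configuration error",
    2: "Database error",
    3: "Partial failure",
    4: "Complete failure",
}

def describe_exit_code(exit_code: int) -> str:
    """Translate an exit code into the meaning listed in the table."""
    return EXIT_CODE_MEANINGS.get(exit_code, f"Unknown exit code {exit_code}")

def should_fail_pipeline(exit_code: int) -> bool:
    """Fail the pipeline on everything except success (0) and partial failure (3)."""
    return exit_code not in (0, 3)

for code in (0, 3, 4):
    print(code, describe_exit_code(code), "-> fail" if should_fail_pipeline(code) else "-> continue")
```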

### Example with Exit Code Handling

```yaml
- name: Run monitoring
  id: monitor
  run: |
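    # The default bash shell runs with -e, so capture the exit code without aborting the step
    exit_code=0
    uv run llm-answer-watcher run --config config.yaml --yes --format json || exit_code=$?
    echo "exit_code=$exit_code" >> "$GITHUB_OUTPUT"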
  continue-on-error: true

- name: Check result
  run: |
    if [ "${{ steps.monitor.outputs.exit_code }}" == "0" ]; then
      echo "✅ Monitoring completed successfully"
    elif [ "${{ steps.monitor.outputs.exit_code }}" == "3" ]; then
      echo "⚠️ Partial failure - some queries failed"
      exit 0  # Don't fail the workflow
    else
      echo "❌ Monitoring failed with exit code ${{ steps.monitor.outputs.exit_code }}"
      exit 1
    fi
```

## Python Automation Script

Complete automation example with notifications:

```python
#!/usr/bin/env python3
"""
Automated brand monitoring with Slack notifications.

See: examples/code-examples/automated_monitoring.py for full implementation
"""

import json
import sqlite3
import subprocess

def run_monitoring():
    """Run LLM Answer Watcher and return (exit_code, parsed JSON or None)."""
    result = subprocess.run([
        "llm-answer-watcher", "run",
        "--config", "examples/07-real-world/saas-brand-monitoring.config.yaml",
        "--yes",
        "--format", "json"
    ], capture_output=True, text=True)

    # stdout may be empty or not valid JSON if the run fails early
    try:
        results = json.loads(result.stdout)
    except json.JSONDecodeError:
        results = None

    return result.returncode, results

def check_visibility_drop(db_path, threshold=0.5):
    """Check if brand visibility has dropped."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get recent visibility rate
    cursor.execute("""
        SELECT
            COUNT(DISTINCT CASE WHEN is_mine = 1 THEN intent_id END) * 1.0 /
            COUNT(DISTINCT intent_id) as visibility_rate
        FROM mentions
        WHERE run_id IN (
            SELECT run_id FROM runs ORDER BY timestamp_utc DESC LIMIT 1
        )
    """)

    current_rate = cursor.fetchone()[0] or 0
    conn.close()

    return current_rate < threshold

def send_slack_alert(message):
    """Send alert to Slack (implement based on your setup)."""
    # See examples/code-examples/ for Slack integration
    pass

def main():
    exit_code, results = run_monitoring()

    if exit_code == 0:
        print(f"✅ Monitoring completed: {results['run_id']}")

        # Check for visibility drops
        if check_visibility_drop(results['sqlite_db_path']):
            send_slack_alert("⚠️ Brand visibility has dropped below 50%")
    else:
        print(f"❌ Monitoring failed with exit code {exit_code}")
        send_slack_alert(f"Monitoring failed: {exit_code}")

if __name__ == "__main__":
    main()
```

**Full implementation**: [`examples/code-examples/automated_monitoring.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/automated_monitoring.py)

## Cron Job Setup

### Daily Monitoring

```bash
# Edit crontab
crontab -e

# Add this line (runs daily at 9 AM)
0 9 * * * cd /path/to/llm-answer-watcher && .venv/bin/python examples/code-examples/automated_monitoring.py >> logs/cron.log 2>&1
```

### Weekly Report

```bash
# Weekly comprehensive analysis (Mondays at 9 AM)
0 9 * * 1 cd /path/to/llm-answer-watcher && .venv/bin/llm-answer-watcher run --config examples/02-providers/multi-provider-comparison.config.yaml --yes --quiet
```

## Docker Integration

### Dockerfile

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install uv
RUN pip install uv

# Copy project
COPY . .

# Install dependencies
RUN uv sync

# Set entrypoint
ENTRYPOINT ["uv", "run", "llm-answer-watcher"]
CMD ["--help"]
```

### Docker Compose

```yaml
version: '3.8'

services:
  monitoring:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./output:/app/output
    command: >
      run
      --config examples/07-real-world/saas-brand-monitoring.config.yaml
      --yes
      --format json
```

Run with:

```bash
docker-compose run monitoring
```

## Monitoring Multiple Brands

### Matrix Strategy in GitHub Actions

```yaml
jobs:
  monitor:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        brand:
          - brand-a
          - brand-b
          - brand-c

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      # Install Python, uv, and the project here (same steps as the full workflow above)

      - name: Run monitoring for ${{ matrix.brand }}
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          llm-answer-watcher run \
            --config configs/${{ matrix.brand }}.config.yaml \
            --yes
```

## Data Export and Analysis

### Export Results to CSV

See [`examples/code-examples/export_to_csv.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/export_to_csv.py):

```python
#!/usr/bin/env python3
"""Export monitoring results to CSV for analysis."""

import sqlite3
import csv

def export_mentions_to_csv(db_path, output_path):
    """Export mentions table to CSV."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    cursor.execute("""
        SELECT
            run_id,
            intent_id,
            model_provider,
            brand,
            rank_position,
            is_mine,
            timestamp_utc
        FROM mentions
        ORDER BY timestamp_utc DESC
    """)

    with open(output_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['run_id', 'intent_id', 'provider', 'brand', 'rank', 'is_mine', 'timestamp'])
        writer.writerows(cursor.fetchall())

    conn.close()

if __name__ == "__main__":
    export_mentions_to_csv("output/watcher.db", "output/mentions.csv")
```

### Analyze Results

See [`examples/code-examples/analyze_results.py`](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples/analyze_results.py):

```python
#!/usr/bin/env python3
"""Analyze monitoring results and generate insights."""

import sqlite3

def analyze_latest_run(db_path):
    """Analyze the most recent monitoring run."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get latest run (there may be none in a fresh database)
    cursor.execute("SELECT run_id FROM runs ORDER BY timestamp_utc DESC LIMIT 1")
    row = cursor.fetchone()
    if row is None:
        conn.close()
        print("No runs found")
        return
    run_id = row[0]

    # Calculate metrics
    cursor.execute("""
        SELECT
            COUNT(DISTINCT CASE WHEN is_mine = 1 THEN intent_id END) as my_coverage,
            COUNT(DISTINCT intent_id) as total_intents,
            AVG(CASE WHEN is_mine = 1 THEN rank_position END) as my_avg_rank
        FROM mentions
        WHERE run_id = ?
    """, (run_id,))

    my_coverage, total_intents, my_avg_rank = cursor.fetchone()

    conn.close()

    # Guard against runs with no mentions, or no ranked mentions of your brand
    coverage_pct = my_coverage / total_intents * 100 if total_intents else 0.0
    rank_text = f"{my_avg_rank:.2f}" if my_avg_rank is not None else "n/a"

    print(f"📊 Analysis for {run_id}")
    print(f"  Coverage: {my_coverage}/{total_intents} intents ({coverage_pct:.1f}%)")
    print(f"  Average rank: {rank_text}")

if __name__ == "__main__":
    analyze_latest_run("output/watcher.db")
```

## Alerting and Notifications

### Slack Webhook Integration

```python
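import os

import requests

def send_slack_notification(webhook_url, message):
    """Send notification to Slack."""
    payload = {
        "text": message,
        "blocks": [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": message}
            }
        ]
    }
    # Time out instead of hanging the pipeline, and surface failed deliveries
    response = requests.post(webhook_url, json=payload, timeout=10)
    response.raise_for_status()

# Usage
brand_visibility_dropped = True  # Replace with your own visibility check
if brand_visibility_dropped:
    send_slack_notification(
        os.getenv("SLACK_WEBHOOK_URL"),
        "⚠️ Brand visibility dropped to 30% (threshold: 50%)"
    )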

### Email Alerts

```python
import os
import smtplib
from email.message import EmailMessage

def send_email_alert(subject, body):
    """Send email alert."""
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = 'monitoring@yourdomain.com'
    msg['To'] = 'team@yourdomain.com'
    msg.set_content(body)

    with smtplib.SMTP('smtp.gmail.com', 587) as smtp:
        smtp.starttls()
        smtp.login(os.getenv('EMAIL_USER'), os.getenv('EMAIL_PASS'))
        smtp.send_message(msg)
```

## Best Practices

### 1. Store Secrets Securely

Use environment variables or secret managers:

```yaml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  # Never hardcode API keys!
```

### 2. Rate Limiting

Avoid hitting API rate limits:

```yaml
run_settings:
  max_concurrent_requests: 2
  delay_between_queries: 1
```

### 3. Cost Controls

Enable budget limits in CI/CD:

```yaml
budget:
  enabled: true
  max_per_run_usd: 0.50  # Prevent runaway costs
```

### 4. Artifact Retention

Upload results but manage storage:

```yaml
- name: Upload results
  uses: actions/upload-artifact@v4
  with:
    name: results
    path: output/
    retention-days: 30  # Auto-delete after 30 days
```

## Next Steps

- **Code Examples**

  ______________________________________________________________________

  Explore all automation scripts

  [Code Examples →](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)

- **Automation Guide**

  ______________________________________________________________________

  Complete automation documentation

  [Automation Guide →](../../user-guide/usage/automation/)

- **Database Queries**

  ______________________________________________________________________

  SQL examples for analysis

  [Query Examples →](../../data-analytics/query-examples/)

- **Trends Analysis**

  ______________________________________________________________________

  Track changes over time

  [Trends Guide →](../../data-analytics/trends-analysis/)

## Additional Resources

- **[Code Examples](https://github.com/nibzard/llm-answer-watcher/tree/main/examples/code-examples)** - Python automation scripts
- **[Automation Guide](../../user-guide/usage/automation/)** - Complete automation documentation
- **[CLI Reference](../../reference/cli-reference/)** - All CLI options and exit codes
- **[Python API](../../reference/python-api/)** - Programmatic usage guide

# Data & Analytics

# Output Structure

Understanding the file and directory structure of monitoring runs.

## Directory Layout

```text
output/
├── watcher.db                          # SQLite database
└── YYYY-MM-DDTHH-MM-SSZ/              # Run directory
    ├── run_meta.json                   # Run summary
    ├── report.html                     # HTML report
    ├── intent_*_raw_*.json            # Raw LLM responses
    ├── intent_*_parsed_*.json         # Extracted data
    └── intent_*_error_*.json          # Errors (if any)
```
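
Because run directories are named by timestamp, the newest run can be found by sorting directory names. A minimal sketch, assuming the layout above; `latest_run_dir` and `parsed_files` are hypothetical helpers, not part of the package:

```python
from pathlib import Path
from typing import Optional

def latest_run_dir(output_dir: str) -> Optional[Path]:
    """Return the newest run directory, or None if there are none."""
    run_dirs = [p for p in Path(output_dir).iterdir() if p.is_dir()]
    # Names like 2025-11-05T14-30-00Z sort chronologically as plain strings
    return max(run_dirs, key=lambda p: p.name, default=None)

def parsed_files(run_dir: Path) -> list:
    """List the parsed-extraction files for one run, sorted by name."""
    return sorted(run_dir.glob("intent_*_parsed_*.json"))

if Path("output").is_dir():
    newest = latest_run_dir("output")
    if newest is not None:
        for path in parsed_files(newest):
            print(path.name)
```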

## File Descriptions

### `run_meta.json`

Summary of the entire run with costs and stats.

### `report.html`

Interactive HTML report with visualizations.

### `intent_*_raw_*.json`

Raw LLM response with metadata.

### `intent_*_parsed_*.json`

Extracted brand mentions and ranks.

### `watcher.db`

SQLite database with all historical data.

See [SQLite Database](../sqlite-database/) for database schema.

# SQLite Database

LLM Answer Watcher stores all monitoring data in a local SQLite database for historical tracking and trend analysis.

## Database Location

Default path: `./output/watcher.db`

Configure in `watcher.config.yaml`:

```yaml
run_settings:
  sqlite_db_path: "./output/watcher.db"
```

## Schema Overview

The database has 4 main tables plus schema versioning:

```text
schema_version  → Track database migrations
runs            → One row per CLI execution
answers_raw     → Full LLM responses with metadata
mentions        → Exploded brand mentions for analysis
operations      → Post-intent operation results (optional)
```

## Schema Details

### Table: runs

One row per `llm-answer-watcher run` execution.

**Columns:**

```sql
CREATE TABLE runs (
    run_id TEXT PRIMARY KEY,              -- YYYY-MM-DDTHH-MM-SSZ
    timestamp_utc TEXT NOT NULL,          -- ISO 8601 with Z suffix
    config_file TEXT,                     -- Path to config file used
    total_cost_usd REAL NOT NULL,         -- Sum of all query costs
    queries_completed INTEGER NOT NULL,   -- Successful queries
    queries_failed INTEGER NOT NULL,      -- Failed queries
    status TEXT NOT NULL,                 -- "success", "partial", "failed"
    output_dir TEXT NOT NULL             -- Directory with run artifacts
);
```

**Example Query:**

```sql
-- View recent runs
SELECT run_id, timestamp_utc, status, total_cost_usd, queries_completed
FROM runs
ORDER BY timestamp_utc DESC
LIMIT 10;
```

### Table: answers_raw

One row per intent × model combination.

**Columns:**

```sql
CREATE TABLE answers_raw (
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,         -- "openai", "anthropic", etc.
    model_name TEXT NOT NULL,             -- "gpt-4o-mini", etc.
    timestamp_utc TEXT NOT NULL,
    answer_text TEXT NOT NULL,            -- Full LLM response
    tokens_used INTEGER,                  -- Total tokens (input + output)
    estimated_cost_usd REAL,              -- Query cost
    extraction_method TEXT,               -- "regex" or "function_calling"
    web_search_count INTEGER DEFAULT 0,   -- Number of web searches
    error_message TEXT,                   -- NULL if successful

    PRIMARY KEY (run_id, intent_id, model_provider, model_name),
    FOREIGN KEY (run_id) REFERENCES runs(run_id)
);
```

**Example Query:**

```sql
-- Cost by provider
SELECT
    model_provider,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost_per_query
FROM answers_raw
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider
ORDER BY total_cost DESC;
```

### Table: mentions

One row per brand mention. Denormalized for fast queries.

**Columns:**

```sql
CREATE TABLE mentions (
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,
    model_name TEXT NOT NULL,
    timestamp_utc TEXT NOT NULL,
    brand TEXT NOT NULL,                  -- Original brand name
    normalized_name TEXT NOT NULL,        -- Lowercase, hyphenated
    is_mine INTEGER NOT NULL,             -- 1 = your brand, 0 = competitor
    rank_position INTEGER,                -- 1, 2, 3... or NULL
    detection_method TEXT NOT NULL,       -- "regex" or "function_calling"
    confidence REAL DEFAULT 1.0,          -- 0.0-1.0 confidence score

    PRIMARY KEY (run_id, intent_id, model_provider, model_name, normalized_name),
    FOREIGN KEY (run_id) REFERENCES runs(run_id)
);

CREATE INDEX idx_mentions_timestamp ON mentions(timestamp_utc);
CREATE INDEX idx_mentions_brand ON mentions(brand);
CREATE INDEX idx_mentions_normalized ON mentions(normalized_name);
CREATE INDEX idx_mentions_rank ON mentions(rank_position);
CREATE INDEX idx_mentions_is_mine ON mentions(is_mine);
```

**Example Query:**

```sql
-- Brand mentions over time
SELECT
    DATE(timestamp_utc) as date,
    brand,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE normalized_name = 'warmly'
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc), brand
ORDER BY date DESC;
```

### Table: schema_version

Tracks database migrations.

**Columns:**

```sql
CREATE TABLE schema_version (
    version INTEGER PRIMARY KEY,
    applied_at TEXT NOT NULL
);
```

**Current version:** 3

## Common Queries

### Basic Analytics

**Your brand visibility:**

```sql
-- How often do we appear?
SELECT
    COUNT(DISTINCT run_id) as runs_appeared,
    COUNT(*) as total_mentions,
    AVG(rank_position) as average_rank
FROM mentions
WHERE is_mine = 1
  AND timestamp_utc >= datetime('now', '-30 days');
```

**Competitor comparison:**

```sql
SELECT
    brand,
    COUNT(*) as mentions,
    COUNT(DISTINCT intent_id) as intents_appeared,
    AVG(rank_position) as avg_rank,
    MIN(rank_position) as best_rank,
    COUNT(CASE WHEN rank_position = 1 THEN 1 END) as first_place_count
FROM mentions
WHERE rank_position IS NOT NULL
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand
ORDER BY mentions DESC;
```

### Trend Analysis

**Daily brand mentions:**

```sql
SELECT
    DATE(timestamp_utc) as date,
    COUNT(CASE WHEN is_mine = 1 THEN 1 END) as my_mentions,
    COUNT(CASE WHEN is_mine = 0 THEN 1 END) as competitor_mentions,
    COUNT(*) as total_mentions
FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

**Ranking trends:**

```sql
SELECT
    DATE(timestamp_utc) as date,
    AVG(CASE WHEN is_mine = 1 THEN rank_position END) as my_avg_rank,
    AVG(CASE WHEN is_mine = 0 THEN rank_position END) as competitor_avg_rank
FROM mentions
WHERE rank_position IS NOT NULL
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

### Intent Analysis

**Which intents work best for your brand?**

```sql
SELECT
    intent_id,
    COUNT(*) as total_mentions,
    COUNT(DISTINCT model_provider) as providers,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE is_mine = 1
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY intent_id
ORDER BY total_mentions DESC;
```

**Intents where you're NOT mentioned:**

```sql
-- Get all intent IDs from recent runs
WITH recent_intents AS (
    SELECT DISTINCT intent_id
    FROM answers_raw
    WHERE timestamp_utc >= datetime('now', '-7 days')
),
-- Get intents where you appeared
appeared_intents AS (
    SELECT DISTINCT intent_id
    FROM mentions
    WHERE is_mine = 1
      AND timestamp_utc >= datetime('now', '-7 days')
)
-- Find the difference
SELECT ri.intent_id
FROM recent_intents ri
LEFT JOIN appeared_intents ai ON ri.intent_id = ai.intent_id
WHERE ai.intent_id IS NULL;
```

### Cost Analysis

**Total spending:**

```sql
SELECT
    SUM(total_cost_usd) as total_spent,
    COUNT(*) as total_runs,
    AVG(total_cost_usd) as avg_cost_per_run
FROM runs
WHERE timestamp_utc >= datetime('now', '-30 days');
```

**Cost by provider:**

```sql
SELECT
    model_provider,
    model_name,
    COUNT(*) as queries,
    SUM(estimated_cost_usd) as total_cost,
    AVG(estimated_cost_usd) as avg_cost
FROM answers_raw
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider, model_name
ORDER BY total_cost DESC;
```

**Cost per brand mention:**

```sql
SELECT
    r.run_id,
    r.total_cost_usd,
    COUNT(m.brand) as mentions,
    r.total_cost_usd / COUNT(m.brand) as cost_per_mention
FROM runs r
JOIN mentions m ON r.run_id = m.run_id
WHERE r.timestamp_utc >= datetime('now', '-30 days')
  AND m.is_mine = 1
GROUP BY r.run_id, r.total_cost_usd
ORDER BY cost_per_mention ASC;
```

### Provider Comparison

**Which provider mentions you more?**

```sql
SELECT
    model_provider,
    COUNT(CASE WHEN is_mine = 1 THEN 1 END) as my_mentions,
    COUNT(*) as total_mentions,
    CAST(COUNT(CASE WHEN is_mine = 1 THEN 1 END) AS REAL) / COUNT(*) * 100 as my_mention_rate
FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider
ORDER BY my_mention_rate DESC;
```

**Average ranking by provider:**

```sql
SELECT
    model_provider,
    model_name,
    COUNT(*) as mentions,
    AVG(rank_position) as avg_rank
FROM mentions
WHERE is_mine = 1
  AND rank_position IS NOT NULL
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY model_provider, model_name
ORDER BY avg_rank ASC;
```

## Exporting Data

### CSV Export

```bash
# Export mentions to CSV
sqlite3 -header -csv output/watcher.db \
  "SELECT * FROM mentions WHERE timestamp_utc >= datetime('now', '-30 days')" \
  > mentions_30days.csv

# Export runs summary
sqlite3 -header -csv output/watcher.db \
  "SELECT * FROM runs ORDER BY timestamp_utc DESC" \
  > runs_summary.csv
```

### JSON Export

```bash
# Export as JSON Lines
sqlite3 output/watcher.db <<SQL | jq -c '.'
SELECT json_object(
  'brand', brand,
  'timestamp', timestamp_utc,
  'rank', rank_position,
  'is_mine', is_mine
) as json_data
FROM mentions
WHERE timestamp_utc >= datetime('now', '-7 days');
SQL
```

### Excel/Google Sheets

1. Export to CSV:

   ```bash
   sqlite3 -header -csv output/watcher.db \
     "SELECT * FROM mentions" > mentions.csv
   ```

1. Import CSV into Excel or Google Sheets

## Database Maintenance

### Vacuum Database

Reclaim space after deletions:

```bash
sqlite3 output/watcher.db "VACUUM;"
```

### Delete Old Data

```sql
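-- Delete child rows first so no orphaned mentions or answers remain
DELETE FROM mentions WHERE run_id IN
    (SELECT run_id FROM runs WHERE timestamp_utc < datetime('now', '-90 days'));
DELETE FROM answers_raw WHERE run_id IN
    (SELECT run_id FROM runs WHERE timestamp_utc < datetime('now', '-90 days'));

-- Delete runs older than 90 days
DELETE FROM runs
WHERE timestamp_utc < datetime('now', '-90 days');

-- Vacuum to reclaim space
VACUUM;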
```

### Check Database Size

```bash
ls -lh output/watcher.db
# Example: -rw-r--r-- 1 user user 2.5M Nov 5 14:30 watcher.db
```

### Backup Database

```bash
# Simple copy
cp output/watcher.db output/watcher.backup.db

# Or use SQLite backup command
sqlite3 output/watcher.db ".backup output/watcher.backup.db"

# Compress backup
gzip output/watcher.backup.db
```
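
The same backup can be taken from Python with the standard library's `sqlite3.Connection.backup`, which copies a consistent snapshot rather than raw file bytes. A minimal sketch; `backup_database` is a hypothetical helper name:

```python
import sqlite3

def backup_database(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # Copies every page of the source into the destination
    finally:
        dest.close()
        src.close()
```

Calling `backup_database("output/watcher.db", "output/watcher.backup.db")` mirrors the `.backup` command above and fits naturally into a scheduled job.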

## Schema Migrations

### Check Schema Version

```sql
SELECT * FROM schema_version ORDER BY version DESC;
```

**Output:**

```text
version | applied_at
--------|---------------------
3       | 2025-11-05T14:30:00Z
2       | 2025-10-25T10:15:00Z
1       | 2025-10-20T09:00:00Z
```

### Migration Process

Migrations run automatically on startup. No manual intervention needed.

**What happens:**

1. Check current schema version
1. Compare to required version
1. Apply migrations sequentially
1. Update schema_version table
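
The four steps above follow a common version-table pattern. The sketch below illustrates that pattern against an in-memory database. It is not the tool's actual migration code: the `MIGRATIONS` contents and the `example_v1` table are made up for the example, and only the `schema_version` layout comes from this page.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical migrations, keyed by the schema version they produce
MIGRATIONS = {
    1: "CREATE TABLE IF NOT EXISTS example_v1 (id INTEGER PRIMARY KEY)",
    2: "ALTER TABLE example_v1 ADD COLUMN note TEXT",
}

def current_version(conn: sqlite3.Connection) -> int:
    """Step 1: read the highest applied version (0 for a fresh database)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version ("
        "version INTEGER PRIMARY KEY, applied_at TEXT NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def migrate(conn: sqlite3.Connection, required_version: int) -> int:
    """Steps 2-4: apply pending migrations in order and record each one."""
    version = current_version(conn)
    for target in range(version + 1, required_version + 1):
        with conn:  # Commits on success, rolls back on an exception
            conn.execute(MIGRATIONS[target])
            applied_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            conn.execute(
                "INSERT INTO schema_version (version, applied_at) VALUES (?, ?)",
                (target, applied_at),
            )
    return current_version(conn)

conn = sqlite3.connect(":memory:")
print("Schema version after migrating:", migrate(conn, required_version=2))
```

Running `migrate` a second time is a no-op because no versions are pending.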

### Manual Migration (Advanced)

If needed, manually upgrade:

```python
from llm_answer_watcher.storage.db import init_db_if_needed

init_db_if_needed("./output/watcher.db")
```

## Connecting with BI Tools

### Metabase

1. Add SQLite database
1. Point to `./output/watcher.db`
1. Create dashboards

### Tableau

1. Use SQLite connector
1. Connect to `watcher.db`
1. Create visualizations

### Python/Pandas

```python
import sqlite3
import pandas as pd

# Connect to database
conn = sqlite3.connect("output/watcher.db")

# Load mentions into DataFrame
df = pd.read_sql_query(
    "SELECT * FROM mentions WHERE timestamp_utc >= datetime('now', '-30 days')",
    conn
)

# Analyze
print(df.groupby('brand')['rank_position'].mean())

# Close connection
conn.close()
```

### R

```r
library(DBI)
library(RSQLite)

# Connect
conn <- dbConnect(RSQLite::SQLite(), "output/watcher.db")

# Query
mentions <- dbGetQuery(conn,
  "SELECT * FROM mentions WHERE timestamp_utc >= datetime('now', '-30 days')"
)

# Analyze
aggregate(rank_position ~ brand, data=mentions, FUN=mean)

# Disconnect
dbDisconnect(conn)
```

## Performance Tips

### Indexes

Indexes already exist on these `mentions` columns:

- `timestamp_utc`
- `brand`
- `normalized_name`
- `rank_position`
- `is_mine`

### Query Optimization

**Use indexed columns in WHERE:**

```sql
-- ✅ Fast - uses index
WHERE timestamp_utc >= datetime('now', '-30 days')

-- ❌ Slow - no index
WHERE DATE(timestamp_utc) = '2025-11-05'
```

**Limit result sets:**

```sql
-- ✅ Good - only get what you need
SELECT brand, rank_position FROM mentions
WHERE is_mine = 1
LIMIT 100;

-- ❌ Bad - retrieves all columns
SELECT * FROM mentions;
```

### Analyze Query Plans

```sql
EXPLAIN QUERY PLAN
SELECT brand, COUNT(*) FROM mentions
WHERE timestamp_utc >= datetime('now', '-30 days')
GROUP BY brand;
```
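
To compare plans without touching real data, the same check can be run from Python against a throwaway in-memory table. This sketch uses a cut-down, two-column version of `mentions` (an assumption for brevity) and prints the planner's description for a range filter and a `DATE()` filter. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (brand TEXT NOT NULL, timestamp_utc TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_mentions_timestamp ON mentions(timestamp_utc)")

def query_plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)  # Last column holds the detail text

range_plan = query_plan(
    "SELECT brand FROM mentions WHERE timestamp_utc >= datetime('now', '-30 days')"
)
function_plan = query_plan(
    "SELECT brand FROM mentions WHERE DATE(timestamp_utc) = '2025-11-05'"
)

print("Range filter: ", range_plan)
print("DATE() filter:", function_plan)
```

A plan that names `idx_mentions_timestamp` is using the index; wrapping the column in `DATE()` prevents that.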

## Troubleshooting

### Database Locked

**Problem:** `database is locked`

**Solution:**

```bash
# Check for locks
lsof output/watcher.db

# Kill process if safe
kill -9 <PID>

# Or wait and retry
```
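
For analysis scripts that read the database while a run may be writing, "wait and retry" can be delegated to SQLite itself through the `timeout` argument of `sqlite3.connect`. A minimal sketch; `connect_with_timeout` is a hypothetical helper name:

```python
import sqlite3

def connect_with_timeout(db_path: str, timeout_seconds: float = 30.0) -> sqlite3.Connection:
    """Open the database, waiting up to timeout_seconds for locks to clear."""
    # While the database is locked, SQLite keeps retrying until the timeout
    # elapses, then raises sqlite3.OperationalError ("database is locked")
    return sqlite3.connect(db_path, timeout=timeout_seconds)
```

Use `connect_with_timeout("output/watcher.db")` in place of `sqlite3.connect(...)` in the scripts on this page.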

### Corrupted Database

**Problem:** Database errors on queries

**Solution:**

```bash
# Check integrity
sqlite3 output/watcher.db "PRAGMA integrity_check;"

# If corrupted, restore from backup
cp output/watcher.backup.db output/watcher.db
```

### Schema Version Mismatch

**Problem:** "Schema version X is newer than expected Y"

**Solution:** Update LLM Answer Watcher to latest version:

```bash
pip install --upgrade llm-answer-watcher
```

## Next Steps

- **Query Examples**

  ______________________________________________________________________

  More SQL query examples

  [Query Examples →](../query-examples/)

- **Trends Analysis**

  ______________________________________________________________________

  Track changes over time

  [Trends Analysis →](../trends-analysis/)

- **Output Structure**

  ______________________________________________________________________

  Understand JSON output files

  [Output Structure →](../output-structure/)

- **Database Schema**

  ______________________________________________________________________

  Complete schema reference

  [Schema Reference →](../../reference/database-schema/)

# SQL Query Examples

Useful SQL queries for analyzing monitoring data.

## Brand Performance

```sql
-- Your brand's mentions per run (only runs where it was mentioned are counted)
SELECT
  COUNT(DISTINCT run_id) as runs_with_mentions,
  COUNT(*) as total_mentions,
  CAST(COUNT(*) AS FLOAT) / COUNT(DISTINCT run_id) as mentions_per_run
FROM mentions
WHERE normalized_name = 'yourbrand';
```

## Competitor Analysis

```sql
-- Top mentioned competitors
SELECT
  brand,
  COUNT(*) as mentions,
  AVG(rank_position) as avg_rank
FROM mentions
WHERE normalized_name != 'yourbrand'
GROUP BY brand
ORDER BY mentions DESC
LIMIT 10;
```

## Trends Over Time

```sql
-- Weekly mention trends
SELECT
  strftime('%Y-W%W', timestamp_utc) as week,
  COUNT(*) as mentions
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY week
ORDER BY week DESC;
```

See [SQLite Database](../sqlite-database/) for schema details.

# Trends Analysis

Analyze brand visibility trends over time.

## Time-Series Analysis

```sql
-- Daily mention count
SELECT
  DATE(timestamp_utc) as date,
  COUNT(*) as mentions,
  AVG(rank_position) as avg_rank
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY DATE(timestamp_utc)
ORDER BY date DESC;
```

## Comparative Trends

```sql
-- Your brand vs top competitor
SELECT
  DATE(m.timestamp_utc) as date,
  m.brand,
  COUNT(*) as mentions
FROM mentions m
WHERE m.normalized_name IN ('yourbrand', 'competitor')
GROUP BY DATE(m.timestamp_utc), m.brand
ORDER BY date DESC, mentions DESC;
```

## Export to CSV

```bash
sqlite3 -header -csv output/watcher.db \
  "SELECT * FROM mentions WHERE normalized_name = 'yourbrand'" \
  > brand_data.csv
```

See [Query Examples](../query-examples/) for more queries.

# Evaluation Framework

# Evaluation Framework

Quality control and accuracy testing for brand extraction.

## Purpose

The evaluation framework validates:

- Mention detection accuracy
- Rank extraction correctness
- False positive/negative rates

## Running Evaluations

```bash
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

## Metrics Tracked

- **Mention Precision**: Correct mentions / total found
- **Mention Recall**: Correct mentions / expected mentions
- **Rank Accuracy**: Correctly ranked brands
- **F1 Score**: Harmonic mean of precision/recall

See [Running Evals](../running-evals/) for detailed usage.

# Running Evaluations

How to run the evaluation suite and interpret results.

## Basic Usage

```bash
llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml
```

## Command Options

```bash
llm-answer-watcher eval \
  --fixtures fixtures.yaml \
  --db eval_results.db \
  --format json
```

## Example Output

```text
✅ Evaluation completed
├── Test cases: 15
├── Passed: 14
├── Failed: 1
└── Pass rate: 93.3%

Metrics:
├── Mention Precision: 95.2%
├── Mention Recall: 91.8%
├── Rank Accuracy: 88.5%
└── F1 Score: 93.5%
```

See [Metrics](../metrics/) for metric definitions.

# Evaluation Metrics

Understanding evaluation metrics and thresholds.

## Core Metrics

### Mention Precision

Ratio of correct mentions to total mentions found.

**Threshold**: ≥ 90%

### Mention Recall

Ratio of correct mentions to expected mentions.

**Threshold**: ≥ 80%

### Mention F1

Harmonic mean of precision and recall.

**Threshold**: ≥ 85%
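
The three mention metrics above can be sketched for a single test case as follows. This is an illustration under stated assumptions, not the framework's implementation: the `mention_scores` name, the use of sets, and the choice to return `0.0` for empty inputs are all hypothetical, and how the tool aggregates scores across test cases is not covered here.

```python
def mention_scores(found: set[str], expected: set[str]) -> dict[str, float]:
    """Precision, recall, and F1 for one set of detected brand mentions."""
    correct = len(found & expected)
    # Assumption: an empty denominator scores 0.0 instead of raising.
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = mention_scores(
    found={"YourBrand", "CompetitorA", "Unrelated"},
    expected={"YourBrand", "CompetitorA", "CompetitorB", "CompetitorC"},
)
print(scores)
```

Here two of three detected mentions are correct (precision 2/3) and two of four expected mentions are found (recall 1/2), so F1 is 4/7, well below the 85% threshold.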

### Rank Accuracy

Percentage of correctly ranked brands.

**Threshold**: ≥ 85%

## Interpreting Results

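- **High precision, low recall**: Detected mentions are mostly correct, but expected mentions are being missed (false negatives)
- **Low precision, high recall**: Most expected mentions are found, but so are spurious ones (false positives)
- **Both low**: Extraction needs improvement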

See [Test Cases](../test-cases/) for creating fixtures.

# Test Cases

Creating evaluation test fixtures.

## Fixture Format

```yaml
test_cases:
  - description: "Brand detection test"
    intent_id: "test-intent"
    llm_answer_text: |
      The best tools are:
      1. YourBrand
      2. CompetitorA

    brands_mine: ["YourBrand"]
    brands_competitors: ["CompetitorA"]

    expected_my_mentions: ["YourBrand"]
    expected_competitor_mentions: ["CompetitorA"]

    expected_ranked_list:
      - "YourBrand"
      - "CompetitorA"
```
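
To show how the `expected_*` fields relate to the answer text, here is a rough sketch that checks one test case in plain Python. It is not the tool's evaluator: `find_brands` is a stand-in detector using word-boundary regex matching, ranking by order of first appearance is an assumption, and the fixture is written as the dict a YAML loader would produce.

```python
import re

# The fixture above, as the dict a YAML loader would produce.
test_case = {
    "description": "Brand detection test",
    "llm_answer_text": "The best tools are:\n1. YourBrand\n2. CompetitorA\n",
    "brands_mine": ["YourBrand"],
    "brands_competitors": ["CompetitorA"],
    "expected_my_mentions": ["YourBrand"],
    "expected_competitor_mentions": ["CompetitorA"],
    "expected_ranked_list": ["YourBrand", "CompetitorA"],
}


def find_brands(text: str, brands: list[str]) -> list[str]:
    """Stand-in detector: matched brands, ordered by first appearance."""
    hits = []
    for brand in brands:
        match = re.search(r"\b" + re.escape(brand) + r"\b", text, re.IGNORECASE)
        if match:
            hits.append((match.start(), brand))
    return [brand for _, brand in sorted(hits)]


def check_case(case: dict) -> dict[str, bool]:
    """Compare detected brands with each expected_* field of the fixture."""
    text = case["llm_answer_text"]
    mine, competitors = case["brands_mine"], case["brands_competitors"]
    return {
        "my_mentions": find_brands(text, mine) == case["expected_my_mentions"],
        "competitor_mentions": (
            find_brands(text, competitors) == case["expected_competitor_mentions"]
        ),
        "ranked_list": (
            find_brands(text, mine + competitors) == case["expected_ranked_list"]
        ),
    }


result = check_case(test_case)
print(result)
```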

## Running Custom Fixtures

```bash
llm-answer-watcher eval --fixtures my_tests.yaml
```

See [CI Integration](../ci-integration/) for automated testing.

# CI Integration

Run evaluations in continuous integration.

## GitHub Actions

```yaml
- name: Run Evaluation Suite
  run: |
    # Check the exit code in the same step: every step runs in a fresh
    # shell, so `$?` in a later step would not reflect this command.
    if ! uv run llm-answer-watcher eval \
        --fixtures evals/testcases/fixtures.yaml \
        --format json; then
      echo "Evaluations failed"
      exit 1
    fi
```

## Exit Codes

- `0`: All tests passed
- `1`: Tests failed (below thresholds)
- `2`: Configuration error

See [Running Evals](../running-evals/) for usage details.

# Advanced Topics

# Architecture

LLM Answer Watcher follows Domain-Driven Design principles with strict separation of concerns.

## Core Domains

```text
llm_answer_watcher/
├── config/         # Configuration loading and validation
├── llm_runner/     # LLM client abstraction
├── extractor/      # Brand mention detection
├── storage/        # SQLite and JSON persistence
├── report/         # HTML report generation
├── utils/          # Shared utilities
└── cli.py          # CLI interface
```

## Design Patterns

### 1. Provider Abstraction

```python
from typing import Protocol

class LLMClient(Protocol):
    def generate_answer(self, prompt: str) -> LLMResponse:
        ...

def build_client(provider: str, model: str) -> LLMClient:
    ...
```
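
A self-contained sketch of this pattern is shown below. It is illustrative only: `EchoClient`, the `_FACTORIES` registry, and the trimmed-down `LLMResponse` are hypothetical stand-ins, and the real `build_client` may dispatch differently.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class LLMResponse:
    # Trimmed-down response type for this example only.
    answer_text: str
    provider: str
    model_name: str


class LLMClient(Protocol):
    def generate_answer(self, prompt: str) -> LLMResponse: ...


class EchoClient:
    """Hypothetical provider for illustration; it makes no API calls."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def generate_answer(self, prompt: str) -> LLMResponse:
        return LLMResponse(
            answer_text=f"echo: {prompt}",
            provider="echo",
            model_name=self.model_name,
        )


# Assumed structure: provider name -> factory taking a model name.
_FACTORIES: dict[str, Callable[[str], LLMClient]] = {"echo": EchoClient}


def build_client(provider: str, model: str) -> LLMClient:
    """Return a client for the provider, or raise for unknown providers."""
    try:
        factory = _FACTORIES[provider]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider}") from None
    return factory(model)


client = build_client("echo", "echo-1")
response = client.generate_answer("best CRM tools?")
print(response.answer_text)
```

Because `LLMClient` is a `Protocol`, `EchoClient` satisfies it structurally without inheriting from it; callers depend only on `generate_answer`.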

### 2. API-First Contract

```python
def run_all(config: RuntimeConfig) -> dict:
    # Internal "POST /run" contract
    # OSS CLI calls in-process
    # Cloud will expose over HTTP
    return {"run_id": "...", "cost_usd": 0.01}
```

### 3. Dual-Mode CLI

```python
from enum import Enum

class OutputMode(Enum):
    HUMAN = "human"  # Rich formatting
    JSON = "json"    # Structured output
    QUIET = "quiet"  # Minimal output
```
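
One way such an enum might drive output is sketched below. The `render` function and the exact strings it produces are assumptions made for illustration; the CLI's real human output uses Rich formatting and is not reproduced here.

```python
import json
from enum import Enum


class OutputMode(Enum):
    HUMAN = "human"
    JSON = "json"
    QUIET = "quiet"


def render(result: dict, mode: OutputMode) -> str:
    """Format a run result for the selected mode (formats are illustrative)."""
    if mode is OutputMode.JSON:
        # Structured output for agents and scripts.
        return json.dumps(result, sort_keys=True)
    if mode is OutputMode.QUIET:
        # Minimal tab-separated output.
        return "\t".join(str(result[key]) for key in ("run_id", "cost_usd"))
    # Plain-text stand-in for the human-readable mode.
    return f"Run {result['run_id']} finished, cost ${result['cost_usd']:.4f}"


result = {"run_id": "2025-11-01T08-00-00Z", "cost_usd": 0.01}
for mode in OutputMode:
    print(render(result, mode))
```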

See [API Contract](../api-contract/) for internal API details.

# API Contract

Internal API designed for future HTTP exposure.

## Core Contract

```python
def run_all(config: RuntimeConfig) -> dict:
    """
    Execute monitoring run.

    Args:
        config: Validated runtime configuration

    Returns:
        {
            "run_id": "YYYY-MM-DDTHH-MM-SSZ",
            "status": "success" | "partial" | "failed",
            "queries_completed": int,
            "queries_failed": int,
            "total_cost_usd": float,
            "output_dir": str,
            "brands_detected": {...}
        }
    """
```
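
For callers that want static typing on the returned dict, the documented keys could be described with a `TypedDict`. This is a sketch under assumptions: `RunResult` and `summarize` are hypothetical names, and `brands_detected` is typed as a plain `dict` because its shape is not specified above.

```python
from typing import Literal, TypedDict


class RunResult(TypedDict):
    run_id: str
    status: Literal["success", "partial", "failed"]
    queries_completed: int
    queries_failed: int
    total_cost_usd: float
    output_dir: str
    brands_detected: dict  # shape not specified in the contract above


def summarize(result: RunResult) -> str:
    """One-line summary built only from documented keys."""
    total = result["queries_completed"] + result["queries_failed"]
    return (
        f"{result['run_id']}: {result['status']} "
        f"({result['queries_completed']}/{total} queries, "
        f"${result['total_cost_usd']:.4f})"
    )


# Made-up values for illustration.
example: RunResult = {
    "run_id": "2025-11-01T14-30-00Z",
    "status": "partial",
    "queries_completed": 5,
    "queries_failed": 1,
    "total_cost_usd": 0.0142,
    "output_dir": "./output/2025-11-01T14-30-00Z/",
    "brands_detected": {},
}
print(summarize(example))
```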

## Future HTTP API

The internal contract is designed to become an HTTP API:

```http
POST /api/v1/run
Content-Type: application/json

{
  "config": {...},
  "return_format": "json"
}
```

## Provider Interface

```python
from dataclasses import dataclass

@dataclass
class LLMResponse:
    answer_text: str
    tokens_used: int
    cost_usd: float
    provider: str
    model_name: str
    timestamp_utc: str
```

See [Architecture](../architecture/) for overall design.

# Extending Providers

Add support for new LLM providers.

## Provider Interface

```python
from llm_answer_watcher.llm_runner.models import LLMClient, LLMResponse

class MyCustomClient:
    def __init__(self, model_name: str, api_key: str, system_prompt: str):
        self.model = model_name
        self.api_key = api_key
        self.system_prompt = system_prompt

    def generate_answer(self, prompt: str) -> LLMResponse:
        # Call your LLM API (`call_my_llm_api`, `calculate_cost`, and
        # `utc_timestamp` are placeholders for your own helpers)
        response = call_my_llm_api(prompt)

        return LLMResponse(
            answer_text=response.text,
            tokens_used=response.tokens,
            cost_usd=calculate_cost(response),
            provider="my_provider",
            model_name=self.model,
            timestamp_utc=utc_timestamp()
        )
```

## Registering Provider

```python
# llm_runner/models.py
def build_client(provider: str, model_name: str, api_key: str, system_prompt: str) -> LLMClient:
    if provider == "my_provider":
        return MyCustomClient(model_name, api_key, system_prompt)
    # ... existing providers
```

## Testing Your Provider

```python
def test_my_provider(httpx_mock):
    httpx_mock.add_response(...)
    client = MyCustomClient(...)
    response = client.generate_answer("test")
    assert response.provider == "my_provider"
```

See [Architecture](../architecture/) for design patterns.

# Custom System Prompts

Customize system prompts for LLM queries.

## Built-in Prompts

Located in `llm_answer_watcher/system_prompts/`:

```text
system_prompts/
├── openai/
│   ├── gpt-4-default.json
│   └── extraction-default.json
├── anthropic/
│   └── default.json
└── mistral/
    └── default.json
```

## Using Custom Prompts

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/custom-prompt"
```

## Creating Custom Prompts

1. Create a JSON file in `system_prompts/<provider>/`:

```json
{
  "role": "system",
  "content": "You are a helpful assistant focused on..."
}
```

2. Reference it in your config:

```yaml
system_prompt: "openai/custom-prompt"
```
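
Before referencing a new prompt, it can help to confirm that the file parses and has the expected keys. The sketch below assumes a reference such as `openai/custom-prompt` maps to `<root>/openai/custom-prompt.json`, which matches the directory layout above but is not confirmed as the loader's actual rule; `load_system_prompt` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def load_system_prompt(root: Path, reference: str) -> dict:
    """Load <root>/<reference>.json and check it has 'role' and 'content'."""
    path = root / f"{reference}.json"
    prompt = json.loads(path.read_text(encoding="utf-8"))
    missing = {"role", "content"} - prompt.keys()
    if missing:
        raise ValueError(f"{path} is missing keys: {sorted(missing)}")
    return prompt


# Throwaway directory standing in for system_prompts/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "openai").mkdir()
    (root / "openai" / "custom-prompt.json").write_text(
        json.dumps({"role": "system", "content": "You are a helpful assistant."}),
        encoding="utf-8",
    )
    prompt = load_system_prompt(root, "openai/custom-prompt")

print(prompt["content"])
```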

## Prompt Guidelines

- Keep prompts neutral (avoid biasing toward your brand)
- Be concise yet comprehensive
- Test with evaluation framework

See [API Contract](../api-contract/) for technical details.

# Security

Security best practices for LLM Answer Watcher.

## API Key Management

### ✅ Do This

```bash
# Use environment variables
export OPENAI_API_KEY=sk-your-key

# Use secrets management
OPENAI_API_KEY=$(aws secretsmanager get-secret-value ...)

# Use .env files (add to .gitignore)
echo "OPENAI_API_KEY=sk-..." > .env
echo ".env" >> .gitignore
```

### ❌ Don't Do This

```yaml
# NEVER hardcode API keys in config files
models:
  - provider: "openai"
    api_key: "sk-hardcoded-key"  # DON'T DO THIS!
```

## SQL Injection Prevention

The tool uses parameterized queries:

```python
# ✅ Safe - parameterized
cursor.execute("SELECT * FROM runs WHERE id=?", (run_id,))

# ❌ Never done - string concatenation
cursor.execute(f"SELECT * FROM runs WHERE id='{run_id}'")
```
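
The difference is easy to demonstrate with an in-memory database. In this sketch the table contents and the malicious input are made up; the column name `run_id` follows the `runs` table in the database schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO runs VALUES (?)", [("run-1",), ("run-2",)])

malicious = "x' OR '1'='1"

# Parameterized: the input is bound as one literal value, so nothing matches.
safe_rows = conn.execute(
    "SELECT * FROM runs WHERE run_id = ?", (malicious,)
).fetchall()

# String interpolation: the input rewrites the WHERE clause and matches all rows.
unsafe_rows = conn.execute(
    f"SELECT * FROM runs WHERE run_id = '{malicious}'"
).fetchall()

print(len(safe_rows), len(unsafe_rows))
```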

## XSS Prevention

Jinja2 autoescaping is enabled for HTML reports:

```python
from jinja2 import Environment

# ✅ Safe - autoescaping on
env = Environment(loader=..., autoescape=True)
```

## Best Practices

1. **Never commit secrets**
1. **Rotate API keys** regularly
1. **Use read-only file permissions** for configs
1. **Review logs** before sharing
1. **Keep dependencies updated**

## Reporting Security Issues

Email: [security contact] (replace with actual contact)

See [Contributing](../../contributing/development-setup/).

# Performance

Optimizing LLM Answer Watcher for speed and efficiency.

## Query Performance

### Parallel Queries (Future)

Currently synchronous. Async support planned:

```python
# Future: parallel execution
await asyncio.gather(*[
    query_model(intent, model)
    for intent in intents
    for model in models
])
```

### Current: Sequential

```python
# Current: one at a time
for intent in intents:
    for model in models:
        query_model(intent, model)
```
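
The two approaches can be compared in a runnable sketch. Here `query_model` is a placeholder that sleeps instead of calling a provider, and the intent and model names are made up, so the timings only illustrate why concurrent I/O helps.

```python
import asyncio
import time


async def query_model(intent: str, model: str) -> str:
    """Placeholder for a provider call; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.05)
    return f"{model} answered {intent}"


async def run_sequential(intents: list[str], models: list[str]) -> list[str]:
    # One request at a time: total time is the sum of all calls.
    return [await query_model(i, m) for i in intents for m in models]


async def run_parallel(intents: list[str], models: list[str]) -> list[str]:
    # All requests in flight at once; results keep the input order.
    return await asyncio.gather(
        *[query_model(i, m) for i in intents for m in models]
    )


intents = ["best-crm", "best-email-warmup"]
models = ["model-a", "model-b", "model-c"]

start = time.perf_counter()
sequential = asyncio.run(run_sequential(intents, models))
sequential_s = time.perf_counter() - start

start = time.perf_counter()
parallel = asyncio.run(run_parallel(intents, models))
parallel_s = time.perf_counter() - start

print(f"sequential {sequential_s:.2f}s, parallel {parallel_s:.2f}s")
```

In practice, provider rate limits usually call for bounding concurrency, for example with an `asyncio.Semaphore`.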

## Cost Optimization

### Use Cheaper Models

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # $0.15/1M input tokens vs $2.50/1M for gpt-4o
```

### Regex vs LLM Extraction

```yaml
# Fast and cheap (recommended)
use_llm_rank_extraction: false

# Accurate but costly (alternative; set only one value)
# use_llm_rank_extraction: true
```

## Database Performance

### Indexes

SQLite indexes on:

- `timestamp_utc`
- `intent_id`
- `brand`
- `rank_position`

### Vacuum

Periodically compact database:

```bash
sqlite3 output/watcher.db "VACUUM;"
```

## Caching

### Pricing Cache

LLM prices cached for 24 hours to reduce API calls.
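
A time-to-live cache of this kind can be sketched in a few lines. The `TTLCache` class below is a generic illustration, not the tool's pricing cache: whether prices are cached in memory or on disk is not stated here, and `fetch_prices` is a stand-in for the remote lookup.

```python
import time
from typing import Callable


class TTLCache:
    """Caches one fetched value and refetches it once ttl_seconds have passed."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: dict | None = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


fetch_log: list[float] = []
clock = [0.0]  # fake clock so the example does not wait 24 hours


def fetch_prices() -> dict:
    """Stand-in for the remote pricing lookup; records when it was called."""
    fetch_log.append(clock[0])
    return {"example-model": 0.15}


cache = TTLCache(fetch_prices, clock=lambda: clock[0])
cache.get()              # first call fetches
cache.get()              # second call is served from the cache
clock[0] = 24 * 60 * 60  # 24 hours later the entry has expired
cache.get()              # fetches again
print(len(fetch_log))
```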

### Future Caching

Planned:

- Response caching (identical queries)
- Extracted data caching

See [Architecture](../architecture/) for design details.

# Reference

# CLI Reference

Complete command-line interface reference.

## Commands

### `run`

Execute monitoring run.

```bash
llm-answer-watcher run --config PATH [OPTIONS]
```

**Options**:

- `--config PATH` (required): Configuration file
- `--format [human|json|quiet]`: Output format
- `--yes, -y`: Skip prompts
- `--force`: Override budget limits
- `--verbose, -v`: Verbose logging

### `validate`

Validate configuration.

```bash
llm-answer-watcher validate --config PATH [OPTIONS]
```

### `eval`

Run evaluation suite.

```bash
llm-answer-watcher eval --fixtures PATH [OPTIONS]
```

### `prices show`

Display LLM pricing.

```bash
llm-answer-watcher prices show [OPTIONS]
```

See [CLI Commands](../../user-guide/usage/cli-commands/) for detailed usage.

# Configuration Schema

Complete YAML configuration schema reference.

## Root Structure

```yaml
run_settings:  # Required
brands:        # Required
intents:       # Required
```

## `run_settings`

```yaml
run_settings:
  output_dir: string           # Required
  sqlite_db_path: string       # Required
  models: [ModelConfig]        # Required
  use_llm_rank_extraction: bool  # Optional, default: false
  extraction_settings: ExtractionSettings  # Optional
  budget: BudgetConfig         # Optional
  web_search: WebSearchConfig  # Optional
```

## `ModelConfig`

```yaml
provider: string              # Required: openai, anthropic, etc.
model_name: string            # Required
env_api_key: string           # Required
system_prompt: string         # Optional
```

## `brands`

```yaml
brands:
  mine: [string]              # Required
  competitors: [string]       # Required
```

## `intents`

```yaml
intents:
  - id: string                # Required
    prompt: string            # Required
    operations: [Operation]   # Optional
```

See [Configuration Overview](../../user-guide/configuration/overview/).

# Database Schema

SQLite database schema reference.

## Tables

### `schema_version`

```sql
CREATE TABLE schema_version (
    version INTEGER PRIMARY KEY
);
```

### `runs`

```sql
CREATE TABLE runs (
    run_id TEXT PRIMARY KEY,
    timestamp_utc TEXT NOT NULL,
    config_path TEXT,
    total_cost_usd REAL,
    queries_completed INTEGER,
    queries_failed INTEGER
);
```

### `answers_raw`

```sql
CREATE TABLE answers_raw (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,
    model_name TEXT NOT NULL,
    answer_text TEXT NOT NULL,
    tokens_used INTEGER,
    estimated_cost_usd REAL,
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id, model_provider, model_name)
);
```

### `mentions`

```sql
CREATE TABLE mentions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    model_provider TEXT NOT NULL,
    model_name TEXT NOT NULL,
    brand TEXT NOT NULL,
    normalized_name TEXT NOT NULL,
    is_mine BOOLEAN NOT NULL,
    rank_position INTEGER,
    context_snippet TEXT,
    sentiment TEXT,              -- NEW: positive/neutral/negative
    mention_context TEXT,        -- NEW: primary_recommendation, alternative_listing, etc.
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id, model_provider, model_name, normalized_name)
);
```

**New Columns (v0.1.0+)**:

- `sentiment`: Emotional tone - `positive`, `neutral`, `negative`, or `NULL`
- `mention_context`: How brand was mentioned - `primary_recommendation`, `alternative_listing`, `competitor_negative`, `competitor_neutral`, `passing_reference`, or `NULL`

### `intent_classifications`

```sql
CREATE TABLE intent_classifications (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    query_text TEXT NOT NULL,
    query_hash TEXT NOT NULL,        -- SHA256 hash for caching
    intent_type TEXT NOT NULL,       -- transactional, informational, navigational, commercial_investigation
    buyer_stage TEXT NOT NULL,       -- awareness, consideration, decision
    urgency_signal TEXT NOT NULL,    -- high, medium, low
    classification_confidence REAL NOT NULL,  -- 0.0-1.0
    reasoning TEXT,                  -- Explanation of classification
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id)
);
```

**Purpose**: Stores query intent classifications for prioritizing high-value mentions.

**Query Hash**: Normalized SHA256 hash enables caching - same query text always produces same hash, avoiding redundant LLM calls.
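
As a rough illustration of such a hash, the sketch below lowercases the query and collapses whitespace before hashing. These normalization rules and the `query_hash` function name are assumptions; the tool's actual normalization may differ.

```python
import hashlib


def query_hash(query_text: str) -> str:
    """SHA-256 hex digest of the query after lowercasing and collapsing whitespace."""
    normalized = " ".join(query_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


a = query_hash("What are the best CRM tools?")
b = query_hash("  what are the   best CRM tools?  ")
print(a == b, len(a))
```

Queries that differ only in case or spacing map to the same 64-character digest, so a cached classification can be reused for both.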

## Indexes

```sql
-- Original indexes
CREATE INDEX idx_mentions_timestamp ON mentions(timestamp_utc);
CREATE INDEX idx_mentions_brand ON mentions(normalized_name);
CREATE INDEX idx_mentions_intent ON mentions(intent_id);

-- Sentiment/Intent indexes (NEW in v0.1.0+)
CREATE INDEX idx_mentions_sentiment ON mentions(sentiment);
CREATE INDEX idx_mentions_context ON mentions(mention_context);
CREATE INDEX idx_intent_type ON intent_classifications(intent_type);
CREATE INDEX idx_buyer_stage ON intent_classifications(buyer_stage);
CREATE INDEX idx_urgency_signal ON intent_classifications(urgency_signal);
```

## Schema Versioning

The database schema uses versioning for migrations:

```sql
SELECT version FROM schema_version;
-- Returns: 1 (current version)
```

Future schema changes will increment this version and provide migration scripts.

## Example Queries

### Sentiment Analysis

```sql
-- Brand mentions by sentiment
SELECT sentiment, COUNT(*) as count
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY sentiment;
```

### High-Value Intent Filtering

```sql
-- High-intent brand mentions
SELECT m.brand, ic.intent_type, ic.buyer_stage, ic.urgency_signal
FROM mentions m
JOIN intent_classifications ic ON m.intent_id = ic.intent_id AND m.run_id = ic.run_id
WHERE ic.intent_type = 'transactional'
  AND ic.buyer_stage = 'decision'
  AND ic.urgency_signal = 'high'
  AND m.sentiment = 'positive';
```

See [SQLite Database](../../data-analytics/sqlite-database/) for more queries.

# Python API

Using LLM Answer Watcher as a Python library.

## Programmatic Usage

```python
from llm_answer_watcher.config.loader import load_config_from_file
from llm_answer_watcher.llm_runner.runner import run_all

# Load configuration
config = load_config_from_file("config.yaml")

# Run monitoring
result = run_all(config)

print(f"Run ID: {result['run_id']}")
print(f"Cost: ${result['total_cost_usd']:.4f}")
print(f"Brands: {result['brands_detected']}")
```

## Core Modules

### Config Loading

```python
from llm_answer_watcher.config.loader import load_config_from_file
from llm_answer_watcher.config.schema import RuntimeConfig

config: RuntimeConfig = load_config_from_file("config.yaml")
```

### LLM Clients

```python
from llm_answer_watcher.llm_runner.models import build_client

client = build_client(
    provider="openai",
    model_name="gpt-4o-mini",
    api_key=api_key,
    system_prompt=prompt
)

response = client.generate_answer("What are the best tools?")
```

### Extraction

```python
from llm_answer_watcher.extractor.mention_detector import detect_mentions

mentions = detect_mentions(
    text=llm_response,
    brands_mine=["YourBrand"],
    brands_competitors=["CompetitorA"]
)
```

See [Architecture](../../advanced/architecture/) for design details.

# Contributing

# Development Setup

Set up your development environment for contributing.

## Prerequisites

- Python 3.12 or 3.13
- Git
- uv or pip

## Clone and Install

```bash
# Clone repository
git clone https://github.com/nibzard/llm-answer-watcher.git
cd llm-answer-watcher

# Install with uv (recommended)
uv sync --dev

# Or with pip
pip install -e ".[dev]"
```

## Development Tools

### Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=llm_answer_watcher --cov-report=html

# Run specific test
pytest tests/test_config_loader.py::test_load_valid_config
```

### Linting

```bash
# Check code quality
ruff check .

# Auto-fix issues
ruff check . --fix

# Format code
ruff format .
```

### Documentation

```bash
# Build docs
mkdocs build

# Serve docs locally
mkdocs serve
```

## Making Changes

1. Create a branch: `git checkout -b feature/my-feature`
1. Make changes
1. Run tests: `pytest`
1. Run linting: `ruff check .`
1. Commit: `git commit -m "feat: add feature"`
1. Push: `git push origin feature/my-feature`
1. Create Pull Request

See [Code Standards](../code-standards/) for coding guidelines.

# Code Standards

Coding standards and best practices.

## Python Style

### Modern Type Hints (Python 3.12+)

```python
# ✅ Good - use | for unions
def process(data: dict | None = None) -> str | None:
    pass

# ❌ Bad - old style
from typing import Union, Optional
def process(data: Optional[dict] = None) -> Union[str, None]:
    pass
```

### Docstrings

```python
def detect_mentions(text: str, brands: list[str]) -> list[Mention]:
    """
    Detect brand mentions in text.

    Args:
        text: Text to search
        brands: List of brand names

    Returns:
        List of detected mentions
    """
```

### Word Boundaries

```python
import re

# ✅ Good - word boundary matching
pattern = r'\b' + re.escape(brand) + r'\b'

# ❌ Bad - substring matching
if brand.lower() in text.lower():
    ...
```
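
The difference shows up as soon as a brand name is a substring of an ordinary word. In this sketch the `mentions_brand` helper, the case-insensitive flag, and the brand name "Hub" are illustrative assumptions.

```python
import re


def mentions_brand(text: str, brand: str) -> bool:
    """True if the brand appears as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(brand) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


text = "Many teams build their own hubs before adopting a CRM."

substring_hit = "hub" in text.lower()        # fires on "hubs"
boundary_hit = mentions_brand(text, "Hub")   # no whole-word match
real_hit = mentions_brand("Try Hub today.", "Hub")

print(substring_hit, boundary_hit, real_hit)
```

One caveat: `\b` only matches next to a word character, so a brand that starts or ends with punctuation (such as `C++`) needs a different boundary rule.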

## Testing

### Coverage Requirements

- Core modules: 80%+ coverage
- Critical paths: 100% coverage

### Test Structure

```python
def test_feature():
    # Arrange
    config = create_test_config()

    # Act
    result = run_feature(config)

    # Assert
    assert result.status == "success"
```

## Commits

Use Conventional Commits:

```text
feat: add new provider
fix: correct rank extraction
docs: update README
test: add coverage for extractor
chore: update dependencies
```

See [Testing](../testing/) for testing guidelines.

# Testing Guidelines

Writing and running tests.

## Test Structure

```text
tests/
├── test_config_loader.py
├── test_openai_client.py
├── test_mention_detector.py
├── test_rank_extractor.py
└── ...
```

## Writing Tests

### Unit Tests

```python
def test_brand_detection():
    text = "Use HubSpot for CRM"
    brands = ["HubSpot", "Salesforce"]

    mentions = detect_mentions(text, brands)

    assert len(mentions) == 1
    assert mentions[0].brand == "HubSpot"
```

### Mocking LLM APIs

```python
def test_openai_client(httpx_mock):
    httpx_mock.add_response(
        method="POST",
        url="https://api.openai.com/v1/chat/completions",
        json={"choices": [{"message": {"content": "..."}}]}
    )

    client = OpenAIClient(...)
    response = client.generate_answer("test")

    assert response.provider == "openai"
```

### Time Mocking

```python
from freezegun import freeze_time

@freeze_time("2025-11-01 08:00:00")
def test_timestamp():
    run_id = run_id_from_timestamp()
    assert run_id == "2025-11-01T08-00-00Z"
```

## Running Tests

```bash
# All tests
pytest

# With coverage
pytest --cov=llm_answer_watcher

# Specific test
pytest tests/test_config_loader.py

# Verbose
pytest -v

# Skip slow tests
pytest -m "not slow"
```

## Coverage Requirements

- **Core modules**: 80%+
- **Critical paths**: 100%

```bash
pytest --cov=llm_answer_watcher --cov-report=html
open htmlcov/index.html
```

See [Code Standards](../code-standards/) for style guidelines.

# Testing Utilities

LLM Answer Watcher provides specialized testing utilities to help you write reliable tests without making real API calls or dealing with brittle HTTP mocking.

## Overview

The testing utilities follow patterns inspired by modern LLM abstraction layers:

- **MockLLMClient**: Deterministic responses for testing extraction logic
- **ChaosLLMClient**: Resilience testing with controlled failure injection
- **Protocol-based**: Both implement the `LLMClient` protocol

## MockLLMClient

### Basic Usage

The `MockLLMClient` provides deterministic responses without making real API calls:

```python
from llm_answer_watcher.llm_runner.mock_client import MockLLMClient

# Create client with configured responses
client = MockLLMClient(
    responses={
        "What are the best CRM tools?": "HubSpot and Salesforce are leading CRM platforms.",
        "best email warmup": "Warmly, HubSpot, and Instantly are top choices."
    },
    default_response="No specific answer available.",
    tokens_per_response=300,
    cost_per_response=0.001
)

# Use in tests
response = await client.generate_answer("What are the best CRM tools?")
assert response.answer_text == "HubSpot and Salesforce are leading CRM platforms."
assert response.tokens_used == 300
assert response.cost_usd == 0.001
```

### Configuration Options

```python
MockLLMClient(
    responses={"prompt": "answer"},  # Dict mapping prompts to answers
    default_response="Default answer",  # Fallback when prompt not found
    model_name="mock-gpt-4",  # Model name in responses
    provider="mock-openai",  # Provider name in responses
    tokens_per_response=100,  # Token count to report
    cost_per_response=0.0,  # Cost to report
    streaming_chunk_size=None,  # Enable streaming (see below)
    streaming_delay_ms=50  # Delay between chunks
)
```

### Integration Testing

MockLLMClient works seamlessly with the extraction pipeline:

```python
from llm_answer_watcher.config.schema import Brands
from llm_answer_watcher.extractor.parser import parse_answer

# Create mock client
client = MockLLMClient(
    responses={"best CRM": "1. HubSpot\n2. Salesforce\n3. Warmly"}
)

# Generate answer
response = await client.generate_answer("best CRM")

# Test extraction
brands = Brands(mine=["Warmly"], competitors=["HubSpot", "Salesforce"])
extraction = parse_answer(response.answer_text, brands)

assert extraction.appeared_mine is True
assert len(extraction.my_mentions) == 1
assert len(extraction.competitor_mentions) == 2
```

### Streaming Support

MockLLMClient supports optional streaming for testing streaming workflows:

```python
chunks = []

client = MockLLMClient(
    responses={"test": "Hello world from LLM"},
    streaming_chunk_size=5,  # Stream in 5-char chunks
    streaming_delay_ms=10  # 10ms delay between chunks
)

response = await client.generate_answer(
    "test",
    on_chunk=lambda chunk: chunks.append(chunk)
)

# Chunks received during streaming
assert chunks == ['Hello', ' worl', 'd fro', 'm LLM']

# Full response still returned
assert response.answer_text == "Hello world from LLM"
```

## ChaosLLMClient

### Basic Usage

The `ChaosLLMClient` wraps any `LLMClient` and probabilistically injects failures:

```python
from llm_answer_watcher.llm_runner.chaos_client import ChaosLLMClient

# Wrap a base client (e.g., MockLLMClient)
base = MockLLMClient(responses={"test": "answer"})

chaos = ChaosLLMClient(
    base_client=base,
    success_rate=0.7,  # 70% success, 30% failure
    rate_limit_prob=0.1,  # 10% chance of 429 error
    server_error_prob=0.1,  # 10% chance of 5xx error
    timeout_prob=0.05,  # 5% chance of timeout
    auth_error_prob=0.05,  # 5% chance of 401 error
    seed=42  # Optional: reproducible chaos
)

# May succeed or fail
try:
    response = await chaos.generate_answer("test")
    print("Success!")
except RuntimeError as e:
    print(f"Chaos injected: {e}")
```

### Factory Function

Use `create_chaos_client()` for balanced error distribution:

```python
from llm_answer_watcher.llm_runner.chaos_client import create_chaos_client

chaos = create_chaos_client(
    base_client=base,
    failure_rate=0.3,  # 30% overall failures
    seed=42
)

# Failures distributed evenly:
# - 7.5% rate limit (429)
# - 7.5% server errors (500/502/503)
# - 7.5% timeout
# - 7.5% auth error (401)
```

### Testing Retry Logic

Validate your retry logic handles transient failures:

```python
# High failure rate to force retries
chaos = ChaosLLMClient(
    base_client=base,
    success_rate=0.3,  # 70% failure rate
    seed=42
)

# Retry loop
max_attempts = 3
for attempt in range(max_attempts):
    try:
        response = await chaos.generate_answer("test")
        break  # Success!
    except RuntimeError as e:
        if attempt == max_attempts - 1:
            raise  # Give up after max attempts
        # Otherwise retry
```

### Reproducible Chaos

Use `seed` for deterministic test runs:

```python
# Two clients with the same seed produce identical behavior
chaos1 = ChaosLLMClient(base_client=base, success_rate=0.5, seed=123)
chaos2 = ChaosLLMClient(base_client=base, success_rate=0.5, seed=123)

async def outcomes(client, n=10):
    """Record True for each success and False for each injected failure."""
    results = []
    for _ in range(n):
        try:
            await client.generate_answer("test")
            results.append(True)
        except RuntimeError:
            results.append(False)
    return results

# Same sequence of successes/failures
assert await outcomes(chaos1) == await outcomes(chaos2)
```

## Error Types Injected

ChaosLLMClient injects realistic errors:

| Error Type   | Status Code | Description        | Retryable? |
| ------------ | ----------- | ------------------ | ---------- |
| Rate Limit   | 429         | Too many requests  | Yes        |
| Server Error | 500/502/503 | Server-side issues | Yes        |
| Timeout      | -           | Network timeout    | Yes        |
| Auth Error   | 401         | Invalid API key    | No         |

## Best Practices

### 1. Use MockLLMClient for Logic Tests

Test extraction, parsing, and business logic:

```python
def test_brand_detection():
    client = MockLLMClient(
        responses={"test": "Warmly and HubSpot are great tools."}
    )
    # Test extraction logic
```

### 2. Use ChaosLLMClient for Resilience Tests

Test error handling and retry logic:

```python
def test_retry_on_rate_limit():
    chaos = ChaosLLMClient(
        base_client=base,
        rate_limit_prob=1.0  # Always 429
    )
    # Test retry behavior
```

### 3. Avoid HTTP Mocking

Instead of:

```python
# ❌ Brittle HTTP mocking
httpx_mock.add_response(
    url="https://api.openai.com/...",
    json={"choices": [{"message": {"content": "..."}}]}
)
```

Use:

```python
# ✅ Clean protocol-based mocking
client = MockLLMClient(responses={"prompt": "answer"})
```

### 4. Test Statistical Distribution

For chaos testing, validate statistical properties:

```python
successes = 0
failures = 0
trials = 1000

chaos = ChaosLLMClient(base_client=base, success_rate=0.7, seed=42)

for _ in range(trials):
    try:
        await chaos.generate_answer("test")
        successes += 1
    except RuntimeError:
        failures += 1

success_rate = successes / trials
assert 0.65 <= success_rate <= 0.75  # Allow ±5 percentage points
```

## Migration from HTTP Mocking

### Before (pytest-httpx)

```python
# Async test: needs an async-capable pytest plugin (for example pytest-asyncio)
async def test_openai_client(httpx_mock):
    httpx_mock.add_response(
        method="POST",
        url="https://api.openai.com/v1/chat/completions",
        json={
            "choices": [{"message": {"content": "test answer"}}],
            "usage": {"total_tokens": 100}
        }
    )

    client = OpenAIClient(...)
    response = await client.generate_answer("test")
    assert response.answer_text == "test answer"
```

### After (MockLLMClient)

```python
async def test_extraction_pipeline():
    client = MockLLMClient(responses={"test": "test answer"})

    response = await client.generate_answer("test")
    assert response.answer_text == "test answer"

    # Now test the entire pipeline
    extraction = parse_answer(response.answer_text, brands)
    # ... test extraction logic
```

## See Also

- [Development Setup](../development-setup/) - Setting up your dev environment
- [Testing Guide](../testing/) - Overall testing strategy
- [Code Standards](../code-standards/) - Code quality requirements

# Documentation Guidelines

Contributing to documentation.

## Documentation Structure

```text
docs/
├── index.md
├── getting-started/
├── user-guide/
├── providers/
├── examples/
├── data-analytics/
├── evaluation/
├── advanced/
├── reference/
└── contributing/
```

## Writing Guidelines

### Style

- Use clear, concise language
- Write in active voice
- Include code examples
- Add links to related pages

### Formatting

```markdown
# Page Title

Brief introduction paragraph.

## Section Heading

Content with examples:

\`\`\`python
# Code example
config = load_config("config.yaml")
\`\`\`

### Subsection

More detailed content.
```

### Material for MkDocs Features

```markdown
!!! tip "Pro Tip"
    Use this feature for better results.

!!! warning
    This operation costs money.

=== "Python"
    \`\`\`python
    import module
    \`\`\`

=== "Bash"
    \`\`\`bash
    command --flag
    \`\`\`
```

## Building Docs

```bash
# Install dependencies
uv sync --dev

# Build docs
mkdocs build

# Serve locally
mkdocs serve

# Open browser to http://localhost:8000
```

## Previewing Changes

```bash
mkdocs serve --watch docs/
```

See [Development Setup](../development-setup/) for environment setup.

# Special Optional

# LLM Answer Watcher

**Monitor how Large Language Models talk about your brand versus competitors in buyer-intent queries**

[Get Started](getting-started/quick-start/) [View on GitHub](https://github.com/nibzard/llm-answer-watcher)

______________________________________________________________________

## What is LLM Answer Watcher?

LLM Answer Watcher is a production-ready CLI tool that helps you understand how AI models like ChatGPT, Claude, and others represent your brand when users ask buyer-intent questions.

As AI-powered search becomes mainstream, monitoring your brand's presence in LLM responses is crucial for:

- **Brand Visibility**: Track if your product appears in AI recommendations
- **Competitive Intelligence**: See which competitors are mentioned alongside you
- **Market Positioning**: Understand your ranking compared to alternatives
- **Trend Analysis**: Historical data shows how your presence changes over time

## Demo

**What the demo shows:**

- Configuration validation with brand and competitor definitions
- Real-time progress bars showing query execution across LLM providers
- Brand mention extraction and ranking from AI responses
- Cost tracking and results summary

**Try it yourself:** Run `llm-answer-watcher demo` for an interactive demo (no API keys needed!)

______________________________________________________________________

## Key Features

### 🔍 Brand Mention Detection

Advanced word-boundary regex matching prevents false positives while accurately identifying your brand and competitors in LLM responses.

### 📊 Historical Tracking

All responses are stored in a local SQLite database, enabling powerful trend analysis and long-term visibility tracking.

### 🤖 Multi-Provider Support

Works with **6+ LLM providers**: OpenAI, Anthropic, Mistral, X.AI Grok, Google Gemini, and Perplexity, with an extensible architecture for adding more.

### 🌐 Browser Runners (BETA - New in v0.2.0)

Interact with web-based LLM interfaces (ChatGPT, Perplexity) using headless browser automation via Steel API. Captures true user experience with screenshots and HTML snapshots.

### ⚡ Async Parallelization (New in v0.2.0)

3-4x faster performance with async/await parallel query execution across multiple models and providers.

### 📈 Intelligent Rank Extraction

Automatically detects where your brand appears in ranked lists using pattern-based extraction and optional LLM-assisted ranking.

### 🎭 Sentiment Analysis & Intent Classification

- **Sentiment Analysis**: Analyze the tone (positive/neutral/negative) and context of each brand mention
- **Intent Classification**: Determine user intent type, buyer journey stage, and urgency signals
- **Prioritization**: Focus on high-value queries with ready-to-buy intent
- **ROI Tracking**: Understand which mentions drive real business value

### 💰 Dynamic Pricing & Budget Protection

- Real-time pricing from [llm-prices.com](https://www.llm-prices.com)
- Pre-run cost estimation
- Configurable spending limits
- Accurate web search cost calculation

### 🎯 Dual-Mode CLI

- **Human Mode**: Beautiful Rich output with spinners, colors, and formatted tables
- **Agent Mode**: Structured JSON output for AI agent automation
- **Quiet Mode**: Minimal tab-separated output for scripts

### 📋 Professional HTML Reports

Auto-generated reports with:

- Brand mention visualizations
- Rank distribution charts
- Historical trends
- Raw response inspection

### 🔒 Local-First & Secure

- All data stored locally on your machine
- BYOK (Bring Your Own Keys) - use your own API keys
- No external dependencies except LLM APIs
- Built-in SQL injection and XSS protection

## Quick Example

```bash
# Set your API keys
export OPENAI_API_KEY=sk-your-key-here
export ANTHROPIC_API_KEY=sk-ant-your-key-here

# Run with a config file
llm-answer-watcher run --config watcher.config.yaml
```

**Output:**

```text
🔍 Running LLM Answer Watcher...
├── Query: "What are the best email warmup tools?"
├── Models: OpenAI gpt-4o-mini, Anthropic claude-3-5-haiku
├── Brands: 2 monitored, 5 competitors
└── Output: ./output/2025-11-01T14-30-00Z/

✅ Queries completed: 6/6
💰 Total cost: $0.0142
📊 Report: ./output/2025-11-01T14-30-00Z/report.html
```

## Use Cases

### 1. Brand Monitoring

Track your product's visibility in AI-powered search results across multiple LLM providers.

### 2. Competitive Analysis

See which competitors appear most frequently and in what context they're recommended.

### 3. SEO for AI Era

Optimize your brand presence in LLM training data and real-time retrieval systems.

### 4. Market Research

Understand how AI models categorize and compare products in your space.

### 5. Product Development

Identify gaps where competitors are mentioned but your product isn't.

### 6. Sales Intelligence

Know what alternatives prospects might be comparing you against.

## Architecture Highlights

LLM Answer Watcher is built with production-ready patterns:

- **Domain-Driven Design**: Clear separation between config, LLM clients, extraction, storage, and reporting
- **Provider Abstraction**: Easy to add new LLM providers with unified interface
- **Plugin System**: Extensible runner architecture supporting both API and browser-based runners
- **Async/Await**: Parallel query execution for 3-4x performance improvement (v0.2.0+)
- **Retry Logic**: Exponential backoff with tenacity for resilient API calls
- **Type Safety**: Full Pydantic validation and modern Python 3.12+ type hints
- **Testability**: 750+ test cases with 100% coverage on critical paths
- **API-First Contract**: Internal structure designed to become HTTP API for Cloud product
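
To make the retry bullet concrete, here is a standard-library sketch of exponential backoff. The project uses tenacity for this; the hand-rolled `retry_with_backoff` helper below is a hypothetical stand-in that shows the same idea without a third-party dependency.

```python
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[Exception], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient errors with exponentially growing waits."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # 0.5s, 1s, 2s, ... capped at max_delay, plus up to 10% jitter
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


calls = {"count": 0}


def flaky_call() -> str:
    """Fails twice with a simulated transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


waits: list[float] = []  # record waits instead of actually sleeping
result = retry_with_backoff(flaky_call, sleep=waits.append)
print(result, len(waits))  # → ok 2
```

Only errors listed in `retry_on` are retried, so configuration mistakes fail immediately instead of being retried.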

## Documentation Structure

This documentation is organized progressively from beginner to advanced:

### [Getting Started](getting-started/quick-start/)

Everything you need to get up and running in 5 minutes.

### [User Guide](user-guide/configuration/overview/)

Comprehensive guides for configuration, usage, and features.

### [Supported Providers](providers/overview/)

Detailed information about each LLM provider integration.

### [Examples](examples/basic-monitoring/)

Real-world examples and use cases with complete configurations.

### [Data & Analytics](data-analytics/output-structure/)

Understanding output structure and running SQL analytics.

### [Evaluation Framework](evaluation/overview/)

Quality control and accuracy testing for extraction logic.

### [Advanced Topics](advanced/architecture/)

Deep dives into architecture, security, and extending the system.

### [Reference](reference/cli-reference/)

Complete CLI command reference and configuration schemas.

## For LLMs & AI Agents

This documentation is available in LLM-optimized formats following the [llmstxt.org](https://llmstxt.org) standard:

- **[llms.txt](https://nibzard.github.io/llm-answer-watcher/llms.txt)** - Concise navigation index (~800 tokens)
- **[llms-full.txt](https://nibzard.github.io/llm-answer-watcher/llms-full.txt)** - Complete documentation (~59K tokens)

These files are auto-generated on every documentation build and provide structured, markdown-formatted content optimized for LLM context injection.

## Philosophy

LLM Answer Watcher is built on these principles:

- **Boring is Good**: Simple, readable code over clever abstractions
- **Local-First**: Your data stays on your machine
- **Production-Ready**: Proper error handling, retry logic, and security from day one
- **Data is the Moat**: Historical SQLite tracking provides long-term value
- **Developer Experience**: Both human-friendly and AI agent-ready interfaces

## Next Steps

- **Quick Start**: Install and run your first monitoring job in minutes. [Get Started →](getting-started/quick-start/)
- **Configuration**: Learn how to configure models, brands, and intents. [Configuration Guide →](user-guide/configuration/overview/)
- **Examples**: See real-world examples for common use cases. [View Examples →](examples/basic-monitoring/)
- **Analytics**: Query your data with SQL for powerful insights. [Data & Analytics →](data-analytics/sqlite-database/)

## Community & Support

- **GitHub Issues**: [Report bugs or request features](https://github.com/nibzard/llm-answer-watcher/issues)
- **Contributing**: [Read our contributing guide](contributing/development-setup/)
- **License**: MIT - see [LICENSE](https://github.com/nibzard/llm-answer-watcher/blob/main/LICENSE)

______________________________________________________________________

Built with ❤️ by [Nikola Balić](https://github.com/nibzard)

# Frequently Asked Questions

## General

### What is LLM Answer Watcher?

LLM Answer Watcher is a CLI tool that monitors how large language models (such as ChatGPT and Claude) talk about your brand versus competitors when answering buyer-intent queries.

### Why should I use this?

As AI-powered search becomes mainstream (ChatGPT, Perplexity, Google AI Overview), understanding your brand's presence in LLM responses is crucial for:

- Brand visibility tracking
- Competitive intelligence
- SEO for the AI era
- Market positioning

### Is it free?

The tool is **open source** (MIT license) and free to use. However, you pay for:

- LLM API calls (typically $0.001-$0.01 per query)
- Your own compute resources

### How much does it cost to run?

**Example costs per run**:

- 3 intents × 1 model (gpt-4o-mini): ~$0.006
- 5 intents × 2 models: ~$0.020
- 10 intents × 5 models: ~$0.150
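
The figures above are consistent with simple multiplication, as the sketch below shows. It assumes one query per intent and model pair, and the per-query prices are back-calculated from the examples, not taken from a price list; extra calls such as LLM-based extraction or web search would add to the total.

```python
def estimate_run_cost(n_intents: int, n_models: int, cost_per_query_usd: float) -> float:
    """Estimate run cost, assuming one query per (intent, model) pair."""
    return n_intents * n_models * cost_per_query_usd


# (intents, models, assumed average cost per query in USD)
scenarios = [(3, 1, 0.002), (5, 2, 0.002), (10, 5, 0.003)]
for n_intents, n_models, per_query in scenarios:
    total = estimate_run_cost(n_intents, n_models, per_query)
    print(f"{n_intents} intents x {n_models} models: ~${total:.3f}")
```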

See [Cost Management](../user-guide/features/cost-management/) for details.

## Installation & Setup

### What Python version do I need?

**Python 3.12 or 3.13** is required. The tool uses modern Python features.

### Can I use pip instead of uv?

Yes! Both work:

```bash
# With uv (recommended - faster)
uv sync

# With pip (traditional)
pip install -e .
```

### Which LLM providers are supported?

- OpenAI (GPT models)
- Anthropic (Claude models)
- Mistral AI
- X.AI (Grok models)
- Google (Gemini models)
- Perplexity

See [Providers](../providers/overview/) for complete list.

### Do I need API keys for all providers?

No! You only need API keys for providers you want to use. Start with just OpenAI if you want.

## Configuration

### How do I create a configuration file?

See [Basic Configuration](../getting-started/basic-configuration/). Minimum config:

```yaml
run_settings:
  output_dir: "./output"
  models:
    - provider: "openai"
      model_name: "gpt-4o-mini"
      env_api_key: "OPENAI_API_KEY"

brands:
  mine: ["YourBrand"]
  competitors: ["CompetitorA"]

intents:
  - id: "best-tools"
    prompt: "What are the best tools?"
```

### How many brands should I track?

**Your brands**: Include all variations (e.g., "HubSpot", "HubSpot CRM", "hubspot.com")

**Competitors**: Start with top 5-10 direct competitors. You can always add more.

### What makes a good intent prompt?

Good prompts are:

- **Natural**: How real users ask
- **Buyer-intent**: Imply evaluation/purchase
- **Specific**: Target a use case

Examples:

- ✅ "What are the best email warmup tools for startups?"
- ❌ "Tell me about email"

### Can I use the same config for multiple runs?

Yes! Configs are reusable. All data is timestamped and stored separately.

## Usage

### Why aren't my brands being detected?

Common causes:

1. **Name mismatch**: LLM used "HubSpot CRM" but you only configured "HubSpot"
   - **Solution**: Add all brand variations
1. **Brand not mentioned**: LLM didn't include your brand
   - **Solution**: This is valuable data! Your brand isn't top-of-mind for that query
1. **Word boundary issue**: "Hub" won't match in "GitHub"
   - **Solution**: This is intentional to prevent false positives

### How do I track historical trends?

All data is stored in SQLite at `./output/watcher.db`:

```sql
SELECT DATE(timestamp_utc), AVG(rank_position)
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY DATE(timestamp_utc);
```
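
The same query can be run from Python with the standard `sqlite3` module. The sketch below uses an in-memory database with a cut-down `mentions` table holding only the three columns the query touches; the real schema has more columns, and the sample rows and timestamp format are made up for illustration.

```python
import sqlite3

# In-memory stand-in for ./output/watcher.db (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mentions (timestamp_utc TEXT, normalized_name TEXT, rank_position INTEGER)"
)
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?, ?)",
    [
        ("2025-11-01T14:30:00Z", "yourbrand", 2),
        ("2025-11-01T18:00:00Z", "yourbrand", 4),
        ("2025-11-02T14:30:00Z", "yourbrand", 1),
        ("2025-11-02T14:30:00Z", "competitora", 3),
    ],
)


def daily_average_rank(db: sqlite3.Connection, brand: str) -> list[tuple[str, float]]:
    """Average rank per day for one brand, oldest day first."""
    query = """
        SELECT DATE(timestamp_utc), AVG(rank_position)
        FROM mentions
        WHERE normalized_name = ?
        GROUP BY DATE(timestamp_utc)
        ORDER BY DATE(timestamp_utc)
    """
    # The brand is passed as a bound parameter, not formatted into the SQL.
    return db.execute(query, (brand,)).fetchall()


print(daily_average_rank(conn, "yourbrand"))
# → [('2025-11-01', 3.0), ('2025-11-02', 1.0)]
```

To query real data, replace `":memory:"` with the path to your database file and skip the table setup.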

See [Data Analytics](../data-analytics/sqlite-database/).

### Can I run this in CI/CD?

Yes! Use `--yes --format json` for automation:

```bash
llm-answer-watcher run --config config.yaml --yes --format json
```

See [Automation Guide](../user-guide/usage/automation/).

### What are the exit codes?

- `0`: Success
- `1`: Configuration error
- `2`: Database error
- `3`: Partial failure (acceptable)
- `4`: Complete failure
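
In an automation script, these codes can decide whether the pipeline should fail. The wrapper below is a sketch: the command line is copied from the CI/CD example above, while the helper names and the policy of tolerating code `3` are illustrative choices.

```python
import subprocess

EXIT_MEANINGS = {
    0: "success",
    1: "configuration error",
    2: "database error",
    3: "partial failure (acceptable)",
    4: "complete failure",
}


def should_fail_pipeline(exit_code: int) -> bool:
    """Treat success and partial failure as acceptable; fail on anything else."""
    return exit_code not in (0, 3)


def run_watcher(config_path: str) -> int:
    """Run the CLI non-interactively and return its exit code."""
    cmd = ["llm-answer-watcher", "run", "--config", config_path, "--yes", "--format", "json"]
    completed = subprocess.run(cmd, capture_output=True, text=True)
    meaning = EXIT_MEANINGS.get(completed.returncode, "unknown exit code")
    print(f"llm-answer-watcher exited with {completed.returncode}: {meaning}")
    return completed.returncode


# Policy check only; run_watcher() needs the CLI installed, so it is not called here.
for code in sorted(EXIT_MEANINGS):
    print(code, EXIT_MEANINGS[code], "-> fail" if should_fail_pipeline(code) else "-> ok")
```

A CI job could then call `sys.exit(1)` when `should_fail_pipeline(run_watcher("config.yaml"))` is true.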

See [Exit Codes](../user-guide/usage/exit-codes/).

## Features

### What's the difference between regex and LLM extraction?

**Regex extraction** (default):

- Fast and cheap
- Pattern-based matching
- 90%+ accuracy

**LLM extraction** (`use_llm_rank_extraction: true`):

- More accurate for complex cases
- Costs extra (additional LLM calls)
- 95%+ accuracy

Start with regex. Only use LLM if needed.

### What is function calling?

Function calling uses the LLM's built-in structured output feature for extraction. It is more accurate than regex.

Enable it:

```yaml
extraction_settings:
  method: "function_calling"
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
```

See [Function Calling](../user-guide/features/function-calling/).

### How do budget controls work?

Set spending limits:

```yaml
budget:
  enabled: true
  max_per_run_usd: 1.00
  max_per_intent_usd: 0.10
```

The tool checks the estimated cost **before running** and aborts if it exceeds either limit.
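
A pre-run check of this kind might look like the sketch below. The `Budget` fields mirror the YAML keys above, but `BudgetExceededError`, `check_budget`, and the per-intent estimate dictionary are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    enabled: bool = True
    max_per_run_usd: float = 1.00
    max_per_intent_usd: float = 0.10


class BudgetExceededError(Exception):
    """Raised before any query is sent when an estimate exceeds a limit."""


def check_budget(budget: Budget, estimated_cost_by_intent: dict[str, float]) -> float:
    """Validate estimates against the budget and return the estimated run total."""
    total = sum(estimated_cost_by_intent.values())
    if not budget.enabled:
        return total
    for intent_id, cost in estimated_cost_by_intent.items():
        if cost > budget.max_per_intent_usd:
            raise BudgetExceededError(
                f"intent {intent_id!r}: ${cost:.4f} exceeds ${budget.max_per_intent_usd:.2f}"
            )
    if total > budget.max_per_run_usd:
        raise BudgetExceededError(
            f"run total ${total:.4f} exceeds ${budget.max_per_run_usd:.2f}"
        )
    return total


print(check_budget(Budget(), {"best-tools": 0.02, "alternatives": 0.03}))  # → 0.05
```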

See [Budget Controls](../user-guide/configuration/budget/).

### Can I use web search?

Yes, but it increases costs significantly ($10-$25 per 1,000 calls):

```yaml
web_search:
  enabled: true
  max_results: 10
```

See [Web Search](../user-guide/configuration/web-search/).

## Data & Privacy

### Where is my data stored?

**Locally on your machine**:

- SQLite database: `./output/watcher.db`
- JSON files: `./output/YYYY-MM-DDTHH-MM-SSZ/`
- HTML reports: `./output/YYYY-MM-DDTHH-MM-SSZ/report.html`

**No data leaves your machine** except LLM API calls.

### Is my data sent anywhere?

Only to configured LLM providers (OpenAI, Anthropic, etc.) for query processing. We don't collect any data.

### Are API keys secure?

API keys are:

- Loaded from environment variables
- Never logged or written to disk
- Never sent anywhere except the respective LLM provider
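
The config stores only the name of the environment variable (`env_api_key`), never the key itself. A lookup along those lines could be sketched as follows; `resolve_api_key` and `MissingApiKeyError` are hypothetical names, and `EXAMPLE_API_KEY` is a placeholder variable set only for the demonstration.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised when the configured environment variable is unset or empty."""


def resolve_api_key(env_var_name: str) -> str:
    """Read an API key from the environment without logging or storing it."""
    key = os.environ.get(env_var_name, "").strip()
    if not key:
        # The message names the variable but never includes a key value.
        raise MissingApiKeyError(f"API key not found: set {env_var_name} in your environment")
    return key


os.environ["EXAMPLE_API_KEY"] = "sk-example"  # placeholder for the demo
print(len(resolve_api_key("EXAMPLE_API_KEY")))  # → 10
```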

See [Security](../advanced/security/).

### Can I delete old data?

Yes! Simply delete directories or database records:

```bash
# Delete run directories not modified in the last 90 days
find output/ -mindepth 1 -maxdepth 1 -type d -name "20*" -mtime +90 -exec rm -rf {} +
```

## Troubleshooting

### "Configuration error: API key not found"

**Solution**:

```bash
# Check if key is set
echo $OPENAI_API_KEY

# If empty, export it
export OPENAI_API_KEY=sk-your-key-here
```

### "Rate limit exceeded"

**Solution**: LLM provider rate limit hit. Options:

1. Wait and retry
1. Reduce number of queries
1. Use slower model tiers
1. Upgrade API plan

### "No brands detected"

**Causes**:

1. Brand not mentioned by LLM
1. Brand name mismatch (add aliases)
1. Case sensitivity (matching should work regardless of case; if it doesn't, file a bug)

### "Database locked"

**Solution**: Another process is using the database:

```bash
# Find process
lsof output/watcher.db

# Stop it if needed (try a plain kill before resorting to kill -9)
kill <PID>
```

### Build/Import Errors

**Solution**:

```bash
# Reinstall
pip install -e .

# Check Python version
python --version  # Should be 3.12+
```

## Advanced

### Can I extend it with new providers?

Yes! See [Extending Providers](../advanced/extending-providers/).

### Can I customize system prompts?

Yes! See [Custom System Prompts](../advanced/custom-system-prompts/).

### Is there a Python API?

Yes! See [Python API Reference](../reference/python-api/).

### Can I contribute?

Absolutely! See [Contributing Guide](../contributing/development-setup/).

## Still Have Questions?

- **GitHub Issues**: [Report bugs or ask questions](https://github.com/nibzard/llm-answer-watcher/issues)
- **Documentation**: Browse this site
- **Examples**: Check `examples/` directory in the repository

# Changelog

All notable changes to LLM Answer Watcher will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Planned

- Additional browser runners (Claude, Gemini web UIs)
- Enhanced cost tracking for browser runners
- DeepEval integration for quality metrics
- Trends command for historical analysis

## [0.2.0] - 2025-11-08

### Added - Major Features

- **🌐 Browser Runners (BETA)**: Steel API integration for web-based LLM interfaces
  - ChatGPT web UI runner with session management
  - Perplexity web UI runner with citation extraction
  - Screenshot capture and HTML snapshot support
  - Session reuse for cost optimization
  - Plugin system for extensible browser automation
  - See [Browser Runners Guide](../BROWSER_RUNNERS/) for details
- **⚡ Async/Await Parallelization**: 3-4x performance improvement
  - Parallel query execution across models
  - Async progress callbacks
  - RuntimeWarning fixes for async operations
- **🔍 Google Search Grounding**: Enhanced Gemini model support
  - Google Search grounding for Gemini models
  - Accurate web search cost calculation
  - Grounded responses with citations
- **🎯 Post-Intent Operations**: Dynamic workflow support
  - Configurable operations to run after each intent
  - Operation models with validation
  - Config filename tracking in reports
  - Model capability detection
- **📊 Advanced Analysis Features**:
  - **Sentiment Analysis**: Analyze tone (positive/neutral/negative) and context of each brand mention
  - **Intent Classification**: Classify user queries by intent type, buyer journey stage, and urgency signals
    - Intent types: transactional, informational, navigational, commercial_investigation
    - Buyer stages: awareness, consideration, decision
    - Urgency signals: high, medium, low
    - Confidence scoring and reasoning explanations
  - Brand visibility score in reports
  - HTML report filtering and web search badges
- **📚 Documentation Expansion**:
  - Comprehensive MkDocs documentation with Material theme (60+ pages)
  - Browser runners guide with Steel integration
  - Google Search grounding documentation
  - 44 example configurations across 8 directories

### Added - Database & Storage

- New database tables and columns for sentiment and intent data
  - `mentions` table: `sentiment` and `mention_context` columns
  - `intent_classifications` table with query hash caching
  - 5 new indexes for filtering by sentiment, context, intent type, buyer stage, and urgency
- SQLite schema version 5 (migration support included)

### Added - Configuration

- Configuration options: `enable_sentiment_analysis` and `enable_intent_classification` (both default true)
- Runner plugin configuration system
- Browser runner specific settings (Steel API, screenshots, sessions)

### Changed

- **Breaking**: Configuration format updated to support runner plugins
- Improved test coverage to 100% for core modules
- Enhanced error messages for better debugging
- Function calling extraction schema expanded with sentiment/context fields
- Correct Responses API format with required type field
- Improved validation, error handling, and config validation

### Fixed

- Database schema mismatches and exception handling in CLI
- Rank display in HTML reports (shows actual positions not match positions)
- GPT-4.1 model support in OpenAI client
- Code review findings (validation, error handling, config)
- RuntimeWarnings for async operations
- Indentation in runner loop to process all models

### Cost Impact

- Intent classification: ~$0.00012 per query (one-time per unique query, cached)
- Sentiment extraction: ~33% increase per extraction call (integrated into function calling)
- Browser runners: $0.10-0.30/hour via Steel (not yet tracked in cost estimates)

### Known Limitations (v0.2.0)

- Browser runner cost tracking returns $0.00 (placeholder - actual Steel costs not calculated)
- Browser runners are BETA quality (added Nov 8, 2025)
- CSS selectors for browser runners may break if web UIs change
- No authentication handling documented for ChatGPT login
- Response completion detection is heuristic-based

## [0.1.0] - 2025-11-05

### Added

- Initial release of LLM Answer Watcher
- Multi-provider support: OpenAI, Anthropic, Mistral, X.AI Grok, Google Gemini, Perplexity
- Brand mention detection with word-boundary matching
- Rank extraction (pattern-based and LLM-assisted)
- SQLite database for historical tracking
- HTML report generation with Jinja2
- Dual-mode CLI (human-friendly Rich output, structured JSON for automation)
- Budget controls and cost estimation
- Dynamic pricing from llm-prices.com with 24-hour caching
- Web search cost calculation for OpenAI models
- Retry logic with exponential backoff
- Evaluation framework for extraction accuracy
- Configuration validation with Pydantic
- Exit codes for automation (0-4)
- Example configurations
- Comprehensive test suite (750+ tests)
- GitHub Actions CI/CD pipeline

### Core Modules

- `config/`: YAML loading and Pydantic validation
- `llm_runner/`: Multi-provider LLM client abstraction
- `extractor/`: Brand mention detection and rank extraction
- `storage/`: SQLite schema and JSON writers
- `report/`: HTML report generation
- `utils/`: Time utilities, logging, cost estimation, Rich console
- `evals/`: Evaluation framework

### Supported Models

- **OpenAI**: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
- **Anthropic**: claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus
- **Mistral**: mistral-large-latest, mistral-small-latest
- **X.AI**: grok-beta, grok-2-1212, grok-2-latest, grok-3
- **Google**: gemini-2.0-flash-exp, gemini-1.5-pro, gemini-1.5-flash
- **Perplexity**: sonar, sonar-pro, sonar-reasoning

### Documentation

- README with quick start and examples
- CLAUDE.md with development guidelines
- CONTRIBUTING.md with contribution guidelines
- SPECS.md with complete engineering specification
- TODO.md with milestone tracking

### Security

- Environment variable-based API key management
- SQL injection prevention (parameterized queries)
- XSS prevention (Jinja2 autoescaping)
- No API key logging

## Release Notes

### Version 0.1.0 - Production Ready

This is the first production-ready release of LLM Answer Watcher. The tool is feature-complete for core brand monitoring use cases:

**Highlights**:

- ✅ 8,200 lines of production Python code
- ✅ 17,400 lines of test code (750+ tests)
- ✅ 100% coverage on critical paths
- ✅ 6 LLM providers supported
- ✅ Complete evaluation framework
- ✅ Full documentation

**What's Working**:

- All core features tested and validated
- Multi-provider queries with retry logic
- Accurate brand mention detection (90%+ precision)
- Historical tracking in SQLite
- Professional HTML reports
- Budget protection
- CI/CD integration

**Known Limitations** (v0.1.0 - resolved in v0.2.0):

- ~~No async support (intentionally - keeping it simple)~~ - **ADDED in v0.2.0**
- ~~Web search only for OpenAI models~~ - **Google Search grounding added in v0.2.0**
- Perplexity request fees not yet in cost estimates
- Trends command not yet implemented (data collection works)

**Upgrade Notes**:

- This is the initial release - no upgrades needed
- SQLite schema version 1
- Configuration format stable

## Future Roadmap

### Planned Features

**v0.2.0** - ✅ **RELEASED 2025-11-08**:

- ✅ Async support for parallel queries (3-4x faster)
- ✅ Enhanced web search support (Google Search grounding)
- ✅ Browser runners (BETA)
- ⏳ `trends` command for historical analysis (moved to v0.3.0)
- ⏳ Dashboard UI for visualizing trends (moved to v0.3.0)
- ⏳ DeepEval integration for quality metrics (moved to v0.3.0)

**v0.3.0** (Q1 2025):

- `trends` command for historical analysis
- Dashboard UI for visualizing trends
- DeepEval integration for quality metrics
- Production-ready browser runners (cost tracking, authentication)
- Additional browser runners (Claude, Gemini web UIs)
- Cloud deployment option
- HTTP API (expose internal contract)
- Real-time alerts and webhooks
- Advanced analytics and insights
- Multi-user support

**v1.0.0** (Q3 2025):

- Enterprise features
- Advanced provider integrations
- Custom model support
- White-label options
- SaaS offering

## Contributing

We welcome contributions! See [CONTRIBUTING.md](../contributing/development-setup/) for guidelines.

## Links

- **Repository**: [github.com/nibzard/llm-answer-watcher](https://github.com/nibzard/llm-answer-watcher)
- **Issues**: [github.com/nibzard/llm-answer-watcher/issues](https://github.com/nibzard/llm-answer-watcher/issues)
- **Documentation**: This site
- **License**: MIT