Performance

Optimizing LLM Answer Watcher for speed and efficiency.

Query Performance

Queries currently run synchronously, one intent/model pair at a time. Async support is planned so they can run in parallel:

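```python
import asyncio

# Current: queries run one at a time
for intent in intents:
    for model in models:
        query_model(intent, model)

# Future (planned): run every intent/model pair concurrently.
# Requires query_model to be a coroutine and this code to run inside an async function.
await asyncio.gather(*[
    query_model(intent, model)
    for intent in intents
    for model in models
])
```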
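Here is a self-contained sketch of the planned concurrent pattern. It assumes a hypothetical async `query_model` stub that only sleeps to simulate network latency; it is not the project's function. The `asyncio.Semaphore` and the `max_concurrency` parameter are additions of this sketch, included because firing every request at once can run into provider rate limits.

```python
import asyncio


# Hypothetical stand-in for a real provider call; NOT the project's API.
async def query_model(intent: str, model: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{model}:{intent}"


async def run_all(intents: list[str], models: list[str], max_concurrency: int = 4) -> list[str]:
    # Bound the number of in-flight requests (assumption: to respect rate limits).
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(intent: str, model: str) -> str:
        async with semaphore:
            return await query_model(intent, model)

    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*[
        limited(intent, model)
        for intent in intents
        for model in models
    ])


if __name__ == "__main__":
    results = asyncio.run(run_all(["best crm", "top email tool"], ["gpt-4o-mini", "gpt-4o"]))
    print(results)
```

Because `asyncio.gather` preserves input order, results line up with the intent/model loop order of the sequential version.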

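Use a smaller model where it is accurate enough for your intents; it is both faster and cheaper:

```yaml
models:
  - provider: "openai"
    model_name: "gpt-4o-mini"  # $0.15 vs $2.50 (gpt-4o) per 1M input tokens
```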
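To see what the price gap means in practice, here is a quick estimate. It assumes the $2.50 figure is gpt-4o's price per 1M input tokens, ignores output-token pricing, and uses a made-up monthly volume of 2M prompt tokens:

```python
# Per-1M-input-token prices (USD) from the config comment; output tokens ignored.
PRICE_PER_1M_INPUT = {"gpt-4o-mini": 0.15, "gpt-4o": 2.50}


def input_cost(model: str, input_tokens: int) -> float:
    """Estimated USD cost of `input_tokens` prompt tokens on `model`."""
    return input_tokens / 1_000_000 * PRICE_PER_1M_INPUT[model]


tokens = 2_000_000  # hypothetical monthly prompt volume
for name in PRICE_PER_1M_INPUT:
    print(f"{name}: ${input_cost(name, tokens):.2f}")
```

At that volume the estimate is $0.30 for gpt-4o-mini versus $5.00 for gpt-4o.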

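For rank extraction, the non-LLM path is fast and cheap; LLM-based extraction is more accurate but adds model calls and cost:

```yaml
# Fast and cheap (recommended)
use_llm_rank_extraction: false

# Accurate but costly
# use_llm_rank_extraction: true
```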
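For intuition on why the non-LLM path is cheap, here is a hypothetical pattern-based extractor. It is an illustration, not the project's actual implementation: it scans numbered list items and records the first position at which each brand appears. No API call is made, which is where the speed and cost savings come from; the trade-off is that rankings expressed in free prose are missed.

```python
import re


def extract_ranks(answer: str, brands: list[str]) -> dict[str, int]:
    """Hypothetical: map each brand to the number of the first list item mentioning it."""
    ranks: dict[str, int] = {}
    for line in answer.splitlines():
        # Match list items such as "1. Foo" or "2) Bar"
        m = re.match(r"\s*(\d+)[.)]\s+(.*)", line)
        if not m:
            continue
        position, text = int(m.group(1)), m.group(2)
        for brand in brands:
            # Case-insensitive whole-word match; keep only the first (best) position
            if brand not in ranks and re.search(rf"\b{re.escape(brand)}\b", text, re.IGNORECASE):
                ranks[brand] = position
    return ranks
```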

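The SQLite database has indexes on the following columns, so queries that filter or sort by them can avoid full table scans:

- timestamp_utc
- intent_id
- brand
- rank_position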
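To check that a query actually uses one of these indexes, ask SQLite for its plan with `EXPLAIN QUERY PLAN`. The sketch below builds an in-memory table with the indexed columns; the table name `mentions`, the index names, and the schema are hypothetical and may not match `output/watcher.db`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real schema in output/watcher.db may differ.
conn.execute("""
    CREATE TABLE mentions (
        id INTEGER PRIMARY KEY,
        timestamp_utc TEXT,
        intent_id TEXT,
        brand TEXT,
        rank_position INTEGER
    )
""")
# One index per column listed above
for column in ("timestamp_utc", "intent_id", "brand", "rank_position"):
    conn.execute(f"CREATE INDEX IF NOT EXISTS idx_mentions_{column} ON mentions ({column})")

# EXPLAIN QUERY PLAN reports how SQLite intends to execute the query
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM mentions WHERE brand = ?", ("Acme",)
).fetchall()
for row in plan:
    print(row[-1])  # detail column: names the index used, if any
```

When an index is used, the `detail` text names it; a plain `SCAN` means SQLite will read the whole table.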

Periodically compact the database. `VACUUM` rebuilds the file and reclaims space left behind by deleted rows:

```bash
sqlite3 output/watcher.db "VACUUM;"
```
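The same compaction can be scripted from Python with the standard `sqlite3` module, for example from a scheduled job. This is a sketch; the `compact` helper name is hypothetical:

```python
import os
import sqlite3


def compact(db_path: str) -> tuple[int, int]:
    """VACUUM an SQLite file and return its size in bytes before and after."""
    before = os.path.getsize(db_path)
    conn = sqlite3.connect(db_path)
    try:
        # VACUUM fails inside an open transaction, so run it on a fresh connection
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)
```

Call it as `compact("output/watcher.db")`. Running it while the watcher is writing to the database is best avoided, since `VACUUM` needs the file to itself.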

LLM pricing data is cached for 24 hours to reduce API calls.
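The 24-hour price cache is a time-to-live (TTL) cache. Below is a minimal in-memory sketch; `TTLCache`, `get_or_fetch`, and the injectable `clock` are hypothetical names, not the project's code, and the real cache may persist to disk instead:

```python
import time
from typing import Callable


class TTLCache:
    """Minimal time-based cache; entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], object]) -> object:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: no API call
        value = fetch()  # missing or expired: refetch and restamp
        self._store[key] = (now, value)
        return value


PRICE_CACHE = TTLCache(ttl_seconds=24 * 60 * 60)  # 24 hours
```

Usage would look like `PRICE_CACHE.get_or_fetch("openai", fetch_prices)`, where `fetch_prices` stands for a hypothetical function that downloads the price list. Injecting `clock` makes expiry testable without waiting a day.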

Planned:

- Response caching (identical queries)
- Extracted data caching
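Response caching is only planned, so the following is a hypothetical sketch of one way it could work: hash the (provider, model, prompt) triple into a stable key and call the model only on a miss. `cache_key`, `ResponseCache`, and the `call` parameter are illustrative names. Encoding the triple as JSON before hashing keeps distinct triples such as `("a", "bc", "")` and `("a", "b", "c")` from colliding the way plain string concatenation would.

```python
import hashlib
import json
from typing import Callable


def cache_key(provider: str, model: str, prompt: str) -> str:
    """Stable key: identical (provider, model, prompt) triples hash the same."""
    payload = json.dumps([provider, model, prompt], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """Hypothetical in-memory cache of model responses for one process."""

    def __init__(self) -> None:
        self._responses: dict[str, str] = {}

    def query(self, provider: str, model: str, prompt: str,
              call: Callable[[str], str]) -> str:
        key = cache_key(provider, model, prompt)
        if key not in self._responses:
            self._responses[key] = call(prompt)  # miss: make the real model call
        return self._responses[key]
```

This sketch keeps responses only in memory; whether the planned feature would persist them across runs is not specified here.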

See Architecture for design details.