Performance¶
Optimizing LLM Answer Watcher for speed and efficiency.
Query Performance¶
Parallel Queries (Future)¶
Currently synchronous. Async support planned:
# Future: parallel execution
await asyncio.gather(*[
query_model(intent, model)
for intent in intents
for model in models
])
Current: Sequential¶
Cost Optimization¶
Use Cheaper Models¶
Regex vs LLM Extraction¶
# Fast and cheap (recommended)
use_llm_rank_extraction: false
# Accurate but costly
use_llm_rank_extraction: true
Database Performance¶
Indexes¶
SQLite indexes on:
- timestamp_utc
- intent_id
- brand
- rank_position
Vacuum¶
Periodically compact database:
Caching¶
Pricing Cache¶
LLM prices cached for 24 hours to reduce API calls.
Future Caching¶
Planned: - Response caching (identical queries) - Extracted data caching
See Architecture for design details.