
Sentiment Analysis & Intent Classification

Advanced analysis features that extract sentiment, context, and intent from brand mentions and user queries using LLM function calling.

New in v0.1.0

These features were added to enhance brand mention analysis and enable prioritization of high-value queries.

Overview

LLM Answer Watcher includes two powerful analysis features:

  1. Sentiment Analysis: Analyzes the tone and context of each brand mention
  2. Intent Classification: Determines the user's intent and buyer journey stage for each query

Both features use OpenAI's function calling API for accurate, structured extraction.

Sentiment Analysis

What It Analyzes

For each brand mention, the system extracts:

Sentiment - Emotional tone:

  • positive: Brand recommended or praised
  • neutral: Brand mentioned without judgment
  • negative: Brand criticized or not recommended

Mention Context - How the brand was mentioned:

  • primary_recommendation: Brand is the top recommendation
  • alternative_listing: Brand listed as one of several options
  • competitor_negative: Brand mentioned as inferior to others
  • competitor_neutral: Brand compared without negative bias
  • passing_reference: Brief mention without detail
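The sentiment and context labels above map naturally onto enum constraints in a function calling schema. The following is a minimal sketch of what such a tool definition might look like in the OpenAI tools format. The function name `record_brand_mentions` and the property layout are assumptions made for illustration, not the schema LLM Answer Watcher actually sends.

```python
import json

# Label sets taken from the lists above.
SENTIMENTS = ["positive", "neutral", "negative"]
MENTION_CONTEXTS = [
    "primary_recommendation",
    "alternative_listing",
    "competitor_negative",
    "competitor_neutral",
    "passing_reference",
]


def build_mention_tool() -> dict:
    """Build a tool definition in the OpenAI function calling format.

    The function name and property layout are hypothetical.
    """
    mention = {
        "type": "object",
        "properties": {
            "brand": {"type": "string"},
            "sentiment": {"type": "string", "enum": SENTIMENTS},
            "mention_context": {"type": "string", "enum": MENTION_CONTEXTS},
        },
        "required": ["brand", "sentiment", "mention_context"],
    }
    return {
        "type": "function",
        "function": {
            "name": "record_brand_mentions",  # hypothetical name
            "description": "Record each brand mention with its sentiment and context.",
            "parameters": {
                "type": "object",
                "properties": {"mentions": {"type": "array", "items": mention}},
                "required": ["mentions"],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_mention_tool(), indent=2))
```

Constraining both fields with `enum` is what keeps the model from returning free-form labels that would not match the values used in the database queries.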

Example

Query: "What are the best email warmup tools?"

LLM Response: "The best tools are Lemwarm for automated warmup and Instantly for cold outreach. HubSpot is also an option but quite expensive."

Extracted Sentiments:

| Brand | Sentiment | Context | Reasoning |
|-------|-----------|---------|-----------|
| Lemwarm | positive | primary_recommendation | Listed first with positive qualifier |
| Instantly | positive | primary_recommendation | Listed alongside Lemwarm with use case |
| HubSpot | neutral | alternative_listing | Mentioned as option with cost caveat |

Configuration

Enable sentiment analysis in extraction_settings:

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "function_calling"

  # Enable sentiment analysis (default: true)
  enable_sentiment_analysis: true

Function Calling Required

Sentiment analysis only works with method: "function_calling". Regex extraction does not support sentiment analysis (fields will be None).

Cost Impact

Sentiment analysis is integrated into function calling extraction:

  • No extra LLM calls - sentiment extracted in same call as brand mentions
  • Cost increase: ~33% per extraction call due to larger response schema
  • Example: $0.0002 → $0.00027 per extraction with gpt-4o-mini

Database Storage

Sentiments are stored in the mentions table:

SELECT brand, sentiment, mention_context, timestamp_utc
FROM mentions
WHERE sentiment = 'positive'
  AND mention_context = 'primary_recommendation'
ORDER BY timestamp_utc DESC;

Schema:

ALTER TABLE mentions ADD COLUMN sentiment TEXT;
ALTER TABLE mentions ADD COLUMN mention_context TEXT;

Intent Classification

What It Classifies

For each user query, the system determines:

Intent Type - What the user wants:

  • transactional: Ready to buy/use a tool
  • commercial_investigation: Researching options before purchase
  • informational: Learning about a topic
  • navigational: Looking for a specific brand/site

Buyer Journey Stage - Where they are in the purchase process:

  • awareness: Learning about the category
  • consideration: Evaluating options
  • decision: Ready to choose/purchase

Urgency Signal - How urgent the need is:

  • high: Immediate need ("now", "urgent", "today")
  • medium: Near-term need ("soon", "this week")
  • low: Future or casual exploration

Classification Confidence - How confident the model is (0.0-1.0)

Reasoning - Explanation of why it was classified this way

Examples

High-Value Query

Query: "What are the best email warmup tools to buy now for my outreach campaign?"

Classification:

{
  "intent_type": "transactional",
  "buyer_stage": "decision",
  "urgency_signal": "high",
  "classification_confidence": 0.95,
  "reasoning": "Query contains 'buy now' and specific use case, indicating ready-to-purchase intent with high urgency"
}

Research Query

Query: "How do email warmup tools work?"

Classification:

{
  "intent_type": "informational",
  "buyer_stage": "awareness",
  "urgency_signal": "low",
  "classification_confidence": 0.92,
  "reasoning": "Query seeks explanation, indicating learning phase without purchase intent"
}

Comparison Query

Query: "Compare Lemwarm vs Instantly for cold email"

Classification:

{
  "intent_type": "commercial_investigation",
  "buyer_stage": "consideration",
  "urgency_signal": "medium",
  "classification_confidence": 0.88,
  "reasoning": "Direct comparison of specific brands indicates evaluation phase before purchase decision"
}
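If you consume these classification objects in your own scripts, it can help to validate them against the allowed labels before use. A minimal sketch, assuming the JSON shape shown in the examples above; the `IntentClassification` dataclass and `parse_classification` helper are hypothetical names, not part of LLM Answer Watcher.

```python
import json
from dataclasses import dataclass

INTENT_TYPES = {"transactional", "commercial_investigation", "informational", "navigational"}
BUYER_STAGES = {"awareness", "consideration", "decision"}
URGENCY_SIGNALS = {"high", "medium", "low"}


@dataclass(frozen=True)
class IntentClassification:
    intent_type: str
    buyer_stage: str
    urgency_signal: str
    classification_confidence: float
    reasoning: str


def parse_classification(raw: str) -> IntentClassification:
    """Parse a classification JSON string and reject unknown labels."""
    data = json.loads(raw)
    result = IntentClassification(
        intent_type=data["intent_type"],
        buyer_stage=data["buyer_stage"],
        urgency_signal=data["urgency_signal"],
        classification_confidence=float(data["classification_confidence"]),
        reasoning=data.get("reasoning", ""),
    )
    if result.intent_type not in INTENT_TYPES:
        raise ValueError(f"unknown intent_type: {result.intent_type!r}")
    if result.buyer_stage not in BUYER_STAGES:
        raise ValueError(f"unknown buyer_stage: {result.buyer_stage!r}")
    if result.urgency_signal not in URGENCY_SIGNALS:
        raise ValueError(f"unknown urgency_signal: {result.urgency_signal!r}")
    if not 0.0 <= result.classification_confidence <= 1.0:
        raise ValueError("classification_confidence must be between 0.0 and 1.0")
    return result


if __name__ == "__main__":
    raw = (
        '{"intent_type": "informational", "buyer_stage": "awareness", '
        '"urgency_signal": "low", "classification_confidence": 0.92, '
        '"reasoning": "Query seeks explanation"}'
    )
    print(parse_classification(raw))
```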

Configuration

Enable intent classification in extraction_settings:

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  # Enable intent classification (default: true)
  enable_intent_classification: true

Cost Impact

Intent classification adds one extra LLM call per unique query:

  • Cost: ~$0.00012 per query with gpt-4o-mini
  • When: Before extracting brand mentions
  • Caching: Results are cached by query hash, so repeated queries are free

Example cost breakdown:

  • 3 intents × 1 model = 3 queries
  • Intent classification: 3 × $0.00012 = $0.00036
  • Extraction: 3 × $0.0002 = $0.0006
  • Total: ~$0.001 per run
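The same arithmetic can be scripted to estimate other configurations. This is a rough sketch using the per-call prices quoted in this section as defaults. The `estimate_run_cost` helper is hypothetical, and it assumes one classification call per intent (per unique query) regardless of how many models are queried, with no cache hits.

```python
def estimate_run_cost(
    num_intents: int,
    num_models: int = 1,
    extraction_cost: float = 0.0002,   # per extraction call, from this page
    intent_cost: float = 0.00012,      # per classification call, from this page
    classify_intents: bool = True,
) -> dict:
    """Estimate the analysis cost of one run in USD.

    Assumes one extraction call per (intent, model) pair and one
    classification call per intent, with an empty cache.
    """
    queries = num_intents * num_models
    extraction = queries * extraction_cost
    classification = num_intents * intent_cost if classify_intents else 0.0
    return {
        "queries": queries,
        "intent_classification": classification,
        "extraction": extraction,
        "total": classification + extraction,
    }


if __name__ == "__main__":
    cost = estimate_run_cost(num_intents=3, num_models=1)
    print(f"queries={cost['queries']} total=${cost['total']:.5f}")
```

Pass `extraction_cost=0.00027` to model extraction calls that include sentiment analysis.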

Database Storage

Intent classifications are stored in the intent_classifications table:

SELECT intent_id, intent_type, buyer_stage, urgency_signal, reasoning
FROM intent_classifications
WHERE buyer_stage = 'decision'
  AND urgency_signal = 'high'
ORDER BY classification_confidence DESC;

Schema:

CREATE TABLE intent_classifications (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    intent_id TEXT NOT NULL,
    query_text TEXT NOT NULL,
    query_hash TEXT NOT NULL,
    intent_type TEXT NOT NULL,
    buyer_stage TEXT NOT NULL,
    urgency_signal TEXT NOT NULL,
    classification_confidence REAL NOT NULL,
    reasoning TEXT,
    timestamp_utc TEXT NOT NULL,
    UNIQUE(run_id, intent_id)
);

Query Hash Caching

Intent classifications are cached by query hash:

# Normalized query → hash
"What are the best email warmup tools?"
→ "5d41402abc4b2a76b9719d911017c592..."

# Same hash for queries that differ only in whitespace or case
"  what are the BEST email warmup tools?  "
→ "5d41402abc4b2a76b9719d911017c592..." (same hash)

Caching benefits:

  • Saves API calls: Repeated queries use cached results
  • Normalizes variations: Whitespace/case differences don't matter
  • Persistent cache: Stored in database across runs
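The normalize-then-hash idea can be sketched in a few lines. This page does not state the exact normalization rules or hash algorithm, so the whitespace collapsing, lowercasing, and SHA-256 used below are assumptions, and `normalize_query` and `query_hash` are hypothetical helper names.

```python
import hashlib


def normalize_query(query: str) -> str:
    """Trim, collapse internal whitespace, and lowercase the query."""
    return " ".join(query.split()).lower()


def query_hash(query: str) -> str:
    """Return a hex digest of the normalized query (SHA-256 is an assumption)."""
    normalized = normalize_query(query)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = query_hash("What are the best email warmup tools?")
    b = query_hash("  what are the BEST email warmup tools?  ")
    print(a == b)
```

Note that this kind of normalization only catches formatting differences. Two differently worded queries with the same meaning still produce different hashes and are classified separately.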

Use Cases

1. Prioritize High-Value Queries

Focus on queries with high buyer intent:

SELECT m.brand, ic.intent_type, ic.buyer_stage, ic.urgency_signal
FROM mentions m
JOIN intent_classifications ic ON m.intent_id = ic.intent_id
WHERE ic.intent_type = 'transactional'
  AND ic.buyer_stage = 'decision'
  AND ic.urgency_signal = 'high'
  AND m.sentiment = 'positive';
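To run this query from a script, Python's built-in sqlite3 module is enough. The sketch below executes it against a throwaway in-memory database. The reduced table layouts and the demo rows are made up for illustration; the real mentions table has more columns than shown here.

```python
import sqlite3

HIGH_VALUE_QUERY = """
SELECT m.brand, ic.intent_type, ic.buyer_stage, ic.urgency_signal
FROM mentions m
JOIN intent_classifications ic ON m.intent_id = ic.intent_id
WHERE ic.intent_type = 'transactional'
  AND ic.buyer_stage = 'decision'
  AND ic.urgency_signal = 'high'
  AND m.sentiment = 'positive';
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE mentions (brand TEXT, sentiment TEXT, intent_id TEXT);
        CREATE TABLE intent_classifications (
            intent_id TEXT, intent_type TEXT, buyer_stage TEXT, urgency_signal TEXT
        );
        """
    )
    # Demo rows only; not real monitoring data.
    conn.executemany(
        "INSERT INTO mentions VALUES (?, ?, ?)",
        [
            ("Lemwarm", "positive", "buy-now"),
            ("HubSpot", "neutral", "buy-now"),
            ("Instantly", "positive", "how-it-works"),
        ],
    )
    conn.executemany(
        "INSERT INTO intent_classifications VALUES (?, ?, ?, ?)",
        [
            ("buy-now", "transactional", "decision", "high"),
            ("how-it-works", "informational", "awareness", "low"),
        ],
    )
    return conn


def find_high_value_mentions(conn: sqlite3.Connection) -> list:
    """Return positive mentions on high-urgency, decision-stage queries."""
    return conn.execute(HIGH_VALUE_QUERY).fetchall()


if __name__ == "__main__":
    for row in find_high_value_mentions(build_demo_db()):
        print(row)
```

Against a real database, replace `build_demo_db()` with `sqlite3.connect(...)` pointing at your own database file.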

2. Track Sentiment Trends

Monitor how sentiment changes over time:

SELECT DATE(timestamp_utc) as date,
       sentiment,
       COUNT(*) as mentions
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY DATE(timestamp_utc), sentiment
ORDER BY date DESC;

3. Identify Context Patterns

See how your brand is typically mentioned:

SELECT mention_context,
       COUNT(*) as count,
       ROUND(AVG(CASE sentiment
           WHEN 'positive' THEN 1.0
           WHEN 'neutral' THEN 0.5
           WHEN 'negative' THEN 0.0
       END), 2) as sentiment_score
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY mention_context
ORDER BY count DESC;

4. ROI Analysis

Calculate value of brand mentions by intent:

SELECT ic.buyer_stage,
       COUNT(DISTINCT m.brand) as brands_mentioned,
       COUNT(*) as total_mentions
FROM mentions m
JOIN intent_classifications ic ON m.intent_id = ic.intent_id
WHERE m.is_mine = 1
GROUP BY ic.buyer_stage
ORDER BY CASE ic.buyer_stage
    WHEN 'decision' THEN 1
    WHEN 'consideration' THEN 2
    WHEN 'awareness' THEN 3
END;

Disabling Features

Disable Sentiment Analysis

extraction_settings:
  enable_sentiment_analysis: false

Result: The sentiment and mention_context fields will be empty (None) in the database.

Disable Intent Classification

extraction_settings:
  enable_intent_classification: false

Result: No rows are written to the intent_classifications table, and queries carry no classification (None).

Disable Both

extraction_settings:
  enable_sentiment_analysis: false
  enable_intent_classification: false

Benefit: Removes the ~33% sentiment overhead on extraction calls and eliminates intent classification calls.

Limitations

Function Calling Only

Both features require method: "function_calling":

extraction_settings:
  method: "function_calling"  # Required
  enable_sentiment_analysis: true
  enable_intent_classification: true

Regex extraction does not support these features.

Provider Support

Currently, function calling extraction is supported only with the OpenAI provider:

extraction_model:
  provider: "openai"  # Required
  model_name: "gpt-4o-mini"

Support for Anthropic, Mistral, and other providers is planned.

Confidence Thresholds

Low-confidence classifications may be inaccurate, so filter on confidence when accuracy matters:

-- Filter by confidence
SELECT *
FROM intent_classifications
WHERE classification_confidence >= 0.8;
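The same threshold can be applied after loading rows into Python, which makes it easy to route low-confidence results to manual review instead of discarding them. A small sketch; `split_by_confidence` is a hypothetical helper, and rows are assumed to be dicts keyed by column name.

```python
def split_by_confidence(rows, threshold=0.8):
    """Split classification rows into (accepted, needs_review) lists."""
    accepted = [r for r in rows if r["classification_confidence"] >= threshold]
    needs_review = [r for r in rows if r["classification_confidence"] < threshold]
    return accepted, needs_review


if __name__ == "__main__":
    rows = [
        {"intent_id": "a", "classification_confidence": 0.95},
        {"intent_id": "b", "classification_confidence": 0.60},
    ]
    accepted, needs_review = split_by_confidence(rows)
    print(len(accepted), len(needs_review))
```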

Best Practices

1. Enable for High-Value Monitoring

Use sentiment/intent for business-critical queries:

# Production config - full analysis
extraction_settings:
  method: "function_calling"
  enable_sentiment_analysis: true
  enable_intent_classification: true

2. Disable for Cost Optimization

Skip for budget-constrained or high-frequency monitoring:

# Cost-optimized config
extraction_settings:
  method: "regex"  # No function calling
  enable_sentiment_analysis: false
  enable_intent_classification: false

3. Review Classification Reasoning

Check why queries were classified:

SELECT query_text, intent_type, buyer_stage, reasoning
FROM intent_classifications
WHERE classification_confidence < 0.8;

4. Track Sentiment Distribution

Monitor the health of your brand's mentions:

SELECT sentiment,
       COUNT(*) as mentions,
       ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) as percentage
FROM mentions
WHERE normalized_name = 'yourbrand'
GROUP BY sentiment;

Healthy distribution: 70%+ positive, <10% negative
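The thresholds above can be turned into a simple automated check. A minimal sketch, assuming you have already aggregated mention counts per sentiment label (for example from the query above); `sentiment_health` is a hypothetical helper name.

```python
from typing import Mapping


def sentiment_health(counts: Mapping[str, int]) -> dict:
    """Compute positive/negative percentages and apply the 70% / 10% rule."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mentions to evaluate")
    positive_pct = 100.0 * counts.get("positive", 0) / total
    negative_pct = 100.0 * counts.get("negative", 0) / total
    return {
        "positive_pct": round(positive_pct, 1),
        "negative_pct": round(negative_pct, 1),
        # Healthy means at least 70% positive and under 10% negative.
        "healthy": positive_pct >= 70.0 and negative_pct < 10.0,
    }


if __name__ == "__main__":
    print(sentiment_health({"positive": 75, "neutral": 20, "negative": 5}))
```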

Next Steps