Skip to content

Function Calling for Extraction

Function calling uses LLMs to extract structured data from responses with higher accuracy than regex-based extraction. This feature enables semantic understanding of brand mentions and rankings.

Overview

Function calling instructs the LLM to output structured JSON matching a specific schema, ensuring consistent, parseable extraction results.

When to Use

Use function calling when:

  • Regex extraction misses complex mentions
  • You need contextual understanding
  • Rankings are implicit (not in explicit lists)
  • Budget allows for additional API calls

Skip function calling when:

  • Regex works well for your use case
  • Optimizing for cost (regex is free)
  • Brand names are simple and unambiguous
  • Running frequent monitoring (hourly/daily)

Configuration

Basic Setup

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/extraction-default"

  method: "function_calling"
  fallback_to_regex: true
  min_confidence: 0.7

Advanced Configuration

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"  # Fast, cheap extraction model
    env_api_key: "OPENAI_API_KEY"
    system_prompt: "openai/extraction-default"

  # Extraction method
  method: "function_calling"  # Options: function_calling, regex, hybrid

  # Fall back to regex if function calling fails
  fallback_to_regex: true

  # Minimum confidence threshold (0.0-1.0)
  min_confidence: 0.7

  # Maximum extraction attempts
  max_retries: 2

Extraction Methods

Method 1: Function Calling Only

Use LLM for all extraction:

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "function_calling"
  fallback_to_regex: false  # Don't fall back

Cost: ~$0.001-0.003 per extraction

Method 2: Regex Only

Use pattern matching (no LLM):

run_settings:
  use_llm_rank_extraction: false

# No extraction_settings needed

Cost: Free

Try regex first, use LLM as fallback:

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"
    env_api_key: "OPENAI_API_KEY"

  method: "hybrid"
  fallback_to_regex: true

Cost: Variable (free for regex hits, paid for LLM fallback)

Function Schema

Competitor Detection Function

{
  "name": "extract_competitor_mentions",
  "description": "Extract mentions of competitor brands from LLM response",
  "parameters": {
    "type": "object",
    "properties": {
      "competitors": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "brand": {
              "type": "string",
              "description": "Exact brand name as mentioned"
            },
            "rank_position": {
              "type": "integer",
              "description": "Position in ranked list (1=first, null=not ranked)"
            },
            "confidence": {
              "type": "number",
              "description": "Confidence score 0.0-1.0"
            },
            "context": {
              "type": "string",
              "description": "Surrounding context of the mention"
            }
          },
          "required": ["brand", "confidence"]
        }
      }
    },
    "required": ["competitors"]
  }
}

Example LLM Response

Input (LLM answer):

The best email warmup tools are:
1. Instantly - Great for cold email
2. Warmly - Excellent personalization
3. Lemwarm - Simple and effective

Function Call Output:

{
  "competitors": [
    {
      "brand": "Instantly",
      "rank_position": 1,
      "confidence": 0.95,
      "context": "Great for cold email"
    },
    {
      "brand": "Warmly",
      "rank_position": 2,
      "confidence": 0.95,
      "context": "Excellent personalization"
    },
    {
      "brand": "Lemwarm",
      "rank_position": 3,
      "confidence": 0.90,
      "context": "Simple and effective"
    }
  ]
}

Confidence Scores

Confidence Threshold

Only accept extractions above confidence threshold:

extraction_settings:
  min_confidence: 0.7  # Reject extractions < 70% confidence

Confidence Levels

Range Quality Action
0.90-1.00 High Accept automatically
0.70-0.89 Medium Accept with review
0.50-0.69 Low Reject or flag for review
0.00-0.49 Very Low Reject

Interpreting Confidence

High confidence (0.9+):

  • Clear, unambiguous mention
  • Explicit ranking
  • Standard brand name

Medium confidence (0.7-0.9):

  • Slight ambiguity
  • Implicit ranking
  • Brand name variation

Low confidence (<0.7):

  • Ambiguous mention
  • Unclear ranking
  • Possible false positive

Cost Management

Extraction Costs

Function calling adds extra API calls:

Model Cost per 1K tokens Typical Extraction Cost
gpt-4o-mini $0.15 input / $0.60 output $0.001-0.002
gpt-4o $2.50 input / $10.00 output $0.010-0.020
claude-3-5-haiku $0.80 input / $4.00 output $0.003-0.005

Cost Optimization

1. Use cheap extraction models:

extraction_settings:
  extraction_model:
    provider: "openai"
    model_name: "gpt-4o-mini"  # Cheapest option

2. Use hybrid method:

extraction_settings:
  method: "hybrid"  # Free regex first, LLM fallback

3. Cache extraction results:

Extraction results are stored in SQLite and reused.

4. Limit extraction to important intents:

intents:
  - id: "high-priority"
    prompt: "..."
    use_extraction: true  # Enable for this intent

  - id: "low-priority"
    prompt: "..."
    use_extraction: false  # Skip for this intent

Advantages Over Regex

1. Semantic Understanding

Regex:

"I recommend HubSpot" → Detected
"HubSpot is not recommended" → Detected (false positive)

Function Calling:

"I recommend HubSpot" → Detected with positive context
"HubSpot is not recommended" → Not detected (understands negation)

2. Implicit Rankings

LLM Response:

"While Salesforce is the market leader, I prefer HubSpot for startups."

Regex: No ranking detected (no list structure)

Function Calling: Detects HubSpot as preferred (rank 1)

3. Context Extraction

Function calling extracts surrounding context:

{
  "brand": "HubSpot",
  "rank_position": 1,
  "context": "Great for startups with limited budget",
  "confidence": 0.92
}

4. Handles Variations

LLM mentions: "HS CRM", "HubSpot's CRM", "HubSpot platform"

Regex: Misses variations

Function Calling: Normalizes all to "HubSpot"

Debugging Function Calling

View Function Call Logs

# Enable verbose logging
export LOG_LEVEL=DEBUG

llm-answer-watcher run --config watcher.config.yaml --verbose

Check Extraction Results

# View parsed results
cat output/2025-11-05T14-30-00Z/intent_*_parsed_*.json | jq '.extraction_method'
# Output: "function_calling" or "regex"

Common Issues

Issue: Low confidence scores

Solution: Adjust threshold:

extraction_settings:
  min_confidence: 0.6  # Lower threshold

Issue: High costs

Solution: Switch to hybrid:

extraction_settings:
  method: "hybrid"  # Use regex when possible

Issue: Inconsistent results

Solution: Use specific system prompt:

extraction_settings:
  extraction_model:
    system_prompt: "openai/extraction-strict"  # More consistent

Best Practices

1. Start with Regex

Test regex extraction first:

run_settings:
  use_llm_rank_extraction: false

If accuracy is insufficient, enable function calling.

2. Use Hybrid Method

Best of both worlds:

extraction_settings:
  method: "hybrid"
  fallback_to_regex: true

3. Monitor Extraction Costs

SELECT
    DATE(timestamp_utc) as date,
    SUM(estimated_cost_usd) as total_cost,
    COUNT(*) as extractions
FROM answers_raw
WHERE extraction_method = 'function_calling'
  AND timestamp_utc >= datetime('now', '-30 days')
GROUP BY DATE(timestamp_utc);

4. Test with Eval Suite

llm-answer-watcher eval --fixtures evals/testcases/fixtures.yaml

5. Use Dedicated Extraction Model

Don't use expensive models for extraction:

# ❌ Bad - expensive
extraction_model:
  model_name: "gpt-4o"

# ✅ Good - cheap and fast
extraction_model:
  model_name: "gpt-4o-mini"

Next Steps