Skip to content

Testing Utilities

LLM Answer Watcher provides specialized testing utilities to help you write reliable tests without making real API calls or dealing with brittle HTTP mocking.

Overview

The testing utilities follow patterns inspired by modern LLM abstraction layers:

  • MockLLMClient: Deterministic responses for testing extraction logic
  • ChaosLLMClient: Resilience testing with controlled failure injection
  • Protocol-based: Both implement the LLMClient protocol

MockLLMClient

Basic Usage

The MockLLMClient provides deterministic responses without making real API calls:

from llm_answer_watcher.llm_runner.mock_client import MockLLMClient

# Create client with configured responses
client = MockLLMClient(
    responses={
        "What are the best CRM tools?": "HubSpot and Salesforce are leading CRM platforms.",
        "best email warmup": "Warmly, HubSpot, and Instantly are top choices."
    },
    default_response="No specific answer available.",
    tokens_per_response=300,
    cost_per_response=0.001
)

# Use in tests
response = await client.generate_answer("What are the best CRM tools?")
assert response.answer_text == "HubSpot and Salesforce are leading CRM platforms."
assert response.tokens_used == 300
assert response.cost_usd == 0.001

Configuration Options

MockLLMClient(
    responses={"prompt": "answer"},  # Dict mapping prompts to answers
    default_response="Default answer",  # Fallback when prompt not found
    model_name="mock-gpt-4",  # Model name in responses
    provider="mock-openai",  # Provider name in responses
    tokens_per_response=100,  # Token count to report
    cost_per_response=0.0,  # Cost to report
    streaming_chunk_size=None,  # Enable streaming (see below)
    streaming_delay_ms=50  # Delay between chunks
)

Integration Testing

MockLLMClient works seamlessly with the extraction pipeline:

from llm_answer_watcher.config.schema import Brands
from llm_answer_watcher.extractor.parser import parse_answer

# Create mock client
client = MockLLMClient(
    responses={"best CRM": "1. HubSpot\n2. Salesforce\n3. Warmly"}
)

# Generate answer
response = await client.generate_answer("best CRM")

# Test extraction
brands = Brands(mine=["Warmly"], competitors=["HubSpot", "Salesforce"])
extraction = parse_answer(response.answer_text, brands)

assert extraction.appeared_mine is True
assert len(extraction.my_mentions) == 1
assert len(extraction.competitor_mentions) == 2

Streaming Support

MockLLMClient supports optional streaming for testing streaming workflows:

chunks = []

client = MockLLMClient(
    responses={"test": "Hello world from LLM"},
    streaming_chunk_size=5,  # Stream in 5-char chunks
    streaming_delay_ms=10  # 10ms delay between chunks
)

response = await client.generate_answer(
    "test",
    on_chunk=lambda chunk: chunks.append(chunk)
)

# Chunks received during streaming
assert chunks == ['Hello', ' worl', 'd fro', 'm LLM']

# Full response still returned
assert response.answer_text == "Hello world from LLM"

ChaosLLMClient

Basic Usage

The ChaosLLMClient wraps any LLMClient and probabilistically injects failures:

from llm_answer_watcher.llm_runner.chaos_client import ChaosLLMClient

# Wrap a base client (e.g., MockLLMClient)
base = MockLLMClient(responses={"test": "answer"})

chaos = ChaosLLMClient(
    base_client=base,
    success_rate=0.7,  # 70% success, 30% failure
    rate_limit_prob=0.1,  # 10% chance of 429 error
    server_error_prob=0.1,  # 10% chance of 5xx error
    timeout_prob=0.05,  # 5% chance of timeout
    auth_error_prob=0.05,  # 5% chance of 401 error
    seed=42  # Optional: reproducible chaos
)

# May succeed or fail
try:
    response = await chaos.generate_answer("test")
    print("Success!")
except RuntimeError as e:
    print(f"Chaos injected: {e}")

Factory Function

Use create_chaos_client() for balanced error distribution:

from llm_answer_watcher.llm_runner.chaos_client import create_chaos_client

chaos = create_chaos_client(
    base_client=base,
    failure_rate=0.3,  # 30% overall failures
    seed=42
)

# Failures distributed evenly:
# - 7.5% rate limit (429)
# - 7.5% server errors (500/502/503)
# - 7.5% timeout
# - 7.5% auth error (401)

Testing Retry Logic

Validate your retry logic handles transient failures:

# High failure rate to force retries
chaos = ChaosLLMClient(
    base_client=base,
    success_rate=0.3,  # 70% failure rate
    seed=42
)

# Retry loop
max_attempts = 3
for attempt in range(max_attempts):
    try:
        response = await chaos.generate_answer("test")
        break  # Success!
    except RuntimeError as e:
        if attempt == max_attempts - 1:
            raise  # Give up after max attempts
        # Otherwise retry

Reproducible Chaos

Use seed for deterministic test runs:

# Two clients with same seed produce identical behavior
chaos1 = ChaosLLMClient(base_client=base, success_rate=0.5, seed=123)
chaos2 = ChaosLLMClient(base_client=base, success_rate=0.5, seed=123)

# Same sequence of successes/failures
for i in range(10):
    result1 = await chaos1.generate_answer("test")
    result2 = await chaos2.generate_answer("test")
    # Both succeed or both fail identically

Error Types Injected

ChaosLLMClient injects realistic errors:

Error Type Status Code Description Retryable?
Rate Limit 429 Too many requests Yes
Server Error 500/502/503 Server-side issues Yes
Timeout - Network timeout Yes
Auth Error 401 Invalid API key No

Best Practices

1. Use MockLLMClient for Logic Tests

Test extraction, parsing, and business logic:

def test_brand_detection():
    client = MockLLMClient(
        responses={"test": "Warmly and HubSpot are great tools."}
    )
    # Test extraction logic

2. Use ChaosLLMClient for Resilience Tests

Test error handling and retry logic:

def test_retry_on_rate_limit():
    chaos = ChaosLLMClient(
        base_client=base,
        rate_limit_prob=1.0  # Always 429
    )
    # Test retry behavior

3. Avoid HTTP Mocking

Instead of:

# ❌ Brittle HTTP mocking
httpx_mock.add_response(
    url="https://api.openai.com/...",
    json={"choices": [{"message": {"content": "..."}}]}
)

Use:

# ✅ Clean protocol-based mocking
client = MockLLMClient(responses={"prompt": "answer"})

4. Test Statistical Distribution

For chaos testing, validate statistical properties:

successes = 0
failures = 0
trials = 1000

chaos = ChaosLLMClient(base_client=base, success_rate=0.7, seed=42)

for _ in range(trials):
    try:
        await chaos.generate_answer("test")
        successes += 1
    except RuntimeError:
        failures += 1

success_rate = successes / trials
assert 0.65 <= success_rate <= 0.75  # Allow 5% tolerance

Migration from HTTP Mocking

Before (pytest-httpx)

def test_openai_client(httpx_mock):
    httpx_mock.add_response(
        method="POST",
        url="https://api.openai.com/v1/chat/completions",
        json={
            "choices": [{"message": {"content": "test answer"}}],
            "usage": {"total_tokens": 100}
        }
    )

    client = OpenAIClient(...)
    response = await client.generate_answer("test")
    assert response.answer_text == "test answer"

After (MockLLMClient)

def test_extraction_pipeline():
    client = MockLLMClient(responses={"test": "test answer"})

    response = await client.generate_answer("test")
    assert response.answer_text == "test answer"

    # Now test the entire pipeline
    extraction = parse_answer(response.answer_text, brands)
    # ... test extraction logic

See Also