Troubleshooting OSpec Projects
Common issues, debugging techniques, and solutions for OSpec projects
IntermediateTroubleshooting OSpec Projects
This guide helps you diagnose and resolve common issues when working with OSpec projects.
Quick Diagnostics Checklist
Before diving into specific issues, run through this checklist:
- OSpec document validates against schema
- All required secrets are configured
- Agent roles have appropriate permissions
- Network connectivity to external services
- Sufficient compute/storage resources available
- Dependencies and versions compatible
Common Issues and Solutions
1. OSpec Validation Failures
Issue: Schema Validation Errors
# ❌ Invalid OSpec - missing required fields
ospec_version: "1.0.0"
name: "My Project" # Missing 'id' and 'description'
outcome_type: "web-app"
Solution: Ensure all required fields are present:
# ✅ Valid OSpec
ospec_version: "1.0.0"
id: "my-project" # Added required id
name: "My Project"
description: "A sample project" # Added required description
outcome_type: "web-app"
acceptance: {} # Added required acceptance section
stack: {} # Added required stack section
Issue: Invalid Outcome Type
Error: Unknown outcome_type 'web-service'.
Valid types: web-app, api, cli, game, library, ml-pipeline, mobile-app, desktop-app, infrastructure, documentation
Solution: Use a supported outcome type or define a custom one:
# Option 1: Use supported type
outcome_type: "api" # Instead of 'web-service'
# Option 2: Define custom type (requires plugin)
outcome_type: "web-service"
plugins:
- "@company/web-service-plugin"
Issue: Malformed YAML Syntax
Error: Invalid YAML syntax at line 15:
Expected ':' after key but found '|'
Solution: Check YAML syntax:
# ❌ Invalid YAML
description: |
This is a multiline
description # Missing quotes for special characters
# ✅ Valid YAML
description: |
This is a multiline
description with proper indentation
2. Agent Assignment and Execution Issues
Issue: No Available Agents
Error: No agents available with capabilities: [python, fastapi, postgresql]
Solution: Check agent availability and requirements:
# List available agents
ospec agents list
# Check specific agent capabilities
ospec agents describe agent-123
# Reduce capability requirements if too restrictive
# Make requirements more flexible
requirements:
agents:
implementer:
capabilities: ["python"] # Remove specific framework requirements
preferred: ["fastapi", "postgresql"] # Mark as preferred instead
Issue: Agent Timeout
Error: Agent agent-123 timed out after 30 minutes
Last status: Implementing authentication system
Solution: Increase timeout or simplify scope:
# Increase timeout for complex tasks
timeouts:
implementation_minutes: 60 # Default is 30
# Or break down complex tasks
tasks:
- id: "setup-auth-structure"
estimated_hours: 1
- id: "implement-user-model"
estimated_hours: 2
- id: "add-login-endpoints"
estimated_hours: 2
3. Secret and Configuration Issues
Issue: Missing Secrets
Error: Required secret 'DATABASE_URL' not found in provider 'aws-secrets-manager'
Solution: Verify secret exists and permissions:
# Check if secret exists
aws secretsmanager describe-secret --secret-id DATABASE_URL
# Verify agent has access permissions
aws secretsmanager get-secret-value --secret-id DATABASE_URL
# Alternative: Use different secret provider
secrets:
provider: "environment" # Fallback to env vars
required:
- "DATABASE_URL"
- "API_KEY"
Issue: Secret Provider Configuration
Error: Unable to authenticate with secret provider 'vault'
Connection refused to vault.company.com:8200
Solution: Check provider configuration:
# Fix Vault configuration
secrets:
provider: "vault"
config:
url: "https://vault.company.com:8200" # Ensure HTTPS and correct port
auth_method: "aws" # or appropriate auth method
mount_path: "secret/ospec"
4. Dependency and Environment Issues
Issue: Conflicting Dependencies
Error: Cannot resolve dependency conflict:
react@17.0.0 required by component-library
react@18.0.0 required by web-app
Solution: Use dependency overrides or update constraints:
# Pin specific versions
stack:
frontend: "React@18"
components: "component-library@^2.0.0" # Version that supports React 18
# Or override dependencies
dependency_overrides:
react: "18.0.0"
Issue: Environment Setup Failure
Error: Failed to install Node.js 18.16.0
Platform linux/arm64 not supported
Solution: Check platform compatibility:
# Specify compatible platforms
deployment:
platforms: ["linux/amd64"] # Exclude unsupported platforms
# Or use different stack
stack:
runtime: "Node.js@18" # Let system choose compatible version
preferred_version: "18.16.0"
5. Network and Connectivity Issues
Issue: External API Timeouts
Error: Timeout connecting to api.external-service.com
Request timed out after 30 seconds
Solution: Adjust timeout settings and add retries:
# Configure network settings
network:
timeouts:
connect_seconds: 10
read_seconds: 60
retries:
max_attempts: 3
backoff_factor: 2
Issue: Firewall/Security Group Blocking
Error: Connection refused to database.internal.com:5432
Solution: Check network security rules:
# Test connectivity
telnet database.internal.com 5432
# For AWS, check security groups
aws ec2 describe-security-groups --group-ids sg-123456789
# Document network requirements
infrastructure:
network_access_required:
- destination: "database.internal.com"
port: 5432
protocol: "TCP"
- destination: "api.external.com"
port: 443
protocol: "HTTPS"
6. Resource and Performance Issues
Issue: Out of Memory
Error: Process killed due to memory limit exceeded
Agent used 2.1GB, limit was 2GB
Solution: Increase resource limits:
# Increase resource allocation
resources:
memory_gb: 4 # Increase from default 2GB
cpu_cores: 2
# Or optimize for memory usage
optimize_for: "memory" # vs "speed"
Issue: Slow Performance
Warning: Task 'run-tests' took 45 minutes (expected 10 minutes)
Solution: Optimize resource allocation and parallelization:
# Optimize performance settings
performance:
parallel_tasks: 4 # Run tasks in parallel when possible
cache_dependencies: true
optimize_builds: true
# Or adjust expectations
timeouts:
test_execution_minutes: 60
7. Testing and Validation Issues
Issue: Acceptance Criteria Failures
Test failed: HTTP endpoint /api/users returned 404, expected 200
Solution: Debug endpoint implementation:
# Check if service is running
curl -v http://localhost:8000/api/users
# Review service logs
ospec logs --service api-service --tail 100
# Verify routing configuration
# Add debugging information to acceptance criteria
acceptance:
http_endpoints:
- path: "/api/users"
method: "GET"
expected_status: 200
timeout_seconds: 30
debug: true # Enable detailed debugging
Issue: Test Environment Inconsistencies
Error: Tests pass locally but fail in CI environment
Missing environment variable NODE_ENV
Solution: Ensure environment parity:
# Define consistent test environment
testing:
environment:
NODE_ENV: "test"
DATABASE_URL: ""
# Validate environment setup
pre_test_checks:
- "environment_variables_present"
- "database_connectivity"
- "external_services_available"
8. Deployment Issues
Issue: Deployment Rollback
Error: Deployment failed health checks
Rolling back to previous version...
Solution: Improve health checks and deployment strategy:
# Better health check configuration
deployment:
health_checks:
path: "/health"
interval_seconds: 30
timeout_seconds: 10
retries: 3
initial_delay_seconds: 60 # Allow startup time
# Safer deployment strategy
strategy: "rolling"
rollback:
automatic: true
max_unavailable: "25%"
Issue: Container Build Failures
Error: Docker build failed
unable to resolve package dependencies
Solution: Fix Docker configuration:
# Use specific base image versions
FROM node:18.16.0-alpine
# Set proper working directory
WORKDIR /app
# Copy dependency files first for better caching
COPY package*.json ./
RUN npm ci --only=production
# Copy source code
COPY . .
# Use multi-stage builds for optimization
Debugging Techniques
1. Enable Verbose Logging
# Add debug configuration to OSpec
debug:
level: "verbose" # off, error, warn, info, verbose, debug
log_agent_actions: true
log_api_calls: true
save_intermediate_files: true
2. Use Interactive Debugging
# Start interactive debugging session
ospec debug --project-id proj_123 --agent-id agent_456
# Step through agent actions
> step
> inspect variables
> view logs
> continue
3. Examine Agent Context
# View agent's current context
ospec agents context --agent-id agent_123
# See what the agent "sees"
ospec agents vision --agent-id agent_123 --file src/app.py
4. Test Components Independently
# Test individual components
testing:
component_tests:
- component: "database"
test: "SELECT 1"
expected: "success"
- component: "api"
test: "GET /health"
expected: "200 OK"
- component: "auth"
test: "validate_token"
expected: "valid"
Recovery Procedures
1. Agent Recovery
# If agent becomes unresponsive
ospec agents restart --agent-id agent_123
# Or reassign to different agent
ospec projects reassign --project-id proj_123 --role implementer --agent-id agent_456
2. Project State Recovery
# Reset project to last known good state
ospec projects rollback --project-id proj_123 --checkpoint checkpoint_456
# Or restart from specific phase
ospec projects restart --project-id proj_123 --from-phase planning
3. Data Recovery
# Configure backup and recovery
disaster_recovery:
backup_frequency: "hourly"
retention_days: 30
recovery_procedures:
database:
- "stop_application"
- "restore_from_backup"
- "verify_data_integrity"
- "restart_application"
Prevention Strategies
1. Proactive Monitoring
# Set up monitoring and alerting
monitoring:
metrics:
- "agent_response_time"
- "deployment_success_rate"
- "test_failure_rate"
- "resource_utilization"
alerts:
- metric: "agent_response_time"
threshold: 300 # seconds
action: "notify_team"
- metric: "deployment_success_rate"
threshold: 0.95 # 95%
action: "escalate"
2. Regular Health Checks
# Schedule regular health checks
crontab -e
0 */6 * * * ospec health-check --all-projects
3. Documentation and Runbooks
# Document common procedures
documentation:
runbooks:
- name: "Agent Timeout Recovery"
location: "docs/runbooks/agent-timeout.md"
- name: "Deployment Rollback"
location: "docs/runbooks/deployment-rollback.md"
escalation:
- level: 1
contact: "dev-team-slack"
response_time_minutes: 30
- level: 2
contact: "engineering-manager"
response_time_minutes: 60
Getting Help
1. Community Resources
- Documentation: OSpec Docs
- GitHub Issues: Report bugs and request features
- Discord: Join community discussions
- Stack Overflow: Tag questions with
ospec
2. Diagnostic Information to Include
When seeking help, include:
# Generate diagnostic report
ospec diagnose --project-id proj_123 --output report.json
# Include:
# - OSpec version
# - Agent versions and capabilities
# - Error messages and stack traces
# - Relevant configuration sections
# - Steps to reproduce
3. Professional Support
For enterprise customers:
- Support Portal: Access to dedicated support team
- SLA Response Times: Guaranteed response times by severity
- Expert Consultation: Architecture and best practices guidance
- Training Programs: Team training and certification
Next Steps
Remember: Most issues can be prevented with careful specification design and thorough testing. When in doubt, start simple and add complexity gradually.