Writing Effective OSpecs
Best practices for creating OSpecs that eliminate decision paralysis for AI coding agents
Intermediate · ⏱️ 30 minutes
A good OSpec eliminates the thousands of micro-decisions that slow down AI coding agents. This guide covers best practices for creating specifications that provide clear direction on what to build and which proven tech stacks to use.
Core Principles
1. Eliminate Decision Paralysis
The most important job of an OSpec is to eliminate the endless “what should I use?” questions that slow down AI coding agents:
Decision Paralysis Examples:
- “Postgres or SQLite or MongoDB?”
- “npm or yarn or pnpm?”
- “Express or FastAPI or Rails?”
- “React or Vue or Svelte?”
- “Tailwind or styled-components or CSS modules?”
OSpec Solution:
```yaml
stack:
  # Specific version
  frontend: "Next.js@14"
  # Proven combination that works well together
  backend: "Supabase"
  styling: "TailwindCSS"
  # Agent preference - no decision paralysis
  package_manager: "npm"
  # Integrated workflow
  deploy: "Vercel"

metadata:
  # Agent-specific optimization
  optimized_for: ["claude-code"]
  rationale: "This stack minimizes setup complexity and maximizes development velocity for web apps"
```
2. Outcome-First Thinking
Start with the end in mind. Define what success looks like before diving into implementation details.
Good Example:
name: "Customer Support Chatbot"
description: "AI-powered chatbot that handles common customer inquiries with 90%+ accuracy and escalates complex issues to human agents"
acceptance:
user_satisfaction: 4.2 # out of 5
resolution_rate: 0.9
escalation_rate: 0.15
response_time_ms: 2000
Poor Example:
name: "Python Chat Application"
description: "A chat app built with Python and React"
# Vague, technology-focused, no measurable outcomes
3. Testable Acceptance Criteria
Every outcome must be objectively verifiable. Use specific, measurable criteria.
Testable Criteria:
```yaml
acceptance:
  http_endpoints:
    - path: "/api/users"
      method: "GET"
      status: 200
      response_time_ms: 100
      response_schema: "schemas/user_list.json"
  user_flows:
    - name: "user_registration"
      steps:
        - visit_signup_page
        - enter_valid_email
        - enter_secure_password
        - click_signup_button
        - receive_confirmation_email
        - verify_email_address
      success_rate: 0.95
```
Non-Testable Criteria:
```yaml
acceptance:
  - "Users should be happy"
  - "The system should be fast"
  - "It should look good"
```
4. Right Level of Detail
Balance specificity with flexibility. Be specific about outcomes, flexible about implementation.
Good Balance:
```yaml
stack:
  database: "Relational (PostgreSQL preferred)"
  auth: "OAuth 2.0 compatible"
  api: "RESTful with OpenAPI spec"

acceptance:
  authentication:
    providers: ["Google", "GitHub"]
    session_duration_hours: 24
    password_requirements:
      min_length: 12
      require_mixed_case: true
      require_numbers: true
```
Too Rigid:
```yaml
stack:
  database: "PostgreSQL 14.9 exactly"
  orm: "SQLAlchemy 2.0.15"
  auth: "Flask-Login 0.6.2"
# Overly prescriptive, limits agent flexibility
```
Too Vague:
```yaml
stack:
  database: "Something modern"
  auth: "Whatever works"
# Too ambiguous for reliable automation
```
Acceptance Criteria Patterns
HTTP API Patterns
```yaml
acceptance:
  http_endpoints:
    # Basic endpoint validation
    - path: "/api/health"
      method: "GET"
      status: 200
      response_contains: ["status", "ok"]

    # Authenticated endpoint
    - path: "/api/profile"
      method: "GET"
      status: 200
      auth_required: true
      response_schema: "schemas/user_profile.json"

    # CRUD operations
    - path: "/api/posts"
      method: "POST"
      status: 201
      auth_required: true
      request_schema: "schemas/post_create.json"
      response_schema: "schemas/post_response.json"

    # Error handling
    - path: "/api/posts/invalid-id"
      method: "GET"
      status: 404
      response_contains: ["error", "Post not found"]
```
User Experience Flows
```yaml
acceptance:
  ux_flows:
    - name: "purchase_flow"
      description: "Complete e-commerce purchase"
      steps:
        - action: "browse_products"
          success_indicators: ["products_displayed", "search_works"]
        - action: "add_to_cart"
          success_indicators: ["cart_updated", "quantity_correct"]
        - action: "checkout"
          success_indicators: ["payment_form_displayed"]
        - action: "complete_payment"
          success_indicators: ["order_confirmation", "email_sent"]
      completion_time_max_seconds: 120
      success_rate_threshold: 0.95
      abandonment_rate_max: 0.1
```
Performance Requirements
```yaml
acceptance:
  performance:
    response_times:
      api_endpoints_p95_ms: 200
      page_load_p95_ms: 1500
      database_queries_p95_ms: 50
    throughput:
      concurrent_users: 1000
      requests_per_second: 500
      transactions_per_second: 100
    resource_usage:
      memory_usage_mb_max: 512
      cpu_usage_percent_max: 70
      disk_space_gb_max: 10
    availability:
      uptime_percentage: 99.9
      max_downtime_minutes_per_month: 43
```
Stack Selection Strategies
Technology Decision Framework
- Team Expertise: Match stack to team skills
- Project Requirements: Performance, scalability, security
- Time Constraints: Proven vs. experimental technologies
- Maintenance Burden: Long-term support considerations
- Integration Needs: Existing systems compatibility
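A useful habit is to record the output of this framework directly in the spec, so agents see not just the choice but the reasoning behind it. A minimal sketch, assuming a `metadata` block like the one shown earlier; the `decision_log` field and its values are illustrative, not part of any fixed schema:

```yaml
metadata:
  # Hypothetical field: captures why each stack choice was made
  decision_log:
    - decision: "PostgreSQL over MongoDB"
      driver: "team_expertise"    # the team already runs relational databases
      trade_off: "less schema flexibility, stronger consistency"
    - decision: "Auth0 over custom auth"
      driver: "time_constraints"  # a proven service beats building our own
      trade_off: "recurring cost and vendor dependency"
```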
Common Stack Patterns
Rapid Prototyping Stack
```yaml
stack:
  frontend: "Next.js@14"
  backend: "Supabase"
  styling: "TailwindCSS"
  auth: "Supabase Auth"
  deploy: "Vercel"
# Pros: Fast development, managed services, good defaults
# Cons: Vendor lock-in, less customization
# Best for: MVPs, small teams, quick validation
```
Enterprise Application Stack
```yaml
stack:
  frontend: "React@18 with TypeScript"
  backend: "Node.js with Express"
  database: "PostgreSQL"
  cache: "Redis"
  auth: "Auth0"
  monitoring: "Datadog"
  deploy: "AWS EKS"
# Pros: Scalable, customizable, enterprise support
# Cons: More complex, requires DevOps expertise
# Best for: Large teams, complex requirements, long-term projects
```
High-Performance API Stack
```yaml
stack:
  api: "Rust with Axum"
  database: "PostgreSQL with connection pooling"
  cache: "Redis Cluster"
  message_queue: "Apache Kafka"
  deploy: "Kubernetes"
  monitoring: "Prometheus + Grafana"
# Pros: Maximum performance, resource efficiency
# Cons: Steeper learning curve, fewer developers
# Best for: High-traffic APIs, resource-constrained environments
```
Mobile-First Stack
```yaml
stack:
  mobile: "React Native"
  backend: "Firebase"
  auth: "Firebase Auth"
  database: "Firestore"
  push_notifications: "Firebase Cloud Messaging"
  analytics: "Firebase Analytics"
# Pros: Cross-platform, integrated ecosystem
# Cons: Platform limitations, vendor lock-in
# Best for: Mobile-first apps, small teams, consumer apps
```
Guardrails & Quality Standards
Security-First Configuration
```yaml
guardrails:
  # Code quality
  tests_required: true
  min_test_coverage: 0.8
  lint: true
  type_check: true

  # Security scanning
  security_scan: true
  dependency_check: true
  secret_scan: true

  # License compliance
  license_whitelist: ["MIT", "Apache-2.0", "BSD-3-Clause"]

  # Human oversight for high-risk changes
  human_approval_required:
    - "authentication_changes"
    - "payment_processing"
    - "data_migration"
    - "production_deployment"
    - "security_configuration"

secrets:
  provider: "vault://production"
  required:
    - "DATABASE_PASSWORD"
    - "JWT_SECRET"
    - "API_KEYS"
  rotation_policy:
    database_password: 30  # days
    api_keys: 90
    jwt_secret: 365
```
Progressive Quality Gates
```yaml
# Different standards for different environments
guardrails:
  development:
    min_test_coverage: 0.6
    lint: "warn"
    security_scan: "warn"
  staging:
    min_test_coverage: 0.8
    lint: "error"
    security_scan: "error"
    performance_regression_threshold: 0.1
  production:
    min_test_coverage: 0.9
    lint: "block"
    security_scan: "block"
    human_approval_required: true
    rollback_capability: true
    monitoring_alerts: true
```
Documentation & Context
Agent-Friendly Documentation
Provide context that helps agents understand intent and constraints:
```yaml
prompts:
  specifier: |
    You are converting user requirements into OSpec format.
    Our team prefers TypeScript over JavaScript.
    We use PostgreSQL for all production databases.
    We have existing auth infrastructure with Auth0.
    Always include comprehensive error handling.

  implementer: |
    Follow our coding standards:
    - Use functional programming patterns where possible
    - Include JSDoc comments for all public functions
    - Write tests for all business logic
    - Use proper error types, not generic Error objects
    - Log all external API calls with request/response details

  reviewer: |
    Focus on these review areas:
    1. Security: Input validation, auth checks, data sanitization
    2. Performance: Database queries, caching, async operations
    3. Reliability: Error handling, retry logic, circuit breakers
    4. Maintainability: Code organization, documentation, tests
```
Context Files
Include additional context files:
```yaml
metadata:
  context_files:
    - "docs/architecture.md"
    - "docs/coding_standards.md"
    - "docs/security_guidelines.md"
    - "examples/reference_implementation.md"
  related_projects:
    - name: "User Service"
      repo: "github.com/company/user-service"
      integration_points: ["authentication", "user_profiles"]
    - name: "Payment Gateway"
      repo: "github.com/company/payment-gateway"
      integration_points: ["payment_processing", "webhooks"]
```
Common Patterns & Templates
Microservice Template
```yaml
ospec_version: "1.0.0"
id: "<service-name>-service"
name: "<Service Name> Service"
description: "Microservice for <domain> operations"
outcome_type: "api"

acceptance:
  http_endpoints:
    - path: "/health"
      status: 200
    - path: "/metrics"
      status: 200
    - path: "/<resource>"
      method: "GET"
      status: 200
      auth_required: true

stack:
  framework: "Express.js with TypeScript"
  database: "PostgreSQL"
  cache: "Redis"
  deploy: "Docker + Kubernetes"
  monitoring: "Prometheus + Grafana"

guardrails:
  tests_required: true
  min_test_coverage: 0.85
  security_scan: true
  human_approval_required: ["production_deploy"]
```
CLI Tool Template
```yaml
ospec_version: "1.0.0"
id: "<tool-name>-cli"
name: "<Tool Name> CLI"
description: "Command-line tool for <purpose>"
outcome_type: "cli"

acceptance:
  commands:
    - name: "<tool> --help"
      exit_code: 0
      output_contains: ["Usage:", "Options:"]
    - name: "<tool> --version"
      exit_code: 0
      output_format: "semver"
    - name: "<tool> <primary-command>"
      exit_code: 0
  performance:
    max_execution_time_ms: 5000

stack:
  language: "Rust"
  cli_framework: "clap"
  config: "serde with TOML"
  packaging: "cargo"

guardrails:
  tests_required: true
  min_test_coverage: 0.8
  cross_platform: true
  binary_size_limit_mb: 50
```
Validation & Testing
Continuous Validation
```yaml
# Validate your OSpec as you develop
validation:
  schema_check: true
  stack_compatibility: true
  acceptance_criteria_feasibility: true
  automated_checks:
    - "ospec validate outcome.yaml"
    - "ospec test-acceptance outcome.yaml"
    - "ospec estimate-effort outcome.yaml"
```
A/B Testing Specifications
```yaml
# Test different approaches
variants:
  - id: "react-frontend"
    stack:
      frontend: "React"
      bundler: "Webpack"
  - id: "vue-frontend"
    stack:
      frontend: "Vue.js"
      bundler: "Vite"
evaluation_criteria:
  - development_speed
  - bundle_size
  - performance_metrics
  - developer_satisfaction
```
Troubleshooting Common Issues
1. Vague Requirements
Problem: “Build a social media app”
Solution: Ask specific questions, then fold the answers back into the spec (see the sketch after this list):
- What are the core features? (posts, comments, DMs?)
- Who are the users? (consumers, businesses, specific demographics?)
- What’s the scale? (100 users or 100,000?)
- What makes this different from existing solutions?
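A hedged sketch of what the clarified spec might look like; every name and number below is illustrative and would come from the answers above:

```yaml
# Illustrative values only; fill these in from the answers above
name: "Creator Community App"
description: "Social app where independent creators share posts and comments; DMs are out of scope for v1"
metadata:
  target_users: "independent creators, consumer-facing"
  differentiator: "revenue sharing on sponsored posts"
acceptance:
  core_features: ["posts", "comments", "follows"]
  scale:
    launch_users: 100
    year_one_users: 100000
```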
2. Over-Engineering
Problem: Specifying complex architecture for simple needs
Solution: Start simple, add complexity incrementally
```yaml
# Start with this
stack:
  backend: "FastAPI"
  database: "SQLite"
  deploy: "Single server"

# Not this for a prototype
stack:
  backend: "Microservices with Kubernetes"
  database: "Distributed PostgreSQL cluster"
  message_queue: "Apache Kafka"
  service_mesh: "Istio"
```
3. Untestable Acceptance Criteria
Problem: “Users should love the interface”
Solution: Define measurable proxies
```yaml
acceptance:
  user_experience:
    user_satisfaction_score: 4.0  # out of 5
    task_completion_rate: 0.9
    average_session_duration_minutes: 15
    bounce_rate_max: 0.3
```
4. Technology Lock-in
Problem: Specifying exact versions without flexibility
Solution: Use ranges and alternatives
```yaml
stack:
  database:
    preferred: "PostgreSQL 14+"  # Allow newer versions
    alternatives: ["MySQL 8+", "SQLite for development"]
  frontend:
    preferred: "React 18+"
    alternatives: ["Vue.js 3+", "Svelte 4+"]
```
Advanced Techniques
Conditional Logic
```yaml
# Adapt based on context
stack:
  database: >
    {{#if team_experience.postgresql}}
      "PostgreSQL"
    {{else if team_experience.mongodb}}
      "MongoDB"
    {{else}}
      "SQLite"
    {{/if}}

acceptance:
  performance:
    response_time_ms: >
      {{#if expected_users > 10000}}
        50
      {{else if expected_users > 1000}}
        100
      {{else}}
        200
      {{/if}}
```
Environment-Specific Configuration
```yaml
environments:
  development:
    stack:
      database: "SQLite"
      deploy: "local"
    guardrails:
      tests_required: false
  staging:
    stack:
      database: "PostgreSQL"
      deploy: "Docker"
    guardrails:
      tests_required: true
      min_test_coverage: 0.7
  production:
    stack:
      database: "PostgreSQL with replicas"
      deploy: "Kubernetes"
    guardrails:
      tests_required: true
      min_test_coverage: 0.9
      human_approval_required: true
```
Dependency Management
```yaml
dependencies:
  internal:
    - name: "user-service"
      version: "^2.1.0"
      endpoint: "${USER_SERVICE_URL}"
  external:
    - name: "stripe-api"
      version: "2023-10-16"
      rate_limits:
        requests_per_second: 100
  optional:
    - name: "analytics-service"
      fallback: "local_logging"
      timeout_ms: 1000
```
Best Practices Summary
- Start with Outcomes: Define success criteria first
- Be Specific: Use measurable, testable requirements
- Allow Flexibility: Avoid over-constraining implementation
- Think Long-term: Consider maintenance and evolution
- Include Context: Help agents understand the “why”
- Validate Early: Check feasibility before implementation
- Iterate Frequently: Refine based on results
- Document Decisions: Explain trade-offs and constraints
- Plan for Failure: Include error handling and recovery
- Security First: Build in security from the beginning
Common Anti-Patterns to Avoid
- ❌ Technology-driven specifications (“Build with React”)
- ❌ Unmeasurable success criteria (“Make it user-friendly”)
- ❌ Over-prescriptive implementation details
- ❌ Ignoring non-functional requirements
- ❌ Single point of failure designs
- ❌ Hardcoded configuration values
- ❌ Missing error handling scenarios
- ❌ Inadequate test coverage requirements
- ❌ No consideration for monitoring/observability
- ❌ Unclear acceptance criteria
By following these guidelines, you’ll create OSpecs that consistently lead to successful, maintainable, and secure automated outcomes.