Qualspec
LLM-judged qualitative testing for Ruby. Evaluate AI agents, compare models, and test subjective qualities that traditional assertions can't capture.
Installation
gem "qualspec"
Configuration
Set your API key (OPEN_ROUTER_API_KEY also works as a fallback):
export QUALSPEC_API_KEY=your_openrouter_key
Environment Variables
| Variable | Description | Default |
|---|---|---|
QUALSPEC_API_KEY |
API key (required; falls back to OPEN_ROUTER_API_KEY) |
- |
QUALSPEC_API_URL |
API endpoint | https://openrouter.ai/api/v1 |
QUALSPEC_MODEL |
Default model for candidates | openrouter/auto |
QUALSPEC_JUDGE_MODEL |
Model used as judge | Same as QUALSPEC_MODEL |
QUALSPEC_MODELS_FILE |
Path to the named-models YAML | config/models.yml |
Models
The default model everywhere is openrouter/auto, which routes to a sensible
model for any request — so qualspec works even with nothing configured. A
candidate with no model: uses this default too.
Curated models live in config/models.yml and can be referenced by name:
candidate :flash, model: Qualspec.model(:deepseek_flash)
Qualspec.model(:glm) # => "z-ai/glm-5.2"
Qualspec.model(:unknown) # => "openrouter/auto" (falls back to default)
Qualspec.model # => "openrouter/auto"
Qualspec.models.all # => { "glm" => "z-ai/glm-5.2", ... }
Edit config/models.yml to add/rename models, or point QUALSPEC_MODELS_FILE
at your own. Override the process-wide default with QUALSPEC_MODEL.
Quick Start
Compare Models (CLI)
# eval/comparison.rb
Qualspec.evaluation "Model Comparison" do
candidates do
candidate "gpt4", model: "openai/gpt-4"
candidate "claude", model: "anthropic/claude-3-sonnet"
end
scenario "helpfulness" do
prompt "How do I center a div in CSS?"
eval "provides a working solution"
eval "explains the approach"
end
end
# Run comparison
qualspec eval/comparison.rb
# Generate HTML report
qualspec --html report.html eval/comparison.rb
Test Your Agent (RSpec)
require "qualspec/rspec"
RSpec.describe MyAgent do
include Qualspec::RSpec::Helpers
it "responds helpfully" do
response = MyAgent.call("Hello")
result = qualspec_evaluate(response, "responds in a friendly manner")
expect(result).to be_passing
end
end
Documentation
- Getting Started
- Evaluation Suites - CLI for model comparison (incl. cost/value tracking)
- RSpec Integration - Testing your agents
- Rubrics - Builtin and custom evaluation criteria
- Configuration - All options, models, cost tracking
- Recording - VCR integration
- Examples - Runnable scripts (replay free from cassettes)
License
MIT