Diogenes
"I am looking for an honest man." — Diogenes of Sinope
Diogenes is a Ruby gem that helps engineering teams make and defend decisions about when AI belongs in a feature — and when it doesn't.
It encodes a responsible AI decision framework directly into your Rails application as executable, auditable gates, and then actively monitors your AI features in production through grounding verification, document drift detection, and a regression-aware eval runner.
A mounted dashboard surfaces all of this in one place. Think Sidekiq Web for AI accountability.
The Problem
Most teams make two kinds of mistakes with AI features:
Mistake one: Deciding to build them based on excitement or pressure rather than defensible criteria. When something goes wrong, nobody can explain why the decision was made or what safeguards were in place.
Mistake two: Shipping them and assuming they continue to work. AI features degrade silently — documents go stale, retrieval quality drifts, models change. Traditional monitoring misses all of it because wrong-but-fluent outputs don't raise exceptions.
Diogenes addresses both.
Installation
Add the gem to your Gemfile:
gem 'diogenes'
Then install it, run the install generator, and migrate the database:
bundle install
rails generate diogenes:install
bundle exec rake db:migrate
What Diogenes Does
1. The Decision Framework (Gates)
Before an AI feature can serve output to a user, it must pass a set of declared gates. Gates are validated at boot — misconfiguration fails loudly before anything reaches production.
class SupportAssistant
  include Diogenes::Feature
  gate :failure_mode, severity: :recoverable
  gate :user_calibration, audience: :trained_agent
  gate :human_in_loop, verified: true, max_daily_reviews: 80
  gate :observability, logging: :full, alerting: :enabled
  gate :necessity, alternatives_considered: true
  def answer(query, agent:)
    # your implementation
  end
end
A feature that cannot satisfy a gate raises Diogenes::UnsafeFeatureError at boot with a plain-English explanation of what needs to change.
The five gates: :failure_mode, :user_calibration, :human_in_loop, :observability, :necessity. See docs/framework.md for full documentation.
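To make the boot-time behaviour concrete, here is a minimal, self-contained sketch of how a declarative gate DSL can record declarations and fail loudly with every problem at once. It is not Diogenes's implementation: `GateSketch`, `RULES`, `validate_gates!`, and the local `UnsafeFeatureError` are hypothetical stand-ins, and the two rules shown are illustrative assumptions rather than the framework's real criteria.

```ruby
# Stand-in for Diogenes::UnsafeFeatureError (hypothetical, for this sketch only).
class UnsafeFeatureError < StandardError; end

# Minimal model of a declarative gate DSL; not the gem's implementation.
module GateSketch
  REQUIRED_GATES = %i[failure_mode user_calibration human_in_loop observability necessity].freeze

  # Illustrative rules: each returns nil when satisfied, or a plain-English problem.
  RULES = {
    failure_mode: ->(opts) { "failure_mode must declare a severity" unless opts.key?(:severity) },
    necessity: ->(opts) { "necessity requires alternatives_considered: true" unless opts[:alternatives_considered] == true }
  }.freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Gate declarations recorded per class: gate name => options.
    def gates
      @gates ||= {}
    end

    # Class macro used in the feature body, e.g. `gate :necessity, ...`.
    def gate(gate_name, **opts)
      gates[gate_name] = opts
    end

    # Meant to run once at boot; collects every problem before raising.
    def validate_gates!
      problems = (REQUIRED_GATES - gates.keys).map { |g| "missing gate :#{g}" }
      gates.each do |gate_name, opts|
        problem = RULES[gate_name]&.call(opts)
        problems << problem if problem
      end
      raise UnsafeFeatureError, "#{name}: #{problems.join('; ')}" unless problems.empty?

      true
    end
  end
end

# Usage: a boot hook would call validate_gates!; here we call it directly.
class IncompleteAssistant
  include GateSketch
  gate :failure_mode, severity: :recoverable
  gate :necessity, alternatives_considered: false
end

begin
  IncompleteAssistant.validate_gates!
rescue UnsafeFeatureError => e
  # Lists the three missing gates and the failing necessity rule.
  puts e.message
end
```

Collecting all problems before raising mirrors the "fails loudly" goal: a developer sees every unmet gate in one boot attempt instead of fixing them one restart at a time.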
2. Grounding Verification
For RAG pipelines, Diogenes ships a grounding verifier that runs a second LLM pass to check that AI output is actually supported by retrieved context — not confabulated.
class SupportAssistant
  include Diogenes::Feature
  include Diogenes::Grounding
  # gate declarations as in the previous example
  verify_grounding threshold: 0.8, on_failure: :flag_for_review
  def answer(query, agent:)
    # retriever and llm stand in for your app's own retrieval and LLM clients
    context = retriever.retrieve(query)
    response = llm.complete(query, context: context)
    verify_and_return(response, context: context, reviewed_by: agent)
  end
end
The verifier returns a structured verdict — which claims are supported, unsupported, or contradicted by the retrieved context — and acts on it according to your configuration. Flag rates and verdicts are tracked in the audit log and surfaced in the dashboard.
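One plausible way to turn a claim-level verdict into a serve-or-flag decision is to score the fraction of claims that are supported and treat any contradiction as an automatic failure. The sketch below assumes exactly that scoring rule; the `Verdict` struct and `grounding_action` helper are hypothetical, and Diogenes's real scoring may differ.

```ruby
# Hypothetical structured verdict: claim strings grouped by category.
Verdict = Struct.new(:supported, :unsupported, :contradicted, keyword_init: true) do
  def total
    supported.size + unsupported.size + contradicted.size
  end

  # Assumed score: fraction of claims backed by retrieved context.
  # A response with no checkable claims scores 0.0 (treated as ungrounded).
  def score
    return 0.0 if total.zero?

    supported.size.to_f / total
  end
end

# Decides what to do with a response, given a threshold and failure policy.
# Assumption: any contradicted claim fails regardless of the score.
def grounding_action(verdict, threshold: 0.8, on_failure: :flag_for_review)
  return on_failure if verdict.contradicted.any?

  verdict.score >= threshold ? :serve : on_failure
end

verdict = Verdict.new(supported: %w[a b c], unsupported: %w[d e], contradicted: [])
grounding_action(verdict) # score 0.6 is below 0.8, so the response is flagged
```

Short-circuiting on contradictions reflects that a fluent answer which conflicts with the source is worse than one that merely adds unsupported detail.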
Configure any LLM callable as the verifier backend — Diogenes has no opinion on which one:
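Diogenes.configure do |config|
  # Any callable that takes the prompt works. The client call below is a
  # placeholder; substitute the method your LLM SDK actually exposes.
  config.grounding.verifier_llm = ->(prompt) { Anthropic::Client.new.complete(prompt) }
end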
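Because the backend is just a callable, the verification round trip can be exercised offline with a stub. The sketch below assumes the verifier prompt asks for JSON with `supported`, `unsupported`, and `contradicted` arrays; that response format and the `build_grounding_prompt` and `verify_with` helpers are hypothetical illustrations of the callable contract, not Diogenes's actual prompt or parser.

```ruby
require "json"

# Hypothetical prompt: asks the verifier to classify each claim against context.
def build_grounding_prompt(response, context)
  <<~PROMPT
    Classify every factual claim in RESPONSE as supported, unsupported,
    or contradicted by CONTEXT. Reply with JSON containing the keys
    "supported", "unsupported", and "contradicted" (arrays of strings).

    CONTEXT:
    #{context.join("\n---\n")}

    RESPONSE:
    #{response}
  PROMPT
end

# Works with any object responding to #call(prompt) that returns a JSON string,
# such as a lambda assigned to config.grounding.verifier_llm.
def verify_with(verifier_llm, response, context)
  parsed = JSON.parse(verifier_llm.call(build_grounding_prompt(response, context)))
  %w[supported unsupported contradicted].to_h { |key| [key.to_sym, Array(parsed[key])] }
end

# Offline stub backend, useful in tests where no network calls should happen.
stub_llm = lambda do |_prompt|
  '{"supported": ["Refunds are requested from the billing page"], "unsupported": [], "contradicted": []}'
end

verify_with(stub_llm, "Request refunds from the billing page.", ["Refunds are requested from the billing page."])
```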
3. Drift Detection
Documents get indexed once and go stale. Policies change, prices change, features change. Diogenes tracks when source documents were last updated versus when their embeddings were created, surfaces a staleness score, and can trigger re-indexing automatically.
# Inform Diogenes that a source document has changed
Diogenes::Drift.source_updated(
  document_id: 'refund-policy-v2',
  updated_at: Time.current,
  diff_size: :major
)
# config/initializers/diogenes.rb
Diogenes.configure do |config|
  config.drift.reindex_job = ReindexDocumentJob
  config.drift.staleness_thresholds = { warning: 7.days, critical: 30.days }
  config.drift.alert_webhook = ENV['DIOGENES_ALERT_WEBHOOK']
end
Stale documents surface in the dashboard drift tab, ranked by severity. Re-indexing can be triggered with one click or via a Rake task; either way, it enqueues the job configured as config.drift.reindex_job.
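The staleness check reduces to comparing two timestamps and ranking the results. This sketch assumes a document is stale only once its source changed after its embeddings were built, with age counted from that source update and classified against thresholds like the `warning: 7.days, critical: 30.days` example. It uses plain seconds instead of ActiveSupport durations so it runs outside Rails, and `staleness` and `rank_by_severity` are hypothetical names.

```ruby
DAY = 24 * 60 * 60 # seconds in a day

# Mirrors { warning: 7.days, critical: 30.days } without ActiveSupport.
THRESHOLDS = { warning: 7 * DAY, critical: 30 * DAY }.freeze

# Ordering used when ranking documents, most severe first.
SEVERITY = { critical: 3, warning: 2, stale: 1, fresh: 0 }.freeze

# Classifies one document as :fresh, :stale, :warning, or :critical.
def staleness(source_updated_at:, embedded_at:, now: Time.now, thresholds: THRESHOLDS)
  return :fresh if embedded_at >= source_updated_at

  age = now - source_updated_at # Float seconds since the source changed
  if age >= thresholds[:critical]
    :critical
  elsif age >= thresholds[:warning]
    :warning
  else
    :stale
  end
end

# Annotates each document hash with its level and sorts most severe first,
# breaking ties by the oldest source update.
def rank_by_severity(docs, now: Time.now)
  docs
    .map { |d| d.merge(level: staleness(source_updated_at: d[:source_updated_at], embedded_at: d[:embedded_at], now: now)) }
    .sort_by { |d| [-SEVERITY[d[:level]], d[:source_updated_at]] }
end
```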
4. Eval Runner
One of the hardest problems in production AI is knowing whether a feature is getting better or worse over time. Diogenes ships a lightweight eval framework: define golden question/answer pairs, run them on a schedule, track pass rates over time, and alert on regressions.
# test/diogenes/evals/support_assistant_evals.rb
Diogenes::Evals.define(SupportAssistant) do
  eval "basic refund question" do
    query "How do I request a refund?"
    expects all_of(
      grounded_in("refund-policy"),
      contains("billing page"),
      does_not_contain("24 hours")
    )
  end
  eval "question with no good answer" do
    query "What is the API rate limit for legacy v1 endpoints?"
    expects one_of(
      low_confidence_response,
      routes_to_human_review
    )
  end
end
bundle exec rake "diogenes:evals:run[SupportAssistant]"  # quotes stop shells like zsh from globbing the brackets
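Expectations such as `all_of`, `contains`, and `does_not_contain` compose naturally as predicates over a response. The sketch below is a hypothetical, standalone model of that composition using lambdas; it is not the gem's DSL. It treats `one_of` as "at least one matcher passes" (an assumption), omits `grounded_in`, `low_confidence_response`, and `routes_to_human_review` because they depend on retrieval and routing metadata, and `pass_rate` is an illustrative helper.

```ruby
# Each matcher is a lambda: response String -> true/false.
def contains(text)
  ->(response) { response.include?(text) }
end

def does_not_contain(text)
  ->(response) { !response.include?(text) }
end

# Every matcher must pass.
def all_of(*matchers)
  ->(response) { matchers.all? { |m| m.call(response) } }
end

# Assumption: one_of means "at least one matcher passes".
def one_of(*matchers)
  ->(response) { matchers.any? { |m| m.call(response) } }
end

# Fraction of evals whose matcher accepts the feature's response.
# evals: { name => [query, matcher] }; feature: any callable taking a query.
def pass_rate(evals, feature)
  return 0.0 if evals.empty?

  passed = evals.count { |_name, (query, matcher)| matcher.call(feature.call(query)) }
  passed.to_f / evals.size
end

refund = all_of(contains("billing page"), does_not_contain("24 hours"))
refund.call("Go to the billing page within 24 hours.") # fails: forbidden phrase present
```

Tracking `pass_rate` per scheduled run yields the time series that regression alerts are computed from.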
When a passing eval starts failing, Diogenes records the regression point and diffs the last passing response against the first failing one. Where possible, it correlates the regression with a stale document in the drift tracker.
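Locating the regression point is a backward scan over an eval's run history. The sketch assumes each run is recorded as a hash with `:at`, `:passed`, and `:response` keys (a hypothetical shape) and returns the last passing run paired with the first failing run after it. The `response_diff` helper is a naive line-set comparison, not the diff Diogenes uses.

```ruby
# runs: chronological array of hashes with :at, :passed, :response (assumed shape).
# Returns the most recent pass -> fail transition, or nil if the eval is
# currently passing or has never passed.
def regression_point(runs)
  return nil if runs.empty? || runs.last[:passed]

  runs.each_cons(2).reverse_each do |before, after|
    return { last_pass: before, first_fail: after } if before[:passed] && !after[:passed]
  end
  nil
end

# Naive line-level diff: lines dropped from and added to the response.
def response_diff(last_pass, first_fail)
  old_lines = last_pass[:response].lines.map(&:chomp)
  new_lines = first_fail[:response].lines.map(&:chomp)
  { removed: old_lines - new_lines, added: new_lines - old_lines }
end
```

In this example the added line ("Refunds take 24 hours.") is the kind of signal that could be matched against a recently changed source document.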
5. The Dashboard
Mount the Diogenes engine to get a live view of all of the above in one place:
# config/routes.rb
# `authenticate` is Devise's route helper; use your own constraint if you don't use Devise
authenticate :user, ->(u) { u.admin? } do
  mount Diogenes::Engine => '/diogenes'
end
The overview tab shows one row per gated feature — gates declared, grounding flag rate, drift score, and eval pass rate. A feature that is passing all its gates but has 11 stale documents and a declining eval pass rate is visible before it becomes a production incident.
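The value of one row per feature is that each signal can look healthy on its own. As a hypothetical illustration of that reasoning (the field names and the "declining" rule are assumptions, not the dashboard's actual logic), a row can report why it is at risk even when every gate passes.

```ruby
# One overview row per gated feature (hypothetical field names).
Row = Struct.new(:feature, :gates_passing, :stale_docs, :pass_rates, keyword_init: true)

# Assumption: "declining" means the latest pass rate is below the mean of earlier runs.
def declining?(rates)
  return false if rates.size < 2

  earlier = rates[0...-1]
  rates.last < earlier.sum / earlier.size.to_f
end

# Lists the reasons a row needs attention; empty means nothing to flag.
def risk_reasons(row)
  reasons = []
  reasons << "gates failing" unless row.gates_passing
  reasons << "#{row.stale_docs} stale documents" if row.stale_docs.positive?
  reasons << "declining eval pass rate" if declining?(row.pass_rates)
  reasons
end

row = Row.new(feature: "SupportAssistant", gates_passing: true, stale_docs: 11, pass_rates: [0.95, 0.93, 0.82])
risk_reasons(row)
```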
See docs/dashboard.md for the full dashboard documentation including route structure, controller layout, and configuration reference.
The Audit Trail
Every AI call made through a Diogenes-gated feature produces an audit record:
Diogenes::AuditLog.for_feature(SupportAssistant)
# => [
#   {
#     feature: "SupportAssistant",
#     gate_config: { failure_mode: :recoverable, ... },
#     query_hash: "sha256:...",
#     context_sources: ["refund-policy.md", "enterprise-terms.md"],
#     grounding: { supported: [...], unsupported: [], contradicted: [] },
#     verified_by: "agent@company.com",
#     timestamp: 2024-01-15 14:23:01 UTC
#   }
# ]
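Audit records store a hash of the query rather than its raw text, so user-submitted content never enters the audit log directly. Other fields, such as the reviewer identity in verified_by, are stored as given. The host app controls content storage and retention.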
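The query_hash field in the record above has the form sha256:.... A minimal sketch of producing such a value with Ruby's standard `Digest` and `OpenSSL` libraries follows. The hex encoding, whitespace normalization, optional secret key, and the `audit_query_hash` name are assumptions, not confirmed details of Diogenes. A key matters because a plain SHA-256 of a short, common question can be reversed by hashing likely candidates.

```ruby
require "digest"
require "openssl"

# Hashes a query for the audit log so raw text is never stored.
# Assumptions: hex digest with a "sha256:" prefix and an optional secret key.
def audit_query_hash(query, secret: nil)
  normalized = query.strip # assumption: surrounding whitespace is not significant
  digest =
    if secret
      # Keyed hash (HMAC-SHA256) resists dictionary attacks on guessable queries.
      OpenSSL::HMAC.hexdigest("SHA256", secret, normalized)
    else
      Digest::SHA256.hexdigest(normalized)
    end
  "sha256:#{digest}"
end

audit_query_hash("How do I request a refund?", secret: ENV.fetch("AUDIT_HASH_KEY", "dev-only-key"))
```

`AUDIT_HASH_KEY` above is a hypothetical environment variable; the key would need to stay stable for hashes of the same query to remain comparable across records.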
Philosophy
Diogenes takes no position on whether AI is good or bad for your product. It takes one strong position: that decision should be made deliberately, defensibly, and with receipts.
A feature that passes all five gates and fails in production is a recoverable engineering problem. A feature that never asked the questions is a different kind of problem entirely.
See docs/framework.md for the full decision framework and docs/examples.md for two worked examples — one that passes, one that doesn't.
Contributing
See docs/contributing.md.
License
MIT