Diogenes

"I am looking for an honest man." — Diogenes of Sinope

Diogenes is a Ruby gem that helps engineering teams make and defend decisions about when AI belongs in a feature — and when it doesn't.

It encodes a responsible-AI decision framework into your Rails application as executable, auditable gates. It then monitors your AI features in production through grounding verification, document drift detection, and a regression-aware eval runner.

A mounted dashboard surfaces all of this in one place. Think Sidekiq Web for AI accountability.

The Problem

Most teams make two kinds of mistakes with AI features:

Mistake one: Deciding to build them based on excitement or pressure rather than defensible criteria. When something goes wrong, nobody can explain why the decision was made or what safeguards were in place.

Mistake two: Shipping them and assuming they continue to work. AI features degrade silently — documents go stale, retrieval quality drifts, models change. Traditional monitoring misses all of it because wrong-but-fluent outputs don't raise exceptions.

Diogenes addresses both.

Installation

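Add the gem to your application's Gemfile:

gem 'diogenes'

Then install it, run the install generator, and apply the migrations:

bundle install
rails generate diogenes:install
bundle exec rake db:migrate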

What Diogenes Does

1. The Decision Framework (Gates)

Before an AI feature can serve output to a user, it must pass a set of declared gates. Gates are validated at boot — misconfiguration fails loudly before anything reaches production.

class SupportAssistant
  include Diogenes::Feature

  gate :failure_mode,      severity: :recoverable
  gate :user_calibration,  audience: :trained_agent
  gate :human_in_loop,     verified: true, max_daily_reviews: 80
  gate :observability,     logging: :full, alerting: :enabled
  gate :necessity,         alternatives_considered: true

  def answer(query, agent:)
    # your implementation
  end
end

A feature that cannot satisfy a gate raises Diogenes::UnsafeFeatureError at boot with a plain-English explanation of what needs to change.

The five gates: :failure_mode, :user_calibration, :human_in_loop, :observability, :necessity. See docs/framework.md for full documentation.
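To make the boot-time behavior concrete, here is a minimal, self-contained sketch of how a class-level `gate` macro could record declarations and then fail loudly when a feature is misconfigured. It is not Diogenes's implementation: `GateSketch`, `REQUIRED_OPTIONS`, `validate!`, and the rule that each gate must declare specific options are all hypothetical. Only the five gate names and the idea of an `UnsafeFeatureError` come from the text above.

```ruby
# Hypothetical sketch of boot-time gate validation; not the Diogenes gem's code.
module GateSketch
  class UnsafeFeatureError < StandardError; end

  # Assumed rule set: the option(s) each of the five gates must declare.
  REQUIRED_OPTIONS = {
    failure_mode:     [:severity],
    user_calibration: [:audience],
    human_in_loop:    [:verified],
    observability:    [:logging],
    necessity:        [:alternatives_considered]
  }.freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Records a declaration; nothing is checked until validate! runs.
    def gate(gate_name, **options)
      gates[gate_name] = options
    end

    def gates
      @gates ||= {}
    end

    # Meant to run once at boot (for example from an initializer) per gated feature.
    def validate!
      problems = []
      missing = REQUIRED_OPTIONS.keys - gates.keys
      problems << "missing gates: #{missing.join(', ')}" unless missing.empty?

      gates.each do |gate_name, options|
        required = REQUIRED_OPTIONS[gate_name]
        if required.nil?
          problems << "unknown gate :#{gate_name}"
          next
        end
        absent = required - options.keys
        problems << "gate :#{gate_name} is missing #{absent.map { |o| ":#{o}" }.join(', ')}" unless absent.empty?
      end

      raise UnsafeFeatureError, "#{name} cannot ship: #{problems.join('; ')}" unless problems.empty?

      true
    end
  end
end

class GoodFeature
  include GateSketch

  gate :failure_mode,     severity: :recoverable
  gate :user_calibration, audience: :trained_agent
  gate :human_in_loop,    verified: true, max_daily_reviews: 80
  gate :observability,    logging: :full, alerting: :enabled
  gate :necessity,        alternatives_considered: true
end

class HalfBakedFeature
  include GateSketch

  gate :failure_mode,  severity: :recoverable
  gate :human_in_loop, max_daily_reviews: 80
end

GoodFeature.validate! # => true

begin
  HalfBakedFeature.validate!
rescue GateSketch::UnsafeFeatureError => e
  puts e.message
  # HalfBakedFeature cannot ship: missing gates: user_calibration, observability,
  # necessity; gate :human_in_loop is missing :verified
end
```

Separating declaration (`gate`) from checking (`validate!`) is what lets every problem surface in a single error at boot, rather than one at a time as requests arrive.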

2. Grounding Verification

For RAG pipelines, Diogenes ships a grounding verifier that runs a second LLM pass to check that AI output is actually supported by retrieved context — not confabulated.

class SupportAssistant
  include Diogenes::Feature
  include Diogenes::Grounding

  verify_grounding threshold: 0.8, on_failure: :flag_for_review

  def answer(query, agent:)
    context = retriever.retrieve(query)
    response = llm.complete(query, context: context)

    verify_and_return(response, context: context, reviewed_by: agent)
  end
end

The verifier returns a structured verdict — which claims are supported, unsupported, or contradicted by the retrieved context — and acts on it according to your configuration. Flag rates and verdicts are tracked in the audit log and surfaced in the dashboard.
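One plausible way to turn such a verdict into a decision under `verify_grounding threshold: 0.8, on_failure: :flag_for_review` is to score the fraction of claims the context supports and compare it with the threshold. The sketch below assumes exactly that scoring rule, and additionally assumes that any contradicted claim fails outright. `Verdict`, `support_ratio`, and `grounding_decision` are hypothetical names, not Diogenes's API.

```ruby
# Hypothetical sketch; the scoring rule is an assumption, not Diogenes's documented behavior.
Verdict = Struct.new(:supported, :unsupported, :contradicted, keyword_init: true) do
  def claim_count
    supported.size + unsupported.size + contradicted.size
  end

  # Fraction of claims backed by the retrieved context.
  def support_ratio
    return 1.0 if claim_count.zero? # assumed: a response with no claims has nothing to flag
    supported.size.fdiv(claim_count)
  end
end

# Returns :pass, or the configured on_failure action (e.g. :flag_for_review).
def grounding_decision(verdict, threshold:, on_failure:)
  return on_failure if verdict.contradicted.any? # assumed: any contradiction fails outright
  verdict.support_ratio >= threshold ? :pass : on_failure
end

strong = Verdict.new(supported: ["refunds start on the billing page"], unsupported: [], contradicted: [])
weak   = Verdict.new(supported: %w[claim1 claim2 claim3], unsupported: %w[claim4], contradicted: [])

grounding_decision(strong, threshold: 0.8, on_failure: :flag_for_review) # => :pass
grounding_decision(weak,   threshold: 0.8, on_failure: :flag_for_review) # => :flag_for_review (0.75 < 0.8)
```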

Configure any LLM callable as the verifier backend — Diogenes has no opinion on which one:

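Diogenes.configure do |config|
  # Any object responding to #call(prompt) works here.
  # The client call below is illustrative; substitute your own SDK's completion call.
  config.grounding.verifier_llm = ->(prompt) { Anthropic::Client.new.complete(prompt) }
end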

3. Drift Detection

Documents get indexed once and go stale. Policies change, prices change, features change. Diogenes tracks when source documents were last updated versus when their embeddings were created, surfaces a staleness score, and can trigger re-indexing automatically.

# Inform Diogenes that a source document has changed
Diogenes::Drift.source_updated(
  document_id: 'refund-policy-v2',
  updated_at: Time.current,
  diff_size: :major
)

# config/initializers/diogenes.rb
Diogenes.configure do |config|
  config.drift.reindex_job = ReindexDocumentJob
  config.drift.staleness_thresholds = { warning: 7.days, critical: 30.days }
  config.drift.alert_webhook = ENV['DIOGENES_ALERT_WEBHOOK']
end

Stale documents appear in the dashboard's drift tab, ranked by severity. You can queue your re-index job for a document with one click there, or from a Rake task.
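How the staleness score is computed is not specified above. A minimal sketch, assuming staleness is the time an embedding has lagged behind its source (zero if the embedding was created after the latest update), classified with the `warning`/`critical` thresholds from the initializer and ranked most severe first. `DocumentState`, `rank_stale`, and the extra `:stale` level for lag below the warning threshold are hypothetical. Plain seconds stand in for ActiveSupport's `7.days` so the example runs outside Rails.

```ruby
# Hypothetical staleness sketch; names and scoring are illustrative, not Diogenes's.
DAY = 24 * 60 * 60 # seconds; stands in for ActiveSupport's 1.day

# Mirrors config.drift.staleness_thresholds from the initializer above.
STALENESS_THRESHOLDS = { warning: 7 * DAY, critical: 30 * DAY }.freeze
SEVERITY_ORDER = { critical: 3, warning: 2, stale: 1, fresh: 0 }.freeze

DocumentState = Struct.new(:id, :source_updated_at, :embedded_at, keyword_init: true) do
  # An embedding is current if it was created at or after the latest source update.
  def current?
    embedded_at >= source_updated_at
  end

  # Seconds the index has lagged behind the source (0 when current).
  def staleness(now)
    current? ? 0 : now - source_updated_at
  end

  def level(now)
    return :fresh if current?

    age = staleness(now)
    if age >= STALENESS_THRESHOLDS[:critical] then :critical
    elsif age >= STALENESS_THRESHOLDS[:warning] then :warning
    else :stale
    end
  end
end

# Stale documents only, most severe first, then longest-lagging first.
def rank_stale(docs, now)
  docs.reject(&:current?)
      .sort_by { |doc| [-SEVERITY_ORDER[doc.level(now)], -doc.staleness(now)] }
end

now = Time.utc(2024, 1, 31)
docs = [
  DocumentState.new(id: "faq", source_updated_at: now - 5 * DAY, embedded_at: now - 2 * DAY),
  DocumentState.new(id: "enterprise-terms", source_updated_at: now - 10 * DAY, embedded_at: now - 20 * DAY),
  DocumentState.new(id: "refund-policy-v2", source_updated_at: now - 40 * DAY, embedded_at: now - 60 * DAY)
]
rank_stale(docs, now).map { |doc| [doc.id, doc.level(now)] }
# => [["refund-policy-v2", :critical], ["enterprise-terms", :warning]]
```

A real scorer might also weight by the `diff_size` passed to `source_updated`; that is left out here because its effect is not described.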

4. Eval Runner

Knowing whether an AI feature is getting better or worse over time is one of the hardest open problems in production AI. Diogenes ships a lightweight eval framework: define golden question/answer pairs, run them on a schedule, track pass rates over time, and alert on regressions.

# test/diogenes/evals/support_assistant_evals.rb

Diogenes::Evals.define(SupportAssistant) do
  eval "basic refund question" do
    query    "How do I request a refund?"
    expects  all_of(
      grounded_in("refund-policy"),
      contains("billing page"),
      does_not_contain("24 hours")
    )
  end

  eval "question with no good answer" do
    query    "What is the API rate limit for legacy v1 endpoints?"
    expects  one_of(
      low_confidence_response,
      routes_to_human_review
    )
  end
end

# Quote the task name so shells such as zsh don't expand the brackets as a glob
bundle exec rake "diogenes:evals:run[SupportAssistant]"
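The `expects` combinators read as boolean composition over a response. The sketch below shows one way `all_of` and `one_of` could compose simple predicates, assuming each matcher is a lambda and that `one_of` means "at least one passes" (the name could also mean "exactly one"). The `MatcherSketch` module, the `{ text:, sources: }` response shape, and the prefix-match reading of `grounded_in` are hypothetical simplifications, not Diogenes's API.

```ruby
# Hypothetical matcher sketch: each matcher is a lambda from a response to true/false.
# The response shape ({ text:, sources: }) is assumed for illustration.
module MatcherSketch
  module_function

  def contains(text)
    ->(response) { response[:text].include?(text) }
  end

  def does_not_contain(text)
    ->(response) { !response[:text].include?(text) }
  end

  # Assumed meaning: at least one retrieved source name starts with the given prefix.
  def grounded_in(prefix)
    ->(response) { response[:sources].any? { |source| source.start_with?(prefix) } }
  end

  # Every matcher must pass.
  def all_of(*matchers)
    ->(response) { matchers.all? { |matcher| matcher.call(response) } }
  end

  # Read here as "at least one passes"; an exactly-one reading would use count == 1.
  def one_of(*matchers)
    ->(response) { matchers.any? { |matcher| matcher.call(response) } }
  end
end

m = MatcherSketch
refund_expectation = m.all_of(
  m.grounded_in("refund-policy"),
  m.contains("billing page"),
  m.does_not_contain("24 hours")
)

good = { text: "Request a refund from the billing page.", sources: ["refund-policy.md"] }
bad  = { text: "Refunds from the billing page arrive within 24 hours.", sources: ["refund-policy.md"] }
refund_expectation.call(good) # => true
refund_expectation.call(bad)  # => false
```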

When a passing eval starts failing, Diogenes records the regression point and diffs the last passing response against the first failing one. It can often correlate the regression with a stale document in the drift tracker.
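Finding the regression point amounts to scanning an eval's run history for the pass-to-fail transition that the eval is currently stuck in. A minimal sketch, assuming runs are stored oldest first with `at`, `passed`, and `response` fields; the `Run` struct and `regression_point` method are hypothetical. It returns the last passing and first failing runs, which are the two responses to diff.

```ruby
# Hypothetical sketch: one eval's run history, ordered oldest first.
Run = Struct.new(:at, :passed, :response, keyword_init: true)

# Returns [last_passing_run, first_failing_run] for the regression the eval is
# currently in, or nil if the latest run passes or there is no passing baseline.
def regression_point(history)
  return nil if history.empty? || history.last.passed

  history.each_cons(2).reverse_each do |before, after|
    return [before, after] if before.passed && !after.passed
  end
  nil # failing since the first recorded run
end

history = [
  Run.new(at: "2024-01-10", passed: true,  response: "Use the billing page."),
  Run.new(at: "2024-01-11", passed: true,  response: "Use the billing page."),
  Run.new(at: "2024-01-12", passed: false, response: "Refunds take 24 hours."),
  Run.new(at: "2024-01-13", passed: false, response: "Refunds take 24 hours.")
]

last_pass, first_fail = regression_point(history)
puts "#{last_pass.at}: #{last_pass.response}"   # prints "2024-01-11: Use the billing page."
puts "#{first_fail.at}: #{first_fail.response}" # prints "2024-01-12: Refunds take 24 hours."
```

Scanning backwards means an eval that regressed, recovered, and regressed again reports the most recent break, which is the one worth investigating.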

5. The Dashboard

Mount the Diogenes engine to get a live view of all of the above in one place:

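# config/routes.rb
# `authenticate` is Devise's routing helper; swap in your own constraint if you use another auth library.
authenticate :user, ->(u) { u.admin? } do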
  mount Diogenes::Engine => '/diogenes'
end

The overview tab shows one row per gated feature — gates declared, grounding flag rate, drift score, and eval pass rate. A feature that is passing all its gates but has 11 stale documents and a declining eval pass rate is visible before it becomes a production incident.
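The value of the overview row is that each signal can look acceptable on its own while the combination does not. A minimal sketch of that reasoning, using a hypothetical `FeatureHealth` struct with made-up cutoffs (a grounding flag rate above 10%, and a pass rate that falls on every recent run); the dashboard's actual columns and rules are not described here.

```ruby
# Hypothetical sketch of one overview row; fields and cutoffs are illustrative.
FeatureHealth = Struct.new(:name, :gates_passing, :grounding_flag_rate,
                           :stale_documents, :eval_pass_rates, keyword_init: true) do
  # eval_pass_rates holds recent pass rates as fractions, oldest first.
  def declining_evals?
    eval_pass_rates.size >= 2 && eval_pass_rates.each_cons(2).all? { |older, newer| newer < older }
  end

  # Assumed cutoff: more than 10% of responses flagged by the grounding verifier.
  def warnings
    list = []
    list << "gates failing" unless gates_passing
    list << "#{stale_documents} stale documents" if stale_documents.positive?
    list << "grounding flag rate #{(grounding_flag_rate * 100).round}%" if grounding_flag_rate > 0.10
    list << "eval pass rate declining" if declining_evals?
    list
  end
end

row = FeatureHealth.new(name: "SupportAssistant", gates_passing: true,
                        grounding_flag_rate: 0.04, stale_documents: 11,
                        eval_pass_rates: [0.95, 0.91, 0.86])
row.warnings # => ["11 stale documents", "eval pass rate declining"]
```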

See docs/dashboard.md for the full dashboard documentation including route structure, controller layout, and configuration reference.

The Audit Trail

Every AI call made through a Diogenes-gated feature produces an audit record:

Diogenes::AuditLog.for_feature(SupportAssistant)
# => [
#   {
#     feature:         "SupportAssistant",
#     gate_config:     { failure_mode: :recoverable, ... },
#     query_hash:      "sha256:...",
#     context_sources: ["refund-policy.md", "enterprise-terms.md"],
#     grounding:       { supported: [...], unsupported: [], contradicted: [] },
#     verified_by:     "agent@company.com",
#     timestamp:       2024-01-15 14:23:01 UTC
#   }
# ]

Audit records store hashes, not raw content — PII never enters the audit log directly. The host app controls content storage and retention.
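The `query_hash` field in the record above has the form `sha256:<hex digest>`. A minimal sketch of producing such a value with Ruby's standard `digest` library; `audit_hash` is a hypothetical helper, and because the text does not say whether Diogenes normalizes input (case, whitespace) before hashing, none is done here.

```ruby
require "digest"

# Hypothetical helper: store a digest of raw content instead of the content itself.
def audit_hash(content)
  "sha256:#{Digest::SHA256.hexdigest(content)}"
end

record = {
  feature:    "SupportAssistant",
  query_hash: audit_hash("How do I request a refund?"),
  timestamp:  Time.now.utc
}
record[:query_hash].length # => 71 ("sha256:" plus 64 hex characters)
```

Identical queries produce identical hashes, so repeat questions can still be grouped without storing their text. A plain hash of a short, guessable input can be reversed by hashing candidate inputs, though, so for sensitive fields a keyed hash such as `OpenSSL::HMAC.hexdigest("SHA256", secret, content)` is the safer variant.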

Philosophy

Diogenes takes no position on whether AI is good or bad for your product. It takes one strong position: that decision should be made deliberately, defensibly, and with receipts.

A feature that passes all five gates and fails in production is a recoverable engineering problem. A feature that never asked the questions is a different kind of problem entirely.

See docs/framework.md for the full decision framework and docs/examples.md for two worked examples — one that passes, one that doesn't.

Contributing

See docs/contributing.md.

License

MIT