lex-llm-ledger
LLM observability persistence for LegionIO. Consumes metering and audit messages from
AMQP queues, decrypts audit payloads, enforces retention policies, and writes official
legion-data LLM lifecycle records for usage reporting and compliance.
Queues Consumed
| Queue | Exchange | Binding | Content |
|---|---|---|---|
| llm.metering.write | llm.metering (topic) | metering.# | Cleartext token/cost metrics |
| llm.audit.prompts | llm.audit (topic) | audit.prompt.# | Encrypted prompt+response pairs |
| llm.audit.tools | llm.audit (topic) | audit.tool.# | Encrypted tool call records |
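The bindings above use AMQP topic patterns, where an asterisk matches exactly one dot-separated word and a hash sign matches zero or more words. The following is a minimal plain-Ruby sketch of that matching rule, included only to show which routing keys each queue would receive. The helper topic_match? and the sample routing keys are hypothetical and are not part of this extension or of legion-transport.

```ruby
# Illustrative AMQP topic matcher (hypothetical helper, not part of lex-llm-ledger).
# '*' matches exactly one word; '#' matches zero or more words.
def topic_match?(pattern, routing_key)
  match_words(pattern.split('.'), routing_key.split('.'))
end

def match_words(pattern_words, key_words)
  return key_words.empty? if pattern_words.empty?

  head, *rest = pattern_words
  case head
  when '#'
    # Let '#' consume 0..n of the remaining words and try the rest of the pattern.
    (0..key_words.length).any? { |n| match_words(rest, key_words.drop(n)) }
  when '*'
    !key_words.empty? && match_words(rest, key_words.drop(1))
  else
    key_words.first == head && match_words(rest, key_words.drop(1))
  end
end

# Sample routing keys are made up for illustration.
puts topic_match?('metering.#', 'metering.anthropic.usage')
puts topic_match?('audit.prompt.#', 'audit.tool.call')
```

Because the hash sign also matches zero words, a bare metering routing key would be delivered to llm.metering.write as well.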
Official Tables
- llm_conversations - Conversation container and retention/classification metadata
- llm_messages - Model-visible user and assistant messages
- llm_message_inference_requests - Operation, correlation, request payload, and policy context
- llm_message_inference_responses - Provider, provider instance, model, dispatch path, visible response, and thinking payload
- llm_message_inference_metrics - Tokens, latency, cost, and finance allocation
- llm_tool_calls - Provider-requested tool call lineage
- llm_registry_events - Provider/model availability events
Prompt and metering consumers write the official lifecycle tables directly.
llm_tool_records and llm_registry_availability_records remain operational
projection tables while the official tool/registry event cutover continues.
The legacy backfill reconciles those rows into llm_tool_calls and
llm_registry_events when they can be linked to official inference responses.
Legacy-only prompt/metering writer mode hard-stops instead of silently writing
stale projections.
Event Spine Target
The existing tables are useful reporting projections, but the uplift target is end-to-end visibility for every LLM-related lifecycle event. Ledger should add a canonical llm_events stream/table and keep metering_records, prompt_records, and tool_records as specialized query views or companion tables.
Every event should share these correlation keys:
- conversation_id
- request_id
- exchange_id
- message_id
- parent_message_id
- message_seq
- correlation_id
- trace_id
- span_id
- event_id
- event_seq
Event types should cover at least:
- request received, normalized, classified, enriched, and context-assembled
- routing candidates built, candidates excluded, offering selected, failover attempted, escalation attempted
- provider request started, provider response received, provider error/timeout/cancel
- response normalized, streamed chunk emitted, final response returned
- MCP/tool call planned, started, completed, failed, denied, or timed out
- fleet request published, broker accepted/unroutable, worker accepted, worker rejected, fleet response received
- metering emitted, audit emitted, ledger write queued, ledger write succeeded/failed/spooled
This lets operators reconstruct a conversation without replaying prompt bodies. Example: conversation 123 had 32 messages, one failed, five executed on Anthropic direct, four locally, the rest on GPU fleet, with per-step response time, token totals, cost allocation, and failover history.
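The reconstruction described above can be sketched as a small aggregation over correlated events. This is only an illustration of the idea, assuming a hypothetical LlmEvent shape that carries a subset of the correlation keys. The event type strings, the executor field, and the sample data are invented for the example; the llm_events table does not exist yet.

```ruby
# Hypothetical event shape; only a subset of the correlation keys is shown.
LlmEvent = Struct.new(:event_id, :event_seq, :conversation_id, :message_id,
                      :event_type, :executor, :latency_ms, keyword_init: true)

# Summarize one conversation from its events, without touching prompt bodies.
def summarize_conversation(events, conversation_id)
  scoped = events.select { |e| e.conversation_id == conversation_id }
                 .sort_by(&:event_seq)
  responses = scoped.select { |e| e.event_type == 'provider_response_received' }
  {
    messages: scoped.map(&:message_id).uniq.size,
    failed: scoped.count { |e| e.event_type == 'provider_error' },
    by_executor: responses.group_by(&:executor).transform_values(&:size),
    total_latency_ms: responses.sum(&:latency_ms)
  }
end

# Made-up sample data: three successful responses, one failure, one unrelated event.
SAMPLE_EVENTS = [
  LlmEvent.new(event_id: 'e1', event_seq: 1, conversation_id: 'c123', message_id: 'm1',
               event_type: 'provider_response_received', executor: 'anthropic_direct', latency_ms: 820),
  LlmEvent.new(event_id: 'e2', event_seq: 2, conversation_id: 'c123', message_id: 'm2',
               event_type: 'provider_response_received', executor: 'local', latency_ms: 140),
  LlmEvent.new(event_id: 'e3', event_seq: 3, conversation_id: 'c123', message_id: 'm3',
               event_type: 'provider_error', executor: 'gpu_fleet', latency_ms: nil),
  LlmEvent.new(event_id: 'e4', event_seq: 4, conversation_id: 'c123', message_id: 'm4',
               event_type: 'provider_response_received', executor: 'gpu_fleet', latency_ms: 310),
  LlmEvent.new(event_id: 'e5', event_seq: 1, conversation_id: 'c999', message_id: 'm9',
               event_type: 'provider_response_received', executor: 'local', latency_ms: 50)
].freeze

p summarize_conversation(SAMPLE_EVENTS, 'c123')
```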
Ledger has three distinct outputs:
- Legal/evidence reconstruction - immutable, correlated, retention-controlled event evidence sufficient to answer a legal or security request. This favors completeness, ordering, integrity, and capture-mode correctness.
- Operational analytics - structured projections for high-level patterns, cost, latency, quality, routing behavior, fleet utilization, tool usage, and failure rates. This favors queryability and aggregation without requiring raw prompt bodies.
- Governed training/evaluation datasets - policy-approved derived datasets for model improvement, team/org use-case tuning, eval generation, routing-quality analysis, and tool-use learning. This must be derived from ledger events through explicit consent, classification, redaction/de-identification, retention, and export controls.
Training/eval export is not automatic reuse of raw audit. A future dataset builder should select eligible events, apply redaction and capture-mode policy, preserve provenance back to event_id/conversation_id, and write a dataset manifest that records data classes, consent basis, source filters, transform versions, and approval state.
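As a rough sketch of what such a dataset builder might do, the example below filters events by consent and capture mode and emits a manifest with provenance back to event_id and conversation_id. No such builder exists in the ledger today. The field names (training_consent, capture_mode, data_classes), the PHI exclusion rule, and the manifest values are all assumptions chosen for illustration.

```ruby
require 'json'

# Hypothetical builder: selects eligible events and records how they were selected.
def build_dataset_manifest(events, allowed_modes:, transform_version:)
  eligible = events.select do |e|
    # Assumed policy: explicit consent, an approved capture mode, and no PHI.
    e[:training_consent] && allowed_modes.include?(e[:capture_mode]) && !e[:contains_phi]
  end

  {
    source_event_ids: eligible.map { |e| e[:event_id] },
    conversation_ids: eligible.map { |e| e[:conversation_id] }.uniq,
    data_classes: eligible.flat_map { |e| e[:data_classes] }.uniq.sort,
    consent_basis: 'explicit_opt_in',
    source_filters: { capture_modes: allowed_modes, exclude_phi: true },
    transform_versions: { redaction: transform_version },
    approval_state: 'pending'
  }
end

# Made-up sample events.
MANIFEST_SAMPLE_EVENTS = [
  { event_id: 'e1', conversation_id: 'c1', capture_mode: 'redacted',
    training_consent: true, contains_phi: false, data_classes: %w[internal] },
  { event_id: 'e2', conversation_id: 'c1', capture_mode: 'raw',
    training_consent: true, contains_phi: false, data_classes: %w[internal] },
  { event_id: 'e3', conversation_id: 'c2', capture_mode: 'redacted',
    training_consent: false, contains_phi: false, data_classes: %w[public] },
  { event_id: 'e4', conversation_id: 'c3', capture_mode: 'redacted',
    training_consent: true, contains_phi: true, data_classes: %w[health] }
].freeze

manifest = build_dataset_manifest(MANIFEST_SAMPLE_EVENTS,
                                  allowed_modes: %w[redacted metadata_only],
                                  transform_version: 'v1')
puts JSON.pretty_generate(manifest)
```

The manifest starts in a pending approval state so that export remains a separate, explicit step.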
Key Design Decisions
- Consumer only - never publishes to any exchange
- Passive exchange references - does not declare llm.metering or llm.audit (owned by legion-llm)
- DecryptionUnavailable causes NACK - messages requeue until the node has Vault credentials
- PHI TTL cap - records flagged contains_phi are capped at 30 days regardless of retention label
- Idempotent official writes - duplicate request/response/message references resolve to existing official rows
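The idempotent-write decision can be illustrated with an in-memory stand-in for an official table. This is a sketch of the find-or-create behavior only; the OfficialRows class and its find_or_create method are hypothetical, and the real writes go through the legion-data schema, presumably backed by a unique constraint rather than a Ruby hash.

```ruby
# In-memory stand-in for an official table (hypothetical, for illustration only).
class OfficialRows
  def initialize
    @rows = {}
    @next_id = 1
  end

  # Returns [row, created]. A duplicate reference resolves to the existing row
  # instead of inserting a second one, so redelivered messages are harmless.
  def find_or_create(request_id, attrs)
    existing = @rows[request_id]
    return [existing, false] if existing

    row = attrs.merge(id: @next_id, request_id: request_id)
    @next_id += 1
    @rows[request_id] = row
    [row, true]
  end

  def size
    @rows.size
  end
end

rows = OfficialRows.new
first, created = rows.find_or_create('req-1', provider: 'anthropic')
again, created_again = rows.find_or_create('req-1', provider: 'anthropic')
puts created, created_again, first.equal?(again), rows.size
```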
Routing Uplift Target
The 2026-04-25 legion-llm routing redesign moves routing to operation-aware model offerings. Ledger should persist the enriched metadata published by legion-llm without owning routing policy.
Target metering, prompt, and tool records should be able to store:
- selected offering identity: offering_id, provider_family, instance_id, canonical_model, provider_model, operation, transport, region, endpoint_hash
- routing details: requested route, selected route, excluded candidates, lateral failover chain, vertical escalation chain, and policy decisions
- identity details: caller principal/canonical name/kind/source, accepting runtime identity, executing runtime identity for fleet requests, fleet lane, fleet class, network boundary, placement policy, fleet correlation ID, hashed reply target, and credential lease/grant metadata
- token and cost allocation: conversation ID, input/output/total tokens, selected-offering cost, pricing tier, configured baseline/comparable provider cost, avoided cost, and aggregation keys for tier, fleet class, provider family, instance, model, transport, and lane
- compliance details: contains_pii, contains_phi, contains_pci, data_classes, jurisdictions, retention_policy, and capture_mode
- model provenance: management state, model depot registry ID, artifact digest, signature verification status, rollout ring, and approval state
- tool provenance: source type/server, policy tags, approval/denial state, redacted or hashed resource identifiers, and input/output classification flags
- registry/availability events: worker heartbeat, lane availability, offering availability, model sync state, degraded/draining/blocked transitions, and capacity changes from llm.registry
The uplift must validate the existing runners and migrations against this target. Current tables already capture core metering, prompt audit, and tool audit, but they need additional correlation fields, routing/offering fields, token context fields, cost allocation fields, identity/fleet fields, and event-spine coverage for request/response/MCP lifecycle events that are not prompt or tool records.
Audit capture modes expected from legion-llm:
- none - do not publish prompt/tool body audit
- metadata_only - store routing/classification/token/cost metadata only
- redacted - store redacted bodies plus redaction metadata
- encrypted_raw - store encrypted full payloads for approved consumers
- raw - plaintext full payloads for local/dev or explicitly approved environments
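One way to read the capture modes is as a projection from a full audit record to the fields that may be stored. The sketch below assumes a hypothetical record hash; the field names (redacted_body, redaction, ciphertext, iv, body) and the project_for_capture_mode helper are illustrative and not taken from legion-llm.

```ruby
CAPTURE_MODES = %w[none metadata_only redacted encrypted_raw raw].freeze

# Returns the subset of a (hypothetical) audit record that a given mode allows
# to be stored, or nil when nothing should be published at all.
def project_for_capture_mode(record, mode)
  raise ArgumentError, "unknown capture mode: #{mode}" unless CAPTURE_MODES.include?(mode)

  metadata = record.slice(:request_id, :tokens, :cost_usd)
  case mode
  when 'none' then nil
  when 'metadata_only' then metadata
  when 'redacted' then metadata.merge(body: record[:redacted_body], redaction: record[:redaction])
  when 'encrypted_raw' then metadata.merge(ciphertext: record[:ciphertext], iv: record[:iv])
  when 'raw' then metadata.merge(body: record[:body])
  end
end

# Made-up sample record.
CAPTURE_SAMPLE = {
  request_id: 'req-1', tokens: 42, cost_usd: 0.01,
  body: 'full prompt text', redacted_body: '[REDACTED] text',
  redaction: { rules: ['email'] }, ciphertext: 'base64...', iv: 'abc'
}.freeze

CAPTURE_MODES.each { |mode| p mode => project_for_capture_mode(CAPTURE_SAMPLE, mode) }
```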
Prompt/tool audit should be durable. If transport is unavailable, legion-llm should spool audit records or use a durable local audit queue unless capture mode is none or policy explicitly allows best-effort audit.
For async :fleet inference, ledger records should preserve the original caller identity and record both runtimes: the process that accepted/enqueued the request and the worker process that executed the provider call. Fleet records should also persist the selected lane, worker fleet class (endpoint, datacenter, cloud_vpc, etc.), placement policy, and model provenance so investigators can tell whether a request ran on the caller's own machine, another endpoint, a datacenter GPU, or a cloud-adjacent worker. The raw RabbitMQ reply_to queue should remain transport-only; persisted records should use a stable hash plus the correlation_id for reconstruction.
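The reply-target rule above could look roughly like the following. The source only asks for a stable hash; using unsalted SHA-256 from Ruby's standard Digest library is an assumption made for this sketch, and the fleet_reply_reference helper and the sample queue name are hypothetical.

```ruby
require 'digest'

# Persist a stable hash of the reply queue plus the correlation ID,
# never the raw reply_to value itself.
def fleet_reply_reference(reply_to:, correlation_id:)
  {
    reply_target_hash: Digest::SHA256.hexdigest(reply_to), # assumed hash choice
    correlation_id: correlation_id
  }
end

# Made-up queue name and correlation ID.
ref = fleet_reply_reference(reply_to: 'amq.gen-example-queue', correlation_id: 'corr-42')
puts ref[:reply_target_hash].length
puts ref[:correlation_id]
```

If reply queue names are guessable in a given deployment, a keyed hash such as an HMAC may be a better fit than a plain digest.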
Fleet registry history should arrive through RabbitMQ rather than endpoint workers writing directly to the database. legion-llm and provider workers publish availability events to llm.registry; ledger consumes those events and persists durable history for operator diagnostics, audit, and legal reconstruction.
Ledger should be able to answer spend-allocation questions without replaying raw prompts: how many input/output tokens a conversation used, how tokens split across Anthropic direct versus fleet GPU versus endpoint MacBook fleet, and estimated dollars saved by local/fleet execution compared with a configured cloud/frontier baseline.
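The spend-allocation questions above reduce to grouping metering rows by where they executed and comparing actual cost with a baseline. The sketch below assumes a hypothetical baseline price per million tokens and invented record fields (executor, input_tokens, output_tokens, cost_usd); real prices would come from configuration, not from this example.

```ruby
# Hypothetical baseline price in USD per million tokens.
BASELINE_USD_PER_MTOK = { input: 3.0, output: 15.0 }.freeze

# What the same tokens would have cost on the configured baseline provider.
def baseline_cost(input_tokens, output_tokens)
  (input_tokens * BASELINE_USD_PER_MTOK[:input] +
   output_tokens * BASELINE_USD_PER_MTOK[:output]) / 1_000_000.0
end

# Group metering rows by executor and compute token totals and avoided cost.
def spend_allocation(records)
  records.group_by { |r| r[:executor] }.transform_values do |rows|
    input = rows.sum { |r| r[:input_tokens] }
    output = rows.sum { |r| r[:output_tokens] }
    actual = rows.sum { |r| r[:cost_usd] }
    { input_tokens: input, output_tokens: output, actual_usd: actual.round(4),
      avoided_usd: (baseline_cost(input, output) - actual).round(4) }
  end
end

# Made-up metering rows for one conversation.
SPEND_SAMPLE = [
  { executor: 'anthropic_direct', input_tokens: 500_000, output_tokens: 100_000, cost_usd: 3.0 },
  { executor: 'gpu_fleet', input_tokens: 600_000, output_tokens: 150_000, cost_usd: 0.3 },
  { executor: 'gpu_fleet', input_tokens: 400_000, output_tokens: 50_000, cost_usd: 0.2 }
].freeze

spend_allocation(SPEND_SAMPLE).each { |executor, totals| p executor => totals }
```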
Ledger is not on the LLM execution critical path. If the database is unavailable, ledger consumers should retry, requeue, DLQ, or spool according to transport policy while legion-llm continues routing and executing requests. Compliance profiles that require durable audit before response are the explicit exception and should fail closed upstream with a clear policy error.
Requirements
- legion-data >= 1.8.0 (official LLM lifecycle schema)
- legion-json >= 1.2 (JSON serialization)
- legion-logging >= 1.3 (structured exception logging)
- legion-settings >= 1.3 (extension-scoped retention settings)
- legion-transport >= 1.4.14 (AMQP transport)
- legion-crypt >= 1.5 (for decrypting audit messages, optional at runtime)
Configuration
Ledger runs with safe defaults and reads extension settings from
extensions.llm.ledger:
{
  "extensions": {
    "llm": {
      "ledger": {
        "retention": {
          "default_days": 90,
          "phi_ttl_days": 30
        }
      }
    }
  }
}
default_days controls records with the default retention label. phi_ttl_days
caps PHI records even when the event asks for longer or permanent retention.
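The interaction between default_days and phi_ttl_days can be sketched as a small function. This is an illustration of the capping rule described above, not the extension's actual implementation. The effective_retention_days helper is hypothetical, and representing permanent retention as nil and the default label as :default are assumptions.

```ruby
RETENTION_SETTINGS = { default_days: 90, phi_ttl_days: 30 }.freeze

# requested_days is :default, an Integer, or nil (assumed to mean "permanent").
# Returns the number of days to keep the record, or nil for permanent.
def effective_retention_days(requested_days, contains_phi:, settings: RETENTION_SETTINGS)
  days = requested_days == :default ? settings[:default_days] : requested_days
  return days unless contains_phi
  # PHI is capped even when the event asks for permanent retention.
  return settings[:phi_ttl_days] if days.nil?

  [days, settings[:phi_ttl_days]].min
end

p effective_retention_days(:default, contains_phi: false)
p effective_retention_days(:default, contains_phi: true)
p effective_retention_days(nil, contains_phi: true)
```

A cap only shortens retention: a PHI record that asks for 7 days keeps 7 days rather than being extended to 30.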
Encrypted audit messages must include an iv header; missing-IV messages are
rejected as malformed encrypted audit records rather than retried.
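The two failure paths for encrypted audit messages differ in whether a retry can help. The sketch below shows one way to express that decision, using a local stand-in for the DecryptionUnavailable error and a hypothetical audit_disposition helper that returns a symbol; the real consumer's class names and acknowledgement calls may differ.

```ruby
# Local stand-in for the extension's error class (assumption for this sketch).
class DecryptionUnavailable < StandardError; end

# Decide what to do with an encrypted audit message.
# The block performs the decryption and may raise DecryptionUnavailable.
def audit_disposition(headers)
  # A missing IV can never be fixed by retrying, so reject without requeue.
  return :reject unless headers.key?('iv')

  yield
  :ack
rescue DecryptionUnavailable
  # Credentials may arrive later, so NACK and let the broker requeue.
  :nack_requeue
end

p audit_disposition({}) { :unused }
p audit_disposition({ 'iv' => 'abc' }) { raise DecryptionUnavailable, 'no Vault credentials' }
p audit_disposition({ 'iv' => 'abc' }) { 'decrypted payload' }
```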
Usage
# Metering write (called by Metering actor)
Legion::Extensions::Llm::Ledger::Runners::Metering.write_metering_record(payload, )
# Usage summary
Legion::Extensions::Llm::Ledger::Runners::UsageReporter.summary(period: 'day', group_by: 'provider_instance')
# Budget check
Legion::Extensions::Llm::Ledger::Runners::UsageReporter.budget_check(budget_id: 'budget_q1', budget_usd: 100.0)
# Provider health
Legion::Extensions::Llm::Ledger::Runners::ProviderStats.health_report
# One-time legacy reconciliation
Legion::Extensions::Llm::Ledger::Backfill::LegacyLlmRecords.run
Development
bundle install
bundle exec rspec
bundle exec rubocop
License
MIT