legion-data
Persistent database storage for the LegionIO async job engine and AI coding assistant platform. Provides database connectivity via the Sequel ORM, automatic schema migrations (71 numbered migrations), Sequel models for the full LegionIO control plane, and a parallel local SQLite database for on-node agentic cognitive state.
Version: 1.6.25 | Ruby: >= 3.4 | License: Apache-2.0
Supported Databases
| Database | Adapter | Gem | Default |
|---|---|---|---|
| SQLite | sqlite |
sqlite3 (bundled) |
Yes |
| MySQL | mysql2 |
mysql2 (optional) |
No |
| PostgreSQL | postgres |
pg (optional) |
No |
SQLite is the default and requires no additional gems. For MySQL or PostgreSQL, install the corresponding gem and configure the adapter.
Installation
gem install legion-data
Or add to your Gemfile:
gem 'legion-data'
# For production databases, add one of these:
# gem 'mysql2', '>= 0.5.5'
# gem 'pg', '>= 1.5'
Architecture Overview
Legion::Data (singleton module)
├── .setup # Connect, migrate, load models, set up local DB
├── .connection # Sequel::Database handle (shared/central)
├── .local # Legion::Data::Local (local SQLite accessor)
├── .stats # Combined { shared: ..., local: ... } metrics
├── .reload_static_cache # Refresh in-memory StaticCache after extension hot-load
├── .shutdown # Close both shared and local connections
│
├── Connection # Sequel database connection management
│ ├── .adapter # Reads adapter from settings (:sqlite, :mysql2, :postgres)
│ ├── .setup # Establish connection (dev_mode fallback to SQLite if unreachable)
│ ├── .sequel # Raw Sequel::Database accessor
│ ├── .stats # Pool metrics, tuning snapshot, adapter-specific DB stats
│ └── .shutdown # Disconnect and close query file logger
│
├── Migration # Auto-migration system (71 numbered Sequel DSL migrations)
│
├── Model # Sequel model autoloader
│ └── Models: Extension, Function, Runner, Node, Task, TaskLog, Setting,
│ DigitalWorker, Relationship, AuditLog, AuditRecord, Chain,
│ RbacRoleAssignment, RbacRunnerGrant, RbacCrossTeamGrant,
│ IdentityProvider, Principal, Identity, IdentityGroup,
│ IdentityGroupMembership,
│ ApolloEntry, ApolloRelation, ApolloExpertise, ApolloAccessLog (PG only)
│
├── Local # Parallel local SQLite for agentic cognitive state
│ ├── .setup # Lazy init — creates legionio_local.db on first access
│ ├── .connection # Sequel::SQLite::Database handle
│ ├── .model(:table) # Create Sequel::Model bound to local connection
│ ├── .register_migrations(name:, path:) # Extensions add their own migration dirs
│ ├── .stats # Local SQLite metrics (PRAGMAs, file size, registered migrations)
│ └── .shutdown # Close local connection
│
├── Extract # 10-handler text extraction registry (txt/md/csv/json/jsonl/html/xlsx/docx/pdf/pptx/vtt)
├── Spool # Filesystem write buffer for DB-unavailable scenarios
├── Rls # PostgreSQL row-level security helpers (tenant isolation)
├── StorageTiers # Hot/warm/cold archival lifecycle
├── EventStore # Append-only governance event store with hash chain integrity
├── Vector # Reusable pgvector helpers (cosine_search, l2_search, ensure_extension!)
└── Settings # Default configuration with per-adapter credential presets
Two-Database Architecture
legion-data maintains two independent databases:
- Shared DB (SQLite / MySQL / PostgreSQL) — control plane data: extensions, tasks, runners, nodes, settings, audit logs, relationships. Shared across the cluster.
- Local DB (always SQLite) — agentic cognitive state: memory traces, trust scores, dream journals. On-node only; no cross-database joins.
Deleting legionio_local.db provides cryptographic erasure — no residual data.
Usage
require 'legion/data'
# Set up shared DB + local SQLite, run migrations, load models
Legion::Data.setup
# Access the Sequel database handle
Legion::Data.connection # => Sequel::Database
# Access models
Legion::Data::Model::Extension.all # => Sequel::Dataset
Legion::Data::Model::Task.first(id: 42)
Legion::Data::Model::Setting.where(key: 'my_setting').first
# Access local cognitive state DB
Legion::Data.local.connection # => Sequel::SQLite::Database
Legion::Data.local.connected? # => true
Legion::Data.local.db_path # => "legionio_local.db"
# Check connection health
Legion::Data.connected? # => true
Legion::Data.stats # => { shared: {...}, local: {...} }
# Shut down both connections
Legion::Data.shutdown
Local Database (Agentic Cognitive State)
Extensions register their own migration directories and create models bound to the local connection:
# Register extension migrations (called during extension setup)
Legion::Data::Local.register_migrations(name: :memory, path: '/path/to/migrations')
# Create a model class bound to the local DB
MyMemoryTrace = Legion::Data::Local.model(:memory_traces)
MyMemoryTrace.all # queries legionio_local.db, never the shared DB
Text Extraction
Legion::Data::Extract provides a handler registry for extracting text from documents, used by lex-knowledge for corpus ingestion:
text = Legion::Data::Extract.extract('/path/to/document.pdf')
text = Legion::Data::Extract.extract('/path/to/data.csv')
Supported formats: .txt, .md, .csv, .json, .jsonl, .html, .xlsx, .docx, .pdf, .pptx, .vtt
Filesystem Spool (Write Buffer)
When the database is unavailable, Legion::Data::Spool buffers writes to ~/.legionio/data/spool/ and replays once the connection is restored:
spool = Legion::Data::Spool.for(Legion::Extensions::MyLex)
spool.write({ task_id: SecureRandom.uuid, data: payload })
spool.drain { |entry| process(entry) }
Row-Level Security (PostgreSQL)
Legion::Data::Rls provides tenant isolation via PostgreSQL session variables (migration 043):
Legion::Data::Rls.with_tenant(tenant_id) do
Legion::Data::Model::Task.all # scoped to tenant_id via RLS policy
end
Permission Checks
Legion::Data.can_write?(:tasks) # => true (SQLite always true)
Legion::Data.can_read?(:tasks) # => true
Legion::Data.reset_privileges! # clear cached privilege checks
Configuration
All settings live under the data key. The adapter controls which options apply.
SQLite (default)
{
"data": {
"adapter": "sqlite",
"creds": {
"database": "legionio.db"
}
}
}
MySQL
{
"data": {
"adapter": "mysql2",
"creds": {
"username": "legion",
"password": "legion",
"database": "legionio",
"host": "127.0.0.1",
"port": 3306
}
}
}
PostgreSQL
{
"data": {
"adapter": "postgres",
"creds": {
"user": "legion",
"password": "legion",
"database": "legionio",
"host": "127.0.0.1",
"port": 5432
}
}
}
PostgreSQL with pgvector is required for Apollo models:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Full Configuration Reference
{
"data": {
"adapter": "sqlite",
"connected": false,
"dev_mode": false,
"dev_fallback": true,
"connect_on_start": true,
"max_connections": 25,
"pool_timeout": 5,
"preconnect": "concurrently",
"single_threaded": false,
"test": true,
"log": false,
"query_log": false,
"log_warn_duration": 1,
"sql_log_level": "debug",
"connection_validation": true,
"connection_validation_timeout": 600,
"connection_expiration": true,
"connection_expiration_timeout": 14400,
"read_replica_url": null,
"replicas": [],
"creds": { "database": "legionio.db" },
"migrations": {
"continue_on_fail": false,
"auto_migrate": true
},
"models": {
"continue_on_load_fail": false,
"autoload": true
},
"local": {
"enabled": true,
"database": "legionio_local.db",
"migrations": { "auto_migrate": true }
},
"cache": {
"connected": false,
"auto_enable": false,
"static_cache": false,
"ttl": 60
}
}
}
Dev Mode Fallback
When dev_mode: true and a network database is unreachable, the shared connection automatically falls back to SQLite:
{ "data": { "dev_mode": true, "dev_fallback": true } }
HashiCorp Vault Integration
When Vault is connected, credentials are fetched dynamically from database/creds/legion, overriding any static creds block.
Caching
Two independent caching tiers, both disabled by default:
| Tier | Setting | Models | Backend |
|---|---|---|---|
| StaticCache | data.cache.static_cache: true |
Extension, Runner, Function | In-process frozen Ruby hash |
| External Cache | data.cache.auto_enable: true + Legion::Cache |
Relationship, Node, Setting | Redis/Memcached/Memory |
# After hot-loading extensions, refresh the static cache:
Legion::Data.reload_static_cache
Read Replicas (PostgreSQL)
{
"data": {
"read_replica_url": "postgres://user:pass@replica1/db",
"replicas": ["postgres://user:pass@replica2/db"]
}
}
Data Models
| Model | Table | Description |
|---|---|---|
Extension |
extensions |
Installed LEX extensions |
Function |
functions |
Available functions per extension (with embeddings) |
Runner |
runners |
Runner definitions (AMQP routing keys) |
Node |
nodes |
Cluster node registry |
Task |
tasks |
Task instances |
TaskLog |
task_logs |
Task execution logs |
Setting |
settings |
Persistent settings store |
DigitalWorker |
digital_workers |
Digital worker registry |
Relationship |
relationships |
Task trigger/action chains between functions |
Chain |
chains |
Task execution chains |
AuditLog |
audit_log |
Tamper-evident audit trail with hash chain |
AuditRecord |
audit_records |
Structured audit records |
RbacRoleAssignment |
rbac_role_assignments |
RBAC principal -> role mappings |
RbacRunnerGrant |
rbac_runner_grants |
Per-runner permission grants |
RbacCrossTeamGrant |
rbac_cross_team_grants |
Cross-team access grants |
IdentityProvider |
identity_providers |
Identity provider registrations |
Principal |
principals |
Authentication principals |
Identity |
identities |
Identity records tied to principals |
IdentityGroup |
identity_groups |
Identity groups |
IdentityGroupMembership |
identity_group_memberships |
Group membership records |
ApolloEntry |
apollo_entries |
Knowledge entries — PostgreSQL only (pgvector) |
ApolloRelation |
apollo_relations |
Relations between Apollo entries — PostgreSQL only |
ApolloExpertise |
apollo_expertise |
Per-agent domain expertise — PostgreSQL only |
ApolloAccessLog |
apollo_access_log |
Apollo access audit log — PostgreSQL only |
Apollo models require PostgreSQL with the pgvector extension. They are skipped silently on SQLite and MySQL.
Dependencies
| Gem | Purpose |
|---|---|
sequel (>= 5.70) |
ORM and migration framework |
sqlite3 (>= 2.0) |
SQLite adapter (default, bundled) |
csv (>= 3.2) |
CSV extraction handler |
legion-json |
JSON serialization via Legion::JSON |
legion-logging (>= 1.5.0) |
Structured logging |
legion-settings (>= 1.3.26) |
Configuration management |
mysql2 (>= 0.5.5) |
MySQL adapter (optional) |
pg (>= 1.5) |
PostgreSQL adapter (optional) |
Migrations
71 numbered Sequel DSL migrations run automatically on startup (auto_migrate: true). Key milestones:
| Range | What was added |
|---|---|
| 001–011 | Core schema: nodes, settings, extensions, runners, functions, tasks, digital workers, value metrics |
| 012 | Apollo tables (PG only: pgvector, uuid-ossp, 4 tables) |
| 013–014 | Relationships table with trigger/action FK chains |
| 015 | RBAC tables |
| 017–019 | Audit log with tamper-evident hash chain |
| 020–025 | Webhooks, archive tables, memory traces, tenant partitions |
| 026 | Function embeddings (description + vector on functions) |
| 028–030 | Agent clusters and approval queue |
| 047–048 | Apollo knowledge capture + financial logging (UAIS cost recovery, 7 tables) |
| 050 | Critical indexes across 13 tables |
| 058–067 | Audit records, chains, knowledge tiers, tool embedding cache, identity system (providers, principals, identities, groups) |
| 068–071 | Entity type on audit records, principal on nodes, approval queue resume, engine on relationships |
Run migrations standalone:
bundle exec legionio_migrate
CLI Executable
exe/legionio_migrate runs database migrations standalone, outside the full LegionIO service:
bundle exec legionio_migrate
Role in LegionIO
legion-data is optional but provides core platform persistence. It initializes during Legion::Service startup (after transport). Key responsibilities:
- Extension and function registry
- Task scheduling, logging, and relationship chains
- Node cluster membership tracking
- Persistent settings storage
- Digital worker registry (AI-as-labor platform)
- RBAC assignment tables
- Audit log with tamper-evident hash chain
- Governance event store with append-only integrity
- Apollo shared knowledge store (PostgreSQL + pgvector, used by
lex-apollo) - Local SQLite for agentic cognitive state — always on-node, independent of shared DB
- Financial logging for UAIS cost recovery
- Global tool embedding cache (L4 tier for
Legion::Tools::EmbeddingCache) - Unified identity system (providers, principals, identities, groups)
Contributing
git clone https://github.com/LegionIO/legion-data
cd legion-data
bundle install
bundle exec rspec # all tests must pass
bundle exec rubocop -A # zero offenses expected
Follow the LegionIO contribution guide. Open a PR against main.
Maintained by: Matthew Iverson (@Esity)