Engram

Long-term memory for AI agents in Ruby — stored in your own Postgres.

Engram lets an agent remember a user across sessions. It recalls the facts relevant to the current message and injects them into the prompt, so the model stops asking the same questions twice. No external memory-as-a-service: your memories live in your database.

Status: pre-1.0. Recall with prompt injection (v0.1) and extracting and consolidating memories from conversations (v0.2) are implemented and tested. The roadmap also marks v0.3 (idempotent observation, importance/recency recall, forgetting) as done. The public API may still change before 1.0.

Why

LLMs are stateless. Every request starts from zero, so an assistant forgets that the user is on the Pro plan, is vegetarian, or already tried clearing the cache. The usual fixes fall short: stuffing whole transcripts into the prompt is expensive and noisy, and plain RAG retrieves documents, not personal facts. Engram is the memory layer in between.

Before and after

Without a memory layer, every session starts blank:

Day 1
  User:  I'm on the Pro plan, and please keep answers short.
  Agent: Got it.

Day 5 (new session — the model has forgotten)
  User:  Why am I being rate limited?
  Agent: Which plan are you on? Can you share more about your setup?

With Engram, the facts from day 1 are recalled and added to the prompt before the model answers:

# Day 1: Engram extracts and stores
#   "User is on the Pro plan", "User prefers short answers"
current_user.memory.observe(conversation)

# Day 5: Engram recalls the relevant facts, then asks the model
chat = Engram.with_memory(RubyLLM.chat, memory: current_user.memory)
chat.ask("Why am I being rate limited?")

  Agent: You're on the Pro plan, which has a per-minute request cap, and you're
         hitting it. (Kept short, as you prefer.)

Installation

# Gemfile
gem "engram"

The core has zero runtime dependencies. Optional adapters need:

Engram::Adapters::PgvectorStore → neighbor + ActiveRecord + Postgres/pgvector
Engram::Adapters::RubyLLMEmbedder → ruby_llm

Quick start (plain Ruby)

require "engram"

memory = Engram::Memory.new(scope: "user:42")  # zero-config: in-memory + null embedder

memory.add("Subscription tier is Pro")
memory.add("Prefers concise answers")

memory.recall("why am I being rate limited?")
# => [#<Engram::Record content="Subscription tier is Pro" ...>]

Rails

bin/rails generate engram:install   # migration + initializer + model
bin/rails db:migrate

class User < ApplicationRecord
  has_memory      # scope defaults to "user:<id>"
end

current_user.memory.add("Works at Acme Corp")
current_user.memory.recall("where does the user work?")

RubyLLM integration

chat = Engram.with_memory(RubyLLM.chat, memory: current_user.memory)
chat.ask("why am I being rate limited?")
# recall + inject happen automatically before the model sees the message

Automatic memory (v0.2)

Instead of adding facts by hand, let Engram derive them from a conversation turn. It extracts candidate facts, then consolidates each one against what's already known, choosing one of four actions: add, update, forget, or noop.

Engram.configure do |config|
  config.completion = Engram::Adapters::RubyLLMCompletion.new
  config.consolidator = :llm   # or :heuristic for deterministic, no-LLM dedup
end

memory = current_user.memory
memory.observe([
  {role: "user", content: "I switched from the Free plan to Pro"}
])
# extracts "User is on the Pro plan", and if a "Free plan" memory exists, updates it

In Rails, run it off the request path: current_user.memory.observe_later(messages).
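To make the consolidation step concrete, here is a minimal sketch of a deterministic, no-LLM decision between add, update, and noop. It is not Engram's :heuristic consolidator: the OverlapConsolidator module, the word-overlap (Jaccard) rule, and the 0.5 threshold are all assumptions chosen for illustration, and the forget action is left out because it needs more than word overlap to detect.

```ruby
require "set"

# Hypothetical heuristic consolidator. The word-overlap rule and the 0.5
# threshold are assumptions for illustration, not Engram's :heuristic logic.
module OverlapConsolidator
  module_function

  def tokens(text)
    text.downcase.scan(/[a-z0-9]+/).to_set
  end

  # Jaccard similarity: shared words divided by all distinct words.
  def jaccard(a, b)
    union = (a | b).size
    union.zero? ? 0.0 : (a & b).size.to_f / union
  end

  # Returns [action, matched_fact]: :noop for a repeat, :update for a
  # near-duplicate that should be rewritten, :add for a new fact.
  def decide(candidate, existing, update_at: 0.5)
    wanted = tokens(candidate)
    best = existing.max_by { |fact| jaccard(wanted, tokens(fact)) }
    return [:add, nil] if best.nil?

    score = jaccard(wanted, tokens(best))
    if score >= 1.0
      [:noop, best]
    elsif score >= update_at
      [:update, best]
    else
      [:add, nil]
    end
  end
end

known = ["User is on the Free plan"]
plan_change = OverlapConsolidator.decide("User is on the Pro plan", known)
new_fact = OverlapConsolidator.decide("User prefers short answers", known)
```

Here plan_change matches the existing "Free plan" memory closely enough to count as an update, while new_fact shares almost no words with it and is added. A pure overlap rule cannot tell a correction from a contradiction, which is one reason an LLM-backed consolidator is offered as the other option.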

Tuning and maintenance (v0.3)

Observation is idempotent per turn: observing the same messages twice does nothing the second time, so retries do not create duplicate memories or repeat LLM calls. In Rails, use a persistent store so this also holds across job retries and processes:

Engram.configure do |c|
  c.processed_turns = Engram::Rails::CacheProcessedTurns.new
end
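One way to picture per-turn idempotency is a ledger of turn fingerprints. The sketch below is an assumption about the general technique, not Engram's implementation: TurnLedger, its method names, and the choice of SHA-256 over a JSON encoding are hypothetical.

```ruby
require "digest"
require "json"
require "set"

# Hypothetical sketch, not Engram's implementation: remembers which turns
# have already been observed by storing a digest of their content.
class TurnLedger
  def initialize
    @seen = Set.new
  end

  # Stable fingerprint: the same scope, roles, and contents give the same digest.
  def fingerprint(scope, messages)
    canonical = messages.map { |m| [m[:role].to_s, m[:content].to_s] }
    Digest::SHA256.hexdigest(JSON.generate([scope, canonical]))
  end

  # Runs the block only the first time a turn is seen for a scope.
  # Returns true if the block ran, false if the turn was skipped.
  def once(scope, messages)
    key = fingerprint(scope, messages)
    return false if @seen.include?(key)

    yield
    @seen.add(key) # recorded only after success, so a failed run can retry
    true
  end
end

ledger = TurnLedger.new
turn = [{role: "user", content: "I switched from the Free plan to Pro"}]
calls = 0

first = ledger.once("user:42", turn) { calls += 1 }
second = ledger.once("user:42", turn) { calls += 1 }
```

The second call is skipped, so calls stays at 1. An in-process Set like this is lost when the process exits, which is why a persistent store matters once retries happen in separate job workers.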

Recall is plain similarity search by default. You can blend in importance and recency:

Engram.configure do |c|
  c.importance_weight = 0.3
  c.recency_weight = 0.2
  c.touch_on_recall = true   # update last_accessed_at when a memory is recalled
end
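The README does not spell out how the weights combine, so the following is only a sketch of one plausible blend: a weighted sum in which similarity keeps whatever weight is left over. The ScoredMemory struct, the formula, and the 30-day recency half-life are assumptions for illustration.

```ruby
# Hypothetical scoring sketch. The formula and the 30-day half-life are
# assumptions for illustration, not Engram's documented behaviour.
ScoredMemory = Struct.new(:content, :similarity, :importance, :last_accessed_at, keyword_init: true)

HALF_LIFE_SECONDS = 30 * 24 * 60 * 60

# Recency decays from 1.0 (just accessed) towards 0.0, halving every half-life.
def recency(memory, now)
  age = [now - memory.last_accessed_at, 0].max
  0.5**(age.to_f / HALF_LIFE_SECONDS)
end

# Similarity keeps whatever weight is left after importance and recency.
def blended_score(memory, now:, importance_weight: 0.3, recency_weight: 0.2)
  similarity_weight = 1.0 - importance_weight - recency_weight
  similarity_weight * memory.similarity +
    importance_weight * memory.importance +
    recency_weight * recency(memory, now)
end

now = Time.now
fresh = ScoredMemory.new(content: "Subscription tier is Pro", similarity: 0.70,
                         importance: 0.9, last_accessed_at: now)
stale = ScoredMemory.new(content: "Asked about invoices once", similarity: 0.75,
                         importance: 0.2, last_accessed_at: now - 4 * HALF_LIFE_SECONDS)

blended = [stale, fresh].sort_by { |m| -blended_score(m, now: now) }
plain = [stale, fresh].sort_by { |m| -blended_score(m, now: now, importance_weight: 0.0, recency_weight: 0.0) }
```

With both weights at zero the ranking is plain similarity and the stale memory wins narrowly. With the weights from the configuration above, the important, recently used memory moves to the top.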

Prune memories you no longer need:

# Forget memories untouched for 90 days, but keep anything important
current_user.memory.forget_stale(older_than: 90 * 24 * 60 * 60, min_importance: 0.7)
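As a rough mental model of that call, the sketch below prunes an in-memory list. It assumes that older_than is a number of seconds since last access and that memories at or above min_importance are always kept. Both readings come from the comment above rather than from confirmed semantics, and StoredMemory and prune_stale are hypothetical names.

```ruby
# Hypothetical in-memory version of stale pruning. The field names and the
# rule that min_importance protects important memories are assumptions.
StoredMemory = Struct.new(:content, :importance, :last_accessed_at, keyword_init: true)

# Returns the memories that survive: anything accessed after the cutoff,
# plus anything important enough to keep regardless of age.
def prune_stale(memories, older_than:, min_importance:, now: Time.now)
  cutoff = now - older_than
  memories.reject do |memory|
    memory.last_accessed_at < cutoff && memory.importance < min_importance
  end
end

day = 24 * 60 * 60
now = Time.now
memories = [
  StoredMemory.new(content: "Subscription tier is Pro", importance: 0.9, last_accessed_at: now - 200 * day),
  StoredMemory.new(content: "Asked about dark mode", importance: 0.2, last_accessed_at: now - 200 * day),
  StoredMemory.new(content: "Prefers concise answers", importance: 0.4, last_accessed_at: now - 5 * day)
]

kept = prune_stale(memories, older_than: 90 * day, min_importance: 0.7, now: now)
```

Only the old, low-importance "dark mode" memory is dropped. The Pro plan fact survives on importance and the concise-answers preference survives on recency.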

How it works

Engram is a loop around your LLM calls. Before each call, it recalls relevant memories and injects them into the prompt. After each turn (v0.2), it extracts new facts, consolidates them, and persists the result. The store (Postgres + pgvector) is the only thing that persists between sessions.
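The loop can be sketched in a few lines of plain Ruby. This is an illustration under stated assumptions, not Engram's internals: MemoryLoop, the three callables it takes, and the prompt wording are all hypothetical stand-ins for the real recall, model, and observe steps.

```ruby
# Hypothetical sketch of the recall -> inject -> ask -> observe loop.
# The three callables are stand-ins, not Engram or RubyLLM objects.
class MemoryLoop
  def initialize(recall:, complete:, observe:)
    @recall = recall
    @complete = complete
    @observe = observe
  end

  def ask(message)
    facts = @recall.call(message)                  # before the call: recall
    reply = @complete.call(inject(facts, message)) # inject, then ask the model
    @observe.call([                                # after the turn: learn from it
      {role: "user", content: message},
      {role: "assistant", content: reply}
    ])
    reply
  end

  private

  # Prepends recalled facts to the user message; passes it through if none.
  def inject(facts, message)
    return message if facts.empty?

    known = facts.map { |fact| "- #{fact}" }.join("\n")
    "Known facts about the user:\n#{known}\n\nUser message: #{message}"
  end
end

prompts = []
observed = []
agent = MemoryLoop.new(
  recall: ->(_message) { ["User is on the Pro plan"] },
  complete: ->(prompt) { prompts << prompt; "You are hitting the Pro rate cap." },
  observe: ->(messages) { observed << messages }
)
reply = agent.ask("Why am I being rate limited?")
```

The stubbed model never sees the bare question: the prompt it receives already carries the recalled fact, and the finished turn is handed to the observe step afterwards.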

Architecture

Engram uses a ports-and-adapters design. A pure-Ruby core depends only on the MemoryStore and Embedder ports; pgvector, RubyLLM, and Rails are swappable adapters. This keeps the domain fast to test (in-memory + null adapters, no DB or API keys) and let the v0.2 Extractor/Consolidator slot in without rework.
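A toy version of the same shape shows why the core is easy to test. None of these classes are Engram's: WordCountEmbedder, InMemoryStore, TinyMemory, and the method signatures on the two ports are assumptions made up for this sketch, and the word-count "embedding" is a stand-in for a real model.

```ruby
# Hypothetical Embedder adapter: a sparse word-count vector, not a real model.
class WordCountEmbedder
  def embed(text)
    text.downcase.scan(/[a-z0-9]+/).tally.transform_values(&:to_f)
  end
end

# Hypothetical MemoryStore adapter: rows kept in a Hash, ranked by cosine.
class InMemoryStore
  def initialize
    @rows = Hash.new { |hash, scope| hash[scope] = [] }
  end

  def add(scope, content, vector)
    @rows[scope] << {content: content, vector: vector}
  end

  def search(scope, vector, limit:)
    @rows[scope]
      .map { |row| [cosine(row[:vector], vector), row[:content]] }
      .sort_by { |score, _content| -score }
      .first(limit)
      .map { |_score, content| content }
  end

  private

  def cosine(a, b)
    dot = a.sum { |word, weight| weight * b.fetch(word, 0.0) }
    norm = Math.sqrt(a.values.sum { |w| w * w }) * Math.sqrt(b.values.sum { |w| w * w })
    norm.zero? ? 0.0 : dot / norm
  end
end

# Core: talks only to the two ports, never to a concrete adapter.
class TinyMemory
  def initialize(scope:, store:, embedder:)
    @scope = scope
    @store = store
    @embedder = embedder
  end

  def add(content)
    @store.add(@scope, content, @embedder.embed(content))
  end

  def recall(query, limit: 3)
    @store.search(@scope, @embedder.embed(query), limit: limit)
  end
end

store = InMemoryStore.new
memory = TinyMemory.new(scope: "user:42", store: store, embedder: WordCountEmbedder.new)
memory.add("Works at Acme Corp")
memory.add("Prefers concise answers")
top = memory.recall("where does the user work at acme?", limit: 1)
```

Swapping InMemoryStore for a database-backed class with the same two methods would leave TinyMemory untouched, which is the property the real adapters rely on.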

Development

bundle install
bundle exec rspec          # unit suite (no DB, no network)
bundle exec standardrb     # lint
bundle exec rake eval      # recall quality harness (precision@k)

Integration tests exercise the real Postgres + pgvector adapter (tagged :integration, skipped by default):

DATABASE_URL=postgres://postgres:postgres@localhost:5432/engram_test \
  bundle exec rspec --tag integration

For honest recall numbers, run the eval with a real embedder instead of the test stub. ruby_llm is not a dependency, so install it separately first:

gem install ruby_llm
ENGRAM_EMBEDDER=ruby_llm OPENAI_API_KEY=... ruby eval/run.rb

On the bundled fixture set, recall@3 is 100% (4/4) with OpenAI's text-embedding-3-small, and the consolidation dedup checks pass. The fixture is deliberately small. Treat it as a retrieval smoke test, not a benchmark.
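For readers unfamiliar with the metrics, here is a small sketch of how recall@k and precision@k can be computed. It assumes recall@k means the share of queries with at least one expected memory in the top k, which may differ from the harness's exact definition. EvalCase and the sample cases are made up for illustration and are not the bundled fixture set.

```ruby
# Hypothetical eval helpers. The cases below are made-up examples,
# not the bundled fixture set.
EvalCase = Struct.new(:retrieved, :expected, keyword_init: true)

# True when any expected memory appears in the top k results.
def hit_at_k?(eval_case, k)
  !(eval_case.retrieved.first(k) & eval_case.expected).empty?
end

# Share of queries with a hit in the top k (an assumed definition).
def recall_at_k(cases, k)
  return 0.0 if cases.empty?

  cases.count { |eval_case| hit_at_k?(eval_case, k) }.to_f / cases.size
end

# Share of the top k results that were expected, for a single query.
def precision_at_k(eval_case, k)
  top = eval_case.retrieved.first(k)
  return 0.0 if top.empty?

  (top & eval_case.expected).size.to_f / top.size
end

cases = [
  EvalCase.new(retrieved: ["pro plan", "short answers", "acme"], expected: ["pro plan"]),
  EvalCase.new(retrieved: ["short answers", "acme", "vegetarian", "pro plan"], expected: ["pro plan"])
]

recall_at_3 = recall_at_k(cases, 3)
recall_at_4 = recall_at_k(cases, 4)
```

In this made-up data the second query only finds its expected memory at rank 4, so recall@3 is half of recall@4. With a handful of queries, a single miss moves the number by a large step, which is why a small fixture works as a smoke test rather than a benchmark.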

Roadmap

v0.1 (done): recall + inject foundation, adapters, Rails + RubyLLM integration.
v0.2 (done): extract and consolidate (ADD / UPDATE / FORGET), background jobs.
v0.3 (done): idempotent observation, importance/recency recall, forgetting and decay.
later: memory types per policy, additional storage backends, larger eval benchmarks.

License

MIT. See LICENSE.txt.