Class: Engram::Extractors::LLMExtractor

Inherits:
Object
Includes:
Ports::Extractor
Defined in:
lib/engram/extractors/llm_extractor.rb

Overview

Derives durable, user-specific facts from a conversation turn via an LLM.

Constant Summary

SYSTEM =
<<~PROMPT
  You extract durable, user-specific facts worth remembering across future sessions.
  Rules:
  - Only stable facts about the user (preferences, attributes, decisions, history).
  - Ignore ephemeral chit-chat, questions, and the assistant's own messages.
  - Normalize each fact to a terse third-person statement (e.g. "User is on the Pro plan").
  - Classify kind as fact, preference, instruction, or episodic.
  - Do not extract secrets, API keys, passwords, tokens, or transient task progress.
  - Set confidence in [0,1]; importance in [0,1].
  Return an empty list if there is nothing worth remembering.
PROMPT
SCHEMA =

Shaped for OpenAI strict structured outputs: every object sets additionalProperties: false and lists all of its properties in `required`. The extractor still defends against missing or empty fields, so requiring them here only constrains the model’s output; it does not change downstream behaviour.

{
  type: "object",
  additionalProperties: false,
  properties: {
    facts: {
      type: "array",
      items: {
        type: "object",
        additionalProperties: false,
        properties: {
          content: {type: "string"},
          kind: {type: "string", enum: %w[fact preference instruction episodic semantic]},
          importance: {type: "number"},
          confidence: {type: "number"}
        },
        required: %w[content kind importance confidence]
      }
    }
  },
  required: %w[facts]
}.freeze
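
To make the shape that SCHEMA constrains more concrete, here is a minimal sketch of a conforming payload together with a hand-rolled check. The `conforms?` helper, the two constants, and the sample fact are hypothetical and not part of Engram; in practice the provider's structured-output enforcement would do this work.

```ruby
# Hypothetical mirror of SCHEMA's per-fact constraints (not an Engram API).
REQUIRED_FACT_KEYS = %w[content kind importance confidence].freeze
ALLOWED_KINDS = %w[fact preference instruction episodic semantic].freeze

# Returns true when the payload has exactly one top-level key, "facts",
# and every fact carries exactly the required keys with the expected types.
def conforms?(payload)
  facts = payload["facts"]
  return false unless payload.keys == ["facts"] && facts.is_a?(Array)

  facts.all? do |fact|
    fact.is_a?(Hash) &&
      fact.keys.sort == REQUIRED_FACT_KEYS.sort &&
      fact["content"].is_a?(String) &&
      ALLOWED_KINDS.include?(fact["kind"]) &&
      fact["importance"].is_a?(Numeric) &&
      fact["confidence"].is_a?(Numeric)
  end
end

# Illustrative payload only; the values are made up.
SAMPLE = {
  "facts" => [
    {"content" => "User is on the Pro plan", "kind" => "fact", "importance" => 0.8, "confidence" => 0.9}
  ]
}.freeze

conforms?(SAMPLE)          # => true
conforms?({"facts" => []}) # => true
```

The schema's enum also admits `semantic`, which the SYSTEM prompt does not list among the kinds, and neither numeric field is range-limited by the schema itself; the [0,1] bounds are stated only in the prompt.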

Instance Method Summary

Constructor Details

#initialize(completion:, embedder:, min_confidence: 0.5) ⇒ LLMExtractor

Returns a new instance of LLMExtractor.



# File 'lib/engram/extractors/llm_extractor.rb', line 47

def initialize(completion:, embedder:, min_confidence: 0.5)
  @completion = completion
  @embedder = embedder
  @min_confidence = min_confidence
end
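
Judging only from the calls made in `#extract`, the `completion` collaborator needs to respond to `complete(system:, user:, schema:)` and the `embedder` to `embed(content)`. The following is a minimal sketch of test doubles with those shapes. `StubCompletion` and `StubEmbedder` are hypothetical names, and the return shape of `complete` is an assumption, since the private `facts` helper that unpacks it is not shown on this page.

```ruby
# Hypothetical test doubles; method shapes are inferred from the calls in #extract.
class StubCompletion
  attr_reader :calls

  def initialize(response)
    @response = response
    @calls = []
  end

  # Records the keyword arguments it was given and returns the canned response.
  def complete(system:, user:, schema:)
    @calls << {system: system, user: user, schema: schema}
    @response
  end
end

class StubEmbedder
  # Returns a deterministic toy vector: [character count, lowercase vowel count].
  def embed(content)
    [content.length.to_f, content.count("aeiou").to_f]
  end
end

completion = StubCompletion.new({"facts" => []})
embedder = StubEmbedder.new

# With the real class loaded, the doubles would be wired in like this:
#   Engram::Extractors::LLMExtractor.new(
#     completion: completion, embedder: embedder, min_confidence: 0.7
#   )
```

Raising `min_confidence` above the default of 0.5 makes the extractor stricter about which facts it keeps.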

Instance Method Details

#extract(messages:, scope:) ⇒ Object

Sends the transcript of messages to the completion port and builds one Engram::Record per returned fact, skipping facts whose content is blank or whose confidence is below min_confidence.



# File 'lib/engram/extractors/llm_extractor.rb', line 53

def extract(messages:, scope:)
  result = @completion.complete(system: SYSTEM, user: transcript(messages), schema: SCHEMA)
  facts(result).filter_map do |fact|
    fact = fact.transform_keys(&:to_s)
    content = fact["content"].to_s.strip
    next if content.empty?
    next if (fact["confidence"] || 1.0).to_f < @min_confidence

    Engram::Record.new(
      content: content,
      scope: scope,
      kind: fact["kind"] || "fact",
      importance: (fact["importance"] || 1.0).to_f,
      metadata: {confidence: (fact["confidence"] || 1.0).to_f},
      embedding: @embedder.embed(content)
    )
  end
end
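
The per-fact filtering and defaulting in `#extract` can be exercised in isolation. The sketch below reproduces that logic against an already-parsed list of facts, without an LLM call. `SketchRecord` is a hypothetical stand-in for `Engram::Record` (whose definition is not shown here), and `build_records`, the lambda embedder, and the sample facts are likewise illustrative assumptions.

```ruby
# Stand-in for Engram::Record, which is not defined on this page (assumption).
SketchRecord = Struct.new(:content, :scope, :kind, :importance, :metadata, :embedding, keyword_init: true)

# Mirrors the body of the filter_map block in #extract.
def build_records(raw_facts, scope:, min_confidence: 0.5, embed: ->(text) { [text.length.to_f] })
  raw_facts.filter_map do |fact|
    fact = fact.transform_keys(&:to_s)            # accept symbol or string keys
    content = fact["content"].to_s.strip
    next if content.empty?                        # drop blank facts
    confidence = (fact["confidence"] || 1.0).to_f # missing confidence counts as 1.0
    next if confidence < min_confidence           # drop low-confidence facts

    SketchRecord.new(
      content: content,
      scope: scope,
      kind: fact["kind"] || "fact",
      importance: (fact["importance"] || 1.0).to_f,
      metadata: {confidence: confidence},
      embedding: embed.call(content)
    )
  end
end

records = build_records(
  [
    {content: "  User prefers dark mode ", kind: "preference", importance: 0.6, confidence: 0.9},
    {"content" => "User might like jazz", "confidence" => 0.2}, # below the threshold
    {"content" => "   "},                                        # blank after strip
    {"content" => "User is on the Pro plan"}                     # defaults apply
  ],
  scope: "user:42"
)

records.map(&:content) # => ["User prefers dark mode", "User is on the Pro plan"]
```

A fact with no confidence value is treated as fully confident and always passes the threshold, which is the same fallback the original method applies.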