Class: RobotLab::DocumentStore

Inherits:
Object
Defined in:
lib/robot_lab/document_store.rb

Overview

Embedding-based document store for semantic search over arbitrary text.

Documents are embedded using fastembed (BAAI/bge-small-en-v1.5 by default) and stored in memory. Queries are embedded the same way, then compared by cosine similarity to find the closest documents.
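
As a rough illustration of the comparison step described above, cosine similarity between two vectors could be computed as follows. This is only a sketch: the store's own cosine_similarity helper is private and not shown on this page, so the method below is a hypothetical stand-in, and the two-element vectors are toy values rather than real embeddings.

```ruby
# Hypothetical stand-in for the store's private cosine_similarity helper.
# Returns dot(a, b) / (|a| * |b|), or 0.0 when either vector has zero length.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.size == b.size

  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Toy vectors (assumed values, not model output).
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])  # parallel vectors
orthogonal     = cosine_similarity([1.0, 0.0], [0.0, 3.0])  # unrelated vectors
opposite       = cosine_similarity([1.0, 0.0], [-1.0, 0.0]) # opposed vectors
```

Vector length does not affect the result, only direction does, which is why documents of very different sizes can still be compared on the same scale.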

The embedding model is initialised lazily on first use: the ONNX model file is downloaded on that first call and cached locally afterwards, so the first store or search call can be noticeably slower than later ones.
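
The lazy initialisation described above can be sketched with a small memoising wrapper. The class name LazyResource and its loader block are hypothetical; how DocumentStore actually loads its fastembed model is not shown on this page, so treat this as one possible shape rather than the library's implementation.

```ruby
# Hypothetical wrapper: runs an expensive loader once, on first access.
class LazyResource
  attr_reader :load_count

  def initialize(&loader)
    @loader     = loader
    @mutex      = Mutex.new
    @value      = nil
    @load_count = 0
  end

  # Returns the loaded value, invoking the loader only on the first call.
  # The mutex keeps two threads from loading at the same time.
  def value
    @mutex.synchronize do
      if @value.nil?
        @value = @loader.call
        @load_count += 1
      end
      @value
    end
  end
end

# The string stands in for an embedding model object.
resource     = LazyResource.new { "fake-model" }
loads_before = resource.load_count
first        = resource.value
second       = resource.value
```

Constructing the wrapper does no work; the cost is paid by whichever caller asks for the value first.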

Examples:

store = RobotLab::DocumentStore.new
store.store(:q4_report, "Q4 revenue came in at $4.2M, up 18% YoY…")
store.store(:q3_report, "Q3 showed 15% growth, driven by APAC…")

results = store.search("revenue growth", limit: 2)
results.each { |r| puts "#{r[:key]} (#{r[:score].round(3)}): #{r[:text][0..60]}" }

Via Memory

memory.store_document(:readme, File.read("README.md"))
memory.search_documents("how to configure redis", limit: 3)

Constant Summary

DEFAULT_MODEL = "BAAI/bge-small-en-v1.5"

Default embedding model used when none is specified.

Constructor Details

#initialize(model_name: DEFAULT_MODEL) ⇒ DocumentStore

Returns a new instance of DocumentStore.

Parameters:

  • model_name (String) (defaults to: DEFAULT_MODEL)

    fastembed model name (default: BAAI/bge-small-en-v1.5)



# File 'lib/robot_lab/document_store.rb', line 33

def initialize(model_name: DEFAULT_MODEL)
  @model_name = model_name
  @documents  = {}  # key (Symbol) => { text: String, vector: Array<Float> }
  @mutex      = Mutex.new
  @model      = nil  # lazy: initialised on first embed call
end

Instance Method Details

#clear ⇒ self

Remove all stored documents.

Returns:

  • (self)


# File 'lib/robot_lab/document_store.rb', line 109

def clear
  @mutex.synchronize { @documents.clear }
  self
end

#delete(key) ⇒ self

Remove the document stored under key.

Parameters:

  • key (Symbol, String)

Returns:

  • (self)


# File 'lib/robot_lab/document_store.rb', line 101

def delete(key)
  @mutex.synchronize { @documents.delete(key.to_sym) }
  self
end

#empty? ⇒ Boolean

Whether the store contains no documents.

Returns:

  • (Boolean)


# File 'lib/robot_lab/document_store.rb', line 93

def empty?
  @mutex.synchronize { @documents.empty? }
end

#keys ⇒ Array<Symbol>

Keys of all stored documents.

Returns:

  • (Array<Symbol>)


# File 'lib/robot_lab/document_store.rb', line 86

def keys
  @mutex.synchronize { @documents.keys }
end

#search(query, limit: 5) ⇒ Array<Hash>

Search for documents semantically similar to query.

Parameters:

  • query (String)

    natural-language search query

  • limit (Integer) (defaults to: 5)

    maximum number of results (default 5)

Returns:

  • (Array<Hash>)

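    results sorted by score descending. Each hash contains :key, :text, and :score (Float cosine similarity, documented as 0.0..1.0; the cosine_similarity helper is not shown here, and unclamped cosine similarity can in general go as low as -1.0).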



# File 'lib/robot_lab/document_store.rb', line 60

def search(query, limit: 5)
  return [] if empty?

  query_vec = query_vector(query)
  results   = []

  @mutex.synchronize do
    @documents.each do |key, doc|
      score = cosine_similarity(query_vec, doc[:vector])
      results << { key: key, text: doc[:text], score: score }
    end
  end

  results.sort_by { |r| -r[:score] }.first(limit)
end
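
Because #search applies no minimum score, a non-empty store returns up to limit results however weak the match. A caller that wants only reasonably close matches might filter the returned hashes itself. The sketch below works on hand-written result hashes with made-up scores and a hypothetical min_score threshold; it does not call the embedding model.

```ruby
# Hand-written hashes shaped like the ones #search returns.
# Keys, texts, and scores are invented for illustration.
results = [
  { key: :doc_a, text: "alpha", score: 0.71 },
  { key: :doc_b, text: "beta",  score: 0.12 },
  { key: :doc_c, text: "gamma", score: 0.83 }
]

# Same ordering rule as #search: highest score first, then truncate.
top_three = results.sort_by { |r| -r[:score] }.first(3)

# Hypothetical caller-side cutoff; a sensible value depends on the model and data.
min_score = 0.5
relevant  = results.select { |r| r[:score] >= min_score }
                   .sort_by { |r| -r[:score] }

top_keys      = top_three.map { |r| r[:key] }
relevant_keys = relevant.map { |r| r[:key] }
```

Here top_three still includes the weak :doc_b match, while relevant drops it.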

#size ⇒ Integer

Number of stored documents.

Returns:

  • (Integer)


# File 'lib/robot_lab/document_store.rb', line 79

def size
  @mutex.synchronize { @documents.size }
end

#store(key, text) ⇒ self

Embed text and store it under key.

If a document already exists under key it is replaced.

Parameters:

  • key (Symbol, String)

    identifier for this document

  • text (String)

    the document text to embed and store

Returns:

  • (self)


# File 'lib/robot_lab/document_store.rb', line 47

def store(key, text)
  key    = key.to_sym
  vector = passage_vector(text)
  @mutex.synchronize { @documents[key] = { text: text, vector: vector } }
  self
end
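
The key handling in #store and #delete (both call key.to_sym) means that a String key and the matching Symbol refer to the same document. The sketch below reproduces that behaviour in a stripped-down stand-in so it can run without fastembed. TinyStore, text_for, and the fake embedder lambda are all hypothetical names introduced for illustration.

```ruby
# Hypothetical miniature of the store, with the embedder injected as a callable.
class TinyStore
  def initialize(embedder)
    @embedder  = embedder
    @documents = {} # key (Symbol) => { text: String, vector: Array<Float> }
    @mutex     = Mutex.new
  end

  # Embeds outside the lock, then writes under it, mirroring #store above.
  def store(key, text)
    vector = @embedder.call(text)
    @mutex.synchronize { @documents[key.to_sym] = { text: text, vector: vector } }
    self
  end

  # Deleting a key that is not present is a no-op.
  def delete(key)
    @mutex.synchronize { @documents.delete(key.to_sym) }
    self
  end

  def keys
    @mutex.synchronize { @documents.keys }
  end

  # Illustrative reader; not part of the documented API.
  def text_for(key)
    @mutex.synchronize { @documents.dig(key.to_sym, :text) }
  end
end

# Fake embedder: a one-element vector based on text length, not a real embedding.
fake_embedder = ->(text) { [text.length.to_f] }

store = TinyStore.new(fake_embedder)
store.store("readme", "first draft").store(:readme, "second draft")
keys_after_store = store.keys
current_text     = store.text_for("readme")

store.delete("readme").delete(:missing)
keys_after_delete = store.keys
```

Returning self from the mutating methods is what allows the chained calls shown here.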