Class: RobotLab::DocumentStore
- Inherits: Object (Object → RobotLab::DocumentStore)
- Defined in: lib/robot_lab/document_store.rb
Overview
Embedding-based document store for semantic search over arbitrary text.
Documents are embedded using fastembed (BAAI/bge-small-en-v1.5 by default) and stored in memory. Queries are embedded the same way, then compared by cosine similarity to find the closest documents.
The embedding model is initialised lazily: the first call that needs an embedding downloads the ONNX model file, which is cached locally afterwards.
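The comparison step described above can be sketched in plain Ruby. This is a minimal illustration of cosine similarity, not the class's private helper (which this page does not show); the method below, its zero-vector handling, and the sample vectors are assumptions made for the example.

```ruby
# Cosine similarity between two equal-length numeric vectors.
# Hypothetical helper written for illustration only.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.size == b.size

  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  # Assumption: treat a zero vector as having no similarity to anything.
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

same       = cosine_similarity([1.0, 0.0], [2.0, 0.0])   # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])   # unrelated directions
opposite   = cosine_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite directions
```

Because the score depends only on direction, vectors of different magnitude that point the same way are treated as identical, which is why it is a common choice for comparing embeddings.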
Constant Summary
- DEFAULT_MODEL = "BAAI/bge-small-en-v1.5"
  Default embedding model used when none is specified.
Instance Method Summary
- #clear ⇒ self
  Remove all stored documents.
- #delete(key) ⇒ self
  Remove the document stored under key.
- #empty? ⇒ Boolean
  Whether the store contains no documents.
- #initialize(model_name: DEFAULT_MODEL) ⇒ DocumentStore (constructor)
  A new instance of DocumentStore.
- #keys ⇒ Array<Symbol>
  Keys of all stored documents.
- #search(query, limit: 5) ⇒ Array<Hash>
  Search for documents semantically similar to query.
- #size ⇒ Integer
  Number of stored documents.
- #store(key, text) ⇒ self
  Embed text and store it under key.
Constructor Details
#initialize(model_name: DEFAULT_MODEL) ⇒ DocumentStore
Returns a new instance of DocumentStore.
# File 'lib/robot_lab/document_store.rb', line 33
def initialize(model_name: DEFAULT_MODEL)
  @model_name = model_name
  @documents = {} # key (Symbol) => { text: String, vector: Array<Float> }
  @mutex = Mutex.new
  @model = nil # lazy: initialised on first embed call
end
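The constructor leaves @model as nil and defers loading until the first embed call. One way such lazy initialisation could look is sketched below. LazyModel, its loader block, and load_count are hypothetical names for this example; the sketch does not show how RobotLab or fastembed actually load the model.

```ruby
# Builds an expensive resource on first use and reuses it afterwards.
# LazyModel and the loader block are hypothetical stand-ins.
class LazyModel
  attr_reader :load_count

  def initialize(&loader)
    @loader = loader
    @mutex = Mutex.new
    @model = nil       # nothing is loaded at construction time
    @load_count = 0
  end

  # Returns the model, loading it once under the lock so that
  # concurrent callers cannot trigger a second load.
  def model
    @mutex.synchronize do
      if @model.nil?
        @model = @loader.call
        @load_count += 1
      end
      @model
    end
  end
end

lazy   = LazyModel.new { "pretend-embedding-model" }
before = lazy.load_count  # still 0: the loader has not run yet
first  = lazy.model       # triggers the one and only load
second = lazy.model       # reuses the cached object
```

The practical consequence of this pattern is that constructing a store is cheap, while the first call that needs an embedding pays the full loading cost.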
Instance Method Details
#clear ⇒ self
Remove all stored documents.
# File 'lib/robot_lab/document_store.rb', line 109
def clear
  @mutex.synchronize { @documents.clear }
  self
end
#delete(key) ⇒ self
Remove the document stored under key.
# File 'lib/robot_lab/document_store.rb', line 101
def delete(key)
  @mutex.synchronize { @documents.delete(key.to_sym) }
  self
end
#empty? ⇒ Boolean
Whether the store contains no documents.
# File 'lib/robot_lab/document_store.rb', line 93
def empty?
  @mutex.synchronize { @documents.empty? }
end
#keys ⇒ Array<Symbol>
Keys of all stored documents.
# File 'lib/robot_lab/document_store.rb', line 86
def keys
  @mutex.synchronize { @documents.keys }
end
#search(query, limit: 5) ⇒ Array<Hash>
Search for documents semantically similar to query.
# File 'lib/robot_lab/document_store.rb', line 60
def search(query, limit: 5)
  return [] if empty?

  query_vec = query_vector(query)
  results = []

  @mutex.synchronize do
    @documents.each do |key, doc|
      score = cosine_similarity(query_vec, doc[:vector])
      results << { key: key, text: doc[:text], score: score }
    end
  end

  results.sort_by { |r| -r[:score] }.first(limit)
end
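The final ranking step of #search can be tried in isolation. The sketch below applies the same sort_by and first(limit) expression to hand-written result hashes. The keys, texts, and scores are made up for the example, and top_matches is a hypothetical helper name.

```ruby
# Hand-written scores stand in for cosine similarities; all data is invented.
results = [
  { key: :cats,  text: "Cats purr.",  score: 0.42 },
  { key: :dogs,  text: "Dogs bark.",  score: 0.91 },
  { key: :birds, text: "Birds sing.", score: 0.67 }
]

# Sort by descending score, then keep at most `limit` entries.
def top_matches(results, limit)
  results.sort_by { |r| -r[:score] }.first(limit)
end

top_two = top_matches(results, 2).map { |r| r[:key] }
all     = top_matches(results, 10).map { |r| r[:key] }
```

A limit larger than the number of stored documents is harmless: first simply returns everything that is available, already ordered from best to worst match.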
#size ⇒ Integer
Number of stored documents.
# File 'lib/robot_lab/document_store.rb', line 79
def size
  @mutex.synchronize { @documents.size }
end
#store(key, text) ⇒ self
Embed text and store it under key.
If a document already exists under key it is replaced.
# File 'lib/robot_lab/document_store.rb', line 47
def store(key, text)
  key = key.to_sym
  vector = passage_vector(text)
  @mutex.synchronize { @documents[key] = { text: text, vector: vector } }
  self
end
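The behaviour of #store (key normalisation to a Symbol, replacement of an existing entry, and a chainable return value) can be illustrated without loading an embedding model. MiniStore, text_for, and the fake embedder below are hypothetical stand-ins written for this sketch; they mirror the listing above but are not part of RobotLab.

```ruby
# MiniStore mirrors the documented behaviour of #store; the embedder is a
# fake so the example runs without fastembed.
class MiniStore
  def initialize(embedder)
    @embedder = embedder
    @documents = {}
    @mutex = Mutex.new
  end

  def store(key, text)
    key = key.to_sym                 # "greeting" and :greeting share one slot
    vector = @embedder.call(text)    # computed before taking the lock
    @mutex.synchronize { @documents[key] = { text: text, vector: vector } }
    self                             # allows chained calls
  end

  def keys
    @mutex.synchronize { @documents.keys }
  end

  # Hypothetical reader, added only so the example can inspect stored text.
  def text_for(key)
    @mutex.synchronize { @documents.dig(key.to_sym, :text) }
  end
end

fake_embedder = ->(text) { [text.length.to_f] }  # not a real embedding
store = MiniStore.new(fake_embedder)
store.store("greeting", "hello")
     .store(:greeting, "hello again")            # replaces the first entry
     .store(:farewell, "bye")
```

Computing the vector before entering the synchronize block keeps the lock held only for the hash write, so a slow embedding call does not block readers.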