Class: Documentrix::Documents::Cache::SQLiteCache
- Inherits: Object
  - Object
  - Documentrix::Documents::Cache::SQLiteCache
- Includes: Common
- Defined in: lib/documentrix/documents/cache/sqlite_cache.rb
Overview
SQLiteCache is a cache implementation that uses a SQLite database to store document embeddings and related metadata.
This class provides a persistent cache storage solution for document embeddings, leveraging SQLite's capabilities to store both the embedding vectors and associated text data, tags, and source information. It supports efficient vector similarity searches using the sqlite_vec extension for fast nearest neighbor queries.
Instance Attribute Summary
-
#embedding_length ⇒ Object
readonly
length of the embeddings vector.
-
#filename ⇒ Object
readonly
filename for the database; :memory: means an in-memory database.
Attributes included from Common
Instance Method Summary
-
#[](key) ⇒ Documentrix::Documents::Record, NilClass
-
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
-
#clear_all_with_prefix ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_all_with_prefix method deletes all records whose key starts with the cache's prefix by executing a SQL query.
-
#clear_by_source(source, digest: nil, operator: ?=) ⇒ self
Removes all records associated with the specified source from the cache.
-
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method clears the cache for specific tags by deleting records that match those tags and whose key starts with the cache's prefix.
-
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
-
#delete(key) ⇒ FalseClass, TrueClass
The delete method removes a key from the cache by executing a SQL query and returns whether the key existed beforehand.
-
#each(prefix: start_with_prefix) {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
-
#each_source {|source| ... } ⇒ Enumerator
Yields each unique, full source present in the cache records.
-
#find_records(needle, tags: nil, max_records: nil, min_similarity: -1) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
-
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
-
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
-
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false) ⇒ void
constructor
The initialize method sets up the cache by calling super and setting various instance variables.
-
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
-
#move_prefix(old_prefix, new_prefix) ⇒ Documentrix::Documents::Cache::SQLiteCache
Move a key prefix in the cache.
-
#size ⇒ Integer
The size method returns the number of records stored in the cache whose key starts with the cache's prefix.
-
#source_exist?(source, digest: nil, operator: ?=) ⇒ Boolean
The source_exist? method checks if any records associated with the given source exist in the cache.
-
#tags ⇒ Documentrix::Utils::Tags
The tags method returns the unique tags from the database as a Documentrix::Utils::Tags object.
Methods included from Common
#clear, #collections, #pre, #unpre
Methods included from Utils::Math
Constructor Details
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false) ⇒ void
The initialize method sets up the cache by calling super and setting various instance variables.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 26
def initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false)
  super(prefix:)
  @embedding_length = embedding_length
  @filename         = filename
  @debug            = debug
  setup_database(filename)
end
Instance Attribute Details
#embedding_length ⇒ Object (readonly)
length of the embeddings vector
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 36
def embedding_length
  @embedding_length
end
#filename ⇒ Object (readonly)
filename for the database; :memory: means an in-memory database
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 34
def filename
  @filename
end
Instance Method Details
#[](key) ⇒ Documentrix::Documents::Record, NilClass
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 45
def [](key)
  result = execute(
    %{
      SELECT records.key, records.text, records.norm, records.source,
        records.digest, records.tags, embeddings.embedding
      FROM records
      INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
      WHERE records.key = ?
    },
    pre(key)
  )&.first or return
  key, text, norm, source, digest, tags, embedding = *result
  embedding = embedding.unpack("f*")
  tags      = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
  convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:)
end
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 68
def []=(key, value)
  value = convert_value_to_record(value)
  digest = compute_file_digest(value.source)
  embedding = value.embedding.pack("f*")
  execute(%{BEGIN})
  execute(%{INSERT INTO embeddings(embedding) VALUES(?)}, [ embedding ])
  embedding_id, = execute(%{ SELECT last_insert_rowid() }).flatten
  execute(%{
    INSERT INTO records(key,text,embedding_id,norm,source,digest,tags)
    VALUES(?,?,?,?,?,?,?)
  }, [ pre(key), value.text, embedding_id, value.norm, value.source, digest, JSON(value.tags) ])
  execute(%{COMMIT})
end
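The []= method stores the embedding as a binary string produced by pack("f*"), and the readers restore it with unpack("f*"). The following is a minimal standalone sketch of that round trip; the helper names pack_embedding and unpack_embedding are hypothetical and not part of Documentrix. It shows that each component occupies 4 bytes and that values not exactly representable in single precision come back slightly changed.

```ruby
# Hypothetical helpers illustrating the float32 round trip used for embeddings.

# Packs an array of numbers into a binary string of native-endian
# single-precision floats (4 bytes per component).
def pack_embedding(vector)
  vector.pack("f*")
end

# Restores an array of Ruby Floats (double precision) from such a string.
def unpack_embedding(blob)
  blob.unpack("f*")
end

embedding = [0.5, -1.25, 0.1]
blob      = pack_embedding(embedding)
restored  = unpack_embedding(blob)

puts "bytes stored: #{blob.bytesize}"  # 3 components * 4 bytes
puts "restored:     #{restored.inspect}"

# 0.5 and -1.25 are exactly representable in float32; 0.1 is not, so the
# restored value differs from the original double by a tiny amount.
puts "0.1 survives exactly? #{restored[2] == 0.1}"
```

Because of this rounding, comparing restored embeddings with == against the original doubles is unreliable; a tolerance-based comparison is the safer assumption.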
#clear_all_with_prefix ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_all_with_prefix method deletes all records whose key starts with the cache's prefix by executing a SQL query.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 159
def clear_all_with_prefix
  execute(%{DELETE FROM records WHERE key LIKE ?}, [ start_with_prefix ])
  self
end
#clear_by_source(source, digest: nil, operator: ?=) ⇒ self
Removes all records associated with the specified source from the cache.
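If a digest is provided, the operator argument controls the comparison: with the default = only records whose digest matches are removed, while any other operator value is treated as != and removes only records that do NOT match this digest. The latter allows for updating a source by wiping old versions while preserving records that are already up-to-date.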
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 175
def clear_by_source(source, digest: nil, operator: ?=)
  operator = '!=' if operator != ?=
  if digest
    execute(
      %{
        DELETE FROM records
        WHERE key LIKE ? AND source = ? AND digest #{operator} ?
      },
      [ start_with_prefix, source, digest ]
    )
  else
    execute(
      %{
        DELETE FROM records
        WHERE key LIKE ? AND source = ?
      },
      [ start_with_prefix, source ]
    )
  end
  self
end
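The operator argument is interpolated into the SQL text rather than bound as a parameter, so the first line of the method collapses it to one of two fixed values. A small standalone sketch of that normalization follows; digest_clause is a hypothetical helper written for illustration, not a Documentrix method.

```ruby
# Hypothetical helper mirroring the operator handling in clear_by_source
# and source_exist?: anything other than "=" becomes "!=".
def digest_clause(operator)
  operator = '!=' if operator != ?=   # ?= is the one-character string "="
  "digest #{operator} ?"
end

puts digest_clause('=')                      # equality comparison
puts digest_clause('!=')                     # inequality comparison
puts digest_clause('; DROP TABLE records')   # also falls back to inequality
```

Since only "=" and "!=" can ever reach the query string, arbitrary caller input cannot alter the statement through this argument.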
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method clears the cache for specific tags by deleting
records that match those tags and whose key starts with the cache's prefix.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 143
def clear_for_tags(tags = nil)
  tags = Documentrix::Utils::Tags.new(tags).to_a
  if tags.present?
    records = find_records_for_tags(tags)
    keys = '(%s)' % records.transpose.first.map { "'%s'" % quote(_1) }.join(?,)
    execute(%{DELETE FROM records WHERE key IN #{keys}})
  else
    clear_all_with_prefix
  end
  self
end
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 320
def convert_to_vector(vector)
  vector
end
#delete(key) ⇒ FalseClass, TrueClass
The delete method removes a key from the cache by executing a SQL query and returns whether the key existed beforehand.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 100
def delete(key)
  result = key?(key)
  execute(
    %{ DELETE FROM records WHERE records.key = ? },
    pre(key)
  )
  result
end
#each(prefix: start_with_prefix) {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 284
def each(prefix: start_with_prefix, &block)
  block or return enum_for(__method__, prefix:)
  execute(%{
    SELECT records.key, records.text, records.norm, records.source,
      records.digest, records.tags, embeddings.embedding
    FROM records
    INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
    WHERE records.key LIKE ?
  }, [ prefix ]).each do |key, text, norm, source, digest, tags, embedding|
    embedding = embedding.unpack("f*")
    tags      = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
    value     = convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:)
    block.(key, value)
  end
  self
end
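The first line of each uses a common Ruby idiom: when no block is given, the method returns an Enumerator for itself instead of iterating. The sketch below reproduces that idiom on a toy class; TinyCache is hypothetical and merely stands in for a cache backed by a Hash instead of SQLite.

```ruby
# Hypothetical stand-in that demonstrates the "block or return enum_for"
# idiom without any database.
class TinyCache
  def initialize(pairs)
    @pairs = pairs
  end

  # With a block: yields every key/value pair and returns self.
  # Without a block: returns an Enumerator over the same pairs.
  def each(&block)
    block or return enum_for(__method__)
    @pairs.each { |key, value| block.(key, value) }
    self
  end
end

cache = TinyCache.new({ 'a' => 1, 'b' => 2 })

# Block form
cache.each { |key, value| puts "#{key} => #{value}" }

# Enumerator form, which makes Enumerable methods available
keys = cache.each.map { |key, _value| key }
puts keys.inspect
```

Returning self from the block form keeps calls chainable, while the Enumerator form allows lazy or external iteration.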
#each_source {|source| ... } ⇒ Enumerator
Yields each unique, full source present in the cache records.
This is a high-performance override for SQLite that avoids loading embeddings and parsing JSON for every record.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 235
def each_source(&block)
  block or return enum_for(__method__)
  execute(%{
    SELECT DISTINCT source
    FROM records
    WHERE key LIKE ?
      AND source IS NOT NULL
  }, [ start_with_prefix ]).each do |source,|
    source = source.full? or next
    block.(source)
  end
  nil
end
#find_records(needle, tags: nil, max_records: nil, min_similarity: -1) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 366
def find_records(needle, tags: nil, max_records: nil, min_similarity: -1)
  needle.size != @embedding_length and
    raise ArgumentError, "needle embedding length != %s" % @embedding_length
  needle_binary = needle.pack("f*")
  max_records   = [ max_records, size, 4_096 ].compact.min
  records = find_records_for_tags(tags)
  rowids_where = '(%s)' % records.transpose.last&.join(?,)
  execute(
    %{
      SELECT records.key, records.text, records.norm, records.source,
        records.digest, records.tags, embeddings.embedding,
        1 - vec_distance_cosine(?, vec_f32(embeddings.embedding)) AS similarity
      FROM records
      INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
      WHERE embeddings.rowid IN #{rowids_where}
        AND embeddings.embedding MATCH ?
        AND similarity >= ?
        AND embeddings.k = ?
      ORDER BY similarity DESC
    },
    [ needle_binary, needle_binary, min_similarity, max_records ]
  ).map do |key, text, norm, source, digest, tags, embedding, similarity|
    key       = unpre(key)
    embedding = embedding.unpack("f*")
    tags      = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
    convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:, similarity:)
  end
end
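The query ranks rows by 1 - vec_distance_cosine(...), that is, by cosine similarity, and caps the result count with [ max_records, size, 4_096 ].compact.min. The following pure-Ruby sketch models both pieces outside the database, assuming the standard definition of cosine similarity; cosine_similarity and effective_limit are hypothetical names introduced only for this illustration.

```ruby
# Hypothetical pure-Ruby model of the similarity score and the record limit.

# Cosine similarity: dot product divided by the product of the norms.
# The result lies in [-1, 1] for non-zero vectors.
def cosine_similarity(a, b)
  a.size == b.size or raise ArgumentError, "vector lengths differ"
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Smallest of the requested limit, the cache size and a hard cap;
# a nil max_records is dropped by compact and therefore ignored.
def effective_limit(max_records, size, hard_cap = 4_096)
  [ max_records, size, hard_cap ].compact.min
end

needle = [1.0, 0.0]
puts cosine_similarity(needle, [1.0, 0.0])   # same direction
puts cosine_similarity(needle, [0.0, 1.0])   # orthogonal
puts cosine_similarity(needle, [-1.0, 0.0])  # opposite direction

puts effective_limit(nil, 10)
puts effective_limit(3, 10)
puts effective_limit(10_000, 20_000)
```

Since cosine similarity cannot fall below -1, the default min_similarity: -1 effectively disables the similarity threshold.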
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 329
def find_records_for_tags(tags)
  if tags.present?
    tags_filter = Documentrix::Utils::Tags.new(tags).to_a
    unless tags_filter.empty?
      tags_where = ' AND (%s)' % tags_filter.map {
        'tags LIKE "%%%s%%"' % quote(_1)
      }.join(' OR ')
    end
  end
  records = execute(%{
    SELECT key, tags, embedding_id
    FROM records
    WHERE key LIKE ?#{tags_where}
  }, [ start_with_prefix ])
  if tags_filter
    records = records.select { |key, tags, embedding_id|
      (tags_filter & JSON(tags.to_s).to_a).size >= 1
    }
  end
  records
end
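The method filters in two stages: a substring LIKE match on the JSON-encoded tags column narrows the rows in SQL, and a Ruby-side set intersection then keeps only rows that really contain one of the requested tags. The sketch below models both stages on in-memory rows to show why the second stage is needed; the sample rows and the helpers like_prefilter and exact_filter are hypothetical.

```ruby
require 'json'

# Hypothetical rows shaped like the query result: [key, tags_json, embedding_id].
rows = [
  ['doc-1', '["ruby","sqlite"]', 1],
  ['doc-2', '["rubygems"]',      2],
  ['doc-3', '["python"]',        3],
]

# Stage 1: substring match on the raw JSON text. String#include? is a
# simplified, case-sensitive stand-in for SQL's tags LIKE '%tag%'.
def like_prefilter(rows, wanted)
  rows.select { |_key, tags_json, _id| wanted.any? { |tag| tags_json.include?(tag) } }
end

# Stage 2: parse the JSON and keep rows sharing at least one exact tag.
def exact_filter(rows, wanted)
  rows.select { |_key, tags_json, _id| (wanted & JSON.parse(tags_json)).size >= 1 }
end

wanted     = ['ruby']
candidates = like_prefilter(rows, wanted)
matches    = exact_filter(candidates, wanted)

puts candidates.map(&:first).inspect  # includes doc-2, a substring false positive
puts matches.map(&:first).inspect     # only rows tagged exactly "ruby"
```

The substring stage is cheap but over-inclusive ("ruby" is contained in "rubygems"), so the exact intersection is what guarantees correct results.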
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 308
def full_each(&block)
  block or return enum_for(__method__)
  each(prefix: ?%, &block)
end
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 88
def key?(key)
  execute(
    %{ SELECT count(records.key) FROM records WHERE records.key = ? },
    pre(key)
  ).flatten.first == 1
end
#move_prefix(old_prefix, new_prefix) ⇒ Documentrix::Documents::Cache::SQLiteCache
Move a key prefix in the cache.
This operation updates every record whose key starts with old_prefix,
rewriting the prefix to new_prefix. It uses SQLite's built-in replace()
string function, which means the change is atomic and performed entirely
inside the database engine; no Ruby-side iteration or temporary data
structures are needed.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 263
def move_prefix(old_prefix, new_prefix)
  execute(
    %{
      UPDATE records
      SET key = replace(key, '#{quote(old_prefix)}', '#{quote(new_prefix)}')
      WHERE key LIKE ?
    },
    old_prefix + '%'
  )
end
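SQLite's replace(X, Y, Z) substitutes every occurrence of Y in X, not only a leading one. The WHERE clause restricts the update to keys that start with old_prefix, but within such a key any later occurrence of the same text is rewritten as well. The sketch below models that string behaviour in plain Ruby and contrasts it with a prefix-only rewrite; sql_replace and prefix_only are hypothetical helpers, and whether the difference matters depends on how keys are named in practice.

```ruby
# Hypothetical model of SQLite's replace(X, Y, Z): all occurrences of
# `from` are replaced; an empty `from` leaves the string unchanged.
def sql_replace(string, from, to)
  return string if from.empty?
  string.gsub(from) { to }  # block form: `to` is inserted literally
end

# Alternative that rewrites only a leading occurrence.
def prefix_only(string, from, to)
  string.start_with?(from) ? to + string[from.size..] : string
end

key = 'old-doc-old-1'
puts sql_replace(key, 'old-', 'new-')  # both occurrences rewritten
puts prefix_only(key, 'old-', 'new-')  # only the leading prefix rewritten
```

For prefixes that cannot recur inside a key, both approaches give the same result.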
#size ⇒ Integer
The size method returns the number of records stored in the cache
whose key starts with the cache's prefix.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 128
def size
  execute(
    %{SELECT COUNT(*) FROM records WHERE key LIKE ?},
    [ start_with_prefix ]
  ).flatten.first
end
#source_exist?(source, digest: nil, operator: ?=) ⇒ Boolean
The source_exist? method checks if any records associated with the given source exist in the cache. If a digest is provided, it verifies if the source exists and matches the specified digest using the provided operator.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 209
def source_exist?(source, digest: nil, operator: ?=)
  operator = '!=' if operator != ?=
  if digest
    !!execute(
      %{
        SELECT 1 FROM records
        WHERE key LIKE ? AND source = ? AND digest #{operator} ?
      },
      [ start_with_prefix, source, digest ]
    ).first
  else
    !!execute(
      %{
        SELECT 1 FROM records
        WHERE key LIKE ? AND source = ?
      },
      [ start_with_prefix, source ]
    ).first
  end
end
#tags ⇒ Documentrix::Utils::Tags
The tags method returns the unique tags from the database as a Documentrix::Utils::Tags object.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 113
def tags
  result = Documentrix::Utils::Tags.new
  execute(%{
      SELECT DISTINCT(tags) FROM records WHERE key LIKE ?
    }, [ start_with_prefix ]
  ).flatten.each do
    JSON(_1).each { |t| result.add(t) }
  end
  result
end
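SELECT DISTINCT(tags) deduplicates whole JSON strings, not individual tags, so the same tag can still appear in several returned rows; the Ruby loop merges them into one collection. A minimal sketch of that merge step follows, using Ruby's Set as an assumed stand-in for Documentrix::Utils::Tags; the unique_tags helper and the sample data are hypothetical.

```ruby
require 'json'
require 'set'

# Hypothetical merge of distinct JSON tag columns into one set of tags.
# Set is only a stand-in for Documentrix::Utils::Tags here.
def unique_tags(tag_columns)
  result = Set.new
  tag_columns.each do |json|
    JSON.parse(json).each { |tag| result.add(tag) }
  end
  result
end

# Distinct as strings, yet "sqlite" occurs in two of them.
columns = ['["ruby","sqlite"]', '["sqlite","cache"]', '[]']
puts unique_tags(columns).to_a.inspect
```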