Class: Documentrix::Documents::Cache::SQLiteCache
- Inherits:
-
Object
- Object
- Documentrix::Documents::Cache::SQLiteCache
- Includes:
- Common
- Defined in:
- lib/documentrix/documents/cache/sqlite_cache.rb
Overview
SQLiteCache is a cache implementation that uses SQLite database for storing document embeddings and related metadata.
This class provides a persistent cache storage solution for document embeddings, leveraging SQLite's capabilities to store both the embedding vectors and associated text data, tags, and source information. It supports efficient vector similarity searches using the sqlite_vec extension for fast nearest neighbor queries.
Instance Attribute Summary collapse
-
#embedding_length ⇒ Object
readonly
length of the embeddings vector.
-
#filename ⇒ Object
readonly
filename for the database,
:memory:is in memory.
Attributes included from Common
Instance Method Summary collapse
- #[](key) ⇒ Documentrix::Documents::Record, NilClass
-
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
-
#clear_all_with_prefix ⇒ Documentrix::Documents::RedisBackedMemoryCache
The clear_all_with_prefix method deletes all records for prefix
prefixfrom the cache by executing a SQL query. -
#clear_by_source(source, digest: nil, operator: ?=) ⇒ self
Removes all records associated with the specified source from the cache.
-
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method clears the cache for specific tags by deleting records that match those tags and have the prefix
prefix. -
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
-
#delete(key) ⇒ NilClass
The delete method removes a key from the cache by executing a SQL query.
-
#each(prefix: start_with_prefix) {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
-
#each_source {|source| ... } ⇒ Enumerator
Yields each unique, full source present in the cache records.
-
#find_records(needle, tags: nil, max_records: nil, min_similarity: -1)) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
-
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
-
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
-
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false, busy_timeout: 5000) ⇒ void
constructor
The initialize method sets up the cache by calling super and setting various instance variables.
-
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
-
#move_prefix(old_prefix, new_prefix) ⇒ Documentrix::Documents::Cache::SQLiteCache
Move a key prefix in the cache.
-
#size ⇒ Integer
The size method returns the total number of records stored in the cache, that is the ones with prefix
prefix. -
#source_exist?(source, digest: nil, operator: ?=) ⇒ Boolean
The source_exist? method checks if any records associated with the given source exist in the cache.
-
#tags ⇒ Documentrix::Utils::Tags
The tags method returns an array of unique tags from the database.
Methods included from Common
#clear, #collections, #pre, #unpre
Methods included from Utils::Math
Constructor Details
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false, busy_timeout: 5000) ⇒ void
The initialize method sets up the cache by calling super and setting various instance variables.
27 28 29 30 31 32 33 34 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 27 def initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false, busy_timeout: 5000) super(prefix:) @embedding_length = @filename = filename @debug = debug @busy_timeout = busy_timeout setup_database(filename) end |
Instance Attribute Details
#embedding_length ⇒ Object (readonly)
length of the embeddings vector
38 39 40 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 38 def @embedding_length end |
#filename ⇒ Object (readonly)
filename for the database, :memory: is in memory
36 37 38 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 36 def filename @filename end |
Instance Method Details
#[](key) ⇒ Documentrix::Documents::Record, NilClass
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 47 def [](key) result = execute( %{ SELECT records.key, records.text, records.norm, records.source, records.digest, records.tags, embeddings.embedding FROM records INNER JOIN embeddings ON records.embedding_id = embeddings.rowid WHERE records.key = ? }, pre(key) )&.first or return key, text, norm, source, digest, , = *result = .unpack("f*") = Documentrix::Utils::Tags.new(JSON(.to_s).to_a, source:) convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:) end |
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
70 71 72 73 74 75 76 77 78 79 80 81 82 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 70 def []=(key, value) value = convert_value_to_record(value) digest = compute_file_digest(value.source) = value..pack("f*") execute(%{BEGIN IMMEDIATE}) execute(%{INSERT INTO embeddings(embedding) VALUES(?)}, [ ]) , = execute(%{ SELECT last_insert_rowid() }).flatten execute(%{ INSERT INTO records(key,text,embedding_id,norm,source,digest,tags) VALUES(?,?,?,?,?,?,?) }, [ pre(key), value.text, , value.norm, value.source, digest, JSON(value.) ]) execute(%{COMMIT}) end |
#clear_all_with_prefix ⇒ Documentrix::Documents::RedisBackedMemoryCache
The clear_all_with_prefix method deletes all records for prefix prefix
from the cache by executing a SQL query.
161 162 163 164 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 161 def clear_all_with_prefix execute(%{DELETE FROM records WHERE key LIKE ?}, [ start_with_prefix ]) self end |
#clear_by_source(source, digest: nil, operator: ?=) ⇒ self
Removes all records associated with the specified source from the cache.
If a digest is provided, the method will only remove records that do NOT match this digest. This allows for updating a source by wiping old versions while preserving records that are already up-to-date.
179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 179 def clear_by_source(source, digest: nil, operator: ?=) operator = '!=' if operator != ?= if digest execute( %{ DELETE FROM records WHERE key LIKE ? AND source = ? AND digest #{operator} ? }, [ start_with_prefix, source, digest ] ) else execute( %{ DELETE FROM records WHERE key LIKE ? AND source = ? }, [ start_with_prefix, source ] ) end self end |
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method clears the cache for specific tags by deleting
records that match those tags and have the prefix prefix.
145 146 147 148 149 150 151 152 153 154 155 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 145 def ( = nil) = Documentrix::Utils::Tags.new().to_a if .present? records = () keys = '(%s)' % records.transpose.first.map { "'%s'" % quote(_1) }.join(?,) execute(%{DELETE FROM records WHERE key IN #{keys}}) else clear_all_with_prefix end self end |
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
324 325 326 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 324 def convert_to_vector(vector) vector end |
#delete(key) ⇒ NilClass
The delete method removes a key from the cache by executing a SQL query.
102 103 104 105 106 107 108 109 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 102 def delete(key) result = key?(key) execute( %{ DELETE FROM records WHERE records.key = ? }, pre(key) ) result end |
#each(prefix: start_with_prefix) {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 288 def each(prefix: start_with_prefix, &block) block or return enum_for(__method__, prefix:) execute(%{ SELECT records.key, records.text, records.norm, records.source, records.digest, records.tags, embeddings.embedding FROM records INNER JOIN embeddings ON records.embedding_id = embeddings.rowid WHERE records.key LIKE ? }, [ prefix ]).each do |key, text, norm, source, digest, , | = .unpack("f*") = Documentrix::Utils::Tags.new(JSON(.to_s).to_a, source:) value = convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:) block.(key, value) end self end |
#each_source {|source| ... } ⇒ Enumerator
Yields each unique, full source present in the cache records.
This is a high-performance override for SQLite that avoids loading embeddings and parsing JSON for every record.
239 240 241 242 243 244 245 246 247 248 249 250 251 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 239 def each_source(&block) block or return enum_for(__method__) execute(%{ SELECT DISTINCT source FROM records WHERE key LIKE ? AND source IS NOT NULL }, [ start_with_prefix ]).each do |source,| source = source.full? or next block.(source) end nil end |
#find_records(needle, tags: nil, max_records: nil, min_similarity: -1)) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 370 def find_records(needle, tags: nil, max_records: nil, min_similarity: -1) needle.size != @embedding_length and raise ArgumentError, "needle embedding length != %s" % @embedding_length needle_binary = needle.pack("f*") max_records = [ max_records, size, 4_096 ].compact.min records = () rowids_where = '(%s)' % records.transpose.last&.join(?,) execute( %{ SELECT records.key, records.text, records.norm, records.source, records.digest, records.tags, embeddings.embedding, 1 - vec_distance_cosine(?, vec_f32(embeddings.embedding)) AS similarity FROM records INNER JOIN embeddings ON records.embedding_id = embeddings.rowid WHERE embeddings.rowid IN #{rowids_where} AND embeddings.embedding MATCH ? AND similarity >= ? AND embeddings.k = ? ORDER BY similarity DESC }, [ needle_binary, needle_binary, min_similarity, max_records ] ).map do |key, text, norm, source, digest, , , similarity| key = unpre(key) = .unpack("f*") = Documentrix::Utils::Tags.new(JSON(.to_s).to_a, source:) convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:, similarity:) end end |
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 333 def () if .present? = Documentrix::Utils::Tags.new().to_a unless .empty? = ' AND (%s)' % .map { 'tags LIKE "%%%s%%"' % quote(_1) }.join(' OR ') end end records = execute(%{ SELECT key, tags, embedding_id FROM records WHERE key LIKE ?#{} }, [ start_with_prefix ]) if records = records.select { |key, , | ( & JSON(.to_s).to_a).size >= 1 } end records end |
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
312 313 314 315 316 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 312 def full_each(&block) block or return enum_for(__method__) each(prefix: ?%, &block) end |
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
90 91 92 93 94 95 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 90 def key?(key) execute( %{ SELECT count(records.key) FROM records WHERE records.key = ? }, pre(key) ).flatten.first == 1 end |
#move_prefix(old_prefix, new_prefix) ⇒ Documentrix::Documents::Cache::SQLiteCache
Move a key prefix in the cache.
This operation updates every record whose key starts with +old_prefix+,
rewriting the prefix to +new_prefix+. It uses SQLite’s built‑in replace()
string function, which means the change is atomic and performed entirely
inside the database engine—no Ruby‑side iteration or temporary data
structures are needed.
267 268 269 270 271 272 273 274 275 276 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 267 def move_prefix(old_prefix, new_prefix) execute( %{ UPDATE records SET key = replace(key, '#{quote(old_prefix)}', '#{quote(new_prefix)}') WHERE key LIKE ? }, old_prefix + '%' ) end |
#size ⇒ Integer
The size method returns the total number of records stored in the cache,
that is the ones with prefix prefix.
130 131 132 133 134 135 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 130 def size execute( %{SELECT COUNT(*) FROM records WHERE key LIKE ?}, [ start_with_prefix ] ).flatten.first end |
#source_exist?(source, digest: nil, operator: ?=) ⇒ Boolean
The source_exist? method checks if any records associated with the given source exist in the cache. If a digest is provided, it verifies if the source exists and matches the specified digest using the provided operator.
213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 213 def source_exist?(source, digest: nil, operator: ?=) operator = '!=' if operator != ?= if digest !!execute( %{ SELECT 1 FROM records WHERE key LIKE ? AND source = ? AND digest #{operator} ? }, [ start_with_prefix, source, digest ] ).first else !!execute( %{ SELECT 1 FROM records WHERE key LIKE ? AND source = ? }, [ start_with_prefix, source ] ).first end end |
#tags ⇒ Documentrix::Utils::Tags
The tags method returns an array of unique tags from the database.
115 116 117 118 119 120 121 122 123 124 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 115 def result = Documentrix::Utils::Tags.new execute(%{ SELECT DISTINCT(tags) FROM records WHERE key LIKE ? }, [ start_with_prefix ] ).flatten.each do JSON(_1).each { |t| result.add(t) } end result end |