Class: Documentrix::Documents::Cache::SQLiteCache
- Inherits:
-
Object
- Object
- Documentrix::Documents::Cache::SQLiteCache
- Includes:
- Common
- Defined in:
- lib/documentrix/documents/cache/sqlite_cache.rb
Overview
SQLiteCache is a cache implementation that uses SQLite database for storing document embeddings and related metadata.
This class provides a persistent cache storage solution for document embeddings, leveraging SQLite's capabilities to store both the embedding vectors and associated text data, tags, and source information. It supports efficient vector similarity searches using the sqlite_vec extension for fast nearest neighbor queries.
Instance Attribute Summary collapse
-
#embedding_length ⇒ Object
readonly
length of the embeddings vector.
-
#filename ⇒ Object
readonly
filename for the database,
:memory:is in memory.
Attributes included from Common
Instance Method Summary collapse
- #[](key) ⇒ Documentrix::Documents::Record, NilClass
-
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
-
#clear_all_with_prefix ⇒ Documentrix::Documents::RedisBackedMemoryCache
The clear_all_with_prefix method deletes all records for prefix
prefixfrom the cache by executing a SQL query. -
#clear_by_source(source, digest: nil, operator: ?=) ⇒ self
Removes all records associated with the specified source from the cache.
-
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method clears the cache for specific tags by deleting records that match those tags and have the prefix
prefix. -
#collections(prefix) ⇒ Array<Symbol>
Returns an array of collection names that match the given prefix.
-
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
-
#delete(key) ⇒ NilClass
The delete method removes a key from the cache by executing a SQL query.
-
#each(prefix: start_with_prefix) {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
-
#each_source {|source| ... } ⇒ Enumerator
Yields each unique, full source present in the cache records.
-
#find_records(needle, tags: nil, max_records: nil, min_similarity: -1)) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
-
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
-
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
-
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false, busy_timeout: 5000) ⇒ void
constructor
The initialize method sets up the cache by calling super and setting various instance variables.
-
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
-
#move_prefix(old_prefix, new_prefix) ⇒ Documentrix::Documents::Cache::SQLiteCache
Move a key prefix in the cache.
-
#size ⇒ Integer
The size method returns the total number of records stored in the cache, that is the ones with prefix
prefix. -
#source_exist?(source, digest: nil, operator: ?=) ⇒ Boolean
The source_exist? method checks if any records associated with the given source exist in the cache.
-
#tags ⇒ Documentrix::Utils::Tags
The tags method returns an array of unique tags from the database.
Methods included from Common
Methods included from Utils::Math
Constructor Details
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false, busy_timeout: 5000) ⇒ void
The initialize method sets up the cache by calling super and setting various instance variables.
27 28 29 30 31 32 33 34 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 27 def initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false, busy_timeout: 5000) super(prefix:) @embedding_length = @filename = filename @debug = debug @busy_timeout = busy_timeout setup_database(filename) end |
Instance Attribute Details
#embedding_length ⇒ Object (readonly)
length of the embeddings vector
38 39 40 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 38 def @embedding_length end |
#filename ⇒ Object (readonly)
filename for the database, :memory: is in memory
36 37 38 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 36 def filename @filename end |
Instance Method Details
#[](key) ⇒ Documentrix::Documents::Record, NilClass
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 47 def [](key) result = execute( %{ SELECT records.key, records.text, records.norm, records.source, records.digest, records.tags, embeddings.embedding FROM records INNER JOIN embeddings ON records.embedding_id = embeddings.rowid WHERE records.key = ? }, pre(key) )&.first or return key, text, norm, source, digest, , = *result = .unpack("f*") = Documentrix::Utils::Tags.new(JSON(.to_s).to_a, source:) convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:) end |
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
70 71 72 73 74 75 76 77 78 79 80 81 82 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 70 def []=(key, value) value = convert_value_to_record(value) digest = compute_file_digest(value.source) = value..pack("f*") execute(%{BEGIN IMMEDIATE}) execute(%{INSERT INTO embeddings(embedding) VALUES(?)}, [ ]) , = execute(%{ SELECT last_insert_rowid() }).flatten execute(%{ INSERT INTO records(key,text,embedding_id,norm,source,digest,tags) VALUES(?,?,?,?,?,?,?) }, [ pre(key), value.text, , value.norm, value.source, digest, JSON(value.) ]) execute(%{COMMIT}) end |
#clear_all_with_prefix ⇒ Documentrix::Documents::RedisBackedMemoryCache
The clear_all_with_prefix method deletes all records for prefix prefix
from the cache by executing a SQL query.
177 178 179 180 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 177 def clear_all_with_prefix execute(%{DELETE FROM records WHERE key LIKE ?}, [ start_with_prefix ]) self end |
#clear_by_source(source, digest: nil, operator: ?=) ⇒ self
Removes all records associated with the specified source from the cache.
If a digest is provided, the method will only remove records that do NOT match this digest. This allows for updating a source by wiping old versions while preserving records that are already up-to-date.
195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 195 def clear_by_source(source, digest: nil, operator: ?=) operator = '!=' if operator != ?= if digest execute( %{ DELETE FROM records WHERE key LIKE ? AND source = ? AND digest #{operator} ? }, [ start_with_prefix, source, digest ] ) else execute( %{ DELETE FROM records WHERE key LIKE ? AND source = ? }, [ start_with_prefix, source ] ) end self end |
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method clears the cache for specific tags by deleting
records that match those tags and have the prefix prefix.
161 162 163 164 165 166 167 168 169 170 171 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 161 def ( = nil) = Documentrix::Utils::Tags.new().to_a if .present? records = () keys = '(%s)' % records.transpose.first.map { "'%s'" % quote(_1) }.join(?,) execute(%{DELETE FROM records WHERE key IN #{keys}}) else clear_all_with_prefix end self end |
#collections(prefix) ⇒ Array<Symbol>
Returns an array of collection names that match the given prefix. This is a high-performance override for SQLite that only queries keys.
116 117 118 119 120 121 122 123 124 125 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 116 def collections(prefix) execute( %{ SELECT DISTINCT key FROM records WHERE key LIKE ? }, [ "#{prefix}%" ] ).flatten.each_with_object(Set.new) do |key, set| if key =~ /\A#{prefix}(.+)-/ set << $1.to_sym end end.to_a end |
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
340 341 342 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 340 def convert_to_vector(vector) vector end |
#delete(key) ⇒ NilClass
The delete method removes a key from the cache by executing a SQL query.
102 103 104 105 106 107 108 109 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 102 def delete(key) result = key?(key) execute( %{ DELETE FROM records WHERE records.key = ? }, pre(key) ) result end |
#each(prefix: start_with_prefix) {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 304 def each(prefix: start_with_prefix, &block) block or return enum_for(__method__, prefix:) execute(%{ SELECT records.key, records.text, records.norm, records.source, records.digest, records.tags, embeddings.embedding FROM records INNER JOIN embeddings ON records.embedding_id = embeddings.rowid WHERE records.key LIKE ? }, [ prefix ]).each do |key, text, norm, source, digest, , | = .unpack("f*") = Documentrix::Utils::Tags.new(JSON(.to_s).to_a, source:) value = convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:) block.(key, value) end self end |
#each_source {|source| ... } ⇒ Enumerator
Yields each unique, full source present in the cache records.
This is a high-performance override for SQLite that avoids loading embeddings and parsing JSON for every record.
255 256 257 258 259 260 261 262 263 264 265 266 267 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 255 def each_source(&block) block or return enum_for(__method__) execute(%{ SELECT DISTINCT source FROM records WHERE key LIKE ? AND source IS NOT NULL }, [ start_with_prefix ]).each do |source,| source = source.full? or next block.(source) end nil end |
#find_records(needle, tags: nil, max_records: nil, min_similarity: -1)) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 386 def find_records(needle, tags: nil, max_records: nil, min_similarity: -1) needle.size != @embedding_length and raise ArgumentError, "needle embedding length != %s" % @embedding_length needle_binary = needle.pack("f*") max_records = [ max_records, size, 4_096 ].compact.min records = () rowids_where = '(%s)' % records.transpose.last&.join(?,) execute( %{ SELECT records.key, records.text, records.norm, records.source, records.digest, records.tags, embeddings.embedding, 1 - vec_distance_cosine(?, vec_f32(embeddings.embedding)) AS similarity FROM records INNER JOIN embeddings ON records.embedding_id = embeddings.rowid WHERE embeddings.rowid IN #{rowids_where} AND embeddings.embedding MATCH ? AND similarity >= ? AND embeddings.k = ? ORDER BY similarity DESC }, [ needle_binary, needle_binary, min_similarity, max_records ] ).map do |key, text, norm, source, digest, , , similarity| key = unpre(key) = .unpack("f*") = Documentrix::Utils::Tags.new(JSON(.to_s).to_a, source:) convert_value_to_record(key:, text:, norm:, source:, digest:, tags:, embedding:, similarity:) end end |
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 349 def () if .present? = Documentrix::Utils::Tags.new().to_a unless .empty? = ' AND (%s)' % .map { 'tags LIKE "%%%s%%"' % quote(_1) }.join(' OR ') end end records = execute(%{ SELECT key, tags, embedding_id FROM records WHERE key LIKE ?#{} }, [ start_with_prefix ]) if records = records.select { |key, , | ( & JSON(.to_s).to_a).size >= 1 } end records end |
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
328 329 330 331 332 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 328 def full_each(&block) block or return enum_for(__method__) each(prefix: ?%, &block) end |
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
90 91 92 93 94 95 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 90 def key?(key) execute( %{ SELECT count(records.key) FROM records WHERE records.key = ? }, pre(key) ).flatten.first == 1 end |
#move_prefix(old_prefix, new_prefix) ⇒ Documentrix::Documents::Cache::SQLiteCache
Move a key prefix in the cache.
This operation updates every record whose key starts with +old_prefix+,
rewriting the prefix to +new_prefix+. It uses SQLite’s built‑in replace()
string function, which means the change is atomic and performed entirely
inside the database engine—no Ruby‑side iteration or temporary data
structures are needed.
283 284 285 286 287 288 289 290 291 292 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 283 def move_prefix(old_prefix, new_prefix) execute( %{ UPDATE records SET key = replace(key, '#{quote(old_prefix)}', '#{quote(new_prefix)}') WHERE key LIKE ? }, old_prefix + '%' ) end |
#size ⇒ Integer
The size method returns the total number of records stored in the cache,
that is the ones with prefix prefix.
146 147 148 149 150 151 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 146 def size execute( %{SELECT COUNT(*) FROM records WHERE key LIKE ?}, [ start_with_prefix ] ).flatten.first end |
#source_exist?(source, digest: nil, operator: ?=) ⇒ Boolean
The source_exist? method checks if any records associated with the given source exist in the cache. If a digest is provided, it verifies if the source exists and matches the specified digest using the provided operator.
229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 229 def source_exist?(source, digest: nil, operator: ?=) operator = '!=' if operator != ?= if digest !!execute( %{ SELECT 1 FROM records WHERE key LIKE ? AND source = ? AND digest #{operator} ? }, [ start_with_prefix, source, digest ] ).first else !!execute( %{ SELECT 1 FROM records WHERE key LIKE ? AND source = ? }, [ start_with_prefix, source ] ).first end end |
#tags ⇒ Documentrix::Utils::Tags
The tags method returns an array of unique tags from the database.
131 132 133 134 135 136 137 138 139 140 |
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 131 def result = Documentrix::Utils::Tags.new execute(%{ SELECT DISTINCT(tags) FROM records WHERE key LIKE ? }, [ start_with_prefix ] ).flatten.each do JSON(_1).each { |t| result.add(t) } end result end |